Error in Stream due to https://github.com/iree-org/iree/pull/18907 #19559

Open
MaheshRavishankar opened this issue Dec 24, 2024 · 2 comments

@MaheshRavishankar
Contributor

I have been hitting unrelated issues with #18907 (which is meant to test LLVM PR llvm/llvm-project#113501). It has been tricky to track down the core issue because the change is almost an NFC and is definitely not the root cause of the errors being hit. I am using this issue to track down the root cause. So far it seems to point to the OptimizeIntArithmetic pass and some folder.

Repro instructions

Input IR

toy_llama_tp2.mlir.txt

Compile command

iree-compile  --iree-input-type=auto \
                --iree-vm-bytecode-module-output-format=flatbuffer-binary \
                --mlir-print-debuginfo \
                --mlir-print-op-on-diagnostic=false \
                --iree-hal-target-device=llvm-cpu[0] \
                --iree-hal-target-device=llvm-cpu[1] \
                --iree-llvmcpu-target-cpu=host \
                -o toy_llama_tp2.vmfb \

The full IR dump after all passes is huge, but the difference seems to start at the optimize int arithmetic pass. Here are the IR dumps before and after this pass for ToT and for #18907:

ToT.optimize_int_arithmetic.mlir.txt
pr18907.optimize_int_arithmetic.mlir.txt

Specifically

With ToT

  %424 = stream.tensor.import on(#hal.device.affinity<@__device_0>) %arg3 : !hal.buffer_view -> tensor<?x12288xf16>{%91} in !stream.resource<external>{%365} loc("toy_llama_tp2.mlir":2822:12)
  %425 = stream.timepoint.await %65 => %424 : !stream.resource<external>{%365} loc("toy_llama_tp2.mlir":2822:12)
  %426 = stream.async.update on(#hal.device.affinity<@__device_0>) %364, %425[%c0 to %365] : !stream.resource<*>{%365} -> %425 as !stream.resource<external>{%365} loc("toy_llama_tp2.mlir":2822:12)
  %427 = stream.async.transfer %426 : !stream.resource<external>{%365} from(#hal.device.affinity<@__device_0>) -> to(#hal.device.affinity<@__device_0>) !stream.resource<*>{%365} loc("toy_llama_tp2.mlir":2822:12)
  %428 = stream.tensor.import on(#hal.device.affinity<@__device_1>) %arg4 : !hal.buffer_view -> tensor<?x12288xf16>{%91} in !stream.resource<external>{%368} loc("toy_llama_tp2.mlir":2825:12)
  %429 = stream.timepoint.await %85 => %428 : !stream.resource<external>{%368} loc("toy_llama_tp2.mlir":2825:12)
  %430 = stream.async.update on(#hal.device.affinity<@__device_1>) %367, %429[%c0 to %368] : !stream.resource<*>{%368} -> %429 as !stream.resource<external>{%368} loc("toy_llama_tp2.mlir":2825:12)
  %431 = stream.async.transfer %430 : !stream.resource<external>{%368} from(#hal.device.affinity<@__device_1>) -> to(#hal.device.affinity<@__device_1>) !stream.resource<*>{%368} loc("toy_llama_tp2.mlir":2825:12)

With #18907

  %165 = arith.muli %91, %c6 : index loc("toy_llama_tp2.mlir":681:21)
  %365 = arith.divui %165, %c6 : index loc("toy_llama_tp2.mlir":2322:12)
  ....

  %425 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<?x12288xf16>{%91} : index loc("toy_llama_tp2.mlir":2822:12)
  %426 = stream.tensor.import on(#hal.device.affinity<@__device_0>) %arg3 : !hal.buffer_view -> tensor<?x12288xf16>{%91} in !stream.resource<external>{%425} loc("toy_llama_tp2.mlir":2822:12)
  %427 = stream.timepoint.await %65 => %426 : !stream.resource<external>{%425} loc("toy_llama_tp2.mlir":2822:12)
  %428 = stream.async.update on(#hal.device.affinity<@__device_0>) %364, %427[%c0 to %366] : !stream.resource<*>{%366} -> %427 as !stream.resource<external>{%425} loc("toy_llama_tp2.mlir":2822:12)
  %429 = stream.async.slice on(#hal.device.affinity<@__device_0>) %428[%c0 to %366] : !stream.resource<external>{%425} -> !stream.resource<external>{%366} loc("toy_llama_tp2.mlir":2822:12)
  %430 = stream.async.transfer %429 : !stream.resource<external>{%366} from(#hal.device.affinity<@__device_0>) -> to(#hal.device.affinity<@__device_0>) !stream.resource<*>{%366} loc("toy_llama_tp2.mlir":2822:12)
  %431 = stream.tensor.sizeof on(#hal.device.affinity<@__device_1>) tensor<?x12288xf16>{%91} : index loc("toy_llama_tp2.mlir":2825:12)
  %432 = stream.tensor.import on(#hal.device.affinity<@__device_1>) %arg4 : !hal.buffer_view -> tensor<?x12288xf16>{%91} in !stream.resource<external>{%431} loc("toy_llama_tp2.mlir":2825:12)
  %433 = stream.timepoint.await %85 => %432 : !stream.resource<external>{%431} loc("toy_llama_tp2.mlir":2825:12)
  %434 = stream.async.update on(#hal.device.affinity<@__device_1>) %368, %433[%c0 to %369] : !stream.resource<*>{%369} -> %433 as !stream.resource<external>{%431} loc("toy_llama_tp2.mlir":2825:12)
  %435 = stream.async.slice on(#hal.device.affinity<@__device_1>) %434[%c0 to %369] : !stream.resource<external>{%431} -> !stream.resource<external>{%369} loc("toy_llama_tp2.mlir":2825:12)

With PR #18907 there are some extra stream.async.slice ops. The IR before the pass is essentially equivalent in both cases, so the PR build might be missing the folder for this pattern:

  %165 = arith.muli %91, %c6 : index loc("toy_llama_tp2.mlir":681:21)
  %365 = arith.divui %165, %c6 : index loc("toy_llama_tp2.mlir":2322:12)
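
For reference, here is a minimal Python sketch of why folding `divui(muli(x, c), c)` to `x` needs a no-overflow guarantee on the multiplication. This is only a model of the arithmetic and not IREE or MLIR code: the 64-bit `index` width and the helper names (`muli`, `divui`, `fold_is_exact`, `can_fold`) are assumptions made for illustration, and it does not claim to describe how OptimizeIntArithmetic actually decides to fold.

```python
# Hypothetical model of the muli/divui pair above; not IREE or MLIR code.
INDEX_BITS = 64  # assumption: `index` is 64 bits wide on the host target


def muli(a: int, b: int, bits: int = INDEX_BITS) -> int:
    """Wrapping unsigned multiply, modeling arith.muli on a fixed width."""
    return (a * b) % (1 << bits)


def divui(a: int, b: int) -> int:
    """Unsigned integer division, modeling arith.divui (b must be nonzero)."""
    return a // b


def fold_is_exact(x: int, c: int, bits: int = INDEX_BITS) -> bool:
    """Check whether divui(muli(x, c), c) equals x for this particular x."""
    return divui(muli(x, c, bits), c) == x


def can_fold(x_max: int, c: int, bits: int = INDEX_BITS) -> bool:
    """True when x * c cannot wrap for any x in [0, x_max], so folding is safe."""
    return c != 0 and x_max * c < (1 << bits)


if __name__ == "__main__":
    # A small dynamic dimension: the multiply does not wrap, the fold is exact.
    print(fold_is_exact(1000, 6))
    # A value near the top of the range: the multiply wraps, the fold is wrong.
    print(fold_is_exact(2**63, 6))
```

If the range known for `%91` on the PR branch is wider (or unknown) compared to ToT, a check like `can_fold` would fail and the `divui` would be left in place, which would be consistent with the differing size operands seen above. That is a guess, not something confirmed from the dumps.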

@MaheshRavishankar
Contributor Author

cc @benvanik: if you have cycles and can try this example out, it would help if you could give more insight into what's going on here and how to fix it. (I have the full IR dump after all passes if you want to take a look at that, but it is too big to upload here.)
