Error in Stream due to https://github.com/iree-org/iree/pull/18907 #19559

MaheshRavishankar · 2024-12-24T19:43:11Z

I have been hitting unrelated issues with #18907 (which is meant to test LLVM PR llvm/llvm-project#113501). It has been tricky to track down the core issue because the change is almost an NFC and is definitely not the root cause of the errors being hit. I am using this issue to track down the root cause. So far it seems to point to the OptimizeIntArithmetic pass and some folder.

Repro instructions

Input IR

toy_llama_tp2.mlir.txt

Compile command

iree-compile  --iree-input-type=auto \
                --iree-vm-bytecode-module-output-format=flatbuffer-binary \
                --mlir-print-debuginfo \
                --mlir-print-op-on-diagnostic=false \
                --iree-hal-target-device=llvm-cpu[0] \
                --iree-hal-target-device=llvm-cpu[1] \
                --iree-llvmcpu-target-cpu=host \
                -o toy_llama_tp2.vmfb \

The full IR dump (after all passes) is huge, but the difference seems to start at the OptimizeIntArithmetic pass. Here are the IR dumps before and after this pass for ToT and for #18907:

ToT.optimize_int_arithmetic.mlir.txt
pr18907.optimize_int_arithmetic.mlir.txt
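
Since SSA value numbers shift between the two dumps, a plain textual diff is noisy. One way to spot where they diverge is to compare per-op counts instead. The following is a minimal sketch, not part of the original investigation: the helper names `op_histogram` and `histogram_delta` are hypothetical, and the regex only assumes the usual `%result = dialect.op ...` textual form.

```python
import re
from collections import Counter

# Matches the op name that follows "%result = " (or "%a, %b = ") at the start
# of a line in MLIR's textual form, e.g. "stream.async.slice" or "arith.muli".
OP_NAME = re.compile(
    r"^\s*%[\w#:]+(?:\s*,\s*%[\w#:]+)*\s*=\s*([A-Za-z_]\w*(?:\.\w+)+)",
    re.MULTILINE,
)


def op_histogram(ir_text: str) -> Counter:
    """Count how often each op name defines a result in the given IR text."""
    return Counter(OP_NAME.findall(ir_text))


def histogram_delta(before: str, after: str) -> dict:
    """Return {op: count_in_after - count_in_before} for ops whose counts differ."""
    a, b = op_histogram(before), op_histogram(after)
    return {op: b[op] - a[op] for op in sorted(set(a) | set(b)) if b[op] != a[op]}
```

Reading the two attached dumps into strings and passing them as `histogram_delta(tot_text, pr_text)` would list ops such as `stream.async.slice` that appear more often in one dump than in the other.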

Specifically

With ToT

  %424 = stream.tensor.import on(#hal.device.affinity<@__device_0>) %arg3 : !hal.buffer_view -> tensor<?x12288xf16>{%91} in !stream.resource<external>{%365} loc("toy_llama_tp2.mlir":2822:12)
  %425 = stream.timepoint.await %65 => %424 : !stream.resource<external>{%365} loc("toy_llama_tp2.mlir":2822:12)
  %426 = stream.async.update on(#hal.device.affinity<@__device_0>) %364, %425[%c0 to %365] : !stream.resource<*>{%365} -> %425 as !stream.resource<external>{%365} loc("toy_llama_tp2.mlir":2822:12)
  %427 = stream.async.transfer %426 : !stream.resource<external>{%365} from(#hal.device.affinity<@__device_0>) -> to(#hal.device.affinity<@__device_0>) !stream.resource<*>{%365} loc("toy_llama_tp2.mlir":2822:12)
  %428 = stream.tensor.import on(#hal.device.affinity<@__device_1>) %arg4 : !hal.buffer_view -> tensor<?x12288xf16>{%91} in !stream.resource<external>{%368} loc("toy_llama_tp2.mlir":2825:12)
  %429 = stream.timepoint.await %85 => %428 : !stream.resource<external>{%368} loc("toy_llama_tp2.mlir":2825:12)
  %430 = stream.async.update on(#hal.device.affinity<@__device_1>) %367, %429[%c0 to %368] : !stream.resource<*>{%368} -> %429 as !stream.resource<external>{%368} loc("toy_llama_tp2.mlir":2825:12)
  %431 = stream.async.transfer %430 : !stream.resource<external>{%368} from(#hal.device.affinity<@__device_1>) -> to(#hal.device.affinity<@__device_1>) !stream.resource<*>{%368} loc("toy_llama_tp2.mlir":2825:12)

With #18907

  %165 = arith.muli %91, %c6 : index loc("toy_llama_tp2.mlir":681:21)
  %365 = arith.divui %165, %c6 : index loc("toy_llama_tp2.mlir":2322:12)
  ....

  %425 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<?x12288xf16>{%91} : index loc("toy_llama_tp2.mlir":2822:12)
  %426 = stream.tensor.import on(#hal.device.affinity<@__device_0>) %arg3 : !hal.buffer_view -> tensor<?x12288xf16>{%91} in !stream.resource<external>{%425} loc("toy_llama_tp2.mlir":2822:12)
  %427 = stream.timepoint.await %65 => %426 : !stream.resource<external>{%425} loc("toy_llama_tp2.mlir":2822:12)
  %428 = stream.async.update on(#hal.device.affinity<@__device_0>) %364, %427[%c0 to %366] : !stream.resource<*>{%366} -> %427 as !stream.resource<external>{%425} loc("toy_llama_tp2.mlir":2822:12)
  %429 = stream.async.slice on(#hal.device.affinity<@__device_0>) %428[%c0 to %366] : !stream.resource<external>{%425} -> !stream.resource<external>{%366} loc("toy_llama_tp2.mlir":2822:12)
  %430 = stream.async.transfer %429 : !stream.resource<external>{%366} from(#hal.device.affinity<@__device_0>) -> to(#hal.device.affinity<@__device_0>) !stream.resource<*>{%366} loc("toy_llama_tp2.mlir":2822:12)
  %431 = stream.tensor.sizeof on(#hal.device.affinity<@__device_1>) tensor<?x12288xf16>{%91} : index loc("toy_llama_tp2.mlir":2825:12)
  %432 = stream.tensor.import on(#hal.device.affinity<@__device_1>) %arg4 : !hal.buffer_view -> tensor<?x12288xf16>{%91} in !stream.resource<external>{%431} loc("toy_llama_tp2.mlir":2825:12)
  %433 = stream.timepoint.await %85 => %432 : !stream.resource<external>{%431} loc("toy_llama_tp2.mlir":2825:12)
  %434 = stream.async.update on(#hal.device.affinity<@__device_1>) %368, %433[%c0 to %369] : !stream.resource<*>{%369} -> %433 as !stream.resource<external>{%431} loc("toy_llama_tp2.mlir":2825:12)
  %435 = stream.async.slice on(#hal.device.affinity<@__device_1>) %434[%c0 to %369] : !stream.resource<external>{%431} -> !stream.resource<external>{%369} loc("toy_llama_tp2.mlir":2825:12)

There are some extra stream.async.slice ops with PR #18907. The IR before the pass is essentially equivalent in both cases, but the PR might be missing the folder for this pattern:

  %165 = arith.muli %91, %c6 : index loc("toy_llama_tp2.mlir":681:21)
  %365 = arith.divui %165, %c6 : index loc("toy_llama_tp2.mlir":2322:12)
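
For reference, the arithmetic behind such a folder can be sketched as follows. This is an illustration only, not the code from the linked LLVM change: it assumes `index` is 64 bits wide and that `arith.muli` wraps on overflow, and the helper names are hypothetical. Under those assumptions `divui(muli(x, c), c)` equals `x` only when the multiplication does not wrap, so a folder needs some way to establish that.

```python
INDEX_BITS = 64  # assumption: index is 64 bits wide on the target
MASK = (1 << INDEX_BITS) - 1


def muli(a: int, b: int) -> int:
    """Unsigned multiply with wraparound, modelling arith.muli on index."""
    return (a * b) & MASK


def divui(a: int, b: int) -> int:
    """Unsigned division, modelling arith.divui (caller guarantees b != 0)."""
    return a // b


def mul_then_div(x: int, c: int) -> int:
    """The unfolded pattern: divui(muli(x, c), c)."""
    return divui(muli(x, c), c)


def fold_is_sound(x: int, c: int) -> bool:
    """True when divui(muli(x, c), c) may be replaced by x for these values."""
    return c != 0 and x * c <= MASK


# Small sizes: the multiply does not wrap, so the pattern folds to x.
assert fold_is_sound(7, 6) and mul_then_div(7, 6) == 7

# A value large enough to wrap: the pattern no longer yields x.
big = 1 << 63
assert not fold_is_sound(big, 6) and mul_then_div(big, 6) != big
```

When the fold does not fire, the value computed through `muli`/`divui` and the value it should equal remain distinct SSA values, which is one plausible reason the sizes in the stream ops above are not recognized as equal.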

MaheshRavishankar · 2024-12-24T19:44:43Z

cc @benvanik: if you have cycles and can try this example out, it would help if you could give more insight into what's going on here and how to fix it. (I have the full IR dump after all passes if you want to take a look at that, but it is too big to upload here.)

MaheshRavishankar · 2025-01-03T00:27:33Z

Testing with the folder seems to have fixed the issue:

iree-org/llvm-project@2039e93...0b5c7e5#diff-b5a9833b8bcae3234467be62e01f5989da5ebe4d7e32ad268e4c190eca8c892a
