
70B-Prefill-2048-input-token unshared: ABORTED; the semaphore was aborted; while invoking native function hal.fence.await; while calling import; #19569

Closed
pdhirajkumarprasad opened this issue Dec 30, 2024 · 2 comments
Labels
bug 🐞 Something isn't working

Comments

@pdhirajkumarprasad

What happened?

When running 70B prefill with 2048 input tokens on the unsharded model, it fails with the following error:

iree/runtime/src/iree/hal/drivers/hip/event_semaphore.c:673: ABORTED; the semaphore was aborted; while invoking native function hal.fence.await; while calling import; 
[ 0] bytecode module.prefill_bs4:90 prefill_70b_unsharded.mlir:727:3
Abort (core dumped)

The failure occurs with iree-benchmark-module; iree-run-module works fine.

Commands:

python3 -m sharktank.examples.export_paged_llm_v1 \
  --bs=4 \
  --irpa-file=/data/llama3.1/weights/70b/fp16/llama3.1_70b_f16.irpa \
  --output-mlir=prefill_70b_unsharded.mlir \
  --output-config=prefill_70b_unsharded.json \
  --skip-decode

iree-compile prefill_70b_unsharded.mlir \
  --iree-hip-target=gfx942 \
  -o=prefill_70b_unsharded.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hip-legacy-sync=false \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions 


ROCR_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
iree-benchmark-module \
  --hip_use_streams=true \
  --module=prefill_70b_unsharded.vmfb \
  --parameters=model=/data/llama3.1/weights/70b/fp16/llama3.1_70b_f16.irpa \
  --device=hip://4 \
  --function=prefill_bs4 \
  --input=@/shark-dev/70b//prefill_args_bs4_2048_stride_32/tokens.npy \
  --input=@/shark-dev/70b//prefill_args_bs4_2048_stride_32/seq_lens.npy \
  --input=@/shark-dev/70b//prefill_args_bs4_2048_stride_32/seq_block_ids.npy \
  --input=@/shark-dev/70b//prefill_args_bs4_2048_stride_32/cs_f16.npy --benchmark_repetitions=8

Run the above commands on the Shark MI300X machine.

Build: a43d893

Steps to reproduce your issue

No response

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response

@pdhirajkumarprasad pdhirajkumarprasad added the bug 🐞 Something isn't working label Dec 30, 2024
@AWoloszyn AWoloszyn self-assigned this Dec 31, 2024
@AWoloszyn
Contributor

Looking at rocm-smi, there is another process running on hip://4 that is taking up 23% of the available VRAM. With the 70B model, iree-run-module (or iree-benchmark-module with --benchmark_repetitions=1) hits 98% memory usage. A larger number of repetitions increases memory usage enough to hit the limit.

Using a GPU that is not currently in-use lets it complete.
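
A quick way to check for this before benchmarking is a sketch like the following (assumes the ROCm rocm-smi tool is on PATH; the --showmemuse flag reports per-GPU memory utilization, which helps avoid picking an already-busy device such as hip://4 here):

```shell
# Print per-GPU VRAM utilization so an idle device can be chosen
# for --device=hip://N. Falls back gracefully if rocm-smi is absent.
if command -v rocm-smi >/dev/null 2>&1; then
  rocm-smi --showmemuse
else
  echo "rocm-smi not found; cannot inspect GPU memory usage"
fi
```

A device already near 100% utilization will likely abort the semaphore wait as seen above once the benchmark's repetitions push allocation over the limit.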

@pdhirajkumarprasad
Author

The issue is not seen anymore, so closing this.
