
Llama 3.1 8b decode fails during benchmarking #19533

Open
aviator19941 opened this issue Dec 19, 2024 · 1 comment
Labels
bug 🐞 Something isn't working

Comments

@aviator19941
Contributor

What happened?

The updated 8b f16 bs4 TP1 IR compiles with 4c00a22, but benchmarking fails with this error:

Running ../iree-build-no-trace/tools/iree-benchmark-module
Run on (192 X 3810.79 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x96)
  L1 Instruction 32 KiB (x96)
  L2 Unified 1024 KiB (x96)
  L3 Unified 32768 KiB (x16)
Load Average: 9.97, 10.13, 12.66
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
:0:rocdevice.cpp            :2984: 2446864599827 us: [pid:1329572 tid:0x75d371000640] Callback: Queue 0x75d370500000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)

Steps to reproduce your issue

  1. wget the 8b f16 bs4 IR

  2. Compile using 4c00a22:

../iree-build-no-trace/tools/iree-compile \
8b_f16_bs4_tp1_tokens_128_stride_32.mlir  \
--iree-hip-target=gfx942  \
-o=8b_f16_bs4_tp1_tokens_128_stride_32.vmfb \
--iree-hal-target-device=hip \
--iree-dispatch-creation-enable-aggressive-fusion=true  \
--iree-global-opt-propagate-transposes=true  \
--iree-opt-aggressively-propagate-transposes=true  \
--iree-opt-data-tiling=false   \
--iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))'   \
--iree-hal-indirect-command-buffers=true   \
--iree-stream-resource-memory-model=discrete   \
--iree-hip-legacy-sync=false   \
--iree-hal-memoization=true   \
--iree-opt-strip-assertions
  3. Benchmark decode:
ROCR_VISIBLE_DEVICES=0,1,2,3,4,5,6,7    \
../iree-build-no-trace/tools/iree-benchmark-module   \
--hip_use_streams=true   \
--module=8b_f16_bs4_tp1_tokens_128_stride_32.vmfb   \
--parameters=model=/data/llama3.1/weights/8b/fp16/llama3.1_8b_fp16.irpa   \
--device=hip://4   \
--function=decode_bs4   \
--input=@/data/llama3.1/weights/8b/decode_args_bs4_128_stride_32/next_tokens.npy   \
--input=@/data/llama3.1/weights/8b/decode_args_bs4_128_stride_32/seq_lens.npy   \
--input=@/data/llama3.1/weights/8b/decode_args_bs4_128_stride_32/start_positions.npy   \
--input=@/data/llama3.1/weights/8b/decode_args_bs4_128_stride_32/seq_block_ids.npy   \
--input=@/data/llama3.1/weights/8b/decode_args_bs4_128_stride_32/cs_f16.npy   \
--benchmark_repetitions=3
  4. See benchmarking error:
Running ../iree-build-no-trace/tools/iree-benchmark-module
Run on (192 X 3810.79 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x96)
  L1 Instruction 32 KiB (x96)
  L2 Unified 1024 KiB (x96)
  L3 Unified 32768 KiB (x16)
Load Average: 9.97, 10.13, 12.66
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
:0:rocdevice.cpp            :2984: 2446864599827 us: [pid:1329572 tid:0x75d371000640] Callback: Queue 0x75d370500000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)
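Since the failure is a device-side memory fault (HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION), one quick triage step, not taken from the issue itself but a plausible first check, is to confirm the five .npy inputs passed to decode_bs4 have the shapes and dtypes the function expects before suspecting the compiled module. A minimal sketch (the helper name `describe_inputs` is hypothetical):

```python
import numpy as np

# Hypothetical helper: summarize each benchmark input file so a shape or
# dtype mismatch can be ruled out before debugging the device-side fault.
def describe_inputs(paths):
    summaries = []
    for p in paths:
        arr = np.load(p)
        summaries.append((p, arr.shape, str(arr.dtype)))
    return summaries
```

Running this over the files under /data/llama3.1/weights/8b/decode_args_bs4_128_stride_32/ and comparing against the decode_bs4 signature in the MLIR would show whether the inputs themselves are malformed.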

What component(s) does this issue relate to?

Runtime

Version information

4c00a22

Additional context

No response

@aviator19941 aviator19941 added the bug 🐞 Something isn't working label Dec 19, 2024
@aviator19941 aviator19941 changed the title Llama 3.1 8b decode fails to benchmark Llama 3.1 8b decode fails during benchmarking Dec 19, 2024