Add Tuning Support (Umbrella Issue) #16952

Open
kuhar opened this issue Apr 2, 2024 · 4 comments
Labels: performance ⚡ (Performance/optimization related work across the compiler and runtime), tuner

Comments

kuhar (Member) commented Apr 2, 2024

This is an umbrella issue for implementing a tuning infrastructure. By tuning we mean a type of Profile-Guided Optimization flow where we compile a program/model with extra instrumentation and use the runtime performance numbers to tweak the compilation parameters to achieve better performance. Concretely, this translates to benchmarking dispatches and using the results to apply different #iree_codegen.compilation_info attributes to root ops; each such attribute includes a lowering config (with tile sizes) and translation info (with the codegen pipeline, workgroup/subgroup sizes, and MMA schedule).
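To make the tunable parameters concrete, here is a minimal sketch of the knobs one tuning candidate might carry. The class and field names are hypothetical (they do not correspond to an actual IREE or tuner API); the fields simply mirror the lowering config and translation info described above.

```python
# Hypothetical candidate representation, for illustration only; the names do
# not correspond to an actual IREE or tuner API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CandidateConfig:
    # Lowering config knobs: tile sizes applied to the root op.
    tile_sizes: list[int] = field(default_factory=lambda: [64, 128, 64])
    # Translation info knobs: codegen pipeline and launch parameters.
    pipeline: str = "LLVMGPUVectorDistribute"
    workgroup_size: tuple[int, int, int] = (128, 1, 1)
    subgroup_size: int = 64
    # Optional MMA schedule (intrinsic choice and per-dimension subgroup counts).
    mma_schedule: Optional[dict] = None
```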

The main tuning loop will be driven by a Python script, with the bulk of the implementation split across a few existing tools. We plan to implement it as follows:

  1. iree-compile allows dumping instrumented benchmarks to a directory. This is similar to the existing flag --iree-hal-dump-executable-benchmarks-to=, with each benchmark dumped to a separate file and, if necessary, a top-level shared manifest file (see the invocation sketch after this list).
    1. This will require adding a compiler pass that inserts the instrumentation.
    2. The compiler annotates root ops that can be tuned.
  2. iree-run-module dumps profile data using the collected trace. This includes a precise dispatch mapping and information about (dynamic) shapes, workgroup counts, etc.
  3. The tuning script parses the dumped trace and the instrumented benchmarks and locates the root ops. It then detects whether each root op is supported. The tuning script knows how to generate tuning configurations for a number of supported root op kinds (e.g., matmul, convolution, contraction). The tuning configs are materialized as transform dialect/PDL specs.
    1. We should allow for the evaluation order to be customizable/pluggable.
  4. The tuning script launches iree-compile as a separate process. First, the existing configuration is stripped and replaced with the one from the tuning spec; compilation then resumes from the level of executable sources. The compilation either succeeds or the verifier rejects the compilation info. It is the compiler's responsibility to reconcile the compilation info across all ops in the module.
  5. The tuning script benchmarks a number of dispatch candidates and selects the best one using the collected instrumentation (time). The tuning spec is added to the output file (a sketch of this loop follows the diagram below).
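As a minimal sketch of step 1, assuming placeholder paths: only the --iree-hal-dump-executable-benchmarks-to= flag is taken from the text above, and the target backend value is just an example.

```python
# Step 1 sketch: compile the model and ask iree-compile to also dump
# per-dispatch benchmark modules into a directory. Paths and the target
# backend are placeholders.
import subprocess

subprocess.run(
    [
        "iree-compile",
        "model.mlir",
        "-o", "model.vmfb",
        "--iree-hal-target-backends=rocm",  # example target backend
        "--iree-hal-dump-executable-benchmarks-to=dump/benchmarks",
    ],
    check=True,
)
```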
```mermaid
flowchart TD;
A[Input program] --> B(iree-compile)
B --> C[Instrumented vmfb]
C --> D(iree-run-module)
D --> E[Profile data]
B --> F[Instrumented benchmarks]

subgraph TuningLoop
  G(Tuner) --> H[Tuning spec]
  H --> I(iree-compile)
  I --> J[Instrumented dispatch vmfb]
  J --> K(iree-benchmark-module)
  K --> L[Benchmark result]
  L --> G
end
E --> G
F --> G

G --> O[Final tuning spec]
```
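A minimal Python sketch of the inner loop from steps 4 and 5, under stated assumptions: file names are placeholders, the flag used to supply the transform dialect spec is an assumption about the eventual interface, and the benchmark-output parsing is a rough heuristic rather than a stable format.

```python
# Inner tuning-loop sketch (steps 4 and 5). The spec-loading flag below is an
# assumption about how a transform dialect spec would be supplied; the output
# parsing is a rough heuristic, not a stable format.
import re
import subprocess
from typing import Optional


def compile_candidate(benchmark_mlir: str, spec_path: str, out_vmfb: str) -> bool:
    """Recompile one dispatch benchmark with a candidate tuning spec applied."""
    result = subprocess.run(
        [
            "iree-compile", benchmark_mlir,
            "-o", out_vmfb,
            # Assumed mechanism for injecting the candidate spec.
            f"--iree-codegen-transform-dialect-library={spec_path}",
        ],
        capture_output=True,
    )
    # The verifier may reject an invalid compilation info; treat that as a skip.
    return result.returncode == 0


def benchmark_candidate(vmfb: str) -> float:
    """Run the dispatch benchmark and return a time in ms (lower is better)."""
    result = subprocess.run(
        # The device string is an example; pick whatever device is being tuned.
        ["iree-benchmark-module", f"--module={vmfb}", "--device=hip"],
        capture_output=True, text=True, check=True,
    )
    # Rough parse of the first "<number> us/ms" figure in the benchmark output.
    match = re.search(r"([\d.]+)\s*(us|ms)", result.stdout)
    if not match:
        return float("inf")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "us" else value


def pick_best(benchmark_mlir: str, candidate_specs: list[str]) -> Optional[str]:
    """Return the candidate spec with the best measured time, if any compiled."""
    best_spec, best_time = None, float("inf")
    for i, spec in enumerate(candidate_specs):
        vmfb = f"candidate_{i}.vmfb"
        if not compile_candidate(benchmark_mlir, spec, vmfb):
            continue  # rejected by the verifier
        t = benchmark_candidate(vmfb)
        if t < best_time:
            best_spec, best_time = spec, t
    return best_spec
```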

In the v0, targeting the SD family of models, we do not have to support dynamic shapes. Initially, the dispatches to tune will be selected by the user; later, we can extend the tuning script to identify them automatically based on the generated trace.

kuhar added the performance ⚡ label on Apr 2, 2024
kuhar (Member, Author) commented Apr 2, 2024

stellaraccident (Collaborator):

Nice / thank you! Various people have been doing this in a pretty ad-hoc way for years, and it is definitely profitable to do. Would be really nice to have it be a good and supported flow!

kuhar added a commit that referenced this issue May 13, 2024
Support disabling workgroup reordering and shared memory optimization
passes based on translation info config entries. Because these are just
named unit attributes, they do not require custom attributes defined in
tablegen.

These are intended for tuning.

Issue: #16952
kuhar (Member, Author) commented Sep 6, 2024

The scripts that drive the tuning loop landed in the sharktank repo: nod-ai/shark-ai#141 and nod-ai/shark-ai#158.
This is a temporary location, as the tuner in its current form only supports the LLVMGPUVectorDistribute pipeline and has only been tested on the SDXL model. From there, we should expand to more targets and models, and then 'graduate' the code to a location under iree-org.

kuhar (Member, Author) commented Nov 20, 2024

kuhar added the tuner label on Dec 18, 2024