
README update for v0.2.1.
lukstafi committed Jul 21, 2023
1 parent cbec799 commit 5f384dc
Showing 1 changed file with 8 additions and 8 deletions.
README.md
@@ -30,15 +30,14 @@ Warning disclaimer: this project is still "not announced". The features describe

## Future milestones

-* **v0.2.1 cuda**: a super-naive no-model-parallelism CUDA backend.
-* **v0.2.2 tiling**: a slightly-less-naive CUDA backend with some model parallelism, and maybe tiling.
-* **v0.3 LLVM and Triton**:
-  * **v0.3.1 triton-C**: a Triton backend.
-  * **v0.3.2 llvm**: an LLVM backend as an alternative to the GCCJIT backend. This one would be a more CPU-centric release. Blocked by LLVM 17 being released (i.e. aiming to start work around end of July).
-  * **v0.3.2 triton-llvm**: an LLVM-based Triton backend.
-* **v0.4 usability**: examples covering most of Andrej Karpathy's "Neural Networks Zero to Hero" series; data loading; checkpointing.
+* **v0.2.2 reliable speedups**: remove param update races; for Cuda, apply within-block parallelism on a per-loop basis (see the sketch after this list).
+* **v0.3 usability**: examples covering most of Andrej Karpathy's "Neural Networks Zero to Hero" series; data loading; checkpointing.
+* **v0.4 LLVM and Triton?**:
+  * **v0.4.1 triton-C**: a Triton backend.
+  * **v0.4.2 llvm**: an LLVM backend as an alternative to the GCCJIT backend; a more CPU-centric release. Blocked on the LLVM 17 release (i.e. aiming to start work around the end of July).
+  * **v0.4.3 triton-llvm**: an LLVM-based Triton backend.
* **v0.5 documentation**: `.mli` files and maybe more documentation.
-* **v0.6 scale**: basic distributed computation; runtime-autotuning optimization settings; data ingestion.
+* **v0.6 scale**: model parallelism; data ingestion; maybe basic distributed computation; maybe autotuning optimization settings.
* **v1 completeness**: whatever not-yet-implemented features that still seem needed and impact the framework design. (E.g. at the time of v0.1.X, convolutions, reshaping, concatenation are not easily expressible.)
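
A minimal sketch of the race that the **v0.2.2 reliable speedups** milestone targets, under assumptions not taken from the README (hypothetical kernel names, a flat parameter layout, one Cuda block per "task id"); this is not OCANNL's generated code:

```cuda
// Racy version: every block handles one "task id", but all blocks update the
// same parameters. The read-modify-write hidden in `-=` can interleave across
// blocks, silently dropping gradient contributions.
__global__ void update_params_racy(float *params, const float *grads,
                                   float lr, int n_params) {
  // Within-block parallelism applied per loop: threads stride over params.
  for (int p = threadIdx.x; p < n_params; p += blockDim.x)
    params[p] -= lr * grads[blockIdx.x * n_params + p];  // races across blocks
}

// One conventional fix: accumulate per-task gradients atomically first, then
// apply the parameter step in a separate, race-free kernel afterwards.
__global__ void accumulate_grads(float *total_grad, const float *grads,
                                 int n_params) {
  for (int p = threadIdx.x; p < n_params; p += blockDim.x)
    atomicAdd(&total_grad[p], grads[blockIdx.x * n_params + p]);
}
```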

### Releases
@@ -47,6 +46,7 @@ For details, see [CHANGES](CHANGES.md).

* **v0.2 inching toward GPU**:
  * **v0.2.0 stack-as-device**: for multicore CPU, improve cache locality and reduce cache contention by treating the C function stack as the "device memory" (see the first sketch after this list).
+  * **v0.2.1 naive-cuda**: a Cuda backend where "task id" becomes parallelization over blocks, and a new dedicated axis "sample num" becomes parallelization over threads in a block (see the second sketch after this list).
* **v0.1 GCCJIT backend**:
  * **v0.1.2**: multicore computations using a thread-local "task id" index.
  * **v0.1.1**: inlining scalar constants, improved inlining for virtual nodes.
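
A minimal sketch of the **v0.2.0 stack-as-device** idea, assuming a hypothetical function name and a toy computation (not OCANNL's generated code): each task's scratch tensors live as local arrays inside the generated C function, so distinct cores work in distinct stack frames instead of contending on shared heap buffers.

```cuda
// Plain C in the style a GCCJIT-like backend might emit (also compiles as
// CUDA host code). The "device memory" here is the function's own stack frame.
#define HIDDEN 64
float task_step(const float *input, int n_inputs, const float *weights) {
  float hidden[HIDDEN];  // per-task scratch: lives on this task's stack
  for (int h = 0; h < HIDDEN; ++h) {
    hidden[h] = 0.0f;
    for (int i = 0; i < n_inputs; ++i)
      hidden[h] += weights[h * n_inputs + i] * input[i];
  }
  float out = 0.0f;  // reduce locally; nothing here is shared across tasks
  for (int h = 0; h < HIDDEN; ++h)
    out += hidden[h];
  return out;
}
```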
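
A minimal sketch of the **v0.2.1 naive-cuda** mapping, with hypothetical kernel and parameter names (not OCANNL's generated code): the former multicore "task id" index selects the Cuda block, and the dedicated "sample num" axis selects the thread within that block.

```cuda
// Toy per-sample computation standing in for a generated tensor operation.
__global__ void per_sample_step(const float *inputs, float *outputs,
                                int samples_per_task, int n_features) {
  int task_id = blockIdx.x;      // "task id": parallelization over blocks
  int sample_num = threadIdx.x;  // "sample num": threads within a block
  if (sample_num >= samples_per_task) return;
  int sample = task_id * samples_per_task + sample_num;
  float acc = 0.0f;
  for (int f = 0; f < n_features; ++f)
    acc += inputs[sample * n_features + f];
  outputs[sample] = acc;
}
```

Launched as `per_sample_step<<<n_tasks, samples_per_task>>>(...)`, the grid dimension plays the role of the old multicore task pool, while the block dimension covers the new "sample num" axis.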
