Merge buffers, C-syntax backend builder, improved syntax extensions

@lukstafi lukstafi released this 05 Sep 15:12
· 214 commits to master since this release

From the CHANGELOG:

Added

  • A new backend "cc": a C backend driven by a configurable C compiler command, defaulting to cc.
  • Merge buffers representational abstraction (one per virtual device):
    • backends just need to support device-to-device transfers,
    • merging gets implemented in "user space".
  • CUDA streaming multiprocessor parallelism via streams <-> virtual devices.
  • Support for cuda-gdb and compute-sanitizer (pass the right arguments to cudajit).
  • Inline declarations for (non-differentiable) tensors in the %cd syntax.
  • A minimal wrapper Sync_backend creating CPU backends with a single device only, where all calls are synchronous. (It's a baseline and helps debugging.)
  • In progress: a proper scheduler based on condition variables. The legacy pipes-based scheduler is kept for now as a baseline and to help with debugging.
  • Documentation for the syntax extensions.
  • %op syntax: when under a ~config parameter, refine the inline declared params' labels with config.label.
  • %op syntax: incorporate the input tensor's (if any) label in the resulting tensor's label.
  • Comments in config files using the line prefix ~~.
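
The merge buffers item above can be pictured with a small toy model. This is only a sketch under stated assumptions: the record type and the functions make_device, device_to_device and merge_add are hypothetical names invented for illustration and are not OCANNL's API. It shows the division of labour the changelog describes: the backend only copies a tensor into the destination's merge buffer, and the merge itself (here, an addition) is ordinary code outside the backend.

```ocaml
(* Hypothetical toy model; none of these names come from OCANNL. *)
type device = {
  id : int;
  tensor : float array; (* this device's copy of some tensor *)
  mutable merge_buffer : float array option; (* one buffer per virtual device *)
}

let make_device id values = { id; tensor = Array.copy values; merge_buffer = None }

(* The only primitive the backend provides in this model: a device-to-device
   transfer of the source tensor into the destination's merge buffer. *)
let device_to_device ~src ~dst = dst.merge_buffer <- Some (Array.copy src.tensor)

(* The "user space" merge: add the buffered values into the destination's
   tensor, then release the buffer. *)
let merge_add dst =
  match dst.merge_buffer with
  | None ->
      invalid_arg (Printf.sprintf "merge_add: device %d has an empty merge buffer" dst.id)
  | Some incoming ->
      Array.iteri (fun i v -> dst.tensor.(i) <- dst.tensor.(i) +. v) incoming;
      dst.merge_buffer <- None

let () =
  let d0 = make_device 0 [| 1.0; 2.0 |] in
  let d1 = make_device 1 [| 10.0; 20.0 |] in
  device_to_device ~src:d1 ~dst:d0;
  merge_add d0;
  Array.iter (Printf.printf "%g ") d0.tensor;
  print_newline ()
```

In this model a different merge operation (for example averaging) would only change merge_add, with no change to the transfer primitive.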

Changed

  • Terminology in the API: Renamed almost all uses of "jit" into uses of "compile" and / or "link".
  • Split the compile-to-ptx phase from the build-module and build-kernel-launcher phase.
  • Migrated the CUDA backend to ppx_minidebug-based execution tracing.
  • Fixes for mixed precision computations.
  • Further terminology refactoring: renamed Low_level.compile to Low_level.lower, and Low_level.compiled to Low_level.optimized, making the latter a record.
  • Further refactoring of the Backends API:
    • split the device type into virtual device and physical_device,
    • removed the direct support for merge, instead relying on merge buffers.
  • Updated to cudajit 0.4.
  • A template for C-syntax backends, refactoring CC and CUDA backends.
  • Improvements to handling of tensor node labels, and to the Tnode.debug_name function.
  • Output files generated by backends and files generated by logging are now placed in separate subdirectories.
  • C-syntax logging: also output the pre-assignment value when logging an assignment.
  • Migrated to ppx_minidebug 2.0 with the benefits it brings: no runtime passing, Utils.settings.log_level unified with ppx_minidebug's log levels.
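
As an illustration of the C-syntax logging item above, here is a minimal sketch of a generator that emits C code logging both the new and the pre-assignment value. The function logged_assignment, the C variable new_value, the float element type and the log_file handle are all assumptions made for this example; the real code generator may be structured quite differently.

```ocaml
(* Hypothetical helper, not OCANNL's code generator. It emits a C block that
   evaluates the right-hand side, logs the new value next to the value about
   to be overwritten, and only then performs the assignment.
   Assumes a C variable [log_file] of type FILE* is in scope. *)
let logged_assignment ~lhs ~rhs =
  String.concat "\n"
    [
      Printf.sprintf "{ float new_value = %s;" rhs;
      Printf.sprintf
        "  fprintf(log_file, \"%s = %%g (was %%g)\\n\", new_value, %s);" lhs lhs;
      Printf.sprintf "  %s = new_value; }" lhs;
    ]

let () = print_endline (logged_assignment ~lhs:"n1[0]" ~rhs:"n2[0] + n3[0]")
```

Reading the old value before the assignment is what makes the log useful for accumulating updates, where the previous contents affect the result.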

Fixed

  • Allow verifying that non-embedded tensor nodes of the tensor(s) associated with a linked code are already in the context passed to link (resp. link_batch), since they won't get introduced into the context. It is the responsibility of helper functions (such as those in Train) to ensure the check.
  • Fixed both known and newly discovered shortcomings of the syntax extensions.
    • In particular, %op syntax: lift ~config applications out of (tensor) functions.
  • Multiple other tiny fixes.
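
The first item in this list (verifying that non-embedded tensor nodes are already in the context passed to link) can be sketched as a simple set-membership check. The type tnode and the functions missing_from_context and check_before_link are hypothetical stand-ins, not the library's actual types or signatures; the sketch only assumes that each node has an identifier and a flag saying whether the linked code introduces it.

```ocaml
module Int_set = Set.Make (Int)

(* Hypothetical stand-in for a tensor node: an id plus a flag saying whether
   the linked code embeds it, i.e. introduces it into the context itself. *)
type tnode = { id : int; embedded : bool }

(* Ids of non-embedded nodes that are absent from the context. *)
let missing_from_context ~context nodes =
  List.filter_map
    (fun n -> if n.embedded || Int_set.mem n.id context then None else Some n.id)
    nodes

(* A helper (in the spirit of those the changelog places in Train) would run
   this before linking and refuse to proceed on [Error]. *)
let check_before_link ~context nodes =
  match missing_from_context ~context nodes with
  | [] -> Ok ()
  | missing -> Error missing

let () =
  let context = Int_set.of_list [ 1; 2 ] in
  let nodes = [ { id = 1; embedded = false }; { id = 4; embedded = false } ] in
  match check_before_link ~context nodes with
  | Ok () -> print_endline "all non-embedded nodes are in the context"
  | Error ids ->
      Printf.printf "missing from context: %s\n"
        (String.concat ", " (List.map string_of_int ids))
```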