v0.18.0
oleksandr-pavlyk released this 30 Sep 10:42
This release reaches an important milestone: offloading is now fully asynchronous.
Calls to `dpctl.tensor` functions submit tasks for execution to the DPC++ runtime and return without waiting for those tasks to finish.
The sequential semantics that users expect from the execution of a Python script are nevertheless preserved.
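The submit-and-return behavior can be pictured with a small pure-Python analogy. This is only a sketch and not how dpctl is implemented: dpctl submits work to the DPC++ runtime, whereas the single-worker executor `queue` and the `scale` helper below are hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an in-order device queue: a single worker
# runs submitted tasks one at a time, in submission order.
queue = ThreadPoolExecutor(max_workers=1)


def scale(values, factor):
    """Pretend kernel: sleeps briefly, then scales every element."""
    time.sleep(0.05)
    return [v * factor for v in values]


# Each submit() returns a future right away instead of waiting for the task.
first = queue.submit(scale, [1, 2, 3], 2)

# The second task consumes the first result. Because the single worker runs
# tasks in submission order, the first task has finished by the time this
# lambda executes, so the script still behaves as if it ran sequentially.
second = queue.submit(lambda: scale(first.result(), 10))

# The host blocks only when it actually needs the data.
result = second.result()
queue.shutdown()

print(result)  # [20, 40, 60]
```

The point of the analogy is that the caller keeps issuing work without waiting, while ordering between dependent tasks is still respected and blocking is deferred until a result is read on the host.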
The full list of changes that went into this release is:
Added
- Implement `tensor.take_along_axis` per Python Array API specification gh-1778
- Implement `tensor.put_along_axis` to complement `tensor.take_along_axis` gh-1798
- Support for 'device=tensor.kDLCPU' in `tensor.from_dlpack` function and `tensor.usm_ndarray.__dlpack__` method gh-1781
- Support DLPack on Windows gh-1746
- Implement `tensor.nextafter` function per Python Array API specification gh-1730
- Implement `tensor.count_nonzero` and `tensor.diff` functions from Python array API specification gh-1732, gh-1780
- Add support for `order="K"` to `*_like` array creation functions, and change default `order` keyword value from `'C'` to `'K'` gh-1808
- Support for 'max dimensions' in Array API capabilities info data gh-1774
- Add support for device aspect 'emulated' gh-1691
- `dpctl::tensor::usm_memory` class defined in `dpctl4pybind11.hpp` adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
- Add support for COVERAGE build type in project's CMake script gh-1692
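To make the relationship between `tensor.take_along_axis` and `tensor.put_along_axis` concrete, here is a minimal pure-Python sketch of the gather/scatter semantics for 2-D nested lists along the last axis. The helpers `take_along_last_axis` and `put_along_last_axis` are hypothetical names introduced for illustration only; the real functions operate on `usm_ndarray` objects and accept an axis argument, so consult the dpctl documentation for the exact signatures.

```python
def take_along_last_axis(x, indices):
    """Gather: out[i][j] = x[i][indices[i][j]] for 2-D nested lists."""
    return [[row[k] for k in idx_row] for row, idx_row in zip(x, indices)]


def put_along_last_axis(x, indices, values):
    """Scatter in place: x[i][indices[i][j]] = values[i][j]."""
    for row, idx_row, val_row in zip(x, indices, values):
        for k, v in zip(idx_row, val_row):
            row[k] = v


x = [[30, 10, 20],
     [5, 15, 0]]

# Per-row argsort: the indices that would sort each row.
order = [sorted(range(len(row)), key=row.__getitem__) for row in x]

# Gathering with the argsort indices yields each row in sorted order.
sorted_rows = take_along_last_axis(x, order)
print(sorted_rows)  # [[10, 20, 30], [0, 5, 15]]

# Scattering the sorted values back through the same indices restores x.
restored = [[None] * len(row) for row in x]
put_along_last_axis(restored, order, sorted_rows)
print(restored == x)  # True
```

Pairing a per-row index array with a gather and then a scatter is the typical use: the scatter undoes the gather when the same indices are reused.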
Changed
- Change ownership of USM allocation by `dpctl.memory` objects, make executions of `dpctl.tensor` operations asynchronous gh-1705
- Add support for Python scalars by `tensor.where` function gh-1719
- Optimize division by Python scalar in statistical functions `tensor.mean`, `tensor.std`, `tensor.var` gh-1820
- Use transcendental functions from `sycl` namespace instead of `std` namespace gh-1707
- Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
- Array creation function `tensor.zeros` to use asynchronous `memset` operation gh-1806
- The setter of `tensor.usm_ndarray.shape` property now supports Python scalar value gh-1786
- Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices gh-1660
- No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
- Update version of 'pybind11' used gh-1758, gh-1812
- Handle possible exceptions by `usm_host_allocator` used with `std::vector` gh-1791
- Use `dpctl::tensor::offset_utils::sycl_free_noexcept` instead of `sycl::free` in `host_task` tasks associated with life-time management of temporary USM allocations gh-1797
- Add `"same_kind"`-style casting for in-place mathematical operators of `tensor.usm_ndarray` gh-1827, gh-1830
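The `"same_kind"` casting item above can be illustrated with a deliberately simplified model of NumPy-style `"same_kind"` casting. This is an assumption-laden sketch, not dpctl's code: the `KIND_RANK` table and both helper functions are hypothetical, the model ignores signedness and bit width, and whether dpctl matches it in every detail should be checked against the dpctl documentation.

```python
# Simplified kind ranking (assumption): real data types also distinguish
# signedness and bit width, which this model ignores.
KIND_RANK = {"bool": 0, "int": 1, "float": 2, "complex": 3}


def can_cast_same_kind(src_kind, dst_kind):
    """True when src can be cast to dst without dropping to a lower kind."""
    return KIND_RANK[src_kind] <= KIND_RANK[dst_kind]


def inplace_allowed(lhs_kind, rhs_kind):
    """Model `lhs op= rhs`: the promoted result must cast back into lhs."""
    promoted = max(lhs_kind, rhs_kind, key=KIND_RANK.__getitem__)
    return can_cast_same_kind(promoted, lhs_kind)


# A floating-point array updated in place with an integer operand is fine.
print(inplace_allowed("float", "int"))  # True

# An integer array updated in place with a floating-point operand is not:
# the floating-point result cannot be stored back under "same_kind" rules.
print(inplace_allowed("int", "float"))  # False
```

Under this model, an in-place operation is rejected exactly when storing the result would require a cast to a lower kind, for example floating point to integer.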
Fixed
- Fix setting of release variable in Sphinx config file gh-1685
- Handle possible NULL return value from device aspect queries `DPCTLDevice_GetMaxWorkGroupSize1d` and `DPCTLDevice_GetMaxWorkGroupSize2d` gh-1690
- Add license header to conda script files gh-1695
- Fix `tensor.round` behavior on CUDA devices gh-1700
- Add missing `#include <sstream>` gh-1701
- Fix for issue 1724 gh-1728
- Correct USM type for return array of `tensor.extract` function gh-1727
- Fix for `tensor.unique_all` and `tensor.unique_inverse` to always return index arrays with default indexing data type gh-1741
- Propagate read-only flag from `__sycl_usm_array_interface__` in `tensor.asarray` function gh-1756
- `tensor.clip` to handle Python scalars which are out of bound for the data type of integral array gh-1759
- Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
- Element-wise `tensor.divide` and comparison operations allow greater range of Python integer and integer array combinations gh-1771
- Fix for unexpected behavior when using floating point types for array indexing gh-1792
- Enable `pytest --pyargs dpctl.tests` gh-1833
Maintenance
- Improve performance of `test_sort_complex_fp_nan` gh-1704
- Improve exception wording raised by `tensor.broadcast_arrays()` gh-1720
- Remove `template` keyword in method call of `sycl::kernel_bundle` gh-1726
- Backport changelog edits from maintenance/0.17.x gh-1736
- Replace uses of 'intel' channels in docs and readme file gh-1737
- Update references to deprecated environment variable `SYCL_DEVICE_FILTER` gh-1740
- Correction for installation instruction steps gh-1754
- Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
- Add missing include to fix build break with newer LLVM gh-1776
- Add `#include <utility>` for definition of `std::move` used gh-1787
- Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
- Document `tensor._flags.Flags` class gh-1794
- Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
- Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
- Clean-up uses of `Strided1DIndexer` class gh-1805
- Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
- Do not add `sycl::event` associated with compute task to vector of events representing execution of `host_task` gh-1807
- Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on `libze1` package which provides Level-Zero loader library gh-1801, gh-1840
- Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
- Remove recommendation to install wheels from Anaconda PyPI index gh-1819
- Removed use of post-link and pre-unlink conda scripts in `dpctl` gh-1821
- Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
- A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly:
gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, gh-1721, gh-1743, gh-1739, gh-1747, gh-1748, gh-1750, gh-1752, gh-1767, gh-1768, gh-1775, gh-1783, gh-1790, gh-1795, gh-1796, gh-1800, gh-1760, gh-1803, gh-1777, gh-1813, gh-1817, gh-1818