Releases: IntelPython/dpctl
v0.18.3
v0.18.2
This is a bug-fix release, see https://github.com/IntelPython/dpctl/milestone/15.
It backports fixes for:
- tensor.result_type behavior for scalars (see gh-1874), and
- errors when using dpctl in a virtual environment on Linux (gh-1892).
Changes from PR gh-1899 were also backported.
v0.18.1
This is an incremental release in which only the installation instructions in the README were updated, to reflect the change in location of the index of Python packages built by Intel(R) relative to the 0.18.0 release.
v0.18.0
This release reaches an important milestone: offloading is now fully asynchronous.
Calls to dpctl.tensor functions submit tasks for execution to the DPC++ runtime and return without waiting for those tasks to finish.
The sequential semantics a user expects from executing a Python script are nevertheless preserved.
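As an analogy only (this is not how dpctl or the DPC++ runtime is implemented), the behavior described above can be modeled with a single-worker thread pool from Python's standard library: each submission returns a future immediately, tasks run in the order they were submitted, and the caller blocks only when it asks for a result. The names task, queue, and log are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: one worker thread stands in for an in-order queue.
queue = ThreadPoolExecutor(max_workers=1)
log = []

def task(name, delay):
    # Simulated device work; it runs on the worker thread, not the caller's.
    time.sleep(delay)
    log.append(name)
    return name

start = time.perf_counter()
f1 = queue.submit(task, "first", 0.3)   # returns a future right away
f2 = queue.submit(task, "second", 0.0)  # queued behind f1
submit_time = time.perf_counter() - start

# The caller waits only here, when it actually needs the results.
results = [f1.result(), f2.result()]
queue.shutdown()
```

Although f2 has no delay, it still completes after f1; that ordering is what lets asynchronous submission coexist with sequential program semantics in this model.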
The full list of changes that went into this release is:
Added
- Implement tensor.take_along_axis per Python Array API specification gh-1778
- Implement tensor.put_along_axis to complement tensor.take_along_axis gh-1798
- Support for 'device=tensor.kDLCPU' in tensor.from_dlpack function and tensor.usm_ndarray.__dlpack__ method gh-1781
- Support DLPack on Windows gh-1746
- Implement tensor.nextafter function per Python Array API specification gh-1730
- Implement tensor.count_nonzero and tensor.diff functions from Python Array API specification gh-1732, gh-1780
- Add support for order="K" to *_like array creation functions, and change default order keyword value from 'C' to 'K' gh-1808
- Support for 'max dimensions' in Array API capabilities info data gh-1774
- Add support for device aspect 'emulated' gh-1691
- dpctl::tensor::usm_memory class defined in dpctl4pybind11.hpp adds a constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
- Add support for COVERAGE build type in project's CMake script gh-1692
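The pure-Python sketch below (not dpctl code) illustrates the indexing rule behind tensor.take_along_axis and tensor.put_along_axis for a 2-D array along its last axis: take gathers x[i][indices[i][j]], and put writes values[i][j] into that same position. The helper names are hypothetical; the library functions operate on usm_ndarray inputs along a chosen axis.

```python
def take_along_last_axis(x, indices):
    """Gather x[i][indices[i][j]] from a 2-D nested list (hypothetical helper)."""
    return [[row[k] for k in idx_row] for row, idx_row in zip(x, indices)]

def put_along_last_axis(x, indices, values):
    """Write values[i][j] into x[i][indices[i][j]] in place (hypothetical helper)."""
    for row, idx_row, val_row in zip(x, indices, values):
        for k, v in zip(idx_row, val_row):
            row[k] = v

x = [[10, 30, 20],
     [60, 40, 50]]
# Per-row position of the maximum, shaped like an argmax with keepdims=True.
idx = [[1], [0]]
print(take_along_last_axis(x, idx))  # [[30], [60]]
put_along_last_axis(x, idx, [[0], [0]])
print(x)  # [[10, 0, 20], [0, 40, 50]]
```

Because the index array has the same number of dimensions as the data, per-row results of argsort- or argmax-style functions can be fed straight back in.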
Changed
- Change ownership of USM allocation by dpctl.memory objects, make execution of dpctl.tensor operations asynchronous gh-1705
- Add support for Python scalars to tensor.where function gh-1719
- Optimize division by Python scalar in statistical functions tensor.mean, tensor.std, tensor.var gh-1820
- Use transcendental functions from sycl namespace instead of std namespace gh-1707
- Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
- Array creation function tensor.zeros now uses asynchronous memset operation gh-1806
- The setter of tensor.usm_ndarray.shape property now supports Python scalar value gh-1786
- Use 'pyproject.toml' instead of 'setup.py', aligning with current packaging best practices gh-1660
- No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
- Update version of 'pybind11' used gh-1758, gh-1812
- Handle possible exceptions by usm_host_allocator used with std::vector gh-1791
- Use dpctl::tensor::offset_utils::sycl_free_noexcept instead of sycl::free in host_task tasks associated with life-time management of temporary USM allocations gh-1797
- Add "same_kind"-style casting for in-place mathematical operators of tensor.usm_ndarray gh-1827, gh-1830
Fixed
- Fix setting of release variable in Sphinx config file gh-1685
- Handle possible NULL return value from device aspect queries DPCTLDevice_GetMaxWorkGroupSize1d and DPCTLDevice_GetMaxWorkGroupSize2d gh-1690
- Add license header to conda script files gh-1695
- Fix tensor.round behavior on CUDA devices gh-1700
- Add missing #include <sstream> gh-1701
- Fix for issue 1724 gh-1728
- Correct USM type for return array of tensor.extract function gh-1727
- Fix for tensor.unique_all and tensor.unique_inverse to always return index arrays with default indexing data type gh-1741
- Propagate read-only flag from __sycl_usm_array_interface__ in tensor.asarray function gh-1756
- tensor.clip now handles Python scalars that are out of bounds for the data type of an integral array gh-1759
- Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
- Element-wise tensor.divide and comparison operations allow a greater range of Python integer and integer array combinations gh-1771
- Fix for unexpected behavior when using floating point types for array indexing gh-1792
- Enable pytest --pyargs dpctl.tests gh-1833
Maintenance
- Improve performance of test_sort_complex_fp_nan gh-1704
- Improve wording of exception raised by tensor.broadcast_arrays() gh-1720
- Remove template keyword in method call of sycl::kernel_bundle gh-1726
- Backport changelog edits from maintenance/0.17.x gh-1736
- Replace uses of 'intel' channels in docs and readme file gh-1737
- Update references to deprecated environment variable SYCL_DEVICE_FILTER gh-1740
- Correction for installation instruction steps gh-1754
- Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
- Add missing include to fix build break with newer LLVM gh-1776
- Add #include <utility> for definition of std::move used gh-1787
- Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
- Document tensor._flags.Flags class gh-1794
- Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
- Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
- Clean-up uses of Strided1DIndexer class gh-1805
- Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
- Do not add sycl::event associated with compute task to vector of events representing execution of host_task gh-1807
- Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on libze1 package, which provides Level-Zero loader library gh-1801, gh-1840
- Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
- Remove recommendation to install wheels from Anaconda PyPI index gh-1819
- Removed use of post-link and pre-unlink conda scripts in dpctl gh-1821
- Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
- A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly:
gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, [gh-1721](https...
0.17.0
This release features an updated documentation web page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions,
and complies with revision 2023.12 of the Python Array API specification.
Added
- Added pybind11 caster for sycl::half, mapping to/from Python float, to "dpctl4pybind11.hpp" header: gh-1655
- Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
- Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602
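A cumulative reduction is an inclusive scan with a binary operation: cumulative_sum uses addition, cumulative_prod uses multiplication, and cumulative_logsumexp uses a numerically stable log-add-exp. The pure-Python sketch below illustrates only the semantics, with itertools.accumulate standing in for device kernels; the logaddexp helper is hypothetical and not dpctl's implementation.

```python
import math
import operator
from itertools import accumulate

def logaddexp(a, b):
    # Stable log(exp(a) + exp(b)): factor out the larger argument.
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log1p(math.exp(-abs(a - b)))

cum_sum = list(accumulate([1, 2, 3, 4], operator.add))   # [1, 3, 6, 10]
cum_prod = list(accumulate([1, 2, 3, 4], operator.mul))  # [1, 2, 6, 24]
# Element k is log(exp(x[0]) + ... + exp(x[k])).
cum_lse = list(accumulate([0.0, 0.0, math.log(2.0)], logaddexp))
```

For the input above, cum_lse is approximately [0, log 2, log 4].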
Changed
- Expanded documentation for dpctl: gh-1619
- Expanded utils.intel_device_info functionality: gh-1656
- Improved performance of elementwise operations: gh-1651
- Efficiency improvement by avoiding unnecessary copying of sycl::queue: gh-1645
- dpctl uses pybind11 2.12.0: gh-1640
- Improved performance of tensor.reshape operation with order="F" when copying is needed, or requested: gh-1677
Fixed
- Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
- Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
- Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
- Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
- Support use of index arrays of different integral types in indexing operations: gh-47
- Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
- Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
- Fixed support for out keyword in tensor.matmul: gh-1610
- Fixed bug in basic slicing of empty arrays: gh-1680
- Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
- Fixed bug in tensor.repeat on zero-size input arrays: gh-1682
New Contributors
- @bdmoore1 made their first contribution in #1659
- @ekomarova made their first contribution in #1666
Full Changelog: https://github.com/IntelPython/dpctl/blob/master/CHANGELOG.md
v0.16.1
This release includes bug fixes and provides a change needed by the numba_dpex project to support dispatching kernels that consume instances of the sycl::local_accessor template type.
Changed
- Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return the device id of the parent unpartitioned device, instead of raising an exception, if the array is allocated on a sub-device: #1604
- Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
- Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
- Changed implementation of DPCTLQueue_SubmitRange, DPCTLQueue_SubmitNDRange in DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex; the enum DPCTLKernelArgType was changed to correspond to disjoint C++ types: #1609, #1611, #1612
Fixed
- Fixed a crash on Windows platform during execution of getter of dpctl.SyclPlatform.default_context property: #1604
- Fixed kernel submission error on NVidia CUDA GPUs during dpctl.tensor.matmul operation: #1605
- Fixed corruption of context cache table entries: #1607
- Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
- Fixed output of python -m dpctl --library to correct the specified library name: #1615
v0.16.0
This release is virtually identical to 0.15.1 as far as features are concerned.
This release is meant to be built with DPC++ 2024.1.0, which no longer supports older integrated Gen9 Intel GPUs, such as those that came with Intel Core 10th generation and older.
v0.15.1
Summary
This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.
Added
- Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, dpctl.tensor.argmax, and dpctl.tensor.prod per Python Array API specifications: #1399
- Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
- Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
- Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
- Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
- Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
- Added dpctl.tensor.clip function: #1444, #1505
- Added custom reduction functions dpt.logsumexp (reduction using binary function dpctl.tensor.logaddexp) and dpt.reduce_hypot (reduction using binary function dpctl.tensor.hypot): #1446
- Added inspection API to query capabilities of Python Array API specification implementation: #1469
- Support for compilation for NVIDIA(R) sycl target with use of Codeplay oneAPI plug-in: #1411, #1124
- Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
- Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530
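Folding an array with a binary function such as hypot or logaddexp avoids the intermediate overflow that a naive formula can hit. The pure-Python sketch below (not dpctl code) uses functools.reduce to show the idea; the logaddexp helper is hypothetical.

```python
import math
from functools import reduce

def logaddexp(a, b):
    # Stable log(exp(a) + exp(b)): factor out the larger argument.
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log1p(math.exp(-abs(a - b)))

big = [1e200, 1e200]
# Naive sqrt of the sum of squares overflows: 1e200 * 1e200 is inf in float64.
naive_norm = math.sqrt(sum(v * v for v in big))
# Folding with hypot keeps every intermediate in range.
folded_norm = reduce(math.hypot, big)

# math.exp(1000.0) would raise OverflowError; the fold never exponentiates it.
folded_lse = reduce(logaddexp, [1000.0, 1000.0])  # about 1000 + log(2)
```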
Changed
- Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
- Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
- dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
- C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516
Fixed
v0.15.0
Summary
The 0.15.0 release represents a milestone in which the dpctl.tensor.usm_ndarray object implements all special Python operators, except __matmul__ and __rmatmul__.
The dpctl.tensor submodule increases its array-API conformance test suite pass rate to 81.8% (passed: 916, failed: 84, skipped: 119).
Details
Added
- Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
- Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
- Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
- Added dpctl.tensor.round function.
- Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
- Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert.
- Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
- Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
- Added dpctl.tensor.minimum and dpctl.tensor.maximum binary elementwise functions.
- Added support for equality checking and hashing of dpctl.SyclPlatform.
- Implemented types property for all unary and binary elementwise functions #1361
- Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
- Added dpctl.tensor.matrix_transpose function.
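As we read the Python Array API specification, remainder and bitwise_right_shift follow the same conventions as Python's own integer operators. The pure-Python snippet below (not dpctl code) shows what to expect: the remainder takes the sign of the divisor, unlike C-style truncated remainder, and right-shifting a negative integer rounds toward negative infinity.

```python
import math

pairs = [(7, 3), (-7, 3), (7, -3)]
# Floored remainder: result has the sign of the divisor.
rem = [a % b for a, b in pairs]               # [1, 2, -2]
# Truncated remainder (math.fmod): result has the sign of the dividend.
trunc = [math.fmod(a, b) for a, b in pairs]   # [1.0, -1.0, 1.0]

# Arithmetic right shift equals floor division by a power of two.
shifts = [v >> 1 for v in (8, -8, -7)]        # [4, -4, -4]
```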
Changed
- Enabled support for Python arithmetic, in-place arithmetic, reflected arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
- Removed obsolete dpctl.tensor.numpy_usm_shared class and associated tests, which were being skipped #1310
- Transitioned dpctl codebase to Cython 3.
- Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
- Improved performance of summation function dpctl.tensor.sum.
- Improved in-place arithmetic operations for addition, subtraction and multiplication.
- Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
- Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
- Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
- Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
- Improved performance of dpctl.tensor.roll function.
Fixed
v0.14.5
This release builds on the 0.14.3 and 0.14.4 releases, addresses some performance gaps, and implements several new elementwise functions.
Added
- Added dpctl.tensor.log2 and dpctl.tensor.log10: #1267
- Added dpctl.tensor.negative, dpctl.tensor.positive, dpctl.tensor.square #1268
- Added dpctl.tensor.logical_not, dpctl.tensor.logical_and, dpctl.tensor.logical_or, dpctl.tensor.logical_xor #1270
Changed
- dpctl.tensor.astype behavior for newdtype=None changed #1261
- dpctl.tensor.usm_ndarray constructor default value of dtype keyword argument changed to None: #1265
- Support for out arguments that overlap with inputs for unary elementwise functions #1281
- Copying from one array to another is a no-op if both arrays view into the same memory #1284