Releases: IntelPython/dpctl
v0.14.4
This is a hot-fix for the 0.14.3 release.
Added
- Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239
Changed
- Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244
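The updates this optimization targets are in-place operations such as m += row, where a 1-D row or column is broadcast across a 2-D array. The following is a minimal pure-Python sketch of those semantics only. Nested lists stand in for arrays, the helper names iadd_row and iadd_column are hypothetical, and nothing here reflects how dpctl implements the kernels.

```python
def iadd_row(matrix, row):
    """Add `row` to every row of `matrix` in place (row broadcast)."""
    for r in matrix:
        if len(r) != len(row):
            raise ValueError("row length must match matrix width")
        for j, v in enumerate(row):
            r[j] += v
    return matrix


def iadd_column(matrix, column):
    """Add column[i] to every element of matrix[i] in place (column broadcast)."""
    if len(column) != len(matrix):
        raise ValueError("column length must match matrix height")
    for r, c in zip(matrix, column):
        for j in range(len(r)):
            r[j] += c
    return matrix


m = [[1, 2, 3], [4, 5, 6]]
iadd_row(m, [10, 20, 30])      # m is now [[11, 22, 33], [14, 25, 36]]
iadd_column(m, [100, 200])     # m is now [[111, 122, 133], [214, 225, 236]]
```

Both helpers mutate the existing storage rather than allocating a new result, which is the property that distinguishes an in-place update from an ordinary binary operation.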
Fixed
- Fixed handling of 0d arrays in dpctl.tensor.sum: #1238
v0.14.3
Added
- Added support of axis=None in dpctl.tensor.concat #1125
- Added caching for dpctl.SyclDevice.filter_string property #1127
- Added dpctl.tensor.isdtype from array API #1133
- Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
- Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
- Added dpctl.tensor.where from array API #1147
- Include libtensor headers in dpctl installation layout #1185
- Added new properties of dpctl.tensor.usm_ndarray object #1199
- Added a list of unary and binary elementwise functions from array API:
  - #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
  - #1205: dpctl.tensor.sqrt
  - #1209: implements out keyword argument
  - #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
  - #1214: dpctl.tensor.not_equal
  - #1216: dpctl.tensor.exp, dpctl.tensor.sin
  - #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
  - #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
  - #1221: dpctl.tensor.floor_divide
  - #1235: dpctl.tensor.less
  - #1237: in-place support for addition, multiplication and subtraction
- Added dpctl.tensor.all and dpctl.tensor.any #1204
- Added dpctl.tensor.sum #1210
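For the axis=None entry above: the array API specification says that when axis is None, the inputs are flattened before being concatenated. The following is a rough pure-Python sketch of that behavior, with nested lists standing in for arrays. The helper names flatten and concat_axis_none are hypothetical, and this is not dpctl code.

```python
def flatten(nested):
    """Return the elements of a nested list in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out


def concat_axis_none(arrays):
    """Flatten every input, then join the results into one 1-D list."""
    out = []
    for a in arrays:
        out.extend(flatten(a))
    return out


a = [[1, 2], [3, 4]]   # 2-D input
b = [5, 6, 7]          # 1-D input
result = concat_axis_none([a, b])   # [1, 2, 3, 4, 5, 6, 7]
```

Because the inputs are flattened first, their shapes do not need to agree, unlike concatenation along an explicit axis.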
Changed
- Updated examples of native Python extensions built using dpctl #1108
- Used security flags to compile and link native extensions of dpctl #1109
- Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
- Consolidated multiple USM temporaries life-time management host_tasks to improve test suite stability #1111
- MAINT: Improved cmake target dependency tracking #1112
- MAINT: Improved docstrings for existing dpctl.tensor functions #1123
- Changed default value of mode keyword in dpctl.tensor.take and dpctl.tensor.put from clip to wrap #1132
- Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
- Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
- Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
- Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
- Require DPC++ RT 2023.1 to build and run dpctl #1195
- Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
- Transition to conda-forge ecosystem #1213
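The change of the default mode from clip to wrap affects how indices are resolved, most visibly negative ones. The sketch below is modeled on the mode semantics of numpy.take, which is an assumption about what the dpctl keyword means. In particular, how dpctl treats indices outside [-n, n) is not stated in these notes and may differ. The function names are hypothetical.

```python
def resolve_index(i, n, mode="wrap"):
    """Map index `i` into range(n) under the given policy (NumPy-style)."""
    if n <= 0:
        raise ValueError("cannot index into an empty sequence")
    if mode == "clip":
        # Clamp into [0, n - 1]; negative indices therefore become 0.
        return min(max(i, 0), n - 1)
    if mode == "wrap":
        # Wrap around; -1 refers to the last element.
        return i % n
    raise ValueError(f"unknown mode: {mode!r}")


def take(values, indices, mode="wrap"):
    """Gather values at the resolved positions."""
    n = len(values)
    return [values[resolve_index(i, n, mode)] for i in indices]


data = [10, 20, 30, 40]
idx = [-1, 0, 2]
wrapped = take(data, idx, mode="wrap")   # [40, 10, 30]
clipped = take(data, idx, mode="clip")   # [10, 10, 30]
```

Under this reading, code that relied on the old default and passed negative indices would see different results after the change unless it sets mode explicitly.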
Fixed
- Fix to add empty values check for dpctl.tensor.place #1105, #1106
- Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
- MAINT: Fixed build break with newer GCC and SYCLOS #1118
- Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136
v0.14.2
Added
- Added dpctl.SyclDevice.partition_max_sub_devices property #1005
- Added dpctl.program.SyclKernel.max_sub_group_size property #1028
- Implemented printing of usm_ndarray #1013, #1043, #1060
- Implemented support for advanced indexing for dpctl.tensor.usm_ndarray #1095, #1097, #1099, #1101
- Implemented support for platform listing in dpctl.__main__ script #1014
- Improved performance of dpctl.tensor.asnumpy #1026
- Added UsmNDArray_Make* C-API for constructing dpctl.tensor.usm_ndarray from native allocations #1050, #1067
- Added support for dpctl.SyclDevice.native_vector_width_* device descriptors #1075
- Added dpctl::tensor::usm_ndarray::get_shape_vector and dpctl::tensor::usm_ndarray::get_strides_vector methods #1090
Changed
- Removed dpctl.select_host_device, dpctl.has_host_device, dpctl.SyclDevice.is_host, and dpctl.SyclDevice.has_aspect_host since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028
- usm_ndarray is made writable by default #1012, and writable flag is now checked by __setitem__.
- Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016
- Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040
- Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066
- The dpctl.tensor.Device class supports print_device_info method #1029, equality comparison, and hashing #1048
- Updated version of pybind11 used to 2.10.2 #1031
- Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054
- Changed return type of DPCTLUSM_GetPointerType function in SyclInterface library #1061, #1065
- Updated supported version of DLPack to 0.8 #1073
- Implemented queue cache per context/device pair and deployed it in dpctl.memory, dpctl.tensor.from_dlpack and dpctl.tensor array creation functions #1076, #1079
- Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093
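The queue cache entry above describes a familiar pattern: reuse one queue per (context, device) pair instead of constructing a new one on every call. The following is a minimal sketch of that pattern under stated assumptions. The Queue and QueueCache classes are hypothetical stand-ins, plain strings play the role of context and device objects, and dpctl's actual cache may be organized differently.

```python
class Queue:
    """Hypothetical stand-in for an object that is costly to construct."""

    def __init__(self, context, device):
        self.context = context
        self.device = device


class QueueCache:
    """Return the same Queue for repeated (context, device) requests."""

    def __init__(self):
        self._cache = {}

    def get(self, context, device):
        key = (context, device)       # both parts must be hashable
        queue = self._cache.get(key)
        if queue is None:             # first request for this pair
            queue = Queue(context, device)
            self._cache[key] = queue
        return queue


cache = QueueCache()
q1 = cache.get("ctx0", "gpu:0")
q2 = cache.get("ctx0", "gpu:0")   # served from the cache
q3 = cache.get("ctx0", "cpu:0")   # different device, so a new queue
same_pair_reused = q1 is q2       # True
other_pair_distinct = q1 is not q3  # True
```

Arrays created through such a cache for the same context and device end up sharing one queue object, so repeated allocations avoid the cost of queue construction.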
Fixed
- Fixed error gh-998 in forming Python exception, #999.
- A small memory leak fixed, #1000
- Improved dtype support in dpctl.tensor.full, PR #1002
- Added missing header file #1008 fixing gh-1007
- Fixed a typo in device-specific dtype mapping #1015
- Fixed default device integer type to align with NumPy's behavior on Windows #1017
- Fixed unexpected overflow in dpctl.tensor.linspace when one of the parameters is the largest floating point value #1034
- Constructors dpctl.tensor.empty, dpctl.tensor.zeros, and usm_ndarray constructor itself no longer allow creating arrays with data types not supported by the targeted device #1042
- Fixed parameter validation in dpctl.SyclQueue constructor #1052
- Fixed usm_type of the resulting array in dpctl.tensor.tril and dpctl.tensor.triu functions #1062
- Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
- Fixed issue with empty argument of dpctl.tensor.meshgrid function #1080
- Fixed linking problem on Windows enabling dpctl to be functional on Windows for devices not supporting some data types #1083
Full Changelog: 0.14.0...0.14.2
v0.14.0
[0.14.0] - 11/18/2022
Added
- Implemented dpctl.tensor.linspace function from array-API #875.
- Implemented dpctl.tensor.eye function from array-API #896.
- Implemented dpctl.tensor.tril and dpctl.tensor.triu functions from array-API #910.
- Added data type objects to dpctl.tensor namespace, finfo, iinfo, can_cast, and result_type functions #913.
- Implemented dpctl.tensor.meshgrid creation function from array-API #920.
- Implemented convenience class to represent output of dpctl.tensor.usm_ndarray.flags property #921.
- Added new device attributes and kernel's device-specific attributes #894.
- Added dpctl.utils.onetrace_enabled context manager for targeted trace collection #903.
- Added support for stream keyword in __dlpack__ method, enabling support for sending usm_ndarray using mpi4py #906.
- dpctl.tensor.asarray can now transition data between incompatible devices, #951.
- Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
- Added C-API to dpctl.program.SyclKernel and dpctl.program.SyclProgram. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
- Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
- Added experimental support for sharing data allocated on sub-devices via dlpack #984.
- Added dpctl.SyclDevice.sub_group_sizes property to retrieve supported sizes of sub-group by the device #985.
Changed
- Improved queue compatibility testing in dpctl.tensor's implementation module #900.
- Added automatic measurement of array-API conformance test suite in CI #901.
- Improved performance of array metadata transfer from host to device #912.
- Used os.add_dll_directory on Windows to ensure that DPCTLSyclInterface library can be found #918.
- Refactored dpctl.tensor's implementation module #941 to streamline adding new functionality. Streamlined dpctl::tensor::usm_ndarray class implementation.
- Added debugging messaging in case when DPCTLDynamicLib::getSymbol encounters errors #956.
- Updated code base according to changes in DPC++ compiler #952, #957, #958.
- Changed dpctl to use pybind11 2.10.1 #967.
- Extended dpctl.tensor.full to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.
Fixed
- Improved SyclDevice constructor error message #893.
- Fixed issue gh-890 about dpctl.tensor.reshape function #915.
- Fixed unexpected UnboundLocalError exception in #922.
- Fixed bugs in dpctl.tensor.arange in #945.
- Fixed issue with type inferencing in dpctl.tensor.asarray in #949.
- Added missing docstrings for dpctl.SyclDevice properties #964.
v0.13.0
Added
- Implemented and deployed dedicated kernels for copying with casting #781, used in __setitem__, implementation of asarray, dpctl.tensor.copy functions.
- Implemented dedicated copying kernel for dpctl.tensor.reshape function #810, added support for copy keyword #807.
- Implemented dedicated kernel to copy with casting from numpy.ndarray into dpctl.tensor.usm_ndarray #817.
- Implemented dpctl.tensor.permute_dims function from array-API #787.
- Implemented dpctl.tensor.expand_dims function from array-API #788.
- Implemented dpctl.tensor.squeeze function from array-API #790.
- Implemented dpctl.tensor.broadcast_to function from array-API #791.
- Implemented dpctl.tensor.broadcast_arrays function from array-API #798.
- Implemented dpctl.tensor.flip function from array-API #801.
- Implemented dpctl.tensor.usm_ndarray.mT property per array-API #805.
- Implemented dpctl.tensor.roll function from array-API #809.
- Implemented dpctl.tensor.arange function from array-API #814.
- Implemented dpctl.tensor.zeros function from array-API #816.
- Implemented dpctl.tensor.ones, dpctl.tensor.full, dpctl.tensor.empty_like, dpctl.tensor.zeros_like, dpctl.tensor.ones_like, dpctl.tensor.full_like functions from array-API #822.
- Implemented DPCTLQueue_Memset function in SyclInterface library #812, and exposed it for dpctl.memory.MemoryUSM* classes #815.
- Implemented dpctl.utils.get_coerced_usm_type to deduce usm type of the output array from types of input arrays in compute-follows-data execution model #797.
- Added dpctl.SyclDevice.profiling_timer_resolution property #825.
- Added dpctl.SyclDevice.platform and dpctl.SyclPlatform.default_context properties #827.
- Provided pybind11 example for functions working on dpctl.tensor.usm_ndarray container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver, and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.
- Wrote manual page about working with dpctl.SyclQueue #829.
- Added cmake scripts to dpctl package layout and a way to query the location #853.
- Implemented dpctl.tensor.concat function from array-API #867.
- Implemented dpctl.tensor.stack function from array-API #872.
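The get_coerced_usm_type entry above deduces the USM type of an output array from the USM types of its inputs. The sketch below shows one plausible form of such a rule. The priority order (device over shared over host), the function name coerce_usm_type, and the error handling are all assumptions made for illustration and are not taken from these notes, so check the dpctl documentation for the actual behavior.

```python
# Assumed priority order, highest first; not stated in the release notes.
_PRIORITY = ("device", "shared", "host")


def coerce_usm_type(usm_types):
    """Pick the highest-priority USM type among the inputs (hypothetical rule)."""
    ranks = []
    for t in usm_types:
        if t not in _PRIORITY:
            raise ValueError(f"unrecognized USM type: {t!r}")
        ranks.append(_PRIORITY.index(t))
    if not ranks:
        raise ValueError("at least one USM type is required")
    return _PRIORITY[min(ranks)]


mixed = coerce_usm_type(["host", "shared"])              # "shared"
with_device = coerce_usm_type(["shared", "device", "host"])  # "device"
all_host = coerce_usm_type(["host", "host"])             # "host"
```

A single deterministic rule like this lets a binary operation on arrays with different allocation types decide where its result should live without asking the caller.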
Changed
- Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also makes it possible to build the C-test executable without rebuilding the SyclInterface library.
- Exported keep_args_alive utility in dpctl4pybind11.hpp header #820. The utility uses sycl::handler::host_task to keep given Python arguments alive until each sycl::event from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
- Changed the size of struct underlying dpctl.SyclEvent to avoid storing Python object previously used to keep kernel arguments scheduled with dpctl.SyclQueue.submit #823.
- Fixed docstring for dpctl.SyclTimer #824.
- Changed type of exceptions raised on failure to create dpctl.SyclDevice from ValueError to dpctl.SyclDeviceCreationError #826.
- Improved performance of pybind11 type casters #837.
- Changed implementation of dpctl.SyclProgram from using deprecated sycl::program to sycl::kernel_bundle #845.
- Removed deprecated device aspects, added new supported aspects #844.
- Updated vendored dlpack.h to version 0.7 #847.
Fixed
- Fixed dpctl.lsplatform() to work correctly when used from within Jupyter notebook #800.
- Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
- Fixed filter selector string produced in outputs of dpctl.lsplatform(verbosity=2) and dpctl.SyclDevice.print_device_info #866.
- Fixed issue with slicing reported in gh-870 in #871.
New contributor: @npolina4 contributed #867, #872 and reported #870
v0.12.0
What's changed in 0.12.0
Added
- Properties added to MemoryUSM* objects. #647
- Added dpctl.tensor.asarray #646
- Implemented DLPack support for usm_ndarray #682
- Exported dpctl.tensor.Device class #708 #718
- Added testing of examples in CI #722
- Added user manuals to dpctl documentation #712 #773
Changed
- Folder dpctl-capi/ renamed to libsyclinterface/ in sources and documentation. #666 #768
- Added workflow to publish rendered documentation on PRs #673 #753 #726
- Synchronization functions and USM allocation functions release GIL #736 #766
- dpctl.SyclEvent destructor is made non-blocking #751
Fixed
- Fixed issue in code of dpctl.tensor.usm_ndarray.T #653
- Fixed issue with dpctl.tensor.reshape's effect on contiguity flags of usm_ndarray #695
- Fixed handling of empty list by dpctl.tensor.asarray #694
- Fixed type inference with array of empty arrays in dpctl.tensor.asarray #697
- Fixed issue gh-698 with dpctl.tensor.asarray #709
- Fixed performance of item assignment from numpy array #724
- DPCTLDeviceMgr_GetNumDevices should not operate on rejected devices #737
- Fixed issue gh-729 for dpctl.tensor.reshape applied to 0-element usm_ndarray #756
- Fixed issue gh-728 with dpctl.tensor.astype #757
- Fixed type in memory overlapping test #770
- Fixed issue with operator.pos for dpctl.tensor.usm_ndarray #783
- Only call PyThread_Ensure from host_task if the main-thread interpreter is initialized and not finalizing #776 #778 #721
Full Changelog: 0.11.4...0.12.0
0.11.4
What's Changed
- Fix tests for nested context factories expecting for integration environment by @PokhodenkoSA in #705
Full Changelog: 0.11.3...0.11.4
0.11.3
Fixed
Full Changelog: 0.11.2...0.11.3
0.11.2
Added
- Extending
dpctl.device_context
with nested contexts (#678)
Fixed
Full Changelog: 0.11.1...0.11.2