Releases: IntelPython/dpctl

v0.14.4

19 Jul 12:32
3794cbc

This is a hot-fix for the 0.14.3 release.

Added

  • Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239

Changed

  • Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244
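
As a rough illustration of the operation this entry refers to, the sketch below adds one row to every row of a matrix in place. It uses plain Python lists as a stand-in and is not dpctl's implementation; the helper name `add_row_inplace` is hypothetical.

```python
def add_row_inplace(matrix, row):
    """Add `row` to every row of `matrix`, mutating `matrix` in place.

    Hypothetical helper: mimics broadcasting a 1-D row across a 2-D matrix.
    """
    for r in matrix:
        if len(r) != len(row):
            raise ValueError("row length must match the matrix width")
        for j, value in enumerate(row):
            r[j] += value  # update the existing storage, no new matrix
    return matrix


m = [[1, 2, 3], [4, 5, 6]]
add_row_inplace(m, [10, 20, 30])
print(m)  # [[11, 22, 33], [14, 25, 36]]
```

With dpctl arrays, the analogous update would presumably be written as `m += row`, relying on the in-place support for addition listed under #1237.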

Fixed

  • Fixed handling of 0d arrays in dpctl.tensor.sum: #1238

v0.14.3

19 Jul 12:31
81553f8

Added

  • Added support for axis=None in dpctl.tensor.concat #1125
  • Added caching for dpctl.SyclDevice.filter_string property #1127
  • Added dpctl.tensor.isdtype from array API #1133
  • Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
  • Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
  • Added dpctl.tensor.where from array API #1147
  • Include libtensor headers in dpctl installation layout #1185
  • Added new properties of dpctl.tensor.usm_ndarray object #1199
  • Added a list of unary and binary elementwise functions from array API:
    • #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
    • #1205: dpctl.tensor.sqrt
    • #1209: implements out keyword argument
    • #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
    • #1214: dpctl.tensor.not_equal
    • #1216: dpctl.tensor.exp, dpctl.tensor.sin
    • #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
    • #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
    • #1221: dpctl.tensor.floor_divide
    • #1235: dpctl.tensor.less
    • #1237: in-place support for addition, multiplication and subtraction
  • Added dpctl.tensor.all and dpctl.tensor.any #1204
  • Added dpctl.tensor.sum #1210
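
For the axis=None support in dpctl.tensor.concat listed above, the array API standard specifies that inputs are flattened before concatenation. The sketch below shows that behaviour on nested Python lists; `flatten` and `concat_flat` are hypothetical helpers, not dpctl functions.

```python
def flatten(nested):
    """Return the leaves of an arbitrarily nested list in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def concat_flat(arrays):
    """Mimic concat(..., axis=None): flatten every input, then join them."""
    out = []
    for a in arrays:
        out.extend(flatten(a))
    return out


a = [[1, 2], [3, 4]]
b = [5, 6]
print(concat_flat([a, b]))  # [1, 2, 3, 4, 5, 6]
```

With dpctl installed, the corresponding call would be `dpctl.tensor.concat((x, y), axis=None)` on usm_ndarray inputs.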

Changed

  • Updated examples of native Python extensions built using dpctl #1108
  • Used security flags to compile and link native extensions of dpctl #1109
  • Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
  • Consolidated multiple USM temporaries life-time management host_tasks to improve test suite stability #1111
  • MAINT: Improved cmake target dependency tracking #1112
  • MAINT: Improved docstrings for existing dpctl.tensor functions #1123
  • Changed default value of mode keyword in dpctl.tensor.take and dpctl.tensor.put from clip to wrap #1132
  • Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
  • Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
  • Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
  • Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
  • Require DPC++ RT 2023.1 to build and run dpctl #1195
  • Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
  • Transition to conda-forge ecosystem #1213

Fixed

  • Fix to add empty values check for dpctl.tensor.place #1105, #1106
  • Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
  • MAINT: Fixed build break with newer GCC and SYCLOS #1118
  • Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136

v0.14.2

28 Mar 13:54

Added

  • Added dpctl.SyclDevice.partition_max_sub_devices property #1005
  • Added dpctl.program.SyclKernel.max_sub_group_size property #1028
  • Implemented printing of usm_ndarray #1013, #1043, #1060
  • Implemented support for advanced indexing for dpctl.tensor.usm_ndarray #1095, #1097, #1099, #1101
  • Implemented support for platform listing in dpctl.__main__ script #1014
  • Improved performance of dpctl.tensor.asnumpy #1026
  • Added UsmNDArray_Make* C-API for constructing dpctl.tensor.usm_ndarray from native allocations #1050, #1067
  • Added support for dpctl.SyclDevice.native_vector_width_* device descriptors #1075
  • Added dpctl::tensor::usm_ndarray::get_shape_vector and dpctl::tensor::usm_ndarray::get_strides_vector methods #1090

Changed

  • Removed dpctl.select_host_device, dpctl.has_host_device, dpctl.SyclDevice.is_host, and dpctl.SyclDevice.has_aspect_host since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028

  • usm_ndarray is made writable by default #1012, and the writable flag is now checked by __setitem__.

  • Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016

  • Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040

  • Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066

  • The dpctl.tensor.Device class supports print_device_info method #1029, equality comparison, and hashing #1048

  • Updated version of pybind11 used to 2.10.2 #1031

  • Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054

  • Changed return type of DPCTLUSM_GetPointerType function in SyclInterface library #1061, #1065

  • Updated supported version of DLPack to 0.8 #1073

  • Implemented queue cache per context/device pair and deployed it in dpctl.memory, dpctl.tensor.from_dlpack and dpctl.tensor array creation functions #1076, #1079

  • Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093
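
The queue cache per context/device pair mentioned in this list can be pictured as a dictionary keyed by the pair. The sketch below is a generic illustration under that assumption, not dpctl's actual code; `QueueCache`, `fake_make_queue`, and the string keys are hypothetical stand-ins for hashable context and device objects.

```python
class QueueCache:
    """Hypothetical cache that reuses one queue per (context, device) pair."""

    def __init__(self, make_queue):
        self._make_queue = make_queue  # factory called only on a cache miss
        self._cache = {}

    def get(self, context, device):
        key = (context, device)
        if key not in self._cache:
            self._cache[key] = self._make_queue(context, device)
        return self._cache[key]


created = []


def fake_make_queue(context, device):
    """Stand-in for an expensive queue constructor; records each creation."""
    queue = object()
    created.append(queue)
    return queue


cache = QueueCache(fake_make_queue)
q1 = cache.get("ctx0", "gpu:0")
q2 = cache.get("ctx0", "gpu:0")  # same pair: served from the cache
q3 = cache.get("ctx0", "cpu:0")  # different device: a new queue is created
print(q1 is q2, q1 is q3, len(created))  # True False 2
```

Reusing a queue for a repeated pair avoids constructing a new one on every array creation call.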

Fixed

  • Fixed error gh-998 in forming Python exception, #999.
  • A small memory leak fixed, #1000
  • Improved dtype support in dpctl.tensor.full, PR #1002
  • Added missing header file #1008 fixing gh-1007
  • Fixed a typo in device-specific dtype mapping #1015
  • Fixed default device integer type to align with NumPy's behavior on Windows #1017
  • Fixed unexpected overflow in dpctl.tensor.linspace when one of the parameters is the largest floating point value #1034
  • Constructors dpctl.tensor.empty, dpctl.tensor.zeros, and the usm_ndarray constructor itself no longer allow creating arrays with data types not supported by the targeted device #1042
  • Fixed parameter validation in dpctl.SyclQueue constructor #1052
  • Fixed usm_type of the resulting array in dpctl.tensor.tril and dpctl.tensor.triu functions #1062
  • Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
  • Fixed issue with empty argument of dpctl.tensor.meshgrid function #1080
  • Fixed linking problem on Windows enabling dpctl to be functional on Windows for devices not supporting some data types #1083
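
To see how a linspace computation can overflow near the largest floating point value, as in the dpctl.tensor.linspace fix above, the sketch below compares a naive step-based formula with an interpolation form in plain Python floats. It is an illustration only and not dpctl's implementation. Both function names are hypothetical, and using extreme values for both endpoints is an assumption chosen to make the overflow easy to trigger.

```python
import math
import sys


def linspace_naive(start, stop, num):
    """Step-based formula: (stop - start) may overflow to infinity."""
    if num < 2:
        raise ValueError("num must be at least 2")
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def linspace_lerp(start, stop, num):
    """Interpolation form: never computes the difference stop - start."""
    if num < 2:
        raise ValueError("num must be at least 2")
    out = []
    for i in range(num):
        t = i / (num - 1)
        out.append(start * (1.0 - t) + stop * t)
    return out


fmax = sys.float_info.max
naive = linspace_naive(-fmax, fmax, 3)
safe = linspace_lerp(-fmax, fmax, 3)
print(all(math.isfinite(v) for v in naive))  # False
print(all(math.isfinite(v) for v in safe))   # True
```

The interpolation form weights each endpoint separately, so every intermediate value stays within the representable range.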

Full Changelog: 0.14.0...0.14.2

v0.14.0

19 Nov 05:10
21a6931

[0.14.0] - 11/18/2022

Added

  • Implemented dpctl.tensor.linspace function from array-API #875.
  • Implemented dpctl.tensor.eye function from array-API #896.
  • Implemented dpctl.tensor.tril and dpctl.tensor.triu functions from array-API #910.
  • Added data type objects to dpctl.tensor namespace, finfo, iinfo, can_cast, and result_type functions #913.
  • Implemented dpctl.tensor.meshgrid creation function from array-API #920.
  • Implemented convenience class to represent output of dpctl.tensor.usm_ndarray.flags property #921.
  • Added new device attributes and kernel's device-specific attributes #894.
  • Added dpctl.utils.onetrace_enabled context manager for targeted trace collection #903.
  • Added support for stream keyword in __dlpack__ method, enabling support for sending usm_ndarray using mpi4py #906.
  • dpctl.tensor.asarray can now transition data between incompatible devices, #951.
  • Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
  • Added C-API to dpctl.program.SyclKernel and dpctl.program.SyclProgram. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
  • Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
  • Added experimental support for sharing data allocated on sub-devices via dlpack #984.
  • Added dpctl.SyclDevice.sub_group_sizes property to retrieve the sub-group sizes supported by the device #985.

Changed

  • Improved queue compatibility testing in dpctl.tensor's implementation module #900.
  • Added automatic measurement of array-API conformance test suite in CI #901.
  • Improved performance of array metadata transfer from host to device #912.
  • Used os.add_dll_directory on Windows to ensure that DPCTLSyclInterface library can be found #918.
  • Refactored dpctl.tensor's implementation module #941 to streamline adding new functionality. Streamlined dpctl::tensor::usm_ndarray class implementation.
  • Added debug messages for cases when DPCTLDynamicLib::getSymbol encounters errors #956.
  • Updated code base according to changes in DPC++ compiler #952, #957, #958.
  • Changed dpctl to use pybind11 2.10.1 #967.
  • Extended dpctl.tensor.full to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.

Fixed

  • Improved SyclDevice constructor error message #893.
  • Fixed issue gh-890 about dpctl.tensor.reshape function #915.
  • Fixed unexpected UnboundLocalError exception in #922.
  • Fixed bugs in dpctl.tensor.arange in #945.
  • Fixed issue with type inferencing in dpctl.tensor.asarray in #949.
  • Added missing docstrings for dpctl.SyclDevice properties #964.

v0.13.0

28 Jul 21:30
5004aa1

Added

  • Implemented and deployed dedicated kernels for copying with casting #781, used in __setitem__ and in the implementation of the asarray and dpctl.tensor.copy functions.
  • Implemented dedicated copying kernel for dpctl.tensor.reshape function #810, added support for copy keyword #807.
  • Implemented dedicated kernel to copy with casting from numpy.ndarray into dpctl.tensor.usm_ndarray #817.
  • Implemented dpctl.tensor.permute_dims function from array-API #787.
  • Implemented dpctl.tensor.expand_dims function from array-API #788.
  • Implemented dpctl.tensor.squeeze function from array-API #790.
  • Implemented dpctl.tensor.broadcast_to function from array-API #791.
  • Implemented dpctl.tensor.broadcast_arrays function from array-API #798.
  • Implemented dpctl.tensor.flip function from array-API #801.
  • Implemented dpctl.tensor.usm_ndarray.mT property per array-API #805.
  • Implemented dpctl.tensor.roll function from array-API #809.
  • Implemented dpctl.tensor.arange function from array-API #814.
  • Implemented dpctl.tensor.zeros function from array-API #816.
  • Implemented dpctl.tensor.ones, dpctl.tensor.full, dpctl.tensor.empty_like, dpctl.tensor.zeros_like, dpctl.tensor.ones_like, dpctl.tensor.full_like functions from array-API #822.
  • Implemented DPCTLQueue_Memset function in SyclInterface library #812, and exposed it for dpctl.memory.MemoryUSM* classes #815.
  • Implemented dpctl.utils.get_coerced_usm_type to deduce the USM type of the output array from the USM types of input arrays in the compute-follows-data execution model #797.
  • Added dpctl.SyclDevice.profiling_timer_resolution property #825.
  • Added dpctl.SyclDevice.platform and dpctl.SyclPlatform.default_context properties #827.
  • Provided pybind11 example for functions working on dpctl.tensor.usm_ndarray container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver, and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.
  • Wrote manual page about working with dpctl.SyclQueue #829.
  • Added cmake scripts to dpctl package layout and a way to query the location #853.
  • Implemented dpctl.tensor.concat function from array-API #867.
  • Implemented dpctl.tensor.stack function from array-API #872.
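
The USM type deduction performed by dpctl.utils.get_coerced_usm_type, listed above, can be sketched as picking the highest-ranked type among the inputs. The precedence used here ("device" over "shared" over "host") and the handling of empty or invalid input are assumptions made for illustration; `coerce_usm_type` is a hypothetical stand-in, not the library function.

```python
# Assumed precedence, lowest to highest; check the dpctl documentation
# for the authoritative rule.
_PRECEDENCE = ("host", "shared", "device")


def coerce_usm_type(usm_types):
    """Hypothetical sketch: return the highest-ranked USM type in the list."""
    if not usm_types:
        return None  # nothing to deduce from
    ranks = []
    for usm_type in usm_types:
        if usm_type not in _PRECEDENCE:
            raise ValueError(f"unknown USM type: {usm_type!r}")
        ranks.append(_PRECEDENCE.index(usm_type))
    return _PRECEDENCE[max(ranks)]


print(coerce_usm_type(["host", "shared"]))    # shared
print(coerce_usm_type(["shared", "device"]))  # device
print(coerce_usm_type(["host", "host"]))      # host
```

Under this assumed rule, a result that combines any "device" allocation with other inputs is itself allocated as "device".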

Changed

  • Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also makes it unnecessary to rebuild the SyclInterface library when building the C-test executable.
  • Exported keep_args_alive utility in dpctl4pybind11.hpp header #820. The utility uses sycl::handler::host_task to keep given Python arguments alive until each sycl::event from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
  • Changed the size of the struct underlying dpctl.SyclEvent to avoid storing a Python object previously used to keep alive the kernel arguments scheduled with dpctl.SyclQueue.submit #823.
  • Fixed docstring for dpctl.SyclTimer #824.
  • Changed type of exceptions raised on failure to create dpctl.SyclDevice from ValueError to dpctl.SyclDeviceCreationError #826.
  • Improved performance of pybind11 type casters #837.
  • Changed implementation of dpctl.SyclProgram from using deprecated sycl::program to sycl::kernel_bundle #845.
  • Removed deprecated device aspects, added new supported aspects #844.
  • Updated vendored dlpack.h to version 0.7 #847.

Fixed

  • Fixed dpctl.lsplatform() to work correctly when used from within Jupyter notebook #800.
  • Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
  • Fixed filter selector string produced in outputs of dpctl.lsplatform(verbosity=2) and dpctl.SyclDevice.print_device_info #866.
  • Fixed issue with slicing reported in gh-870 in #871.

New contributor: @npolina4 contributed #867, #872 and reported #870

v0.12.0

01 Mar 23:54

What's changed in 0.12.0

Added

  • Properties added to MemoryUSM* objects. #647
  • Added dpctl.tensor.asarray #646
  • Implemented DLPack support for usm_ndarray #682
  • Exported dpctl.tensor.Device class #708 #718
  • Added testing of examples in CI #722
  • Added user manuals to dpctl documentation #712 #773

Changed

  • Folder dpctl-capi/ renamed to libsyclinterface/ in sources and documentation. #666 #768
  • Added workflow to publish rendered documentation on PRs #673 #753 #726
  • Synchronization functions and USM allocation functions release GIL #736 #766
  • dpctl.SyclEvent destructor is made non-blocking #751

Fixed

  • Fixed an issue in the code of dpctl.tensor.usm_ndarray.T #653
  • Fixed issue with dpctl.tensor.reshape's effect on contiguity flags of usm_ndarray #695
  • Fixed handling of empty list by dpctl.tensor.asarray #694
  • Fixed type inference with array of empty arrays in dpctl.tensor.asarray #697
  • Fixed issue gh-698 with dpctl.tensor.asarray #709
  • Fixed performance of item assignment from numpy array #724
  • DPCTLDeviceMgr_GetNumDevices should not operate on rejected devices #737
  • Fixed issue gh-729 for dpctl.tensor.reshape applied to 0-element usm_ndarray #756
  • Fixed issue gh-728 with dpctl.tensor.astype #757
  • Fixed type in memory overlapping test #770
  • Fixed issue with operator.pos for dpctl.tensor.usm_ndarray #783
  • Only call PyThread_Ensure from host_task if the main-thread interpreter is initialized and not finalizing #776 #778 #721

Full Changelog: 0.11.4...0.12.0

0.11.4

03 Dec 09:01
47a2684

What's Changed

  • Fix tests for nested context factories expecting for integration environment by @PokhodenkoSA in #705

Full Changelog: 0.11.3...0.11.4

0.11.3

01 Dec 09:10

Fixed

  • Set the last byte in allocated char array to zero [cherry picked from #650] (#699)

Full Changelog: 0.11.2...0.11.3

0.11.2

29 Nov 17:05

Added

  • Extending dpctl.device_context with nested contexts (#678)

Fixed

  • Fixed issue #649 about incorrect behavior of .T method on sliced arrays (#653)

Full Changelog: 0.11.1...0.11.2

0.11.1

13 Nov 01:23

Changed

  • Replaced uses of clang compiler with icx executable (#665)