-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[enhancement] remove contiguous check from _check_array
#2185
Changes from all commits
1673a85
830400b
c2c23f9
2d86d75
d34c2c2
585f7eb
8666c11
e303421
50f2a73
676a739
efce9e8
13c4cd0
0104ea5
816a374
485fb34
3d40ece
6b98206
85e57fe
85931b9
0f98a3f
8d68574
469a806
5f9fb6b
912fa5b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,7 @@ | |
|
||
namespace oneapi::dal::python { | ||
|
||
using namespace pybind11::literals; | ||
// Please follow <https://intelpython.github.io/dpctl/latest/ | ||
// api_reference/dpctl/sycl_usm_array_interface.html#sycl-usm-array-interface-attribute> | ||
// for the description of `__sycl_usm_array_interface__` protocol. | ||
|
@@ -42,6 +43,8 @@ namespace oneapi::dal::python { | |
// of `__sycl_usm_array_interface__` protocol. | ||
template <typename Type> | ||
dal::table convert_to_homogen_impl(py::object obj) { | ||
dal::table res{}; | ||
|
||
// Get `__sycl_usm_array_interface__` dictionary representing USM allocations. | ||
auto sua_iface_dict = get_sua_interface(obj); | ||
|
||
|
@@ -64,6 +67,25 @@ dal::table convert_to_homogen_impl(py::object obj) { | |
// Get oneDAL Homogen DataLayout enumeration from input object shape and strides. | ||
const auto layout = get_sua_iface_layout(sua_iface_dict, r_count, c_count); | ||
|
||
if (layout == dal::data_layout::unknown){ | ||
// NOTE: this will make a C-contiguous deep copy of the data | ||
// if possible, this is expected to be a special case | ||
py::object copy; | ||
if (py::hasattr(obj, "copy")){ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it guaranteed that the copy will be C-contiguous? It will always be the case with numpy, but what about other packages? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. as of now only dpctl and dpnp use this standard: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about other libraries that implement the array protocol? XArray also has "copy" for example: |
||
copy = obj.attr("copy")(); | ||
} | ||
else if (py::hasattr(obj, "__array_namespace__")){ | ||
const auto space = obj.attr("__array_namespace__")(); | ||
copy = space.attr("asarray")(obj, "copy"_a = true); | ||
} | ||
else { | ||
throw std::runtime_error("Wrong strides"); | ||
} | ||
res = convert_to_homogen_impl<Type>(copy); | ||
copy.dec_ref(); | ||
return res; | ||
} | ||
|
||
// Get `__sycl_usm_array_interface__['data'][0]`, the first element of data entry, | ||
// which is a Python integer encoding USM pointer value. | ||
const auto* const ptr = reinterpret_cast<const Type*>(get_sua_ptr(sua_iface_dict)); | ||
|
@@ -79,8 +101,6 @@ dal::table convert_to_homogen_impl(py::object obj) { | |
// Use read-only accessor for onedal table. | ||
bool is_readonly = is_sua_readonly(sua_iface_dict); | ||
|
||
dal::table res{}; | ||
|
||
if (is_readonly) { | ||
res = dal::homogen_table(queue, | ||
ptr, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -153,15 +153,6 @@ def _check_array( | |
|
||
if sp.issparse(array): | ||
return array | ||
|
||
# TODO: Convert this kind of arrays to a table like in daal4py | ||
if not array.flags.aligned and not array.flags.writeable: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From numpy documentation: https://numpy.org/devdocs/dev/alignment.html only structured arrays will cause this, meaning that this is a non issue based on the dtypes of use for oneDAL (float32, float64, int32, and int64), and would failt to convert in table.cpp anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think numpy also allows creating non-aligned arrays out of custom non-owned pointers, for example through PyArray_SimpleNewFromData, so in theory there could be a non-aligned float32 array or similar. But it'd be a very unlikely input, since default allocators in most platforms are aligned. Also an easier conversion could be through np.require. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, I'll probably because of this too I will make a stricter check on C++ side https://github.com/intel/scikit-learn-intelex/blob/main/onedal/decomposition/pca.py#L150 probably using https://numpy.org/devdocs/reference/c-api/array.html#c.PyArray_FromArray where I can specify the requirements to be aligned and owned (rather than use PyArray_GETCONTIGUOUS) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @david-cortes-intel I looked into it, the checks PyArray_ISFARRAY_RO and PyArray_ISCARRAY_RO check for alignment natively (https://numpy.org/devdocs/reference/c-api/array.html#c.PyArray_ISFARRAY_RO), and then PyArray_GETCONTIGUOUS also returns an aligned (well-behaved) copy regardless of the ownership. The change in the checks was to make sure that numpy objects wouldn't cause an infinite recursion. So the issue was taken care of, even if I didn't realize it. Just as a note, I am trying to minimize the number of numpy calls because of upcoming array_api support which we cannot rely on direct numpy calls on non-numpy arrays (say dpctl tensors). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With respect to non-ownership, I don't want to step on anyone's toes, as changes to PCA are occuring here which may inpact the ownership check there: #2106 |
||
array = np.array(array.tolist()) | ||
|
||
# TODO: If data is not contiguous copy to contiguous | ||
# Need implemeted numpy table in oneDAL | ||
if not array.flags.c_contiguous and not array.flags.f_contiguous: | ||
array = np.ascontiguousarray(array, array.dtype) | ||
return array | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, could you explain what would happen if an error occurs here? Additionally, is there a risk of a memory leak in this scenario?
For example, is there some error flag that should be checked?
Wouldn't it be safer to use the Python NumPy API for such checks and conversions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Decref is necessary to prevent a memory leak (otherwise leaks in BasicStatistics occur in all CIs), the memory is effectively owned by the table object, who will call its destructor with the pycapsule. This work makes to_table support non-contiguous arrays, which will simplify the python coding occurring before to_table. This work will enable
_check_array
to be moved fromonedal
tosklearnex
simplifying the rollout of the new finite checker to SVM and neighbors algorithms. There is no error handling available (that I can find) in the numpy C-api as also numpy generally doesn't raise errors like that. The checks that it is 1) a numpy array and 2) of certain aligned and type characteristics will prevent errors, and the use of the numpy c-api is very standardized.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it can fail to allocate, in which case it will return
NULL
:https://github.com/numpy/numpy/blob/cf9598572528318a54489b3c9ed5f65ef042e8c8/numpy/_core/src/multiarray/convert.c#L495
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh thank you for doing that research @david-cortes-intel , that would mean in the case of a failure in allocation, it will fail to decref. I will add a check there.