The `dpex_k` implementation for DBSCAN currently fails during execution with a rather cryptic message:
"Datatypes of array passed to @numba_dpex.kernel has to be the same. Passed datatypes: "...
The error message itself needs fixing, and I am working on a dpex PR to address that.
What the error message is really telling us is that the implementation does not follow the "compute follows data" programming model. Under compute follows data, the execution queue for a kernel should be discoverable from its input array arguments.
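For illustration, here is a minimal sketch of the rule (the kernel is a toy example; the `Range`/`get_global_id` launch API is an assumption that varies across numba_dpex versions):

```python
import numpy as np
import dpctl.tensor as dpt
import numba_dpex as dpex

@dpex.kernel
def scale(a, b):
    i = dpex.get_global_id(0)
    b[i] = 2 * a[i]

a = dpt.arange(1024, dtype="float32")   # usm_ndarray bound to a SYCL queue
b = dpt.empty_like(a)                   # allocated on the same queue as `a`
scale[dpex.Range(a.size)](a, b)         # execution queue inferred from a and b

c = np.empty(1024, dtype=np.float32)    # plain host array, carries no queue
# scale[dpex.Range(a.size)](a, c)       # mixing numpy and usm inputs -> the dpex error above
```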
There are two problems with the current implementation.
In the `dpex_k` implementation, the arguments to the kernel are `n_samples`, `min_pts`, `assignments`, `sizes`, and `indices_list`. Of these, `sizes` and `indices_list` are not allocated in the `initialize` function and are therefore never copied to `usm_ndarray`. The kernel inputs end up a mix of `numpy.ndarray` and `dpctl.tensor.usm_ndarray`, so there is no way to infer the execution queue using compute follows data; hence the dpex error. To fix the issue, the creation of these two arrays needs to be moved into the `initialize` call, as sketched below.
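A hypothetical sketch of that fix, moving both allocations into `initialize` so every kernel argument is a `usm_ndarray` on one queue (the array names come from the issue; the shapes, dtypes, and device selection are assumptions):

```python
import dpctl.tensor as dpt

def initialize(data, min_pts):
    n_samples = data.shape[0]
    device = "gpu"   # assumption: however the benchmark actually selects its SYCL device
    assignments = dpt.full(n_samples, -1, dtype=dpt.int64, device=device)
    # Previously numpy.ndarray; allocating them here puts them on the same queue
    # as the other kernel arguments, so compute follows data can resolve it:
    sizes = dpt.zeros(n_samples, dtype=dpt.int64, device=device)
    indices_list = dpt.zeros(n_samples * n_samples, dtype=dpt.int64, device=device)
    return assignments, sizes, indices_list
```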
Fixing the first problem will expose a second issue that is currently hidden by the first failure. Only the `get_neighborhood` function is a kernel; the `compute_cluster` function is an `njit` function. Currently, `njit` functions cannot consume `usm_ndarray`, so to make the benchmark work we would have to copy data back to the host after the `get_neighborhood` call. Doing so will corrupt the timing measurement. Moreover, implementing `dbscan_dpex_k` this way makes comparisons with other implementations misleading, since the benchmark as a whole never runs on a device/GPU; timing the kernel against any other implementation would not be an apples-to-apples comparison. We either need to implement `compute_cluster` as a kernel or remove the `dbscan_dpex_k` module.
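A minimal sketch of the forced host round trip (the `get_neighborhood`/`compute_cluster` bodies below are hypothetical stand-ins, and the launch syntax again assumes the `Range`/`get_global_id` API):

```python
import dpctl.tensor as dpt
import numba_dpex as dpex
from numba import njit

@dpex.kernel
def get_neighborhood(sizes):        # hypothetical stand-in for the real kernel
    i = dpex.get_global_id(0)
    sizes[i] = i

@njit
def compute_cluster(sizes):         # hypothetical stand-in for the real njit stage
    return sizes.sum()

sizes = dpt.zeros(1024, dtype=dpt.int64, device="gpu")
get_neighborhood[dpex.Range(1024)](sizes)   # runs on the device
sizes_host = dpt.asnumpy(sizes)             # device->host copy lands inside the timed region
result = compute_cluster(sizes_host)        # the rest of DBSCAN then runs on the host
```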