Contiguity loss after opt_einsum that may affect the efficiency of subsequent operations #211
Comments
My initial guess is the `order` keyword argument; you can call `contract` with `order='C'`.
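A minimal sketch of that kind of call, with made-up shapes and subscripts (the rest of the thread suggests this alone may not fix contiguity):

```python
import numpy as np
import opt_einsum as oe

a = np.random.rand(10, 11, 12)
b = np.random.rand(12, 13)

# Request C-ordered output, mirroring np.einsum's `order` kwarg.
out = oe.contract('abc,cd->dab', a, b, order='C')
print(out.flags['C_CONTIGUOUS'])  # may still be False, which is what this issue is about
```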
Thanks for the suggestion. Modifying the code above as

then

still leads to

Revising as

leads to

or

also similar. I don't know how to see the version of `opt_einsum`.

suggests I am using
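For reference, a quick way to check the installed version (assuming a standard install that exposes `__version__`):

```python
import opt_einsum

# Print the installed opt_einsum version string.
print(opt_einsum.__version__)
```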
I need to confirm, but this appears to be a bug.
I searched the source code of
We should call `np.asanyarray` near the end of the computation, which we currently do not do; officially booting this to bug status. Do you recall this being dropped, @jcmgray?
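Until something like that lands in the library, a user-side workaround sketch (not from the thread) is simply to wrap the result; `np.ascontiguousarray` copies only when the array is not already C-ordered. Shapes and subscripts below are arbitrary:

```python
import numpy as np
import opt_einsum as oe

a = np.random.rand(50, 60, 70)
b = np.random.rand(70, 80)

# Force a C-contiguous result; this is a no-op copy-wise if already C-ordered.
out = np.ascontiguousarray(oe.contract('abc,cd->dab', a, b))
assert out.flags['C_CONTIGUOUS']
```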
Currently, in my opinion, it would be best to deprecate the order kwarg fully. As I think @dgasmith mentioned, the speedups all come from dispatching to pairwise contractions, which, when performed using essentially matrix multiplication, impose a certain ordering on the indices.

There's not any obvious and efficient way to build ordering in that doesn't involve relying on details of the underlying pairwise contraction (e.g. cutensor as mentioned in #209). One could also force the last contraction to be handled specially.

Sidenote: can one search within or sort the index ordering? Sorting how indices appear can be an important micro-optimization for both via-BLAS and in fact direct (einsum/cutensor) contraction implementations. One can try and make the final contraction 'as contiguous as possible', but even then it's not always possible.
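A small illustration of that point, with made-up shapes: when the pairwise contraction is dispatched to tensordot/GEMM, the natural output order is fixed by the matrix multiplication, and the requested output order is then reached by a transpose, which in NumPy is a view rather than a fresh C-ordered copy.

```python
import numpy as np
import opt_einsum as oe

a = np.random.rand(40, 50, 60)
b = np.random.rand(60, 70)

out_np = np.einsum('abc,cd->adb', a, b)    # einsum materializes the output in C order
out_oe = oe.contract('abc,cd->adb', a, b)  # BLAS path: tensordot gives 'abd', then a transpose view

print(out_np.flags['C_CONTIGUOUS'])  # True
print(out_oe.flags['C_CONTIGUOUS'])  # typically False when the BLAS path is taken
```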
Thanks for the insights and suggestions. I am not sure I understand your remarks fully, especially about the index sorting. If I modify the example code to include `ascontiguousarray` as follows (sorry for the lengthy snippet):
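(The original snippet is not preserved in this copy of the thread; a hedged reconstruction of the idea, with placeholder shapes and subscripts, might look like this.)

```python
import numpy as np
import opt_einsum as oe
from timeit import timeit

a = np.random.rand(60, 70, 80)
b = np.random.rand(80, 90)

c_oe = oe.contract('abc,cd->adb', a, b)  # may come back as a non-contiguous view
other = np.random.rand(*c_oe.shape)

# Cost of forcing C order, and of the subsequent elementwise add on each variant.
print(timeit(lambda: np.ascontiguousarray(c_oe), number=50))
print(timeit(lambda: np.add(c_oe, other), number=50))
print(timeit(lambda: np.add(np.ascontiguousarray(c_oe), other), number=50))
```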
the result is
if I use
I think generally the optimizations are aimed at contractions with more terms. In the cases here, with 2 terms and thus a single pairwise contraction, I don't think there is any general strategy to guarantee the order efficiently. In many contractions doing a GEMM + transpose will be better, though not in this case. But at this low level (a single pairwise contraction, or, as another example, very small tensors) the responsibility for this kind of optimization is more on the user, IMHO.

Having said that, if you have some longer contraction expressions where it still makes a big difference, that might be interesting. Also, I don't know if this holds for the actual contractions you are interested in, but if you are summing over einsum expressions, then the best thing might be to incorporate a batch dimension into the contraction like so:

`contraction = 'Xabcdef,Xfghij->abcdghij'`
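A sketch of that suggestion (the batch size and index dimension below are made up; the subscripts are the ones from the comment): stacking the operands along a shared index `X` and contracting once is equivalent to summing the per-term contractions.

```python
import numpy as np
import opt_einsum as oe

n, d = 3, 2                                  # hypothetical batch size and index dimension
A = np.random.rand(n, d, d, d, d, d, d)      # operands stacked along X -> 'Xabcdef'
B = np.random.rand(n, d, d, d, d, d)         # operands stacked along X -> 'Xfghij'

# Loop-and-sum over separate einsum calls...
loop_sum = sum(np.einsum('abcdef,fghij->abcdghij', A[x], B[x]) for x in range(n))

# ...versus one batched contraction; X appears in both inputs but not the output,
# so it is summed over as part of the contraction.
batched = oe.contract('Xabcdef,Xfghij->abcdghij', A, B)

print(np.allclose(loop_sum, batched))        # True
```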
After a tensor contraction, the resulting tensor from `numpy.einsum` is C-contiguous, but with `opt_einsum` this property may be lost. If there is a further tensor addition, the non-contiguous array may be slower. Is there any way to make the resulting tensor from `opt_einsum` still C-contiguous? If not, is there an option to specify that the resulting tensor should be C-contiguous, while constraining the path search to remain C-contiguous?

For example, consider the following code:
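(A sketch of the kind of code being described, since the original block is not preserved here; shapes and the contraction are placeholders.)

```python
import numpy as np
import opt_einsum as oe
from timeit import timeit

a = np.random.rand(60, 70, 80)
b = np.random.rand(80, 90)

c_np = np.einsum('abc,cd->adb', a, b)     # C-contiguous result
c_oe = oe.contract('abc,cd->adb', a, b)   # possibly a transposed, non-contiguous view

other = np.random.rand(*c_np.shape)
print(c_np.flags['C_CONTIGUOUS'], c_oe.flags['C_CONTIGUOUS'])
print(timeit(lambda: np.add(c_np, other), number=50))
print(timeit(lambda: np.add(c_oe, other), number=50))
```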
The result is

where the `np.add` part is much slower with `opt_einsum` than with `einsum`. If I use `np.ascontiguousarray` to make the resulting array C-contiguous, this step itself is slow.