Suboptimal scaling for expressions of constant treewidth #117
It does seem the current ...

I have some optimizers based on the line-graph tree decomposition, and they both find the optimal scaling quickly for the largest (Apollonian-6) graph:
A hypergraph partitioner based optimizer, which usually outperforms both of those, still struggles a bit and gets:
Whereas a method that adaptively updates the usual greedy heuristic only gets:
We have always focused on generalized expressions, but not on specific networks. Are there other networks beyond Apollonian that we should consider adding to the test suite and optimizing for?
Perhaps random subgraphs of k-trees? Any einsum graph computable in O(n^{k+1}) time is a subgraph of a k-tree, so if you make it work with probability 1 for some k, you have achieved optimality for all problems of a certain size. k<=4 covers the bulk of real-world problems that are feasible to compute exactly. I've added some k-trees to colab, generated using code here, using the simplex "gluing" approach from https://en.wikipedia.org/wiki/K-tree (sketched below).

In the real world you have a fixed compute budget, which limits the largest size of the intermediate factors. Under this constraint the problem of finding an optimal decomposition is fixed-parameter tractable -- it is linear in the size of the graph. That particular algorithm doesn't seem practical, but even for heuristics, a "maximum size of intermediate factor" constraint should limit the search space significantly. Ideally, an einsum optimizer could just tell you that the computation is not possible within your scaling budget, so you would then know to reformulate the problem. Section 10.5 of the Kleinberg/Tardos book gives a linear-time algorithm that either produces a decomposition of width 4*w or a proof that the treewidth is more than w.

BTW, I'm curious what your workflow is for evaluating these methods.
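For reference, a minimal sketch of that gluing construction (networkx-based; the function names and the edge-subsampling helper are illustrative, not the linked code):

```python
import itertools
import random
import networkx as nx

def random_k_tree(n, k, seed=None):
    """Random k-tree on n vertices: start from a (k+1)-clique and repeatedly
    attach a new vertex to every member of a randomly chosen existing k-clique."""
    rng = random.Random(seed)
    g = nx.complete_graph(k + 1)
    # every k-subset of the initial clique is a face we can glue onto
    faces = [frozenset(c) for c in itertools.combinations(range(k + 1), k)]
    for v in range(k + 1, n):
        face = rng.choice(faces)
        g.add_edges_from((v, u) for u in face)
        # the new vertex creates k fresh k-cliques (v plus any k-1 old members)
        faces.extend(frozenset(sub) | {v}
                     for sub in itertools.combinations(face, k - 1))
    return g

def random_edge_subgraph(g, p, seed=None):
    """Keep each edge independently with probability p; any subgraph of a
    k-tree still has treewidth at most k."""
    rng = random.Random(seed)
    h = nx.Graph()
    h.add_nodes_from(g.nodes)
    h.add_edges_from(e for e in g.edges if rng.random() < p)
    return h
```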
Ah, the printed info is actually an object (with a verbose path printout); for example:

```python
path, info = oe.contract_path(eq, *shapes, shapes=True)
info.opt_cost
info.largest_intermediate
max(info.scale_list)
# etc
```
A couple more examples: I expected to always get an O(n^2) schedule from tree-structured graphs, but I get an O(n^4) schedule 66% of the time. Here's an example that takes 100 random subgraphs of the tree-structured expression below and prints the scaling of the resulting einsum optimization:
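Such an experiment could look roughly like the sketch below (the random-tree construction, index dimension d=2, subgraph probability, and graph size are illustrative placeholders rather than the colab's actual settings):

```python
import random
from collections import Counter
import opt_einsum as oe

def random_tree_edges(n, rng):
    # random tree: attach each new node to a uniformly chosen earlier node
    return [(rng.randrange(i), i) for i in range(1, n)]

def edges_to_einsum(edges, d=2):
    # each edge becomes an index; each node becomes a tensor over its incident indices
    node_inds = {}
    for i, (u, v) in enumerate(edges):
        sym = oe.get_symbol(i)
        node_inds.setdefault(u, []).append(sym)
        node_inds.setdefault(v, []).append(sym)
    terms = ["".join(inds) for _, inds in sorted(node_inds.items())]
    shapes = [(d,) * len(t) for t in terms]
    return ",".join(terms) + "->", shapes

rng = random.Random(0)
scalings = Counter()
for _ in range(100):
    edges = random_tree_edges(30, rng)
    sub = [e for e in edges if rng.random() < 0.8]  # random subgraph of the tree
    if not sub:
        continue
    eq, shapes = edges_to_einsum(sub)
    _, info = oe.contract_path(eq, *shapes, shapes=True)  # default optimizer
    scalings[max(info.scale_list)] += 1
print(scalings)  # distribution of scaling exponents over the trials
```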
@jcmgray Do you think we can fix this with the current algorithms, or should we look at adding flowcutter/quickbb for these highly structured graphs?
The practical need is the following -- if there's a schedule that computes an expression in reasonable time, discovering this schedule should also take reasonable time. The assumption of "reasonable time" puts an upper bound on the treewidth of the expression graph. Perhaps the ...
BTW, I've recently rerun this comparing against np.einsum, and the issue still exists. However, opt_einsum's dp optimizer seems to be better than the np.einsum optimizer. NumPy seems to use this algorithm. Here's the colab comparing the methods. What motivated this is that @rgommers was working on including np.einsum in the array API, which requires deciding which optimization method to include in the API standard.
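A comparison along these lines can be sketched as follows (reusing the Apollonian expression from the original post quoted further down, with an arbitrary index dimension d=2; the actual colab uses its own test set):

```python
import numpy as np
import opt_einsum as oe

eq = ("BC,BD,BE,BF,BG,BI,BJ,BL,BM,CD,CE,CF,CH,CI,CK,CO,CP,DE,DG,DH,DL,DN,"
      "DO,DQ,EF,EG,EH,EJ,EK,EM,EN,EP,EQ,FI,FJ,FK,GL,GM,GN,HO,HP,HQ->")
d = 2
arrays = [np.random.rand(*(d,) * len(term)) for term in eq[:-2].split(",")]

# NumPy's built-in greedy path optimizer
_, np_report = np.einsum_path(eq, *arrays, optimize="greedy")
print(np_report)  # report includes the largest intermediate and a per-step listing

# opt_einsum's dynamic-programming optimizer
_, info = oe.contract_path(eq, *arrays, optimize="dp")
print("opt_einsum dp, max scaling exponent:", max(info.scale_list))
```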
I added the greedy and optimal versions of the ...

What we are looking for here is that the minimal scaling (though not necessarily the minimal flop count) is found within a reasonable time envelope for an arbitrary graph. Let me check whether a few facts are true:
There are a few ideas there which should be simple to implement if we can cover the use cases. Is there a working group we can join to find out more about the API standard?
I learned about np.einsum from this discussion, @rgommers -- could you point us to where Python Array API discussions are happening? |
Sure, this is the main repo: https://github.com/data-apis/array-api/issues |
I'm seeing opt_einsum generate suboptimal schedules for einsum graphs of small treewidth
For instance, the family of Apollonian networks consists of treewidth-3 graphs whose optimal decomposition is discovered by greedily triangulating the graph using the min-fill heuristic. The optimal scaling order is n^4 regardless of graph size.
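For context, a bare-bones version of the min-fill elimination heuristic looks roughly like this (a generic sketch, not the linked code):

```python
import networkx as nx

def min_fill_order(g):
    """Greedy min-fill elimination: repeatedly eliminate the vertex whose
    neighbourhood needs the fewest fill-in edges to become a clique."""
    g = g.copy()
    order, width = [], 0
    while len(g):
        def fill_cost(v):
            nbrs = list(g.neighbors(v))
            return sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:]
                       if not g.has_edge(a, b))
        v = min(g.nodes, key=fill_cost)
        nbrs = list(g.neighbors(v))
        width = max(width, len(nbrs))
        # triangulate: make the neighbourhood of v a clique, then eliminate v
        g.add_edges_from((a, b) for i, a in enumerate(nbrs) for b in nbrs[i + 1:])
        g.remove_node(v)
        order.append(v)
    return order, width  # width is an upper bound on the treewidth

# e.g. on a 4-clique (treewidth 3) this reports width 3
print(min_fill_order(nx.complete_graph(4)))
```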
Here's an example graph and its tree decomposition (generated with this code)
This problem corresponds to the following einsum:
BC,BD,BE,BF,BG,BI,BJ,BL,BM,CD,CE,CF,CH,CI,CK,CO,CP,DE,DG,DH,DL,DN,DO,DQ,EF,EG,EH,EJ,EK,EM,EN,EP,EQ,FI,FJ,FK,GL,GM,GN,HO,HP,HQ->
The default optimizer gives n^9 scaling. optimize='dp' recovers n^4 but seems a bit slow, and it doesn't work for the next largest graph, where the default contraction order is O(n^20).
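A minimal snippet to check these scalings (the index dimension d=2 is an arbitrary choice for the test):

```python
import opt_einsum as oe

eq = ("BC,BD,BE,BF,BG,BI,BJ,BL,BM,CD,CE,CF,CH,CI,CK,CO,CP,DE,DG,DH,DL,DN,"
      "DO,DQ,EF,EG,EH,EJ,EK,EM,EN,EP,EQ,FI,FJ,FK,GL,GM,GN,HO,HP,HQ->")
d = 2
shapes = [(d,) * len(term) for term in eq[:-2].split(",")]

for opt in ("auto", "dp"):
    _, info = oe.contract_path(eq, *shapes, shapes=True, optimize=opt)
    print(opt, "max scaling exponent:", max(info.scale_list))
```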
An even larger example is Apollonian-6 with 1095 edges, where opt_einsum's optimize='auto' gives an O(n^60) contraction order -- colab.