
Support for __partitioned__ as data source #152

Closed
fschlimb opened this issue Sep 14, 2021 · 0 comments

@fschlimb (Contributor):

An API/protocol for exchanging distributed, partitioned data is being defined. The goal is to establish a uniform way of describing such data that makes it possible to avoid unnecessary data copies (ideally achieving zero-copy exchange), and thereby to avoid specialized implementations for different data containers (like RayDataSet, MLDataSet, modin etc.). The protocol essentially defines the metadata about the distribution and partitioning (tiling) of the data. The current proposal is here: https://github.com/IntelPython/DPPY-Spec/blob/draft/partitioned/Partitioned.md; an issue/discussion is here: IntelPython/DPPY-Spec#3
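To illustrate what "metadata about the distribution and tiling" can look like, here is a minimal, local-only sketch. It assumes a `__partitioned__`-style dict with keys `shape`, `partition_tiling`, `partitions` (each tile carrying `start`, `shape`, `data`, `location`) and a `get` callable. These field names follow one reading of the draft and may not match the current spec, and `make_partitioned`/`assemble` are hypothetical helpers. Here `data` holds a plain nested list; in a real distributed setting it would be a handle to remote data.

```python
from itertools import product


def make_partitioned(data, tile_rows, tile_cols):
    """Describe a 2-D list-of-lists as a dict of tiles (hypothetical layout)."""
    n_rows, n_cols = len(data), len(data[0])
    # Number of tiles along each axis (ceiling division).
    tiling = (-(-n_rows // tile_rows), -(-n_cols // tile_cols))
    partitions = {}
    for i, j in product(range(tiling[0]), range(tiling[1])):
        r0, c0 = i * tile_rows, j * tile_cols
        r1, c1 = min(r0 + tile_rows, n_rows), min(c0 + tile_cols, n_cols)
        partitions[(i, j)] = {
            "start": (r0, c0),               # global offset of this tile
            "shape": (r1 - r0, c1 - c0),     # tile extent (edge tiles may be smaller)
            "data": [row[c0:c1] for row in data[r0:r1]],  # stand-in for a remote handle
            "location": ["localhost"],       # hypothetical placement info
        }
    return {
        "shape": (n_rows, n_cols),
        "partition_tiling": tiling,
        "partitions": partitions,
        # Resolves a handle to actual data; identity here because data is local.
        "get": lambda handle: handle,
    }


def assemble(parted):
    """Consumer side: rebuild the full array using only the metadata."""
    n_rows, n_cols = parted["shape"]
    out = [[None] * n_cols for _ in range(n_rows)]
    for part in parted["partitions"].values():
        r0, c0 = part["start"]
        block = parted["get"](part["data"])
        for dr, row in enumerate(block):
            out[r0 + dr][c0:c0 + len(row)] = row
    return out
```

For example, `make_partitioned([[r * 5 + c for c in range(5)] for r in range(4)], 2, 2)` yields a `(2, 3)` tiling whose last column of tiles is one element wide. A consumer that only reads `start`, `shape` and `location` can decide where to process each tile without first gathering the data, which is the kind of copy avoidance the protocol aims for.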

Any feedback/input to the discussion/spec is welcome.

Implementations/PRs already exist for several projects, including HeAT, modin, MLDataSet and DAL.

It would be nice if xgboost_ray supported this, too. I will open a PR for it as well.
