[FEATURE] ingest fiftyone datasets #1957
@nmichlo thanks a lot for opening the issue. Could you give us more context on the use case: why would you like to import FiftyOne datasets? (What do you like and not like in FiftyOne?)
Use Case: As part of my day-to-day work I often need to find, download, import and pre-process many different existing datasets, which are usually in common formats like COCO or YOLOv5. Occasionally I need to write a script to import a custom format, but I generally try to avoid that. These datasets are then often merged together or added to existing datasets that are used for re-training. Since we improve models by iterating on the data, version control here would be great, which is why deep-lake is so appealing.
Fiftyone, the good and bad:
Disclaimer: my overall experience with fiftyone is still fairly limited. My main use case, however, is the import/export functionality, combined with the local preview of datasets, occasional dataset filtering, and renaming/removing labels. Ultimately I would love to replace fiftyone entirely with deeplake and store datasets in our own cloud buckets.
What is good about fiftyone:
What is bad about fiftyone:
EDIT: overall, deeplake has been extremely refreshing to work with. Really good work on the project so far!
EDIT2: it might be worth adding fiftyone to the README section on "Comparisons to Familiar Tools"?
EDIT3: I can provide examples of my own fiftyone -> deeplake import script, but it is definitely not general in any sense. It was tailored to a specific format, purely as a test.
Based on my clarified use case, I might even argue against my own issue: fiftyone ingest would be a nice-to-have, and ultimately a better solution might be built-in support for ingesting and exporting common dataset formats.
EDIT: this could also serve as a good way of documenting / providing examples of real-world use cases that can be adapted.
Hey @nmichlo Thank you for the feedback. This is extremely useful for our product development. As I was reading your comments, I had the same thought as your last note:
Just want to confirm that I understand correctly, because this appears aligned with our roadmap. Would you rather have a function to ingest from 51, or a set of functions to ingest from dataset formats such as YOLO, COCO, CVAT, LabelStudio, and others?
@istranic no problem, glad to help! Ideally in the long run I personally would prefer not to use fiftyone, and ingest/export datasets directly. However, I think there might be merit for both?
Got it. Thanks @nmichlo!
🚨🚨 Feature Request
Is your feature request related to a problem?
My problem is being able to ingest fiftyone datasets into deeplake
If your feature will improve HUB
Fiftyone is a common dataset import and export tool. Integration with deeplake would make such operations easy and would mean that we do not have to implement them from scratch.
Description of the possible solution
An alternative solution to the problem can look like
Ingest steps could be written manually. (Fiftyone doesn't enforce much structure on datasets, so I am not sure the proposed ingest function even has a single well-defined solution; maybe some basic structure would be required.)
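As a rough illustration of what a manually written ingest step might involve, here is a minimal sketch that groups COCO-style annotations into one record per image, which is the shape a per-sample append into a dataset would typically need. It deliberately uses only plain Python dicts: the function name `coco_to_records` and the record keys (`file_name`, `labels`, `boxes`) are hypothetical and chosen for illustration, and no fiftyone or deeplake calls are shown.

```python
from collections import defaultdict


def coco_to_records(coco):
    """Group COCO-style annotations into one record per image.

    Hypothetical helper: the output keys are illustrative, not a
    fiftyone or deeplake schema.
    """
    # Map category id -> human-readable class name.
    names = {c["id"]: c["name"] for c in coco["categories"]}

    # Collect annotations by the image they belong to.
    per_image = defaultdict(list)
    for ann in coco["annotations"]:
        per_image[ann["image_id"]].append(ann)

    records = []
    for img in coco["images"]:
        anns = per_image[img["id"]]
        records.append({
            "file_name": img["file_name"],
            "labels": [names[a["category_id"]] for a in anns],
            # COCO boxes are [x, y, width, height] in pixels.
            "boxes": [list(a["bbox"]) for a in anns],
        })
    return records


# Tiny in-memory example standing in for a parsed annotations file.
coco = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 7, "name": "cat"}],
    "annotations": [{"image_id": 1, "category_id": 7, "bbox": [10, 20, 30, 40]}],
}
records = coco_to_records(coco)
```

Each record could then be appended to the target dataset one sample at a time; images without annotations still produce a record with empty `labels` and `boxes`, so the "basic structure" stays uniform across samples.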
Teachability, Documentation, Adoption, Migration Strategy
Needs discussion first