
Support for large datasets #24

Open
MatthewDZane opened this issue Jul 8, 2024 · 7 comments

@MatthewDZane

Thanks for the amazing work. We have datasets ranging from 1k to 14k images, and we were wondering if vggsfm is going to support image counts of that magnitude in the pipeline in the near future. The simplest solution, I am guessing, would be to split the images into overlapping submodels, process each with vggsfm, and then combine the results.
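A minimal sketch of the chunking idea (the `chunk_size`/`overlap` values and the merge step are hypothetical placeholders, not part of vggsfm's API):

```python
# Hypothetical sketch: split an ordered image list into overlapping
# submodels. The parameters below are illustration values only.
from pathlib import Path

def make_submodels(image_paths, chunk_size=200, overlap=30):
    """Yield overlapping chunks of an ordered image list."""
    step = chunk_size - overlap
    for start in range(0, len(image_paths), step):
        chunk = image_paths[start:start + chunk_size]
        if len(chunk) > overlap:  # skip a trailing chunk that is pure overlap
            yield chunk

images = sorted(Path("images/").glob("*.jpg"))
for i, chunk in enumerate(make_submodels(images)):
    print(f"submodel {i}: {len(chunk)} images")
    # each chunk would be reconstructed with vggsfm, then the submodels
    # aligned via their shared overlapping frames (e.g., a similarity
    # transform estimated between the duplicated camera poses)
```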

@jytime
Contributor

jytime commented Jul 9, 2024

Hi @MatthewDZane

We will soon support reconstruction for videos with such a large number of images, leveraging the assumed temporal continuity. For unordered images of this quantity, it may take a bit more time. Is your dataset composed of videos or unordered images?

@MatthewDZane
Author

Our datasets are composed of ordered images, but not in video format: they were taken by a drone on a flight path, so there is some distance between each photo and the one that follows it. There are also some discontinuities, since we do not take pictures while the drone turns a corner.

@jytime
Contributor

jytime commented Jul 10, 2024

Hi, it should be possible to solve this (my guess, but I cannot guarantee it). I will let you know when the video version is ready.

@bhack

bhack commented Jul 13, 2024

@jytime I think it could be interesting to verify robustness at different FPS rates and under other classical video issues like motion blur (MB), defocus, etc.

Also, as we discussed in #9 (comment), it may not always be easy to maintain the assumption that all points are static/rigid, especially on long sequences:
https://github.com/qianduoduolr/DecoMotion
https://tracks-to-4d.github.io/
https://chiaki530.github.io/projects/leapvo/
https://henry123-boy.github.io/SpaTracker/ (see the "Camera pose estimation in dynamic scenes" section)

I think that without static/dynamic point clustering or classification (similar to occlusion/visibility handling), we risk requiring an uncontrolled number of binary masks for each query frame of a video, which could be very problematic.

@bhack

bhack commented Jul 13, 2024

Other than this, I will add that some SOTA point trackers still often fail on odometry-like sequences: google-deepmind/tapnet#72

@jytime
Contributor

jytime commented Jul 13, 2024

Hi @bhack, yes, I agree with this point. I am testing the effect of using an off-the-shelf video motion segmentation model to filter out the dynamic pixels.
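For illustration, a sketch of how such a mask might be applied, assuming a per-frame binary mask from some off-the-shelf segmentation model (the function and mask format here are assumptions, not vggsfm's actual interface):

```python
# Illustrative sketch: given a per-frame binary mask (True = dynamic),
# drop query points that fall on dynamic pixels before tracking/BA.
import numpy as np

def filter_static_queries(query_xy, dynamic_mask):
    """query_xy: (N, 2) pixel coords (x, y); dynamic_mask: (H, W) bool."""
    x = query_xy[:, 0].round().astype(int).clip(0, dynamic_mask.shape[1] - 1)
    y = query_xy[:, 1].round().astype(int).clip(0, dynamic_mask.shape[0] - 1)
    keep = ~dynamic_mask[y, x]
    return query_xy[keep]

# toy example
mask = np.zeros((480, 640), dtype=bool)
mask[100:200, 300:400] = True            # a moving object
queries = np.array([[320.0, 150.0], [50.0, 50.0]])
print(filter_static_queries(queries, mask))  # keeps only the static point
```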

@bhack

bhack commented Jul 13, 2024

Many motion segmentation models that rely on optical flow networks suffer from the classical video effects described above (defocus/motion blur).

This is also noted in the limitations section of the recent ECCV 2024 DecoMotion paper (mentioned in the previous message).
We could probably simulate MB/defocus with augmentation over the available datasets, if you have not collected something specific with these effects. Different FPS rates, on the other hand, could easily be simulated.
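For example, a rough sketch of these augmentations using standard OpenCV calls (the kernel sizes and the FPS stride are arbitrary illustration values, not tuned settings):

```python
# Rough sketch of the proposed augmentations: linear motion blur,
# Gaussian defocus, and FPS reduction by frame subsampling.
import cv2
import numpy as np

def motion_blur(img, ksize=15):
    """Horizontal linear motion blur via a normalized line kernel."""
    kernel = np.zeros((ksize, ksize), dtype=np.float32)
    kernel[ksize // 2, :] = 1.0 / ksize
    return cv2.filter2D(img, -1, kernel)

def defocus(img, ksize=9):
    """Approximate defocus with a Gaussian blur."""
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def subsample_fps(frames, src_fps=30, dst_fps=10):
    """Simulate a lower FPS by keeping every (src_fps // dst_fps)-th frame."""
    stride = max(1, src_fps // dst_fps)
    return frames[::stride]
```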
