
Support for large datasets #24

Open
MatthewDZane opened this issue Jul 8, 2024 · 7 comments

@MatthewDZane

Thanks for the amazing work. We have datasets ranging from 1k to 14k images, and we were wondering if vggsfm is going to support image counts of that magnitude in the pipeline in the near future. The simplest solution, I am guessing, would be to split the images into overlapping submodels, process each with vggsfm, and then combine the results.
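A minimal sketch of the chunking idea (the `chunk_size`/`overlap` values and the merge step are hypothetical placeholders, not part of vggsfm's API):

```python
# Hypothetical sketch: split an ordered image list into overlapping
# submodels. The parameters below are illustration values only.
from pathlib import Path

def make_submodels(image_paths, chunk_size=200, overlap=30):
    """Yield overlapping chunks of an ordered image list."""
    step = chunk_size - overlap
    for start in range(0, len(image_paths), step):
        chunk = image_paths[start:start + chunk_size]
        if len(chunk) > overlap:  # skip a trailing chunk that is pure overlap
            yield chunk

images = sorted(Path("images/").glob("*.jpg"))
for i, chunk in enumerate(make_submodels(images)):
    print(f"submodel {i}: {len(chunk)} images")
    # each chunk would be reconstructed with vggsfm, then the submodels
    # aligned via their shared overlapping frames (e.g., a similarity
    # transform estimated between the duplicated camera poses)
```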

@jytime
Contributor

jytime commented Jul 9, 2024

Hi @MatthewDZane

We will soon support reconstruction for videos with such a large number of images, leveraging the assumed temporal continuity. For unordered images of this quantity, it may take a bit more time. Is your dataset composed of videos or unordered images?

@MatthewDZane
Author

Our datasets are composed of ordered images, but not in video format: they were taken by a drone on a flight path, so there is some distance between each photo and the one that follows it. There are also some discontinuities, since we do not take pictures while the drone turns a corner.

@jytime
Contributor

jytime commented Jul 10, 2024

Hi, it should be possible to solve this (my guess, but I cannot guarantee it). I will let you know when the video version is ready.

@bhack

bhack commented Jul 13, 2024

@jytime I think it could be interesting to verify robustness at different FPS rates and under other classical video issues like motion blur (MB), defocus, etc.

Also, as we discussed in #9 (comment), it may not always be easy to maintain the assumption that all points are static/rigid, especially on long sequences:
https://github.com/qianduoduolr/DecoMotion
https://tracks-to-4d.github.io/
https://chiaki530.github.io/projects/leapvo/
https://henry123-boy.github.io/SpaTracker/ (see the "Camera pose estimation in dynamic scenes" section)

I think that without static/dynamic point clustering or classification (similar to occlusion/visibility handling), we risk requiring an uncontrolled number of binary masks for each query frame of a video, which could be very problematic.

@bhack

bhack commented Jul 13, 2024

Other than this, I will add that some SOTA point trackers still often fail on odometry-like sequences: google-deepmind/tapnet#72

@jytime
Contributor

jytime commented Jul 13, 2024

Hi @bhack, yes, I agree with this point. I am testing the effect of using an off-the-shelf video motion segmentation model to filter out the dynamic pixels.
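For illustration, a sketch of how such a mask might be applied, assuming a per-frame binary mask from some off-the-shelf segmentation model (the function and mask format here are assumptions, not vggsfm's actual interface):

```python
# Illustrative sketch: given a per-frame binary mask (True = dynamic),
# drop query points that fall on dynamic pixels before tracking/BA.
import numpy as np

def filter_static_queries(query_xy, dynamic_mask):
    """query_xy: (N, 2) pixel coords (x, y); dynamic_mask: (H, W) bool."""
    x = query_xy[:, 0].round().astype(int).clip(0, dynamic_mask.shape[1] - 1)
    y = query_xy[:, 1].round().astype(int).clip(0, dynamic_mask.shape[0] - 1)
    keep = ~dynamic_mask[y, x]
    return query_xy[keep]

# toy example
mask = np.zeros((480, 640), dtype=bool)
mask[100:200, 300:400] = True            # a moving object
queries = np.array([[320.0, 150.0], [50.0, 50.0]])
print(filter_static_queries(queries, mask))  # keeps only the static point
```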

@bhack

bhack commented Jul 13, 2024

Many motion segmentation models that rely on optical flow networks suffer from the classical video effects described above (defocus/motion blur).

This is also noted in the limitations section of the recent ECCV 2024 DecoMotion paper (mentioned in the previous message).
We could probably simulate MB/defocus with augmentation over the available datasets, if you have not collected something specific with these effects. Different FPS rates, on the other hand, could easily be simulated.
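For example, a rough sketch of these augmentations using standard OpenCV calls (the kernel sizes and the FPS stride are arbitrary illustration values, not tuned settings):

```python
# Rough sketch of the proposed augmentations: linear motion blur,
# Gaussian defocus, and FPS reduction by frame subsampling.
import cv2
import numpy as np

def motion_blur(img, ksize=15):
    """Horizontal linear motion blur via a normalized line kernel."""
    kernel = np.zeros((ksize, ksize), dtype=np.float32)
    kernel[ksize // 2, :] = 1.0 / ksize
    return cv2.filter2D(img, -1, kernel)

def defocus(img, ksize=9):
    """Approximate defocus with a Gaussian blur."""
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def subsample_fps(frames, src_fps=30, dst_fps=10):
    """Simulate a lower FPS by keeping every (src_fps // dst_fps)-th frame."""
    stride = max(1, src_fps // dst_fps)
    return frames[::stride]
```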
