Support for large datasets #24
We will soon support reconstruction for videos with such a large number of frames, leveraging their assumed temporal continuity. Supporting unordered images at that scale may take a bit more time. Is your dataset composed of videos or unordered images?
Our datasets are composed of ordered images, but not in video format: they were taken by a drone along a flight path, so there is some distance between each photo and the one that follows it. There are also some discontinuities, since we do not take pictures while the drone turns a corner.
Hi, it should be possible to solve it (my guess, but I cannot guarantee it). I will let you know when the video version is ready.
@jytime I think it would be interesting to verify robustness across different FPS settings, and also against classical video artifacts such as motion blur, defocus, etc. Also, as we discussed in #9 (comment), it is not always easy to maintain the assumption that all points are static/rigid, especially on long sequences. Without clustering or classifying points as static vs. dynamic (including occlusion/visibility handling), the risk of needing an uncontrolled number of binary masks for each required query frame on videos could be very problematic.
Other than this, I will add that some SOTA point trackers still often fail on odometry-like sequences: google-deepmind/tapnet#72
Hi @bhack, yes, I agree with this point. I am testing the effect of using an off-the-shelf video motion segmentation model to filter out the dynamic pixels.
Many motion segmentation models that rely on optical flow networks suffer from the classical video effects described above (defocus/motion blur). This is also noted in the limitations section of the recent ECCV 2024 DecoMotion paper (mentioned in the previous message).
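The dynamic-pixel filtering idea discussed above can be sketched as follows. This is not vggsfm's actual implementation, just a minimal illustration: given a hypothetical binary mask produced by some motion segmentation model (True = static pixel), drop any tracked keypoints that land on dynamic pixels before they enter the SfM pipeline.

```python
import numpy as np

def filter_static_keypoints(keypoints, static_mask):
    """Keep only keypoints that fall on static pixels.

    keypoints:   (N, 2) array of (x, y) pixel coordinates.
    static_mask: (H, W) boolean array, True where the pixel is static
                 (assumed output of some motion segmentation model).
    """
    # Round coordinates to pixel indices and clamp them inside the image.
    xs = np.clip(keypoints[:, 0].astype(int), 0, static_mask.shape[1] - 1)
    ys = np.clip(keypoints[:, 1].astype(int), 0, static_mask.shape[0] - 1)
    # Boolean lookup: True for keypoints lying on static pixels.
    keep = static_mask[ys, xs]
    return keypoints[keep]
```

In practice one mask per query frame would be needed, which is exactly the scaling concern raised earlier in the thread.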
Thanks for the amazing work. We have datasets ranging from 1k to 14k images, and we were wondering whether vggsfm is going to support image counts of that magnitude in the near future. The simplest solution, I am guessing, would be to split the images into overlapping submodels, process each with vggsfm, and then combine them.
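The split-with-overlap idea could be sketched like this (a hypothetical helper, not part of vggsfm): partition the ordered image list into fixed-size chunks that share a few images, so neighbouring submodels can later be aligned on the cameras they have in common.

```python
def overlapping_chunks(items, chunk_size, overlap):
    """Split an ordered sequence into consecutive chunks sharing
    `overlap` items, so adjacent submodels can be registered to each
    other via the shared images."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(items), step):
        chunks.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # last chunk reaches the end of the sequence
    return chunks
```

Merging the resulting submodels would still require estimating a similarity transform (scale, rotation, translation) between each pair from the shared cameras, which is the non-trivial part of this approach.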