A Comprehensive Benchmark for Robust Multi-image Understanding

muirbench/MuirBench

MuirBench

A Comprehensive Benchmark for Robust Multi-image Understanding

🌐 Homepage | 🤗 Dataset | 📖 Paper | 💻 Evaluation

News

Intro

MuirBench is a benchmark containing 11,264 images and 2,600 multiple-choice questions, providing robust evaluation on 12 multi-image understanding tasks.

  • MuirBench evaluates a comprehensive range of 12 multi-image understanding abilities, e.g., geographic understanding, diagram understanding, and visual retrieval, whereas prior benchmarks generally contain single-image questions.
  • MuirBench contains 10 diverse multi-image relations, e.g., narrative and complementary.
  • MuirBench provides a robust evaluation of models through unanswerable instance variants. The three major ways to create unanswerable instances are illustrated below.

[Figure: the three major ways to create unanswerable instances]

Results

We evaluated 20 recent multimodal LLMs. Even the best-performing models, such as GPT-4o and Gemini Pro, find MuirBench challenging, achieving 68.0% and 49.3% accuracy, respectively. Open-source multimodal LLMs trained on single images hardly generalize to multi-image questions, with accuracy hovering below 33.3%. These results highlight the importance of MuirBench in encouraging the community to develop multimodal LLMs that can look beyond a single image, and they suggest potential pathways for future improvements.
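The accuracy figures above are fractions of multiple-choice questions answered correctly. As a rough illustration only, the sketch below computes overall and per-group accuracy from a list of prediction records. It is not the official evaluation script: the function name `accuracy_by_group`, the record fields `task`, `answer`, and `prediction`, and the toy records are all assumptions made for this example and may not match the released dataset schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def accuracy_by_group(
    records: Iterable[Mapping[str, str]], group_key: str
) -> Dict[str, float]:
    """Return the fraction of correct predictions per value of `group_key`.

    An extra "overall" entry covers all records. The field names used here
    ("answer", "prediction") are hypothetical, not the MuirBench schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        # Compare option letters case-insensitively, ignoring stray spaces.
        hit = rec["prediction"].strip().upper() == rec["answer"].strip().upper()
        for group in (rec[group_key], "overall"):
            total[group] += 1
            correct[group] += int(hit)
    return {group: correct[group] / total[group] for group in total}


# Toy, made-up records purely for illustration; these are not MuirBench data.
toy_records = [
    {"task": "visual retrieval", "answer": "A", "prediction": "A"},
    {"task": "visual retrieval", "answer": "C", "prediction": "B"},
    {"task": "diagram understanding", "answer": "D", "prediction": "d"},
    {"task": "diagram understanding", "answer": "B", "prediction": "B"},
]

scores = accuracy_by_group(toy_records, "task")
for group, acc in scores.items():
    print(f"{group}: {acc:.1%}")
```

Passing a different `group_key` (for instance, a hypothetical field marking answerable versus unanswerable instances) would give the same kind of breakdown along that dimension.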

Disclaimer

MuirBench incorporates data sourced from established image datasets. Every effort has been made to ensure that the data presented in this paper is utilized in compliance with relevant copyright laws and appropriately credited. Should any copyright holder identify an image in our work that they believe infringes upon their licensing agreements, we invite them to contact us directly. We are committed to addressing any legitimate concerns in a timely and responsible manner.

Contact

Citation

@article{wang2024muirbench,
  title={MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding},
  author={Wang, Fei and Fu, Xingyu and Huang, James Y and Li, Zekun and Liu, Qin and Liu, Xiaogeng and Ma, Mingyu Derek and Xu, Nan and Zhou, Wenxuan and Zhang, Kai and others},
  journal={arXiv preprint arXiv:2406.09411},
  year={2024}
}
