Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
Published in the NAACL 2025 Main Conference, 2025
This work introduces a comprehensive benchmark for evaluating the long-context capabilities of multimodal large language models, extending the traditional “needle in a haystack” evaluation to multimodal settings.
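The core idea of a needle-in-a-haystack evaluation can be sketched as follows: a target item (the "needle") is inserted at a controlled depth within a long context of distractors (the "haystack"), and the model is scored on whether it can retrieve it. The sketch below is a minimal, hypothetical illustration of this protocol in a multimodal setting; the helper names (`build_haystack`, `run_trial`, `dummy_model`) and the image filenames are illustrative assumptions, not the paper's actual implementation, and a real evaluation would call an MLLM in place of the stub.

```python
def build_haystack(needle, distractors, depth):
    """Insert the needle image at a relative depth (0.0-1.0) among distractor images."""
    haystack = list(distractors)
    pos = int(depth * len(haystack))
    haystack.insert(pos, needle)
    return haystack, pos

def run_trial(model_fn, needle, distractors, depth):
    """Ask the model for the needle's index in the haystack and score the answer."""
    haystack, gold = build_haystack(needle, distractors, depth)
    pred = model_fn(haystack, needle)
    return int(pred == gold)

# Stub standing in for a real MLLM call; it always guesses index 0.
def dummy_model(haystack, needle):
    return 0

distractors = [f"img_{i}.png" for i in range(9)]
# Sweep the needle across 10 insertion depths and average retrieval accuracy.
acc = sum(run_trial(dummy_model, "needle.png", distractors, d / 10) for d in range(10)) / 10
print(acc)  # the stub only succeeds when the needle lands at index 0
```

In the actual benchmark, this sweep would be repeated across varying context lengths and modalities to map out where retrieval accuracy degrades.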
Key Contributions
- Developed novel evaluation protocols for multimodal long-context understanding
- Created benchmarks spanning multiple modalities and context lengths
- Provided systematic analysis of current MLLM limitations in long-context scenarios
Recommended citation: Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang. "Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models." NAACL 2025 Main.
Download Paper | Download Slides