Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

Published in the NAACL 2025 Main Conference, 2025

This work introduces a benchmark for evaluating the long-context capabilities of multimodal large language models (MLLMs). It extends the text-only “needle in a haystack” evaluation to multimodal settings: given a text description, a model must locate a target sub-image (the needle) within a large collection of images (the haystack).

Key Contributions

  • Developed novel evaluation protocols for multimodal long-context understanding (a minimal sketch of the protocol appears after this list)
  • Created comprehensive benchmarks spanning multiple modalities and context lengths
  • Provided a systematic analysis of current MLLMs’ limitations in long-context scenarios

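For intuition, here is a minimal Python sketch of the general multimodal needle-in-a-haystack protocol, not the authors’ released code: the `build_haystack` helper, the grid size, the prompt wording, and the `model` callable are illustrative assumptions, and exact-match accuracy is one simple choice among the metrics such a benchmark might report.

```python
from PIL import Image
import random

def build_haystack(distractors, needle, grid=(4, 4), tile=(256, 256)):
    """Stitch distractor images into a grid, placing the needle
    sub-image at a random cell. Requires at least rows*cols - 1
    distractors. Returns the collage and the needle's (row, col)."""
    rows, cols = grid
    canvas = Image.new("RGB", (cols * tile[0], rows * tile[1]))
    needle_pos = (random.randrange(rows), random.randrange(cols))
    pool = iter(distractors)
    for r in range(rows):
        for c in range(cols):
            img = needle if (r, c) == needle_pos else next(pool)
            canvas.paste(img.resize(tile), (c * tile[0], r * tile[1]))
    return canvas, needle_pos

def evaluate(model, samples, grid=(4, 4)):
    """Exact-match accuracy: the model must name the (row, col) cell
    whose sub-image matches the caption. `model` is any callable that
    takes (image, prompt) and returns a string like "2,3"."""
    correct = 0
    for distractors, needle, caption in samples:
        haystack, truth = build_haystack(distractors, needle, grid)
        prompt = (f"The image is a {grid[0]}x{grid[1]} grid of sub-images. "
                  f"Reply with 'row,col' of the sub-image matching: {caption}")
        r, c = (int(x) for x in model(haystack, prompt).strip().split(","))
        correct += ((r, c) == truth)
    return correct / len(samples)
```

Under this framing, context length can be scaled by enlarging the grid or by passing more stitched images per query, and difficulty can be varied by where the needle is placed.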
Status: Accepted at NAACL 2025 Main Conference

Recommended citation: Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang. "Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models." NAACL 2025 Main.
Download Paper | Download Slides