Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

Published in the NAACL 2025 Main Conference, 2025

This work introduces a benchmark for evaluating the long-context capabilities of multimodal large language models (MLLMs). It extends the text-only “needle in a haystack” evaluation to multimodal settings: given a text description, a model must locate a target sub-image (the needle) within a large collection of images (the haystack).

Key Contributions

  • Developed novel evaluation protocols for multimodal long-context understanding (a minimal sketch of the protocol appears after this list)
  • Created comprehensive benchmarks spanning multiple modalities and context lengths
  • Provided a systematic analysis of current MLLMs’ limitations in long-context scenarios

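For intuition, here is a minimal Python sketch of the general multimodal needle-in-a-haystack protocol, not the authors’ released code: the `build_haystack` helper, the grid size, the prompt wording, and the `model` callable are illustrative assumptions, and exact-match accuracy is one simple choice among the metrics such a benchmark might report.

```python
from PIL import Image
import random

def build_haystack(distractors, needle, grid=(4, 4), tile=(256, 256)):
    """Stitch distractor images into a grid, placing the needle
    sub-image at a random cell. Requires at least rows*cols - 1
    distractors. Returns the collage and the needle's (row, col)."""
    rows, cols = grid
    canvas = Image.new("RGB", (cols * tile[0], rows * tile[1]))
    needle_pos = (random.randrange(rows), random.randrange(cols))
    pool = iter(distractors)
    for r in range(rows):
        for c in range(cols):
            img = needle if (r, c) == needle_pos else next(pool)
            canvas.paste(img.resize(tile), (c * tile[0], r * tile[1]))
    return canvas, needle_pos

def evaluate(model, samples, grid=(4, 4)):
    """Exact-match accuracy: the model must name the (row, col) cell
    whose sub-image matches the caption. `model` is any callable that
    takes (image, prompt) and returns a string like "2,3"."""
    correct = 0
    for distractors, needle, caption in samples:
        haystack, truth = build_haystack(distractors, needle, grid)
        prompt = (f"The image is a {grid[0]}x{grid[1]} grid of sub-images. "
                  f"Reply with 'row,col' of the sub-image matching: {caption}")
        r, c = (int(x) for x in model(haystack, prompt).strip().split(","))
        correct += ((r, c) == truth)
    return correct / len(samples)
```

Under this framing, context length can be scaled by enlarging the grid or by passing more stitched images per query, and difficulty can be varied by where the needle is placed.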
Status: Accepted at NAACL 2025 Main Conference

Recommended citation: Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang. "Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models." NAACL 2025 Main.
Download Paper | Download Slides