Geo-Image-Textualization

Geo-Image-Textualization Framework

A novel reinforcement learning-based approach for creating high-quality geometry image-caption datasets that enhance multimodal large language models’ reasoning capabilities.

Key Innovation

  • First-of-its-kind dataset: GeoReasoning-10K with full modality equivalence
  • RL-based synthesis: Reward-guided generation of semantically aligned geometry-text pairs
  • Cross-modal enhancement: Significant improvements in geometric reasoning across multiple domains

Technical Achievements

  • Developed novel reward functions for geometry-text alignment
  • Created systematic evaluation protocols for geometric reasoning
  • Demonstrated performance improvements in Qwen2.5-VL across geometric, arithmetic, algebraic, and numeric domains

Status: Submitted to NeurIPS 2025 Datasets and Benchmarks Track
Period: March 2025 - May 2025
Institution: University of Illinois Urbana-Champaign
Advisor: Prof. Tong Zhang