Geo-Image-Textualization
Geo-Image-Textualization Framework
A novel reinforcement learning-based approach for creating high-quality geometry image-caption datasets that enhance multimodal large language models’ reasoning capabilities.
Key Innovation
- First-of-its-kind dataset: GeoReasoning-10K with full modality equivalence
- RL-based synthesis: Intelligent generation of semantically aligned geometry-text pairs
- Cross-modal enhancement: Improved reasoning in multimodal models on geometric as well as arithmetic, algebraic, and numeric problems
Technical Achievements
- Developed novel reward functions for geometry-text alignment
- Created systematic evaluation protocols for geometric reasoning
- Demonstrated performance improvements in Qwen2.5-VL across geometric, arithmetic, algebraic, and numeric domains
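To make the idea of a geometry-text alignment reward concrete, here is a minimal illustrative sketch. It is not the reward used in this project: the function name `alignment_reward`, the `required_facts` input, and the token-coverage scoring rule are all assumptions chosen for illustration. It scores a caption by the fraction of expected geometric facts it mentions.

```python
import re


def alignment_reward(caption: str, required_facts: list[str]) -> float:
    """Hypothetical coverage-style reward (an assumption, not the project's method).

    Returns the fraction of required facts whose tokens all appear in the caption.
    """
    if not required_facts:
        return 0.0

    # Lowercased alphanumeric tokens of the caption, used for simple matching.
    caption_tokens = set(re.findall(r"[a-z0-9]+", caption.lower()))

    hits = 0
    for fact in required_facts:
        fact_tokens = set(re.findall(r"[a-z0-9]+", fact.lower()))
        # A fact counts as covered only if every one of its tokens is present.
        if fact_tokens and fact_tokens <= caption_tokens:
            hits += 1

    return hits / len(required_facts)


caption = "Triangle ABC has a right angle at B, and AB = 3."
facts = ["triangle ABC", "right angle", "AB = 3", "BC = 4"]
reward = alignment_reward(caption, facts)  # three of the four facts are covered
```

A token-overlap rule like this is easy to inspect but ignores word order and paraphrase; a reward used in practice would likely need a stronger notion of semantic agreement between the figure and its caption.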
Status: Submitted to NeurIPS 2025 Datasets and Benchmarks Track
Period: March 2025 - May 2025
Institution: University of Illinois Urbana-Champaign
Advisor: Prof. Tong Zhang