About Me
I am currently a research assistant at the University of Illinois Urbana-Champaign, working with Prof. Tong Zhang. Previously, I was a visiting student at Rutgers University (2024-2025), advised by Prof. Hao Wang. I obtained my B.S. in Electronic Information Engineering from Wuhan University in 2024, graduating with a GPA of 3.60/4.0.
My research focuses on improving multimodal understanding and reasoning in AI systems. I have experience with multimodal large language models (MLLMs), computer vision, and applications of reinforcement learning.
Research Interests
Multimodal Learning: Developing frameworks that bridge visual and textual understanding, particularly in geometric reasoning and cross-modal alignment.
Computer Vision & Reinforcement Learning: Designing efficient neural architectures and reinforcement learning frameworks for visual understanding tasks.
Trustworthy AI: Working on interpretability and reliability of large language models and multimodal systems.
Formal Languages & Reasoning: Exploring the potential of formal languages to augment reasoning capabilities in AI systems.
Recent Work
I am currently working on Geo-Image-Textualization, a reinforcement-learning-based framework for generating semantically aligned geometry image-caption pairs. This work produced GeoReasoning-10K, the first dataset with full modality equivalence for geometric reasoning.
My research has been published in top-tier venues including NAACL and Neural Networks, and I have submissions under review at NeurIPS, IJCAI, and TMLR. I have also contributed to comprehensive surveys on continual learning for large language models.