Tianshuo Peng | MultiMedia Lab, CUHK

About Me

My name is Tianshuo Peng (彭天硕). I am a first-year PhD student at CUHK MultiMedia Lab, under the supervision of Prof. Xiangyu Yue. Before that, I obtained my bachelor's degree at the School of Computer Science, Wuhan University (WHU), advised by Prof. Zuchao Li and Prof. Lefei Zhang. Currently, I am a research intern at the Shanghai Artificial Intelligence Laboratory, collaborating with Bo Zhang.

My research interests lie in multi-modal perception and understanding, especially vision-language understanding. My goal is to build an efficient, unified, and versatile intelligent system for understanding and generating content in any modality.

👋👋👋 I am open to discussions and collaboration opportunities. If you are interested in my work or would like to collaborate, please feel free to email me.

Research Interests

Multi-Agent Systems: Multi-Agent Collaboration, Planning and Execution in Complex Scenarios
Multi-Modal Understanding and Generation: Multi-Modal Representation Learning, Visual-Language Generation, Unified Understanding-Generation Models
Multi-Modal Large Language Models (MLLMs): Visual-Language Reasoning Models, Multi-Modal Instruction Following

News

[Jun. 2025] Our paper on the integration of generalists and specialists in VLMs was accepted to ICCV 2025. See you in Hawaii!
[May 2025] I was honored to be named an Outstanding Graduate of Wuhan University and to be recognized for my Outstanding Graduation Thesis. I will always cherish the memories of Luojia Mountain!
[Apr. 2025] I was awarded the Lei Jun Breakthrough Scholarship (5 students schoolwide) — thank you all!
[Mar. 2025] I was selected as the Academic Pioneer of the School of Computer Science at WHU (2 students schoolwide). I’m truly grateful for the support from all my classmates and professors!
[Jul. 2024] Our paper on large multi-modal models was accepted to MM 2024. It offers some interesting discussion of how different modalities exist in LMMs.
[Dec. 2023] Our paper on multi-modal aspect-based sentiment analysis was accepted to AAAI 2024. It offers an interesting discussion revealing how LMMs see
[May 2023] Our paper on information extraction was accepted to ACL 2023. This moment means a lot to me!

Selected Publications

Preprint

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Jiakang Yuan†, Tianshuo Peng†, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen*, Lei Bai, Bo Zhang*, Xiangyu Yue (*Corresponding authors, †Equal contribution)

PDF Code

ICCV

Chimera: Improving Generalist Model with Domain-Specific Experts

Tianshuo Peng†, Mingsheng Li†, Jiakang Yuan, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang*, Xiangyu Yue* (*Corresponding authors, †Equal contribution)

International Conference on Computer Vision (ICCV), 2025.

PDF Code

ACM MM

Multi-modal Auto-regressive Modeling via Visual Tokens

Tianshuo Peng†, Zuchao Li†*, Lefei Zhang, Hai Zhao, Ping Wang, Bo Du (*Corresponding authors, †Equal contribution)

ACM Multimedia (MM), 2024.

PDF Code Poster Presentation

AAAI

A Novel Energy Based Model Mechanism for Multi-Modal Aspect-Based Sentiment Analysis

Tianshuo Peng†, Zuchao Li†*, Ping Wang, Lefei Zhang, Hai Zhao (*Corresponding authors, †Equal contribution)

The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024.

PDF Code Poster Presentation

ACL

FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction

Tianshuo Peng†, Zuchao Li†*, Lefei Zhang, Bo Du, Hai Zhao (*Corresponding authors, †Equal contribution)

The 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.

PDF Code Poster Presentation