Mu Chen (陈牧)
I'm a Ph.D. student at the University of Technology Sydney (UTS), affiliated with the ReLER Lab, Australian Artificial Intelligence Institute (AAII), advised by Prof. Yi Yang. I received my B.S. from Monash University in 2021.
Email / Google Scholar / Github
News
- 🔥2025.5: Our work DiffVsgg is accepted by CVPR'25!
- 🔥2024.7: Our work DCF is accepted by ACM MM'24 as Oral!
- 🔥2024.7: Our work GvSeg is accepted by ECCV'24!
- 2024.7: Our work UAHOI is accepted by CVIU'24!
- 2023.7: Our work PiPa is accepted by ACM MM'23!
My research interests lie at the intersection of computer vision and human visual reasoning. I began my early graduate studies by enhancing the generalization capabilities of deep models for scene understanding tasks such as image/video segmentation. I then applied cutting-edge techniques, such as diffusion models and LLMs, to advance research in high-level scene understanding tasks such as Video Scene Graph Generation.
Recently, with the insight that 3D Scene Graph Generation and 3D Scene Generation are highly correlated, I have been pursuing research in hierarchical scene-layout modeling for navigation robotics. I am also exploring LLM-driven multi-agent systems with applications in computer vision and social simulation.
Feel free to contact me with any questions.
Selected Publications
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
Mu Chen,
Liulei Li,
Wenguan Wang †,
Yi Yang
CVPR, 2025
arXiv
/
code
Drawing inspiration from Latent Diffusion Models (LDMs), which generate images by denoising a latent feature embedding, we unify the decoding of three tasks, i.e., object classification, bounding box regression, and graph generation, into one shared feature embedding. Then, given an embedding containing the unified features of object pairs, we perform step-wise denoising on it within the LDM framework, delivering a clean embedding that clearly indicates the relationships between objects.
GvSeg: General and Task-Oriented Video Segmentation
Mu Chen,
Liulei Li,
Wenguan Wang,
Ruijie Quan,
Yi Yang †
ECCV, 2024
arXiv
/
code
/
 video (AI TIME)
We present GvSeg, a general and task-oriented video segmentation framework that addresses four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design.
Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation
Mu Chen,
Zhedong Zheng,
Yi Yang †
ACM Multimedia, 2024   (Oral Presentation, 3.97% Acceptance Rate)
arXiv
/
code
/
 video (极市)
We observe that semantic categories, such as sidewalks, buildings, and sky, display relatively consistent depth distributions and can be clearly distinguished in a depth map. Based on this observation, we propose a depth-aware framework that explicitly leverages depth estimation to mix the categories and facilitate the two complementary tasks, i.e., segmentation and depth learning, in an end-to-end manner.
Selected Awards
Outstanding Reviewer, ACM Multimedia Main Conference, USA, 2024
UTS Post Thesis Award, Australia, 2024 (3,000 AUD)
ACM Travel Grants, USA, 2024 (1,000 USD)
Outstanding Reviewer, ACM Multimedia UAVM Workshop, USA, 2024
Outstanding Reviewer, ACM Multimedia UAVM Workshop, USA, 2023
First-Class Honours, Monash University, Australia, 2021
Summer Research Scholarship, Monash University, Australia, 2021 (3,200 AUD)
Tuition Fee Waiver Scholarship, Monash University, Australia, 2019-2021 (awarded four times, totaling 18,000 AUD)
Dean's Honour List, Monash University, Australia, 2019-2021
Undergraduate Student Support Grant, Monash University, Australia, 2018 (8,000 AUD)
Academic Service
Journal Reviewer: IJCV, TPAMI, TIP, TMM, TNNLS, TCSVT, KBS, CVIU, PR, Neurocomputing, Information Fusion, Visual Computer
Conference Reviewer: ICLR, NeurIPS (incl. FM4Science and Bayesian Decision-making and Uncertainty workshops), ACL, ICCV, VR, ACM MM, ICWSM
Code stolen from Jon Barron 0v0.