Mu Chen (陈牧)

I'm a Ph.D. student at the University of Technology Sydney (UTS), affiliated with the ReLER Lab at the Australian Artificial Intelligence Institute (AAII), advised by Prof. Yi Yang. I received my B.S. from Monash University in 2021.

Email  /  Google Scholar  /  Github

profile photo

News

  • 🔥2025.5: Our work DiffVsgg is accepted by CVPR'25!
  • 🔥2024.7: Our work DCF is accepted by ACM MM'24 as Oral!
  • 🔥2024.7: Our work GvSeg is accepted by ECCV'24!
  • 2024.7: Our work UAHOI is accepted by CVIU'24!
  • 2023.7: Our work PiPa is accepted by ACM MM'23!

My research interests lie at the intersection of computer vision and human visual reasoning. I began my early graduate studies by enhancing the generalization capabilities of deep models for scene understanding tasks such as image and video segmentation. I then applied cutting-edge techniques, such as diffusion models and LLMs, to advance high-level scene understanding tasks such as Video Scene Graph Generation. Recently, with the insight that 3D Scene Graph Generation and 3D Scene Generation are highly correlated, I have been pursuing research in hierarchical scene-layout modeling for navigation robotics. I am also exploring LLM-driven multi-agent systems with applications in computer vision and social simulation. Feel free to contact me with any questions.

Selected Publications
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
Mu Chen, Liulei Li, Wenguan Wang †, Yi Yang
CVPR, 2025
arXiv / code

Drawing inspiration from Latent Diffusion Models (LDMs), which generate images by denoising a latent feature embedding, we unify the decoding of three tasks, object classification, bounding box regression, and graph generation, through one shared feature embedding. Given an embedding containing the unified features of object pairs, we then perform step-wise denoising within LDMs to deliver a clean embedding that clearly indicates the relationships between objects.

GvSeg: General and Task-Oriented Video Segmentation
Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang †
ECCV, 2024
arXiv / code /  video (AI TIME)

We present GvSeg, a general and task-oriented video segmentation framework that addresses four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design.

Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation
Mu Chen, Zhedong Zheng, Yi Yang †
ACM Multimedia, 2024   (Oral Presentation, 3.97% Acceptance Rate)
arXiv / code /  video (极市)

We observe that semantic categories, such as sidewalks, buildings, and sky, display relatively consistent depth distributions and can be clearly distinguished in a depth map. Based on this observation, we propose a depth-aware framework that explicitly leverages depth estimation to mix categories and to facilitate the two complementary tasks, i.e., segmentation and depth learning, in an end-to-end manner.

PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation
Mu Chen, Zhedong Zheng, Yi Yang, Tat-seng Chua †
ACM Multimedia, 2023
arXiv /  video (AI 新青年) / code

We propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency across different contexts.

UAHOI: Uncertainty-aware robust interaction learning for HOI detection
Mu Chen, Minghan Chen, Yi Yang †
Computer Vision and Image Understanding (CVIU), 2024
arXiv

We propose UAHOI, an Uncertainty-Aware Robust Human-Object Interaction learning approach that explicitly estimates prediction uncertainty during training to refine both detection and interaction predictions.

PiPa++: Towards Unification of Domain Adaptive Semantic Segmentation via Self-supervised Learning
Mu Chen, Zhedong Zheng †, Yi Yang
Under review at IJCV, 2025
arXiv / code

An extended version of PiPa for the video domain.

Selected Awards

  • Outstanding Reviewer, ACM Multimedia Main Conference, USA, 2024
  • UTS Post Thesis Award, Australia, 2024 (3,000 AUD)
  • ACM Travel Grants, USA, 2024 (1,000 USD)
  • Outstanding Reviewer, ACM Multimedia UAVM Workshop, USA, 2024
  • Outstanding Reviewer, ACM Multimedia UAVM Workshop, USA, 2023
  • First-Class Honour, Monash University, Australia, 2021
  • Summer Research Scholarship, Monash University, Australia, 2021 (3,200 AUD)
  • Tuition Fee Waiver Scholarship, Monash University, Australia, 2019-2021 (awarded four times, totaling 18,000 AUD)
  • Dean's Honour List, Monash University, Australia, 2019-2021
  • Undergraduate Student Support Grant, Monash University, Australia, 2018 (8,000 AUD)

Academic Service

  • Journal Reviewer: IJCV, TPAMI, TIP, TMM, TNNLS, TCSVT, KBS, CVIU, PR, Neurocomputing, Information Fusion, Visual Computer
  • Conference Reviewer: ICLR, NeurIPS (incl. FM4Science and Bayesian Decision-making and Uncertainty workshops), ACL, ICCV, VR, ACM MM, ICWSM


  • Code stolen from Jon Barron 0v0.