Multimodal Agents · Visual Generation

Building Generative Intelligence for Open-Ended Multimodal Creation

I am an undergraduate student at South China University of Technology. My research focuses on multimodal agent systems that turn human intent into controllable visual and interactive content. I am broadly interested in agents for image generation, video generation, 3D/4D generation, visual reasoning, and iterative editing: systems that can decompose creative tasks, coordinate specialized models or tools, and improve outputs through planning, feedback, and self-reflection. I work closely with Jinxiu Liu.

Experience

Mar. 2025 - Present

Multimodal Agents for Visual Creation

Research projects across multimodal agents, controllable image/video generation, motion transfer, 3D mesh generation, visual reasoning, and 2D/3D/4D creation workflows.

Papers

Reserved

Selected papers will be added here.

Projects

Open-source work
OpenDinq interface preview

OpenDinq

An alpha-stage open-source product for evidence-backed, AI-native profiles, card-based workspaces, and explainable people discovery.

Agentic Profile Generation · Evidence-Grounded Reasoning · Card-Based Workflow · People Discovery
View repository

Multi-Agent Blender Generation System

A Blender-based system for 2D/3D/4D generation that coordinates visual-reasoning agents with symbolic-program agents, turning visual intent into structured, editable scenes; inspired by vision-as-inverse-graphics workflows.

Multi-Agent Systems · Vision as Inverse Graphics · 2D/3D/4D Generation · Blender
Repository pending

Awards

2025

Meritorious Winner

Mathematical Contest in Modeling / Interdisciplinary Contest in Modeling.

2024

Second Prize

Contemporary Undergraduate Mathematical Contest in Modeling, Guangdong Division.