AutoVFX: Physically Realistic Video Editing from Natural Language Instructions

  • Hao Yu Hsu
  • Chih Hao Lin
  • Albert J. Zhai
  • Hongchi Xia
  • Shenlong Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Modern visual effects (VFX) software has made it possible for skilled artists to create imagery of virtually anything. However, the creation process remains laborious, complex, and largely inaccessible to everyday users. In this work, we present AutoVFX, a framework that automatically creates realistic and dynamic VFX videos from a single video and natural language instructions. By carefully integrating neural scene modeling, LLM-based code generation, and physical simulation, AutoVFX is able to provide physically-grounded, photorealistic editing effects that can be controlled directly using natural language instructions. We conduct extensive experiments to validate AutoVFX's efficacy across a diverse spectrum of videos and instructions. Quantitative and qualitative results suggest that AutoVFX outperforms all competing methods by a large margin in generative quality, instruction alignment, editing versatility, and physical plausibility.

Original language: English (US)
Title of host publication: Proceedings - 2025 International Conference on 3D Vision, 3DV 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 769-780
Number of pages: 12
ISBN (Electronic): 9798331538514
DOIs
State: Published - 2025
Event: 12th International Conference on 3D Vision, 3DV 2025 - Singapore, Singapore
Duration: Mar 25 2025 to Mar 28 2025

Publication series

Name: Proceedings - 2025 International Conference on 3D Vision, 3DV 2025

Conference

Conference: 12th International Conference on 3D Vision, 3DV 2025
Country/Territory: Singapore
City: Singapore
Period: 3/25/25 to 3/28/25

Keywords

  • llm agent
  • material editing
  • object insertion
  • physical simulation
  • scene simulation
  • text-guided video editing
  • visual effects

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Modeling and Simulation
