Open-sourced by Meituan, LongCat Video Avatar 1.5 is an audio-driven avatar model that generates precisely lip-synced short videos from a single portrait and an audio track. Upgraded with a Whisper encoder, it achieves better synchronization on multilingual, fast-spoken, and singing content. Reducing the sampling steps to 8 greatly accelerates inference and allows local deployment on GPUs with 8 GB of VRAM. Beyond single real-human clips, it supports anime and animal avatars, multi-person interactive scenes, and video continuation, serving bulk production of self-media clips, e-commerce product introductions, and virtual streamer videos.
https://www.longcatavatarai.com/longcat-video-avatar-1-5