Wan2.1 VACE is an all-in-one video generation and editing model open-sourced by Alibaba, designed to provide users with an integrated video creation solution.
Features
Wan2.1 VACE supports a range of video generation and editing tasks, including text-to-video generation, image-based reference video generation, video repainting, local editing, background extension, and video length extension. This versatility lets users meet multiple creative needs within a single model, greatly improving workflow efficiency.
Outstanding Performance: Wan2.1 delivers strong results across multiple benchmark tests, surpassing many existing open-source and commercial solutions. Its 14B version is particularly capable at generating high-quality video.
Consumer GPU Support: The 1.3B version requires only 8.19 GB of VRAM, so it can run on standard consumer graphics cards. This lowers the barrier to entry and gives more users access to high-quality video generation technology.
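As a rough illustration of that requirement, a small helper can check whether a GPU's reported memory covers the 8.19 GB figure stated for the 1.3B version. The helper below is a generic sketch, not part of the Wan2.1 tooling; with PyTorch installed, the memory figure would come from `torch.cuda.get_device_properties(0).total_memory`.

```python
# Hypothetical helper: check whether a GPU's reported memory meets the
# 8.19 GB requirement stated for the Wan2.1 VACE 1.3B version.
WAN_1_3B_VRAM_BYTES = int(8.19 * 1024**3)


def meets_vram_requirement(total_memory_bytes: int,
                           required_bytes: int = WAN_1_3B_VRAM_BYTES) -> bool:
    """True if the device's total memory covers the model's stated need."""
    return total_memory_bytes >= required_bytes


# Example: a 12 GB card clears the bar, a 6 GB card does not.
print(meets_vram_requirement(12 * 1024**3))  # True
print(meets_vram_requirement(6 * 1024**3))   # False
```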
Wan2.1 VACE introduces a new Video Condition Unit (VCU) that can uniformly process different types of video inputs, including text, images, and video. This innovation makes the model more efficient in handling multimodal inputs, better meeting users’ creative needs.
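The core idea of a unified condition unit can be sketched as a single container that normalizes heterogeneous inputs (text, reference images, source video) so downstream stages see one interface. The field and class names below are illustrative assumptions, not the actual Wan2.1 VACE API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of the "Video Condition Unit" concept: one container
# for all conditioning modalities. Names are illustrative only.


@dataclass
class VideoConditionUnit:
    prompt: Optional[str] = None                               # text condition
    reference_images: List[str] = field(default_factory=list)  # image paths
    source_video: Optional[str] = None                         # video to edit

    def modalities(self) -> List[str]:
        """List which condition types this unit carries."""
        present = []
        if self.prompt:
            present.append("text")
        if self.reference_images:
            present.append("image")
        if self.source_video:
            present.append("video")
        return present


# A video-repainting request conditions on both text and a source clip:
vcu = VideoConditionUnit(prompt="a cat in the rain", source_video="clip.mp4")
print(vcu.modalities())  # ['text', 'video']
```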
Wan-VAE, the core component of the model, can efficiently encode and decode 1080P video while maintaining temporal coherence. This ensures the quality and detail of the generated videos, even during long video generation.
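One generic strategy for keeping long videos coherent is to process frames in overlapping windows, so that each chunk shares context with its neighbor. The sketch below shows that windowing pattern in isolation; it is a common technique for long-sequence processing, not the actual Wan-VAE implementation.

```python
# Illustrative sketch: split a long frame sequence into overlapping chunks
# so adjacent windows share `overlap` frames of context. This generic
# pattern helps preserve temporal coherence; it is not Wan-VAE's code.


def overlapping_chunks(num_frames: int, chunk: int = 16, overlap: int = 4):
    """Yield (start, end) frame ranges with `overlap` frames shared between neighbors."""
    step = chunk - overlap
    start = 0
    while start < num_frames:
        yield (start, min(start + chunk, num_frames))
        if start + chunk >= num_frames:
            break
        start += step


# A 40-frame clip in windows of 16 with 4 frames of overlap:
print(list(overlapping_chunks(40)))  # [(0, 16), (12, 28), (24, 40)]
```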
Wan2.1 is the first model capable of generating both Chinese and English text within videos. This feature greatly enhances its potential in multilingual environments, making it suitable for video content requiring subtitles or text overlays.
Application Scenarios
Short Video Production: Creators can use Wan2.1 VACE to quickly generate short video content suitable for social media platforms such as TikTok and Kuaishou. Users only need to provide text descriptions or reference images to generate engaging videos.
Online Education: Educators can use the model to produce instructional videos, generating vivid course content from text or images to enhance the learning experience.
Game Commentary: Game streamers can generate commentary videos by combining gameplay footage and commentary text, quickly producing high-quality content.
Animation Production: Animators can use Wan2.1 VACE for animation stylization and environment transformation, creating animations with unique visual styles.
Ad Creation: Brands can use the model to generate advertising videos, combining images and text to quickly produce promotional videos aligned with brand identity.
Product Demonstration: Enterprises can use Wan2.1 VACE to create product demo videos, showcasing product features and usage scenarios to boost consumer interest.
Art Video Production: Artists can use the model for video stylization, creating visually artistic works suitable for exhibitions and artistic sharing.
Experimental Video Creation: Creators can explore different visual styles and storytelling techniques, leveraging the model’s flexibility for innovative experimentation.
Personal Video Projects: Ordinary users can use Wan2.1 VACE to create personal videos such as travel logs or family gatherings, generating high-quality videos with ease.
Social Media Content: Users can quickly generate content suitable for social media sharing, enhancing personal brand image and influence.
Alibaba’s Wan2.1-VACE model has now been officially open-sourced. The model supports a range of video generation and editing functions, including text-to-video, image-based reference video generation, video repainting, local editing, background extension, and video length extension. This release provides two versions, 1.3B and 14B. The 1.3B version is especially suited to consumer-grade GPUs, reducing the hardware requirements and enabling more developers to take part in video creation.