Generation Mode
Learn about the two lip sync generation modes—Standard and Precision—and how to choose the right one for your video.
Overview
Vozo provides two modes for lip syncing: Standard Mode and Precision Mode.
Each mode is designed for different types of videos and output quality requirements.
Only members can choose between modes. Free users are assigned Standard Mode automatically.
Key Differences
| | Standard Mode | Precision Mode |
---|---|---|
⏱️ Speed | Fast: ~10 minutes for queue and processing | Slower: ~2 hours for queue and processing |
✅ Best for | Most front-facing videos with clear facial visibility | Videos with complex visual details (e.g., beards, wrinkles, side profiles) |
⚠️ Limitations | Less accurate on side profiles or detailed facial textures | Not suitable when the original video has static faces or minimal mouth movement |
Standard Mode
Standard Mode is optimized for speed and works well in general cases, including:
- Videos with clear, front-facing speakers
- Projects where a quick turnaround matters more than fine detail
Example of an original video suitable for Standard Mode
Precision Mode
Precision Mode provides greater accuracy and attention to detail. It’s ideal for:
- Videos with side profiles or complex facial details, such as facial hair or distinguishing features
- Professional content requiring high-quality lip syncing
Precision Mode relies on learning the mouth movements from the original video. It’s not suitable for videos with static or minimal mouth movement, such as AI-generated videos where the speaker’s mouth doesn’t move naturally.
Example of an original video suitable for Precision Mode
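Choosing a mode
The guidance above can be read as a simple decision rule. The sketch below is only an illustration of that rule, not part of Vozo or any Vozo API: the `VideoTraits` class, its field names, and the `choose_mode` function are hypothetical, and the order in which the checks are applied is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VideoTraits:
    """Hypothetical description of a project; none of these names come from Vozo."""
    is_member: bool
    has_side_profiles: bool = False
    has_complex_facial_detail: bool = False   # e.g. beards, wrinkles
    needs_professional_quality: bool = False
    has_natural_mouth_movement: bool = True   # False for static or barely moving mouths
    needs_fast_turnaround: bool = False


def choose_mode(video: VideoTraits) -> str:
    """Return "standard" or "precision" following the guidance on this page."""
    # Free users cannot choose a mode; they get Standard Mode.
    if not video.is_member:
        return "standard"
    # Precision Mode learns mouth movements from the original video,
    # so it is not suitable when the mouth is static or barely moves.
    if not video.has_natural_mouth_movement:
        return "standard"
    # Assumption: a quick turnaround outweighs fine detail when both are requested.
    if video.needs_fast_turnaround:
        return "standard"
    # Side profiles, complex facial detail, or professional quality favor Precision Mode.
    if (video.has_side_profiles
            or video.has_complex_facial_detail
            or video.needs_professional_quality):
        return "precision"
    # Clear, front-facing videos work well in Standard Mode.
    return "standard"


if __name__ == "__main__":
    interview = VideoTraits(is_member=True, has_side_profiles=True)
    print(choose_mode(interview))
```

In this sketch, a member's video with side profiles and natural mouth movement is routed to Precision Mode, while the same video from a free user, or one with a static mouth, falls back to Standard Mode.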