LipSync API
Overview
Create LipSync Videos or Talking Photos
How it works
1. Host your assets at a publicly accessible URL.
   Upload your video, photo, and audio files so our servers can retrieve them.
2. Send an API request with the appropriate parameters.
   Reference your hosted assets and specify your desired mode (Standard or Precision).
3. Wait for completion or query the job status.
   Use our webhook callback, or poll the API with your job ID until processing is complete.
4. Download the output video.
   Retrieve the finished talking photo or lip‑synced video from the provided URL.
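When you are not using the webhook, steps 3 and 4 amount to a polling loop. The sketch below is a minimal illustration under stated assumptions: `fetch_status(job_id)` is a hypothetical callable that wraps whatever status request the API actually exposes and returns a dict, and the `state` key with `completed`/`failed` values are placeholders, not documented fields. The default timeout of 140 minutes is the worst-case queue time (120 minutes) plus the approximate Precision Mode processing time (20 minutes) from the usage limitations.

```python
import time
from typing import Callable, Dict

# Placeholder terminal states (hypothetical); replace with the status values the API really returns.
TERMINAL_STATES = {"completed", "failed"}

# Worst-case queue time (120 min) plus approximate Precision Mode processing (20 min).
DEFAULT_TIMEOUT_S = (120 + 20) * 60


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict],
    interval_s: float = 30.0,
    timeout_s: float = DEFAULT_TIMEOUT_S,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is a hypothetical wrapper around the API's status request.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(job_id)
        if status.get("state") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s} s")
        sleep(interval_s)


# Usage with a fake status sequence instead of real API calls.
fake_responses = iter([
    {"state": "queued"},
    {"state": "processing"},
    {"state": "completed", "video_url": "https://example.com/output.mp4"},
])
result = wait_for_job("job-123", lambda _job_id: next(fake_responses), sleep=lambda _s: None)
print(result["video_url"])  # prints https://example.com/output.mp4
```

If you can receive the webhook callback, it avoids polling entirely. The 30-second interval is an arbitrary choice that keeps request volume low relative to processing times measured in minutes.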
Usage limitations:
- You may have up to 5 concurrent jobs (including queued requests).
- Only single‑face videos or photos are supported.
- Estimated queue time: 1–120 minutes, depending on system load.
- Standard Mode processing time: ~10 minutes.
- Precision Mode processing time: ~20 minutes.
If a video or photo contains multiple faces, only the largest detected face will be lip‑synced.
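Because the 5-job limit counts queued jobs too, a client that submits many requests at once can exceed it; this is likely the situation reported by API error 1304. One way to stay under the limit, sketched below, is a thread pool with five workers in which each worker carries one job from submission to completion. `run_one` is a hypothetical callable you would supply (for example, one that submits a request and then polls until the job finishes); the fake job at the end only simulates that work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

# Documented limit: at most 5 concurrent jobs, queued jobs included.
MAX_CONCURRENT_JOBS = 5


def run_jobs(run_one: Callable[[Dict], Dict], requests: Iterable[Dict]) -> List[Dict]:
    """Run every request through run_one with at most MAX_CONCURRENT_JOBS in flight.

    run_one (hypothetical) should submit one job and block until it finishes, so a
    worker slot stays occupied for the job's whole lifetime, including queue time.
    Results are returned in the same order as the requests.
    """
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
        return list(pool.map(run_one, requests))


# Usage with a fake job that records how many jobs run at the same time.
counters = {"active": 0, "peak": 0}
lock = threading.Lock()


def fake_job(request: Dict) -> Dict:
    with lock:
        counters["active"] += 1
        counters["peak"] = max(counters["peak"], counters["active"])
    time.sleep(0.01)  # stands in for submitting the job and waiting for it
    with lock:
        counters["active"] -= 1
    return {"id": request["id"], "state": "completed"}


results = run_jobs(fake_job, [{"id": i} for i in range(12)])
print(counters["peak"], [r["id"] for r in results])
```

Threads suit this workload because each job spends nearly all its time waiting. The cap only covers jobs submitted from this process; jobs started elsewhere with the same API key still count toward the limit, so a 1304 response remains possible.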
API Error Codes
Code | Description |
---|---|
5 | Invalid request parameters. |
7 | You do not have permission to make this request. |
104 | Insufficient credits. |
814 | Your account does not have a membership and is not allowed to call the API. |
1000 | Internal server error. |
1301 | Challenge failed. |
1302 | API key has been revoked. |
1304 | API key has reached the maximum number of concurrent requests. |
1502 | The driving audio is invalid or cannot be downloaded. |
1503 | Your account is not authorized to call the API. |
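A client usually needs to decide, per API error code, whether to give up or try again. The sketch below copies the table into a lookup and treats 1000 (internal server error) and 1304 (concurrency limit reached) as retryable with capped exponential backoff. That split is an assumption drawn from the descriptions, not documented behavior, and since this page does not say where the code appears in a response, the helper functions simply take the code as an integer.

```python
from typing import Dict, Optional

# Descriptions adapted from the API error code table.
API_ERRORS: Dict[int, str] = {
    5: "Invalid request parameters.",
    7: "No permission to make this request.",
    104: "Insufficient credits.",
    814: "Account is not a member and is not allowed to call the API.",
    1000: "Internal server error.",
    1301: "Challenge failed.",
    1302: "API key has been revoked.",
    1304: "API key has reached the maximum number of concurrent requests.",
    1502: "Driving audio is invalid or cannot be downloaded.",
    1503: "Account is not authorized to call the API.",
}

# Assumption: only these errors are transient; the rest need a fix before resubmitting.
RETRYABLE_API_ERRORS = {1000, 1304}


def describe_api_error(code: int) -> str:
    """Return the documented description, or a fallback for unknown codes."""
    return API_ERRORS.get(code, f"Unknown API error code {code}.")


def retry_delay_s(code: int, attempt: int, base_s: float = 5.0, cap_s: float = 300.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None if not retryable.

    Capped exponential backoff: base_s * 2**attempt, never more than cap_s.
    """
    if code not in RETRYABLE_API_ERRORS:
        return None
    return min(cap_s, base_s * 2 ** attempt)


print(describe_api_error(104))  # prints Insufficient credits.
print(retry_delay_s(1304, 3))   # prints 40.0
print(retry_delay_s(104, 0))    # prints None
```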
Job Error Codes
Code | Description |
---|---|
20403 | Not enough faces. |
20407 | Too many face tracks were detected. |
20408 | The image failed face detection for image-to-video generation. |
20601 | No face was detected in the image. |
20602 | Unknown image format. |
20611 | Video generation triggered a rate limit. |
20613 | The input image for video generation was flagged as sensitive content. |
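Judging by their descriptions, most job error codes point at a problem with the input asset rather than with the request itself. The sketch below groups them into two hypothetical follow-up actions: fix the input and resubmit, or resubmit later for 20611, which reads as a rate limit. Both the grouping and the `JobErrorAction` names are an interpretation of the table, not behavior the API documents.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class JobErrorAction(Enum):
    """Hypothetical follow-up actions; the API does not define these."""
    FIX_INPUT = "fix the input asset and resubmit"
    RETRY_LATER = "resubmit the same job later"


# Code -> (short description from the job error table, assumed action).
JOB_ERRORS: Dict[int, Tuple[str, JobErrorAction]] = {
    20403: ("Not enough faces.", JobErrorAction.FIX_INPUT),
    20407: ("Too many face tracks.", JobErrorAction.FIX_INPUT),
    20408: ("Image failed face detection for image-to-video.", JobErrorAction.FIX_INPUT),
    20601: ("No face in the image.", JobErrorAction.FIX_INPUT),
    20602: ("Unknown image format.", JobErrorAction.FIX_INPUT),
    20611: ("Rate limit triggered.", JobErrorAction.RETRY_LATER),
    20613: ("Input image flagged as sensitive.", JobErrorAction.FIX_INPUT),
}


def triage_job_error(code: int) -> Tuple[str, Optional[JobErrorAction]]:
    """Return (description, action); action is None for codes not in the table."""
    if code in JOB_ERRORS:
        return JOB_ERRORS[code]
    return (f"Unknown job error code {code}.", None)


description, action = triage_job_error(20611)
print(f"{description} Next step: {action.value}")
# prints Rate limit triggered. Next step: resubmit the same job later
```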