How it works

1. Host your assets at a publicly accessible URL
   Upload your video, photo, and audio files so our servers can retrieve them.

2. Send an API request with the appropriate parameters
   Reference your hosted assets and specify your desired mode (Standard or Precision).

3. Wait or query status
   Use our webhook callback or poll the API with your job ID until processing is complete.

4. Download video output
   Retrieve the finished talking photo or lip‑synced video from the provided URL.
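
The steps above can be sketched in Python as follows. This is a minimal illustration, not the service's actual client: the field names (`source_url`, `audio_url`, `mode`), the status strings (`"completed"`, `"failed"`), the `output_url` and `error_code` keys, and the `fetch_status` callable are all hypothetical placeholders, so check the API reference for the real endpoint and parameter names. The status fetcher is passed in as a function so that the polling logic can be exercised without network access.

```python
import time
from typing import Any, Callable, Dict

VALID_MODES = ("standard", "precision")  # assumed spellings of the two modes


def build_job_payload(source_url: str, audio_url: str, mode: str = "standard") -> Dict[str, str]:
    """Assemble a request body that references hosted assets (hypothetical field names)."""
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    for url in (source_url, audio_url):
        # The service has to download the assets, so local paths are rejected early.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"asset must be a publicly accessible URL: {url!r}")
    return {"source_url": source_url, "audio_url": audio_url, "mode": mode}


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 30.0,
    timeout_s: float = 3 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports completion and return the output URL.

    fetch_status(job_id) is assumed to return a dict with a "status" key and,
    once the job has finished, an "output_url" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job["status"] == "completed":
            return job["output_url"]
        if job["status"] == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error_code')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s} s")
        sleep(interval_s)
```

In a real integration, `fetch_status` would wrap the HTTP call to the status endpoint. If a webhook callback is configured, the polling loop may not be needed at all.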

Usage Limitations:

  • You may have up to 5 concurrent jobs (including queued requests).
  • Only single‑face videos or photos are supported.
  • Estimated queue time: 1–120 minutes, depending on system load.
  • Standard Mode processing time: ~10 minutes.
  • Precision Mode processing time: ~20 minutes.

If a video or photo contains multiple faces, only the largest detected face will be lip‑synced.
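
To turn the figures above into a polling timeout, add the queue estimate to the per-mode processing time. The sketch below does that arithmetic and also tracks in-flight jobs against the limit of 5 on the client side. The names `estimated_total_minutes` and `JobSlots` are illustrative, and the processing times are the approximate values quoted above, not guarantees.

```python
from typing import Set, Tuple

MAX_CONCURRENT_JOBS = 5                                   # includes queued requests
QUEUE_MINUTES = (1, 120)                                  # estimated queue time range
PROCESSING_MINUTES = {"standard": 10, "precision": 20}    # approximate values


def estimated_total_minutes(mode: str) -> Tuple[int, int]:
    """Return (best_case, worst_case) minutes from submission to completion."""
    processing = PROCESSING_MINUTES[mode.lower()]
    return (QUEUE_MINUTES[0] + processing, QUEUE_MINUTES[1] + processing)


class JobSlots:
    """Client-side guard that refuses to submit beyond the concurrency limit."""

    def __init__(self, limit: int = MAX_CONCURRENT_JOBS) -> None:
        self.limit = limit
        self.active: Set[str] = set()

    def acquire(self, job_id: str) -> bool:
        """Record a new job; return False if all slots are taken."""
        if len(self.active) >= self.limit:
            return False
        self.active.add(job_id)
        return True

    def release(self, job_id: str) -> None:
        """Free the slot once the job has completed or failed."""
        self.active.discard(job_id)
```

For example, `estimated_total_minutes("precision")` returns `(21, 140)`, so a Precision Mode job may take well over two hours under heavy load.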

API Error Codes

Code    Description
5       Invalid request parameters.
7       No permission to request.
104     Insufficient credits.
814     Your account is not a member and is not allowed to call the API.
1000    Internal Server Error.
1301    Challenge failed.
1302    API key has been revoked.
1304    API key has reached the maximum number of concurrent requests.
1502    Your audio driver is either invalid or cannot be downloaded.
1503    Your account is not authorized to call the API.
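
A client can keep this table as a lookup so that failures surface with a readable message. The sketch below is one way to do that, assuming the API response exposes the numeric code. The `ApiError` class is hypothetical, and the choice of which codes count as retryable (1000 and 1304) is an assumption based on their descriptions, not something the service documents.

```python
API_ERRORS = {
    5: "Invalid request parameters.",
    7: "No permission to request.",
    104: "Insufficient credits.",
    814: "Your account is not a member and is not allowed to call the API.",
    1000: "Internal Server Error.",
    1301: "Challenge failed.",
    1302: "API key has been revoked.",
    1304: "API key has reached the maximum number of concurrent requests.",
    1502: "Your audio driver is either invalid or cannot be downloaded.",
    1503: "Your account is not authorized to call the API.",
}

# Assumption: these two look transient, so a client might retry them after a delay.
RETRYABLE_CODES = {1000, 1304}


class ApiError(Exception):
    """Exception carrying the numeric API error code and a retry hint."""

    def __init__(self, code: int) -> None:
        self.code = code
        self.retryable = code in RETRYABLE_CODES
        description = API_ERRORS.get(code, "Unknown error code.")
        super().__init__(f"[{code}] {description}")
```

A caller might catch `ApiError`, back off and retry when `retryable` is true, and report the message to the user otherwise.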

Job Error Codes

Code     Description
20403    Not enough faces.
20407    Too many face tracks.
20408    The image did not pass facial detection for image-to-video.
20601    There are no faces in the picture.
20602    Unknown image format.
20611    Video generation triggered the rate limit.
20613    The input image for video generation contains sensitive content.
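
Because several job error codes point to the same kind of problem, it can help to group them before deciding what to tell the user. The grouping below is an interpretation of the descriptions in the table, not an official taxonomy. In particular, reading 20611 as a rate limit and 20613 as a content moderation rejection is an assumption, and `JobErrorKind` and `classify_job_error` are hypothetical names.

```python
from enum import Enum


class JobErrorKind(Enum):
    FACE = "face detection"
    FORMAT = "input format"
    RATE_LIMIT = "rate limit"
    CONTENT = "content moderation"
    UNKNOWN = "unknown"


# Assumed grouping, derived from the wording of each description.
JOB_ERROR_KINDS = {
    20403: JobErrorKind.FACE,
    20407: JobErrorKind.FACE,
    20408: JobErrorKind.FACE,
    20601: JobErrorKind.FACE,
    20602: JobErrorKind.FORMAT,
    20611: JobErrorKind.RATE_LIMIT,
    20613: JobErrorKind.CONTENT,
}


def classify_job_error(code: int) -> JobErrorKind:
    """Map a job error code to a coarse category; unlisted codes are UNKNOWN."""
    return JOB_ERROR_KINDS.get(code, JobErrorKind.UNKNOWN)
```

With this in place, a face-related failure can prompt the user to supply a clearer single-face input, while a rate-limit failure can simply be resubmitted later.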