How it works

1. Host your assets at a publicly accessible URL

   Upload your video or photo and your audio file to a location our servers can retrieve them from.

2. Send an API request with the appropriate parameters

   Reference your hosted assets and specify your desired mode (Standard or Precision).

3. Wait for the callback or query the job status

   Either wait for our webhook callback or poll the API with your job ID until processing is complete.

4. Download the output video

   Retrieve the finished talking photo or lip‑synced video from the provided URL.
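The polling variant of these steps can be sketched as follows. This is a minimal illustration, not the official client: the field names (`media_url`, `audio_url`, `mode`, `webhook_url`), the status values (`processing`, `completed`, `failed`), and the response keys (`status`, `output_url`, `error`) are all assumptions, and the HTTP call is left to a `fetch_status` function that you supply.

```python
import time
from typing import Callable, Dict, Optional

VALID_MODES = {"standard", "precision"}  # the two modes named in step 2


def build_payload(media_url: str, audio_url: str, mode: str = "standard",
                  webhook_url: Optional[str] = None) -> Dict[str, str]:
    """Assemble a request body. All field names here are hypothetical."""
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    for url in (media_url, audio_url):
        # Step 1: assets must be reachable over the public internet.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"asset must be a publicly accessible URL: {url!r}")
    payload = {"media_url": media_url, "audio_url": audio_url, "mode": mode}
    if webhook_url:
        payload["webhook_url"] = webhook_url
    return payload


def wait_for_job(fetch_status: Callable[[str], Dict[str, str]], job_id: str,
                 interval_s: float = 30.0, timeout_s: float = 3 * 60 * 60,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job finishes, then return the output URL (steps 3 and 4).

    fetch_status is caller-supplied and wraps the real status request;
    the status strings checked below are assumed, not documented values.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job["status"] == "completed":
            return job["output_url"]
        if job["status"] == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error', 'unknown')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s} s")
        sleep(interval_s)
```

The default timeout of three hours is a guess that leaves headroom for a long queue followed by processing. Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.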

Usage Limitations:

  • You may have up to 5 concurrent jobs (including queued requests).
  • Only single‑face videos or photos are supported.
  • Estimated queue time: 1–120 minutes, depending on system load.
  • Standard Mode processing time: ~10 minutes.
  • Precision Mode processing time: ~20 minutes.

If a video or photo does contain multiple faces, only the largest detected face is lip‑synced.
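Because queued requests count toward the limit of 5 concurrent jobs, it can help to track open jobs on the client and avoid submitting a sixth. The sketch below shows one way to do that with a semaphore; the `JobSlots` class is a hypothetical helper, not part of the API.

```python
import threading

MAX_CONCURRENT_JOBS = 5  # from the usage limits; queued requests count too


class JobSlots:
    """Client-side counter for open jobs (hypothetical helper)."""

    def __init__(self, limit: int = MAX_CONCURRENT_JOBS) -> None:
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        """Reserve a slot before submitting; returns False when all are taken."""
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        """Free a slot once a job has completed or failed."""
        self._sem.release()
```

Call `try_acquire()` before sending a request and `release()` when the job reaches a final state. If `try_acquire()` returns `False`, hold the request locally until a slot frees up.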

Error Codes

Code    Description
5       Invalid request parameters.
104     Insufficient credits.
1304    API key has reached the maximum number of concurrent requests.
1301    Face recognition failed. Ensure a single identifiable real face is in the image.
1302    API key has been revoked.
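A client might map these codes to messages and decide which ones are worth retrying. The sketch below takes the codes and descriptions from the table above. Treating only 1304 as retryable is an assumption on our part: it is the one error that can clear on its own as running jobs finish, while the others need a change to the request, the account, or the API key.

```python
ERROR_CODES = {
    5: "Invalid request parameters.",
    104: "Insufficient credits.",
    1304: "API key has reached the maximum number of concurrent requests.",
    1301: "Face recognition failed. Ensure a single identifiable real face is in the image.",
    1302: "API key has been revoked.",
}

# Assumption: only the concurrency limit is transient.
RETRYABLE_CODES = {1304}


def describe_error(code: int) -> str:
    """Return the documented description, or a generic fallback."""
    return ERROR_CODES.get(code, f"Unknown error code {code}.")


def should_retry(code: int) -> bool:
    """True when resubmitting later, unchanged, may succeed."""
    return code in RETRYABLE_CODES
```

For example, `should_retry(1304)` is `True`, so the request can be resubmitted after a job completes, whereas `should_retry(104)` is `False` because the account needs more credits first.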