API Usage Notes
This doc provides details about using the APIs, including what's currently supported, limitations and workarounds, and the current usage limits.
Limitations and workarounds
These are some known limitations of this API and their workarounds:
- Gesture mismatch: Output videos may occasionally feature gesture mismatches.
- TTS voice modulation: The output may have significant modulation in pitch or voice. Regenerating the audio often resolves this issue.
- Limited voice controls: Voice controls such as emphasis, speed, or pitch modulation are not currently supported.
- Mispronunciation: The audio output might mispronounce certain uncommon words or proper nouns. This can be addressed by using phonetic spellings.
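Phonetic respellings can be applied automatically before a transcript is submitted. This is a minimal sketch that assumes a hypothetical PHONETIC_RESPELLINGS mapping you maintain yourself; the entries shown are illustrative placeholders, not spellings validated against this API's voices.

```python
import re

# Hypothetical respelling table; these entries are placeholders, not values
# published for this API. Tune them by listening to the generated audio.
PHONETIC_RESPELLINGS = {
    "quinoa": "keen-wah",
    "Yosemite": "yo-SEM-ih-tee",
}

def apply_phonetic_spellings(transcript: str, respellings: dict[str, str]) -> str:
    """Replace whole-word occurrences of each key with its phonetic spelling."""
    for word, spelling in respellings.items():
        # \b boundaries stop "quinoa" from matching inside "quinoas".
        pattern = r"\b" + re.escape(word) + r"\b"
        # A function replacement avoids backslash escapes in the spelling being interpreted.
        transcript = re.sub(pattern, lambda _m, s=spelling: s, transcript)
    return transcript
```

Replacements run in dictionary order, so a respelling that itself contains another key would be rewritten again; keeping entries independent avoids that.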
Request limits
To ensure everyone enjoys peak performance with these APIs, Adobe sets limits on the volume, frequency, and concurrency of API calls. Adobe monitors your API usage and will contact you proactively to resolve any risks to API performance.
These are the current rate limits for API requests. Be aware that these limits apply to your entire organization.
- Avatar API: 1 request per minute and 150 requests per day. Each request corresponds to one generation.
- TTS API: 1 request per minute.
- Get Result API: 100 requests per minute.
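A client can also pace itself so that it rarely reaches these limits. The following is a minimal sketch of a sliding-window limiter, assuming one process sends the requests it tracks; because the limits are organization-wide, requests from other clients are invisible to it and a 429 remains possible. The class name ClientRateLimiter is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Optional

MINUTE = 60.0
DAY = 24 * 60 * 60.0

class ClientRateLimiter:
    """Sliding-window guard for a per-minute cap and an optional per-day cap.

    It only counts requests recorded on this object. The published limits
    apply to the whole organization, so this reduces 429s but cannot rule
    them out.
    """

    def __init__(self, per_minute: int, per_day: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.sent = deque()  # send timestamps, oldest first

    def seconds_until_allowed(self) -> float:
        """Return how long to wait before the next request fits both windows."""
        now = self.clock()
        # Timestamps older than a day no longer count toward either window.
        while self.sent and now - self.sent[0] >= DAY:
            self.sent.popleft()
        waits = [0.0]
        recent = [t for t in self.sent if now - t < MINUTE]
        if len(recent) >= self.per_minute:
            # Wait until enough recent requests age out of the one-minute window.
            waits.append(recent[-self.per_minute] + MINUTE - now)
        if self.per_day is not None and len(self.sent) >= self.per_day:
            waits.append(self.sent[-self.per_day] + DAY - now)
        return max(waits)

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

For the Avatar API you might create ClientRateLimiter(per_minute=1, per_day=150), sleep for seconds_until_allowed() before each call, and call record() right after sending.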
You may encounter an HTTP 429 "Too Many Requests" error if usage exceeds either the per-minute or per-day limits. We recommend using the retry-after header to determine the number of seconds you should wait before trying again.
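The retry-after guidance can be wrapped around any HTTP client. This sketch assumes a hypothetical send callable that performs one request and returns a (status, headers, body) tuple; it sleeps for the retry-after value after each 429 and gives up after max_attempts. Since RFC 9110 also allows Retry-After to carry an HTTP-date, that form is handled too, with a 60-second fallback (an arbitrary choice) when the header is missing or unparseable.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]

def retry_after_seconds(value: Optional[str], default: float = 60.0) -> float:
    """Interpret a Retry-After header value as a delay in seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        # RFC 9110 also allows an HTTP-date instead of a number of seconds.
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())

def send_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), sleeping and retrying while it returns HTTP 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_attempts:
            return status, headers, body
        # Header names are case-insensitive, so look the value up in lower case.
        lowered = {name.lower(): v for name, v in headers.items()}
        sleep(retry_after_seconds(lowered.get("retry-after")))
        attempt += 1
```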
Language support
Audio and video generation is supported for the following languages:
- English (en-US)
- Spanish (es-ES)
- German (de-DE)
- French (fr-FR)
- Portuguese (pt-PT)
- Italian (it-IT)
Set the localeCode parameter to get results in the desired language and accent.
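Because only the locales above are supported, it can help to reject other values before a request is sent. In this sketch only the localeCode name comes from these notes; the text field and the overall body shape are hypothetical placeholders, so check the API reference for the real request schema.

```python
# Locale codes listed in these notes.
SUPPORTED_LOCALES = ("en-US", "es-ES", "de-DE", "fr-FR", "pt-PT", "it-IT")

def build_request_body(text: str, locale_code: str) -> dict:
    """Build a request body with a validated localeCode.

    Only "localeCode" is named in these notes; "text" is a hypothetical field name.
    """
    if locale_code not in SUPPORTED_LOCALES:
        raise ValueError(
            f"unsupported localeCode {locale_code!r}; expected one of {', '.join(SUPPORTED_LOCALES)}"
        )
    return {"localeCode": locale_code, "text": text}
```

Note that pt-BR (Brazilian Portuguese) is not in the list; only pt-PT is.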
Input text specifications
Transcript length: Up to 7500 characters.
Input Medium: Direct text, or a .txt file via a pre-signed URL.
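Scripts longer than 7500 characters have to be shortened or split before submission. This sketch greedily packs words into chunks under the limit. It assumes characters are counted as Python string length (Unicode code points), which these notes do not confirm, and it collapses all whitespace, including line breaks, to single spaces.

```python
MAX_TRANSCRIPT_CHARS = 7500

def split_transcript(text: str, limit: int = MAX_TRANSCRIPT_CHARS) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word exceeds the character limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The next word does not fit: close this chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would be a separate request, so splitting a long script also consumes more of the per-minute and per-day quota.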
Input audio specifications (for Avatar API)
Duration (max): 30 mins.
CODEC: MPEG, PCM.
Formats/container: audio/mp3, audio/mpeg, audio/x-wav, audio/wav, audio/vnd.dlna.adts, audio/aac.
Input Medium: Pre-signed URL.
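A quick local check of the audio before uploading it can catch unsupported files early. This sketch compares a MIME type against the formats listed above and, for uncompressed PCM WAV files only, computes the duration with Python's standard wave module; MP3 and AAC durations need a separate decoder or probe tool, which is not shown.

```python
import io
import wave

ALLOWED_AUDIO_TYPES = {
    "audio/mp3", "audio/mpeg", "audio/x-wav",
    "audio/wav", "audio/vnd.dlna.adts", "audio/aac",
}
MAX_AUDIO_SECONDS = 30 * 60

def is_allowed_audio_type(content_type: str) -> bool:
    """Check a MIME type, ignoring parameters such as '; codecs=...' and case."""
    return content_type.split(";", 1)[0].strip().lower() in ALLOWED_AUDIO_TYPES

def wav_duration_seconds(data: bytes) -> float:
    """Duration of an uncompressed PCM WAV file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def wav_is_within_limit(data: bytes) -> bool:
    """True if the WAV file is no longer than the 30-minute maximum."""
    return wav_duration_seconds(data) <= MAX_AUDIO_SECONDS
```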
Background video specifications (for Avatar API)
Duration (max): 30 mins.
FPS: 24, 25, 29.97, 30, 50, 59.94, or 60 fps.
Resolution (max): Full HD.
Aspect Ratio: 16:9 (1,920 × 1,080 px).
CODEC: H.264.
Formats/container: video/mp4, video/mov.
Input Medium: Pre-signed URL.
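The video requirements can be checked in one place once you have the file's metadata. This sketch assumes a hypothetical VideoInfo record filled in by a probe tool of your choice; it treats Full HD as a landscape 1920 × 1080 ceiling and reads the aspect-ratio line as 16:9, both of which are interpretations of the notes above.

```python
from dataclasses import dataclass

ALLOWED_VIDEO_TYPES = {"video/mp4", "video/mov"}
ALLOWED_FPS = (24.0, 25.0, 29.97, 30.0, 50.0, 59.94, 60.0)
MAX_WIDTH, MAX_HEIGHT = 1920, 1080
MAX_VIDEO_SECONDS = 30 * 60

@dataclass
class VideoInfo:
    """Hypothetical metadata record, filled in from a probe tool of your choice."""
    mime_type: str
    codec: str
    width: int
    height: int
    fps: float
    duration_seconds: float

def background_video_problems(info: VideoInfo) -> list[str]:
    """Return human-readable reasons the video falls outside the listed specs."""
    problems = []
    if info.mime_type.lower() not in ALLOWED_VIDEO_TYPES:
        problems.append(f"container {info.mime_type} is not mp4 or mov")
    if info.codec.lower().replace(".", "") != "h264":
        problems.append(f"codec {info.codec} is not H.264")
    # 29.97 and 59.94 are rounded from 30000/1001 and 60000/1001, so allow a small tolerance.
    if not any(abs(info.fps - allowed) < 0.01 for allowed in ALLOWED_FPS):
        problems.append(f"frame rate {info.fps:.3f} is not one of {ALLOWED_FPS}")
    if info.width > MAX_WIDTH or info.height > MAX_HEIGHT:
        problems.append(f"{info.width}x{info.height} exceeds Full HD")
    if info.width * 9 != info.height * 16:
        problems.append(f"{info.width}x{info.height} is not 16:9")
    if info.duration_seconds > MAX_VIDEO_SECONDS:
        problems.append("longer than 30 minutes")
    return problems
```

The tolerance matters in practice because probe tools often report NTSC rates as exact fractions rather than the rounded 29.97 and 59.94.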
Background image specifications (for Avatar API)
Formats: JPEG, PNG.
Input Medium: Pre-signed URL.
Aspect Ratio: 16:9 (1,920 × 1,080 px).
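For images, the file signature is a more reliable guide to the format than the file extension. This sketch identifies PNG and JPEG by their standard magic bytes and reads PNG dimensions from the IHDR chunk as laid out in the PNG specification; JPEG dimensions require parsing SOF markers and are not covered here.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker followed by the next marker byte

def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "PNG" or "JPEG" based on leading bytes, or None if neither."""
    if data.startswith(PNG_SIGNATURE):
        return "PNG"
    if data.startswith(JPEG_SOI):
        return "JPEG"
    return None

def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Read (width, height) from the IHDR chunk that must follow the PNG signature."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then big-endian width and height.
    if not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    return struct.unpack(">II", data[16:24])
```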
API render time
Avatar API: 10X the output video length.
TTS API: 2X the output audio length.
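These multipliers give a rough starting point for when to poll the Get Result API. The sketch assumes you can estimate the output length in advance (for example, from the input audio's duration) and uses an arbitrary 15-second interval between polls; at that interval each job uses 4 of the organization's 100 Get Result requests per minute.

```python
# Multipliers from the render-time notes: Avatar about 10x, TTS about 2x the output length.
RENDER_MULTIPLIER = {"avatar": 10.0, "tts": 2.0}

def estimated_render_seconds(api: str, output_seconds: float) -> float:
    """Rough render time for the given API and expected output length."""
    return RENDER_MULTIPLIER[api] * output_seconds

def poll_schedule(api: str, output_seconds: float, interval: float = 15.0,
                  max_polls: int = 20) -> list[float]:
    """Seconds after submission at which to call the Get Result API."""
    first = estimated_render_seconds(api, output_seconds)
    return [first + i * interval for i in range(max_polls)]
```

Starting at the full estimate, rather than polling immediately, avoids spending the shared Get Result quota on jobs that are unlikely to be finished yet.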