Avatar and TTS (Text-to-Speech) API (beta)

Text-to-Speech and avatar resources are now available in private beta.

Overview

Avatar and Text-to-Speech (TTS) are technologies for creating digital clones of real people, which can then be used to produce lifelike speaking videos or audio from a transcript. These resources reduce the time and cost of professional content production.

These APIs offer automated video and audio creation at scale:

The Avatar API lets you create a video of an avatar speaking from a provided transcript. You may provide the input as an audio file or a text file.
The Text-to-Speech (TTS) API lets you generate lifelike spoken audio from a provided transcript.

Start exploring these APIs to see what they can do.

Last updated 1/29/2025
