Audio/Video APIs
Audio/Video APIs offer automated audio/video content production at scale with AI.
Overview
Audio/Video APIs are a collection of resources that leverage Firefly Services' AI to create and customize audio and video content.
Explore our APIs
Generate spoken audio from a provided transcript.
Automatically reframe videos.
Create transcriptions and accurate video dubs.
Text-to-Speech API
The Text-to-Speech (TTS) API generates lifelike spoken audio from a provided transcript. Features include:
- Choose a voice from Firefly's voice catalog.
- Turn prompts into spoken audio.
- Generate speech in a variety of languages and accents.
Reframe API
The Reframe API analyzes video content and dynamically adjusts frame composition to fit the aspect ratios you specify. Each reframed output is built from the characteristics of the existing video.
Reframe your Videos with AI
This API uses technology similar to the Auto Reframe feature currently available in Premiere Pro software. It can be integrated with third-party systems and workflows, subject to applicable terms and conditions. Performance and results may vary based on input parameters and system configurations.
All content in the generated reframed output is derived solely from the original source video.
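Because the reframed output is derived only from the source frames, the geometric core of reframing is choosing a crop window with the target aspect ratio. The sketch below illustrates that geometry only; it is not the Reframe API's algorithm. The function name `crop_window` and the `focus_x`/`focus_y` parameters (a normalized point of interest that a content analysis step would presumably supply) are assumptions made for illustration.

```python
def crop_window(src_w, src_h, ratio_w, ratio_h, focus_x=0.5, focus_y=0.5):
    """Return (x, y, w, h) of the largest crop with aspect ratio_w:ratio_h
    that fits inside a src_w x src_h frame, centered as close as possible
    to the normalized focus point (hypothetical parameter, 0.0 to 1.0)."""
    if min(src_w, src_h, ratio_w, ratio_h) <= 0:
        raise ValueError("dimensions and ratio terms must be positive")

    # Compare target and source ratios using integer cross-multiplication.
    if ratio_w * src_h < src_w * ratio_h:
        # Target is narrower than the source: keep full height, trim width.
        h = src_h
        w = src_h * ratio_w // ratio_h
    else:
        # Target is wider than (or equal to) the source: keep full width.
        w = src_w
        h = src_w * ratio_h // ratio_w

    # Center the window on the focus point, then clamp it inside the frame.
    x = int(focus_x * src_w - w / 2)
    y = int(focus_y * src_h - h / 2)
    x = max(0, min(x, src_w - w))
    y = max(0, min(y, src_h - h))
    return x, y, w, h


if __name__ == "__main__":
    # A 1920x1080 landscape source reframed to vertical 9:16.
    print(crop_window(1920, 1080, 9, 16))  # (656, 0, 607, 1080)
```

Moving the focus point shifts the window until it reaches the frame edge, which is why a subject near the border can still end up off-center in the output.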
Reframe features include:
- Generate Video Variations: The API accepts video input, processes it, and delivers output with specific aspect ratios (including but not limited to 4:3, 9:16, and 1:1) via downloadable links.
- Analyze Scenes: Enable scene edit detection to analyze video transitions and use the existing video characteristics to maintain compositional integrity across different aspect ratio outputs.
- Track Status: Check a job's progress using a designated endpoint. Response times and update frequencies are subject to system load and configuration.
- Add Overlays: Apply pre-generated graphic overlays, such as GIFs or PNGs, over videos with precise control over timing, positioning, scaling, and looping behavior. Customization ensures that overlays align across different aspect ratios and remain consistent with the visual layout.
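Status tracking for an asynchronous job like this is usually done by polling. The following is a minimal sketch of a polling loop with exponential backoff, not the API's actual client. The `fetch_status` callable, the `"status"` field, and the terminal values `"succeeded"` and `"failed"` are all hypothetical; consult the API reference for the real endpoint and response shape.

```python
import time

# Hypothetical terminal states; the real API may use different values.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_job(fetch_status, *, interval=2.0, max_interval=30.0,
                 timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state.

    fetch_status is any zero-argument callable returning a dict with a
    "status" key (assumed shape). The wait between calls doubles after
    each attempt, capped at max_interval, to stay gentle under load.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        # Give up if the next wait would run past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_interval)


if __name__ == "__main__":
    # Simulated responses stand in for real HTTP calls.
    responses = iter([{"status": "running"}, {"status": "succeeded"}])
    print(wait_for_job(lambda: next(responses), interval=0.01))
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. In a real integration, `fetch_status` would wrap an authenticated HTTP request to the status endpoint.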
Translate and Lip Sync API
The Translate and Lip Sync (TLS) API uses transcriptions to generate audio and video with accurate dubbing and composited lip sync, including for content with multiple speakers.
Supported workflows include:
- Transcribe audio and video.
- Generate captions for audio and video.
- Dub audio and video automatically.
- Dub with edited transcripts.
- Dub with pre-existing translations.
Lip Sync is also included as a parameter of the Dub API to create high-quality composited videos with precise lip-syncing. Content Authenticity Initiative (CAI) support helps protect against deepfakes.
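To make the captioning workflow listed above concrete, here is a minimal sketch that converts timed transcript segments into SubRip (SRT) caption text. The segment shape (a dict with `start` and `end` in seconds plus `text`) is an assumption for illustration and is not the API's documented transcript format.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from segments shaped like
    {"start": float, "end": float, "text": str} (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}")
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n" if blocks else ""


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": "Hello."},
        {"start": 2.5, "end": 4.0, "text": "Welcome back."},
    ]
    print(to_srt(demo))
```

An edited transcript could be passed through the same function, since only the `text` field changes while the timings are preserved.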