Generate speech audio from text.
Generates audio from the input text using text-to-speech models. Supports multiple voices and output formats including mp3, opus, aac, flac, wav, and pcm.
Returns streaming audio data that can be saved to a file or streamed directly to users.
API key authentication using a Bearer token.
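As a minimal sketch of Bearer token authentication, the snippet below builds (but does not send) a POST request that carries the API key in the Authorization header. The endpoint URL is a placeholder and the helper name build_request is hypothetical; neither appears in this reference, so substitute the URL your provider documents.

```python
import urllib.request

# Placeholder endpoint (an assumption, not taken from this reference).
SPEECH_URL = "https://api.example.com/v1/audio/speech"


def build_request(api_key: str, body: bytes) -> urllib.request.Request:
    """Build a POST request that sends the API key as a Bearer token."""
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send the request; keeping construction separate makes the header easy to inspect before any network call.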
Request to generate audio from text.
One of the available TTS models: openai/tts-1, openai/tts-1-hd or openai/gpt-4o-mini-tts.
"openai/tts-1"
"openai/tts-1-hd"
The text to generate audio for. The maximum length is 4096 characters.
"Hello, how are you today?"
The voice to use when generating the audio. Supported voices are alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, and verse. Previews of the voices are available in the Text to speech guide.
alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
"alloy"
"nova"
Control the voice of your generated audio with additional instructions. Not supported by openai/tts-1 or openai/tts-1-hd.
The format to return the audio in. Supported formats are mp3, opus, aac, flac, wav, and pcm.
mp3, opus, aac, flac, wav, pcm
"mp3"
The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.
0.25 <= x <= 4
1
The format to stream the audio in. Supported formats are sse and audio. sse is not supported for tts-1 or tts-1-hd.
sse, audio
"sse"
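The constraints listed for the request fields can be checked on the client before sending anything. The sketch below is one way to do that. The JSON field names (model, input, voice, response_format, speed, instructions, stream_format) and the "mp3" default are assumptions, since this reference does not spell them out, and build_speech_payload is a hypothetical helper.

```python
MODELS = {"openai/tts-1", "openai/tts-1-hd", "openai/gpt-4o-mini-tts"}
# Models that, per the field descriptions, accept neither instructions nor sse.
LEGACY_MODELS = {"openai/tts-1", "openai/tts-1-hd"}
VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
          "onyx", "nova", "sage", "shimmer", "verse"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
STREAM_FORMATS = {"sse", "audio"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(model, text, voice, response_format="mp3",
                         speed=1.0, instructions=None, stream_format=None):
    """Validate the documented constraints and return a request body dict.

    Field names are assumed, not confirmed by this reference.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds 4096 characters")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if stream_format is not None and stream_format not in STREAM_FORMATS:
        raise ValueError(f"unknown stream format: {stream_format}")
    if model in LEGACY_MODELS:
        if instructions is not None:
            raise ValueError("instructions are not supported by this model")
        if stream_format == "sse":
            raise ValueError("sse is not supported by this model")

    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format, "speed": speed}
    # Optional fields are included only when the caller sets them.
    if instructions is not None:
        payload["instructions"] = instructions
    if stream_format is not None:
        payload["stream_format"] = stream_format
    return payload
```

For example, build_speech_payload("openai/tts-1", "Hello, how are you today?", "nova") returns a dict ready for json.dumps, while passing speed=5.0 raises ValueError before any request is made.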
Audio file stream
The response is of type file.
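Because the response is a stream of audio bytes, it can be written to disk in chunks rather than held in memory. The following is a rough sketch under that assumption; save_audio_stream is a hypothetical helper that accepts any binary file-like object, such as the response object returned by urllib.request.urlopen.

```python
from typing import BinaryIO


def save_audio_stream(stream: BinaryIO, path: str, chunk_size: int = 8192) -> int:
    """Copy a binary audio stream to a file in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # An empty read signals the end of the stream.
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

The same loop can forward each chunk to a client instead of a file when the audio is streamed directly to users.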