POST /v1/audio/speech
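TypeScript
import Dedalus from 'dedalus-labs'; // package name assumed; check the SDK's install instructions
const client = new Dedalus();

const result = await client.audio.speech.create({ model: 'openai/tts-1', input: 'Hello, this is a test of text to speech.', voice: 'alloy' });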

Overview

Generate audio from text using text-to-speech (TTS) models. This endpoint currently supports only OpenAI's TTS models, which offer multiple voice options.

Usage Examples

curl -X POST https://api.dedaluslabs.ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/tts-1",
    "input": "Hello, this is a test of text to speech.",
    "voice": "alloy"
  }' \
  --output speech.mp3
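For comparison, the same request can be sent from TypeScript with the standard fetch API instead of the SDK. The sketch below only builds the arguments for fetch, so it can be inspected without network access. The helper name buildSpeechRequest and the FetchArgs type are hypothetical; the URL, headers, and body are copied from the curl example.

```typescript
// Endpoint copied from the curl example; helper and type names are hypothetical.
const SPEECH_URL = "https://api.dedaluslabs.ai/v1/audio/speech";

interface SpeechBody {
  model: string;
  input: string;
  voice: string | { id: string };
}

interface FetchArgs {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the arguments for fetch(url, init), mirroring the curl flags:
// -X POST, the two -H headers, and the JSON body passed with -d.
function buildSpeechRequest(apiKey: string, body: SpeechBody): FetchArgs {
  return {
    url: SPEECH_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

const { url, init } = buildSpeechRequest("YOUR_API_KEY", {
  model: "openai/tts-1",
  input: "Hello, this is a test of text to speech.",
  voice: "alloy",
});
```

Inside an async function, `const res = await fetch(url, init);` sends the request. The curl flag `--output speech.mp3` corresponds to writing the bytes from `await res.arrayBuffer()` to a file.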

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Schema for SpeechRequest.

Fields:

  • model (required): str | Literal["tts-1", "tts-1-hd", "gpt-4o-mini-tts", "gpt-4o-mini-tts-2025-12-15"]
  • input (required): Annotated[str, StringConstraints(max_length=4096)]
  • instructions (optional): Annotated[str, StringConstraints(max_length=4096)]
  • voice (required): VoiceIdsOrCustomVoice
  • response_format (optional): Literal["mp3", "opus", "aac", "flac", "wav", "pcm"]
  • speed (optional): float
  • stream_format (optional): Literal["sse", "audio"]
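The Python-style annotations above map naturally onto a TypeScript type. The following is a hand-written sketch of the documented fields, not the SDK's own generated types, whose names may differ. VoiceIdsOrCustomVoice is modeled here as a built-in voice name or an { id } object, based on the voice description. The 4096-character limits and the speed range appear only as comments because the type system cannot enforce them.

```typescript
// Hand-written mirror of the documented SpeechRequest fields; not the SDK's generated types.
type SpeechModel = "tts-1" | "tts-1-hd" | "gpt-4o-mini-tts" | "gpt-4o-mini-tts-2025-12-15";

type BuiltInVoice =
  | "alloy" | "ash" | "ballad" | "coral" | "echo" | "fable" | "onyx"
  | "nova" | "sage" | "shimmer" | "verse" | "marin" | "cedar";

interface SpeechRequest {
  // str | Literal[...]: the literals get editor autocompletion, any other string is still allowed.
  model: SpeechModel | (string & {});
  input: string; // at most 4096 characters (not enforced by the type)
  instructions?: string; // at most 4096 characters; not for tts-1 / tts-1-hd
  voice: BuiltInVoice | { id: string }; // stands in for VoiceIdsOrCustomVoice
  response_format?: "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm"; // default "mp3"
  speed?: number; // 0.25 to 4, default 1
  stream_format?: "sse" | "audio"; // default "audio"; "sse" not for tts-1 / tts-1-hd
}

const example: SpeechRequest = {
  model: "gpt-4o-mini-tts",
  input: "Hello, this is a test of text to speech.",
  instructions: "Speak slowly and clearly.",
  voice: "coral",
  response_format: "wav",
};
```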
model
required

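The TTS model to use. The documented models are tts-1, tts-1-hd, gpt-4o-mini-tts, and gpt-4o-mini-tts-2025-12-15; the field also accepts other strings, such as the provider-prefixed openai/tts-1 used in the curl example.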

input
string
required

The text to generate audio for. The maximum length is 4096 characters.

Maximum string length: 4096
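Because input is capped at 4096 characters, longer text has to be split across several requests, with each part synthesized separately. One simple approach, sketched below as an assumption rather than anything the API provides, breaks the text at the last space before the limit. The helper name chunkInput is hypothetical, and the sketch assumes the limit counts Unicode characters.

```typescript
const MAX_INPUT_CHARS = 4096;

// Splits text into chunks of at most `limit` characters, breaking at the last space
// before the limit when there is one and hard-splitting mid-word otherwise.
// JavaScript lengths count UTF-16 code units, never fewer than the number of
// Unicode characters, so each chunk also stays within the limit in characters.
function chunkInput(text: string, limit: number = MAX_INPUT_CHARS): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > limit) {
    let cut = rest.lastIndexOf(" ", limit);
    if (cut <= 0) cut = limit; // no space to break at: split mid-word
    chunks.push(rest.slice(0, cut).replace(/\s+$/, ""));
    rest = rest.slice(cut).replace(/^\s+/, "");
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each returned chunk can then be sent as the input of its own request.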
voice
required

The voice to use when generating the audio. Supported built-in voices are alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse, marin, and cedar. You may also provide a custom voice object with an id, for example { "id": "voice_1234" }. Previews of the voices are available in the Text to speech guide.
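The voice field takes either one of the built-in voice names or an object carrying a custom voice id. A sketch of that union, with a type guard that tells the two cases apart, follows. The names BUILT_IN_VOICES, isBuiltInVoice, and toVoice are hypothetical, and treating any unrecognized string as a custom voice id is a choice made for this sketch only.

```typescript
const BUILT_IN_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo", "fable", "onyx",
  "nova", "sage", "shimmer", "verse", "marin", "cedar",
] as const;

type BuiltInVoice = (typeof BUILT_IN_VOICES)[number];
type Voice = BuiltInVoice | { id: string };

// Narrows an arbitrary string to one of the documented built-in voices (case-sensitive).
function isBuiltInVoice(value: string): value is BuiltInVoice {
  return (BUILT_IN_VOICES as readonly string[]).indexOf(value) !== -1;
}

// Builds the `voice` value: a built-in name is sent as-is,
// anything else is wrapped as a custom voice object.
function toVoice(nameOrId: string): Voice {
  return isBuiltInVoice(nameOrId) ? nameOrId : { id: nameOrId };
}
```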

instructions
string

Additional instructions for controlling the voice of the generated audio. Not supported by tts-1 or tts-1-hd.

Maximum string length: 4096
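Since instructions does not work with tts-1 or tts-1-hd, a client may want to catch that combination, along with the 4096-character limit, before sending a request. The check below is a hypothetical client-side helper. Stripping a provider prefix such as openai/ before comparing model names is an assumption based on the curl example's model ID.

```typescript
const MAX_INSTRUCTIONS_CHARS = 4096;
const MODELS_WITHOUT_INSTRUCTIONS = ["tts-1", "tts-1-hd"];

// Returns a list of problems with the instructions field; an empty list means it looks valid.
function checkInstructions(model: string, instructions?: string): string[] {
  const problems: string[] = [];
  if (instructions === undefined) return problems;
  // Compare only the last path segment so "openai/tts-1" is treated like "tts-1".
  const baseModel = model.split("/").pop() ?? model;
  if (MODELS_WITHOUT_INSTRUCTIONS.indexOf(baseModel) !== -1) {
    problems.push(`instructions are not supported by ${model}`);
  }
  if (instructions.length > MAX_INSTRUCTIONS_CHARS) {
    problems.push(`instructions exceed ${MAX_INSTRUCTIONS_CHARS} characters`);
  }
  return problems;
}
```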
response_format
enum<string>
default: mp3

The format of the generated audio. Supported formats are mp3, opus, aac, flac, wav, and pcm.

Available options: mp3, opus, aac, flac, wav, pcm
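When the format comes from user input, such as a command-line flag, it helps to validate it against the supported list and to name the output file after it, falling back to the documented default of mp3. Both helpers below are hypothetical sketches; accepting the value case-insensitively is a design choice of this sketch, not documented API behavior.

```typescript
const RESPONSE_FORMATS = ["mp3", "opus", "aac", "flac", "wav", "pcm"] as const;
type ResponseFormat = (typeof RESPONSE_FORMATS)[number];

// Validates a user-supplied format string.
// Matching is case-insensitive; undefined yields the documented default, "mp3".
function parseResponseFormat(value?: string): ResponseFormat {
  if (value === undefined) return "mp3";
  const lower = value.toLowerCase();
  for (const format of RESPONSE_FORMATS) {
    if (format === lower) return format;
  }
  throw new Error(`Unsupported response_format "${value}"; expected one of: ${RESPONSE_FORMATS.join(", ")}`);
}

// Names the output file after its format, e.g. "speech.mp3" as in the curl example.
function outputFileName(baseName: string, format: ResponseFormat): string {
  return `${baseName}.${format}`;
}
```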
speed
number
default: 1

The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.

Required range: 0.25 <= x <= 4
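The speed constraint (0.25 to 4, default 1) can also be checked on the client. Whether to reject or clamp out-of-range values is a design choice; this hypothetical helper rejects them, since silently changing the requested speed could surprise the caller.

```typescript
const MIN_SPEED = 0.25;
const MAX_SPEED = 4;
const DEFAULT_SPEED = 1;

// Returns the speed to send: the documented default when omitted, otherwise the value
// itself if it lies in [0.25, 4]. NaN and out-of-range values raise a RangeError.
function resolveSpeed(speed?: number): number {
  if (speed === undefined) return DEFAULT_SPEED;
  // Negated range check, so NaN (for which every comparison is false) is rejected too.
  if (!(speed >= MIN_SPEED && speed <= MAX_SPEED)) {
    throw new RangeError(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}, got ${speed}`);
  }
  return speed;
}
```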
stream_format
enum<string>
default: audio

The format in which to stream the audio. Supported formats are sse (server-sent events) and audio. The sse format is not supported for tts-1 or tts-1-hd.

Available options: sse, audio
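Because sse is unavailable for tts-1 and tts-1-hd, a client can pick the stream format based on the model. In the hypothetical sketch below, an sse request for one of those models falls back to the default audio format rather than sending an unsupported combination; throwing an error instead would be an equally valid choice. Ignoring a provider prefix such as openai/ is an assumption based on the curl example.

```typescript
type StreamFormat = "sse" | "audio";
const MODELS_WITHOUT_SSE = ["tts-1", "tts-1-hd"];

// Picks the stream_format to send: "sse" only when requested and supported by the model,
// otherwise the documented default "audio". Provider prefixes like "openai/" are ignored.
function resolveStreamFormat(model: string, requested?: StreamFormat): StreamFormat {
  if (requested !== "sse") return "audio";
  const baseModel = model.split("/").pop() ?? model;
  return MODELS_WITHOUT_SSE.indexOf(baseModel) === -1 ? "sse" : "audio";
}
```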

Response

Audio file stream

The response is of type file.
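Because the response body is the audio file itself rather than JSON, a client reads it as raw bytes and writes them to disk, which is what `--output speech.mp3` does in the curl example. The Node.js sketch below assumes the bytes were obtained with `await res.arrayBuffer()` from a fetch Response using the default audio stream format; saveAudio is a hypothetical helper name.

```typescript
import { writeFileSync } from "node:fs";

// Writes the raw audio bytes from the response body to `path`.
// Returns the number of bytes written so callers can sanity-check for an empty body.
function saveAudio(bytes: ArrayBuffer, path: string): number {
  const data = Buffer.from(bytes);
  writeFileSync(path, data);
  return data.length;
}
```

For example, inside an async function: `saveAudio(await res.arrayBuffer(), "speech.mp3")`. Checking `res.ok` first avoids saving an error payload as if it were audio.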

Last modified on April 9, 2026