POST /v1/audio/transcriptions
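TypeScript
import Dedalus from "dedalus-labs"; // package name assumed; check the SDK install instructions
import fs from "node:fs";

const client = new Dedalus(); // presumably reads the API key from the environment
const result = await client.audio.transcriptions.create({ file: fs.createReadStream("audio.mp3"), model: "openai/whisper-1" });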
{
  "language": "<string>",
  "duration": 123,
  "text": "<string>",
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "segments": [
    {
      "id": 123,
      "seek": 123,
      "start": 123,
      "end": 123,
      "text": "<string>",
      "tokens": [
        123
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ],
  "usage": {
    "type": "<string>",
    "seconds": 123
  }
}

Overview

Transcribe audio files to text using speech-to-text models. Supported audio formats include mp3, mp4, wav, and more. Note: this is an OpenAI-only endpoint.

Usage Examples

curl -X POST https://api.dedaluslabs.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@audio.mp3" \
  -F model="openai/whisper-1"
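The same request can be issued from TypeScript without the SDK. This is a minimal sketch, assuming Node.js 18 or later for the global `fetch`, `FormData`, `Blob`, and `Request`; `buildTranscriptionRequest` and `transcribe` are hypothetical helpers, not part of any Dedalus SDK. Only the two required fields from the curl example are sent.

```typescript
const ENDPOINT = "https://api.dedaluslabs.ai/v1/audio/transcriptions";

// Hypothetical helper: builds the same multipart request as the curl example.
function buildTranscriptionRequest(
  apiKey: string,
  audio: Blob,
  filename: string,
  model = "openai/whisper-1",
): Request {
  const form = new FormData();
  // The third argument sets the filename reported for the file part.
  form.append("file", audio, filename);
  form.append("model", model);
  // Content-Type is not set by hand: the Request constructor derives
  // "multipart/form-data; boundary=..." from the FormData body.
  return new Request(ENDPOINT, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
}

// Hypothetical helper: sends the request and returns the parsed JSON body.
async function transcribe(apiKey: string, audio: Blob, filename: string): Promise<unknown> {
  const res = await fetch(buildTranscriptionRequest(apiKey, audio, filename));
  if (!res.ok) {
    throw new Error(`Transcription failed: HTTP ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

Pass your API key and a `Blob` holding the audio bytes; on success the body has the shape described under Response.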

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body (multipart/form-data)

file (file, required)
model (string, required)
language (string | null)
prompt (string | null)
response_format (string | null)
temperature (number | null)
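Because multipart fields carry strings rather than JSON values, nullable parameters need a convention. A minimal sketch that builds the form body from all documented fields, assuming the server treats an omitted optional field the same as an explicit null; `TranscriptionParams`, its `filename` property, and `toTranscriptionForm` are hypothetical names chosen for illustration.

```typescript
// Hypothetical parameter shape mirroring the documented multipart fields.
interface TranscriptionParams {
  file: Blob;
  filename: string; // name sent with the file part (an assumption, not a documented field)
  model: string;
  language?: string | null;
  prompt?: string | null;
  response_format?: string | null;
  temperature?: number | null;
}

// Builds the multipart body. Optional fields that are null or undefined are
// left out entirely (assumed equivalent to null on the server side).
function toTranscriptionForm(params: TranscriptionParams): FormData {
  const form = new FormData();
  form.append("file", params.file, params.filename);
  form.append("model", params.model);
  const optional: Array<[string, string | number | null | undefined]> = [
    ["language", params.language],
    ["prompt", params.prompt],
    ["response_format", params.response_format],
    ["temperature", params.temperature],
  ];
  for (const [name, value] of optional) {
    // Multipart values are strings, so numbers such as temperature are stringified.
    if (value !== null && value !== undefined) form.append(name, String(value));
  }
  return form;
}
```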

Response

Successful Response

A verbose JSON transcription response returned by the model for the provided input.

Fields:

  • language (required): string
  • duration (required): number
  • text (required): string
  • words (optional): TranscriptionWord[]
  • segments (optional): TranscriptionSegment[]
  • usage (optional): TranscriptTextUsageDuration
language (string, required)

The language of the input audio.

duration (number, required)

The duration of the input audio.

text (string, required)

The transcribed text.

words (TranscriptionWord · object[], optional)

Extracted words and their corresponding timestamps.
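Word timestamps make it possible to look up what was being said at a given moment. A sketch using binary search, assuming `start` and `end` are offsets in seconds from the beginning of the audio and that `words` is sorted by `start` without overlaps (neither is stated on this page); `wordAt` is a hypothetical helper.

```typescript
// Shape taken from the example response: { word, start, end }.
interface TranscriptionWord {
  word: string;
  start: number;
  end: number;
}

// Returns the word whose [start, end) interval contains time t, or undefined
// if t falls in a pause between words. Runs in O(log n).
function wordAt(words: TranscriptionWord[], t: number): TranscriptionWord | undefined {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (t < w.start) hi = mid - 1; // t is before this word
    else if (t >= w.end) lo = mid + 1; // t is at or after this word's end
    else return w; // start <= t < end
  }
  return undefined;
}
```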

segments (TranscriptionSegment · object[], optional)

Segments of the transcribed text and their corresponding details.
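Segments carry start and end times, which makes them a natural source for subtitles. A sketch that renders segments as SubRip (SRT) cues, assuming `start` and `end` are in seconds from the beginning of the audio (the unit is not stated on this page); `srtTimestamp` and `segmentsToSrt` are hypothetical helpers, and only the segment fields they need are declared.

```typescript
// Subset of the segment shape from the example response.
interface TranscriptionSegment {
  id: number;
  start: number;
  end: number;
  text: string;
}

// Formats seconds as an SRT timestamp, HH:MM:SS,mmm.
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Renders segments as SRT cues numbered from 1, separated by blank lines,
// with surrounding whitespace trimmed from each segment's text.
function segmentsToSrt(segments: TranscriptionSegment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```

Cues are numbered by position in the array rather than by `id`, since SRT numbering starts at 1 and this page does not say where `id` starts.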

usage (TranscriptTextUsageDuration · object, optional)

Usage statistics for models billed by audio input duration.
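For type-safe handling of the response, the documented shape can be written as TypeScript interfaces. A minimal sketch: the interface names follow the type names on this page, field types are inferred from the example response, and `parseTranscription` is a hypothetical helper that validates only the required fields (and that `words`/`segments`, when present, are arrays) at runtime.

```typescript
// Types mirroring the documented response and the example JSON.
interface TranscriptionWord { word: string; start: number; end: number; }
interface TranscriptionSegment {
  id: number; seek: number; start: number; end: number; text: string;
  tokens: number[]; temperature: number; avg_logprob: number;
  compression_ratio: number; no_speech_prob: number;
}
interface TranscriptTextUsageDuration { type: string; seconds: number; }
interface Transcription {
  language: string;
  duration: number;
  text: string;
  words?: TranscriptionWord[];
  segments?: TranscriptionSegment[];
  usage?: TranscriptTextUsageDuration;
}

// Checks required fields; array elements and usage are not validated.
function parseTranscription(body: unknown): Transcription {
  if (typeof body !== "object" || body === null) throw new TypeError("response is not an object");
  const r = body as Record<string, unknown>;
  if (typeof r.language !== "string") throw new TypeError("language must be a string");
  if (typeof r.duration !== "number") throw new TypeError("duration must be a number");
  if (typeof r.text !== "string") throw new TypeError("text must be a string");
  for (const key of ["words", "segments"]) {
    if (r[key] !== undefined && !Array.isArray(r[key])) throw new TypeError(`${key} must be an array`);
  }
  return r as unknown as Transcription;
}
```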

Last modified on April 9, 2026