Create Transcription

curl --request POST \
  --url https://api.dedaluslabs.ai/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file='@example-file' \
  --form 'model=<string>' \
  --form 'language=<string>' \
  --form 'prompt=<string>' \
  --form 'response_format=<string>' \
  --form temperature=0

{
  "language": "<string>",
  "duration": 123,
  "text": "<string>",
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "segments": [
    {
      "id": 123,
      "seek": 123,
      "start": 123,
      "end": 123,
      "text": "<string>",
      "tokens": [
        123
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ],
  "usage": {
    "type": "<string>",
    "seconds": 123
  }
}

Transcribe audio into text.

Transcribes audio files using OpenAI’s Whisper model. Supports multiple audio formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm. Maximum file size is 25 MB.

Args:
  file: Audio file to transcribe (required)
  model: Model ID to use (e.g., “openai/whisper-1”)
  language: ISO-639-1 language code (e.g., “en”, “es”) - improves accuracy
  prompt: Optional text to guide the model’s style
  response_format: Format of the output (json, text, srt, verbose_json, vtt)
  temperature: Sampling temperature between 0 and 1

Returns: Transcription object with the transcribed text
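
The constraints listed above (supported formats, the 25 MB limit, the allowed response formats, and the temperature range) can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the function name `check_transcription_args` is hypothetical, the format check assumes the file extension reflects the actual audio format, and "25 MB" is read as 25 MiB, which the source does not specify.

```python
from pathlib import Path

# Values copied from the endpoint description.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB


def check_transcription_args(filename, size_bytes, model, language=None,
                             response_format=None, temperature=None):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported audio format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 25 MB limit")
    if not model:
        problems.append("model is required")
    # Assumption: a two-letter alphabetic string is treated as a plausible ISO-639-1 code.
    if language is not None and not (len(language) == 2 and language.isalpha()):
        problems.append("language should be a two-letter ISO-639-1 code")
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        problems.append(f"unknown response_format: {response_format}")
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")
    return problems
```

Calling `check_transcription_args("talk.mp3", 1024, "openai/whisper-1", "en", "verbose_json", 0.2)` returns an empty list. The server remains the authority on what it accepts; a local check like this only catches obvious mistakes early.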

POST /v1/audio/transcriptions

Authorizations

Authorization · string · header · required

API key authentication using Bearer token

Body

multipart/form-data

file · required

model · string · required

language · string | null

prompt · string | null

response_format · string | null

temperature · number | null
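
To show how the Bearer token and the multipart/form-data body fit together, here is a minimal sketch that builds the request with Python's standard library without sending it. The helper name `build_transcription_request` is hypothetical. Leaving nullable fields out of the form when they are `None`, and labelling the file part as `application/octet-stream`, are assumptions rather than documented behavior.

```python
import uuid
import urllib.request

API_URL = "https://api.dedaluslabs.ai/v1/audio/transcriptions"


def build_transcription_request(token, filename, audio_bytes, model, **optional):
    """Build (but do not send) a multipart/form-data POST for this endpoint."""
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    # Assumption: nullable fields (language, prompt, ...) are omitted when None.
    fields.update({k: v for k, v in optional.items() if v is not None})

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file goes in the part named "file", as in the curl example.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. An HTTP client library that encodes multipart bodies itself would replace most of this function.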

Response

Successful Response

CreateTranscriptionResponseVerboseJson
CreateTranscriptionResponseJson

Represents a verbose JSON transcription response returned by the model, based on the provided input.

Fields:

language (required): str
duration (required): float
text (required): str
words (optional): list[TranscriptionWord]
segments (optional): list[TranscriptionSegment]
usage (optional): TranscriptTextUsageDuration

language · string · required

The language of the input audio.

duration · number · required

The duration of the input audio.

text · string · required

The transcribed text.

words · TranscriptionWord · object[]

Extracted words and their corresponding timestamps.

segments · TranscriptionSegment · object[]

Segments of the transcribed text and their corresponding details.

usage · TranscriptTextUsageDuration · object

Usage statistics for models billed by audio input duration.
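
As an illustration of the required and optional fields, the sketch below parses a verbose JSON response body into small Python dataclasses. The class and function names (`Word`, `Transcription`, `parse_transcription`) are hypothetical and not part of any SDK. The shape of each word entry follows the sample response on this page, and segments and usage are kept as plain dictionaries for brevity.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    word: str
    start: float
    end: float


@dataclass
class Transcription:
    language: str                    # required
    duration: float                  # required
    text: str                        # required
    words: List[Word] = field(default_factory=list)     # optional
    segments: List[dict] = field(default_factory=list)  # optional, raw dicts
    usage: Optional[dict] = None                         # optional


def parse_transcription(payload: str) -> Transcription:
    """Parse a verbose JSON response body; missing optional fields become empty."""
    data = json.loads(payload)
    return Transcription(
        language=data["language"],
        duration=float(data["duration"]),
        text=data["text"],
        words=[Word(w["word"], float(w["start"]), float(w["end"]))
               for w in data.get("words") or []],
        segments=data.get("segments") or [],
        usage=data.get("usage"),
    )
```

A missing required field raises `KeyError`, which makes a malformed or non-verbose response easy to notice.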
