Transcribe audio into text.
Transcribes audio files using OpenAI’s Whisper model. Supports multiple audio formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm. Maximum file size is 25 MB.
Args:
    file: Audio file to transcribe (required)
    model: Model ID to use (e.g., “openai/whisper-1”)
    language: ISO-639-1 language code (e.g., “en”, “es”) - improves accuracy
    prompt: Optional text to guide the model’s style
    response_format: Format of the output (json, text, srt, verbose_json, vtt)
    temperature: Sampling temperature between 0 and 1
Returns: Transcription object with the transcribed text
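The constraints listed above (accepted formats, the 25 MB limit, the temperature range, the allowed response formats) can be checked on the client before anything is uploaded. The following is a minimal sketch only: the helper name `build_transcription_fields` is hypothetical, "25 MB" is assumed to mean 25 * 1024 * 1024 bytes, and the file type is assumed to be recognizable from its extension. It prepares the non-file form fields and does not send a request.

```python
from pathlib import Path

# Values taken from the description above.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
# Assumption: "25 MB" is interpreted as 25 MiB; the service may count differently.
MAX_FILE_BYTES = 25 * 1024 * 1024


def build_transcription_fields(filename, size_bytes, model, language=None,
                               prompt=None, response_format=None,
                               temperature=None):
    """Validate the inputs and return the non-file form fields as strings.

    Hypothetical helper for illustration; it is not part of the documented API.
    """
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or filename!r}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
    if response_format is not None and response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {"model": model}
    # Only include optional fields that the caller actually set.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    if response_format is not None:
        fields["response_format"] = response_format
    if temperature is not None:
        fields["temperature"] = str(temperature)
    return fields
```

For example, `build_transcription_fields("talk.mp3", 1024, "openai/whisper-1", language="en")` returns the `model` and `language` fields, while a `.flac` filename or a temperature of 1.5 raises `ValueError`.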
API key authentication using Bearer token
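Bearer token authentication conventionally means sending the API key in an `Authorization` header. A minimal sketch of building that header follows; the function name `bearer_headers` is hypothetical, and how the key is stored or loaded is left to the caller.

```python
def bearer_headers(api_key: str) -> dict:
    """Return request headers carrying the API key as a Bearer token.

    Hypothetical helper for illustration; it is not part of the documented API.
    """
    key = (api_key or "").strip()
    if not key:
        raise ValueError("an API key is required")
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is in use.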
Successful Response
Represents a verbose JSON transcription response returned by the model, based on the provided input.
Fields:
The language of the input audio.
The duration of the input audio.
The transcribed text.
Extracted words and their corresponding timestamps.
Segments of the transcribed text and their corresponding details.
Usage statistics for models billed by audio input duration.
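A response of this shape could be read into a small typed structure, sketched below. The field names are not given in this section, so the keys used here (`language`, `duration`, `text`, `words`, `segments`, `usage`) are assumptions chosen to match the descriptions above; check them against a real response before relying on them. The sample body in the usage note is made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VerboseTranscription:
    # Key names are assumed, not confirmed by this section.
    text: str
    language: Optional[str] = None
    duration: Optional[float] = None
    words: list = field(default_factory=list)
    segments: list = field(default_factory=list)
    usage: Optional[dict] = None


def parse_verbose_transcription(body: str) -> VerboseTranscription:
    """Parse a verbose JSON response body, tolerating missing optional keys."""
    data = json.loads(body)
    return VerboseTranscription(
        text=data["text"],
        language=data.get("language"),
        duration=data.get("duration"),
        words=data.get("words") or [],
        segments=data.get("segments") or [],
        usage=data.get("usage"),
    )
```

Calling `parse_verbose_transcription('{"text": "hello", "language": "en", "duration": 1.5}')` yields an object whose `text` is `"hello"`, with empty `words` and `segments` lists.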