Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
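As a minimal sketch of the header format described above, the helper below builds the Authorization header from a token. The function name bearer_headers and the JSON content type are illustrative assumptions, not part of this reference.

```python
def bearer_headers(token: str) -> dict:
    """Build request headers carrying a bearer token (illustrative helper)."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        "Authorization": f"Bearer {token}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }


headers = bearer_headers("my-auth-token")
```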
ChatCompletion request schema.
Supports OpenAI-compatible parameters, provider-specific extensions, server-side execution, and agent orchestration features.
Model identifier string (e.g., 'openai/gpt-5', 'anthropic/claude-3-5-sonnet').
"openai/gpt-5"
Parameters for audio output. Required when requesting audio responses with modalities: ['audio']. See: https://platform.openai.com/docs/guides/audio
{ "format": "mp3", "voice": "alloy" }
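The requirement above (audio parameters must accompany an audio modality) can be checked client-side before sending a request. This is a sketch only: the key names model and audio are assumed to follow the OpenAI convention, and audio_params_missing is a hypothetical helper.

```python
def audio_params_missing(body: dict) -> bool:
    """Return True when audio output is requested but no audio parameters are set."""
    modalities = body.get("modalities") or ["text"]  # text is the documented default
    return "audio" in modalities and not body.get("audio")


request_body = {
    "model": "openai/gpt-5",            # assumed key name
    "modalities": ["audio"],
    "audio": {"format": "mp3", "voice": "alloy"},  # assumed key name
}
ok = not audio_params_missing(request_body)
```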
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
-2 <= x <= 2
Deprecated in favor of tool_choice. Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"name": "my_function"} forces the model to call that function. none is the default when no functions are present. auto is the default if functions are present.
Deprecated in favor of tools. A list of functions the model may generate JSON inputs for.
Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
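To illustrate the statement that the bias is added to the logits before sampling, the sketch below applies biases to a toy three-token vocabulary and converts the result to probabilities with a softmax. The token IDs and logit values are made up, and apply_logit_bias is a hypothetical helper, not how any provider implements sampling.

```python
import math


def apply_logit_bias(logits: dict, bias: dict) -> dict:
    """Add per-token biases to logits, then return softmax probabilities."""
    biased = {tok: logit + bias.get(tok, 0.0) for tok, logit in logits.items()}
    peak = max(biased.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in biased.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Toy vocabulary of three token IDs with made-up logits.
logits = {101: 2.0, 102: 1.0, 103: 0.5}
banned = apply_logit_bias(logits, {101: -100})  # token 101 becomes vanishingly unlikely
forced = apply_logit_bias(logits, {103: 100})   # token 103 takes nearly all probability
```

With a bias of -100 the token's probability drops to effectively zero, and with +100 it absorbs almost all of the probability mass, which matches the "ban or exclusive selection" behavior described above.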
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
Maximum tokens in completion (newer parameter name)
Maximum tokens in completion
x >= 1
Conversation history (OpenAI: messages, Google: contents, Responses: input)
Developer-provided instructions that the model should follow, regardless of
messages sent by the user. With o1 models and newer, developer messages
replace the previous system messages.
Fields:
Up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Output types that you would like the model to generate. Most models are capable of generating text, which is the default: ["text"]. The gpt-4o-audio-preview model can also be used to generate audio. To request that this model generate both text and audio responses, you can use: ["text", "audio"]
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
1 <= x <= 128
Whether to enable parallel tool calls (Anthropic uses inverted polarity).
Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
-2 <= x <= 2
Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field.
The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps cached prefixes active for longer, up to a maximum of 24 hours.
Constrains effort on reasoning for reasoning models. Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
- gpt-5.1 defaults to none, which does not perform reasoning. The supported reasoning values for gpt-5.1 are none, low, medium, and high. Tool calls are supported for all reasoning values in gpt-5.1.
- All models before gpt-5.1 default to medium reasoning effort, and do not support none.
- The gpt-5-pro model defaults to (and only supports) high reasoning effort.
- xhigh is supported for all models after gpt-5.1-codex-max.
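The per-model rules above can be encoded as a small lookup for client-side checks. This sketch covers only the two models whose supported values are stated explicitly (gpt-5.1 and gpt-5-pro); the relative rules ("before gpt-5.1", "after gpt-5.1-codex-max") are deliberately left out because model ordering cannot be derived from names alone. check_effort and MODEL_RULES are hypothetical names.

```python
ALL_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")

# Only the models whose rules are stated explicitly; others are not covered here.
MODEL_RULES = {
    "gpt-5.1": {"default": "none", "supported": ("none", "low", "medium", "high")},
    "gpt-5-pro": {"default": "high", "supported": ("high",)},
}


def check_effort(model: str, effort: str):
    """Return True/False for a covered model, or None when the model is not covered."""
    if effort not in ALL_EFFORTS:
        return False  # not a recognized value for any model
    rule = MODEL_RULES.get(model)
    if rule is None:
        return None  # unknown model: defer to the server
    return effort in rule["supported"]
```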
Default response format. Used to generate text responses.
Fields:
A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The IDs should be a string that uniquely identifies each user, with a maximum length of 64 characters. We recommend hashing their username or email address, in order to avoid sending us any identifying information.
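One way to follow the hashing recommendation above is sketched below. The choice of SHA-256 and the lowercase/trim normalization are assumptions made for illustration; the reference does not prescribe a hash function. A SHA-256 hex digest is exactly 64 characters, so it fits the stated maximum length.

```python
import hashlib


def make_safety_identifier(email: str) -> str:
    """Hash a normalized email so the raw address is never sent."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


identifier = make_safety_identifier("  User@Example.com ")
```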
Maximum length: 64
Random seed for deterministic output
0 <= x <= 9223372036854776000
Service tier for request processing
Sequences that stop generation
Whether or not to store the output of this chat completion request for use in our model distillation or evals products. Supports text and image inputs. Note: image inputs over 8MB will be dropped.
Enable streaming response
Options for streaming response. Only set this when you set stream: true.
System instruction/prompt
Sampling temperature (0-2 for most providers)
0 <= x <= 1
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
Available tools/functions for the model
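The tool_choice defaults and the forced-tool form described above can be sketched as two small helpers. The shape of the tools list entry follows the OpenAI function-tool convention and is an assumption here; default_tool_choice and force_tool are hypothetical names.

```python
def default_tool_choice(tools: list) -> str:
    """Return the documented default: 'none' without tools, 'auto' with tools."""
    return "auto" if tools else "none"


def force_tool(name: str) -> dict:
    """Build the tool_choice value that forces a call to one named function."""
    return {"type": "function", "function": {"name": name}}


# Assumed OpenAI-style tool definition with an empty parameter schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "my_function",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]
choice = force_tool("my_function")
```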
Top-k sampling parameter
x >= 0
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
0 <= x <= 8
Nucleus sampling threshold
0 <= x <= 1
This field is being replaced by safety_identifier and prompt_cache_key. Use prompt_cache_key instead to maintain caching optimizations. A stable identifier for your end-users. Used to boost cache hit rates by better bucketing similar requests and to help OpenAI detect and prevent abuse.
Constrains the verbosity of the model's response. Lower values will result in more concise responses, while higher values will result in more verbose responses. Currently supported values are low, medium, and high.
This tool searches the web for relevant results to use in a response. Learn more about the web search tool.
Top-level cache control automatically applies a cache_control marker to the last cacheable block in the request.
Container identifier for reuse across requests.
If set to true, the request returns a request_id. You can then get the deferred response by GET /v1/chat/deferred-completion/{request_id}.
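A minimal sketch of building the retrieval path for a deferred response follows. The path template comes from the description above; deferred_completion_path is a hypothetical helper, and escaping the identifier with urllib is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote


def deferred_completion_path(request_id: str) -> str:
    """Return the path used to fetch a deferred completion by its request_id."""
    # quote() escapes characters that would otherwise break the path segment.
    return f"/v1/chat/deferred-completion/{quote(request_id, safe='')}"


path = deferred_completion_path("req_123")
```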
Generation parameters wrapper (Google-specific)
Specifies the geographic region for inference processing. If not specified, the workspace's default_inference_geo is used.
Allows toggling between the reasoning mode and no system prompt. When set to reasoning the system prompt for reasoning models will be used.
"reasoning"
Whether to inject a safety prompt before all conversations.
Safety/content filtering settings (Google-specific)
Set the parameters to be used for searched data. If not set, no data will be acquired by the model.
Schema for ThinkingConfigEnabled.
Fields:
Tool calling configuration (Google-specific)
MCP server identifiers. Accepts marketplace slugs, URLs, or MCPServerSpec objects. MCP tools are executed server-side and billed separately.
"dedalus-labs/example-server"
Credential for MCP server authentication.
Passed at endpoint level (e.g., chat.completions.create) and matched to MCP servers by connection name. Wire format matches dedalus_mcp.Credential.to_dict().
{
"connection_name": "external-service",
"values": { "api_key": "sk-..." }
}
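Because credentials are matched to MCP servers by connection name, it can help to index them client-side and catch duplicates before sending. The sketch below uses the wire format shown above; credentials_by_connection is a hypothetical helper and does not reflect how the server performs the match.

```python
def credentials_by_connection(credentials: list) -> dict:
    """Index credential dicts by connection_name, rejecting duplicates."""
    index = {}
    for cred in credentials:
        name = cred["connection_name"]
        if name in index:
            raise ValueError(f"duplicate credential for connection {name!r}")
        index[name] = cred["values"]
    return index


credentials = [
    {"connection_name": "external-service", "values": {"api_key": "sk-..."}},
]
index = credentials_by_connection(credentials)
```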
Content filtering and safety policy configuration.
Configuration for multi-model handoffs.
Model attributes for routing. Maps model IDs to attribute dictionaries with values in [0.0, 1.0].
{
"gpt-5": { "accuracy": 0.95, "speed": 0.6 }
}
Agent attributes. Values in [0.0, 1.0].
{ "accuracy": 0.9, "complexity": 0.8 }
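Since both model and agent attribute values must lie in [0.0, 1.0], a client can validate them before sending. The sketch below only checks the range; it makes no claim about how the server uses the attributes for routing. out_of_range is a hypothetical helper.

```python
def out_of_range(attributes: dict) -> list:
    """Return the attribute names whose values fall outside [0.0, 1.0]."""
    return [name for name, value in attributes.items() if not 0.0 <= value <= 1.0]


model_attributes = {"gpt-5": {"accuracy": 0.95, "speed": 0.6}}
agent_attributes = {"accuracy": 0.9, "complexity": 0.8}

bad_agent = out_of_range(agent_attributes)
bad_models = {model: out_of_range(attrs) for model, attrs in model_attributes.items()}
```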
Maximum conversation turns.
1 <= x <= 100
5
Execute tools server-side. If false, returns raw tool calls for manual handling.
Stable session ID for resuming a previous handoff. Returned by the server on handoff; echo it on the next request to resume.
Tier 2 stateless resumption. Deferred tool specs from a previous handoff response, sent back verbatim so the server can resume without Redis.
Handoff control. None or omitted: auto-detect. true: structured handoff (SDK). false: drop-in (LLM re-run for mixed turns).
JSON or SSE stream of ChatCompletionChunk events
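For the streaming case, the sketch below decodes a sequence of SSE text lines into chunk objects. It assumes OpenAI-style framing: each event is a "data:" line holding JSON, the stream ends with a "data: [DONE]" sentinel, and text arrives under choices[0].delta.content. None of these details are stated in this reference, so treat them as assumptions; iter_sse_chunks is a hypothetical helper.

```python
import json


def iter_sse_chunks(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumption: OpenAI-style end-of-stream sentinel
            return
        yield json.loads(payload)


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "") for chunk in iter_sse_chunks(sample)
)
```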
Chat completion response for Dedalus API.
OpenAI-compatible chat completion response with Dedalus extensions. Maintains full compatibility with OpenAI API while providing additional features like server-side tool execution tracking and MCP error reporting.
A unique identifier for the chat completion.
A list of chat completion choices. Can be more than one if n is greater than 1.
The Unix timestamp (in seconds) of when the chat completion was created.
The model used for the chat completion.
The object type, which is always chat.completion.
"chat.completion"
Specifies the processing type used for serving the request.
When the service_tier parameter is set, the response body will include the service_tier value based on the processing mode actually used to serve the request. This response value may be different from the value set in the parameter.
auto, default, flex, scale, priority
This fingerprint represents the backend configuration that the model runs with.
Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
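When running repeated requests with the same seed, comparing fingerprints across responses is one way to notice a backend change that may affect determinism. The sketch below assumes the response field is named system_fingerprint, following the OpenAI convention; backend_changed is a hypothetical helper.

```python
def backend_changed(responses: list) -> bool:
    """Return True if the responses report more than one distinct fingerprint."""
    fingerprints = {r.get("system_fingerprint") for r in responses}
    fingerprints.discard(None)  # ignore responses that omit the field
    return len(fingerprints) > 1


same = [{"system_fingerprint": "fp_a"}, {"system_fingerprint": "fp_a"}]
mixed = [{"system_fingerprint": "fp_a"}, {"system_fingerprint": "fp_b"}, {}]
```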
Usage statistics for the completion request.
List of tool names that were executed server-side (e.g., MCP tools). Only present when tools were executed on the server rather than returned for client-side execution.
MCP server failures keyed by server name.
Detailed results of MCP tool executions including inputs, outputs, and timing. Provides full visibility into server-side tool execution for debugging and audit purposes.
Stable session ID for cross-turn handoff state. Echo this on the next request to resume server-side execution.
Completed server tool outputs keyed by call ID.
Server tools blocked on client results.
Client tools to execute, with dependency ordering.
Number of internal LLM calls made during this request. SDKs can sum this across their outer loop to track total LLM calls.