Create a chat completion. Supports streaming, tool calling, and MCP server integration across all providers.
API key authentication using Bearer token
Chat completion request (OpenAI-compatible).
Stateless chat completion endpoint. For stateful conversations with threads, use the Responses API instead.
Model identifier string (e.g., 'openai/gpt-5', 'anthropic/claude-3-5-sonnet').
"openai/gpt-4o"
Conversation history. Accepts either a list of message objects or a string, which is treated as a single user message.
[
{
"content": "Hello, how are you?",
"role": "user"
}
]
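For example, a minimal request through the OpenAI Python SDK, which is wire-compatible with this endpoint (the base URL below is a placeholder; substitute your deployment's URL and API key):

from openai import OpenAI

# Placeholder base URL and key; any OpenAI-compatible client works here.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)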
Convenience alias for Responses-style input. Used when messages is omitted to provide the user prompt directly.
"Translate this paragraph into French."
What sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. We generally recommend altering this or 'top_p' but not both.
0 <= x <= 2
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or 'temperature' but not both.
0 <= x <= 1
0.1
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API. This value is now deprecated in favor of 'max_completion_tokens' and is not compatible with o-series models.
x >= 1
100
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
-2 <= x <= 2
-0.5
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
-2 <= x <= 2
-0.5
Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object mapping token IDs (as strings) to bias values from -100 to 100. The bias is added to the logits before sampling; values between -1 and 1 nudge selection probability, while values like -100 or 100 effectively ban or require a token.
{ "50256": -100 }
Not supported with latest reasoning models 'o3' and 'o4-mini'.
Up to 4 sequences where the API will stop generating further tokens; the returned text will not contain the stop sequence.
["\n", "END"]
Extended thinking configuration (Anthropic pass-through). Fields: 'type' (e.g., 'enabled') and 'budget_tokens' (token budget for the model's internal reasoning).
{ "budget_tokens": 2048, "type": "enabled" }
Top-k sampling. Anthropic: pass-through. Google: injected into generationConfig.topK.
x >= 0
40
System prompt/instructions. Anthropic: pass-through. Google: converted to systemInstruction. OpenAI: extracted from messages.
"You are a helpful assistant."
Convenience alias for Responses-style instructions. Takes precedence over system and over system-role messages when provided.
"You are a concise assistant."
Google generationConfig object. Merged with auto-generated config. Use for Google-specific params (candidateCount, responseMimeType, etc.).
{
"candidateCount": 2,
"responseMimeType": "application/json"
}
Google safety settings (harm categories and thresholds).
[
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE"
}
]
Google tool configuration (function calling mode, etc.).
{
"function_calling_config": { "mode": "ANY" }
}
Google-only flag to disable the SDK's automatic function execution. When true, the model returns function calls for the client to execute manually.
true
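These Google-specific fields are not part of the standard OpenAI client signature, so they can be forwarded via extra_body. The snake_case field names below are inferred from the descriptions above and should be verified against the request schema; the model identifier is illustrative:

response = client.chat.completions.create(
    model="google/gemini-1.5-pro",  # illustrative model identifier
    messages=[{"role": "user", "content": "Return a JSON summary of this text."}],
    extra_body={
        # Assumed field names; confirm in the request schema.
        "generation_config": {"candidateCount": 2, "responseMimeType": "application/json"},
        "safety_settings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"}
        ],
        "tool_config": {"function_calling_config": {"mode": "ANY"}},
    },
)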
If specified, system will make a best effort to sample deterministically. Determinism is not guaranteed for the same seed across different models or API versions.
42
Stable identifier for your end-users. Helps OpenAI detect and prevent abuse and may boost cache hit rates. This field is being replaced by 'safety_identifier' and 'prompt_cache_key'.
"user-123"
How many chat completion choices to generate for each input message. Keep 'n' as 1 to minimize costs.
1 <= x <= 128
1
If true, the model response data is streamed to the client as it is generated using Server-Sent Events.
true
false
Options for streaming responses. Only set when 'stream' is true (supports 'include_usage' and 'include_obfuscation').
{ "include_usage": true }
An object specifying the format that the model must output. Use {'type': 'json_schema', 'json_schema': {...}} for structured outputs or {'type': 'json_object'} for the legacy JSON mode. Currently only OpenAI-prefixed models honour this field; Anthropic and Google requests will return an invalid_request_error if it is supplied.
{ "type": "text" }
A list of tools the model may call. Supports OpenAI function tools and custom tools; use 'mcp_servers' for Dedalus-managed server-side tools.
[
{
"function": {
"description": "Get current weather for a location",
"name": "get_weather",
"parameters": {
"properties": {
"location": {
"description": "City name",
"type": "string"
}
},
"required": ["location"],
"type": "object"
}
},
"type": "function"
}
]
Controls which (if any) tool is called by the model. 'none' stops tool calling, 'auto' lets the model decide, and 'required' forces at least one tool invocation. Specific tool payloads force that tool.
"auto"
Whether to enable parallel function calling during tool use.
true
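Putting tools, tool_choice, and client-side execution together, a round trip might look like the following sketch, with tools as defined in the example above and get_weather standing in for your own implementation:

import json

def get_weather(location: str) -> dict:
    # Stand-in implementation; replace with a real lookup.
    return {"location": location, "forecast": "sunny", "temp_c": 21}

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat.completions.create(
    model="openai/gpt-4o", messages=messages, tools=tools, tool_choice="auto"
)
message = response.choices[0].message
if message.tool_calls:
    messages.append(message)  # keep the assistant turn that requested the calls
    for call in message.tool_calls:
        result = get_weather(**json.loads(call.function.arguments))
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    # A second request lets the model read the tool results and answer.
    final = client.chat.completions.create(
        model="openai/gpt-4o", messages=messages, tools=tools
    )
    print(final.choices[0].message.content)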
Deprecated in favor of 'tools'. Legacy list of function definitions the model may generate JSON inputs for.
Deprecated in favor of 'tool_choice'. Controls which function is called by the model (none, auto, or specific name).
Whether to return log probabilities of the output tokens. If true, returns the log probabilities for each token in the response content.
true
An integer between 0 and 20 specifying how many of the most likely tokens to return at each position, with log probabilities. Requires 'logprobs' to be true.
0 <= x <= 20
5
An upper bound for the number of tokens that can be generated for a completion, including visible output and reasoning tokens.
x >= 1
1000
Constrains effort on reasoning for supported reasoning models. Higher values use more compute, potentially improving reasoning quality at the cost of latency and tokens.
low, medium, high
"medium"
Parameters for audio output. Required when requesting audio responses (for example, modalities including 'audio').
{ "format": "mp3", "voice": "alloy" }
Output types you would like the model to generate. Most models default to ['text']; some support ['text', 'audio'].
["text"]
Configuration for predicted outputs. Improves response times when you already know large portions of the response content.
Set of up to 16 key-value string pairs that can be attached to the request for structured metadata.
{ "session": "abc", "user_id": "123" }
Whether to store the output of this chat completion request for OpenAI model distillation or eval products. Image inputs over 8MB are dropped if storage is enabled.
true
Specifies the processing tier used for the request. 'auto' uses project defaults, while 'default' forces standard pricing and performance.
auto, default
"auto"
Used by OpenAI to cache responses for similar requests and optimize cache hit rates. Replaces the legacy 'user' field for caching.
Stable identifier used to help detect users who might violate OpenAI usage policies. Consider hashing end-user identifiers before sending.
Constrains the verbosity of the model's response. Lower values produce concise answers, higher values allow more detail.
low, medium, high
Configuration for OpenAI's web search tool. Learn more at https://platform.openai.com/docs/guides/tools-web-search?api-mode=chat.
xAI-specific parameter for configuring web search data acquisition. If not set, no data will be acquired by the model.
xAI-specific parameter. If set to true, the request returns a request_id for async completion retrieval via GET /v1/chat/deferred-completion/{request_id}.
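A rough sketch of the deferred flow over plain HTTP. Only the retrieval path is documented above; the request flag name 'deferred' is an assumption based on xAI's API, and the model identifier is illustrative:

import requests

BASE = "https://api.example.com/v1"  # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# 'deferred' is an assumed flag name; verify against the request schema.
submitted = requests.post(
    f"{BASE}/chat/completions",
    headers=HEADERS,
    json={
        "model": "xai/grok-3",  # illustrative model identifier
        "messages": [{"role": "user", "content": "Hello"}],
        "deferred": True,
    },
)
request_id = submitted.json()["request_id"]

# Poll the documented retrieval endpoint until the completion is ready.
result = requests.get(f"{BASE}/chat/deferred-completion/{request_id}", headers=HEADERS)
print(result.json())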
MCP (Model Context Protocol) server addresses to make available for server-side tool execution. Entries can be URLs (e.g., 'https://mcp.example.com'), slugs (e.g., 'dedalus-labs/brave-search'), or structured objects specifying slug/version/url. MCP tools are executed server-side and billed separately.
[
"dedalus-labs/brave-search",
"dedalus-labs/github-api"
]
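Because mcp_servers is a Dedalus extension rather than a standard OpenAI parameter, the OpenAI SDK can forward it via extra_body (a dedicated Dedalus SDK, if you use one, may accept it directly):

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Find recent news about MCP."}],
    extra_body={"mcp_servers": ["dedalus-labs/brave-search"]},
)
# MCP tools execute server-side; the reply already reflects their results.
print(response.choices[0].message.content)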
Guardrails to apply to the agent for input/output validation and safety checks. Reserved for future use; the guardrails configuration format is not yet finalized.
Configuration for multi-model handoffs and agent orchestration. Reserved for future use; the handoff configuration format is not yet finalized.
Attributes for individual models used in routing decisions during multi-model execution. Format: {'model_name': {'attribute': value}}, where values range from 0.0 to 1.0. Common attributes: 'intelligence', 'speed', 'cost', 'creativity', 'accuracy'. Used by the agent to select the optimal model for the task at hand.
{
"anthropic/claude-3-5-sonnet": {
"cost": 0.7,
"creativity": 0.8,
"intelligence": 0.95
},
"openai/gpt-4": {
"cost": 0.8,
"intelligence": 0.9,
"speed": 0.6
},
"openai/gpt-4o-mini": {
"cost": 0.2,
"intelligence": 0.7,
"speed": 0.9
}
}
Attributes for the agent itself, influencing behavior and model selection. Format: {'attribute': value}, where values are 0.0-1.0. Common attributes: 'complexity', 'accuracy', 'efficiency', 'creativity', 'friendliness'. Higher values indicate stronger preference for that characteristic.
{
"accuracy": 0.9,
"complexity": 0.8,
"efficiency": 0.7
}
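The exact request field names for these two maps are not shown above; assuming they are 'model_attributes' and 'agent_attributes' (verify against the schema), a routing-aware request could look like this sketch:

response = client.chat.completions.create(
    model="openai/gpt-4o",  # starting model; routing may hand off from here
    messages=[{"role": "user", "content": "Summarize this contract clause."}],
    extra_body={
        # Assumed field names; confirm in the request schema.
        "model_attributes": {
            "openai/gpt-4o-mini": {"cost": 0.2, "speed": 0.9, "intelligence": 0.7},
            "anthropic/claude-3-5-sonnet": {"intelligence": 0.95, "cost": 0.7},
        },
        "agent_attributes": {"accuracy": 0.9, "efficiency": 0.7},
    },
)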
Maximum number of turns for agent execution before terminating (default: 10). Each turn represents one model inference cycle. Higher values allow more complex reasoning but increase cost and latency.
1 <= x <= 100
5
When false, the server skips tool execution and returns raw OpenAI-style tool_calls in the response for the client to execute.
true
false
A JSON response, or an SSE stream of ChatCompletionChunk events when 'stream' is true.
Chat completion response for Dedalus API.
OpenAI-compatible chat completion response with Dedalus extensions. Maintains full compatibility with OpenAI API while providing additional features like server-side tool execution tracking and MCP error reporting.
A unique identifier for the chat completion.
A list of chat completion choices. Can be more than one if n is greater than 1.
The Unix timestamp (in seconds) of when the chat completion was created.
The model used for the chat completion.
The object type, which is always chat.completion.
"chat.completion"Specifies the processing type used for serving the request.
When the service_tier parameter is set, the response body will include the service_tier value based on the processing mode actually used to serve the request. This response value may be different from the value set in the parameter.
auto, default, flex, scale, priority
This fingerprint represents the backend configuration that the model runs with.
Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
Usage statistics for the completion request.
List of tool names that were executed server-side (e.g., MCP tools). Only present when tools were executed on the server rather than returned for client-side execution.
Information about MCP server failures, if any occurred during the request. Contains details about which servers failed and why, along with recommendations for the user. Only present when MCP server failures occurred.