POST /v1/chat/completions
TypeScript
const client = new Dedalus();

const result = await client.chat.completions.create({ ...params });
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The next Warriors game is tomorrow at 7:30 PM.",
        "role": "assistant"
      }
    }
  ],
  "created": 1677652288,
  "id": "chatcmpl-123",
  "model": "gpt-4o-mini",
  "object": "chat.completion",
  "tools_executed": [
    "search_events",
    "get_event_details"
  ],
  "usage": {
    "completion_tokens": 12,
    "prompt_tokens": 9,
    "total_tokens": 21
  }
}

Overview

Create a chat completion using any supported LLM provider. This endpoint provides a vendor-agnostic chat completions API that works with thousands of LLMs. It supports MCP integration, multi-model routing with intelligent agentic handoffs, client-side and server-side tool execution, and streaming and non-streaming responses.

Returns

  • ChatCompletion: OpenAI-compatible completion response with usage data

Billing

  • Token usage is billed automatically based on model pricing
  • MCP tool calls are billed separately using the credits system
  • Streaming responses are billed after completion via a background task

Usage Examples

from dedalus_labs import Dedalus

client = Dedalus(api_key="your-api-key")

completion = client.chat.completions.create(
    model="openai/gpt-5-nano",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)

print(completion.choices[0].message.content)

Key Features

  • Multi-Model Routing: Provide multiple models and let the agent choose based on task complexity
  • MCP Integration: Use MCP servers for tool calling
  • Streaming: Get real-time responses with SSE
  • Tool Calling: Execute functions during the conversation

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
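
As a minimal sketch of the header format described above, the following prepares (but does not send) a request using only the Python standard library. The `BASE_URL` value is a placeholder assumption, not a documented host, and `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder host (assumption); substitute the base URL for your account.
BASE_URL = "https://api.example.com"


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Prepare, without sending, a POST /v1/chat/completions request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Header of the form "Bearer <token>", as described above.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "your-api-key",
    {"model": "openai/gpt-5", "messages": [{"role": "user", "content": "Hello"}]},
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Passing the prepared object to `urllib.request.urlopen` would send it; the SDK clients shown elsewhere on this page handle the header for you.
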

Body

application/json

ChatCompletion request schema.

Supports OpenAI-compatible parameters, provider-specific extensions, server-side execution, and agent orchestration features.

model
required

Model identifier string (e.g., 'openai/gpt-5', 'anthropic/claude-3-5-sonnet').

Example:

"openai/gpt-5"

audio
ChatCompletionRequestAudio · object

Parameters for audio output. Required when requesting audio responses with modalities: ['audio']. See: https://platform.openai.com/docs/guides/audio

Example:
{ "format": "mp3", "voice": "alloy" }
frequency_penalty
number | null
default:0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

Required range: -2 <= x <= 2
function_call
string | null

Deprecated in favor of tool_choice. Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"name": "my_function"} forces the model to call that function. none is the default when no functions are present. auto is the default if functions are present.

functions
ChatCompletionFunctions · object[] | null

Deprecated in favor of tools. A list of functions the model may generate JSON inputs for.

logit_bias
Logit Bias · object

Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
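
A small sketch of preparing a `logit_bias` mapping before sending it. The token IDs are made-up placeholders rather than real tokenizer IDs, and `validate_logit_bias` is a hypothetical helper that only enforces the documented -100 to 100 range.

```python
def validate_logit_bias(logit_bias: dict) -> dict:
    """Return a JSON-ready mapping after checking each bias is in [-100, 100]."""
    checked = {}
    for token_id, bias in logit_bias.items():
        if not -100 <= bias <= 100:
            raise ValueError(f"bias for token {token_id} out of range: {bias}")
        # JSON object keys are strings, so token IDs are stringified here.
        checked[str(token_id)] = bias
    return checked


# Placeholder token IDs (assumption): -100 effectively bans a token,
# while a small positive value nudges selection upward.
request_body = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "logit_bias": validate_logit_bias({1234: -100, 5678: 5}),
}
print(request_body["logit_bias"])
```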

logprobs
boolean | null
default:false

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

max_completion_tokens
integer | null

Maximum tokens in completion (newer parameter name)

max_tokens
integer | null

Maximum tokens in completion

Required range: x >= 1
messages
(ChatCompletionRequestDeveloperMessage · object | ChatCompletionRequestSystemMessage · object | ChatCompletionRequestUserMessage · object | ChatCompletionRequestAssistantMessage · object | ChatCompletionRequestToolMessage · object | ChatCompletionRequestFunctionMessage · object)[] | null

Conversation history (OpenAI: messages, Google: contents, Responses: input)

Developer-provided instructions that the model should follow, regardless of messages sent by the user. With o1 models and newer, developer messages replace the previous system messages.

Fields:

  • content (required): str | Annotated[list[ChatCompletionRequestMessageContentPartText], MinLen(1), ArrayTitle("ChatCompletionRequestDeveloperMessageContentArray")]
  • role (required): Literal["developer"]
  • name (optional): str
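
To illustrate the fields listed above, here is a sketch that builds a developer message followed by a user message as plain dictionaries. The helper name `developer_message` is an assumption for illustration; only the `content`, `role`, and `name` fields come from the schema.

```python
from typing import Optional


def developer_message(content: str, name: Optional[str] = None) -> dict:
    """Build a developer message; `name` is included only when provided."""
    message = {"role": "developer", "content": content}
    if name is not None:
        message["name"] = name
    return message


messages = [
    developer_message("Answer in one short sentence."),
    {"role": "user", "content": "Hello, how are you?"},
]
request_body = {"model": "openai/gpt-5", "messages": messages}
print(request_body["messages"][0])
```
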
metadata
object

Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

modalities
string[] | null

Output types that you would like the model to generate. Most models are capable of generating text, which is the default: ["text"]. The gpt-4o-audio-preview model can also be used to generate audio. To request that this model generate both text and audio responses, you can use: ["text", "audio"].

n
integer | null
default:1

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Required range: 1 <= x <= 128
parallel_tool_calls
boolean | null
default:true

Whether to enable parallel tool calls. Anthropic exposes the inverse setting (disable_parallel_tool_use), so the polarity is inverted for that provider.

prediction
PredictionContent · object

Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.

presence_penalty
number | null
default:0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Required range: -2 <= x <= 2
prompt_cache_key
string | null

Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field.

prompt_cache_retention
string | null

The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps cached prefixes active for longer, up to a maximum of 24 hours.

reasoning_effort
string | null
default:medium

Constrains effort on reasoning for reasoning models. Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

  • gpt-5.1 defaults to none, which does not perform reasoning. The supported reasoning values for gpt-5.1 are none, low, medium, and high. Tool calls are supported for all reasoning values in gpt-5.1.
  • All models before gpt-5.1 default to medium reasoning effort, and do not support none.
  • The gpt-5-pro model defaults to (and only supports) high reasoning effort.
  • xhigh is supported for all models after gpt-5.1-codex-max.

response_format
ResponseFormatText · object

Default response format. Used to generate text responses.

Fields:

  • type (required): Literal["text"]
safety_identifier
string | null

A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The ID should be a string that uniquely identifies each user, with a maximum length of 64 characters. Hashing the username or email address is recommended, to avoid sending any identifying information.

Maximum string length: 64
seed
integer | null

Random seed for deterministic output

Required range: 0 <= x <= 9223372036854776000
service_tier
string | null
default:auto

Service tier for request processing

stop

Sequences that stop generation

store
boolean | null
default:false

Whether or not to store the output of this chat completion request for use in our model distillation or evals products. Supports text and image inputs. Note: image inputs over 8MB will be dropped.

stream
boolean | null
default:false

Enable streaming response

stream_options
object

Options for streaming response. Only set this when you set stream: true.
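
When `stream` is true, the response arrives as server-sent events. The sketch below parses such a stream from an iterable of text lines. It assumes the OpenAI streaming conventions (each event is a `data: <json>` line, text lives in `choices[].delta.content`, and the stream ends with `data: [DONE]`); this page does not spell out the wire format, so treat those details as assumptions. The sample lines are fabricated for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunk objects from SSE `data:` lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text deltas carried by the streamed chunks."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


# Illustrative stream, not captured from a real response.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample_stream))
```

In practice the official SDKs expose an iterator over chunks, so manual parsing like this is mainly useful when calling the endpoint over raw HTTP.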

system_instruction

System instruction/prompt

temperature
number | null
default:1

Sampling temperature (0-2 for most providers)

Required range: 0 <= x <= 2
tool_choice
default:auto

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

tools
Tool · object[] | null

Available tools/functions for the model
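
The following sketch pairs a `tools` entry with a `tool_choice` value that forces a specific function, using the object shape quoted in the tool_choice description. The tool definition layout (`type`, `function.name`, `function.description`, `function.parameters`) follows the OpenAI function-tool convention and is an assumption here, as is the hypothetical `get_weather` tool.

```python
def function_tool(name: str, description: str, parameters: dict) -> dict:
    """Describe a callable function in the assumed OpenAI-style tool format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # JSON Schema for the arguments
        },
    }


def force_tool(name: str) -> dict:
    """Build a tool_choice value that forces the model to call `name`."""
    return {"type": "function", "function": {"name": name}}


# Hypothetical tool used only for illustration.
get_weather = function_tool(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

request_body = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "What is the weather in Oakland?"}],
    "tools": [get_weather],
    "tool_choice": force_tool("get_weather"),
}
print(request_body["tool_choice"])
```

Replacing the forced value with "auto", "none", or "required" selects the other behaviors described for tool_choice.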

top_k
integer | null

Top-k sampling parameter

Required range: x >= 0
top_logprobs
integer | null

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

Required range: 0 <= x <= 20
top_p
number | null
default:1

Nucleus sampling threshold

Required range: 0 <= x <= 1
user
string | null

A stable identifier for your end-users, used to boost cache hit rates by better bucketing similar requests and to help OpenAI detect and prevent abuse. This field is being replaced by safety_identifier and prompt_cache_key. Use prompt_cache_key instead to maintain caching optimizations.

verbosity
string | null
default:medium

Constrains the verbosity of the model's response. Lower values will result in more concise responses, while higher values will result in more verbose responses. Currently supported values are low, medium, and high.

web_search_options
object

This tool searches the web for relevant results to use in a response.

cache_control
CacheControlEphemeral · object

Top-level cache control automatically applies a cache_control marker to the last cacheable block in the request.

cached_content
string | null

Optional. The name of the content cached to use as context to serve the prediction. Format: cachedContents/{cachedContent}

container
string | null

Container identifier for reuse across requests.

deferred
boolean | null
default:false

If set to true, the request returns a request_id. You can then retrieve the deferred response with GET /v1/chat/deferred-completion/{request_id}.
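
A brief sketch of the deferred flow: mark the request as deferred, then build the path used to fetch the result. The helper name is hypothetical, and the example ID `req_123` is invented; how the request_id appears in the initial response body is not specified here.

```python
from urllib.parse import quote


def deferred_completion_path(request_id: str) -> str:
    """Return the GET path for a deferred completion, URL-escaping the ID."""
    return f"/v1/chat/deferred-completion/{quote(request_id, safe='')}"


request_body = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Summarize this long report."}],
    "deferred": True,  # ask the server to return a request_id instead
}

# "req_123" is a made-up identifier standing in for the returned request_id.
print(deferred_completion_path("req_123"))
```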

generation_config
object

Generation parameters wrapper (Google-specific)

inference_geo
string | null

Specifies the geographic region for inference processing. If not specified, the workspace's default_inference_geo is used.

output_config
object
prompt_mode
string | null

Allows toggling between the reasoning mode and no system prompt. When set to reasoning the system prompt for reasoning models will be used.

Allowed value: "reasoning"
safe_prompt
boolean | null
default:false

Whether to inject a safety prompt before all conversations.

safety_settings
SafetySetting · object[] | null

Safety/content filtering settings (Google-specific)

search_parameters
object

Set the parameters to be used for searched data. If not set, no data will be acquired by the model.

thinking
ThinkingConfigEnabled · object

Schema for ThinkingConfigEnabled.

Fields:

  • budget_tokens (required): int
  • type (required): Literal["enabled"]
tool_config
object

Tool calling configuration (Google-specific)

mcp_servers

MCP server identifiers. Accepts marketplace slugs, URLs, or MCPServerSpec objects. MCP tools are executed server-side and billed separately.

Example:

"dedalus-labs/example-server"

credentials
Credential · object

Credential for MCP server authentication.

Passed at endpoint level (e.g., chat.completions.create) and matched to MCP servers by connection name. Wire format matches dedalus_mcp.Credential.to_dict().

Example:
{
  "connection_name": "external-service",
  "values": { "api_key": "sk-..." }
}
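
Putting mcp_servers and credentials together, a request body might look like the sketch below. It reuses the example values from this page and assumes that mcp_servers accepts a list of identifiers; the `sk-...` value is the page's own elided placeholder and is left as-is.

```python
import json

request_body = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "When is the next Warriors game?"}],
    # Assumption: identifiers are passed as a list of marketplace slugs or URLs.
    "mcp_servers": ["dedalus-labs/example-server"],
    # Matched to an MCP server by connection name, per the description above.
    "credentials": {
        "connection_name": "external-service",
        "values": {"api_key": "sk-..."},
    },
}

encoded = json.dumps(request_body)
print(encoded)
```
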
guardrails
Guardrails · object[] | null

Content filtering and safety policy configuration.

handoff_config
Handoff Config · object

Configuration for multi-model handoffs.

model_attributes
Model Attributes · object

Model attributes for routing. Maps model IDs to attribute dictionaries with values in [0.0, 1.0].

Example:
{
  "gpt-5": { "accuracy": 0.95, "speed": 0.6 }
}
agent_attributes
Agent Attributes · object

Agent attributes. Values in [0.0, 1.0].

Example:
{ "accuracy": 0.9, "complexity": 0.8 }
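
Since both attribute maps require values in [0.0, 1.0], a client can check them before sending. The sketch below is one way to do that; `check_unit_interval` is a hypothetical helper and the attribute names are taken from the examples above.

```python
def check_unit_interval(attributes: dict) -> dict:
    """Raise ValueError unless every attribute value lies in [0.0, 1.0]."""
    for key, value in attributes.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key}={value} is outside [0.0, 1.0]")
    return attributes


model_attributes = {
    "gpt-5": check_unit_interval({"accuracy": 0.95, "speed": 0.6}),
}
agent_attributes = check_unit_interval({"accuracy": 0.9, "complexity": 0.8})

request_body = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Plan a three-step migration."}],
    "model_attributes": model_attributes,
    "agent_attributes": agent_attributes,
}
print(request_body["model_attributes"])
```
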
max_turns
integer | null

Maximum conversation turns.

Required range: 1 <= x <= 100
Example:

5

automatic_tool_execution
boolean
default:true

Execute tools server-side. If false, returns raw tool calls for manual handling.

correlation_id
string | null

Stable session ID for resuming a previous handoff. Returned by the server on handoff; echo it on the next request to resume.

deferred_calls
DeferredCallResponse · object[] | null

Tier 2 stateless resumption. Deferred tool specs from a previous handoff response, sent back verbatim so the server can resume without Redis.

handoff_mode
boolean | null

Handoff control. null or omitted: auto-detect. true: structured handoff (SDK). false: drop-in (LLM re-run for mixed turns).
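
The resumption fields above can be combined as in this sketch, which builds a follow-up request from a previous response by echoing correlation_id and sending the response's deferred entries back verbatim as deferred_calls. The helper name and the sample response contents are hypothetical; returning client tool results for any pending_tools is not covered on this page and is left out.

```python
def build_resume_request(previous_request: dict, response: dict) -> dict:
    """Copy the previous request and attach the handoff resumption fields."""
    next_request = dict(previous_request)
    if response.get("correlation_id"):
        # Echo the session ID returned by the server.
        next_request["correlation_id"] = response["correlation_id"]
    if response.get("deferred"):
        # Send deferred tool specs back unchanged.
        next_request["deferred_calls"] = response["deferred"]
    return next_request


previous_request = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Book the earliest flight."}],
}
# Made-up response fragment; real deferred entries follow DeferredCallResponse.
handoff_response = {
    "correlation_id": "corr-abc",
    "deferred": [{"id": "call_1"}],
}

resume = build_resume_request(previous_request, handoff_response)
print(resume["correlation_id"], resume["deferred_calls"])
```
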

Response

JSON or SSE stream of ChatCompletionChunk events

Chat completion response for Dedalus API.

OpenAI-compatible chat completion response with Dedalus extensions. Maintains full compatibility with OpenAI API while providing additional features like server-side tool execution tracking and MCP error reporting.

id
string
required

A unique identifier for the chat completion.

choices
Choice · object[]
required

A list of chat completion choices. Can be more than one if n is greater than 1.

created
integer
required

The Unix timestamp (in seconds) of when the chat completion was created.

model
string
required

The model used for the chat completion.

object
string
required

The object type, which is always chat.completion.

Allowed value: "chat.completion"
service_tier
enum<string> | null

Specifies the processing type used for serving the request.

  • If set to 'auto', then the request will be processed with the service tier configured in the Project settings. Unless otherwise configured, the Project will use 'default'.
  • If set to 'default', then the request will be processed with the standard pricing and performance for the selected model.
  • If set to 'flex' or 'priority', then the request will be processed with the corresponding service tier.
  • When not set, the default behavior is 'auto'.

When the service_tier parameter is set, the response body will include the service_tier value based on the processing mode actually used to serve the request. This response value may be different from the value set in the parameter.

Available options:
auto,
default,
flex,
scale,
priority
system_fingerprint
string

This fingerprint represents the backend configuration that the model runs with.

Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
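
One way to use the fingerprint alongside seed is to compare it across responses, as sketched below. The helper is hypothetical and the fingerprint strings are invented; a change only signals that determinism may be affected.

```python
def backend_changed(previous: dict, current: dict) -> bool:
    """Return True when both responses carry fingerprints and they differ."""
    before = previous.get("system_fingerprint")
    after = current.get("system_fingerprint")
    return before is not None and after is not None and before != after


# Invented fingerprint values for illustration.
first = {"id": "chatcmpl-1", "system_fingerprint": "fp_aaa"}
second = {"id": "chatcmpl-2", "system_fingerprint": "fp_bbb"}
print(backend_changed(first, second))
```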

usage
CompletionUsage · object

Usage statistics for the completion request.

tools_executed
string[] | null

List of tool names that were executed server-side (e.g., MCP tools). Only present when tools were executed on the server rather than returned for client-side execution.

mcp_server_errors
Mcp Server Errors · object

MCP server failures keyed by server name.

mcp_tool_results
MCPToolResult · object[] | null

Detailed results of MCP tool executions including inputs, outputs, and timing. Provides full visibility into server-side tool execution for debugging and audit purposes.

correlation_id
string | null

Stable session ID for cross-turn handoff state. Echo this on the next request to resume server-side execution.

server_results
Server Results · object

Completed server tool outputs keyed by call ID.

deferred
DeferredCallResponse · object[] | null

Server tools blocked on client results.

pending_tools
PendingCallResponse · object[] | null

Client tools to execute, with dependency ordering.

turns_consumed
integer | null

Number of internal LLM calls made during this request. SDKs can sum this across their outer loop to track total LLM calls.
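
Summing turns_consumed across an outer loop, as suggested above, could look like this sketch. It treats a missing or null value as zero, which is an assumption about how a client would want to count; the sample responses are fabricated.

```python
from typing import Iterable


def total_llm_calls(responses: Iterable[dict]) -> int:
    """Sum turns_consumed over responses, counting null or missing as 0."""
    return sum(response.get("turns_consumed") or 0 for response in responses)


# Fabricated response fragments from three outer-loop iterations.
responses = [
    {"turns_consumed": 2},
    {"turns_consumed": None},
    {"turns_consumed": 3},
]
print(total_llm_calls(responses))
```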

Last modified on April 9, 2026