POST /v1/chat/completions

Create Chat Completion
curl --request POST \
  --url https://api.dedaluslabs.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "content": "Hello, how are you?",
      "role": "user"
    }
  ],
  "input": "Translate this paragraph into French.",
  "temperature": 0,
  "top_p": 0.1,
  "max_tokens": 100,
  "presence_penalty": -0.5,
  "frequency_penalty": -0.5,
  "logit_bias": {
    "50256": -100
  },
  "stop": [
    "\n",
    "END"
  ],
  "thinking": {
    "budget_tokens": 2048,
    "type": "enabled"
  },
  "top_k": 40,
  "system": "You are a helpful assistant.",
  "instructions": "You are a concise assistant.",
  "generation_config": {
    "candidateCount": 2,
    "responseMimeType": "application/json"
  },
  "safety_settings": [
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_NONE"
    }
  ],
  "tool_config": {
    "function_calling_config": {
      "mode": "ANY"
    }
  },
  "disable_automatic_function_calling": true,
  "seed": 42,
  "user": "user-123",
  "n": 1,
  "stream": false,
  "stream_options": {
    "include_usage": true
  },
  "response_format": {
    "type": "text"
  },
  "tools": [
    {
      "function": {
        "description": "Get current weather for a location",
        "name": "get_weather",
        "parameters": {
          "properties": {
            "location": {
              "description": "City name",
              "type": "string"
            }
          },
          "required": [
            "location"
          ],
          "type": "object"
        }
      },
      "type": "function"
    }
  ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "functions": [
    {}
  ],
  "function_call": "<string>",
  "logprobs": true,
  "top_logprobs": 5,
  "max_completion_tokens": 1000,
  "reasoning_effort": "medium",
  "audio": {
    "format": "mp3",
    "voice": "alloy"
  },
  "modalities": [
    "text"
  ],
  "prediction": {},
  "metadata": {
    "session": "abc",
    "user_id": "123"
  },
  "store": true,
  "service_tier": "auto",
  "prompt_cache_key": "<string>",
  "safety_identifier": "<string>",
  "verbosity": "low",
  "web_search_options": {},
  "search_parameters": {},
  "deferred": true,
  "mcp_servers": [
    "dedalus-labs/brave-search",
    "dedalus-labs/github-api"
  ],
  "guardrails": [
    {}
  ],
  "handoff_config": {},
  "model_attributes": {
    "anthropic/claude-3-5-sonnet": {
      "cost": 0.7,
      "creativity": 0.8,
      "intelligence": 0.95
    },
    "openai/gpt-4": {
      "cost": 0.8,
      "intelligence": 0.9,
      "speed": 0.6
    },
    "openai/gpt-4o-mini": {
      "cost": 0.2,
      "intelligence": 0.7,
      "speed": 0.9
    }
  },
  "agent_attributes": {
    "accuracy": 0.9,
    "complexity": 0.8,
    "efficiency": 0.7
  },
  "max_turns": 5,
  "auto_execute_tools": true
}
'
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The next Warriors game is tomorrow at 7:30 PM.",
        "role": "assistant"
      }
    }
  ],
  "created": 1677652288,
  "id": "chatcmpl-123",
  "model": "gpt-4o-mini",
  "object": "chat.completion",
  "tools_executed": [
    "search_events",
    "get_event_details"
  ],
  "usage": {
    "completion_tokens": 12,
    "prompt_tokens": 9,
    "total_tokens": 21
  }
}

Authorizations

Authorization
string
header
required

API key authentication using a Bearer token.

Body

application/json

Chat completion request (OpenAI-compatible).

Stateless chat completion endpoint. For stateful conversations with threads, use the Responses API instead.

model
required

Model identifier string (e.g., 'openai/gpt-5', 'anthropic/claude-3-5-sonnet').

Example:

"openai/gpt-4o"

messages
required

Conversation history. Accepts either a list of message objects or a string, which is treated as a single user message.

Example:
[
  {
    "content": "Hello, how are you?",
    "role": "user"
  }
]
input

Convenience alias for Responses-style input. Used when messages is omitted to provide the user prompt directly.

Example:

"Translate this paragraph into French."

temperature
number | null

What sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. We generally recommend altering this or 'top_p' but not both.

Required range: 0 <= x <= 2
Example:

0

top_p
number | null

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or 'temperature' but not both.

Required range: 0 <= x <= 1
Example:

0.1

max_tokens
integer | null

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API. This value is now deprecated in favor of 'max_completion_tokens' and is not compatible with o-series models.

Required range: x >= 1
Example:

100

presence_penalty
number | null

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Required range: -2 <= x <= 2
Example:

-0.5

frequency_penalty
number | null

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

Required range: -2 <= x <= 2
Example:

-0.5

logit_bias
Logit Bias · object

Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object mapping token IDs (as strings) to bias values from -100 to 100. The bias is added to the logits before sampling; values between -1 and 1 nudge selection probability, while values like -100 or 100 effectively ban or require a token.

Example:
{ "50256": -100 }
stop
string[] | null

Up to 4 sequences where the API will stop generating further tokens; the returned text will not contain the stop sequence. Not supported with the latest reasoning models 'o3' and 'o4-mini'.
Example:
["\n", "END"]
thinking
ThinkingConfigEnabled | ThinkingConfigDisabled · object

Anthropic extended thinking configuration. Pass {'type': 'enabled', 'budget_tokens': N} to allocate a reasoning budget, or {'type': 'disabled'} to turn extended thinking off.

Fields:

  • type (required): Literal['enabled'] | Literal['disabled']
  • budget_tokens (required when type is 'enabled'): integer
Example:
{ "budget_tokens": 2048, "type": "enabled" }
top_k
integer | null

Top-k sampling. Anthropic: pass-through. Google: injected into generationConfig.topK.

Required range: x >= 0
Example:

40

system

System prompt/instructions. Anthropic: pass-through. Google: converted to systemInstruction. OpenAI: extracted from messages.

Example:

"You are a helpful assistant."

instructions

Convenience alias for Responses-style instructions. Takes precedence over system and over system-role messages when provided.

Example:

"You are a concise assistant."

generation_config
Generation Config · object

Google generationConfig object. Merged with auto-generated config. Use for Google-specific params (candidateCount, responseMimeType, etc.).

Example:
{
  "candidateCount": 2,
  "responseMimeType": "application/json"
}
safety_settings
Safety Settings · object[] | null

Google safety settings (harm categories and thresholds).

Example:
[
  {
    "category": "HARM_CATEGORY_HARASSMENT",
    "threshold": "BLOCK_NONE"
  }
]
tool_config
Tool Config · object

Google tool configuration (function calling mode, etc.).

Example:
{
  "function_calling_config": { "mode": "ANY" }
}
disable_automatic_function_calling
boolean | null

Google-only flag to disable the SDK's automatic function execution. When true, the model returns function calls for the client to execute manually.

Example:

true

seed
integer | null

If specified, the system will make a best effort to sample deterministically. Determinism is not guaranteed for the same seed across different models or API versions.

Example:

42

user
string | null

Stable identifier for your end-users. Helps OpenAI detect and prevent abuse and may boost cache hit rates. This field is being replaced by 'safety_identifier' and 'prompt_cache_key'.

Example:

"user-123"

n
integer | null

How many chat completion choices to generate for each input message. Keep 'n' as 1 to minimize costs.

Required range: 1 <= x <= 128
Example:

1

stream
boolean
default:false

If true, the model response data is streamed to the client as it is generated using Server-Sent Events.

Examples:

true

false

stream_options
Stream Options · object

Options for streaming responses. Only set when 'stream' is true (supports 'include_usage' and 'include_obfuscation').

Example:
{ "include_usage": true }
response_format
Response Format · object

An object specifying the format that the model must output. Use {'type': 'json_schema', 'json_schema': {...}} for structured outputs or {'type': 'json_object'} for the legacy JSON mode. Currently only OpenAI-prefixed models honor this field; Anthropic and Google requests will return an invalid_request_error if it is supplied.

Example:
{ "type": "text" }
tools
Tools · object[] | null

A list of tools the model may call. Supports OpenAI function tools and custom tools; use 'mcp_servers' for Dedalus-managed server-side tools.

Example:
[
  {
    "function": {
      "description": "Get current weather for a location",
      "name": "get_weather",
      "parameters": {
        "properties": {
          "location": {
            "description": "City name",
            "type": "string"
          }
        },
        "required": ["location"],
        "type": "object"
      }
    },
    "type": "function"
  }
]
tool_choice

Controls which (if any) tool is called by the model. 'none' stops tool calling, 'auto' lets the model decide, and 'required' forces at least one tool invocation. Passing a specific tool payload forces the model to call that tool.

Example:

"auto"

parallel_tool_calls
boolean | null

Whether to enable parallel function calling during tool use.

Example:

true

functions
Functions · object[] | null

Deprecated in favor of 'tools'. Legacy list of function definitions the model may generate JSON inputs for.

function_call

Deprecated in favor of 'tool_choice'. Controls which function is called by the model (none, auto, or specific name).

logprobs
boolean | null

Whether to return log probabilities of the output tokens. If true, returns the log probabilities for each token in the response content.

Example:

true

top_logprobs
integer | null

An integer between 0 and 20 specifying how many of the most likely tokens to return at each position, with log probabilities. Requires 'logprobs' to be true.

Required range: 0 <= x <= 20
Example:

5
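
A sketch of reading per-token log probabilities from a parsed response, assuming the OpenAI logprobs shape (choices[i].logprobs.content is a list of token entries; 'data' is the decoded JSON body of a request sent with logprobs enabled):

# Request included: {"logprobs": true, "top_logprobs": 5, ...}
for entry in data["choices"][0]["logprobs"]["content"]:
    alternatives = {alt["token"]: alt["logprob"] for alt in entry["top_logprobs"]}
    print(entry["token"], entry["logprob"], alternatives)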

max_completion_tokens
integer | null

An upper bound for the number of tokens that can be generated for a completion, including visible output and reasoning tokens.

Required range: x >= 1
Example:

1000

reasoning_effort
enum<string> | null

Constrains effort on reasoning for supported reasoning models. Higher values use more compute, potentially improving reasoning quality at the cost of latency and tokens.

Available options:
low,
medium,
high
Example:

"medium"

audio
Audio · object

Parameters for audio output. Required when requesting audio responses (for example, modalities including 'audio').

Example:
{ "format": "mp3", "voice": "alloy" }
modalities
string[] | null

Output types you would like the model to generate. Most models default to ['text']; some support ['text', 'audio'].

Example:
["text"]
prediction
Prediction · object

Configuration for predicted outputs. Improves response times when you already know large portions of the response content.

metadata
Metadata · object

Set of up to 16 key-value string pairs that can be attached to the request for structured metadata.

Example:
{ "session": "abc", "user_id": "123" }
store
boolean | null

Whether to store the output of this chat completion request for OpenAI model distillation or eval products. Image inputs over 8MB are dropped if storage is enabled.

Example:

true

service_tier
enum<string> | null

Specifies the processing tier used for the request. 'auto' uses project defaults, while 'default' forces standard pricing and performance.

Available options:
auto,
default
Example:

"auto"

prompt_cache_key
string | null

Used by OpenAI to cache responses for similar requests and optimize cache hit rates. Replaces the legacy 'user' field for caching.

safety_identifier
string | null

Stable identifier used to help detect users who might violate OpenAI usage policies. Consider hashing end-user identifiers before sending.

verbosity
enum<string> | null

Constrains the verbosity of the model's response. Lower values produce concise answers, higher values allow more detail.

Available options:
low,
medium,
high
web_search_options
Web Search Options · object

Configuration for OpenAI's web search tool. Learn more at https://platform.openai.com/docs/guides/tools-web-search?api-mode=chat.

search_parameters
Search Parameters · object

xAI-specific parameter for configuring web search data acquisition. If not set, no data will be acquired by the model.

deferred
boolean | null

xAI-specific parameter. If set to true, the request returns a request_id for async completion retrieval via GET /v1/chat/deferred-completion/{request_id}.
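
A sketch of the deferred flow: submit with 'deferred' set to true, then poll the GET endpoint named above (the 'request_id' response field comes from the description; the model ID, status-code handling, and polling cadence are assumptions):

import os
import time
import requests

BASE = "https://api.dedaluslabs.ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['DEDALUS_API_KEY']}"}  # hypothetical env var

submit = requests.post(
    f"{BASE}/chat/completions",
    headers=HEADERS,
    json={"model": "xai/grok-4", "messages": "Hello!", "deferred": True},  # model ID illustrative
    timeout=30,
)
request_id = submit.json()["request_id"]

while True:
    poll = requests.get(f"{BASE}/chat/deferred-completion/{request_id}",
                        headers=HEADERS, timeout=30)
    if poll.status_code == 200:  # assumed: 200 once the completion is ready
        print(poll.json()["choices"][0]["message"]["content"])
        break
    time.sleep(2)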

mcp_servers

MCP (Model Context Protocol) server addresses to make available for server-side tool execution. Entries can be URLs (e.g., 'https://mcp.example.com'), slugs (e.g., 'dedalus-labs/brave-search'), or structured objects specifying slug/version/url. MCP tools are executed server-side and billed separately.

Example:
[
  "dedalus-labs/brave-search",
  "dedalus-labs/github-api"
]
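
A minimal sketch of a request that attaches Dedalus-managed MCP tools; 'tools_executed' in the response (see the Response section) reports which server-side tools ran. DEDALUS_API_KEY is a hypothetical environment variable:

import os
import requests

resp = requests.post(
    "https://api.dedaluslabs.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['DEDALUS_API_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Find recent news about MCP."}],
        "mcp_servers": ["dedalus-labs/brave-search"],
    },
    timeout=120,
)
data = resp.json()
print(data["choices"][0]["message"]["content"])
print(data.get("tools_executed"))  # names of server-side tools that ran, if any
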
guardrails
Guardrails · object[] | null

Guardrails to apply to the agent for input/output validation and safety checks. Reserved for future use - guardrails configuration format not yet finalized.

handoff_config
Handoff Config · object

Configuration for multi-model handoffs and agent orchestration. Reserved for future use - handoff configuration format not yet finalized.

model_attributes
Model Attributes · object

Attributes for individual models used in routing decisions during multi-model execution. Format: {'model_name': {'attribute': value}}, where values are 0.0-1.0. Common attributes: 'intelligence', 'speed', 'cost', 'creativity', 'accuracy'. Used by agent to select optimal model based on task requirements.

Example:
{
  "anthropic/claude-3-5-sonnet": {
    "cost": 0.7,
    "creativity": 0.8,
    "intelligence": 0.95
  },
  "openai/gpt-4": {
    "cost": 0.8,
    "intelligence": 0.9,
    "speed": 0.6
  },
  "openai/gpt-4o-mini": {
    "cost": 0.2,
    "intelligence": 0.7,
    "speed": 0.9
  }
}
agent_attributes
Agent Attributes · object

Attributes for the agent itself, influencing behavior and model selection. Format: {'attribute': value}, where values are 0.0-1.0. Common attributes: 'complexity', 'accuracy', 'efficiency', 'creativity', 'friendliness'. Higher values indicate stronger preference for that characteristic.

Example:
{
  "accuracy": 0.9,
  "complexity": 0.8,
  "efficiency": 0.7
}
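
A sketch combining 'agent_attributes' with 'model_attributes' so routing can weigh candidate models against the agent's preferences (weights are illustrative, taken from the examples above):

body = {
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "Plan a database migration."}],
    "agent_attributes": {"accuracy": 0.9, "complexity": 0.8, "efficiency": 0.7},
    "model_attributes": {
        "openai/gpt-4": {"intelligence": 0.9, "speed": 0.6, "cost": 0.8},
        "openai/gpt-4o-mini": {"intelligence": 0.7, "speed": 0.9, "cost": 0.2},
    },
}
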
max_turns
integer | null

Maximum number of turns for agent execution before terminating (default: 10). Each turn represents one model inference cycle. Higher values allow more complex reasoning but increase cost and latency.

Required range: 1 <= x <= 100
Example:

5

auto_execute_tools
boolean
default:true

When false, server-side tool execution is skipped and raw OpenAI-style tool_calls are returned in the response for the client to execute.

Examples:

true

false
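
With 'auto_execute_tools' set to false, tool calls come back for the client to run. A sketch of one round-trip, assuming the OpenAI tool_calls message shape; get_weather is the tool from the 'tools' example above, and its local implementation here is hypothetical:

import json
import os
import requests

URL = "https://api.dedaluslabs.ai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['DEDALUS_API_KEY']}"}  # hypothetical env var

def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # hypothetical local implementation

tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string", "description": "City name"}},
                   "required": ["location"]}}}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
body = {"model": "openai/gpt-4o", "messages": messages,
        "tools": tools, "auto_execute_tools": False}
msg = requests.post(URL, headers=HEADERS, json=body, timeout=60).json()["choices"][0]["message"]

if msg.get("tool_calls"):  # assumed OpenAI-compatible shape
    messages.append(msg)
    for call in msg["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        messages.append({"role": "tool", "tool_call_id": call["id"],
                         "content": get_weather(**args)})
    msg = requests.post(URL, headers=HEADERS,
                        json={"model": "openai/gpt-4o", "messages": messages},
                        timeout=60).json()["choices"][0]["message"]

print(msg["content"])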

Response

JSON or SSE stream of ChatCompletionChunk events

Chat completion response for Dedalus API.

OpenAI-compatible chat completion response with Dedalus extensions. Maintains full compatibility with OpenAI API while providing additional features like server-side tool execution tracking and MCP error reporting.

id
string
required

A unique identifier for the chat completion.

choices
Choice · object[]
required

A list of chat completion choices. Can be more than one if n is greater than 1.

created
integer
required

The Unix timestamp (in seconds) of when the chat completion was created.

model
string
required

The model used for the chat completion.

object
string
required

The object type, which is always chat.completion.

Allowed value: "chat.completion"
service_tier
enum<string> | null

Specifies the processing type used for serving the request.

  • If set to 'auto', then the request will be processed with the service tier configured in the Project settings. Unless otherwise configured, the Project will use 'default'.
  • If set to 'default', then the request will be processed with the standard pricing and performance for the selected model.
  • If set to 'flex' or 'priority', then the request will be processed with the corresponding service tier.
  • When not set, the default behavior is 'auto'.

When the service_tier parameter is set, the response body will include the service_tier value based on the processing mode actually used to serve the request. This response value may be different from the value set in the parameter.

Available options:
auto,
default,
flex,
scale,
priority
system_fingerprint
string

This fingerprint represents the backend configuration that the model runs with.

Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.

usage
CompletionUsage · object

Usage statistics for the completion request.

tools_executed
string[] | null

List of tool names that were executed server-side (e.g., MCP tools). Only present when tools were executed on the server rather than returned for client-side execution.

mcp_server_errors
Mcp Server Errors · object

Information about MCP server failures, if any occurred during the request. Contains details about which servers failed and why, along with recommendations for the user. Only present when MCP server failures occurred.
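
A short sketch of reading the standard fields plus the Dedalus extensions from a parsed response ('data' is the decoded JSON body; the internal structure of 'mcp_server_errors' is not specified above, so it is only surfaced, not parsed):

choice = data["choices"][0]
print(choice["message"]["content"], choice["finish_reason"])

usage = data.get("usage") or {}
print(usage.get("prompt_tokens"), usage.get("completion_tokens"), usage.get("total_tokens"))

# Dedalus extensions, present only when relevant:
for tool_name in data.get("tools_executed") or []:
    print("server-side tool ran:", tool_name)
if data.get("mcp_server_errors"):
    print("MCP server failures:", data["mcp_server_errors"])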