Streaming shows output token-by-token instead of waiting for the complete response. Users see progress immediately, which matters for longer outputs or interactive applications.

Stream in one line

Set stream=True so users see progress as the agent works.
import asyncio
from dedalus_labs import AsyncDedalus, DedalusRunner
from dedalus_labs.utils.stream import stream_async
from dotenv import load_dotenv

load_dotenv()

async def main():
    client = AsyncDedalus()
    runner = DedalusRunner(client)

    stream = runner.run(
        input="Find me the nearest basketball games in January in San Francisco (stream your work).",
        model="anthropic/claude-opus-4-5",
        mcp_servers=["windsor/ticketmaster-mcp"],  # Discover events via Ticketmaster
        stream=True,
    )

    await stream_async(stream)

if __name__ == "__main__":
    asyncio.run(main())
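
The stream_async helper prints the stream for you. If you want control over rendering (a custom UI, logging, or buffering), you can consume the stream yourself. Here is a minimal sketch of that pattern; the fake_stream generator below is a stand-in for the object returned by runner.run(..., stream=True), and the assumption that each chunk is a plain text delta is ours — check your SDK version for the actual chunk shape.

```python
import asyncio

async def fake_stream():
    """Stand-in for runner.run(..., stream=True): yields text deltas."""
    for piece in ["Found ", "3 games ", "in January."]:
        await asyncio.sleep(0)  # simulate network latency between chunks
        yield piece

async def render(stream) -> str:
    """Print deltas as they arrive and return the assembled text."""
    parts = []
    async for delta in stream:
        print(delta, end="", flush=True)  # progressive "typing" effect
        parts.append(delta)
    print()
    return "".join(parts)

if __name__ == "__main__":
    full_text = asyncio.run(render(fake_stream()))
```

Because you own the loop, you can route deltas anywhere — a terminal, a websocket, a UI component — instead of printing.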

Streaming with Tools

Streaming works with tool-calling workflows. You can stream while the agent calls local tools, MCPs, or both.
import asyncio

from dedalus_labs import AsyncDedalus, DedalusRunner
from dedalus_labs.utils.stream import stream_async
from dotenv import load_dotenv

load_dotenv()

def summarize_headlines(headlines: list[str]) -> str:
    """Format headlines as a short bullet list."""
    return "\n".join(f"• {h}" for h in headlines[:3])

async def main():
    client = AsyncDedalus()
    runner = DedalusRunner(client)

    stream = runner.run(
        input=(
            "Search for AI news. Extract 3 headlines. "
            "Then call summarize_headlines(headlines) and stream your final answer."
        ),
        model="openai/gpt-5.2",
        mcp_servers=["windsor/brave-search-mcp"],  # Web search via Brave Search MCP
        tools=[summarize_headlines],
        stream=True,
    )

    await stream_async(stream)

if __name__ == "__main__":
    asyncio.run(main())

Compare: non-streaming vs streaming (same scenario)

The snippets below run the same scenario as the streaming example in Stream in one line. The only difference is that they omit stream=True and print the result once the full run completes.
In Python, non-streaming refers to stream=False, not “sync”. If you use AsyncDedalus, you’ll still write async code and use asyncio.run(...). If you prefer fully synchronous code, use the Dedalus client (example below).

Python

import asyncio
from dedalus_labs import AsyncDedalus, DedalusRunner
from dotenv import load_dotenv

load_dotenv()

async def main():
    client = AsyncDedalus()
    runner = DedalusRunner(client)

    result = await runner.run(
        input="Find me the nearest basketball games in January in San Francisco.",
        model="anthropic/claude-opus-4-5",
        mcp_servers=["windsor/ticketmaster-mcp"],  # Discover events via Ticketmaster
    )

    # You only see output after the full run completes.
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

Python (sync client)

from dedalus_labs import Dedalus, DedalusRunner
from dotenv import load_dotenv

load_dotenv()

def main():
    client = Dedalus()
    runner = DedalusRunner(client)

    result = runner.run(
        input="Find me the nearest basketball games in January in San Francisco.",
        model="anthropic/claude-opus-4-5",
        mcp_servers=["windsor/ticketmaster-mcp"],  # Discover events via Ticketmaster
    )

    print(result.final_output)

if __name__ == "__main__":
    main()

TypeScript

import Dedalus from "dedalus-labs";
import { DedalusRunner } from "dedalus-labs";

const client = new Dedalus();
const runner = new DedalusRunner(client, true);

async function main() {
	const result = await runner.run({
		input: "Find me the nearest basketball games in January in San Francisco.",
		model: "anthropic/claude-opus-4-5",
		mcpServers: ["windsor/ticketmaster-mcp"], // Discover events via Ticketmaster
	});

	console.log((result as any).finalOutput);
}

main();

How the user experience differs

  • Progressive rendering: you can display text as it arrives (“typing”), instead of waiting for a complete response.
  • Visible work: in tool/MCP workflows, you can show status updates (e.g., “Searching Ticketmaster…”) while the agent is calling tools.
  • Interruptibility: you can stop early (client-side) if the user already has what they need, instead of paying for a full completion.
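
The interruptibility point can be sketched: stop consuming once you have what you need. The fake_stream generator below stands in for a streamed response; whether breaking out of the loop actually stops server-side generation depends on the transport, so treat early exit as a client-side best effort.

```python
import asyncio

async def fake_stream():
    """Stand-in for a streamed response: yields text deltas."""
    for piece in ["First answer. ", "Extra detail. ", "More detail. "]:
        yield piece

async def read_until(stream, stop_marker: str) -> str:
    """Consume the stream, but stop as soon as the marker appears."""
    seen = ""
    async for delta in stream:
        seen += delta
        if stop_marker in seen:
            break  # client-side early exit; later chunks are never rendered
    return seen

if __name__ == "__main__":
    print(asyncio.run(read_until(fake_stream(), "answer.")))
```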

When to Stream

Stream when:
  • Building chat interfaces where perceived latency matters
  • Generating long-form content (articles, code, analysis)
  • Running in terminals or logs where progress feedback helps
Don’t stream when:
  • You need to parse the complete response before displaying
  • Using structured outputs with .parse()
  • Response time is already fast enough
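
When you need both progress feedback and the complete response (for example, JSON you can only parse once it is whole), a middle ground is to display deltas as they arrive while accumulating them, then parse at the end. A sketch with a stand-in stream (fake_stream is hypothetical, not an SDK API):

```python
import asyncio
import json

async def fake_stream():
    """Stand-in stream that emits a JSON object in fragments."""
    for piece in ['{"headl', 'ines": ["AI ', 'news"]}']:
        yield piece

async def stream_and_collect(stream) -> str:
    """Show progress as deltas arrive, then return the complete text."""
    buffer = []
    async for delta in stream:
        print(delta, end="", flush=True)  # live feedback while streaming
        buffer.append(delta)
    print()
    return "".join(buffer)

if __name__ == "__main__":
    raw = asyncio.run(stream_and_collect(fake_stream()))
    data = json.loads(raw)  # parse only once the stream is complete
    print(data["headlines"])
```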

Next steps

  • Route across models: Handoffs — Use fast/strong models by phase
  • Add images last: Images & Vision — Add multimodality when your text workflow is solid
  • See patterns: Use Cases — More streaming agent examples
Last modified on February 28, 2026