Streaming

For chat completions where you want tokens as they're generated — building a chat UI, a CLI that prints word-by-word, or just trimming time-to-first-token — pass stream: true and read the Server-Sent Events. The wire format is identical to OpenAI's, so the SDK does the parsing for you.

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.echotokens.com/v1",
    api_key="sk-echo-...",
)

stream = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[{"role": "user", "content": "Explain the OSI model in plain English."}],
    stream=True,
)

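for chunk in stream:
    # Some chunks (e.g. a trailing usage-only chunk) can arrive with an empty choices list.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()  # finish the line once the stream ends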

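Each chunk is a chat.completion.chunk object rather than a complete message. Its delta.content field holds only the text added since the previous chunk (and can be empty or None), so concatenate the deltas to reconstruct the final message. The final choice chunk carries a finish_reason such as "stop" or "length", and the SDK iterator stops once the server sends the closing data: [DONE] event.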

Node / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.echotokens.com/v1",
  apiKey: process.env.ECHOTOKENS_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-5.5-pro",
  messages: [{ role: "user", content: "Explain the OSI model in plain English." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

Tool calls and structured output

Streaming and function calling work together. The stream carries the same delta.tool_calls chunks you'd get from OpenAI: each tool call's JSON arguments arrive as string fragments spread across several chunks, tagged with the call's index, so concatenate the fragments per index yourself and parse the JSON only after the stream finishes.
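Accumulating the fragments per tool-call index can be sketched as below. This assumes the OpenAI-style delta.tool_calls shape (index, id, function.name, function.arguments); accumulate_tool_calls, _tc_chunk, and the get_weather tool are hypothetical names used only for illustration, and the fake chunks replace a live stream so the example runs offline.

```python
import json
from types import SimpleNamespace
from typing import Dict, Iterable


def accumulate_tool_calls(chunks: Iterable) -> Dict[int, dict]:
    """Merge streamed delta.tool_calls fragments into complete calls, keyed by index."""
    calls: Dict[int, dict] = {}
    for chunk in chunks:
        if not chunk.choices:
            continue
        for tc in chunk.choices[0].delta.tool_calls or []:
            call = calls.setdefault(tc.index, {"id": None, "name": "", "arguments": ""})
            if tc.id:
                call["id"] = tc.id
            fn = tc.function
            if fn is not None:
                if fn.name:
                    call["name"] += fn.name
                if fn.arguments:
                    call["arguments"] += fn.arguments  # partial JSON, not parseable yet
    # Parse only after every fragment has arrived.
    for call in calls.values():
        call["arguments"] = json.loads(call["arguments"] or "{}")
    return calls


# Hypothetical stand-ins for SDK chunks carrying tool-call fragments.
def _tc_chunk(index, call_id=None, name=None, arguments=None):
    fn = SimpleNamespace(name=name, arguments=arguments)
    tc = SimpleNamespace(index=index, id=call_id, function=fn)
    delta = SimpleNamespace(content=None, tool_calls=[tc])
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=None)])


fake = [
    _tc_chunk(0, call_id="call_1", name="get_weather", arguments=""),
    _tc_chunk(0, arguments='{"city": "Par'),
    _tc_chunk(0, arguments='is"}'),
]
tool_calls_by_index = accumulate_tool_calls(fake)
first = tool_calls_by_index[0]
print(first["name"], first["arguments"])  # get_weather {'city': 'Paris'}
```

Keying by index rather than by id matters because, in OpenAI's format, the id and name typically appear only on a call's first fragment while later fragments carry just the index and more argument text.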

Cost during streaming

Streaming responses are billed at the same flat USD passthrough rates as non-streaming calls, but cost_usd_cents is only final once the stream completes. If you cancel mid-stream, the gateway still charges for the upstream tokens delivered before the disconnect, because upstream providers don't refund partial completions. If your UX lets users cancel, budget for the worst case.
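One conservative way to budget is to assume every request runs all the way to its output-token limit, whether or not the user cancels. The sketch below does that arithmetic under stated assumptions: pricing is quoted in USD per million tokens, the output cap is whatever you set with max_tokens, and the $5 / $25 prices are hypothetical placeholders, not EchoTokens rates. worst_case_cost_cents is an illustrative helper, not how the gateway computes cost_usd_cents.

```python
def worst_case_cost_cents(
    prompt_tokens: int,
    max_output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Upper bound in US cents for one request.

    Assumes every prompt token is billed and generation runs to
    max_output_tokens even if the client disconnects early.
    """
    # USD per million tokens -> cents: multiply by 100, divide by 1_000_000,
    # i.e. divide by 10_000.
    return (prompt_tokens * input_usd_per_mtok + max_output_tokens * output_usd_per_mtok) / 10_000


# Hypothetical prices: $5 per 1M input tokens, $25 per 1M output tokens.
bound = worst_case_cost_cents(2_000, 1_000, 5.0, 25.0)
print(bound)  # 3.5
```

Multiplying this bound by the number of requests a user can start (including ones they cancel and retry) gives a ceiling you can enforce with your own spend limits.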

Reconnect on transient errors

If a streaming request fails on a network blip, the OpenAI SDK's automatic retries resend the whole request, so you can be billed for two completions: the interrupted one and the retry. For production agents, turn off SDK retries for streaming calls (for example, max_retries=0 on the Python client) and wrap them in your own resumable-state logic instead: cache the assistant text received so far and replay it when you resume.
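The cache-and-replay idea can be sketched independently of any SDK. In this minimal sketch, stream_with_resume is a hypothetical helper and start is a hypothetical callable you supply: it opens a new stream given the text received so far and yields text deltas. How you replay that text (for example, by including it as context in the follow-up request) is up to your application and is not a documented gateway feature. With the real SDK you would also create the client with max_retries=0 and add openai.APIConnectionError (plus whatever your HTTP stack raises on a mid-stream drop) to transient, since that SDK exception is not a subclass of ConnectionError.

```python
from typing import Callable, Iterable, List, Tuple, Type


def stream_with_resume(
    start: Callable[[str], Iterable[str]],
    max_attempts: int = 3,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> str:
    """Consume a text stream, caching every delta received so far.

    On a transient error, start() is called again with the cached text so the
    caller can replay it instead of regenerating the whole answer.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    received: List[str] = []
    attempt = 0
    while True:
        attempt += 1
        try:
            for piece in start("".join(received)):
                received.append(piece)
            return "".join(received)
        except transient:
            if attempt >= max_attempts:
                raise  # out of attempts; surface the last error


# Hypothetical stand-in for a real stream: drops the connection once, then finishes.
attempt_prefixes: List[str] = []


def fake_start(prefix: str):
    attempt_prefixes.append(prefix)
    if len(attempt_prefixes) == 1:
        yield "Hel"
        yield "lo"
        raise ConnectionError("network blip")
    yield ", world"


result = stream_with_resume(fake_start)
print(result)  # Hello, world
print(attempt_prefixes)  # ['', 'Hello']
```

The second call receives "Hello", the text already delivered, which is what lets a real implementation avoid discarding the partial answer it has already paid for.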

When to stream vs. not

  • Stream if your user is watching the output in real time (chat UIs, terminals).
  • Don't stream if you're going to parse the entire response anyway (structured JSON for an agent, embeddings, summarization pipelines). Non-streaming responses arrive as a single JSON body, and errors come back as an HTTP status code instead of partway through the output.