Streaming

For chat completions where you want tokens as they're generated (building a chat UI, a CLI that prints word-by-word, or just trimming time-to-first-token), pass stream: true (stream=True in Python) and read the Server-Sent Events. The wire format is identical to OpenAI's, so the OpenAI SDK does the parsing for you.

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.echotokens.com/v1",
    api_key="sk-echo-...",
)

stream = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[{"role": "user", "content": "Explain the OSI model in plain English."}],
    stream=True,
)

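for chunk in stream:
    # Some chunks (e.g. a trailing usage chunk) carry no choices; skip them.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)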

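Each chunk is a partial completion. The delta.content field holds the text added since the previous chunk; concatenate the pieces to reconstruct the final message. The field can be empty or None (for example on the final chunk, which carries only finish_reason), which is why the loop checks it before printing. As in OpenAI's format, the wire stream closes with a data: [DONE] event that the SDK consumes for you; in code, the stream is over when the SDK iterator stops.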
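As a minimal sketch of the concatenation described above, the helper below joins the delta.content pieces from any iterable of chunks. The name collect_text is hypothetical and not part of the SDK; it only assumes each chunk exposes choices[0].delta.content the way the loop above does.

```python
from typing import Iterable


def collect_text(chunks: Iterable) -> str:
    """Join the delta.content pieces of a chat-completion stream."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices (for example a trailing usage chunk).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        # Role-only, tool-call, and final chunks may have no text content.
        if piece:
            parts.append(piece)
    return "".join(parts)
```

Calling collect_text(stream) in place of the print loop returns the full assistant message once the iterator stops.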

Node / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.echotokens.com/v1",
  apiKey: process.env.ECHOTOKENS_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-5.5-pro",
  messages: [{ role: "user", content: "Explain the OSI model in plain English." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

Tool calls and structured output

Streaming and function-calling compose. The same delta.tool_calls chunks appear in the stream as you'd see from OpenAI — partial JSON arguments arrive across multiple chunks, so accumulate the strings yourself before parsing the final object.
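One way to do that accumulation, sketched under the assumption that the chunks follow OpenAI's streaming shape (each tool-call fragment carries an index, an optional id, and a function with optional name and arguments strings). The helper name accumulate_tool_calls is hypothetical.

```python
import json


def accumulate_tool_calls(chunks):
    """Merge streamed delta.tool_calls fragments into complete calls."""
    calls = {}  # tool-call index -> {"id", "name", "arguments"}
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        for frag in getattr(delta, "tool_calls", None) or []:
            call = calls.setdefault(
                frag.index, {"id": None, "name": "", "arguments": ""}
            )
            if frag.id:
                call["id"] = frag.id
            if frag.function and frag.function.name:
                call["name"] += frag.function.name
            if frag.function and frag.function.arguments:
                # Arguments arrive as pieces of one JSON string.
                call["arguments"] += frag.function.arguments
    # Parse only after the stream has ended; partial strings are not valid JSON.
    return [
        {
            "id": c["id"],
            "name": c["name"],
            "arguments": json.loads(c["arguments"] or "{}"),
        }
        for _, c in sorted(calls.items())
    ]
```

Keying on the fragment index rather than the id matters because, in OpenAI's format, the id is typically present only on the first fragment of each call.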

Cost during streaming

Streaming responses bill at the same flat-USD passthrough as non-streaming calls, but cost_usd_cents only finalizes once the stream completes. If you cancel mid-stream, the gateway still charges for the upstream tokens delivered before the disconnect; upstream providers don't refund partial completions. Budget for the worst case if you build cancellation into your UX.
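A rough way to budget for that worst case is to assume the model emits every token the request allows. The sketch below is plain arithmetic, not a gateway API: the function name and the per-million-token prices are hypothetical placeholders, and the output cap is assumed to be whatever you pass as max_tokens.

```python
def worst_case_cost_cents(prompt_tokens, max_output_tokens,
                          input_cents_per_mtok, output_cents_per_mtok):
    """Upper bound on one call's cost, in cents, if every allowed token is emitted."""
    input_cost = prompt_tokens * input_cents_per_mtok / 1_000_000
    output_cost = max_output_tokens * output_cents_per_mtok / 1_000_000
    return input_cost + output_cost


# Hypothetical prices: 300 cents per million input tokens,
# 1500 cents per million output tokens.
bound = worst_case_cost_cents(2_000, 1_000, 300, 1_500)
```

With these made-up prices the bound comes to about 2.1 cents per call; a cancelled stream costs at most that, and usually less.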

Reconnect on transient errors

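If your stream drops on a network blip, nothing resumes from where it stopped. A retry, whether it comes from the OpenAI SDK's automatic retry logic or from your own code, resends the whole request, so you can be billed twice for two completions. For production agents, wrap streaming calls with your own resumable-state logic (cache the assistant text, replay on resume) rather than relying on SDK retries. Setting max_retries=0 on the client (maxRetries: 0 in Node) turns the SDK's automatic retries off.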
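A minimal sketch of the cache-and-replay idea, under stated assumptions: start_stream is a hypothetical caller-supplied function that takes a message list and returns an iterable of chunks, "replay" is interpreted here as sending the cached text back as an assistant message with a request to continue, and retry_on should be set to whatever exceptions your client actually raises on a dropped connection. None of this is a gateway or SDK feature.

```python
def stream_with_resume(start_stream, messages, max_attempts=3,
                       retry_on=(ConnectionError,)):
    """Stream a completion, caching text so a dropped stream can be resumed."""
    cached = []  # assistant text received so far, across all attempts
    for attempt in range(max_attempts):
        request = list(messages)
        if cached:
            # Replay the partial answer and ask the model to continue it.
            request.append({"role": "assistant", "content": "".join(cached)})
            request.append(
                {"role": "user", "content": "Continue exactly where you left off."}
            )
        try:
            for chunk in start_stream(request):
                if chunk.choices and chunk.choices[0].delta.content:
                    cached.append(chunk.choices[0].delta.content)
            return "".join(cached)
        except retry_on:
            # Give up after the last attempt; otherwise resume from the cache.
            if attempt == max_attempts - 1:
                raise
    return "".join(cached)
```

Each resume is still a new billed request, and the continuation may not join the cached text seamlessly, but the output already received is not generated a second time.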

When to stream vs. not

Stream if your user is watching the output in real time (chat UIs, terminals).
Don't stream if you're going to parse the entire response anyway (structured JSON for an agent, embeddings, summarization pipelines). A non-streaming response arrives as one complete body, so a failure surfaces as a single error rather than as a half-delivered message.