Streaming
For chat completions where you want tokens as they're generated (a chat UI, a CLI that prints word by word, or simply a shorter time-to-first-token), pass stream: true and read the response as Server-Sent Events. The wire format is identical to OpenAI's, so the official OpenAI SDKs do the parsing for you.
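Here is what the SDK is parsing for you. Each event is one or more data: lines holding a JSON chunk. A blank line ends the event, and the stream finishes with data: [DONE]. The sketch below parses already-decoded text lines that way. It is a simplified illustration, not a replacement for the SDK's parser: it skips comments and ignores event:, id:, and retry: fields. The function name iter_sse_chunks and the sample chunks are made up for the example.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each JSON chunk from an OpenAI-style SSE stream (simplified)."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if data_lines:
                payload = "\n".join(data_lines)  # multi-line data joins with a newline
                data_lines = []
                if payload == "[DONE]":
                    return  # end-of-stream sentinel
                yield json.loads(payload)
            continue
        if line.startswith(":"):
            continue  # SSE comment, e.g. a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            data_lines.append(value)
        # event:, id:, and retry: fields are ignored in this sketch.


sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}\n',
    "\n",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}\n',
    "\n",
    "data: [DONE]\n",
    "\n",
]
for chunk in iter_sse_chunks(sample):
    print(chunk["choices"][0]["delta"])
# {'content': 'Hel'}
# {'content': 'lo'}
```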
Python
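```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.echotokens.com/v1",
    api_key="sk-echo-...",
)

stream = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[{"role": "user", "content": "Explain the OSI model in plain English."}],
    stream=True,
)

for chunk in stream:
    # Skip chunks with no choices, e.g. the trailing usage chunk sent when
    # stream_options={"include_usage": True} is set.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content  # text added since the last chunk
    if delta:
        print(delta, end="", flush=True)
print()  # end the line once the stream finishes
```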
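Each chunk carries only an increment, not the message so far. delta.content holds the text added since the previous chunk, and it can be empty or missing, for example on the first role-only chunk and on the last one. Concatenate the deltas to reconstruct the full message. The last chunk for a choice sets finish_reason (such as "stop", or "length" if max_tokens cut the reply short). On the wire, the stream then closes with a data: [DONE] event. The SDK consumes that event, and the iterator simply stops.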
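Reconstructing the message is a fold over the deltas. This minimal sketch works on chunks as decoded JSON dicts. With the SDK you'd read the same fields as attributes, such as chunk.choices[0].delta.content. It assumes a single choice (the default n=1) and uses invented sample chunks. collect_message is a hypothetical helper name.

```python
from typing import Iterable, List, Optional, Tuple


def collect_message(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Concatenate delta.content across chunks; return (text, finish_reason)."""
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        if not chunk.get("choices"):
            continue  # e.g. a trailing usage-only chunk
        choice = chunk["choices"][0]  # assumes n=1
        content = (choice.get("delta") or {}).get("content")
        if content:  # role-only and final chunks may carry no text
            parts.append(content)
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "The OSI model "}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "has seven layers."}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
text, reason = collect_message(chunks)
print(text)    # The OSI model has seven layers.
print(reason)  # stop
```

A finish_reason of "length" means the reply hit max_tokens, so the concatenated text is truncated.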
Node / TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.echotokens.com/v1",
  apiKey: process.env.ECHOTOKENS_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-5.5-pro",
  messages: [{ role: "user", content: "Explain the OSI model in plain English." }],
  stream: true,
});

for await (const chunk of stream) {
  // Optional chaining skips chunks with no choices or no text.
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
```
Tool calls and structured output
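Streaming and function calling compose. Tool calls arrive as delta.tool_calls entries in the same shape OpenAI streams them. The first fragment of each call carries its index, id, and function.name. The function.arguments JSON string then arrives in pieces over later chunks. Accumulate those strings per index yourself, and parse each one into an object only after the stream ends (typically with finish_reason "tool_calls").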
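The following is a minimal accumulator for that pattern. It assumes the fragment shape OpenAI uses for streamed tool calls: an index on every fragment, id and function.name up front, and function.arguments split at arbitrary points. It works on decoded JSON dicts. The helper name accumulate_tool_calls, the get_weather tool, and its arguments are invented for illustration.

```python
import json
from typing import Any, Dict, Iterable, List


def accumulate_tool_calls(chunks: Iterable[dict]) -> List[Dict[str, Any]]:
    """Merge streamed delta.tool_calls fragments into complete tool calls."""
    calls: Dict[int, Dict[str, Any]] = {}  # keyed by the fragment's "index"
    for chunk in chunks:
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta") or {}
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(frag["index"], {"id": None, "name": None, "arguments": ""})
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                call["name"] = fn["name"]
            # Arguments are a JSON string split at arbitrary points: append, don't parse yet.
            call["arguments"] += fn.get("arguments") or ""
    # The stream has ended, so each argument string should now be complete JSON.
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"] or "{}")}
        for _, c in sorted(calls.items())
    ]


chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city": '}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Paris"}'}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
print(accumulate_tool_calls(chunks))
# [{'id': 'call_1', 'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```

Keying on index rather than arrival order matters when the model emits parallel tool calls, because their fragments can share a chunk.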
Cost during streaming
Streaming responses bill at the same flat-USD passthrough rates as non-streaming calls, but cost_usd_cents is only final once the stream completes. If you cancel mid-stream, the gateway still charges for the upstream tokens delivered before the disconnect, because upstream providers don't refund partial completions. If your UX lets users cancel, budget for the worst case rather than for the point where users usually stop.
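A rough upper bound for the worst case is the prompt plus a completion that runs all the way to max_tokens, because cancelling can only cut the completion short. The sketch assumes per-million-token input and output prices and rounds up to whole cents for budgeting. worst_case_cost_cents is a hypothetical helper, and the prices in the example are placeholders, not EchoTokens' actual rates.

```python
from decimal import ROUND_CEILING, Decimal
from typing import Union

Price = Union[str, Decimal]  # str/Decimal rather than float, to avoid binary rounding


def worst_case_cost_cents(
    prompt_tokens: int,
    max_tokens: int,
    input_usd_per_mtok: Price,
    output_usd_per_mtok: Price,
) -> int:
    """Upper bound, in whole US cents, for one call capped at max_tokens."""
    usd = (
        prompt_tokens * Decimal(input_usd_per_mtok)
        + max_tokens * Decimal(output_usd_per_mtok)
    ) / Decimal(1_000_000)
    # Round up: a budget should never under-count.
    return int((usd * 100).to_integral_value(rounding=ROUND_CEILING))


# 2,000 prompt tokens at $5/Mtok plus up to 4,000 output tokens at $25/Mtok:
# $0.01 + $0.10 = $0.11
print(worst_case_cost_cents(2_000, 4_000, "5", "25"))  # 11
```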
If your stream drops on a network blip, nothing resumes it. A retry, whether the SDK's automatic retry or your own, re-sends the whole request, so you pay for the partial completion that dropped and for the full retry. For production agents, wrap streaming calls in your own resumable-state logic (cache the assistant text as it arrives and replay it on resume) rather than relying on SDK retries.
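A minimal sketch of that caching, under several assumptions. stream_with_checkpoint and build_resume_messages are hypothetical helpers. ConnectionError stands in for whatever exception your HTTP stack actually raises on a dropped stream. Asking the model to continue from its own partial text is an untested strategy, so check that your model continues cleanly before relying on it. The approach pairs with turning off the SDK's own retries, for example OpenAI(..., max_retries=0) in Python.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple, Type


def stream_with_checkpoint(
    deltas: Iterable[Optional[str]],
    on_text: Callable[[str], None],
    drop_errors: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Tuple[str, bool]:
    """Consume text deltas, caching everything received.

    Returns (cached_text, completed). On a dropped stream the partial text
    is kept instead of discarded, and completed is False.
    """
    parts: List[str] = []
    try:
        for delta in deltas:
            if delta:
                parts.append(delta)  # cache first...
                on_text(delta)  # ...then show it to the user
    except drop_errors:
        return "".join(parts), False
    return "".join(parts), True


def build_resume_messages(
    messages: List[Dict[str, str]], partial: str
) -> List[Dict[str, str]]:
    """Hypothetical resume request: replay the cached text, ask to continue."""
    return messages + [
        {"role": "assistant", "content": partial},
        {"role": "user", "content": "Continue exactly where you stopped. Do not repeat anything."},
    ]


def flaky_stream():
    # Simulated stream that drops partway through.
    yield "The OSI model "
    yield None
    yield "has seven"
    raise ConnectionError("simulated network blip")


shown: List[str] = []
text, completed = stream_with_checkpoint(flaky_stream(), shown.append)
print(repr(text), completed)  # 'The OSI model has seven' False
if not completed:
    resume = build_resume_messages(
        [{"role": "user", "content": "Explain the OSI model in plain English."}], text
    )
```

The resumed request is still a second billed call. Caching means the user keeps the text they already saw instead of watching it regenerate.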
When to stream vs. not
- Stream if your user is watching the output in real time (chat UIs, terminals).
- Don't stream if you're going to parse the entire response anyway (structured JSON for an agent, embeddings, summarization pipelines). A non-streaming response arrives as one complete JSON body. Any error comes back as a normal HTTP error response rather than cutting off a stream you've already started consuming.