Embeddings
Vector embeddings for semantic search, clustering, recommendation, and retrieval-augmented generation. The endpoint is the standard OpenAI embeddings.create — the gateway routes by model, so you can swap between providers (OpenAI, Voyage, Cohere) without changing your code.
Single input
from openai import OpenAI

client = OpenAI(
    base_url="https://api.echotokens.com/v1",
    api_key="sk-echo-...",
)

result = client.embeddings.create(
    model="text-embedding-3-large",
    input="The quick brown fox jumps over the lazy dog.",
)

vector = result.data[0].embedding
print(len(vector))            # 3072
print(result.cost_usd_cents)  # tiny; embeddings are cheap
The response shape mirrors OpenAI: result.data is an array of { index, embedding } objects in input order, plus a top-level usage block and the cost_usd_cents passthrough field.
Batch input
Embedding endpoints accept arrays of strings — batching is dramatically faster and cheaper than one-call-per-input because the upstream amortizes the model load and you pay one HTTP round-trip:
texts = [
    "The cat sat on the mat.",
    "TCP guarantees ordered, reliable delivery.",
    "espresso machines are cathedral engineering.",
]

result = client.embeddings.create(
    model="text-embedding-3-large",
    input=texts,
)

vectors = [item.embedding for item in result.data]
# vectors[i] corresponds to texts[i]
Batches of up to a few hundred items work well in practice. For tens of thousands of documents, split them into batches and parallelize the requests on the client.
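One way to split a large corpus into batches and run the requests in parallel is sketched below. The helper names (chunked, make_embedder, embed_all) and the defaults batch_size=256 and max_workers=4 are illustrative assumptions, not part of the gateway or the OpenAI SDK; tune them against your own rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]
EmbedBatch = Callable[[List[str]], List[Vector]]


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def make_embedder(client, model: str) -> EmbedBatch:
    """Wrap client.embeddings.create as a 'batch in, vectors out' function."""
    def embed_batch(batch: List[str]) -> List[Vector]:
        result = client.embeddings.create(model=model, input=batch)
        return [item.embedding for item in result.data]
    return embed_batch


def embed_all(
    texts: Sequence[str],
    embed_batch: EmbedBatch,
    batch_size: int = 256,   # assumed default, not a gateway limit
    max_workers: int = 4,    # assumed default, not a gateway limit
) -> List[Vector]:
    """Embed every text; the returned vectors are in the same order as texts."""
    batches = chunked(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order, so batch order is kept
        per_batch = list(pool.map(embed_batch, batches))
    return [vector for vectors in per_batch for vector in vectors]
```

With the client from the earlier examples, a call would look like embed_all(texts, make_embedder(client, "text-embedding-3-large")). Threads suit this workload because each request spends most of its time waiting on the network.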
Dimension control
text-embedding-3-large and text-embedding-3-small support a dimensions parameter: shorter vectors use less storage and compare faster, at a small accuracy cost. This is useful when you're pushing to a vector DB priced per dimension:
result = client.embeddings.create(
    model="text-embedding-3-large",
    input="hello world",
    dimensions=512,
)

print(len(result.data[0].embedding))  # 512
If you pass dimensions to a model that doesn't support it (e.g. voyage-3-large), the gateway returns a 400 — the upstream's native dimensionality is the only option there.
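To put the storage side of the dimensions trade-off in numbers, here is a rough back-of-the-envelope calculation. It assumes vectors are stored as 4-byte float32 values with no index overhead or compression; the function name raw_vector_bytes and the one-million-document corpus are made up for illustration, and real vector databases will differ.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as raw float32


def raw_vector_bytes(num_vectors: int, dims: int) -> int:
    """Raw storage for num_vectors embeddings of dims float32 values each."""
    return num_vectors * dims * BYTES_PER_FLOAT32


corpus_size = 1_000_000  # hypothetical corpus

full = raw_vector_bytes(corpus_size, 3072)   # native text-embedding-3-large size
short = raw_vector_bytes(corpus_size, 512)   # with dimensions=512

print(f"3072 dims: {full / 1e9:.3f} GB")     # 12.288 GB
print(f" 512 dims: {short / 1e9:.3f} GB")    # 2.048 GB
print(f"reduction: {full / short:.0f}x")     # 6x
```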
Picking a model
text-embedding-3-large: OpenAI's 3072-dim flagship. Strong general-purpose default.
text-embedding-3-small: 1536 dims, cheaper, fast. Good for high-volume indexing.
voyage-3-large: best-in-class for retrieval; native 1024 dims, code/legal/multilingual variants.
cohere-embed-v4: multilingual specialist; pairs well with Cohere's reranker.
Vectors from different embedding models live in different geometric spaces — you cannot compare a voyage-3-large vector against a text-embedding-3-large one. Pick a model up front and stick with it for the lifetime of your index. If you need to migrate, re-embed everything in a single batch job.
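One way to keep vectors from different models from being compared by accident is to store the model name next to each vector and check it before computing similarity. The sketch below is a minimal illustration of that idea; TaggedVector and cosine_similarity are hypothetical helpers, not part of the gateway or any SDK.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaggedVector:
    """An embedding together with the name of the model that produced it."""
    model: str
    values: List[float]


def cosine_similarity(a: TaggedVector, b: TaggedVector) -> float:
    """Cosine similarity that refuses to mix embedding spaces."""
    if a.model != b.model:
        raise ValueError(f"cannot compare {a.model!r} with {b.model!r}")
    if len(a.values) != len(b.values):
        raise ValueError("vectors have different dimensions")
    dot = sum(x * y for x, y in zip(a.values, b.values))
    norm_a = math.sqrt(sum(x * x for x in a.values))
    norm_b = math.sqrt(sum(y * y for y in b.values))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Storing the tag also helps during a migration: vectors that still carry the old model name are exactly the ones left to re-embed.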