Embeddings

Vector embeddings for semantic search, clustering, recommendation, and retrieval-augmented generation. The endpoint is the standard OpenAI embeddings.create. The gateway routes by model, so you can swap between providers (OpenAI, Voyage, Cohere) by changing only the model name.

Single input

from openai import OpenAI

client = OpenAI(
    base_url="https://api.echotokens.com/v1",
    api_key="sk-echo-...",
)

result = client.embeddings.create(
    model="text-embedding-3-large",
    input="The quick brown fox jumps over the lazy dog.",
)

vector = result.data[0].embedding
print(len(vector))         # 3072
print(result.cost_usd_cents)  # tiny — embeddings are cheap

The response shape mirrors OpenAI: result.data is an array of { index, embedding } objects in input order, plus a top-level usage block and the cost_usd_cents passthrough field.
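As a rough illustration of that shape, the sketch below walks a hand-written dictionary that stands in for a parsed response body. Every value in it is made up, and the usage keys (prompt_tokens, total_tokens) are assumed to follow OpenAI's embeddings response. The helper name embeddings_in_input_order is hypothetical.

```python
# Stand-in for a parsed response body; all values are illustrative only.
# The items are deliberately out of order to show what the sort does.
response = {
    "data": [
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 0, "embedding": [0.1, 0.2]},
    ],
    "usage": {"prompt_tokens": 12, "total_tokens": 12},
    "cost_usd_cents": 0.0002,
}


def embeddings_in_input_order(body):
    """Return the embedding vectors sorted by their `index` field."""
    ordered = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]


vectors = embeddings_in_input_order(response)
print(len(vectors), response["usage"]["total_tokens"])
```

Since result.data already arrives in input order, sorting by index is only a defensive habit, useful if items are ever merged from several responses.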

Batch input

Embedding endpoints accept arrays of strings. Batching is faster and cheaper than one call per input because the upstream amortizes the model load across the batch and you pay for a single HTTP round-trip:

texts = [
    "The cat sat on the mat.",
    "TCP guarantees ordered, reliable delivery.",
    "espresso machines are cathedral engineering.",
]

result = client.embeddings.create(
    model="text-embedding-3-large",
    input=texts,
)

vectors = [item.embedding for item in result.data]
# vectors[i] corresponds to texts[i]

Batches up to a few hundred items work well in practice. For tens of thousands of documents, split into batches and parallelize the requests at the client.
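One way to split and parallelize is sketched below. It assumes a hypothetical embed_batch callable that takes a list of strings and returns one vector per string. The batch size of 200 and the four worker threads are arbitrary starting points, not gateway recommendations. A fake embedder is used so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_all(texts, embed_batch, batch_size=200, max_workers=4):
    """Embed texts in parallel batches, preserving input order.

    `embed_batch` is a hypothetical callable: list[str] -> list[list[float]].
    """
    batches = chunked(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order the batches were submitted,
        # so the flattened output lines up with `texts`.
        results = list(pool.map(embed_batch, batches))
    return [vector for batch in results for vector in batch]


# Fake embedder for offline runs: one number per text (its length).
def fake_embed_batch(batch):
    return [[float(len(text))] for text in batch]


docs = [f"doc {n}" for n in range(450)]
vectors = embed_all(docs, fake_embed_batch)
print(len(vectors))
```

In real use, embed_batch would wrap client.embeddings.create(model=..., input=batch) and return [item.embedding for item in result.data]. Retry and rate-limit handling are left out here.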

Dimension control

text-embedding-3-large and text-embedding-3-small support a dimensions parameter — shorter vectors use less storage and compare faster at a small accuracy cost. Useful when you're pushing to a vector DB priced per-dimension:

result = client.embeddings.create(
    model="text-embedding-3-large",
    input="hello world",
    dimensions=512,
)

print(len(result.data[0].embedding))  # 512

If you pass dimensions to a model that doesn't support it (e.g. voyage-3-large), the gateway returns a 400 — the upstream's native dimensionality is the only option there.
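To put the storage saving in numbers, here is a back-of-the-envelope sketch. It assumes vectors are stored as 32-bit floats and ignores index overhead and metadata, both of which vary by vector DB. The helper name index_size_bytes is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def index_size_bytes(num_vectors, dimensions):
    """Raw vector storage only, ignoring index overhead and metadata."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


full = index_size_bytes(1_000_000, 3072)   # native text-embedding-3-large
short = index_size_bytes(1_000_000, 512)   # with dimensions=512
print(full, short, full // short)
```

Under these assumptions, a million vectors take about 12.3 GB at 3072 dimensions and about 2 GB at 512, a sixfold reduction.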

Picking a model

  • text-embedding-3-large — OpenAI's 3072-dim flagship. Strong general-purpose default.
  • text-embedding-3-small — 1536 dims, cheaper, fast. Good for high-volume indexing.
  • voyage-3-large — best-in-class for retrieval; native 1024 dims, code/legal/multilingual variants.
  • cohere-embed-v4 — multilingual specialist; pairs well with Cohere's reranker.

Match models across writes and reads

Vectors from different embedding models live in different geometric spaces — you cannot compare a voyage-3-large vector against a text-embedding-3-large one. Pick a model up front and stick with it for the lifetime of your index. If you need to migrate, re-embed everything in a single batch job.
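One way to enforce this in application code is to store the model name next to each vector and refuse to compare across models. The sketch below is a minimal illustration. The record layout (a dict with model and vector keys) and the compare helper are assumptions for this example, not part of the gateway.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors have different dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare(record_a, record_b):
    """Compare two stored records, refusing to mix embedding models."""
    if record_a["model"] != record_b["model"]:
        raise ValueError(
            f"model mismatch: {record_a['model']} vs {record_b['model']}"
        )
    return cosine_similarity(record_a["vector"], record_b["vector"])


# Toy records; real vectors have hundreds or thousands of dimensions.
doc = {"model": "text-embedding-3-large", "vector": [1.0, 0.0]}
query = {"model": "text-embedding-3-large", "vector": [1.0, 0.0]}
other = {"model": "voyage-3-large", "vector": [0.0, 1.0]}

print(compare(doc, query))
```

Calling compare(doc, other) raises ValueError instead of returning a meaningless score. The dimension check alone would not catch every mix-up, because two different models can share the same output length.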