
Embeddings

Embeddings are numerical representations of text strings that capture the relationships between them. Because they make it possible to measure how similar two text strings are, text embeddings are often used for the following tasks:

  • Search (results are sorted by relevance to the query)
  • Clustering (grouping text strings by similarity)
  • Recommendations (recommending items with similar text strings)
  • Anomaly detection (finding elements that differ significantly from others)
  • Diversity measurement (analyzing similarity distribution)
  • Classification (classifying text strings based on their similarity to labels)

An embedding is a vector (list) of numbers. The distance between two vectors measures their similarity: a small distance indicates high similarity, while a large distance indicates low similarity.
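
As a minimal sketch of this idea, the snippet below computes cosine distance, one common choice of metric (the text above does not say which distance is meant, so this is an assumption). The three-dimensional vectors are invented for illustration; real embeddings typically have hundreds or thousands of dimensions.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors, made up for this example only.
soup = [0.9, 0.1, 0.0]
stew = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_distance(soup, stew))  # small distance: vectors point the same way
print(cosine_distance(soup, car))   # large distance: vectors are nearly orthogonal
```

Cosine distance ignores vector length and compares direction only, which is why it is a frequent default for text embeddings; Euclidean distance would be an equally valid choice for this illustration.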

Creating embeddings for single and multiple text objects

# Requires the openai package (pip install openai) if you don't have it yet
from openai import OpenAI

client = OpenAI(
    api_key="Your_Compressa_API_key",
    base_url="https://compressa-api.mil-team.ru/v1",
)

# Creating an embedding for a single query
embedding = client.embeddings.create(
    model="Compressa-Embedding",
    input="How to cook borscht?",
    encoding_format="float",
)

# Creating embeddings for multiple documents
docs = [
    "Borscht is a traditional Slavic soup",
    "To make borscht you need beets",
    "Borscht is usually served with sour cream",
    "Meat is often added to borscht",
    "Borscht has a characteristic red color",
]

embeddings = client.embeddings.create(
    model="Compressa-Embedding",
    input=docs,
    encoding_format="float",
)
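
To use embeddings like these for search, the vectors can be extracted from the responses and the documents ranked by similarity to the query. The sketch below assumes the service returns the standard OpenAI-compatible response shape, where each item in `response.data` carries an `embedding` attribute. `rank_by_similarity` is a hypothetical helper written for this example, not part of the API, and the toy vectors stand in for real embeddings so the snippet runs without a network call.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# With real responses (assumed OpenAI-compatible shape), the vectors would be:
#   query_vec = embedding.data[0].embedding
#   doc_vecs = [item.embedding for item in embeddings.data]
# Toy stand-ins, invented for illustration:
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]

order = rank_by_similarity(query_vec, doc_vecs)
print(order)  # [1, 2, 0]
```

The returned indices can be mapped back to the original `docs` list to present results sorted by relevance to the query.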