Embeddings
Embeddings are numerical representations of text strings that capture how the strings relate to one another. Because they make it possible to measure the similarity between text strings, they are often used for the following tasks:
- Search (ranking results by relevance to the query)
- Clustering (grouping text strings by similarity)
- Recommendations (recommending items with similar text strings)
- Anomaly detection (finding elements that differ significantly from others)
- Diversity measurement (analyzing similarity distribution)
- Classification (classifying text strings based on their similarity to labels)
An embedding is a vector (list) of numbers. The distance between two vectors measures their similarity: a small distance indicates high similarity, while a large distance indicates low similarity.
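As a rough illustration of "distance between two vectors", here is a minimal sketch that uses cosine distance. Cosine distance is only one common choice and is an assumption here, since the text above does not name a specific metric. The three-dimensional vectors are made up for illustration; real embeddings have many more dimensions.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors (hypothetical values, not produced by any model)
soup = [0.9, 0.1, 0.0]
stew = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

# Similar items end up closer together than unrelated ones
print(cosine_distance(soup, stew) < cosine_distance(soup, car))
```

With these toy values, "soup" is closer to "stew" than to "car", which is the behavior the paragraph above describes.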
Creating embeddings for single and multiple text objects
Python (OpenAI client)
# pip install openai  (if you don't have this package yet)
from openai import OpenAI

# The client talks to the Compressa endpoint instead of the default OpenAI one
client = OpenAI(
    api_key="Your_Compressa_API_key",
    base_url="https://compressa-api.mil-team.ru/v1",
)

# Creating an embedding for a single query
embedding = client.embeddings.create(
    model="Compressa-Embedding",
    input="How to cook borscht?",
    encoding_format="float",
)

# Creating embeddings for multiple documents
docs = [
    "Borscht is a traditional Slavic soup",
    "To make borscht you need beets",
    "Borscht is usually served with sour cream",
    "Meat is often added to borscht",
    "Borscht has a characteristic red color",
]
embeddings = client.embeddings.create(
    model="Compressa-Embedding",
    input=docs,
    encoding_format="float",
)
cURL
curl -X 'POST' \
  'https://compressa-api.mil-team.ru/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer Your_Compressa_API_key' \
  -d '{
    "model": "Compressa-Embedding",
    "input": ["text_one", "text_two"]
  }'
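Once the vectors are available, a search-style ranking can be sketched as follows. This is a minimal example, not a definitive implementation: `cosine_similarity` and `rank_documents` are hypothetical helpers introduced here, and the commented lines assume the response follows the OpenAI-compatible shape (`.data[i].embedding`). Toy vectors stand in for real embeddings so the block runs without an API key.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs, docs):
    """Return (score, document) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), doc)
              for vec, doc in zip(doc_vecs, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# With the responses from the Python example above, the vectors would
# typically be read like this (assumption: OpenAI-compatible response shape):
#   query_vec = embedding.data[0].embedding
#   doc_vecs = [item.embedding for item in embeddings.data]

# Toy data (hypothetical values) so the sketch is runnable on its own
toy_docs = ["doc about soup", "doc about cars"]
toy_vecs = [[0.9, 0.1], [0.1, 0.9]]
ranked = rank_documents([1.0, 0.0], toy_vecs, toy_docs)

for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

For larger collections, a vector database or a NumPy-based implementation would usually replace the plain-Python loop.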
To dive deeper into embeddings and understand how semantic search works technically, check out our practical guide.