
Embeddings

Embeddings are numerical representations of text strings that capture the relationships between them. They make it possible to measure how similar two text strings are, and are often used for the following tasks:

  • Search (results are sorted by relevance to the query)
  • Clustering (grouping text strings by similarity)
  • Recommendations (recommending items with similar text strings)
  • Anomaly detection (finding items that differ significantly from others)
  • Diversity measurement (analyzing similarity distribution)
  • Classification (classifying text strings based on their similarity to labels)

An embedding is a vector (list) of numbers. The distance between two vectors measures their degree of similarity: a small distance indicates high similarity, and a large distance indicates low similarity.
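To make the idea of distance concrete, here is a minimal sketch that computes cosine distance between vectors. Cosine distance is one common choice for comparing embeddings; the text above does not say which metric to use, so treat it as an assumption. The three-dimensional vectors and the names `soup`, `stew`, and `car` are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Return 1 - cosine similarity: near 0 for similar vectors, larger otherwise."""
    return 1.0 - cosine_similarity(a, b)


# Toy vectors, invented for this example (not real embeddings).
soup = [0.9, 0.1, 0.0]
stew = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print("soup vs stew:", cosine_distance(soup, stew))  # small distance, similar
print("soup vs car: ", cosine_distance(soup, car))   # large distance, dissimilar
```

The pair pointing in nearly the same direction gets a distance close to 0, while the unrelated pair gets a distance close to 1.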

Creating Embeddings for Single and Multiple Text Objects

# pip install langchain-openai  (run this first if the package is not installed yet)


from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(
    model="CompressaEmbeddings",
    base_url="http://localhost:5000/v1",
    api_key="Your_API_key_Compressa",
)


# Create an embedding for a single query (returns one list of floats)
query_embedding = embeddings_model.embed_query("How to cook borscht?")

# Create embeddings for multiple documents (returns one list of floats per document)
docs_embeddings = embeddings_model.embed_documents([
    "Borscht is a traditional Slavic soup",
    "Beets are needed to make borscht",
    "Borscht is usually served with sour cream",
    "Meat is often added to borscht",
    "Borscht has a characteristic red color",
])
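Once the query and document embeddings exist, a search task amounts to sorting documents by their similarity to the query. The sketch below shows one way this could be done with cosine similarity. The helper `rank_documents` is hypothetical (it is not part of LangChain), and the two-dimensional vectors are toy values that stand in for real embeddings so the example runs without a server.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, documents):
    """Hypothetical helper: return (score, document) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vec, vec), doc)
        for vec, doc in zip(doc_vecs, documents)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


documents = [
    "Beets are needed to make borscht",
    "Borscht has a characteristic red color",
    "Meat is often added to borscht",
]

# Toy stand-ins for embeddings, invented for this example.
toy_query = [1.0, 0.0]
toy_docs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

ranked = rank_documents(toy_query, toy_docs, documents)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

With a running embeddings server, the toy vectors would be replaced by `query_embedding` and `docs_embeddings` from the example above, together with the list of document strings in the same order.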