
LLM

The Compressa platform includes a ready-made module for fast and cost-effective inference of open-source LLMs on your server. We have already applied the best optimization techniques at the infrastructure level, so you save on costs and improve user experience.

As part of the cloud version, we provide a test API for one of the current open-source models (e.g., Qwen or Llama), subject to load limitations.

You can interact with the LLM through our Python integration for LangChain or through a direct cURL request.

In addition, our API is OpenAI-compatible; see the separate page for details.
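Because the endpoint follows the standard OpenAI chat-completions schema, a request body is just a plain JSON object. The sketch below assembles one; the helper function and its defaults are illustrative (not part of any Compressa SDK), and the model name and key placeholder are the ones used elsewhere on this page:

```python
def build_chat_payload(model, system_prompt, user_prompt,
                       max_tokens=150, temperature=0.7):
    """Assemble a standard OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_payload(
    "Compressa-LLM",
    "You are a helpful assistant.",
    "I love programming.",
)
print(payload["model"])          # Compressa-LLM
print(len(payload["messages"]))  # 2
```

The same dict can be passed as the `json=` argument of `requests.post` or sent with cURL via `-d`.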

Calling the model without streaming

# pip install langchain-openai - if you don't have this package yet
# pip install langchain-core - if you don't have this package yet
# pip install langchain - if you don't have this package yet

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="Compressa-LLM",
    base_url="https://compressa-api.mil-team.ru/v1",
    api_key="Your_Compressa_API_key",
    temperature=0.7,
    max_tokens=150,
    streaming=False,
)

messages = [
    ("system", "You are a helpful assistant who translates from Russian to English. Translate the user's sentence."),
    ("human", "I love programming."),
]

ai_msg = llm.invoke(messages)
print(f"Model response: {ai_msg.content}")

# Model response: I love programming.

Calling the model with response streaming

import requests

response = requests.post(
    url="https://compressa-api.mil-team.ru/v1/chat/completions",
    headers={
        "Authorization": "Bearer Your_Compressa_API_key",
        "accept": "application/json",
        "Content-Type": "application/json",
    },
    json={
        "model": "Compressa-LLM",
        "messages": [
            {
                "role": "system",
                "content": "You are an expert in soccer"
            },
            {
                "role": "user",
                "content": "Write a bedtime story about a kind artificial intelligence!"
            }
        ],
        "max_tokens": 512,
        "temperature": 0.7,
        "stream": True,
    },
    stream=True,  # keep the HTTP connection open so chunks arrive as they are generated
)

for chunk in response.iter_content(chunk_size=None):
    if chunk:
        print(chunk.decode("utf-8"))

# Example data:
# data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"logprobs":null,"finish_reason":null}],"usage":null}
# data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":" upon"},"logprobs":null,"finish_reason":null}],"usage":null}
# ...
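The stream arrives as server-sent events: each line beginning with `data:` carries one JSON chunk, and the generated text sits under `choices[0].delta.content`. A minimal parser sketch, assuming the chunk format shown above and the `data: [DONE]` end-of-stream sentinel that OpenAI-style streams conventionally send:

```python
import json

def extract_stream_text(lines):
    """Reassemble the assistant's text from OpenAI-style SSE chunk lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separator lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # conventional OpenAI end-of-stream marker
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

# Reassembling the two example chunks shown above:
sample = [
    'data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"logprobs":null,"finish_reason":null}],"usage":null}',
    'data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":" upon"},"logprobs":null,"finish_reason":null}],"usage":null}',
    'data: [DONE]',
]
print(extract_stream_text(sample))  # Once upon
```

In the streaming loop above, each decoded chunk can be fed through the same logic (split on newlines first) to print only the generated text rather than the raw event envelope.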