UI Interface

URL: http://localhost:8080/chat/

A UI playground for testing different LLM settings and prompt selection. Don't forget to enter your user token in the dedicated field.

API Interface

You can interact with the LLM through the LangChain library or with a direct HTTP request (e.g., via cURL or Python's requests).

In addition, our API is OpenAI-compatible; see the separate page for details.

note

When Compressa is first installed, the "Compressa-LLM" model is used. If you have replaced the LLM model, specify your model's name in the examples below.
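
Because the API is OpenAI-compatible, the official openai Python client can also be pointed at the same endpoint. A minimal sketch, assuming the base URL and API key used in the examples below:

# pip install openai - if you don't have this package yet
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="Your_user_API_key",
)

response = client.chat.completions.create(
    model="Compressa-LLM",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)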

Calling the Model Without Streaming

# pip install langchain-openai - if you don't have this package yet
# pip install langchain-core - if you don't have this package yet
# pip install langchain - if you don't have this package yet

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:5000/v1",
    api_key="Your_user_API_key",
    temperature=0.7,
    max_tokens=150,
    streaming=False,
    model="Compressa-LLM",
)

messages = [
    ("system", "You are a helpful assistant who translates from Russian to English. Translate the user's sentence."),
    ("human", "I love programming."),
]

ai_msg = llm.invoke(messages)
print(f"Model response: {ai_msg.content}")

# Model response: I love programming.
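
LangChain can also stream tokens without a raw HTTP request. A minimal sketch reusing the llm client defined above:

# Stream the answer token by token via LangChain's stream() method.
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)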

Calling the Model with Response Streaming

import requests

response = requests.post(
    url="http://localhost:5000/v1/chat/completions",
    headers={
        "Authorization": "Bearer Your_user_API_key",
        "accept": "application/json",
        "Content-Type": "application/json",
    },
    json={
        "model": "Compressa-LLM",
        "messages": [
            {
                "role": "system",
                "content": "You are an expert in soccer"
            },
            {
                "role": "user",
                "content": "Write a bedtime story about a kind artificial intelligence!"
            }
        ],
        "max_tokens": 512,
        "temperature": 0.7,
        "stream": True
    },
    stream=True,  # let requests yield the response body incrementally
)

for chunk in response.iter_content(chunk_size=None):
    if chunk:
        print(chunk.decode('utf-8'))

# Example data:
# data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"logprobs":null,"finish_reason":null}],"usage":null}
# data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":" upon"},"logprobs":null,"finish_reason":null}],"usage":null}
# ...
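
Each chunk arrives as a server-sent-events line prefixed with "data: " and carrying a JSON payload. To reassemble the full answer on the client, parse each line and concatenate the delta contents. A minimal sketch that replaces the iter_content loop above (the "data: [DONE]" terminator is an assumption based on the OpenAI streaming protocol):

import json

full_text = []
for line in response.iter_lines():  # use instead of the iter_content loop above
    if not line:
        continue  # skip blank separator lines between SSE events
    payload = line.decode("utf-8")
    if not payload.startswith("data: "):
        continue
    payload = payload[len("data: "):]
    if payload == "[DONE]":  # end-of-stream sentinel (assumed, OpenAI-style)
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    full_text.append(delta.get("content", ""))

print("".join(full_text))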