UI Interface
URL: http://localhost:8080/chat/
A UI playground for testing different LLM settings and prompts. Don't forget to enter your user token in the dedicated field.
API Interface
You can interact with the LLM through the LangChain library or via a direct cURL request.
In addition, our API is OpenAI-compatible; more details on the separate page.
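Because of that compatibility, the official openai Python client can also talk to the same endpoint. A minimal sketch, assuming the local base URL and user API key used throughout this page:

# pip install openai - if you don't have this package yet
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="Your_user_API_key",
)

response = client.chat.completions.create(
    model="Compressa-LLM",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)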
note
On a fresh Compressa installation, the "Compressa-LLM" model is used. If you've replaced the LLM model, specify your model's name in the examples below.
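If you are unsure which model names your deployment serves, OpenAI-compatible APIs usually expose a /v1/models listing. A short sketch with the requests library (assuming the same endpoint and token as above):

import requests

# Lists the models available on this deployment.
r = requests.get(
    "http://localhost:5000/v1/models",
    headers={"Authorization": "Bearer Your_user_API_key"},
)
print(r.json())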
Calling the Model Without Streaming
- Python (LangChain)
- cURL
# pip install langchain-openai - if you don't have this package yet
# pip install langchain-core - if you don't have this package yet
# pip install langchain - if you don't have this package yet
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:5000/v1",
    api_key="Your_user_API_key",
    temperature=0.7,
    max_tokens=150,
    streaming=False,
    model="Compressa-LLM",
)

messages = [
    ("system", "You are a helpful assistant who translates from Russian to English. Translate the user's sentence."),
    ("human", "I love programming."),
]

ai_msg = llm.invoke(messages)
print(f"Model response: {ai_msg.content}")
# Model response: I love programming.
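In LangChain, the same llm object can be composed with a prompt template, which is the idiomatic way to parameterize requests. A minimal sketch using langchain-core; the template variables here are illustrative:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant who translates from {source_lang} to {target_lang}."),
    ("human", "{text}"),
])

# Reuses the llm object defined above.
chain = prompt | llm
ai_msg = chain.invoke({
    "source_lang": "Russian",
    "target_lang": "English",
    "text": "I love programming.",
})
print(ai_msg.content)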
curl -X POST \
  'http://localhost:5000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer Your_user_API_key' \
  -d '{
    "model": "Compressa-LLM",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant who translates from Russian to English. Translate the user'\''s sentence."
      },
      {
        "role": "user",
        "content": "I love programming."
      }
    ],
    "max_tokens": 128,
    "temperature": 0.5,
    "stream": false
  }'
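The endpoint returns a standard OpenAI-style chat completion object. An abbreviated example; the identifiers and token counts are illustrative:

{
  "id": "126",
  "object": "chat.completion",
  "created": 1728680725,
  "model": "Compressa-LLM",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "I love programming."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 31, "completion_tokens": 6, "total_tokens": 37}
}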
Calling the Model with Response Streaming
- Python (requests)
- cURL
import requests

response = requests.post(
    url="http://localhost:5000/v1/chat/completions",
    headers={
        "Authorization": "Bearer Your_user_API_key",
        "accept": "application/json",
        "Content-Type": "application/json"
    },
    json={
        "model": "Compressa-LLM",
        "messages": [
            {
                "role": "system",
                "content": "You are an expert in soccer"
            },
            {
                "role": "user",
                "content": "Write a bedtime story about a kind artificial intelligence!"
            }
        ],
        "max_tokens": 512,
        "temperature": 0.7,
        "stream": True
    },
    stream=True  # tell requests not to buffer the whole response
)

for chunk in response.iter_content(chunk_size=None):
    if chunk:
        print(chunk.decode('utf-8'))
# Example data:
# data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"logprobs":null,"finish_reason":null}],"usage":null}
# data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":" upon"},"logprobs":null,"finish_reason":null}],"usage":null}
# ...
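To turn the stream into readable text, you can parse the SSE "data:" lines instead of printing raw chunks. A minimal sketch, replacing the iter_content loop above and assuming the chunk format shown; many OpenAI-compatible servers also end the stream with a "data: [DONE]" line:

import json

for line in response.iter_lines():
    if not line:
        continue  # skip SSE keep-alive blank lines
    payload = line.decode("utf-8")
    if payload.startswith("data: "):
        payload = payload[len("data: "):]
    if payload.strip() == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)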
# -N (--no-buffer) lets the streamed chunks print as they arrive
curl -N -X POST \
  'http://localhost:5000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer Your_user_API_key' \
  -d '{
    "model": "Compressa-LLM",
    "messages": [
      {
        "role": "user",
        "content": "Write a bedtime story about a kind artificial intelligence!"
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "stream": true
  }'