UI Interface
URL: http://localhost:8080/chat/
A UI playground for testing different LLM settings and prompts. Don't forget to enter your user token in the dedicated field.
API Interface
You can interact with the LLM through the LangChain library or via a direct cURL request.
In addition, our API is OpenAI-compatible; more details on the separate page.
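Because of that compatibility, the official openai Python client can also talk to the same endpoint. A minimal sketch, assuming the local base URL and user API key used throughout this page:

# pip install openai - if you don't have this package yet
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="Your_user_API_key",
)

response = client.chat.completions.create(
    model="Compressa-LLM",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)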
note
On a fresh Compressa installation, the "Compressa-LLM" model is used. If you've replaced the LLM model, specify your model's name in the examples below.
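If you are unsure which model names your deployment serves, OpenAI-compatible APIs usually expose a /v1/models listing. A short sketch with the requests library (assuming the same endpoint and token as above):

import requests

# Lists the models available on this deployment.
r = requests.get(
    "http://localhost:5000/v1/models",
    headers={"Authorization": "Bearer Your_user_API_key"},
)
print(r.json())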
Calling the Model Without Streaming
- Python (LangChain)
- cURL
# pip install langchain-openai - if you don't have this package yet
# pip install langchain-core - if you don't have this package yet
# pip install langchain - if you don't have this package yet
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:5000/v1",
    api_key="Your_user_API_key",
    temperature=0.7,
    max_tokens=150,
    streaming=False,
    model="Compressa-LLM",
)

messages = [
    ("system", "You are a helpful assistant who translates from Russian to English. Translate the user's sentence."),
    ("human", "I love programming."),
]

ai_msg = llm.invoke(messages)
print(f"Model response: {ai_msg.content}")
# Model response: I love programming.
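In LangChain, the same llm object can be composed with a prompt template, which is the idiomatic way to parameterize requests. A minimal sketch using langchain-core; the template variables here are illustrative:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant who translates from {source_lang} to {target_lang}."),
    ("human", "{text}"),
])

# Reuses the llm object defined above.
chain = prompt | llm
ai_msg = chain.invoke({
    "source_lang": "Russian",
    "target_lang": "English",
    "text": "I love programming.",
})
print(ai_msg.content)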
curl -X POST \
  'http://localhost:5000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer Your_user_API_key' \
  -d '{
    "model": "Compressa-LLM",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant who translates from Russian to English. Translate the user'\''s sentence."
      },
      {
        "role": "user",
        "content": "I love programming."
      }
    ],
    "max_tokens": 128,
    "temperature": 0.5,
    "stream": false
  }'
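The endpoint returns a standard OpenAI-style chat completion object. An abbreviated example; the identifiers and token counts are illustrative:

{
  "id": "126",
  "object": "chat.completion",
  "created": 1728680725,
  "model": "Compressa-LLM",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "I love programming."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 31, "completion_tokens": 6, "total_tokens": 37}
}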
Calling the Model with Response Streaming
- Python (requests)
- cURL
import requests

response = requests.post(
    url="http://localhost:5000/v1/chat/completions",
    headers={
        "Authorization": "Bearer Your_user_API_key",
        "accept": "application/json",
        "Content-Type": "application/json"
    },
    json={
        "model": "Compressa-LLM",
        "messages": [
            {
                "role": "system",
                "content": "You are an expert in soccer"
            },
            {
                "role": "user",
                "content": "Write a bedtime story about a kind artificial intelligence!"
            }
        ],
        "max_tokens": 512,
        "temperature": 0.7,
        "stream": True
    },
    stream=True  # tell requests not to buffer the whole response
)

for chunk in response.iter_content(chunk_size=None):
    if chunk:
        print(chunk.decode('utf-8'))
# Example data:
# data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"logprobs":null,"finish_reason":null}],"usage":null}
# data: {"id":"126","object":"chat.completion.chunk","created":1728680725,"model":"Compressa-LLM","choices":[{"index":0,"delta":{"role":"assistant","content":" upon"},"logprobs":null,"finish_reason":null}],"usage":null}
# ...
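To turn the stream into readable text, you can parse the SSE "data:" lines instead of printing raw chunks. A minimal sketch, replacing the iter_content loop above and assuming the chunk format shown; many OpenAI-compatible servers also end the stream with a "data: [DONE]" line:

import json

for line in response.iter_lines():
    if not line:
        continue  # skip SSE keep-alive blank lines
    payload = line.decode("utf-8")
    if payload.startswith("data: "):
        payload = payload[len("data: "):]
    if payload.strip() == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)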
# -N (--no-buffer) lets the streamed chunks print as they arrive
curl -N -X POST \
  'http://localhost:5000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer Your_user_API_key' \
  -d '{
    "model": "Compressa-LLM",
    "messages": [
      {
        "role": "user",
        "content": "Write a bedtime story about a kind artificial intelligence!"
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "stream": true
  }'