Platform Management
Creating Pods (Requires Admin Authorization Token)
Reads config.yaml content and creates automatic compose file
curl -X 'POST' 'http://localhost:8080/create-pods/'
Deploy Pods (Requires Admin Authorization Token)
Reads config.yaml content and sends model deployment requests
curl -X 'POST' 'http://localhost:8080/deploy-pods/'
Running Autotests (Requires User Authorization Token)
Depending on deployed models, runs a set of autotests
curl -X 'POST' 'http://localhost:8080/test/'
Running Performance Tests (Requires User Authorization Token)
GET /v1/performance/
Get information about a running test (if any) or indication that the test is completed and result is saved.
Example:
curl -X 'GET' 'http://localhost:8080/v1/performance/?pod_id=0'
Response schema if test is in progress:
{"model":"Compressa-LLM","job":{"id":"b83cb091-5acd-4254-84b5-e1141ec01711","name":"performance TEST","status":"RUNNING","started_at":"2025-08-03T10:07:30.818815","exception_details":null,"retry":0},"message":null}
Response schema if test is completed
{"model":"Compressa-LLM","job":{"id":"b83cb091-5acd-4254-84b5-e1141ec01711","name":"performance TEST","status":"FINISHED","started_at":"2025-08-03T10:07:30.818815","exception_details":null,"retry":0},"message":"Results are available in /app/resources/performance_results"}
POST /v1/performance/
Start a test. Test parameters are passed in the request, such as number of examples, number of threads, report format.
Example
import requests
response = requests.post(
url="http://localhost:8080/v1/performance",
headers={
"accept": "application/json",
"Content-Type": "application/json"
},
json={
"pod_id": "0",
"payload": {
"report_mode": "pdf",
"num_tasks": 500,
"num_runners": 10,
"generate_prompts": False,
"num_prompts": 0,
"prompt_length": 0,
"max_tokens": 1000,
"report_file": "experiment",
"experiment_name": "Performance Test"
}
}
)
Response schema:
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"name": "string",
"status": "RUNNING",
"started_at": "2024-03-21T10:10:07.521Z"
}
After testing is complete, reports are saved in the selected format (pdf, md or csv) in the previously mounted RESOURCES_PATH folder
POST /v1/performance/interrupt/
Interrupt a running test.
Example:
curl -X 'POST' \
'http://localhost:8080/v1/performance/interrupt/?pod_id=0' \
-H 'accept: application/json' \
-d ''
Response schema:
{"id":"6be69577-ebe8-4062-a5cb-ad8085c11393","name":"performance TEST","status":"INTERRUPTED","started_at":"2025-08-03T10:18:00.368896","exception_details":null,"retry":0}
Running Quality Tests (Requires User Authorization Token)
GET /v1/metrics/
GET /v1/metrics/status
Get information about a running test or test result if the test is completed.
Example:
curl -X 'GET' \
'http://localhost:8080/v1/metrics/'
Response schema if test is in progress:
{"model":"Compressa-LLM","message":null,"job":{"id":"26908ef2-60b4-4211-9e7f-3a649c200419","name":"QUALITY TEST","status":"RUNNING","started_at":"2025-08-03T10:30:18.573467","exception_details":null,"retry":0}}
Response schema if test is completed
{"model":"Compressa-LLM","job":{"id":"b83cb091-5acd-4254-84b5-e1141ec01711","name":"observability TEST","status":"FINISHED","started_at":"2025-08-03T10:07:30.818815","exception_details":null,"retry":0},"message":"Results are available in /app/resources/quality_tests"}
POST /v1/metrics/
Start a test. Test parameters are passed in the request, such as dataset (standard or custom) and number of randomly selected examples (default - 10, if -1 is specified - all)
Request for standard dataset
curl -X 'POST' 'http://localhost:8080/v1/metrics/?dataset=medical&num_examples=5'
List of available standard datasets
- SberQuad -
sberquad - Medical questions dataset -
medical - Jeopardy and Own Game dataset -
jeopardy
Request for custom dataset
import requests
url = 'http://localhost:8080/v1/metrics/?dataset=custom'
payload = {
"question": [
"What is important when welding titanium and its alloys?",
"What is boric acid needed for in reactor coolant?",
"In what year did the Komsomolets submarine sink?",
],
"answers": [
"Ensure an inert atmosphere in the welding zone, prevent titanium oxidation",
"Boric acid is used as a liquid absorber to control the chain fission reaction",
"In 1989",
]
}
response = requests.post(
url,
headers={
"accept": "application/json",
"Content-Type": "application/json"
},
json=payload
)
Response schema:
{
{"id":"ce79eaee-4541-4341-ac21-b79d9bb7f9bb","name":"QUALITY TEST","status":"RUNNING","started_at":"2025-08-03T10:34:09.113636","exception_details":null,"retry":0}
}
After testing is complete, reports are saved in pdf, csv and json format in the previously mounted RESOURCES_PATH folder
POST /v1/metrics/interrupt/
Interrupt a running test.
Example:
curl -X 'POST' \
'http://localhost:8080/v1/metrics/interrupt/' \
-H 'accept: application/json' \
-d ''
Response schema:
{"id":"ce79eaee-4541-4341-ac21-b79d9bb7f9bb","name":"QUALITY TEST","status":"INTERRUPTED","started_at":"2025-08-03T10:34:09.113636","exception_details":null,"retry":0}