Management API
URL: http://localhost:5100/
The API allows you to:
- add models
- deploy models
- interrupt model deployment
- run performance tests using the compressa-perf library
- run observability tests on standard or custom datasets using the DeepEval library or Self-Scoring (for Qwen2.5)
Model Library
GET /v1/models/
List of models available for launch and fine-tuning.
Example:
curl -X 'GET' \
'http://localhost:5100/v1/models/' \
-H 'accept: application/json'
Response Schema:
[
{
"model_id": "string",
"adapter": true,
"base_model_id": "string"
}
]
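For scripting, the listing can be fetched and split into base models and adapters. A minimal sketch, assuming the default base URL from this document:

```python
# Sketch: list models and separate base models from fine-tuned adapters.
# Assumes the Management API is reachable at http://localhost:5100.
import requests

API = "http://localhost:5100"

def split_models(models):
    """Partition the /v1/models/ payload into base model ids and adapter ids."""
    bases = [m["model_id"] for m in models if not m["adapter"]]
    adapters = [m["model_id"] for m in models if m["adapter"]]
    return bases, adapters

if __name__ == "__main__":
    models = requests.get(f"{API}/v1/models/",
                          headers={"accept": "application/json"}).json()
    bases, adapters = split_models(models)
    print("base models:", bases)
    print("adapters:", adapters)
```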
POST /v1/models/add/
Download a model from Hugging Face.
Parameters:
- query:
model_id: the model identifier on Hugging Face, e.g. Qwen/Qwen3-14B.
Example:
curl -X 'POST' \
'http://localhost:5100/v1/models/add/?model_id=mymodel_id' \
-H 'accept: application/json' \
-d ''
Response Schema:
{
"id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
"name": "DOWNLOAD_mymodel_id",
"status": "RUNNING",
"started_at": "2024-03-21T09:58:29.846708"
}
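Since adding a model returns a job, a script will typically poll GET /v1/jobs/{job_id}/status/ until the job reaches a terminal state. A minimal sketch (the terminal status names are the ones that appear in this document; mymodel_id is a placeholder):

```python
# Sketch: start a model download and poll the job until it finishes.
# "mymodel_id" is a placeholder; substitute a real Hugging Face model id.
import time
import requests

API = "http://localhost:5100"
# Terminal statuses seen elsewhere in this document.
TERMINAL = {"FINISHED", "KILLED", "INTERRUPTED"}

def is_terminal(status: str) -> bool:
    """True once a job can no longer change state."""
    return status in TERMINAL

if __name__ == "__main__":
    job = requests.post(f"{API}/v1/models/add/",
                        params={"model_id": "mymodel_id"}).json()
    while not is_terminal(job["status"]):
        time.sleep(5)
        job = requests.get(f"{API}/v1/jobs/{job['id']}/status/").json()
    print(job["status"])
```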
Model Deployment
GET /v1/deploy/
Get information about the currently deployed model.
Example:
curl -X 'GET' \
'http://localhost:5100/v1/deploy/' \
-H 'accept: application/json'
Response Schema:
{
"model_id": "string",
"adapter_ids": [
"string"
]
}
POST /v1/deploy/
Launch models and fine-tuned adapters for inference. The list of ids can be obtained using GET /v1/models/.
Request Body:
{
"model_id": "string",
"adapter_ids": [
"string"
]
}
- model_id: model identifier
- adapter_ids: list of adapter identifiers
Example:
curl -X 'POST' \
'http://localhost:5100/v1/deploy/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model_id": "model_id1",
"adapter_ids": [
"adapter_id1",
"adapter_id2"
]
}'
Response Schema:
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"name": "string",
"status": "CREATED",
"started_at": "2024-03-21T10:10:07.521Z"
}
GET /v1/deploy/status/
Get status of the deployed model.
Example:
curl -X 'GET' \
'http://localhost:5100/v1/deploy/status/' \
-H 'accept: application/json'
Response Schema:
{
"model_id": "model_id1",
"adapter_ids": [
"adapter_id1",
"adapter_id2"
],
"job": {
"id": "8a63349c-078f-4e98-8968-4f011593329c",
"name": "DEPLOY_model_id1_adapters_id1_id2",
"status": "RUNNING",
"started_at": "2024-03-21T07:35:16.861681"
}
}
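A deployment script can combine POST /v1/deploy/ with polling of GET /v1/deploy/status/. The sketch below uses the placeholder ids from the examples above:

```python
# Sketch: deploy a model with two adapters, then wait for the deploy job.
# "model_id1" / "adapter_id1" / "adapter_id2" are placeholders.
import time
import requests

API = "http://localhost:5100"

def deploy_payload(model_id, adapter_ids):
    """Build the request body expected by POST /v1/deploy/."""
    return {"model_id": model_id, "adapter_ids": list(adapter_ids)}

if __name__ == "__main__":
    body = deploy_payload("model_id1", ["adapter_id1", "adapter_id2"])
    requests.post(f"{API}/v1/deploy/", json=body).raise_for_status()
    while True:
        job = requests.get(f"{API}/v1/deploy/status/").json()["job"]
        if job["status"] != "RUNNING":
            break
        time.sleep(5)
    print(job["status"])
```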
POST /v1/deploy/interrupt/
Undeploy (disconnect) the currently deployed model.
Example:
curl -X 'POST' \
'http://localhost:5100/v1/deploy/interrupt/' \
-H 'accept: application/json' \
-d ''
Response Schema:
{
"id": "8a63349c-078f-4e98-8968-4f011593329c",
"name": "DEPLOY_model_id1_adapters_id1_id2",
"status": "RUNNING",
"started_at": "2024-03-21T07:35:16.861681"
}
Jobs
Operations such as model loading or deployment are associated with jobs. The following APIs allow managing job execution.
GET /v1/jobs/
Get all jobs with statuses.
Example:
curl -X 'GET' \
'http://localhost:5100/v1/jobs/' \
-H 'accept: application/json'
Response Schema:
[
{
"id": "8a63349c-078f-4e98-8968-4f011593329c",
"name": "DEPLOY_model_id1_adapters_id1_id2",
"status": "RUNNING",
"started_at": "2024-03-21T07:35:16.861681"
},
{
"id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
"name": "DOWNLOAD_test",
"status": "FINISHED",
"started_at": "2024-03-21T09:58:29.846708"
}
]
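A small helper can filter the listing down to jobs that are still running:

```python
# Sketch: pick out the jobs that are still running from /v1/jobs/.
import requests

API = "http://localhost:5100"

def running_jobs(jobs):
    """Return (id, name) pairs for jobs whose status is RUNNING."""
    return [(j["id"], j["name"]) for j in jobs if j["status"] == "RUNNING"]

if __name__ == "__main__":
    jobs = requests.get(f"{API}/v1/jobs/",
                        headers={"accept": "application/json"}).json()
    for job_id, name in running_jobs(jobs):
        print(job_id, name)
```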
GET /v1/jobs/{job_id}/status/
Get the latest status of job with job_id.
Parameters:
- path:
job_id
Example:
curl -X 'GET' \
'http://localhost:5100/v1/jobs/4d78d943-1896-4d7b-9f11-b10cc2389ba3/status/' \
-H 'accept: application/json'
Response Schema:
{
"id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
"name": "DOWNLOAD_test",
"status": "FINISHED",
"started_at": "2024-03-21T09:58:29.846708"
}
POST /v1/jobs/{job_id}/interrupt/
Interrupt execution of job with job_id.
Parameters:
- path:
job_id
Example:
curl -X 'POST' \
'http://localhost:5100/v1/jobs/8a63349c-078f-4e98-8968-4f011593329c/interrupt/' \
-H 'accept: application/json' \
-d ''
Response Schema:
{
"id": "8a63349c-078f-4e98-8968-4f011593329c",
"name": "DEPLOY_model_id1_adapters_id1_id2",
"status": "KILLED",
"started_at": "2024-03-21T07:35:16.861681"
}
Running Performance Tests
GET /v1/performance/
GET /v1/performance/status
Get information about a running test (if any), or an indication that the test has completed and the result has been saved.
Example:
curl -X 'GET' 'http://localhost:5100/v1/performance/'
Response Schema if the test is in progress:
{
  "model": "Compressa-LLM",
  "job": {
    "id": "b83cb091-5acd-4254-84b5-e1141ec01711",
    "name": "performance TEST",
    "status": "RUNNING",
    "started_at": "2025-08-03T10:07:30.818815",
    "exception_details": null,
    "retry": 0
  },
  "message": null
}
Response Schema if the test is completed:
{
  "model": "Compressa-LLM",
  "job": {
    "id": "b83cb091-5acd-4254-84b5-e1141ec01711",
    "name": "performance TEST",
    "status": "FINISHED",
    "started_at": "2025-08-03T10:07:30.818815",
    "exception_details": null,
    "retry": 0
  },
  "message": "Results are available in /app/resources/performance_results"
}
POST /v1/performance/
Start a test. Test parameters, such as the number of examples, number of threads, and report format, are passed in the request.
Example:
curl -X 'POST' 'http://localhost:5100/v1/performance/?num_tasks=100&num_runners=10&report_mode=md'
Response Schema:
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"name": "string",
"status": "RUNNING",
"started_at": "2024-03-21T10:10:07.521Z"
}
After testing completes, reports are saved in the selected format (pdf, md, or csv) in the previously mounted RESOURCES_PATH folder.
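The same call can be issued from Python. The sketch below mirrors the curl request above; the base URL and parameter names are taken from this document, and the allowed report formats are pdf, md, and csv:

```python
# Sketch: start a performance test from Python, mirroring the curl example.
# Assumes the Management API is reachable at http://localhost:5100.
import requests

API = "http://localhost:5100"

def perf_params(num_tasks=100, num_runners=10, report_mode="md"):
    """Build the query parameters for POST /v1/performance/."""
    if report_mode not in {"pdf", "md", "csv"}:
        raise ValueError("report_mode must be pdf, md, or csv")
    return {"num_tasks": num_tasks,
            "num_runners": num_runners,
            "report_mode": report_mode}

if __name__ == "__main__":
    job = requests.post(f"{API}/v1/performance/", params=perf_params()).json()
    print(job["id"], job["status"])
```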
POST /v1/performance/interrupt/
Interrupt a running test.
Example:
curl -X 'POST' \
'http://localhost:5100/v1/performance/interrupt/' \
-H 'accept: application/json' \
-d ''
Response Schema:
{
  "id": "6be69577-ebe8-4062-a5cb-ad8085c11393",
  "name": "performance TEST",
  "status": "INTERRUPTED",
  "started_at": "2025-08-03T10:18:00.368896",
  "exception_details": null,
  "retry": 0
}
Running Quality Tests
GET /v1/metrics/
GET /v1/metrics/status
Get information about a running test, or the test result if the test has completed.
Example:
curl -X 'GET' \
'http://localhost:5100/v1/metrics/'
Response Schema if the test is in progress:
{
  "model": "Compressa-LLM",
  "message": null,
  "job": {
    "id": "26908ef2-60b4-4211-9e7f-3a649c200419",
    "name": "QUALITY TEST",
    "status": "RUNNING",
    "started_at": "2025-08-03T10:30:18.573467",
    "exception_details": null,
    "retry": 0
  }
}
Response Schema if the test is completed:
{
  "model": "Compressa-LLM",
  "job": {
    "id": "b83cb091-5acd-4254-84b5-e1141ec01711",
    "name": "observability TEST",
    "status": "FINISHED",
    "started_at": "2025-08-03T10:07:30.818815",
    "exception_details": null,
    "retry": 0
  },
  "message": "Results are available in /app/resources/quality_tests"
}
POST /v1/metrics/
Start a test. Test parameters, such as the dataset (standard or custom) and the number of randomly selected examples (default: 10; pass -1 to use all), are passed in the request.
Request for standard dataset
curl -X 'POST' 'http://localhost:5100/v1/metrics/?dataset=medical&num_examples=5'
List of available standard datasets:
- sberquad: SberQuad
- medical: Medical questions dataset
- jeopardy: Jeopardy and Own Game dataset
Request for custom dataset
import requests
url = 'http://localhost:5100/v1/metrics/?dataset=custom'
payload = {
"question": [
"What is important when welding titanium and its alloys?",
"What is boric acid needed for in reactor coolant?",
"In what year did the Komsomolets submarine sink?",
],
"answers": [
"Ensure an inert atmosphere in the welding zone, prevent titanium oxidation",
"Boric acid is used as a liquid absorber to control the chain fission reaction",
"In 1989",
]
}
response = requests.post(
url,
headers={
"accept": "application/json",
"Content-Type": "application/json"
},
json=payload
)
Response Schema:
{
  "id": "ce79eaee-4541-4341-ac21-b79d9bb7f9bb",
  "name": "QUALITY TEST",
  "status": "RUNNING",
  "started_at": "2025-08-03T10:34:09.113636",
  "exception_details": null,
  "retry": 0
}
After testing completes, reports are saved in pdf, csv, and json formats in the previously mounted RESOURCES_PATH folder.
POST /v1/metrics/interrupt/
Interrupt a running test.
Example:
curl -X 'POST' \
'http://localhost:5100/v1/metrics/interrupt/' \
-H 'accept: application/json' \
-d ''
Response Schema:
{
  "id": "ce79eaee-4541-4341-ac21-b79d9bb7f9bb",
  "name": "QUALITY TEST",
  "status": "INTERRUPTED",
  "started_at": "2025-08-03T10:34:09.113636",
  "exception_details": null,
  "retry": 0
}
Self-Check Mode
Model self-assessment of the quality of its answers on a custom dataset without labels. This mode is experimental.
About the evaluation method
The evaluation is based on the EM (Expectation-Maximization) algorithm for assessing the quality of language models without ground-truth answers. The algorithm analyzes six key metrics, each scored from 0 to 10 (higher is better):
- Self-Consistency — consistency of answers at different temperatures
- Self-Rating — model's self-assessment of its answers
- STD-Rating — spread of model's self-assessment scores at different temperatures
- Abstention — model's ability to express uncertainty
- Chain-of-Thought Critique — quality of step-by-step reasoning
- Paraphrase Self-Consistency — consistency when paraphrasing questions
The EM algorithm ranks each question in the dataset according to the model's behavior on it. Each question is assigned a latent score that reflects only the comparative quality of the model relative to the other questions: the higher the latent score, the better the model performs on that question.
Since different datasets may have different distributions of answer quality, EM model weights are introduced; they indicate how strongly each metric affects the final latent score. A metric's weight is larger the better it describes the whole dataset and agrees with the other metrics. The final score is aggregated as a weighted average of metric scores over latent-score quantiles, with the averaging weighted by the EM model weights. This yields a final objective quality score that can be compared across datasets.
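As a toy illustration of the final aggregation step (a simplified sketch, not the actual implementation; the real algorithm also aggregates over latent-score quantiles), a weighted average of per-metric scores looks like:

```python
# Toy sketch: combine per-metric scores (0-10 scale) into one final score
# using per-metric weights, as a weighted average. The metric names and
# weight values below are illustrative, not produced by the real EM fit.
def aggregate(metric_scores, weights):
    """Weighted average of metric scores; weights need not sum to 1."""
    total_w = sum(weights.values())
    return sum(metric_scores[m] * w for m, w in weights.items()) / total_w

scores = {"self_consistency": 8.0, "self_rating": 6.0, "abstention": 7.0}
weights = {"self_consistency": 0.5, "self_rating": 0.2, "abstention": 0.3}
print(aggregate(scores, weights))  # 8*0.5 + 6*0.2 + 7*0.3 = 7.3
```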
import requests
url = 'http://localhost:5100/v1/metrics/?dataset=custom&mode=no_gt'
payload = {
"question": [
"What factors increase the risk of developing malignant tumors?",
"How can you quickly navigate to a directory with a long name, and how to find out the file type using the ls command?",
],
"answers": []
}
response = requests.post(
url,
headers={
"accept": "application/json",
"Content-Type": "application/json"
},
json=payload
)
Response Schema:
{
  "id": "c8b0e922-f89b-4d32-af34-bcee22f5a82f",
  "name": "QUALITY TEST [NO_GT (PARALLEL)]",
  "status": "RUNNING",
  "started_at": "2025-10-02T14:19:59.471777",
  "exception_details": null,
  "retry": 0
}