
Management API

URL: http://localhost:5100/

The API allows you to:

  • add models
  • deploy models
  • interrupt model deployment
  • run performance tests using the compressa-perf library
  • run observability tests on standard or custom datasets using the DeepEval library or Self-Scoring (for Qwen2.5)
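
A typical workflow chains these endpoints together. Below is a minimal Python sketch (the model identifier is a placeholder; all endpoints used are documented in the sections that follow):

import time
import requests

BASE = "http://localhost:5100"

# Download a model from Hugging Face (placeholder identifier)
job = requests.post(f"{BASE}/v1/models/add/", params={"model_id": "mymodel_id"}).json()

# Poll the download job until it leaves the RUNNING state
while requests.get(f"{BASE}/v1/jobs/{job['id']}/status/").json()["status"] == "RUNNING":
    time.sleep(5)

# Deploy the downloaded model (no adapters in this sketch)
requests.post(f"{BASE}/v1/deploy/", json={"model_id": "mymodel_id", "adapter_ids": []})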

Model Library

GET /v1/models/

List of models available for launch and fine-tuning.

Example:

curl -X 'GET' \
'http://localhost:5100/v1/models/' \
-H 'accept: application/json'

Response Schema:

[
  {
    "model_id": "string",
    "adapter": true,
    "base_model_id": "string"
  }
]

POST /v1/models/add/

Downloading a model from Hugging Face.

Parameters:

  • query: model_id - model identifier on Hugging Face, e.g. Qwen/Qwen3-14B.

Example:

curl -X 'POST' \
'http://localhost:5100/v1/models/add/?model_id=mymodel_id' \
-H 'accept: application/json' \
-d ''

Response Schema:

{
  "id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
  "name": "DOWNLOAD_mymodel_id",
  "status": "RUNNING",
  "started_at": "2024-03-21T09:58:29.846708"
}

Model Deployment

GET /v1/deploy/

Get information about the currently deployed model.

Example:

curl -X 'GET' \
'http://localhost:5100/v1/deploy/' \
-H 'accept: application/json'

Response Schema:

{
  "model_id": "string",
  "adapter_ids": [
    "string"
  ]
}

POST /v1/deploy/

Launch models and fine-tuned adapters for inference. The list of IDs can be obtained via GET /v1/models/.

Request Body:

{
  "model_id": "string",
  "adapter_ids": [
    "string"
  ]
}
  • model_id - model identifier
  • adapter_ids - list of adapter identifiers

Example:

curl -X 'POST' \
'http://localhost:5100/v1/deploy/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model_id": "model_id1",
  "adapter_ids": [
    "adapter_id1",
    "adapter_id2"
  ]
}'

Response Schema:

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "name": "string",
  "status": "CREATED",
  "started_at": "2024-03-21T10:10:07.521Z"
}

GET /v1/deploy/status/

Get the status of the deployed model.

Example:

curl -X 'GET' \
'http://localhost:5100/v1/deploy/status/' \
-H 'accept: application/json'

Response Schema:

{
  "model_id": "model_id1",
  "adapter_ids": [
    "adapter_id1",
    "adapter_id2"
  ],
  "job": {
    "id": "8a63349c-078f-4e98-8968-4f011593329c",
    "name": "DEPLOY_model_id1_adapters_id1_id2",
    "status": "RUNNING",
    "started_at": "2024-03-21T07:35:16.861681"
  }
}

POST /v1/deploy/interrupt/

Stop (undeploy) the currently deployed model.

Example:

curl -X 'POST' \
'http://localhost:5100/v1/deploy/interrupt/' \
-H 'accept: application/json' \
-d ''

Response Schema:

{
  "id": "8a63349c-078f-4e98-8968-4f011593329c",
  "name": "DEPLOY_model_id1_adapters_id1_id2",
  "status": "RUNNING",
  "started_at": "2024-03-21T07:35:16.861681"
}

Jobs

Operations such as model loading or deployment are associated with jobs. The following APIs allow managing job execution.
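
For example, a client can poll a job until it reaches a terminal state. A minimal Python sketch (the terminal statuses shown in this section are FINISHED, KILLED, and INTERRUPTED; any non-RUNNING status is treated as terminal here):

import time
import requests

def wait_for_job(job_id, base="http://localhost:5100", poll_seconds=5):
    # Poll GET /v1/jobs/{job_id}/status/ until the job leaves the RUNNING state
    while True:
        job = requests.get(f"{base}/v1/jobs/{job_id}/status/").json()
        if job["status"] != "RUNNING":
            return job
        time.sleep(poll_seconds)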

GET /v1/jobs/

Get all jobs with statuses.

Example:

curl -X 'GET' \
'http://localhost:5100/v1/jobs/' \
-H 'accept: application/json'

Response Schema:

[
  {
    "id": "8a63349c-078f-4e98-8968-4f011593329c",
    "name": "DEPLOY_model_id1_adapters_id1_id2",
    "status": "RUNNING",
    "started_at": "2024-03-21T07:35:16.861681"
  },
  {
    "id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
    "name": "DOWNLOAD_test",
    "status": "FINISHED",
    "started_at": "2024-03-21T09:58:29.846708"
  }
]

GET /v1/jobs/{job_id}/status/

Get the latest status of the job with the given job_id.

Parameters:

  • path: job_id

Example:

curl -X 'GET' \
'http://localhost:5100/v1/jobs/4d78d943-1896-4d7b-9f11-b10cc2389ba3/status/' \
-H 'accept: application/json'

Response Schema:

{
  "id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
  "name": "DOWNLOAD_test",
  "status": "FINISHED",
  "started_at": "2024-03-21T09:58:29.846708"
}

POST /v1/jobs/{job_id}/interrupt/

Interrupt execution of the job with the given job_id.

Parameters:

  • path: job_id

Example:

curl -X 'POST' \
'http://localhost:5100/v1/jobs/8a63349c-078f-4e98-8968-4f011593329c/interrupt/' \
-H 'accept: application/json' \
-d ''

Response Schema:

{
  "id": "8a63349c-078f-4e98-8968-4f011593329c",
  "name": "DEPLOY_model_id1_adapters_id1_id2",
  "status": "KILLED",
  "started_at": "2024-03-21T07:35:16.861681"
}

Running Performance Tests

GET /v1/performance/

GET /v1/performance/status

Get information about a running test (if any), or an indication that the test has completed and the results have been saved.

Example:

curl -X 'GET' 'http://localhost:5100/v1/performance/'

Response Schema if test is in progress:

{"model":"Compressa-LLM","job":{"id":"b83cb091-5acd-4254-84b5-e1141ec01711","name":"performance TEST","status":"RUNNING","started_at":"2025-08-03T10:07:30.818815","exception_details":null,"retry":0},"message":null}

Response Schema if test is completed:

{
  "model": "Compressa-LLM",
  "job": {
    "id": "b83cb091-5acd-4254-84b5-e1141ec01711",
    "name": "performance TEST",
    "status": "FINISHED",
    "started_at": "2025-08-03T10:07:30.818815",
    "exception_details": null,
    "retry": 0
  },
  "message": "Results are available in /app/resources/performance_results"
}

POST /v1/performance/

Start a test. Test parameters, such as the number of examples, number of threads, and report format, are passed in the request.

Example:

curl -X 'POST' 'http://localhost:5100/v1/performance/?num_tasks=100&num_runners=10&report_mode=md'

Response Schema:

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "name": "string",
  "status": "RUNNING",
  "started_at": "2024-03-21T10:10:07.521Z"
}

After testing is complete, reports are saved in the selected format (pdf, md, or csv) in the previously mounted RESOURCES_PATH folder.
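
Putting these endpoints together, a client can start a test and wait for the results path. A minimal Python sketch (parameter values are illustrative):

import time
import requests

BASE = "http://localhost:5100"

# Start a performance test (illustrative parameters)
requests.post(f"{BASE}/v1/performance/",
              params={"num_tasks": 100, "num_runners": 10, "report_mode": "md"})

# Poll until the test finishes; the "message" field then contains the results path
while True:
    status = requests.get(f"{BASE}/v1/performance/").json()
    if status["job"]["status"] != "RUNNING":
        print(status["message"])
        break
    time.sleep(10)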

POST /v1/performance/interrupt/

Interrupt a running test.

Example:

curl -X 'POST' \
'http://localhost:5100/v1/performance/interrupt/' \
-H 'accept: application/json' \
-d ''

Response Schema:

{"id":"6be69577-ebe8-4062-a5cb-ad8085c11393","name":"performance TEST","status":"INTERRUPTED","started_at":"2025-08-03T10:18:00.368896","exception_details":null,"retry":0}

Running Quality Tests

GET /v1/metrics/

GET /v1/metrics/status

Get information about a running test, or the test result if the test has completed.

Example:

curl -X 'GET' \
'http://localhost:5100/v1/metrics/'

Response Schema if test is in progress:

{"model":"Compressa-LLM","message":null,"job":{"id":"26908ef2-60b4-4211-9e7f-3a649c200419","name":"QUALITY TEST","status":"RUNNING","started_at":"2025-08-03T10:30:18.573467","exception_details":null,"retry":0}}

Response Schema if test is completed:

{
  "model": "Compressa-LLM",
  "job": {
    "id": "b83cb091-5acd-4254-84b5-e1141ec01711",
    "name": "observability TEST",
    "status": "FINISHED",
    "started_at": "2025-08-03T10:07:30.818815",
    "exception_details": null,
    "retry": 0
  },
  "message": "Results are available in /app/resources/quality_tests"
}

POST /v1/metrics/

Start a test. Test parameters are passed in the request, such as the dataset (standard or custom) and the number of randomly selected examples (default: 10; if -1 is specified, all examples are used).

Request for a standard dataset:

curl -X 'POST' 'http://localhost:5100/v1/metrics/?dataset=medical&num_examples=5'

List of available standard datasets

Request for a custom dataset:

import requests

url = 'http://localhost:5100/v1/metrics/?dataset=custom'

# Custom dataset: parallel lists of questions and reference answers
payload = {
    "question": [
        "What is important when welding titanium and its alloys?",
        "What is boric acid needed for in reactor coolant?",
        "In what year did the Komsomolets submarine sink?",
    ],
    "answers": [
        "Ensure an inert atmosphere in the welding zone, prevent titanium oxidation",
        "Boric acid is used as a liquid absorber to control the chain fission reaction",
        "In 1989",
    ]
}

response = requests.post(
    url,
    headers={
        "accept": "application/json",
        "Content-Type": "application/json"
    },
    json=payload
)

Response Schema:

{
  "id": "ce79eaee-4541-4341-ac21-b79d9bb7f9bb",
  "name": "QUALITY TEST",
  "status": "RUNNING",
  "started_at": "2025-08-03T10:34:09.113636",
  "exception_details": null,
  "retry": 0
}

After testing is complete, reports are saved in pdf, csv, and json formats in the previously mounted RESOURCES_PATH folder.

POST /v1/metrics/interrupt/

Interrupt a running test.

Example:

curl -X 'POST' \
'http://localhost:5100/v1/metrics/interrupt/' \
-H 'accept: application/json' \
-d ''

Response Schema:

{"id":"ce79eaee-4541-4341-ac21-b79d9bb7f9bb","name":"QUALITY TEST","status":"INTERRUPTED","started_at":"2025-08-03T10:34:09.113636","exception_details":null,"retry":0}

Self-Check Mode

Model self-assessment of the quality of its answers on a custom dataset without labels.

Experimental mode.

About the evaluation method

The evaluation is based on the EM (Expectation-Maximization) algorithm for assessing the quality of language models without ground-truth answers. The algorithm analyzes six key metrics, each scored from 0 to 10 (the higher, the better):

  • Self-Consistency — consistency of answers at different temperatures
  • Self-Rating — model's self-assessment of its answers
  • STD-Rating — spread of model's self-assessment scores at different temperatures
  • Abstention — model's ability to express uncertainty
  • Chain-of-Thought Critique — quality of step-by-step reasoning
  • Paraphrase Self-Consistency — consistency when paraphrasing questions

The EM algorithm ranks each question in the dataset according to the model's behavior on it. Each question is assigned its own latent score, which reflects the model's quality on that question only relative to the other questions. The higher the latent score, the better the model's quality on that question.

Since different datasets may have different distributions of answer quality, EM model weights are introduced; they show how strongly each metric affects the final latent score. A metric's weight is higher the better it describes the entire dataset and the more it agrees with the other metrics. The final score is aggregated as a weighted average of metric scores over latent-score quantiles, with the averaging taking the EM model weights into account. This yields a final, objective assessment of model quality that can be compared across datasets.
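
As a simplified illustration of just the final averaging step (the actual method also aggregates over latent-score quantiles; all values below are hypothetical):

# Hypothetical per-metric scores (0-10 scale) and hypothetical EM weights
metric_scores = {"self_consistency": 7.2, "self_rating": 6.5, "abstention": 8.1}
em_weights = {"self_consistency": 0.5, "self_rating": 0.3, "abstention": 0.2}

# Final score: weighted average of metric scores using the EM weights
final_score = sum(metric_scores[m] * em_weights[m] for m in metric_scores) / sum(em_weights.values())
print(round(final_score, 2))  # 7.17 with the hypothetical values above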

Example:

import requests

# Self-check mode: no ground-truth answers, so "answers" is left empty
url = 'http://localhost:5100/v1/metrics/?dataset=custom&mode=no_gt'
payload = {
    "question": [
        "What factors increase the risk of developing malignant tumors?",
        "How can you quickly navigate to a directory with a long name, and how to find out the file type using the ls command?",
    ],
    "answers": []
}

response = requests.post(
    url,
    headers={
        "accept": "application/json",
        "Content-Type": "application/json"
    },
    json=payload
)

Response Schema:

{
  "id": "c8b0e922-f89b-4d32-af34-bcee22f5a82f",
  "name": "QUALITY TEST [NO_GT (PARALLEL)]",
  "status": "RUNNING",
  "started_at": "2025-10-02T14:19:59.471777",
  "exception_details": null,
  "retry": 0
}