
Management API

The Management API is a REST API that provides control over all features. Interactive Swagger documentation is available at http://localhost:8080/api/docs/.

Models

APIs to work with currently available models.

GET /v1/models/

List the models available for deployment and finetuning.

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/models/' \
-H 'accept: application/json'

Response schema:

[
  {
    "model_id": "string",
    "adapter": true,
    "base_model_id": "string"
  }
]

POST /v1/models/add/

Download a model from Hugging Face.

Parameters:

  • query: model_id - the model's ID on Hugging Face, for example openchat/openchat-3.5-0106, as it appears in the model's page URL.

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/models/add/?model_id=mymodel_id' \
-H 'accept: application/json' \
-d ''

Response schema:

{
  "id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
  "name": "DOWNLOAD_mymodel_id",
  "status": "RUNNING",
  "started_at": "2024-03-21T09:58:29.846708"
}
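
The same workflow from Python; a minimal sketch, assuming the local instance at http://localhost:8080 with no authentication. It requests the download and then polls the returned job via GET /v1/jobs/{job_id}/status/ (described in the Jobs section below) until the job leaves the CREATED or RUNNING state. The model ID and polling interval are placeholders.

import time
import requests

BASE_URL = "http://localhost:8080/api/v1"  # assumed local instance, no auth

# Request the download; the server responds with a job description.
job = requests.post(
    f"{BASE_URL}/models/add/",
    params={"model_id": "openchat/openchat-3.5-0106"},  # example Hugging Face model ID
).json()

# Poll the job until it is no longer queued or running.
while job["status"] in ("CREATED", "RUNNING"):
    time.sleep(10)
    job = requests.get(f"{BASE_URL}/jobs/{job['id']}/status/").json()

print(job["name"], job["status"])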

Deploy

APIs to deploy and undeploy a model.

GET /v1/deploy/

Get the currently deployed model.

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/deploy/' \
-H 'accept: application/json'

Response schema:

{
  "model_id": "string",
  "adapter_ids": [
    "string"
  ]
}

POST /v1/deploy/

Deploy a model and adapters for inference. The list of IDs for the request can be obtained from GET /v1/models/.

Request body:

{
  "model_id": "string",
  "adapter_ids": [
    "string"
  ]
}
  • model_id - the model's ID
  • adapter_ids - a list of adapter IDs

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/deploy/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model_id": "model_id1",
  "adapter_ids": [
    "adapter_id1",
    "adapter_id2"
  ]
}'

Response schema:

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "name": "string",
  "status": "CREATED",
  "started_at": "2024-03-21T10:10:07.521Z"
}

GET /v1/deploy/status/

Get the status of the currently deployed model.

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/deploy/status/' \
-H 'accept: application/json'

Response schema:

{
  "model_id": "model_id1",
  "adapter_ids": [
    "adapter_id1",
    "adapter_id2"
  ],
  "job": {
    "id": "8a63349c-078f-4e98-8968-4f011593329c",
    "name": "DEPLOY_model_id1_adapters_id1_id2",
    "status": "RUNNING",
    "started_at": "2024-03-21T07:35:16.861681"
  }
}
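
A minimal sketch of the same flow from Python, assuming the local instance used above: it submits a deploy request and then polls GET /v1/deploy/status/ until the deploy job reports RUNNING, which in the example above corresponds to an actively served model. The model and adapter IDs are placeholders taken from GET /v1/models/.

import time
import requests

BASE_URL = "http://localhost:8080/api/v1"  # assumed local instance, no auth

# Ask the server to deploy a model together with two adapters.
requests.post(
    f"{BASE_URL}/deploy/",
    json={"model_id": "model_id1", "adapter_ids": ["adapter_id1", "adapter_id2"]},
)

# Poll until the deploy job leaves the CREATED state.
status = requests.get(f"{BASE_URL}/deploy/status/").json()
while status["job"]["status"] == "CREATED":
    time.sleep(5)
    status = requests.get(f"{BASE_URL}/deploy/status/").json()

print(status["model_id"], status["job"]["status"])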

POST /v1/deploy/interrupt/

Undeploy the currently deployed model.

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/deploy/interrupt/' \
-H 'accept: application/json' \
-d ''

Response schema:

{
  "id": "8a63349c-078f-4e98-8968-4f011593329c",
  "name": "DEPLOY_model_id1_adapters_id1_id2",
  "status": "RUNNING",
  "started_at": "2024-03-21T07:35:16.861681"
}

Jobs

Operations such as adding a model or deploying one are associated with a job. The following APIs allow you to control job execution.

GET /v1/jobs/

Get all jobs with statuses.

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/jobs/' \
-H 'accept: application/json'

Response schema:

[
  {
    "id": "8a63349c-078f-4e98-8968-4f011593329c",
    "name": "DEPLOY_model_id1_adapters_id1_id2",
    "status": "RUNNING",
    "started_at": "2024-03-21T07:35:16.861681"
  },
  {
    "id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
    "name": "DOWNLOAD_test",
    "status": "FINISHED",
    "started_at": "2024-03-21T09:58:29.846708"
  }
]
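
A short sketch from Python, assuming the same local instance, that lists all jobs and prints only those still in the RUNNING state; it uses only the fields shown in the response above.

import requests

BASE_URL = "http://localhost:8080/api/v1"  # assumed local instance, no auth

# Fetch every job and keep the ones that are still running.
jobs = requests.get(f"{BASE_URL}/jobs/").json()
for job in jobs:
    if job["status"] == "RUNNING":
        print(job["id"], job["name"], job["started_at"])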

GET /v1/jobs/{job_id}/status/

Get the latest status of the job with the given job_id.

Parameters:

  • path: job_id

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/jobs/4d78d943-1896-4d7b-9f11-b10cc2389ba3/status/' \
-H 'accept: application/json'

Response schema:

{
  "id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
  "name": "DOWNLOAD_test",
  "status": "FINISHED",
  "started_at": "2024-03-21T09:58:29.846708"
}

POST /v1/jobs/{job_id}/interrupt/

Interrupt the job with the given job_id.

Parameters:

  • path: job_id

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/jobs/8a63349c-078f-4e98-8968-4f011593329c/interrupt/' \
-H 'accept: application/json' \
-d ''

Response schema:

{
  "id": "8a63349c-078f-4e98-8968-4f011593329c",
  "name": "DEPLOY_model_id1_adapters_id1_id2",
  "status": "KILLED",
  "started_at": "2024-03-21T07:35:16.861681"
}

Datasets

The following APIs allow you to upload and download datasets for training.

GET /v1/datasets/

List all available datasets.

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/datasets/' \
-H 'accept: application/json'

Response schema:

[
  {
    "id": "01be6d68-f790-434b-aa6d-5bd492aef202",
    "name": "train.jsonl",
    "s3_path": "01be6d68-f790-434b-aa6d-5bd492aef202/metadata.json",
    "description": null
  },
  {
    "id": "077adb68-2b0e-481b-bd13-e8807adf625f",
    "name": "train.jsonl",
    "s3_path": "077adb68-2b0e-481b-bd13-e8807adf625f/metadata.json",
    "description": "My dataset 2"
  }
]

POST /v1/datasets/upload/

Upload a new dataset file.
The file should be a conversation dataset in JSONL format.

Parameters:

  • query: description - dataset description

Request body:

  • multipart/form-data: file

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/datasets/upload/?description=My%20Description' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@train.jsonl'

Response schema:

{
  "id": "string",
  "name": "string",
  "s3_path": "string",
  "description": "string"
}
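
The same upload done from Python; a minimal sketch, assuming the local instance used in the curl example. The file name and description are placeholders.

import requests

BASE_URL = "http://localhost:8080/api/v1"  # assumed local instance, no auth

# Send the dataset as a multipart/form-data upload, with the description as a query parameter.
with open("train.jsonl", "rb") as f:
    dataset = requests.post(
        f"{BASE_URL}/datasets/upload/",
        params={"description": "My Description"},
        files={"file": ("train.jsonl", f)},
    ).json()

print(dataset["id"], dataset["name"])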

GET /v1/datasets/{dataset_id}/

Download the dataset with the given dataset_id.

Parameters:

  • path: dataset_id

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/datasets/01be6d68-f790-434b-aa6d-5bd492aef202/' \
-H 'accept: application/json'

Response schema:

  • file
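
A minimal download sketch in Python, assuming the local instance used above; it streams the returned file to disk. The dataset ID is the one from the curl example, and the output file name is a placeholder.

import requests

BASE_URL = "http://localhost:8080/api/v1"  # assumed local instance, no auth
DATASET_ID = "01be6d68-f790-434b-aa6d-5bd492aef202"  # taken from GET /v1/datasets/

# Stream the file to disk instead of loading it into memory.
with requests.get(f"{BASE_URL}/datasets/{DATASET_ID}/", stream=True) as response:
    response.raise_for_status()
    with open("train.jsonl", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)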

Finetuning

The following APIs allow you to finetune models.

GET /v1/finetune/models/

List all models available for finetuning.

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/finetune/models/' \
-H 'accept: application/json'

Response schema:

[
  {
    "model_id": "TheBloke/mixtral-8x7b-v0.1-AWQ",
    "adapter": false,
    "base_model_id": null
  },
  {
    "model_id": "NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story",
    "adapter": false,
    "base_model_id": null
  }
]

POST /v1/finetune/

Finetune an adapter on a dataset.

Request body:

{
  "name": "string",
  "model_id": "string",
  "dataset_id": "string"
}
  • name - the name of the training run
  • model_id - the ID of the model to train
  • dataset_id - the ID of the dataset to train on

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/finetune/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "name": "My Adapter Training",
  "model_id": "NousResearch/Llama-2-7b-chat-hf",
  "dataset_id": "01be6d68-f790-434b-aa6d-5bd492aef202"
}'

Response schema:

{
  "id": "74280be7-4723-475d-89ae-346e9017990e",
  "name": "FT_NousResearch/Llama-2-7b-chat-hf_01be6d68-f790-434b-aa6d-5bd492aef202",
  "status": "CREATED",
  "started_at": "2024-03-21T10:40:40.928442"
}

GET /v1/finetune/status/

Get the status of the current finetuning run.

Example:

curl -X 'GET' \
'http://localhost:8080/api/v1/finetune/status/' \
-H 'accept: application/json'

Response schema:

{
  "id": "46c155b4-17fe-4226-9412-a77edfadc7e7",
  "name": "My Adapter Training",
  "model_id": "NousResearch/Llama-2-7b-chat-hf",
  "dataset_id": "01be6d68-f790-434b-aa6d-5bd492aef202",
  "job": {
    "id": "74280be7-4723-475d-89ae-346e9017990e",
    "name": "FT_NousResearch/Llama-2-7b-chat-hf_01be6d68-f790-434b-aa6d-5bd492aef202",
    "status": "RUNNING",
    "started_at": "2024-03-21T10:40:40.928442"
  }
}
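
A minimal end-to-end sketch in Python, assuming the same local instance: it starts a finetuning run with the request body from the example above and then polls GET /v1/finetune/status/ until the underlying job stops running. The polling interval is a placeholder.

import time
import requests

BASE_URL = "http://localhost:8080/api/v1"  # assumed local instance, no auth

# Start a finetuning run on an uploaded dataset.
requests.post(
    f"{BASE_URL}/finetune/",
    json={
        "name": "My Adapter Training",
        "model_id": "NousResearch/Llama-2-7b-chat-hf",
        "dataset_id": "01be6d68-f790-434b-aa6d-5bd492aef202",
    },
)

# Poll the finetuning status until the underlying job finishes or is killed.
status = requests.get(f"{BASE_URL}/finetune/status/").json()
while status["job"]["status"] in ("CREATED", "RUNNING"):
    time.sleep(30)
    status = requests.get(f"{BASE_URL}/finetune/status/").json()

print(status["name"], status["job"]["status"])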

POST /v1/finetune/interrupt/

Interrupt the current finetuning run.

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/finetune/interrupt/' \
-H 'accept: application/json' \
-d ''

Response schema:

{
  "id": "74280be7-4723-475d-89ae-346e9017990e",
  "name": "FT_NousResearch/Llama-2-7b-chat-hf_01be6d68-f790-434b-aa6d-5bd492aef202",
  "status": "RUNNING",
  "started_at": "2024-03-21T10:40:40.928442"
}

OpenAI-like API

The following APIs allow you to create completions, chat completions, and embeddings.

POST /v1/completions/

Create a completion.

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "Compressa-LLM",
  "prompt": "Who won the world series in 2020?",
  "max_tokens": 128,
  "temperature": 0.5,
  "stream": false
}'

Response schema:

{
  "id": "cmpl-f031fd9d94c24e598e09f4ea212f09b4",
  "object": "text_completion",
  "created": 1716788849,
  "model": "Compressa-LLM",
  "choices": [
    {
      "index": 0,
      "text": " The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays in six games. It was the Dodgers' first World Series title since 1988. The Dodgers won the series 4-2, with the final game being played on October 27, 2020, at Globe Life Field in Arlington, Texas. ...more\nWhat is the most popular sport in the world? The most popular sport in the world is soccer, also known as football. It is estimated that over 3.5 billion people watch soccer regularly, and it is played in over 200 countries. The FIFA World Cup, which",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 138,
    "completion_tokens": 128
  }
}

POST /v1/chat/completions/

Create a chat completion.

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "Compressa-LLM",
  "messages": [
    {"role": "user", "content": "Who won the world series in 2020?"}
  ],
  "max_tokens": 128,
  "temperature": 0.5,
  "stream": false
}'

Response schema:

{
  "id": "cmpl-6c17963092394d3dbcda8582c8c9dd8e",
  "object": "chat.completion",
  "created": 1716788717,
  "model": "Compressa-LLM",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020, defeating the Tampa Bay Rays in the series 4 games to 2. It was the Dodgers' first World Series title since 1988."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "total_tokens": 63,
    "completion_tokens": 43
  }
}

POST /v1/embeddings/

Create an embedding.

Example:

curl -X 'POST' \
'http://localhost:8080/api/v1/embeddings' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "Compressa-Embedding",
  "input": "Who won the world series in 2020?"
}'

Response schema:

{
  "id": "cmpl-0278bf78fa6a4b1b90008efabd77506d",
  "object": "list",
  "created": 6292259,
  "model": "Compressa-Embedding",
  "data": [
    {
      "index": 0,
      "object": "embedding",
      "embedding": [
        0.0007405281066894531,
        0.0029888153076171875,
        0.002719879150390625,
        -0.0023860931396484375,
        ...
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "total_tokens": 14,
    "completion_tokens": 0
  }
}
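
Because these endpoints mirror the OpenAI API format, the official openai Python package can likely be pointed at them; the following is a minimal sketch under that assumption, using the base URL and model names from the examples above. The api_key value is a placeholder, since the examples send no authentication.

from openai import OpenAI

# Assumed OpenAI-compatible endpoint; the api_key is a dummy value (no auth in the examples above).
client = OpenAI(base_url="http://localhost:8080/api/v1", api_key="not-needed")

# Chat completion against the deployed LLM.
chat = client.chat.completions.create(
    model="Compressa-LLM",
    messages=[{"role": "user", "content": "Who won the world series in 2020?"}],
    max_tokens=128,
    temperature=0.5,
)
print(chat.choices[0].message.content)

# Embedding for a single input string.
embedding = client.embeddings.create(
    model="Compressa-Embedding",
    input="Who won the world series in 2020?",
)
print(len(embedding.data[0].embedding))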