Management API
The Management API is a REST API for controlling all features: models, deployment, jobs, datasets, finetuning, and inference.
Interactive Swagger documentation is available at:
URL: http://localhost:8080/api/docs/
Models
APIs to work with currently available models.
GET /v1/models/
List the models available for deployment and finetuning.
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/models/' \
-H 'accept: application/json'
Response schema:
[
{
"model_id": "string",
"adapter": true,
"base_model_id": "string"
}
]
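As a rough illustration of how this response might be consumed, the sketch below groups the entries into base models and adapters. It assumes that "adapter": true marks an adapter entry and that base_model_id names the model the adapter belongs to; the ids in the sample are made up.

```python
import json


def split_models(models):
    """Group a GET /v1/models/ response into base models and adapters per base model."""
    base_models = [m["model_id"] for m in models if not m["adapter"]]
    adapters = {}
    for m in models:
        if m["adapter"]:
            # Assumption: base_model_id points at the model the adapter was trained from.
            adapters.setdefault(m["base_model_id"], []).append(m["model_id"])
    return base_models, adapters


# Hypothetical response body; the ids are placeholders, not real models.
sample = json.loads("""
[
  {"model_id": "base-model", "adapter": false, "base_model_id": null},
  {"model_id": "adapter-1", "adapter": true, "base_model_id": "base-model"}
]
""")

base_models, adapters = split_models(sample)
print(base_models)
print(adapters)
```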
POST /v1/models/add/
Download a model from Hugging Face.
Parameters:
- query:
model_id - the model's id on Hugging Face, for example openchat/openchat-3.5-0106 (the part of the model's Hugging Face link after the domain)
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/models/add/?model_id=mymodel_id' \
-H 'accept: application/json' \
-d ''
Response schema:
{
"id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
"name": "DOWNLOAD_mymodel_id",
"status": "RUNNING",
"started_at": "2024-03-21T09:58:29.846708"
}
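Hugging Face ids usually contain a slash, so the query parameter should be URL-encoded. The sketch below builds the request with the Python standard library without sending it; BASE_URL is taken from the examples in this document, and build_add_model_request is a hypothetical helper name.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/api"  # host and prefix used by the curl examples


def build_add_model_request(model_id):
    """Build (but do not send) the POST /v1/models/add/ request for a model id."""
    query = urlencode({"model_id": model_id})  # encodes "/" as %2F
    return urllib.request.Request(
        f"{BASE_URL}/v1/models/add/?{query}",
        data=b"",
        method="POST",
        headers={"accept": "application/json"},
    )


req = build_add_model_request("openchat/openchat-3.5-0106")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it to a running server.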
Deploy
APIs to deploy and undeploy a model.
GET /v1/deploy/
Get the currently deployed model.
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/deploy/' \
-H 'accept: application/json'
Response schema:
{
"model_id": "string",
"adapter_ids": [
"string"
]
}
POST /v1/deploy/
Deploy a model and its adapters for inference.
The ids for the request can be obtained from GET /v1/models/.
Request body:
{
"model_id": "string",
"adapter_ids": [
"string"
]
}
model_id - the model's id
adapter_ids - list of adapter ids
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/deploy/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model_id": "model_id1",
"adapter_ids": [
"adapter_id1",
"adapter_id2"
]
}'
Response schema:
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"name": "string",
"status": "CREATED",
"started_at": "2024-03-21T10:10:07.521Z"
}
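Since the ids come from GET /v1/models/, a client could check them before sending the deploy request. The following is only a sketch: build_deploy_body is a hypothetical helper, and the rule that an adapter must match the base model is an assumption, not something this document states.

```python
import json


def build_deploy_body(models, model_id, adapter_ids=()):
    """Return the JSON body for POST /v1/deploy/ after checking ids against the model list."""
    by_id = {entry["model_id"]: entry for entry in models}
    if model_id not in by_id or by_id[model_id]["adapter"]:
        raise ValueError(f"{model_id!r} is not a known base model")
    for adapter_id in adapter_ids:
        entry = by_id.get(adapter_id)
        if entry is None or not entry["adapter"]:
            raise ValueError(f"{adapter_id!r} is not a known adapter")
        # Assumption: an adapter is only valid on the model named in base_model_id.
        if entry["base_model_id"] != model_id:
            raise ValueError(f"{adapter_id!r} does not belong to {model_id!r}")
    return json.dumps({"model_id": model_id, "adapter_ids": list(adapter_ids)})


# Model list in the shape returned by GET /v1/models/; ids follow the example above.
models = [
    {"model_id": "model_id1", "adapter": False, "base_model_id": None},
    {"model_id": "adapter_id1", "adapter": True, "base_model_id": "model_id1"},
]

body = build_deploy_body(models, "model_id1", ["adapter_id1"])
print(body)
```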
GET /v1/deploy/status/
Get the status of the deployed model.
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/deploy/status/' \
-H 'accept: application/json'
Response schema:
{
"model_id": "model_id1",
"adapter_ids": [
"adapter_id1",
"adapter_id2"
],
"job": {
"id": "8a63349c-078f-4e98-8968-4f011593329c",
"name": "DEPLOY_model_id1_adapters_id1_id2",
"status": "RUNNING",
"started_at": "2024-03-21T07:35:16.861681"
}
}
POST /v1/deploy/interrupt/
Undeploy the currently deployed model.
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/deploy/interrupt/' \
-H 'accept: application/json' \
-d ''
Response schema:
{
"id": "8a63349c-078f-4e98-8968-4f011593329c",
"name": "DEPLOY_model_id1_adapters_id1_id2",
"status": "RUNNING",
"started_at": "2024-03-21T07:35:16.861681"
}
Jobs
Operations such as adding a model or deploying one are each associated with a job. The following APIs control job execution.
GET /v1/jobs/
Get all jobs with their statuses.
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/jobs/' \
-H 'accept: application/json'
Response schema:
[
{
"id": "8a63349c-078f-4e98-8968-4f011593329c",
"name": "DEPLOY_model_id1_adapters_id1_id2",
"status": "RUNNING",
"started_at": "2024-03-21T07:35:16.861681"
},
{
"id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
"name": "DOWNLOAD_test",
"status": "FINISHED",
"started_at": "2024-03-21T09:58:29.846708"
}
]
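A job list like this one can be summarized on the client side. The sketch below counts jobs per status and picks the most recently started job, using the two sample entries from the response above; it assumes started_at is always an ISO 8601 timestamp without a timezone suffix, as in these samples.

```python
from collections import Counter
from datetime import datetime

# The two jobs from the sample response above.
jobs = [
    {
        "id": "8a63349c-078f-4e98-8968-4f011593329c",
        "name": "DEPLOY_model_id1_adapters_id1_id2",
        "status": "RUNNING",
        "started_at": "2024-03-21T07:35:16.861681",
    },
    {
        "id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
        "name": "DOWNLOAD_test",
        "status": "FINISHED",
        "started_at": "2024-03-21T09:58:29.846708",
    },
]

# Number of jobs in each status.
status_counts = Counter(job["status"] for job in jobs)

# Most recently started job, comparing parsed timestamps rather than raw strings.
latest = max(jobs, key=lambda job: datetime.fromisoformat(job["started_at"]))

print(dict(status_counts))
print(latest["name"])
```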
GET /v1/jobs/{job_id}/status/
Get the last status of the job with the given job_id.
Parameters:
- path:
job_id
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/jobs/4d78d943-1896-4d7b-9f11-b10cc2389ba3/status/' \
-H 'accept: application/json'
Response schema:
{
"id": "4d78d943-1896-4d7b-9f11-b10cc2389ba3",
"name": "DOWNLOAD_test",
"status": "FINISHED",
"started_at": "2024-03-21T09:58:29.846708"
}
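Because jobs run asynchronously, a client will typically poll this endpoint until the job leaves its in-progress states. The sketch below treats CREATED and RUNNING as in progress, which is an assumption based only on the statuses that appear in this document; wait_for_job and fetch_job_status are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080/api"  # host and prefix used by the curl examples
IN_PROGRESS = {"CREATED", "RUNNING"}  # assumption: every other status is final


def fetch_job_status(job_id):
    """Call GET /v1/jobs/{job_id}/status/ and return the decoded JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/v1/jobs/{job_id}/status/") as resp:
        return json.load(resp)


def wait_for_job(fetch_status, interval=2.0, max_polls=300, sleep=time.sleep):
    """Poll fetch_status() until the job is no longer in progress, then return it."""
    for _ in range(max_polls):
        job = fetch_status()
        if job["status"] not in IN_PROGRESS:
            return job
        sleep(interval)
    raise TimeoutError("job did not finish within the polling budget")


# Demo with canned responses instead of a live server.
responses = iter([{"status": "CREATED"}, {"status": "RUNNING"}, {"status": "FINISHED"}])
final = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])
```

Against a running server, the call could look like wait_for_job(lambda: fetch_job_status(job_id)).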
POST /v1/jobs/{job_id}/interrupt/
Interrupt the job with the given job_id.
Parameters:
- path:
job_id
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/jobs/8a63349c-078f-4e98-8968-4f011593329c/interrupt/' \
-H 'accept: application/json' \
-d ''
Response schema:
{
"id": "8a63349c-078f-4e98-8968-4f011593329c",
"name": "DEPLOY_model_id1_adapters_id1_id2",
"status": "KILLED",
"started_at": "2024-03-21T07:35:16.861681"
}
Datasets
The following APIs upload and download datasets for training.
GET /v1/datasets/
List all available datasets.
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/datasets/' \
-H 'accept: application/json'
Response schema:
[
{
"id": "01be6d68-f790-434b-aa6d-5bd492aef202",
"name": "train.jsonl",
"s3_path": "01be6d68-f790-434b-aa6d-5bd492aef202/metadata.json",
"description": null
},
{
"id": "077adb68-2b0e-481b-bd13-e8807adf625f",
"name": "train.jsonl",
"s3_path": "077adb68-2b0e-481b-bd13-e8807adf625f/metadata.json",
"description": "My dataset 2"
}
]
POST /v1/datasets/upload/
Upload a new dataset file.
The file should contain conversation data in jsonl format.
Parameters:
- query:
description - dataset description
Request body:
- multipart/form-data:
file
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/datasets/upload/?description=My%20Description' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@train.jsonl'
Response schema:
{
"id": "string",
"name": "string",
"s3_path": "string",
"description": "string"
}
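Before uploading, it can help to confirm that every line of the file is valid JSON and to URL-encode the description. The sketch below does both with the standard library. The "messages" field in the sample rows is purely hypothetical, since this document does not specify the conversation schema; parse_jsonl and upload_url are hypothetical helper names.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080/api"  # host and prefix used by the curl examples


def parse_jsonl(text):
    """Parse jsonl text, raising ValueError with the line number of the first bad line."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number} is not valid JSON: {exc.msg}") from exc
    return records


def upload_url(description):
    """Build the upload URL with the description encoded as in the curl example."""
    query = urlencode({"description": description}, quote_via=quote)
    return f"{BASE_URL}/v1/datasets/upload/?{query}"


# Hypothetical rows: the real conversation schema is not given in this document.
rows = [
    {"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]},
    {"messages": [{"role": "user", "content": "Bye"}, {"role": "assistant", "content": "Goodbye!"}]},
]
text = "\n".join(json.dumps(row) for row in rows) + "\n"

records = parse_jsonl(text)
print(len(records))
print(upload_url("My Description"))
```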
GET /v1/datasets/{dataset_id}/
Download the dataset with the given dataset_id.
Parameters:
- path:
dataset_id
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/datasets/01be6d68-f790-434b-aa6d-5bd492aef202/' \
-H 'accept: application/json'
Response schema:
file
Finetuning
The following APIs finetune models.
GET /v1/finetune/models/
List all models available for finetuning.
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/finetune/models/' \
-H 'accept: application/json'
Response schema:
[
{
"model_id": "TheBloke/mixtral-8x7b-v0.1-AWQ",
"adapter": false,
"base_model_id": null
},
{
"model_id": "NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story",
"adapter": false,
"base_model_id": null
}
]
POST /v1/finetune/
Finetune an adapter on a dataset.
Request body:
{
"name": "string",
"model_id": "string",
"dataset_id": "string"
}
name - name of the training
model_id - id of the model to train
dataset_id - id of the dataset to train on
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/finetune/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "My Adapter Training",
"model_id": "NousResearch/Llama-2-7b-chat-hf",
"dataset_id": "01be6d68-f790-434b-aa6d-5bd492aef202"
}'
Response schema:
{
"id": "74280be7-4723-475d-89ae-346e9017990e",
"name": "FT_NousResearch/Llama-2-7b-chat-hf_01be6d68-f790-434b-aa6d-5bd492aef202",
"status": "CREATED",
"started_at": "2024-03-21T10:40:40.928442"
}
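The same request can be assembled from Python. This is a minimal sketch using only the standard library: it builds the request object without sending it, BASE_URL comes from the curl examples, and build_finetune_request is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api"  # host and prefix used by the curl examples


def build_finetune_request(name, model_id, dataset_id):
    """Build (but do not send) the POST /v1/finetune/ request."""
    body = json.dumps(
        {"name": name, "model_id": model_id, "dataset_id": dataset_id}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/finetune/",
        data=body,
        method="POST",
        headers={"accept": "application/json", "Content-Type": "application/json"},
    )


req = build_finetune_request(
    "My Adapter Training",
    "NousResearch/Llama-2-7b-chat-hf",
    "01be6d68-f790-434b-aa6d-5bd492aef202",
)
payload = json.loads(req.data)
print(req.get_method(), req.full_url)
print(payload)
```

Sending it with urllib.request.urlopen(req) should return a job object like the one above, whose id can then be polled through the Jobs API.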
GET /v1/finetune/status/
Get the status of the current finetuning.
Example:
curl -X 'GET' \
'http://localhost:8080/api/v1/finetune/status/' \
-H 'accept: application/json'
Response schema:
{
"id": "46c155b4-17fe-4226-9412-a77edfadc7e7",
"name": "My Adapter Training",
"model_id": "NousResearch/Llama-2-7b-chat-hf",
"dataset_id": "01be6d68-f790-434b-aa6d-5bd492aef202",
"job": {
"id": "74280be7-4723-475d-89ae-346e9017990e",
"name": "FT_NousResearch/Llama-2-7b-chat-hf_01be6d68-f790-434b-aa6d-5bd492aef202",
"status": "RUNNING",
"started_at": "2024-03-21T10:40:40.928442"
}
}
POST /v1/finetune/interrupt/
Interrupt the current finetuning.
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/finetune/interrupt/' \
-H 'accept: application/json' \
-d ''
Response schema:
{
"id": "74280be7-4723-475d-89ae-346e9017990e",
"name": "FT_NousResearch/Llama-2-7b-chat-hf_01be6d68-f790-434b-aa6d-5bd492aef202",
"status": "RUNNING",
"started_at": "2024-03-21T10:40:40.928442"
}
OpenAI-like API
The following APIs create completions, chat completions, and embeddings.
POST /v1/completions/
Create a completion.
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "Compressa-LLM",
"prompt": "Who won the world series in 2020?",
"max_tokens": 128,
"temperature": 0.5,
"stream": false
}'
Response schema:
{
"id": "cmpl-f031fd9d94c24e598e09f4ea212f09b4",
"object": "text_completion",
"created": 1716788849,
"model": "Compressa-LLM",
"choices": [
{
"index": 0,
"text": " The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays in six games. It was the Dodgers' first World Series title since 1988. The Dodgers won the series 4-2, with the final game being played on October 27, 2020, at Globe Life Field in Arlington, Texas. ...more\nWhat is the most popular sport in the world? The most popular sport in the world is soccer, also known as football. It is estimated that over 3.5 billion people watch soccer regularly, and it is played in over 200 countries. The FIFA World Cup, which",
"logprobs": null,
"finish_reason": "length",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 10,
"total_tokens": 138,
"completion_tokens": 128
}
}
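In the response above, finish_reason is "length" and completion_tokens equals the requested max_tokens, which suggests the text was cut off at the token limit. The sketch below extracts the text and flags that case; reading "length" as truncation follows the usual OpenAI-style convention and is an assumption here, and summarize_completion is a hypothetical helper name.

```python
def summarize_completion(response):
    """Pull the text and a few sanity flags out of a completions response."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["text"].strip(),
        # Assumption: "length" means generation stopped at max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "usage_consistent": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }


# Trimmed copy of the sample response above (text shortened for brevity).
response = {
    "choices": [
        {
            "index": 0,
            "text": " The Los Angeles Dodgers won the 2020 World Series",
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 10, "total_tokens": 138, "completion_tokens": 128},
}

summary = summarize_completion(response)
print(summary)
```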
POST /v1/chat/completions/
Create a chat completion.
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "Compressa-LLM",
"messages": [
{"role": "user", "content": "Who won the world series in 2020?"}
],
"max_tokens": 128,
"temperature": 0.5,
"stream": false
}'
Response schema:
{
"id": "cmpl-6c17963092394d3dbcda8582c8c9dd8e",
"object": "chat.completion",
"created": 1716788717,
"model": "Compressa-LLM",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The Los Angeles Dodgers won the World Series in 2020, defeating the Tampa Bay Rays in the series 4 games to 2. It was the Dodgers' first World Series title since 1988."
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 20,
"total_tokens": 63,
"completion_tokens": 43
}
}
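For a multi-turn chat, the assistant's reply is normally appended to the messages list before the next user turn is sent. The sketch below shows that bookkeeping on a trimmed copy of the response above; extend_conversation is a hypothetical helper, and the follow-up question is invented for illustration.

```python
def extend_conversation(messages, response, next_user_message):
    """Return a new messages list with the assistant reply and the next user turn."""
    reply = response["choices"][0]["message"]
    return messages + [
        {"role": reply["role"], "content": reply["content"]},
        {"role": "user", "content": next_user_message},
    ]


messages = [{"role": "user", "content": "Who won the world series in 2020?"}]

# Trimmed copy of the sample response above.
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The Los Angeles Dodgers won the World Series in 2020.",
            },
            "finish_reason": "stop",
        }
    ]
}

next_messages = extend_conversation(messages, response, "Where was it played?")
for message in next_messages:
    print(message["role"], ":", message["content"])
```

The resulting list can be sent as the "messages" field of the next request.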
POST /v1/embeddings/
Create an embedding.
Example:
curl -X 'POST' \
'http://localhost:8080/api/v1/embeddings' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "Compressa-Embedding",
"input": "Who won the world series in 2020?"
}'
Response schema:
{
"id": "cmpl-0278bf78fa6a4b1b90008efabd77506d",
"object": "list",
"created": 6292259,
"model": "Compressa-Embedding",
"data": [
{
"index": 0,
"object": "embedding",
"embedding": [
0.0007405281066894531,
0.0029888153076171875,
0.002719879150390625,
-0.0023860931396484375,
...
]
}
],
"usage": {
"prompt_tokens": 14,
"total_tokens": 14,
"completion_tokens": 0
}
}
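Embeddings are usually compared with cosine similarity. The sketch below extracts the vectors from a response of the shape shown above and compares two of them; the short vectors in the demo are made-up values, not real model output, and both helper names are hypothetical.

```python
import math


def embedding_vectors(response):
    """Return the embedding vectors from an embeddings response, ordered by index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Made-up vectors in the response shape shown above.
response = {
    "data": [
        {"index": 1, "object": "embedding", "embedding": [0.0, 1.0]},
        {"index": 0, "object": "embedding", "embedding": [1.0, 0.0]},
    ]
}

first, second = embedding_vectors(response)
print(cosine_similarity(first, first))
print(cosine_similarity(first, second))
```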