
Fine-tuning

The Compressa platform lets you fine-tune LLMs quickly and efficiently using LoRA/QLoRA adapters or PEFT. Fine-tuning can improve answer quality on specific business tasks, focus the model on a particular topic, or enforce an answer format and style.

Important! Model fine-tuning is only available in the on-premise version of the platform running on your servers.

Before proceeding to fine-tuning, it's recommended to experiment with model selection and prompting first.

Fine-tuning is based on the Axolotl framework; Aim is used for monitoring.

Limitations:

  • FlashAttention is not available in the fine-tuning engine
  • Datasets must be in jsonl format
  • 3 types of FT tasks:
    • LORA / QLORA
    • PEFT
    • DPO with LORA / QLORA
  • 3 types of datasets (example lines for each format are sketched after this list):
    • alpaca — {"instruction": "", "input": "", "output": ""}
    • chat_template — {"messages": [{"role": "user", "content": "content"}, {"role": "assistant", "content": "content"}]}
    • dpo_chat_template — {"instruction": "instruction", "input": "", "output": "chosen", "messages": [{"role": "user", "content": "content"}, {"role": "assistant", "content": "content"}], "chosen": {"role": "assistant", "content": "content"}, "rejected": {"role": "assistant", "content": "content"}}
  • Only the vLLM inference engine is supported for adapters
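
To make the dataset formats concrete, here is a minimal sketch that writes one example line of each type to a .jsonl file (one JSON object per line). The file names and record contents are illustrative only, not shipped examples.

import json

# One illustrative record per dataset type; a real dataset has one JSON object per line.
alpaca_row = {
    "instruction": "Summarize the text.",
    "input": "Compressa fine-tunes LLMs.",
    "output": "It fine-tunes LLMs.",
}
chat_row = {
    "messages": [
        {"role": "user", "content": "What does Compressa do?"},
        {"role": "assistant", "content": "It fine-tunes and serves LLMs."},
    ]
}
dpo_row = {
    "instruction": "Answer briefly.",
    "input": "",
    "output": "chosen",
    "messages": [{"role": "user", "content": "What does Compressa do?"}],
    "chosen": {"role": "assistant", "content": "It fine-tunes and serves LLMs."},
    "rejected": {"role": "assistant", "content": "No idea."},
}

for name, row in [("alpaca.jsonl", alpaca_row),
                  ("chat_template.jsonl", chat_row),
                  ("dpo_chat_template.jsonl", dpo_row)]:
    with open(name, "w", encoding="utf-8") as f:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")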

Preparation

Set DISABLE_AUTODEPLOY=TRUE in docker-compose.yaml, or interrupt the deployed model after pod startup:

curl -X POST http://localhost:5100/v1/deploy/interrupt

Execute

cd deploy/pod
set -a
source .env
set +a
docker compose up compressa-pod compressa-client-finetune aim-ui aim-server -d

The Chat UI is available in your browser at http://localhost:8501/chat

The fine-tuning UI is available at http://localhost:8501/finetune

The Aim UI is available at http://localhost:43800/aim-ui

Prepare datasets for fine-tuning.

You can upload your own datasets with a POST request to http://localhost:5100/v1/datasets/upload/ (an example is in packages/pod/scripts/examples_finetune), or load datasets from HuggingFace with a POST request to http://localhost:5100/v1/datasets/add/?name=%HF_REPO_ID%

After uploading or adding, your datasets are stored in the host folder DATASET_PATH and can be listed with a GET request to http://localhost:5100/v1/datasets
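
As a rough sketch, the same flow in Python (the Bearer token, file name, and repo id below are placeholders; whether these endpoints require the Authorization header depends on your deployment):

import requests

base = "http://localhost:5100/v1"
headers = {"Authorization": "Bearer test"}  # placeholder token

# Upload a local jsonl dataset with an optional description
with open("train.jsonl", "rb") as f:
    response = requests.post(f"{base}/datasets/upload/",
                             params={"description": "My dataset"},
                             headers=headers,
                             files={"file": f})
print(response.json())

# Or pull a dataset from HuggingFace by repo id (placeholder repo id)
response = requests.post(f"{base}/datasets/add/",
                         params={"name": "org/dataset"},
                         headers=headers)
print(response.json())

# List everything that landed in DATASET_PATH
print(requests.get(f"{base}/datasets", headers=headers).json())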

Fine Tuning

Custom Config

You can use your own .yaml config file (recommended, especially when training PEFT with parameter unfreezing, since layer names differ between models). Examples are available in the Axolotl examples
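
As a starting point, the sketch below builds a minimal QLoRA config in the Axolotl style and writes it to cfg.yaml. The key names follow common Axolotl examples and may vary between Axolotl versions, so treat them as assumptions to verify against the Axolotl documentation; the model and dataset names are the ones used elsewhere on this page.

import yaml  # PyYAML

# Minimal QLoRA config sketch; verify key names against your Axolotl version.
cfg = {
    "base_model": "Qwen/Qwen2.5-0.5B-Instruct",
    "load_in_4bit": True,
    "adapter": "qlora",
    "lora_r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "lora_target_linear": True,
    "datasets": [{"path": "glayout_chatml.jsonl", "type": "chat_template"}],
    "sequence_len": 2048,
    "micro_batch_size": 2,
    "num_epochs": 10,
    "learning_rate": 5e-5,
    "output_dir": "./outputs",
}

with open("cfg.yaml", "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)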

  • Select dataset from http://localhost:5100/v1/datasets
  • Select model from http://localhost:5100/v1/finetune/models
  • Important: only models with task=llm will be available!
  • Edit base_model and dataset.path in your config
  • Start the training process:

import requests

url = "http://localhost:5100/v1/finetune/custom"
headers = {
    "Authorization": "Bearer test",
}

# Upload the custom Axolotl config to start training
with open("cfg.yaml", "rb") as f:
    response = requests.post(url, headers=headers, files={"config_file": f})
print(response.json())

You can check the status or interrupt the process via the status and interrupt endpoints.
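
A quick Python sketch of both calls (the Bearer token is a placeholder):

import requests

headers = {"Authorization": "Bearer test"}

# Poll the current fine-tuning status (see the REST API description below)
print(requests.get("http://localhost:5100/v1/finetune/status/", headers=headers).json())

# Interrupt the run if needed
# requests.post("http://localhost:5100/v1/finetune/interrupt/", headers=headers)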

Process visualization is available in Aim-UI http://localhost:43800/aim-ui

Adapters or fine-tuned model weights will be saved to a separate folder (the base model is not affected): base_model_path_finetuned_YYYYMMDD_job_id[:8]
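
Illustratively, that folder name is composed like this (the job id below is a placeholder):

from datetime import date

base_model_path = "Qwen/Qwen2.5-0.5B-Instruct".replace("/", "_")  # illustrative
job_id = "26e93c7c-0000-0000-0000-000000000000"                   # placeholder

print(f"{base_model_path}_finetuned_{date.today():%Y%m%d}_{job_id[:8]}")
# e.g. Qwen_Qwen2.5-0.5B-Instruct_finetuned_20251020_26e93c7c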

Standard Config

  • Can use one of the included config templates
  • Select dataset from http://localhost:5100/v1/datasets
  • Select model from http://localhost:5100/v1/finetune/models
  • Important: only models with task=llm will be available!
  • Start the fine-tuning process:
import requests

url_finetune = "http://localhost:5100/v1/finetune/"
headers_finetune = {
    "Authorization": "Bearer test",
    "Content-Type": "application/json",
}

# QLoRA fine-tuning job built from one of the included config templates
data = {
    "name": "test_ft",
    "model_id": "Qwen/Qwen2.5-0.5B-Instruct",
    "dataset_name": "glayout_chatml.jsonl",
    "training_task": "qlora",
    "num_train_epochs": 10,
    "learning_rate": 5e-5,
    "batch_size": 2,
    "lora_r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "quantization": "int4",
    "launcher": "accelerate",
}

response = requests.post(url_finetune, headers=headers_finetune, json=data)
if response.status_code == 200:
    print(f"Finetune started successfully: {response.json()}")
else:
    print(f"Failed to start finetune: {response.status_code} {response.text}")

You can check the status or interrupt the process via the status and interrupt endpoints.

Process visualization is available in Aim-UI http://localhost:43800/aim-ui

Adapters or fine-tuned model weights will be saved to a separate folder (the base model is not affected): base_model_path_finetuned_YYYYMMDD_job_id[:8]

It's strongly recommended to restart the Pod after fine-tuning, before merging weights or starting a new experiment, to avoid Pod state collisions.

Running Fine-tuned Model

Running Base Model with Adapters

Edit deploy_config.json:

{
  "model_id": "Qwen/Qwen2.5-0.5B-Instruct",
  "served_model_name": "Compressa-LLM",
  "dtype": "auto",
  "backend": "llm",
  "task": "llm",
  "adapter_ids": ["Qwen/Qwen2.5-0.5B-Instruct_finetuned_261fbfb6-22fe-4897-9f68-a28c723b072e"]
}

Start Pod
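
Once the pod is up, you can smoke-test the adapter-backed model. This is a rough sketch assuming the pod exposes vLLM's OpenAI-compatible /v1/chat/completions route; the URL, port, and token below are placeholders to adjust for your deployment.

import requests

url = "http://localhost:5000/v1/chat/completions"  # placeholder host/port
headers = {"Authorization": "Bearer test"}         # placeholder token

payload = {
    "model": "Compressa-LLM",  # served_model_name from deploy_config.json
    "messages": [{"role": "user", "content": "Hello!"}],
}
response = requests.post(url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])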

Merging Base Model and Adapter

  • Restart pod.
  • Select the fine-tuned model name and checkpoint number (if None, the last checkpoint is used).
  • If needed, check compressa-config.json file in model folder.
import requests

url = "http://localhost:5100/v1/finetune/merge"
headers = {
    "Authorization": "Bearer test",
    "Content-Type": "application/json",
}

response = requests.post(url, headers=headers, json={
    "model_id": "Qwen/Qwen2.5-0.5B-Instruct",
    "lora_model_dir": "Qwen_Qwen2.5-0.5B-Instruct_finetuned_20251020_26e93c7c",
    "checkpoint": None,  # None -> the last checkpoint is used
})
print(response.json())

There are no dedicated endpoints for checking the status of or interrupting a merge job; you can check its status by job_id:

curl http://localhost:5100/v1/jobs/{job_id}/status/
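
A minimal polling sketch around that endpoint (the job_id is a placeholder taken from the merge response, and the exact set of status strings is an assumption; the status example above shows at least "RUNNING"):

import time
import requests

job_id = "415a70b8-0000-0000-0000-000000000000"  # placeholder from the merge response

# Poll the generic jobs endpoint until the merge leaves the RUNNING state
while True:
    status = requests.get(f"http://localhost:5100/v1/jobs/{job_id}/status/").json()
    print(status)
    if status.get("status") != "RUNNING":
        break
    time.sleep(10)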

Merged weights will be saved to a separate folder (the base model is not affected): base_model_path_merged_finetuned_YYYYMMDD_job_id[:8]_checkpoint. Check the model's model_id with a GET request to http://localhost:5100/v1/models and edit deploy_config.json to start the pod with the merged model.

{
  "model_id": "Qwen/Qwen2.5-0.5B-Instruct_merged_415a70b8",
  "served_model_name": "Compressa-LLM",
  "dtype": "auto",
  "backend": "llm",
  "task": "llm"
}

Dataset loading, fine-tuning, monitoring and merge are also available via UI at http://localhost:8501/finetune

REST API Description

Dataset Management

GET /v1/datasets/

Get list of available datasets.

Request Example:

curl -X 'GET' \
'http://localhost:5100/v1/datasets/' \
-H 'accept: application/json'

Response Example:

[
  {
    "id": "01be6d68-f790-434b-aa6d-5bd492aef202",
    "name": "train.jsonl",
    "path": "01be6d68-f790-434b-aa6d-5bd492aef202/metadata.json",
    "description": null
  },
  {
    "id": "077adb68-2b0e-481b-bd13-e8807adf625f",
    "name": "train.jsonl",
    "path": "077adb68-2b0e-481b-bd13-e8807adf625f/metadata.json",
    "description": "My dataset 2"
  }
]

POST /v1/datasets/upload/

Upload a new dataset (only the jsonl format is supported).

Parameters:

  • query: description - dataset description

Request Body:

  • multipart/form-data: file

Request Example:

curl -X 'POST' \
'http://localhost:5100/v1/datasets/upload/?description=My%20Description' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@train.jsonl'

Response Example:

{
  "id": "string",
  "name": "string",
  "path": "string",
  "description": "string"
}

POST /v1/datasets/add/

Download a new dataset from HuggingFace.

Request Example:

curl -X 'POST' \
'http://localhost:5100/v1/datasets/add/?name=REPO_ID' \
-H 'accept: application/json'

Response Example:

{
  "id": "string",
  "name": "string",
  "path": "string",
  "description": "string"
}

GET /v1/datasets/{dataset_id}/

Download a specific dataset by its dataset_id.

Parameters:

  • path: dataset_id

Request Example:

curl -X 'GET' \
'http://localhost:5100/v1/datasets/01be6d68-f790-434b-aa6d-5bd492aef202/' \
-H 'accept: application/json'

Response Example:

  • file
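
A Python sketch of the same download, streaming the response body to a local file (the dataset id is the one from the example above; the output file name is arbitrary):

import requests

dataset_id = "01be6d68-f790-434b-aa6d-5bd492aef202"  # from GET /v1/datasets/

# Stream the raw dataset file to disk
with requests.get(f"http://localhost:5100/v1/datasets/{dataset_id}/", stream=True) as r:
    r.raise_for_status()
    with open("downloaded.jsonl", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)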

Model Fine-tuning

GET /v1/finetune/models/

Get list of models available for fine-tuning

Request Example:

curl -X 'GET' \
'http://localhost:5100/v1/finetune/models/' \
-H 'accept: application/json'

Response Example:

[
  {
    "model_id": "TheBloke/mixtral-8x7b-v0.1-AWQ",
    "adapter": false,
    "base_model_id": null
  },
  {
    "model_id": "NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story",
    "adapter": false,
    "base_model_id": null
  }
]

POST /v1/finetune/

Start fine-tuning a model on a dataset

See the examples above.

GET /v1/finetune/status/

Get status of current fine-tuning process

Request Example:

curl -X 'GET' \
'http://localhost:5100/v1/finetune/status/' \
-H 'accept: application/json'

Response Example:

{
  "id": "46c155b4-17fe-4226-9412-a77edfadc7e7",
  "name": "My Adapter Training",
  "model_id": "NousResearch/Llama-2-7b-chat-hf",
  "dataset_id": "01be6d68-f790-434b-aa6d-5bd492aef202",
  "job": {
    "id": "74280be7-4723-475d-89ae-346e9017990e",
    "name": "FT_NousResearch/Llama-2-7b-chat-hf_01be6d68-f790-434b-aa6d-5bd492aef202",
    "status": "RUNNING",
    "started_at": "2024-03-21T10:40:40.928442"
  }
}

POST /v1/finetune/interrupt/

Interrupt fine-tuning process

Request Example:

curl -X 'POST' \
'http://localhost:5100/v1/finetune/interrupt/' \
-H 'accept: application/json' \
-d ''

Response Example:

{
  "id": "74280be7-4723-475d-89ae-346e9017990e",
  "name": "FT_NousResearch/Llama-2-7b-chat-hf_01be6d68-f790-434b-aa6d-5bd492aef202",
  "status": "RUNNING",
  "started_at": "2024-03-21T10:40:40.928442"
}

POST /v1/finetune/merge

Merge the base model and a LoRA adapter.

See the example above.