Installation and Deployment

Compressa LLM is distributed as Docker containers, which are published to GitHub's package registry and can be deployed with a single command.

Requirements

1. Linux Server with Supported Nvidia GPU

The current version has been tested on the following GPU models:

  • Nvidia H100
  • Nvidia A100
  • Nvidia V100
  • Nvidia T4
  • Nvidia 4090
  • Nvidia 4080
  • Nvidia 4070 / 4070Ti
  • Nvidia 3080 / 3080Ti

On GPUs older than the Nvidia A100, not all available inference engines and models are guaranteed to work.

The server must have at least as much RAM as total GPU memory (1.2x the GPU memory is recommended).
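To check whether a server meets this requirement, you can compare the two values with standard tools (nothing Compressa-specific is assumed here):

# Total RAM in gigabytes
free -g

# Total memory per GPU
nvidia-smi --query-gpu=memory.total --format=csv,noheader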

2. Installed CUDA Drivers

You need to install the latest Nvidia drivers compatible with your GPU.

note

The default CUDA driver version can be installed using the following commands:

sudo apt update
sudo apt install software-properties-common -y
sudo apt install ubuntu-drivers-common -y
sudo ubuntu-drivers autoinstall
sudo apt install nvidia-cuda-toolkit
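After installation, a reboot may be required; you can then verify that the driver and the CUDA toolkit are visible:

nvidia-smi
nvcc --version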

3. Docker

Installation instructions for Ubuntu:
https://docs.docker.com/engine/install/ubuntu/

You need to install a version that supports Docker Compose V2.
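You can verify that Compose V2 is available (the docker compose subcommand with a space, rather than the legacy docker-compose binary):

docker compose version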

4. Nvidia Container Toolkit

Installation instructions for Linux:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
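To confirm that containers can access the GPU, you can run nvidia-smi inside any CUDA base image; the image tag below is just an example:

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi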

Deployment

1. Docker Authentication:

export PAT=<TOKEN>
echo $PAT | docker login -u compressa --password-stdin
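If the login must target GitHub's container registry explicitly, the registry host can be added; whether this is required depends on how the images are published (the ghcr.io host below is an assumption):

echo $PAT | docker login ghcr.io -u compressa --password-stdin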

2. Configuration Files:

First, clone the configuration repository:

git clone git@github.com:compressa-ai/compressa-deploy.git
cd compressa-deploy/pod
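If you don't have SSH keys configured for GitHub, the same repository can also be cloned over HTTPS:

git clone https://github.com/compressa-ai/compressa-deploy.git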

3. Download the Latest Compressa Version:

docker compose pull

4. Select LLM

The system lets you choose which model is launched by default.
The following configuration files are available:

  1. deploy-qwen25-14.json - Compressa-Qwen2.5-14B-Instruct
  2. deploy-qwq.json - QwQ-32B
  3. deploy-qwen3-14.json - Qwen3-14B

In addition to the models listed above, other models running on other engines, built on the same base Compressa image, are available. Example configuration files are located in pod/configs/.

To change the default model, edit the following line in docker-compose.yaml:

  ...
- ./deploy-qwen3-14.json:/configs/deploy.json:ro
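For context, this mapping lives in the volumes section of the inference service; a minimal sketch is shown below (the service name and image are illustrative, not taken from the repository):

services:
  compressa:                 # illustrative name; use the service defined in the repo's docker-compose.yaml
    image: <IMAGE>           # the Compressa image pulled in step 3
    volumes:
      # mount the chosen config as the default deploy config, read-only
      - ./deploy-qwen3-14.json:/configs/deploy.json:ro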

5. Set Environment Variables:

  • DOCKER_GPU_IDS - list of GPU identifiers that will be available to Compressa

  • RESOURCES_PATH - path to the directory where models are stored, e.g. ./data.
    Set read and write permissions for this directory using chmod -R 777 ./data

  • HF_HOME - path to the directory for cached files, e.g. ./data/cache.

  • COMPRESSA_API_KEY - your Compressa key

note

If you're deploying Compressa in a private network without internet access, use the instructions for loading resources.

For example:

export DOCKER_GPU_IDS=0
export RESOURCES_PATH=./data/compressa
export HF_HOME=./data/cache
export COMPRESSA_API_KEY=<YOUR_KEY>

6. Start the Service

docker compose up
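To run the service in the background instead, you can use Compose's detached mode and follow the logs:

docker compose up -d
docker compose logs -f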

Done! The UI is available on port 8501, and the model API is available on port 5000.
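To sanity-check the deployment, you can probe both ports from the host; the /v1/models route below is an assumption that holds for OpenAI-compatible API servers:

# UI
curl -I http://localhost:8501

# Model API (assumes an OpenAI-compatible route)
curl http://localhost:5000/v1/models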