Installation and Deployment
Compressa LLM is distributed as Docker containers, which are published to GitHub Packages and can be deployed with a single command.
Requirements
1. Linux Server with Supported Nvidia GPU
The current version has been tested on the following GPU models:
- Nvidia H100
- Nvidia A100
- Nvidia V100
- Nvidia T4
- Nvidia 4090
- Nvidia 4080
- Nvidia 4070 / 4070Ti
- Nvidia 3080 / 3080Ti
On GPUs older than the Nvidia A100, not all available inference engines and models are guaranteed to work.
The server must have at least as much RAM as GPU memory (1.2x the GPU memory is recommended). For example, a server with an 80 GB A100 needs at least 80 GB of RAM, with 96 GB recommended.
2. Installed CUDA Drivers
You need to install the latest compatible drivers.
On Ubuntu, the default CUDA driver version can be installed using the following commands:
sudo apt update
sudo apt install software-properties-common -y
sudo apt install ubuntu-drivers-common -y
sudo ubuntu-drivers autoinstall
sudo apt install nvidia-cuda-toolkit
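After installation (a reboot may be required), you can verify that the driver is working:
nvidia-smi
The output should list your GPU and the installed driver version.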
3. Docker
Installation instructions for Ubuntu:
https://docs.docker.com/engine/install/ubuntu/
You need to install a version that supports Docker Compose V2.
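You can confirm that Compose V2 is available:
docker compose version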
4. Nvidia Container Toolkit
Installation instructions for Linux:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
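Once the toolkit is installed, a quick smoke test is to run nvidia-smi inside a container (the ubuntu image here is just an example):
sudo docker run --rm --gpus all ubuntu nvidia-smi
If GPU access is configured correctly, this prints the same GPU table as running nvidia-smi on the host.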
Deployment
1. Docker Authentication:
export PAT=<TOKEN>
echo $PAT | docker login ghcr.io -u compressa --password-stdin
2. Configuration Files:
First, clone the repository with configuration:
git clone git@github.com:compressa-ai/compressa-deploy.git
cd compressa-deploy/pod
3. Download the Latest Compressa Version:
docker compose pull
4. Select LLM
The system allows you to choose the model that will be launched by default.
The following configuration files are available:
- deploy-qwen25-14.json - Compressa-Qwen2.5-14B-Instruct
- deploy-qwq.json - QwQ-32B
- deploy-qwen3-14.json - Qwen3-14B
In addition to the models listed above, other models and inference engines built on the same Compressa base image are available.
Example configuration files are available in pod/configs/
To change the default model, edit the following line in docker-compose.yaml:
...
- ./deploy-qwen3-14.json:/configs/deploy.json:ro
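For context, this line is a volume mount inside the service definition. A minimal sketch of the relevant fragment (the service name compressa is an assumption; check your docker-compose.yaml for the actual name), here switching the default model to QwQ-32B:
services:
  compressa:
    volumes:
      - ./deploy-qwq.json:/configs/deploy.json:ro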
5. Set Environment Variables:
- DOCKER_GPU_IDS - list of GPU identifiers that will be available to Compressa
- RESOURCES_PATH - path to the directory for storing models, e.g. ./data
- HF_HOME - path to the directory for caching files, e.g. ./data/cache
- COMPRESSA_API_KEY - your Compressa key
Set read and write permissions for this directory using chmod -R 777 ./data
Note: if you're deploying Compressa in a private network without internet access, use the instructions for loading resources.
export DOCKER_GPU_IDS=0
export RESOURCES_PATH=./data/compressa
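The remaining variables from the list above are set the same way (the key value here is a placeholder):
export HF_HOME=./data/cache
export COMPRESSA_API_KEY=<YOUR_KEY>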
6. Start the Service
docker compose up
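To keep the service running in the background, add the detached flag:
docker compose up -d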
Done! The UI is available on port 8501, and the model API on port 5000.
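To verify the deployment, open http://localhost:8501 in a browser. If the model API is OpenAI-compatible (an assumption, not confirmed by this guide), a quick check might look like:
curl http://localhost:5000/v1/models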