Server Requirements
Specifications
The following server specifications are recommended for installing the Compressa platform:
- Linux server
- 3 NVIDIA A100 40GB GPUs (graphics cards):
  - GPU 1: LLM
  - GPU 2: embeddings
  - GPU 3: ETL, Rerank, Audio
- 8 CPU threads
- 160 GB RAM
- 1 TB disk space
Embedding, reranking, and audio models can share a single GPU, but performance and reliability may be lower.
If lower performance or answer quality is acceptable, or if you won't install this module, the server requirements can be reduced.
This configuration can handle only a limited load. To scale beyond it, you may need to add resources.
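As a rough pre-install check, the recommended figures above can be compared with what the server actually reports. The sketch below is only an illustration: the helper names (`report`, `gpu_count`, `cpu_threads`, `ram_gb`, `disk_gb`) are hypothetical, it assumes the data will live on the root filesystem, and it does not verify the GPU model. Filesystem size is always a little below raw disk capacity, so treat a near miss on disk as a pass.

```shell
#!/bin/sh
# Rough pre-install check against the recommended specification.
# Thresholds come from the list above; helper names are illustrative.

# report NAME ACTUAL REQUIRED: prints one OK/LOW line
report() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 (recommended >= $3)"
  else
    echo "LOW  $1: $2 (recommended >= $3)"
  fi
}

# Number of GPUs visible to the NVIDIA driver (0 when nvidia-smi is missing)
gpu_count() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --list-gpus 2>/dev/null | grep -c '^GPU '
  else
    echo 0
  fi
}

# Number of online CPU threads
cpu_threads() { getconf _NPROCESSORS_ONLN 2>/dev/null || echo 0; }

# Total RAM in decimal GB, read from /proc/meminfo (Linux only)
ram_gb() {
  kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
  echo $(( ${kb:-0} * 1024 / 1000000000 ))
}

# Size in decimal GB of the filesystem that holds the given path
disk_gb() {
  kb=$(df -Pk "$1" 2>/dev/null | awk 'NR==2 {print $2}')
  echo $(( ${kb:-0} * 1024 / 1000000000 ))
}

report "GPUs" "$(gpu_count)" 3
report "CPU threads" "$(cpu_threads)" 8
report "RAM (GB)" "$(ram_gb)" 160
report "Disk on / (GB)" "$(disk_gb /)" 1000
```

The GPU line reads 0 until the NVIDIA driver is installed, so it is only meaningful after the driver setup is done.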
CUDA Drivers
Install the latest NVIDIA drivers that are compatible with your GPUs.
On Ubuntu, the default driver version and the CUDA toolkit can be installed using the following commands:
sudo apt update
sudo apt install software-properties-common -y
sudo apt install ubuntu-drivers-common -y
sudo ubuntu-drivers autoinstall
sudo apt install nvidia-cuda-toolkit -y
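After the commands above finish, a reboot is usually needed before the new driver is loaded. The following sketch shows one way to confirm that both the driver and the toolkit are visible afterwards; the `have` helper is a hypothetical name introduced only for this example.

```shell
#!/bin/sh
# Post-install check: run after rebooting so the new kernel module is loaded.

# have CMD: succeeds when CMD is found on PATH
have() { command -v "$1" >/dev/null 2>&1; }

if have nvidia-smi; then
  # GPUs as seen by the driver, with the driver version and memory size
  nvidia-smi --query-gpu=index,name,driver_version,memory.total --format=csv
else
  echo "nvidia-smi not found: the driver is not installed or not on PATH"
fi

if have nvcc; then
  # The line containing "release" carries the CUDA toolkit version
  nvcc --version | grep -i release
else
  echo "nvcc not found: nvidia-cuda-toolkit is not installed"
fi
```

If `nvidia-smi` is present but prints an error instead of a table, the driver is installed but not loaded, which a reboot typically resolves.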
Docker
Installation instructions for Ubuntu:
https://docs.docker.com/engine/install/ubuntu/
Install a Docker version that supports Docker Compose V2.
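To confirm that the installed Docker provides Compose V2, the plugin version can be queried with `docker compose version --short`. The sketch below is a minimal check under that assumption; `compose_major` is a hypothetical helper, not part of Docker.

```shell
#!/bin/sh
# Check that the "docker compose" plugin is present and is version 2 or newer.

# compose_major VERSION: prints the major number, e.g. "v2.24.5" -> "2"
compose_major() {
  v=${1#v}          # drop an optional leading "v"
  echo "${v%%.*}"   # keep everything before the first dot
}

if command -v docker >/dev/null 2>&1; then
  version=$(docker compose version --short 2>/dev/null)
  if [ -z "$version" ]; then
    echo "The 'docker compose' plugin is not available"
  elif [ "$(compose_major "$version")" -ge 2 ]; then
    echo "Docker Compose $version: OK"
  else
    echo "Docker Compose $version is older than V2"
  fi
else
  echo "docker not found"
fi
```

Note that the legacy standalone `docker-compose` binary (with a hyphen) is Compose V1 and is not detected by this check.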
NVIDIA Container Toolkit
Installation instructions for Linux:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
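Once the toolkit is installed and Docker is configured as described in the guide above, a quick way to confirm that containers can reach the GPUs is to run `nvidia-smi` inside a throwaway container. The sketch below assumes the current user may run `docker` without `sudo` and that the `ubuntu` image can be pulled; `status_of` is a hypothetical helper.

```shell
#!/bin/sh
# Smoke test for GPU access from containers.

# status_of CMD: prints "found" or "missing"
status_of() {
  if command -v "$1" >/dev/null 2>&1; then echo found; else echo missing; fi
}

echo "docker:     $(status_of docker)"
echo "nvidia-ctk: $(status_of nvidia-ctk)"

if [ "$(status_of docker)" = found ] && [ "$(status_of nvidia-ctk)" = found ]; then
  nvidia-ctk --version
  # Runs nvidia-smi in a temporary container; it should list the same GPUs
  # that nvidia-smi shows on the host.
  docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
else
  echo "Install the missing components before running the container test"
fi
```

If the container lists all three GPUs, the driver, Docker, and the toolkit are working together.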