# List of Available Models and Engines
| Engine | LLM | Embeddings | Rerank |
|---|---|---|---|
| vLLM | ✅ Qwen2.5-14B-Instruct<br>Qwen3-14B<br>Qwen3-32B-AWQ<br>QwQ-32B-AWQ | ✅ SFR-Embedding-Mistral<br>gte-Qwen2-1.5B-instruct<br>mxbai-embed-large-v1 | ✅ bge-reranker-base |
| LMDeploy | ✅ Qwen2.5-14B-Instruct<br>Qwen3-14B<br>Qwen3-32B-AWQ<br>QwQ-32B-AWQ | ❌ | ❌ |
| SGLang | ✅ Qwen2.5-14B-Instruct<br>Qwen3-14B<br>Qwen3-32B-AWQ<br>QwQ-32B-AWQ | ✅ SFR-Embedding-Mistral<br>gte-Qwen2-1.5B-instruct<br>mxbai-embed-large-v1 | ❌ |
| llama.cpp (CPU-only mode) | ✅ Mistral-7B-Instruct-v0.1-GGUF<br>TinyLlama-1.1B-Chat-v1.0-GGUF | ✅ mxbai-embed-large-v1 | ❌ |
| Infinity | ❌ | ✅ SFR-Embedding-Mistral<br>gte-Qwen2-1.5B-instruct<br>mxbai-embed-large-v1 | ✅ mxbai-rerank-large-v1<br>bge-reranker-base |
The models listed in the table are those that have been tested by the Compressa team; other models compatible with the listed engines can also be deployed.
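The LLM engines above (vLLM, LMDeploy, SGLang, llama.cpp) all expose an OpenAI-compatible HTTP API, so a deployed model can be queried with a standard chat-completions request. A minimal sketch using only the standard library; the base URL `http://localhost:8000/v1` is an assumption about where the engine is serving, so adjust it to your deployment:

```python
import json
from urllib import request

# Assumed address of a locally deployed OpenAI-compatible engine (e.g. vLLM).
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, user_message: str) -> request.Request:
    """Construct (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Qwen2.5-14B-Instruct", "Hello!")
# To actually send it: request.urlopen(req) — requires the engine to be running.
```

The same request shape works against any of the listed LLM engines; only the model name and base URL change.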
## Other Models
- TTS (Text-To-Speech): XTTS-v2 model, based on Coqui TTS
- ASR (Automatic Speech Recognition): T-One model
For the ASR model, three modes are available:
- offline: upload a file, receive the full transcription in the response (OpenAI transcription API)
- streaming: upload a file, receive the transcription as a streamed response (OpenAI voice stream)
- realtime: stream audio input and receive streamed responses over a WebSocket (OpenAI Realtime API)
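The offline mode above takes a single file upload and returns the transcription, following the OpenAI transcription API, which expects a `multipart/form-data` body. A minimal sketch of building such a body with the standard library; the model identifier `t-one` is an assumption, so substitute the name your deployment registers:

```python
import io
import uuid

def build_transcription_body(audio_bytes: bytes, filename: str, model: str):
    """Build a multipart/form-data body for an OpenAI-style
    POST /v1/audio/transcriptions request (offline ASR mode)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # Text field carrying the model name.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    buf.write(model.encode() + b"\r\n")
    # File field carrying the raw audio.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(audio_bytes + b"\r\n")
    # Closing boundary.
    buf.write(f"--{boundary}--\r\n".encode())
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, buf.getvalue()

# "t-one" is an assumed model identifier for the T-One ASR model.
content_type, body = build_transcription_body(b"\x00audio", "sample.wav", "t-one")
```

The returned `content_type` goes into the `Content-Type` request header, and `body` becomes the POST payload; the streaming and realtime modes use the same audio data but deliver results incrementally.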