
List of Available Models and Engines

| Engine | LLM | Embeddings | Rerank |
|---|---|---|---|
| vLLM | Qwen2.5-14B-Instruct, Qwen3-14B, Qwen3-32B-AWQ, QwQ-32B-AWQ | SFR-Embedding-Mistral, gte-Qwen2-1.5B-instruct, mxbai-embed-large-v1 | bge-reranker-base |
| LMDeploy | Qwen2.5-14B-Instruct, Qwen3-14B, Qwen3-32B-AWQ, QwQ-32B-AWQ | | |
| SGLang | Qwen2.5-14B-Instruct, Qwen3-14B, Qwen3-32B-AWQ, QwQ-32B-AWQ | SFR-Embedding-Mistral, gte-Qwen2-1.5B-instruct, mxbai-embed-large-v1 | |
| Llamacpp (CPU-only mode) | Mistral-7B-Instruct-v0.1-GGUF, TinyLLaMA-1.1B-Chat-v1.0-GGUF | mxbai-embed-large-v1 | |
| Infinity | | SFR-Embedding-Mistral, gte-Qwen2-1.5B-instruct, mxbai-embed-large-v1 | mxbai-rerank-large-v1, bge-reranker-base |

The table lists the models that the Compressa team has tested. Other models compatible with the listed engines can also be deployed.
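When choosing an engine for a given model, it can help to treat the tested combinations as a lookup structure. The following is a minimal sketch that transcribes the combinations listed above into a Python dictionary. The names `TESTED` and `engines_for` are hypothetical and chosen for illustration only; they are not part of any Compressa API.

```python
from typing import Dict, List

# Tested engine/model combinations, transcribed from the list above.
# Structure (an assumption for illustration): engine -> task -> model names.
TESTED: Dict[str, Dict[str, List[str]]] = {
    "vLLM": {
        "LLM": ["Qwen2.5-14B-Instruct", "Qwen3-14B", "Qwen3-32B-AWQ", "QwQ-32B-AWQ"],
        "Embeddings": ["SFR-Embedding-Mistral", "gte-Qwen2-1.5B-instruct", "mxbai-embed-large-v1"],
        "Rerank": ["bge-reranker-base"],
    },
    "LMDeploy": {
        "LLM": ["Qwen2.5-14B-Instruct", "Qwen3-14B", "Qwen3-32B-AWQ", "QwQ-32B-AWQ"],
    },
    "SGLang": {
        "LLM": ["Qwen2.5-14B-Instruct", "Qwen3-14B", "Qwen3-32B-AWQ", "QwQ-32B-AWQ"],
        "Embeddings": ["SFR-Embedding-Mistral", "gte-Qwen2-1.5B-instruct", "mxbai-embed-large-v1"],
    },
    "Llamacpp": {  # CPU-only mode
        "LLM": ["Mistral-7B-Instruct-v0.1-GGUF", "TinyLLaMA-1.1B-Chat-v1.0-GGUF"],
        "Embeddings": ["mxbai-embed-large-v1"],
    },
    "Infinity": {
        "Embeddings": ["SFR-Embedding-Mistral", "gte-Qwen2-1.5B-instruct", "mxbai-embed-large-v1"],
        "Rerank": ["mxbai-rerank-large-v1", "bge-reranker-base"],
    },
}


def engines_for(model: str) -> Dict[str, List[str]]:
    """Return {task: [engines]} for every tested combination that includes `model`."""
    result: Dict[str, List[str]] = {}
    for engine, tasks in TESTED.items():
        for task, models in tasks.items():
            if model in models:
                result.setdefault(task, []).append(engine)
    return result
```

For example, `engines_for("bge-reranker-base")` returns the engines on which that reranker was tested. An empty result only means the combination is not among the tested ones; as noted above, other compatible models can still be deployed.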

Other Models

  • TTS (Text-To-Speech) - XTTS-v2 model, based on Coqui
  • ASR (Automatic Speech Recognition) - T-One model

Three modes are available for the ASR model: