# List of Available Models and Engines
| Engine | LLM | Embeddings | Rerank |
|---|---|---|---|
| vLLM | ✅ Qwen2.5-14B-Instruct<br>Qwen3-14B<br>Qwen3-32B-AWQ<br>QwQ-32B-AWQ | ✅ SFR-Embedding-Mistral<br>gte-Qwen2-1.5B-instruct<br>mxbai-embed-large-v1 | ✅ bge-reranker-base |
| LMDeploy | ✅ Qwen2.5-14B-Instruct<br>Qwen3-14B<br>Qwen3-32B-AWQ<br>QwQ-32B-AWQ | ❌ | ❌ |
| SGLang | ✅ Qwen2.5-14B-Instruct<br>Qwen3-14B<br>Qwen3-32B-AWQ<br>QwQ-32B-AWQ | ✅ SFR-Embedding-Mistral<br>gte-Qwen2-1.5B-instruct<br>mxbai-embed-large-v1 | ❌ |
| llama.cpp (CPU-only mode) | ✅ Mistral-7B-Instruct-v0.1-GGUF<br>TinyLlama-1.1B-Chat-v1.0-GGUF | ✅ mxbai-embed-large-v1 | ❌ |
| Infinity | ❌ | ✅ SFR-Embedding-Mistral<br>gte-Qwen2-1.5B-instruct<br>mxbai-embed-large-v1 | ✅ mxbai-rerank-large-v1<br>bge-reranker-base |
The models listed in the table are those that have been tested by the Compressa team; other models compatible with the listed engines can also be deployed.
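The LLM engines above (vLLM, LMDeploy, SGLang, llama.cpp) all expose an OpenAI-compatible HTTP API, so a deployed model can be queried with a standard chat-completions request. A minimal sketch using only the standard library; the base URL `http://localhost:8000/v1` is an assumption about where the engine is serving, so adjust it to your deployment:

```python
import json
from urllib import request

# Assumed address of a locally deployed OpenAI-compatible engine (e.g. vLLM).
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, user_message: str) -> request.Request:
    """Construct (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Qwen2.5-14B-Instruct", "Hello!")
# To actually send it: request.urlopen(req) — requires the engine to be running.
```

The same request shape works against any of the listed LLM engines; only the model name and base URL change.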
## Other Models
- TTS (Text-To-Speech): XTTS-v2 model, based on Coqui TTS
- ASR (Automatic Speech Recognition): T-One model
For the ASR model, three modes are available:
- offline: upload a file, receive the full transcription in the response (OpenAI transcription API)
- streaming: upload a file, receive the transcription as a streamed response (OpenAI voice stream)
- realtime: stream audio input and receive streamed responses over a WebSocket (OpenAI Realtime API)
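The offline mode above takes a single file upload and returns the transcription, following the OpenAI transcription API, which expects a `multipart/form-data` body. A minimal sketch of building such a body with the standard library; the model identifier `t-one` is an assumption, so substitute the name your deployment registers:

```python
import io
import uuid

def build_transcription_body(audio_bytes: bytes, filename: str, model: str):
    """Build a multipart/form-data body for an OpenAI-style
    POST /v1/audio/transcriptions request (offline ASR mode)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # Text field carrying the model name.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    buf.write(model.encode() + b"\r\n")
    # File field carrying the raw audio.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(audio_bytes + b"\r\n")
    # Closing boundary.
    buf.write(f"--{boundary}--\r\n".encode())
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, buf.getvalue()

# "t-one" is an assumed model identifier for the T-One ASR model.
content_type, body = build_transcription_body(b"\x00audio", "sample.wav", "t-one")
```

The returned `content_type` goes into the `Content-Type` request header, and `body` becomes the POST payload; the streaming and realtime modes use the same audio data but deliver results incrementally.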