Quickstart: On-Premises InsightStream
This guide shows how to deploy the InsightStream RAG chatbot together with Compressa for model inference.
Requirements
Deploying InsightStream with Compressa requires a server with 2 GPUs.
The requirements for GPU versions and server setup can be found on this page.
Setup
First, clone the repository with the configuration:
git clone -b insight-stream git@github.com:compressa-ai/compressa-deploy.git
cd compressa-deploy
The repository contains two main files that we’ll configure:
.env
docker-compose.yml
Set the GPU IDs in the .env file:
DOCKER_GPU_IDS_CHAT=<ID1>
DOCKER_GPU_IDS_EMB=<ID2>
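For instance, on a machine where GPU 0 should serve the chat model and GPU 1 the embedding model (the IDs here are illustrative; check which indices your server actually has with `nvidia-smi`):

```shell
# Example .env GPU assignment (IDs are illustrative -- run `nvidia-smi`
# to see which GPU indices exist on your server):
DOCKER_GPU_IDS_CHAT=0
DOCKER_GPU_IDS_EMB=1
```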
With the default configuration, the services use the following ports:
- qdrant - 6333
- compressa - 5500
- insight-stream-bot - 80
If you need to modify these, update the port mappings in docker-compose.yml for the qdrant, openai-api, and nginx containers accordingly.
The SERVER_NAME variable should be set to the URL at which the InsightStream bot will be served, for example localhost:80 if you are running the solution locally or forwarding port 80 of the server to port 80 of localhost.
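As an illustration, for a local deployment the corresponding .env line would look like this (the value is an example; substitute your own hostname and port):

```shell
# Example: bot served locally on port 80 (adjust to your own hostname)
SERVER_NAME=localhost:80
```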
Storage Setup
By default, the containers use the following storage paths:
- qdrant - ./data/qdrant
- compressa - ./data/compressa
  This directory should have 777 permissions, which can be set via:
  chmod -R 777 ./data/compressa
- document storage - ./data/documents
  This directory should have 755 permissions for the user systemd-network and the group systemd-journal, which can be set via:
  sudo chown systemd-network:systemd-journal ./data/documents && sudo chmod 755 ./data/documents

You can change the storage paths in docker-compose.yml.
Then, you can run the solution with:
docker compose up --build
Deploy Inference and Embedding Models
When the services are running, we need to deploy LLM models to Compressa.
The solution uses the Llama-3-8B-Instruct model for chat and the SFR-Embedding-Mistral model for embeddings.
Models can be deployed via the REST API or the Swagger UI.
The REST APIs are available at:
SERVER_NAME:5500/api/chat/
SERVER_NAME:5500/api/embeddings/
The Swagger UI is available at:
SERVER_NAME:5500/api/chat/docs
SERVER_NAME:5500/api/embeddings/docs
Here are the commands to deploy the models using curl:
Add the Llama-3-8B model to Compressa:
curl -X 'POST' \
'http://localhost:5500/api/chat/v1/models/add/?model_id=compressa-ai%2FLlama-3-8B-Instruct' \
-H 'accept: application/json' \
-d ''
Add the embedding model to Compressa:
curl -X 'POST' \
'http://localhost:5500/api/embeddings/v1/models/add/?model_id=Salesforce%2FSFR-Embedding-Mistral' \
-H 'accept: application/json' \
-d ''
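Note that the slash in each model ID is percent-encoded as %2F in the URLs above. If you want to add a different model, a small helper like the following can produce the encoded ID (the helper is ours for illustration, not part of the repository; it assumes python3 is available):

```shell
# Percent-encode a Hugging Face-style model ID for use in a query string
# (the "/" must become %2F). Hypothetical helper, not from the repo.
encode_model_id() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

encode_model_id "compressa-ai/Llama-3-8B-Instruct"  # prints compressa-ai%2FLlama-3-8B-Instruct
```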
When downloading is finished, we can deploy the models:
Deploy Llama-3-8B:
curl -X 'POST' \
'http://localhost:5500/api/chat/v1/deploy/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model_id": "compressa-ai/Llama-3-8B-Instruct",
"dtype": "float16"
}'
Deploy the embedding model:
curl -X 'POST' \
'http://localhost:5500/api/embeddings/v1/deploy/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model_id": "Salesforce/SFR-Embedding-Mistral",
"dtype": "float16"
}'
When the models are deployed, the server is ready to use!
CLI
Install
The CLI tool is used to add new documents to the RAG index.
The CLI can be installed from the same repository (Python 3.10+ required):
cd compressa-deploy/cli
pip install -r requirements.txt
Setup
The CLI tool needs access to the deployed chatbot, the models, and Qdrant. Set their URLs in the .env file:
SERVER_NAME=<SERVER_NAME> # in case of port 80
QDRANT_URL=<SERVER_NAME>:6333
OPENAI_BASE=<SERVER_NAME>:5500/v1 # Compressa
QDRANT_KEY=your_secret_api_key_here
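For a local deployment, a filled-in .env might look like this (the values are illustrative; QDRANT_KEY must match the API key your Qdrant instance is actually configured with):

```shell
# Example CLI .env for a local deployment (values are illustrative)
SERVER_NAME=localhost
QDRANT_URL=localhost:6333
OPENAI_BASE=localhost:5500/v1
QDRANT_KEY=your_secret_api_key_here
```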
Usage
Add Documents to the Index
When all environment variables are set, documents can be added to the system using one of the following commands:
python3 create_bot.py <BOT_ID> /path/to/document.pdf
python3 create_bot.py <BOT_ID> /path/to/folder
The InsightStream bot supports .docx and .pdf documents.
Once documents are uploaded, the bot is available at <SERVER_NAME>/agent/<bot_id>.
Ask the InsightStream Bot
You can open the InsightStream bot at <SERVER_NAME>/agent/<bot_id> and ask a question in the Chat UI.
The bot can also be used via the REST API.
REST API
Ask a question to the bot:
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"question": "<your_question_here>"
}' \
"<SERVER_NAME>/v.1.0/<bot_id>"
Download a file from the server:
curl <SERVER_NAME>/documents/<filename> > <filename>
Upload a new document to the server:
curl -X PUT -T /path/to/file <SERVER_NAME>/documents/<filename>