In this guide, we will show how to deploy a local AI stack with Ollama and Open WebUI: connect a text generation model, configure embeddings for a knowledge base, and test RAG on a simple server price list example.

This setup can be used as a foundation for an internal company AI assistant — for example, to help sales managers quickly find information about products, services, pricing, or technical documentation.

Requirements

For the installation, you will need:

Windows
Docker Desktop
WSL2
PowerShell
Ollama container
Open WebUI container

It is recommended to have an NVIDIA GPU, but for the first test you can also run models on CPU. A GPU significantly speeds up response generation, especially for 7B and 14B models.

1. Installing Docker Desktop

First, install Docker Desktop for Windows.

After installation, open Docker Desktop. On a fresh install, the Containers section should show no running containers.
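Since Docker Desktop on Windows runs on the WSL2 backend, you can optionally verify that WSL2 is available before continuing:

wsl --status

If the command prints the default distribution and WSL version, the backend is ready.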

2. Creating the Project Folder

Open PowerShell and create a folder for the local AI project:

mkdir C:\local-ai
cd C:\local-ai

Create a docker-compose.yml file:

notepad docker-compose.yml

Paste the configuration into the file:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_KEEP_ALIVE=30m
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change-this-secret-key
    volumes:
      - open_webui_data:/app/backend/data
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:

If you plan to use an NVIDIA GPU, you can add GPU passthrough to the ollama service, but for a basic test, the configuration above is enough.
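For reference, a minimal sketch of such a passthrough, assuming current NVIDIA drivers and WSL2 GPU support on the host, adds a deploy block to the ollama service:

# add inside the ollama service, at the same level as image:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

With this in place, Docker Desktop exposes the GPU to the container, and Ollama should detect it automatically.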

3. Starting Ollama and Open WebUI

Start the containers:

docker compose up -d

Docker will download the images:

ollama/ollama:latest
ghcr.io/open-webui/open-webui:main

After completion, check the containers:

docker ps

Two containers should be running:

ollama
open-webui
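If either container is missing or keeps restarting, inspect its logs before going further:

docker logs ollama
docker logs open-webui

You can also follow both at once with docker compose logs -f.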

4. Checking Ollama

Check that Ollama responds locally:

curl http://127.0.0.1:11434

Expected response:

Ollama is running
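You can also query the REST API directly. For example, the /api/tags endpoint returns the locally installed models as JSON (at this point the list will still be empty):

curl http://127.0.0.1:11434/api/tags

Expected response:

{"models":[]}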

5. Logging in to Open WebUI

Open in your browser:

http://localhost:3000

On the first launch, Open WebUI will ask you to create an administrator account.

Fill in:

Name
Email
Password

and click Create Admin Account.

After logging in, the main Open WebUI interface will open.

6. Downloading the Main Model

Now you need to download the model that will answer questions.

For testing, you can use:

docker exec -it ollama ollama pull qwen2.5:7b

If you have more resources, you can use a larger model:

docker exec -it ollama ollama pull qwen2.5:14b

Check the list of models:

docker exec -it ollama ollama list

After that, the model will appear in Open WebUI in the list of available models.
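Before opening the chat, you can also send a quick one-off test prompt from the command line (the prompt text here is just an example):

docker exec -it ollama ollama run qwen2.5:7b "Say hello in one sentence."

If the model prints a reply, text generation is working.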

7. Downloading the Embedding Model for the Knowledge Base

To work with documents and RAG, you need an embedding model. It converts document text into a vector representation so the system can search for relevant fragments.

Download the model:

docker exec -it ollama ollama pull nomic-embed-text

Check it:

docker exec -it ollama ollama list

The list should include the model:

nomic-embed-text
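As an optional sanity check, you can request an embedding through the API, for example from a WSL2 shell (the prompt text is arbitrary):

curl http://127.0.0.1:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "GPU server price"}'

The response should contain an embedding field with a long array of floating-point numbers.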

8. Configuring Embeddings in Open WebUI

In Open WebUI, go to:

Admin Settings → Documents

In the Embedding section, specify:

Embedding Model Engine: Ollama
Base URL: http://ollama:11434
Embedding Model: nomic-embed-text

After that, click Save.

Important: if the documents were uploaded before changing the embedding model, they need to be reindexed or uploaded again.

9. Creating a Knowledge Base

Go to:

Workspace → Knowledge

Create a new knowledge base:

New Knowledge

Example name:

Server Price List

This knowledge base will be used as the test RAG source. You can upload a server price list, service descriptions, an FAQ, or internal documentation to it.

10. Uploading a File to the Knowledge Base

Inside the collection, click the + button and upload a file, for example:

Server Price List.txt

Example of simple file content:

Server Price List

GPU Server 1
Server ID: DED-START-GPU
GPU Model: NVIDIA Tesla K80
Monthly Price: $300

GPU Server 2
Server ID: DED-BUSINESS-GPU
GPU Model: NVIDIA Tesla T4
Monthly Price: $1500

After uploading, the file should appear inside the collection.

If the collection says:

No content found

it means the file has not been uploaded yet. In that case, RAG will not work.

11. Common Error: No Sources Found

If the model replies in the chat with:

No sources found

it means that Open WebUI could not find suitable fragments in the knowledge base.

Main reasons:

  • the collection was created, but there is no file inside it;
  • the file was uploaded, but has not been indexed yet;
  • the embedding model was changed, but reindexing was not performed;
  • the Knowledge Base is not connected to the current chat;
  • the query is too general or the document is poorly structured.

In our test, the collection was created first, but there was no content inside it. Because of that, Open WebUI could not find any sources.

12. Connecting the Knowledge Base to the Chat

To use the knowledge base in the chat, start your message with # and select the required collection.

For example:

#Server Price List
Use only this knowledge base.
Find all GPU servers. Show server ID, GPU model and monthly price.

If everything works correctly, Open WebUI will show:

Retrieved 1 source

or:

Retrieved 2 sources

This means that RAG found fragments from the knowledge base and passed them to the model.

13. Testing RAG Using the Price List Example

Test request:

Use only this knowledge base.
Find all GPU servers. Show server ID, GPU model and monthly price.

Example result:

Server ID: DED-START-GPU
GPU Model: NVIDIA Tesla K80
Monthly Price: $300

Server ID: DED-BUSINESS-GPU
GPU Model: NVIDIA Tesla T4
Monthly Price: $1500

If the response contains specific servers from the uploaded file, it means that RAG is working.

14. What to Do If RAG Finds the Wrong Fragment

Sometimes the knowledge base is connected, but the model does not answer as expected. For example, Open WebUI shows Retrieved 1 source, but the retrieved fragment does not contain the required information.

This is a normal situation for RAG. The quality of search depends on the document structure.

To improve the result:

  • use clear headings;
  • separate different categories into separate files;
  • do not upload overly chaotic documents;
  • use exact service and server names;
  • increase Top K in the Documents settings;
  • run reindex after making changes.

For example, for servers, you can split the knowledge base like this:

gpu-servers.md
streaming-servers.md
storage-servers.md
database-servers.md
vps-servers.md

This will make it easier for the model to find the right fragment.
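As an illustration, an entry in gpu-servers.md could follow the same pattern as the price list above, with one clearly labeled block per server (the layout itself is just a suggestion):

# GPU Servers

## DED-START-GPU
GPU Model: NVIDIA Tesla K80
Monthly Price: $300

## DED-BUSINESS-GPU
GPU Model: NVIDIA Tesla T4
Monthly Price: $1500

Consistent headings like these give the indexer natural chunk boundaries, so each server ends up in its own retrievable fragment.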

Conclusion

As a result, we have created a local AI assistant based on:

Docker
Ollama
Open WebUI
Qwen
nomic-embed-text
Knowledge Base
RAG

The base model can answer general questions, but the real value appears after connecting a knowledge base. RAG allows the model to answer not only from general knowledge, but also based on specific company documents.

Even a simple example with a server price list shows how this can be used in practice: upload the product range, connect it to the chat, and get answers about specific items, specifications, and prices.