How to build a RAG app with TiDB Vector Search

Prepare

If you're a developer working with generative AI, you've probably heard of Retrieval-Augmented Generation (RAG).

In simple terms, RAG means retrieving your own data (via a retrieval tool) and sending it to the large language model as part of the prompt. The result is output that is more accurate and more relevant to your data. This technique helps reduce hallucinations and grounds the model's answers in your data, even when that data isn't in the LLM's training set.
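The whole idea fits in a few lines. Here is a minimal sketch in Python; the retrieve function is a stand-in for whatever search backend you use (the function names and sample texts are illustrative, not from any particular library):

```python
def retrieve(question, top_k=3):
    # Stand-in for a real retrieval step (e.g. a vector search over
    # your own documents). Returns the top_k most relevant snippets.
    knowledge_base = [
        "TiDB Cloud Serverless offers a free tier with vector search.",
        "Ollama runs open LLMs such as llama3.2 locally.",
    ]
    return knowledge_base[:top_k]

def build_rag_prompt(question, contexts):
    # The essence of RAG: retrieved data becomes part of the prompt,
    # so the model answers from your data rather than memory alone.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\n"
        "Answer:"
    )

question = "Where are the embeddings stored?"
prompt = build_rag_prompt(question, retrieve(question))
print(prompt)
```

The assembled prompt is then sent to the LLM as usual; everything else in a RAG system is refinement of these two steps.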

In this tutorial, we will walk through building a basic RAG system step by step, using the vector search capabilities of TiDB Cloud Serverless.
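On the storage side, this relies on TiDB's vector column type. The following is a sketch of the schema and query we'll lean on, assuming TiDB's VECTOR column type and VEC_COSINE_DISTANCE function (available in TiDB Cloud Serverless); the table and column names are our own choices, and the actual connection would use the connection string from the cluster with a MySQL driver such as pymysql:

```python
# Sketch of the SQL we'll run against TiDB Cloud Serverless.
# VECTOR and VEC_COSINE_DISTANCE are TiDB vector search features;
# the table name "documents" and its columns are illustrative.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS documents (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    content TEXT,
    embedding VECTOR(768)  -- dimension must match your embedding model
)
"""

def vector_literal(vec):
    # TiDB accepts vector values as a JSON-style '[v1,v2,...]' string.
    return "[" + ",".join(str(float(v)) for v in vec) + "]"

def topk_query(top_k=3):
    # Nearest neighbours by cosine distance; %s is the driver-side
    # placeholder for the query vector literal (e.g. with pymysql).
    return (
        "SELECT content, VEC_COSINE_DISTANCE(embedding, %s) AS dist "
        f"FROM documents ORDER BY dist LIMIT {top_k}"
    )

print(vector_literal([0.1, 0.2]))
print(topk_query())
```

Storing an embedding is then an ordinary INSERT with vector_literal(...) as the parameter, and retrieval is the SELECT above with the question's embedding.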

  1. First, go to TiDB Cloud to create a TiDB Cloud Serverless cluster and get its connection string. TiDB Cloud provides a free database tier with vector search support, so we can use it to store all the embeddings.
  2. Optionally, you can set up Ollama on your local machine. This runs the LLM for text generation on your own hardware, for free. For example, if you're on a Mac, run the following commands in your terminal:
    # install ollama and start the server
    brew install ollama
    ollama serve
    Now you have Ollama running on your local machine! Open another terminal window and run ollama run llama3.2 to download and start the llama3.2 model. This is a 3B model that takes around 2 GB of disk space. After it downloads, you can chat with the model directly in that terminal.

    You can also pick other models from Ollama's model list. Note that if you are calling Ollama from a non-localhost origin, you will also need to set the OLLAMA_ORIGINS environment variable so Ollama accepts requests from that origin, by running export OLLAMA_ORIGINS="<your-origin>" && ollama serve in your terminal.
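Once the server is up, you can also talk to it programmatically over its local HTTP API instead of the interactive terminal. A minimal sketch, assuming Ollama's default endpoint at http://localhost:11434; building the request works anywhere, while the actual call of course requires a running Ollama server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model, prompt):
    # Ollama's /api/generate endpoint takes a JSON body; stream=False
    # asks for one complete JSON response instead of chunked output.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt, url=OLLAMA_URL):
    # Requires a running Ollama server (ollama serve).
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

payload = build_generate_request("llama3.2", "Say hello in one sentence.")
print(json.dumps(payload))
# generate("llama3.2", "Say hello in one sentence.")  # needs Ollama running
```

In the rest of the tutorial, this generate step is where the RAG prompt (retrieved context plus question) gets sent to the model.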