Tutorial to Build a RAG with Google BigQuery
Step-by-step guide to set up a conversational chat app that runs RAG over a Google BigQuery data source (or any other data source)
1. Set Up Models and Credentials
Go to the /models screen and add two models:
Add Embedding Model
Add a new model of type fastembed and select text-embedding-3-small.
Add LLM Model
Add a new model of type OpenAI and select gpt-4o-mini.
2. Set Up Datasource
If running locally via Docker, we recommend running docker compose logs -f in your terminal during this process to follow along and catch any errors as they occur.
For advanced debugging, you can also open apps like Qdrant or Airbyte to watch progress as data passes through each system.
Click Advanced Debugging for instructions on how to access these UIs.
Advanced Debugging
If you want to debug the applications running under the hood, you can access them via the following addresses:
App | Location | Authentication |
---|---|---|
Airbyte | http://localhost:8000 | username: airbyte password: password |
Qdrant | http://localhost:6333/dashboard#/collections | N/A |
Rabbit MQ | http://localhost:15672/ | username: guest password: guest |
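If you would rather check from a script that these UIs are up before opening them, the following is a minimal sketch using only the Python standard library. The addresses are copied from the table above; the function names are illustrative and not part of Agent Cloud. An HTTP error status (for example a login challenge) is treated as "reachable", since it still means something is listening.

```python
import urllib.error
import urllib.request

# Addresses copied from the table above (local Docker setup).
SERVICES = {
    "Airbyte": "http://localhost:8000",
    "Qdrant": "http://localhost:6333/dashboard#/collections",
    "Rabbit MQ": "http://localhost:15672/",
}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at `url`, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # e.g. 401 from a UI that wants credentials: the service is still up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, unknown host, or timeout: nothing usable is listening.
        return False


def check_services(services: dict[str, str] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its UI address responded."""
    return {name: is_reachable(url) for name, url in services.items()}
```

Calling check_services() returns a dictionary keyed by the app names in the table, with True for each UI that responded.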
Sample Elon Musk tweets dataset
You can get this dataset from GitHub here. If you want to use BigQuery, you can upload the CSV to BigQuery via its GUI.
Go to the /datasources screen, select New Connection and add a BigQuery data source:
Provide a name and sync schedule
Provide a name and sync schedule for the data source. Select Manual for now.
The sync currently runs as full refresh - overwrite. Other refresh options, such as Incremental - Append or Full Refresh - Append, still need to be enabled in future versions; these will sync only new data via a provided cursor field (such as a created-date field). You can read more about it here.
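To make the difference between the sync modes concrete, here is a simplified Python sketch. It only illustrates the idea of overwriting everything versus appending rows past a cursor value; it is not how Airbyte or Agent Cloud implement syncing, and the rows and the created_at field are hypothetical.

```python
from datetime import date


def full_refresh_overwrite(destination: list[dict], source: list[dict]) -> list[dict]:
    """Discard whatever was synced before and copy every source row again."""
    # `destination` is intentionally ignored: overwrite starts from scratch.
    return list(source)


def incremental_append(destination: list[dict], source: list[dict], cursor_field: str) -> list[dict]:
    """Keep existing rows and append only source rows past the latest cursor value."""
    if not destination:
        return list(source)
    last_seen = max(row[cursor_field] for row in destination)
    new_rows = [row for row in source if row[cursor_field] > last_seen]
    return destination + new_rows


# Hypothetical rows; `created_at` stands in for the cursor field.
source = [
    {"id": 1, "text": "first", "created_at": date(2024, 1, 1)},
    {"id": 2, "text": "second", "created_at": date(2024, 1, 2)},
]
synced = full_refresh_overwrite([], source)

# A new row arrives in the source before the next sync.
source.append({"id": 3, "text": "third", "created_at": date(2024, 1, 3)})
after_overwrite = full_refresh_overwrite(synced, source)  # re-copies all three rows
after_append = incremental_append(synced, source, "created_at")  # adds only the new row
```

Both modes end with the same three rows here, but the overwrite mode transfers every row on each sync, while the incremental mode transfers only rows newer than the last cursor value it has seen.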
Provide Data Source Credentials
Provide your keyfile.json.
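Before uploading credentials, it can help to confirm the file is well formed. The sketch below assumes keyfile.json is a Google Cloud service account key (the tutorial does not say so explicitly) and checks only for a few fields such keys contain. The check_keyfile helper is illustrative and not part of Agent Cloud.

```python
import json

# A few of the fields present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_keyfile(raw: str) -> list[str]:
    """Return a list of problems found in the key file contents (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # Assumption: a service account key declares its type as "service_account".
    if data.get("type") not in (None, "", "service_account"):
        problems.append(f"unexpected type: {data['type']!r}")
    return problems
```

For example, check_keyfile(open("keyfile.json").read()) returns an empty list when the basic fields are present, and a list of messages otherwise.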
Select Table and Fields to Sync
Select the table and fields to sync to Agent Cloud.
Select field to Embed
Select the field you would like to embed and the model you want to embed with. You can use the text-embedding-3-small model, which we set up in step 1, or add it here if you haven't done so already.
Sync will begin
The sync will begin, and you should see the process in your logs. You can open Airbyte and Qdrant to check that the data has been synced and the collection has been created (refer to the Advanced Debugging section above).
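You can also check the collection without the dashboard by calling Qdrant's REST API on the same host and port. This is a minimal sketch using the Python standard library: it relies on Qdrant's GET /collections and GET /collections/{name} endpoints, and the helper names are illustrative. The collection name that Agent Cloud creates is not stated here, so the usage lines simply list whatever exists.

```python
import json
import urllib.request
from typing import Optional

QDRANT_URL = "http://localhost:6333"  # same host and port as the Qdrant dashboard


def collection_names(payload: dict) -> list[str]:
    """Extract names from the JSON body of Qdrant's GET /collections."""
    return [c["name"] for c in payload["result"]["collections"]]


def points_count(payload: dict) -> Optional[int]:
    """Extract the point count from the JSON body of GET /collections/{name}."""
    return payload["result"].get("points_count")


def get_json(path: str, base_url: str = QDRANT_URL) -> dict:
    """Fetch a Qdrant REST endpoint and decode the JSON response."""
    with urllib.request.urlopen(base_url + path, timeout=5) as response:
        return json.load(response)


# Usage (requires Qdrant to be running locally):
# for name in collection_names(get_json("/collections")):
#     print(name, points_count(get_json(f"/collections/{name}")))
```

A point count above zero for the new collection suggests that embedded rows have arrived in Qdrant.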
3. Set Up an Agent
Go to the /agents screen and create a new Agent.
Provide a Role, Goal and Backstory
You can copy and paste the following prompts
Select the model
Select the LLM model that was created in step 1 (gpt-4o-mini) and optionally add it as the Function Calling model.
Select the datasource
Select the RAG tool that you created earlier. You can provide multiple tools if you have multiple data sources.
4. Create a Chat App
Go to the /apps screen and create a new Chat App.
Select Conversational App Type
Use the conversational App type.
Select the Agent
Select the Conversational Agent you just created.
Run the App
Save the App and Run it. Ask a question based on the dataset.
5. Have a chat!
If you want to make sure the agent always uses the tool, update the agent prompt and tell it to ALWAYS use the … tool. If you would rather build an agent with more autonomy to choose among multiple tools, keep the config light and let it infer which tool is required. For example, in the video we prompted it by writing According to Elon…, which helped guide the LLM to the correct tool.
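If you keep agent prompts in a script, the two approaches can be sketched as one small helper. This is only an illustration of the prompt wording: build_backstory and its tool_name parameter are hypothetical, and the resulting text still has to be pasted into the agent's prompt fields.

```python
from typing import Optional


def build_backstory(base: str, tool_name: Optional[str] = None) -> str:
    """Append a strict tool instruction when a tool name is given."""
    if tool_name is None:
        # Light config: the agent infers which tool a question needs.
        return base
    # Strict config: the prompt tells the agent to use one named tool every time.
    return f"{base} ALWAYS use the {tool_name} tool to answer questions."
```

Passing a tool name produces the strict variant, while omitting it leaves the prompt unchanged so the agent can decide among its tools.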