Tutorial to Build a RAG with Google Bigquery

1. Setup Models and Credentials

Go to the /models screen and add two models:

Add Embedding Model

Add new model of type fastembed add select the text-embedding-3-small

Add LLM Model

Add a new model of type Open AI and select gpt-4o-mini

For best performance first time we reccommend using Open AI. If using Open AI sign up to https://platform.openai.com/api-keys and generate an API Key. Copy and paste this into the Credenital screen within agentcloud. You can also connect to a local model via LM Studio or Ollama - if there any issues please raise on #help on Discord.

2. Setup Datasource

If running locally via Docker, during this process we reccommend running docker compose logs -f in your terminal to follow along and catch any errors if they occur. For Advanced debugging you can also open up apps like Qdrant or Airbyte to see progress as data passes through each system. Click Advanced Debugging for instructions on how to access these UIs.

Advanced Debugging

If you want to debug the applications running under the hood, you can access them via the following addresses:

App	Location	Authentication
Airbyte	http://localhost:8000	username: airbyte password: password
Qdrant	http://localhost:6333/dashboard#/collections	N/A
Rabbit MQ	http://localhost:15672/	username: guest password: guest

Sample Elon Musk tweets dataset

You can get this dataset from Github here. If you want to use Bigquery, you can upload the csv to Bigquery via their GUI.

Go to the /datasources screen, select New Connection and add a Bigquery data source:

Provide a name and sync schedule

Provide a name and sync schedule for the data source. Select Manual for now.

For now, syncs configured via Agent Cloud, will sync all data from the source everytime. In airbyte terms this is called a full refresh - overwrite. We still need to enable other refresh options such as Incremental - Append or Full Refresh - Append in future versions, which will only sync new data via a provided cursor field (such as create date field). You can read more about it here

Provide Data Source Credentials

If you have many datasets in bigquery, make sure to complete the dataset field, otherwise Airbyte will search all datasets and take too long.

Make sure to copy and paste your keyfile json into a code editor and turn it into a single line before pasting it in to agent cloud. Failing to do so may result in an error.

Provide the GCP project ID, a dataset name and copy and paste your keyfile.json.

Select Table and Fields to Sync

Select the table and fields to sync to Agent Cloud

Selecting the top level checkbox will sync all fields.

Select field to Embed

Select the Field you would like to embed and the model you want to embed with. You can use the text-embedding-3-small which we setup in step 1 or add it here if you haven’t done so already.

Sync will begin

The sync will begin, you should see the process in your logs. You can open up Airbyte and Qdrant to see if the data has been synced and the collection has been created. (refer to Advanced Troubleshooting tab above)

3. Setup an Agent

Go to the /agents screen and create a new Agent

Provide a Role, Goal and Backstory

You can copy and paste the following prompts

👤️ Role
🎯️ Goal
📖 Backstory

You are an AI Assistant.

Select the model

Provide the LLM model that was created in step 1 gpt-4o-mini and optionally add it as the Function Calling model.

Select the datasource

Select the RAG tool that you created earlier. You can provide multiple tools if you have multiple data sources.

4. Create a Chat App

Go to the /apps screen and create a new Chat App

Select Conversational App Type

Use the conversational App type.

By default this adds a human input tool to the agent which will enable the chat experience to require input on every question.

Select the Agent

Select the Conversational Agent you just created.

Run the App

Save the App and Run it. Ask a question based on the dataset.

5. Have a chat!

If you want to make sure the agent always uses the tool, you can update the agent prompt and tell it, ALWAYS use the … tool. Otherwise if you want to build an agent with a bit more autonomy to decide on multiple tools, you can keep the config light and let it infer which tool is required. For example in the video we prompted the tool by writing According to Elon… which helped guide the LLM to the correct tool.

Get Started

Installation

Components

Tutorials & Examples

Concepts

Contribute

Community

Tutorial to Build a RAG with Google Bigquery

1. Setup Models and Credentials

2. Setup Datasource

3. Setup an Agent

4. Create a Chat App

5. Have a chat!

Get Started

Installation

Components

Tutorials & Examples

Concepts

Contribute

Community

​1. Setup Models and Credentials

​2. Setup Datasource

​3. Setup an Agent

​4. Create a Chat App

​5. Have a chat!

1. Setup Models and Credentials

2. Setup Datasource

3. Setup an Agent

4. Create a Chat App

5. Have a chat!