Datasources
What are Datasources in Agent Cloud?
🚧 These docs are still under construction. Reach out on Discord if you’d like more information on anything about Agent Cloud.
Datasources are the heart of Agent Cloud. Agents can only do so much with their training data alone. Datasources let you give agents private access to your data so that you can manipulate, analyse or retrieve it using the power of AI.
The versatility of Agent Cloud’s integrations allows agents to privately access data such as marketing analytics, sales analytics, or raw databases. This private access keeps your data yours while also allowing the agents to personalise their functionality to any dataset for any application.
Datasource Form
Allowing for many different connections means many different ways of connecting to providers. Each one can’t be covered here, but the Airbyte Docs let you search for any connector, take a look at its docs and see what’s required to connect Agent Cloud to your data provider.
Example Finding Form Values
An example of using the linked Airbyte documentation to find form values for Apple Search Ads. You can also use the search functionality on Airbyte by pressing `ctrl+k` or by clicking the search bar at the top right.
Choosing Streams to Sync
Once you have filled in the required form fields, Agent Cloud will test the connection to ensure it is correctly configured and will then prompt you to select the streams to sync, which are essentially the field(s) of data that need to be synchronised.
An example of the platform prompting to select which fields are to be synchronised, using BigQuery
Sync Mode
The sync mode is how you wish to synchronise the data. Different sync modes affect how the data is transferred and stored by Agent Cloud.
- Incremental Append synchronises the data by only getting newly added data.
- Full Refresh Append synchronises the entire datasource.
Please note that a full refresh can incur costs from the datasource provider and may take longer.
To find out which sync mode is best for you, learn more about sync modes here.
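The difference between the two sync modes can be illustrated with a minimal sketch. This is not Agent Cloud's or Airbyte's implementation; the function names, the `created_at` field and the in-memory lists are all hypothetical stand-ins for a real source and destination.

```python
from datetime import datetime

def full_refresh_append(source_rows, destination):
    """Append every source row on every sync, whatever was synced before."""
    destination.extend(source_rows)
    return len(source_rows)

def incremental_append(source_rows, destination, last_synced_at):
    """Append only rows created after the previous sync.

    Returns the number of rows added and the new high-water mark.
    """
    new_rows = [r for r in source_rows if r["created_at"] > last_synced_at]
    destination.extend(new_rows)
    newest = max((r["created_at"] for r in new_rows), default=last_synced_at)
    return len(new_rows), newest

# Hypothetical source with two rows.
source = [
    {"id": 1, "created_at": datetime(2024, 1, 1)},
    {"id": 2, "created_at": datetime(2024, 1, 2)},
]
incremental_dest, full_dest = [], []

# First sync: both modes copy everything.
count, mark = incremental_append(source, incremental_dest, datetime.min)
full_refresh_append(source, full_dest)

# One new row arrives, then a second sync runs.
source.append({"id": 3, "created_at": datetime(2024, 1, 3)})
count, mark = incremental_append(source, incremental_dest, mark)
full_refresh_append(source, full_dest)

print(len(incremental_dest))  # 3: each row transferred once
print(len(full_dest))         # 5: the whole source was transferred twice
```

In this sketch the second full refresh transfers all three rows again, which is why that mode can take longer and cost more at the provider.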
Primary Key
The Primary key is the data that will be used by agents within the retrieval process. Select the data that you wish the agents to access (this may be a product description, a product name, etc.). This will be one of the fields within your datasource.
Cursor Field
The cursor field is the field that will be used as the reference for embedding the data. This should be a unique field, such as date created or object ID, to ensure that the data is uniquely stored within the vector database.
Description
This is an arbitrary description of the field. It isn’t used or accessed by the agents; it simply serves as a descriptor for the user.
Configure Chunking & Embedding
Once the datasource has been connected and the streams to sync are configured, chunking and embedding must be configured. Simply put, this covers the embedding process for the data and the type of retrieval to use.
Field to Embed
This field is the data you wish the agents to access. It will be embedded and stored in the vector database so that the agents can retrieve it. Select the field that you wish to have accessed by the agent(s).
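As a rough illustration of what happens to the selected field, the sketch below splits its text into overlapping chunks ready for embedding. The chunking rule (fixed word counts with overlap), the function names and the record layout are assumptions made for this example; Agent Cloud's actual chunking options may differ.

```python
def chunk_text(text, chunk_size=5, overlap=1):
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

def prepare_for_embedding(records, field_to_embed):
    """Chunk only the chosen field and keep a reference to the source record."""
    prepared = []
    for record in records:
        for chunk in chunk_text(record[field_to_embed]):
            prepared.append({"record_id": record["id"], "text": chunk})
    return prepared

# Hypothetical records; only "description" is selected for embedding.
records = [
    {"id": "p1", "name": "Shoes", "description": "a b c d e f g h i"},
    {"id": "p2", "name": "Socks", "description": "short text"},
]
prepared = prepare_for_embedding(records, "description")
for item in prepared:
    print(item)
```

Only the selected field ends up in `prepared`; other fields such as `name` are not embedded in this sketch.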
Embedding Model
The process of embedding is usually done by an LLM. It is best to have embedding models configured before starting this step of the setup, but you can also configure one within the popup by selecting the “+ Create New Model” dropdown option.
If you wish to quickly continue with setup without configuring an API key for an external LLM vendor, you can use FastEmbed.
Retrieval Strategy
The retrieval strategy is the strategy used by Agents to query the embedded data within the vector database.
Different retrieval strategies can strongly affect the type of data retrieved by the agents.
Top K Results
The top k results is the number of results to return to the chat app. If you would like this datasource to return a large amount of data then set this number higher.
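To show how the top k setting limits what is returned, here is a minimal sketch that assumes a plain cosine-similarity search over stored vectors. The vectors, texts and function names are made up for illustration, and the retrieval strategy you select may rank results differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vector, stored, k):
    """Return the texts of the k stored items most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Hypothetical (text, embedding) pairs standing in for the vector database.
stored = [
    ("red running shoes", [0.9, 0.1, 0.0]),
    ("blue rain jacket", [0.1, 0.9, 0.0]),
    ("trail running socks", [0.8, 0.2, 0.1]),
]

results = top_k([1.0, 0.0, 0.0], stored, k=2)
print(results)  # ['red running shoes', 'trail running socks']
```

Raising `k` returns more, but less similar, items to the chat app.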
Schedule Type
The synchronisation of data can happen on a manual or scheduled basis.
- Manual requires you to click a button to update it whenever you choose.
- Scheduled gives you the option to synchronise the datasource with Agent Cloud on an hourly, daily, weekly or monthly basis.
Please note that frequent synchronisation may incur costs with your datasource provider.
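The schedule options can be thought of as a rule for computing the next sync time. The sketch below is illustrative only: the schedule names mirror the options above, but `next_sync` and its monthly rule (same day next month, clamped to the month's length) are assumptions, not Agent Cloud's scheduler.

```python
import calendar
from datetime import datetime, timedelta

# Fixed-length intervals; monthly is handled separately because months vary.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_sync(last_sync, schedule):
    """Return when the next sync is due, or None for manual datasources."""
    if schedule == "manual":
        return None  # only runs when the user triggers it
    if schedule == "monthly":
        year = last_sync.year + last_sync.month // 12
        month = last_sync.month % 12 + 1
        last_day = calendar.monthrange(year, month)[1]
        return last_sync.replace(year=year, month=month, day=min(last_sync.day, last_day))
    return last_sync + INTERVALS[schedule]

last = datetime(2024, 1, 31, 9, 0)
for schedule in ("manual", "hourly", "daily", "weekly", "monthly"):
    print(schedule, next_sync(last, schedule))
```

An hourly schedule triggers roughly 24 times as many syncs as a daily one, which is worth weighing against any per-request costs at your provider.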