🚧 These docs are still under construction. Reach out on Discord if you’d like more information on anything about Agent Cloud.

Datasources are the heart of Agent Cloud. Agents can only do so much with their training data alone. Datasources give agents private access to your data so that you can manipulate, analyse or retrieve it using the power of AI.

Our platform integrates with hundreds of providers. See all our possible integrations, or contact us to suggest an integration with a new one.

The versatility of Agent Cloud’s integrations allows agents to privately access data such as marketing analytics, sales analytics, or raw databases. This private access keeps your data yours while allowing agents to tailor their functionality to any dataset for any application.

Datasource Form

Supporting many different connections means many different ways of connecting to providers. Not every one can be covered here, but the Airbyte Docs let you search for any connector, read its documentation and see what’s required to connect Agent Cloud to your data provider.

Example Finding Form Values

An example of using the linked Airbyte documentation to find form values for Apple Search Ads. You can also search the Airbyte docs by pressing `ctrl+k` or by clicking the search bar at the top right.

Choosing Streams to Sync

Once you have entered the required form fields, Agent Cloud tests the connection to ensure it is correctly configured. It then prompts you to select the streams to sync, that is, the field(s) of data that should be synchronised.

An example of the platform prompting you to select which fields to synchronise, using BigQuery.

Sync Mode

The sync mode is how you wish to synchronise the data. Different sync modes affect how the data is transferred and stored by Agent Cloud.

  • Incremental Append synchronises only newly added data.
  • Full Refresh Append synchronises the entire datasource on every sync.
    Please note that this kind of synchronisation can incur costs from the datasource provider and may take longer.

To find out which sync mode is best for you, learn more about sync modes here.
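To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It is not how Agent Cloud or Airbyte implement syncing; the row layout, the `updated_at` column and both function names are hypothetical, chosen only to show why an incremental sync transfers less data than a full refresh.

```python
from datetime import datetime

# Hypothetical source rows; `updated_at` is an assumed timestamp column.
SOURCE_ROWS = [
    {"id": 1, "name": "Widget", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Gadget", "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "name": "Gizmo", "updated_at": datetime(2024, 1, 9)},
]


def full_refresh_append(source, destination):
    """Re-read every source row and append all of them to the destination."""
    destination.extend(dict(row) for row in source)
    return len(source)


def incremental_append(source, destination, last_seen):
    """Append only rows newer than `last_seen` and return the new high-water mark."""
    new_rows = [
        row for row in source
        if last_seen is None or row["updated_at"] > last_seen
    ]
    destination.extend(dict(row) for row in new_rows)
    if new_rows:
        last_seen = max(row["updated_at"] for row in new_rows)
    return len(new_rows), last_seen


# First sync sees two rows; by the second sync a third row has been added.
incremental_dest = []
count_first, mark = incremental_append(SOURCE_ROWS[:2], incremental_dest, None)
count_second, mark = incremental_append(SOURCE_ROWS, incremental_dest, mark)

full_dest = []
full_refresh_append(SOURCE_ROWS[:2], full_dest)
full_refresh_append(SOURCE_ROWS, full_dest)
```

In this sketch the second incremental sync transfers only the one new row, while the second full refresh re-reads all three rows, which is why full refreshes tend to be slower and more costly on large sources.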

Primary Key

The primary key is the data that agents will use within the retrieval process. Select the data that you wish the agents to access (this may be a product description, a product name, etc.). This will be one of the fields within your datasource.

Cursor Field

The cursor field is the field that will be used as the reference for embedding the data. This should be a unique field, such as date created or object ID, to ensure that the data is uniquely stored within the vector database.

Knowledge of how vector databases work is not required to use Agent Cloud, but if you would like to understand the cursor field further, see the Qdrant docs here.

Description

This is simply an arbitrary description of the field. It isn’t used or accessed by the agents; it only serves as a descriptor for the user.

Configure Chunking & Embedding

Once the datasource has been connected and the streams to sync are configured, chunking and embedding must be configured. Simply put, this covers the embedding process for the data and the type of retrieval to use.

Field to Embed

This is the data you wish the agents to access. The selected field is embedded and stored in the vector database so that the agents can access it. Select the field that you wish to have accessed by the agent(s).
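As a rough illustration of what embedding a chosen field means, the sketch below turns one field of each row into a vector and keeps the full row next to it. Everything here is an assumption made for illustration: `toy_embed` is a hypothetical stand-in for a real embedding model, `build_points` is a hypothetical helper, and storing the whole row alongside the vector is not confirmed Agent Cloud behaviour.

```python
import hashlib


def toy_embed(text, dims=8):
    """Stand-in for a real embedding model: hashes words into a fixed-size vector."""
    vector = [0.0] * dims
    for word in text.lower().split():
        # md5 is used only to get a bucket that is stable across runs.
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    return vector


def build_points(rows, field_to_embed):
    """Embed one chosen field per row and keep the full row alongside it."""
    points = []
    for row in rows:
        text = str(row[field_to_embed])
        points.append({"vector": toy_embed(text), "payload": row})
    return points


# Hypothetical rows from a product table.
rows = [
    {"sku": "A1", "description": "Waterproof hiking boots"},
    {"sku": "B2", "description": "Lightweight trail running shoes"},
]
points = build_points(rows, field_to_embed="description")
```

Only the text of the chosen field influences the vector, so it pays to pick the field whose content agents will actually be searching for.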

Embedding Model

The process of embedding is usually done by an LLM. It is best to have embedding models configured before starting this step of the setup, but you can configure one within the popup by selecting the “+ Create New Model” dropdown option.

If you wish to continue quickly with setup without configuring an API key for an external LLM vendor, you can use FastEmbed.

Please note: FastEmbed allows for easy setup but may not be as efficient, accurate or intelligent as other models. We strongly recommend changing this later.
Learn more about models in Agent Cloud here.

Retrieval Strategy

The retrieval strategy is the strategy used by agents to query the embedded data within the vector database.
Different retrieval strategies strongly affect the type of data retrieved by the agents.

Top K Results

The top k results is the number of results to return to the chat app. If you would like this datasource to return a large amount of data then set this number higher.

Note: as ‘k’ increases, so does the chance of returning irrelevant documents. Setting this too high will result in unreliable and inaccurate results.
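The effect of ‘k’ can be seen in a small, self-contained sketch of similarity search. This is not Agent Cloud's retrieval code; the vectors are made up by hand, and `cosine_similarity` and `top_k` are hypothetical helpers that assume documents are ranked by cosine similarity to the query.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vector, doc["vector"]), doc["text"])
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made vectors standing in for real embeddings.
documents = [
    {"text": "refund policy", "vector": [0.9, 0.1, 0.0]},
    {"text": "return shipping", "vector": [0.7, 0.3, 0.1]},
    {"text": "office opening hours", "vector": [0.0, 0.2, 0.9]},
]
query = [1.0, 0.0, 0.0]

narrow = top_k(query, documents, k=1)
wide = top_k(query, documents, k=3)
```

With `k=1` only the closest document comes back. With `k=3` the unrelated "office opening hours" entry is also returned, with a similarity of zero, which is the kind of irrelevant result a large ‘k’ lets through.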

Schedule Type

The synchronisation of data can happen on a manual or scheduled basis.

  • Manual requires you to click a button to update the datasource whenever you choose.
  • Scheduled gives you the option to synchronise the datasource with Agent Cloud on an hourly, daily, weekly or monthly basis.
    Please note that frequent synchronisation may incur costs from your datasource provider.
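As a rough way to reason about how often each schedule fires, the sketch below computes when the next sync would fall due. The function name `next_sync` and the frequency strings are hypothetical; Agent Cloud's actual scheduler may calculate run times differently.

```python
import calendar
from datetime import datetime, timedelta


def next_sync(last_sync, frequency):
    """Return when the next scheduled sync would fall due."""
    if frequency == "hourly":
        return last_sync + timedelta(hours=1)
    if frequency == "daily":
        return last_sync + timedelta(days=1)
    if frequency == "weekly":
        return last_sync + timedelta(weeks=1)
    if frequency == "monthly":
        # Move to the following calendar month, rolling the year over after December.
        year = last_sync.year + (last_sync.month // 12)
        month = last_sync.month % 12 + 1
        # Clamp the day so that e.g. 31 January becomes the last day of February.
        day = min(last_sync.day, calendar.monthrange(year, month)[1])
        return last_sync.replace(year=year, month=month, day=day)
    raise ValueError(f"unknown frequency: {frequency}")


last = datetime(2024, 1, 31, 9, 0)
upcoming = {freq: next_sync(last, freq) for freq in ("hourly", "daily", "weekly", "monthly")}
```

An hourly schedule runs roughly 24 times as often as a daily one, so any per-sync provider cost scales accordingly.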

Advanced Features

Row Chunking