Configuration for chunking unstructured data, including partitioning and chunking strategies, character limits, and similarity thresholds.

partitioning
enum<string>
required

The partitioning strategy used for unstructured data.

Available options:
auto,
fast,
hi_res,
ocr_only

strategy
enum<string>
required

The chunking strategy used for unstructured data.

Available options:
basic,
by_title,
by_page,
by_similarity

max_characters
integer
required

The maximum number of characters allowed per chunk.

new_after_n_chars
integer
required

The number of characters after which a new chunk is created.

overlap
integer
required

The number of characters to overlap between chunks.
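
As a rough illustration of how a character limit and an overlap interact, the sketch below splits a string into fixed-size windows in which each chunk after the first repeats the last `overlap` characters of the previous one. The function `split_with_overlap` is hypothetical and is not the service's chunking algorithm; it ignores `new_after_n_chars`, element boundaries, and the chunking strategy, and it assumes that `overlap` must be smaller than `max_characters`.

```python
def split_with_overlap(text: str, max_characters: int, overlap: int) -> list[str]:
    """Naive fixed-window splitter, for illustration only (hypothetical helper)."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    # Assumption: the overlap has to be smaller than the chunk size,
    # otherwise the window would never advance.
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be >= 0 and smaller than max_characters")

    step = max_characters - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", max_characters=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

In this toy run, every chunk is at most 4 characters long and starts with the final character of the chunk before it.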

similarity_threshold
number
required

Similarity threshold used when chunking by similarity. The value must be strictly between 0.0 and 1.0.

Required range: 0 < x < 1

overlap_all
boolean
required

Indicates whether to apply overlap to all chunks or only between adjacent chunks.
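
Putting the fields together, a complete configuration object might look like the sketch below, written as a Python dictionary with a small client-side check. The field names, option lists, and the `0 < x < 1` range come from the reference above; the numeric values are purely illustrative (they are not documented defaults), and `validate_chunking_config` is a hypothetical helper, not part of any official client.

```python
PARTITIONING_OPTIONS = ("auto", "fast", "hi_res", "ocr_only")
STRATEGY_OPTIONS = ("basic", "by_title", "by_page", "by_similarity")
INTEGER_FIELDS = ("max_characters", "new_after_n_chars", "overlap")
REQUIRED_FIELDS = (
    "partitioning", "strategy", *INTEGER_FIELDS,
    "similarity_threshold", "overlap_all",
)

# Illustrative values only; none of these numbers are documented defaults.
example_config = {
    "partitioning": "auto",
    "strategy": "by_title",
    "max_characters": 500,
    "new_after_n_chars": 400,
    "overlap": 50,
    "similarity_threshold": 0.5,
    "overlap_all": False,
}


def validate_chunking_config(config: dict) -> list[str]:
    """Return a list of problems found in the config (empty if none)."""
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    if missing:
        return [f"missing required field: {name}" for name in missing]

    errors = []
    if config["partitioning"] not in PARTITIONING_OPTIONS:
        errors.append(f"partitioning must be one of {PARTITIONING_OPTIONS}")
    if config["strategy"] not in STRATEGY_OPTIONS:
        errors.append(f"strategy must be one of {STRATEGY_OPTIONS}")

    for name in INTEGER_FIELDS:
        value = config[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{name} must be an integer")

    threshold = config["similarity_threshold"]
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0 < threshold < 1:
        errors.append("similarity_threshold must be a number with 0 < x < 1")

    if not isinstance(config["overlap_all"], bool):
        errors.append("overlap_all must be a boolean")
    return errors


print(validate_chunking_config(example_config))  # []
```

Because every field is marked required, the check reports missing fields first and only then validates types and ranges. Note that the endpoints 0 and 1 are rejected for `similarity_threshold`, matching the exclusive range stated above.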