Purpose
- Recursively split text using multiple separators
- Preserve more semantic meaning in chunks
- Try separators in order (paragraphs, then sentences, then words)
- Default configuration balances context preservation and chunk size
How it works
The function tries each separator in order. If a chunk is still too large after applying a separator, it tries the next separator in the list. This helps preserve natural text boundaries like paragraphs and sentences.
Samples
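As a rough illustration of the fallback described under How it works, the following Python sketch splits text with each separator in turn and moves to the next separator only for pieces that are still too large. It is not the library's implementation: the name `recursive_split` and its parameters are hypothetical, and the sketch ignores chunk_overlap, does not merge small pieces back together, and drops the separators it splits on.

```python
from typing import List


def recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Illustrative only: split text so that no piece exceeds chunk_size characters."""
    # Pieces that already fit are kept whole; empty pieces are discarded.
    if len(text) <= chunk_size:
        return [text] if text else []

    # An empty separator (or an exhausted list) means "cut at fixed width".
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    chunks: List[str] = []
    for piece in text.split(sep):
        # Oversized pieces are retried with the next separator in the list.
        chunks.extend(recursive_split(piece, remaining, chunk_size))
    return chunks


# Hypothetical input; the separator list mirrors the paragraph/sentence/word order.
sample = "First paragraph.\n\nSecond paragraph is a bit longer. It has two sentences."
for chunk in recursive_split(sample, ["\n\n", "\n", ".", " ", ""], chunk_size=40):
    print(repr(chunk))
```

In this sketch the first paragraph fits within 40 characters and is kept whole, while the second paragraph is too long and is therefore split again at sentence boundaries.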
Default recursive splitting
Use the default separator hierarchy:
Custom chunk size and overlap
Custom separator hierarchy
Try newlines first, then spaces:
Arguments
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| chunk_size | int | 800 | ✖ | Maximum number of characters per chunk |
| chunk_overlap | int | 400 | ✖ | Number of characters to overlap between chunks |
| separators | text[] | array[E'\n\n', E'\n', '.', '?', '!', ' ', ''] | ✖ | Array of separators to try in order |
| is_separator_regex | bool | false | ✖ | Set to true if separators are regular expressions |
Returns
A JSON configuration object for use in create_vectorizer().
Related functions
- chunking_character_text_splitter(): simpler single-separator splitting
- chunking_none(): disable chunking