Purpose
- Split text into chunks based on a specified separator
- Control the chunk size and amount of overlap between chunks
- Simple, predictable chunking strategy
Samples
Basic character splitting
Split content into 128-character chunks with 10-character overlap.
Custom separator
Split on newlines.
Regex separator
Split using a regular expression.
Arguments
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| chunk_size | int | 800 | ✖ | Maximum number of characters in a chunk |
| chunk_overlap | int | 400 | ✖ | Number of characters to overlap between chunks |
| separator | text | E'\n\n' | ✖ | String or character used to split the text |
| is_separator_regex | bool | false | ✖ | Set to true if separator is a regular expression |
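
The three cases described under Samples can be sketched with named-argument calls. This is a minimal sketch, not a verified listing: the function name `chunking_character_text_splitter` and the `ai` schema prefix are assumptions inferred from the related functions, the 256/20 sizes in the last call are illustrative, and the argument names are assumed to match the table above.

```sql
-- Assumed function name and schema prefix; argument names come from the Arguments table.

-- Basic character splitting: 128-character chunks with 10-character overlap
SELECT ai.chunking_character_text_splitter(
    chunk_size    => 128,
    chunk_overlap => 10
);

-- Custom separator: split on single newlines instead of the default E'\n\n'
SELECT ai.chunking_character_text_splitter(
    chunk_size    => 128,
    chunk_overlap => 10,
    separator     => E'\n'
);

-- Regex separator: split on runs of whitespace (sizes are illustrative)
SELECT ai.chunking_character_text_splitter(
    chunk_size         => 256,
    chunk_overlap      => 20,
    separator          => E'\\s+',
    is_separator_regex => true
);
```

Named notation keeps each call readable when only some of the optional arguments are overridden; any argument left out falls back to the default listed in the table.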
Returns
A JSON configuration object for use in create_vectorizer().
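To show where the returned configuration might go, here is a hedged sketch of a create_vectorizer() call. The `ai` schema prefix, the `chunking` parameter name, the source table `blog`, the column `contents`, and the `loading_column` and `embedding_openai` helpers with their arguments are all assumptions for illustration and are not taken from this section; check them against the version you have installed.

```sql
SELECT ai.create_vectorizer(
    'blog'::regclass,                               -- hypothetical source table
    loading   => ai.loading_column('contents'),     -- hypothetical text column (assumed helper)
    embedding => ai.embedding_openai(               -- assumed embedding helper and model
                     'text-embedding-3-small', 768
                 ),
    chunking  => ai.chunking_character_text_splitter(  -- assumed function name
                     chunk_size    => 128,
                     chunk_overlap => 10
                 )
);
```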
Related functions
- chunking_recursive_character_text_splitter(): more sophisticated recursive splitting
- chunking_none(): disable chunking