chunking_recursive_character_text_splitter()

Recursively splits text into chunks by trying a list of separators in order. Compared with single-separator splitting, this gives finer control over where chunks break and tends to keep semantically related text, such as a paragraph or a sentence, in the same chunk.

Purpose

Recursively split text using multiple separators
Preserve more semantic meaning in chunks
Try separators in order (paragraphs, then sentences, then words)
Default configuration balances context preservation and chunk size

How it works

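The function splits the text on the first separator in the list. Any resulting piece that is still larger than chunk_size is split again using the next separator, and so on down the list. Because the coarsest separators come first, chunks break at natural boundaries such as paragraphs and sentences whenever possible, and fall back to words or individual characters only when necessary.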
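The separator fallback can be sketched in a few lines of Python. This is a simplified illustration, not pgai's implementation: the name `recursive_split` is hypothetical, overlap is ignored, and separators that fall on a chunk boundary are dropped. It only shows how an oversized piece falls through to the next separator while smaller pieces are merged up to the size limit.

```python
from typing import List, Sequence

# Same order as the documented default: paragraphs, lines, sentences, words, characters.
DEFAULT_SEPARATORS = ("\n\n", "\n", ".", "?", "!", " ", "")


def recursive_split(text: str, chunk_size: int = 800,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Hypothetical sketch: split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []

    sep = separators[0] if separators else ""
    rest = separators[1:]
    if sep == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too large: retry this piece with the next separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


sample = ("First paragraph.\n\n"
          "Second paragraph is a bit longer than the first one.\n\n"
          "Third.")
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
```

With a 30-character limit, the two short paragraphs survive intact, while the long one falls through to the space separator and is split between words.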

Samples

Default recursive splitting

Use the default separator hierarchy:

SELECT ai.create_vectorizer(
    'blog_posts'::regclass,
    loading => ai.loading_column('content'),
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    chunking => ai.chunking_recursive_character_text_splitter()
);

Custom chunk size and overlap

Use smaller chunks with a short overlap:

SELECT ai.create_vectorizer(
    'documents'::regclass,
    loading => ai.loading_column('content'),
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    chunking => ai.chunking_recursive_character_text_splitter(
        chunk_size => 256,
        chunk_overlap => 20
    )
);
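To get a feel for how chunk size and overlap interact, the arithmetic can be sketched with a plain character window in Python. This is an assumption-laden simplification: the real splitter applies overlap at separator boundaries, whereas the hypothetical `window_chunks` helper below just slides a fixed window, advancing by `chunk_size - chunk_overlap` characters each step.

```python
from typing import List


def window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Hypothetical helper: fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once a window has reached the end of the text.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


# Each chunk repeats the last two characters of the previous one.
print(window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A larger overlap repeats more context across neighbouring chunks, at the cost of producing more chunks and therefore more embeddings for the same text.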

Custom separator hierarchy

Try paragraph breaks first, then line breaks, then spaces, and finally split between individual characters:

SELECT ai.create_vectorizer(
    'text_data'::regclass,
    loading => ai.loading_column('text'),
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    chunking => ai.chunking_recursive_character_text_splitter(
        chunk_size => 512,
        chunk_overlap => 50,
        separators => array[E'\n\n', E'\n', ' ', '']
    )
);

Arguments

Name	Type	Default	Required	Description
`chunk_size`	`int`	`800`	✖	Maximum number of characters per chunk
`chunk_overlap`	`int`	`400`	✖	Number of characters to overlap between chunks
`separators`	`text[]`	`array[E'\n\n', E'\n', '.', '?', '!', ' ', '']`	✖	Array of separators to try in order
`is_separator_regex`	`bool`	`false`	✖	Set to `true` if separators are regular expressions
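The `is_separator_regex` flag matters because several of the default separators, such as `.` and `?`, are metacharacters in regular expressions. The Python sketch below assumes the flag behaves as it does in common text splitters: literal separators are escaped before matching, while regex separators are used as written. `split_on` is a hypothetical helper for illustration, not part of pgai.

```python
import re
from typing import List


def split_on(text: str, separator: str, is_separator_regex: bool = False) -> List[str]:
    """Hypothetical helper: split text on a literal or regex separator, dropping empty pieces."""
    pattern = separator if is_separator_regex else re.escape(separator)
    return [piece for piece in re.split(pattern, text) if piece]


# Literal mode: "." matches only a full stop.
print(split_on("a.b.c", "."))

# Regex mode: "." matches every character, so nothing is left over.
print(split_on("a.b.c", ".", is_separator_regex=True))

# Regex mode used deliberately: sentence punctuation plus any trailing whitespace.
print(split_on("one. two? three", r"[.?]\s*", is_separator_regex=True))
```

When setting `is_separator_regex` to `true`, it is worth reviewing every entry in `separators`, since literal punctuation then needs to be escaped.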

Returns

A JSON configuration object for use in create_vectorizer().

Related functions

chunking_character_text_splitter(): simpler single-separator splitting
chunking_none(): disable chunking
