parsing_auto()

Automatically selects a parser based on each document's detected file type. Documents in an unrecognized format are not processed; instead, an error is recorded in the ai.vectorizer_errors table.
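To find documents that were rejected, you can query the error table directly. This is a minimal sketch. It selects every column because this page does not describe the column layout of ai.vectorizer_errors:

```sql
-- List recorded vectorizer errors, including documents whose format
-- ai.parsing_auto() could not recognize.
SELECT *
FROM ai.vectorizer_errors;
```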

Parser selection

Parser selection is based on the file extension and content type:

PDF files, images, and Office documents (DOCX, XLSX, etc.): uses Docling
EPUB and MOBI (e-book formats): uses PyMuPDF
Text formats (TXT, MD, etc.): no parser is used; the content is read directly
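For illustration, consider a source table that mixes formats. The table definition and the S3 URIs below are hypothetical; substitute locations your vectorizer can actually read. The comments restate the selection rules above for each row.

```sql
-- Hypothetical source table: one row per document, file location in file_path.
CREATE TABLE documents (
    id        serial PRIMARY KEY,
    file_path text NOT NULL
);

INSERT INTO documents (file_path) VALUES
    ('s3://example-bucket/report.pdf'),   -- PDF: Docling
    ('s3://example-bucket/budget.xlsx'),  -- Office document: Docling
    ('s3://example-bucket/novel.epub'),   -- e-book: PyMuPDF
    ('s3://example-bucket/notes.md');     -- text: no parser, read directly
```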

Samples

Use automatic parser selection

SELECT ai.create_vectorizer(
    'documents'::regclass,
    loading => ai.loading_uri('file_path'),
    parsing => ai.parsing_auto(),
    embedding => ai.embedding_openai('text-embedding-3-small', 768)
);

Arguments

This function takes no parameters.

Returns

A JSON configuration object to pass as the parsing argument of ai.create_vectorizer().
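Because the function only builds configuration, you can call it on its own to inspect the object before passing it to ai.create_vectorizer(). This is a minimal sketch. The exact keys in the result are not documented on this page:

```sql
-- Returns the parsing configuration consumed by ai.create_vectorizer().
SELECT ai.parsing_auto();
```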

Related functions

parsing_none(): skip parsing for textual data
parsing_docling(): explicitly use Docling parser
parsing_pymupdf(): explicitly use PyMuPDF parser
loading_uri(): load data from file URIs
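If every document in a table has a known format, you can skip automatic selection and pass an explicit parser instead. The sketch below reuses the sample above. It assumes a hypothetical ebooks table containing only EPUB and MOBI files. It also assumes that ai.parsing_pymupdf() can be called without arguments.

```sql
-- Hypothetical table of e-books: pin PyMuPDF instead of relying on auto-detection.
SELECT ai.create_vectorizer(
    'ebooks'::regclass,
    loading => ai.loading_uri('file_path'),
    parsing => ai.parsing_pymupdf(),
    embedding => ai.embedding_openai('text-embedding-3-small', 768)
);
```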
