Automatically selects an appropriate parser based on each document's detected file type. Documents in an unrecognized format are not processed; instead, an error is recorded in the ai.vectorizer_errors table.
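To find documents that were skipped because their format was not recognized, you can query the error table directly. This is a minimal sketch that selects every column, since the exact column layout of ai.vectorizer_errors may vary between pgai versions:

```sql
-- List errors recorded by vectorizers, including documents whose
-- format could not be matched to a parser by automatic selection.
SELECT *
FROM ai.vectorizer_errors;
```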

Parser selection

Parser selection is based on file extensions and content types:
  • PDF files, images, Office documents (DOCX, XLSX, etc.): Uses Docling
  • EPUB and MOBI (e-book formats): Uses PyMuPDF
  • Text formats (TXT, MD, etc.): No parser is used (content is read directly)
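The mapping above can be approximated in plain PostgreSQL to reason about which parser a given file would receive. This is a sketch under stated assumptions: demo_pick_parser is a hypothetical helper, not part of pgai; it inspects only the file extension (the real selection also considers content types); and the image extensions (png, jpg, jpeg) are assumed examples, since the list above does not name them.

```sql
-- Hypothetical helper, NOT part of pgai: mirrors the documented
-- extension-to-parser mapping using only the file extension.
CREATE OR REPLACE FUNCTION demo_pick_parser(file_name text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT CASE
        -- PDFs, Office documents, and images (assumed extensions) use Docling
        WHEN ext IN ('pdf', 'docx', 'xlsx', 'png', 'jpg', 'jpeg') THEN 'docling'
        -- e-book formats use PyMuPDF
        WHEN ext IN ('epub', 'mobi') THEN 'pymupdf'
        -- plain-text formats need no parser; content is read directly
        WHEN ext IN ('txt', 'md') THEN 'none'
        -- anything else is unrecognized; pgai would record an error instead
        ELSE NULL
    END
    -- Extract the text after the last dot and lowercase it (NULL if no dot)
    FROM (SELECT lower(substring(file_name FROM '\.([^.]+)$')) AS ext) AS t
$$;

SELECT demo_pick_parser('report.PDF');
```

Matching is case-insensitive, so report.PDF maps to docling, while a file with an unlisted extension returns NULL, standing in for the error pgai records.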

Samples

Use automatic parser selection

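-- Load each file from the URI stored in the file_path column,
-- let pgai choose the parser per file, and embed the result with OpenAI.
SELECT ai.create_vectorizer(
    'documents'::regclass,
    loading => ai.loading_uri('file_path'),
    parsing => ai.parsing_auto(),
    embedding => ai.embedding_openai('text-embedding-3-small', 768)
);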

Arguments

This function takes no parameters.

Returns

A JSON configuration object to pass as the parsing argument of ai.create_vectorizer().
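To inspect the configuration the function produces, you can call it on its own. The exact keys in the returned JSON may differ between pgai versions, so treat this as a way to check your installation rather than as a fixed format:

```sql
-- Show the parsing configuration that ai.create_vectorizer() accepts
-- for its parsing argument.
SELECT ai.parsing_auto();
```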