Purpose
- Automate the process of creating embeddings for table data
- Set up necessary infrastructure (tables, views, triggers, columns)
- Configure the embedding generation process
- Integrate with AI providers for embedding creation
- Set up scheduling for background processing
Samples
Table destination (default)
Create a separate table to store embeddings, with a view that joins it to the source table. This sample creates:

- A vectorizer named `website_blog_vectorizer` for the `website.blog` table
- A separate table `website.blog_embeddings_store` to store embeddings
- A view `website.blog_embeddings` joining the source rows and their embeddings
- Loads the `contents` column
- Uses the Ollama `nomic-embed-text` model to create 768-dimensional embeddings
- Chunks content into 128-character pieces with a 10-character overlap
- Formats each chunk with the title and published date
- Grants the necessary permissions to the roles `bob` and `alice`
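A call matching the configuration above might look like the following sketch. The table, column, and role names come from the sample description; the exact signatures of helpers such as `ai.destination_table()` and the chunking functions can differ between pgai versions, so treat this as illustrative rather than definitive:

```sql
SELECT ai.create_vectorizer(
    'website.blog'::regclass,
    name => 'website_blog_vectorizer',
    destination => ai.destination_table('website.blog_embeddings_store'),
    loading => ai.loading_column('contents'),
    embedding => ai.embedding_ollama('nomic-embed-text', 768),
    chunking => ai.chunking_character_text_splitter(128, 10),
    formatting => ai.formatting_python_template('title: $title published: $published $chunk'),
    grant_to => ai.grant_to('bob', 'alice')
);
```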
Column destination
Store embeddings directly in the source table (requires no chunking). This sample creates:

- A vectorizer named `product_descriptions_vectorizer`
- A column `description_embedding` directly in the source table
- Loads the `description` column
- No chunking (required for a column destination)
- Uses an OpenAI embedding model to create 768-dimensional embeddings
- Grants the necessary permissions to the role `marketing_team`
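A sketch of the corresponding call, assuming a hypothetical source table named `products` (the sample does not name it) and OpenAI's `text-embedding-3-small` model; parameter forms may differ by pgai version:

```sql
SELECT ai.create_vectorizer(
    'products'::regclass,
    name => 'product_descriptions_vectorizer',
    destination => ai.destination_column('description_embedding'),
    loading => ai.loading_column('description'),
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    chunking => ai.chunking_none(),
    grant_to => ai.grant_to('marketing_team')
);
```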
Arguments
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| `source` | `regclass` | - | ✔ | The source table that embeddings are generated for |
| `name` | `text` | Auto-generated | ✖ | Unique name for the vectorizer. Auto-generated from the destination type if not provided. Must follow the snake_case pattern `^[a-z][a-z_0-9]*$` |
| `destination` | Destination config | `ai.destination_table()` | ✖ | How embeddings are stored: `ai.destination_table()` (default) or `ai.destination_column()` |
| `embedding` | Embedding config | - | ✔ | How to embed the data, using `ai.embedding_*()` functions |
| `loading` | Loading config | - | ✔ | How to load data from the source table, using `ai.loading_*()` functions |
| `parsing` | Parsing config | `ai.parsing_auto()` | ✖ | How to parse the data, using `ai.parsing_*()` functions |
| `chunking` | Chunking config | `ai.chunking_recursive_character_text_splitter()` | ✖ | How to split text data, using `ai.chunking_*()` functions |
| `indexing` | Indexing config | `ai.indexing_default()` | ✖ | How to index embeddings, using `ai.indexing_*()` functions |
| `formatting` | Formatting config | `ai.formatting_python_template()` | ✖ | How to format data before embedding |
| `scheduling` | Scheduling config | `ai.scheduling_default()` | ✖ | How often to run the vectorizer, using `ai.scheduling_*()` functions |
| `processing` | Processing config | `ai.processing_default()` | ✖ | How to process embeddings |
| `queue_schema` | `name` | - | ✖ | Schema where the work-queue table is created |
| `queue_table` | `name` | - | ✖ | Name of the work-queue table |
| `grant_to` | Grant config | `ai.grant_to_default()` | ✖ | Which users can use the objects created by the vectorizer |
| `enqueue_existing` | `bool` | `true` | ✖ | Whether existing rows are immediately queued for embedding |
| `if_not_exists` | `bool` | `false` | ✖ | Avoid an error if the vectorizer already exists |
Returns
`INT`: the ID of the created vectorizer. You can also reference the vectorizer by its name in management functions.
Related functions
- `drop_vectorizer()`: remove a vectorizer
- `destination_table()`: store embeddings in a separate table
- `destination_column()`: store embeddings in the source table
- `enable_vectorizer_schedule()`: resume automatic processing
- `disable_vectorizer_schedule()`: pause automatic processing
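As a sketch of how a vectorizer can be referenced by name in the management functions above (assuming a vectorizer named `website_blog_vectorizer` already exists; exact argument forms may vary by pgai version):

```sql
-- pause background processing for the vectorizer
SELECT ai.disable_vectorizer_schedule('website_blog_vectorizer');

-- resume background processing
SELECT ai.enable_vectorizer_schedule('website_blog_vectorizer');

-- remove the vectorizer itself
SELECT ai.drop_vectorizer('website_blog_vectorizer');
```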