openai_embed()

Generate vector embeddings from a single text, an array of texts, or an array of tokens using OpenAI's embedding models. Embeddings are numerical representations of text that capture its semantic meaning: texts with similar meanings produce vectors that lie close together. This makes them well suited to semantic search, recommendations, and clustering.

Samples

Generate an embedding from text

Create a vector embedding for a single piece of text:

SELECT ai.openai_embed(
    'text-embedding-ada-002',
    'PostgreSQL is a powerful database'
);
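The single-text form returns one pgvector `vector`. As a quick check, you can wrap the call in pgvector's `vector_dims()` function to see how large the result is. `text-embedding-ada-002` produces 1536-dimensional embeddings. This sketch assumes pgvector is installed alongside pgai:

```sql
-- Number of dimensions in the returned embedding (1536 for text-embedding-ada-002)
SELECT vector_dims(
    ai.openai_embed(
        'text-embedding-ada-002',
        'PostgreSQL is a powerful database'
    )
) AS dims;
```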

Generate embeddings for multiple texts

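Send several texts in a single API request, which is more efficient than making one request per text. The function returns one row per input text:

SELECT * FROM ai.openai_embed(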
    'text-embedding-ada-002',
    array[
        'PostgreSQL is a powerful database',
        'TimescaleDB extends PostgreSQL for time-series',
        'pgai brings AI capabilities to PostgreSQL'
    ]
);
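Each returned row carries an `index` that identifies which input text it belongs to, so you can map embeddings back to their inputs. The sketch below assumes `index` is 0-based, matching the positions in OpenAI's API response. PostgreSQL arrays are 1-based, which is why the query adds 1. Verify this against your pgai version before relying on it:

```sql
WITH inputs AS (
    SELECT array[
        'PostgreSQL is a powerful database',
        'TimescaleDB extends PostgreSQL for time-series',
        'pgai brings AI capabilities to PostgreSQL'
    ] AS texts
)
SELECT e.index,
       i.texts[e.index + 1] AS content,   -- assumes a 0-based index
       vector_dims(e.embedding) AS dims
FROM inputs AS i,
     ai.openai_embed('text-embedding-ada-002', i.texts) AS e
ORDER BY e.index;
```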

Specify embedding dimensions

Set the number of dimensions in the output vector. Only `text-embedding-3` models support the `dimensions` argument:

SELECT ai.openai_embed(
    'text-embedding-3-small',
    'PostgreSQL is a powerful database',
    dimensions => 768
);
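When you store shortened embeddings, the column's declared size must match the `dimensions` you request, or pgvector rejects the insert. Here is a minimal sketch with a hypothetical `documents_small` table:

```sql
-- Hypothetical table: vector(768) matches dimensions => 768 below
CREATE TABLE IF NOT EXISTS documents_small (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    content text NOT NULL,
    embedding vector(768)
);

-- Embed and store in one statement
INSERT INTO documents_small (content, embedding)
SELECT v.content,
       ai.openai_embed('text-embedding-3-small', v.content, dimensions => 768)
FROM (VALUES ('PostgreSQL is a powerful database')) AS v(content);
```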

Use pre-tokenized input

Pass token IDs directly instead of text. The IDs must come from the tokenizer that matches the model:

SELECT ai.openai_embed(
    'text-embedding-ada-002',
    array[1820, 25977, 46840, 23874, 389, 264, 2579, 58466]
);
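Rather than hard-coding token IDs, you can produce them with pgai's `ai.openai_tokenize()` helper and pass the result straight in. This sketch assumes `ai.openai_tokenize()` takes the model name and the text and returns an `INT[]`:

```sql
-- Tokenize with the same model, then embed the resulting token IDs
SELECT ai.openai_embed(
    'text-embedding-ada-002',
    ai.openai_tokenize('text-embedding-ada-002', 'PostgreSQL is a powerful database')
);
```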

Store embeddings in a table

Generate and store embeddings for rows that don't have one yet:

UPDATE documents
SET embedding = ai.openai_embed(
    'text-embedding-ada-002',
    content
)
WHERE embedding IS NULL;
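The `UPDATE` above calls the API once per row. For larger tables you can batch rows into a single request with the array form instead. The sketch below makes three assumptions. First, `documents` has an `id` primary key. Second, the returned `index` is 0-based, so the query adds 1 to match 1-based PostgreSQL arrays. Third, a batch size of 100 is acceptable, though that number is arbitrary:

```sql
WITH batch AS (
    -- Collect up to 100 rows that still need an embedding, in a stable order
    SELECT array_agg(p.id ORDER BY p.id) AS ids,
           array_agg(p.content ORDER BY p.id) AS texts
    FROM (
        SELECT id, content
        FROM documents
        WHERE embedding IS NULL
        ORDER BY id
        LIMIT 100
    ) AS p
    HAVING count(*) > 0  -- yields no row when nothing is pending
)
UPDATE documents AS d
SET embedding = e.embedding
FROM batch AS b,
     ai.openai_embed('text-embedding-ada-002', b.texts) AS e
WHERE d.id = b.ids[e.index + 1];  -- assumes a 0-based index
```

Run it repeatedly, for example from a scheduled job, until no rows with a `NULL` embedding remain.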

Arguments

Name	Type	Default	Required	Description
`model`	`TEXT`	-	✔	The OpenAI embedding model to use (e.g., `text-embedding-ada-002`, `text-embedding-3-small`)
`input_text`	`TEXT`	-	✔	Single text input to embed (use this OR `input_texts` OR `input_tokens`)
`input_texts`	`TEXT[]`	-	✔	Array of text inputs to embed in a single batch request (use this OR `input_text` OR `input_tokens`)
`input_tokens`	`INT[]`	-	✔	Pre-tokenized input as an array of token IDs (use this OR `input_text` OR `input_texts`)
`api_key`	`TEXT`	`NULL`	✖	OpenAI API key. If not provided, falls back to the `ai.openai_api_key` setting
`api_key_name`	`TEXT`	`NULL`	✖	Name of the secret that holds the API key
`dimensions`	`INT`	`NULL`	✖	Number of dimensions in the output embedding (supported only by `text-embedding-3` models)
`openai_user`	`TEXT`	`NULL`	✖	Unique identifier for your end user, which helps OpenAI monitor and detect abuse
`encoding_format`	`TEXT`	`NULL`	✖	Format for the embeddings (`float` or `base64`)
`extra_headers`	`JSONB`	`NULL`	✖	Additional HTTP headers to include in the API request
`extra_query`	`JSONB`	`NULL`	✖	Additional query parameters for the API request
`verbose`	`BOOLEAN`	`FALSE`	✖	Enable verbose logging for debugging
`client_config`	`JSONB`	`NULL`	✖	Advanced client configuration options
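Because the optional arguments follow the inputs in a fixed order, it is simplest to pass them with PostgreSQL's named notation (`name => value`), as the dimensions sample does. The sketch below combines several of them. The `openai_user` value is a made-up placeholder:

```sql
SELECT ai.openai_embed(
    'text-embedding-3-small',
    'PostgreSQL is a powerful database',
    dimensions  => 512,          -- text-embedding-3 models only
    openai_user => 'user-1234',  -- placeholder end-user ID
    verbose     => true          -- extra logging for debugging
);
```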

Returns

For a single text (`input_text`) or a token array (`input_tokens`):

vector: a pgvector-compatible vector containing the embedding

For an array of texts (`input_texts`):

TABLE(index INT, embedding vector): one row per input text, holding the text's position in the array and its embedding
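Since the single-input form returns a pgvector `vector`, you can compare it directly with stored embeddings. The following sketch runs a semantic search over the `documents` table from the samples. It assumes the table has an `id` column and that its `embedding` column holds `text-embedding-ada-002` vectors. The query must be embedded with the same model. `<=>` is pgvector's cosine distance operator:

```sql
WITH q AS (
    -- Embed the search phrase once, with the same model used for the stored rows
    SELECT ai.openai_embed('text-embedding-ada-002', 'open source relational database') AS query_embedding
)
SELECT d.id,
       d.content,
       d.embedding <=> q.query_embedding AS distance  -- smaller means more similar
FROM documents AS d, q
WHERE d.embedding IS NOT NULL
ORDER BY distance
LIMIT 5;
```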

Related functions

openai_embed_with_raw_response(): get the full API response including metadata
openai_tokenize(): convert text to tokens before embedding
openai_list_models(): see available embedding models
