Convert text into an array of token IDs using OpenAI’s tokenization algorithm. This is useful for counting tokens to estimate API costs, stay within model limits, and understand how your text is processed.

Samples

Tokenize text

Convert a string into tokens:
SELECT ai.openai_tokenize(
    'text-embedding-ada-002',
    'Timescale is Postgres made Powerful'
);
Returns:
          openai_tokenize
----------------------------------------
 {19422,2296,374,3962,18297,1903,75458}

Count tokens

Determine how many tokens a text will use:
SELECT array_length(
    ai.openai_tokenize(
        'text-embedding-ada-002',
        'Timescale is Postgres made Powerful'
    ),
    1
) AS token_count;
Returns:
 token_count
-------------
           7
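
To estimate API costs for a whole table, aggregate the per-row counts. This is a sketch that assumes a hypothetical documents table with a content column; multiply total_tokens by your model's per-token price to get a cost estimate:
-- Sum token counts across a (hypothetical) documents table
SELECT
    count(*) AS doc_count,
    sum(array_length(ai.openai_tokenize('text-embedding-ada-002', content), 1)) AS total_tokens
FROM documents;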

Check token count before API call

Find documents whose content exceeds a token budget before making an API call:
SELECT
    content,
    array_length(ai.openai_tokenize('gpt-4o-mini', content), 1) AS tokens
FROM documents
WHERE array_length(ai.openai_tokenize('gpt-4o-mini', content), 1) > 8000;
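
Oversized text can be truncated at a token boundary rather than a character boundary by slicing the token array and converting it back to text. This sketch assumes your pgai version also provides ai.openai_detokenize and reuses the hypothetical documents table from above:
-- Keep only the first 8000 tokens of each oversized document
-- (assumes ai.openai_detokenize is available)
SELECT ai.openai_detokenize(
    'gpt-4o-mini',
    (ai.openai_tokenize('gpt-4o-mini', content))[1:8000]
) AS truncated_content
FROM documents
WHERE array_length(ai.openai_tokenize('gpt-4o-mini', content), 1) > 8000;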

Arguments

 Name       | Type | Default | Required | Description
------------|------|---------|----------|--------------------------------------------------------------------------
 model      | TEXT | -       | ✔        | The OpenAI model to tokenize for (e.g., text-embedding-ada-002, gpt-4o)
 text_input | TEXT | -       | ✔        | The text to convert into tokens

Returns

INT[]: An array of token IDs representing the input text.
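
Because the result is a regular Postgres INT[], standard array functions apply to it. For example, unnest expands the IDs into one row per token, which can help when inspecting how a string is split:
-- Expand the token array into individual rows
SELECT unnest(
    ai.openai_tokenize('text-embedding-ada-002', 'Timescale is Postgres made Powerful')
) AS token_id;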