Convert text into an array of token IDs using OpenAI’s tokenization algorithm. This is useful for counting tokens to estimate API costs, stay within model limits, and understand how your text is processed.

Samples

Tokenize text

Convert a string into tokens:
SELECT ai.openai_tokenize(
    'text-embedding-ada-002',
    'Timescale is Postgres made Powerful'
);
Returns:
          openai_tokenize
----------------------------------------
 {19422,2296,374,3962,18297,1903,75458}

Count tokens

Determine how many tokens a text will use:
SELECT array_length(
    ai.openai_tokenize(
        'text-embedding-ada-002',
        'Timescale is Postgres made Powerful'
    ),
    1
) AS token_count;
Returns:
 token_count
-------------
           7
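
To estimate API costs for a whole table, aggregate the per-row counts. This is a sketch that assumes a hypothetical documents table with a content column; multiply total_tokens by your model's per-token price to get a cost estimate:
-- Sum token counts across a (hypothetical) documents table
SELECT
    count(*) AS doc_count,
    sum(array_length(ai.openai_tokenize('text-embedding-ada-002', content), 1)) AS total_tokens
FROM documents;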

Check token count before API call

Find documents whose content exceeds a token budget before making an API call:
SELECT
    content,
    array_length(ai.openai_tokenize('gpt-4o-mini', content), 1) AS tokens
FROM documents
WHERE array_length(ai.openai_tokenize('gpt-4o-mini', content), 1) > 8000;
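
Oversized text can be truncated at a token boundary rather than a character boundary by slicing the token array and converting it back to text. This sketch assumes your pgai version also provides ai.openai_detokenize and reuses the hypothetical documents table from above:
-- Keep only the first 8000 tokens of each oversized document
-- (assumes ai.openai_detokenize is available)
SELECT ai.openai_detokenize(
    'gpt-4o-mini',
    (ai.openai_tokenize('gpt-4o-mini', content))[1:8000]
) AS truncated_content
FROM documents
WHERE array_length(ai.openai_tokenize('gpt-4o-mini', content), 1) > 8000;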

Arguments

 Name       | Type | Default | Required | Description
------------|------|---------|----------|--------------------------------------------------------------------------
 model      | TEXT | -       | ✔        | The OpenAI model to tokenize for (e.g., text-embedding-ada-002, gpt-4o)
 text_input | TEXT | -       | ✔        | The text to convert into tokens

Returns

INT[]: An array of token IDs representing the input text.
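
Because the result is a regular Postgres INT[], standard array functions apply to it. For example, unnest expands the IDs into one row per token, which can help when inspecting how a string is split:
-- Expand the token array into individual rows
SELECT unnest(
    ai.openai_tokenize('text-embedding-ada-002', 'Timescale is Postgres made Powerful')
) AS token_id;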