Call Ollama’s local LLM API directly from SQL to generate embeddings, completions, and chat responses using open-source models running on your infrastructure.

What is Ollama?

Ollama is a tool for running large language models locally on your own hardware. Unlike cloud-based APIs, Ollama gives you complete control over your models, your data, and your costs. It supports popular open-source models like Llama, Mistral, and CodeLlama.

Key features

  • Privacy-first: All data stays on your infrastructure
  • Cost-effective: No per-token API costs
  • Offline operation: Works without internet connectivity
  • Open-source models: Access to Llama 2, Mistral, CodeLlama, and more
  • Full control: Manage model versions and configurations

Prerequisites

Before using Ollama functions, you need to:
  1. Install and run Ollama on your infrastructure
  2. Pull the models you want to use
  3. Ensure your database can access the Ollama host
For installation instructions, visit ollama.com.
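
Once Ollama is running and your models are pulled, you can sanity-check connectivity directly from SQL. A minimal check, assuming your pgai version includes ai.ollama_list_models:
-- List the models the Ollama server reports
-- (the column set may vary by pgai version)
SELECT *
FROM ai.ollama_list_models(host => 'http://localhost:11434');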

Quick start

Generate embeddings

Create vector embeddings using a local model:
SELECT ai.ollama_embed(
    'llama2',
    'PostgreSQL is a powerful database',
    host => 'http://localhost:11434'
);
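
The function returns a pgvector vector, so you can store results next to your data. A minimal sketch, assuming the pgvector extension is installed; the documents table is a hypothetical example, and the vector dimension (4096 for llama2) depends on the model you use:
-- Hypothetical table; the dimension must match your embedding model's output
CREATE TABLE documents (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    content text NOT NULL,
    embedding vector(4096)
);

INSERT INTO documents (content, embedding)
VALUES (
    'PostgreSQL is a powerful database',
    ai.ollama_embed('llama2', 'PostgreSQL is a powerful database')
);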

Generate completions

Get text completions from a local model:
SELECT ai.ollama_generate(
    'llama2',
    'Explain what PostgreSQL is in one sentence'
)->>'response';
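
The full JSON response also carries Ollama's run metadata (fields such as eval_count and total_duration in the Ollama generate API), which you can extract with the usual jsonb operators:
-- Pull the answer plus generation statistics out of the JSON response
SELECT r->>'response'       AS answer,
       r->>'eval_count'     AS tokens_generated,
       r->>'total_duration' AS total_duration_ns
FROM ai.ollama_generate(
    'llama2',
    'Explain what PostgreSQL is in one sentence'
) AS r;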

Chat completion

Have a conversation with a local model:
SELECT ai.ollama_chat_complete(
    'llama2',
    jsonb_build_array(
        jsonb_build_object('role', 'user', 'content', 'What is PostgreSQL?')
    )
)->'message'->>'content';
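
Because the messages argument is an ordered jsonb array, you can include a system prompt and earlier turns to steer the model. For example:
-- Multi-turn chat with a system prompt
SELECT ai.ollama_chat_complete(
    'llama2',
    jsonb_build_array(
        jsonb_build_object('role', 'system', 'content', 'You are a concise database tutor.'),
        jsonb_build_object('role', 'user', 'content', 'What is PostgreSQL?'),
        jsonb_build_object('role', 'assistant', 'content', 'PostgreSQL is an open-source relational database.'),
        jsonb_build_object('role', 'user', 'content', 'How does it differ from MySQL?')
    )
)->'message'->>'content';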

Available functions

Embeddings

  • ai.ollama_embed: create a vector embedding for a given text

Completions and chat

  • ai.ollama_generate: generate a completion for a single prompt
  • ai.ollama_chat_complete: generate a chat completion for an array of messages

Model management

  • ai.ollama_list_models: list the models available on the Ollama server
  • ai.ollama_ps: list the models currently loaded into memory

Configuration

All Ollama functions accept a host parameter to specify the Ollama server location:
-- Use default host (http://localhost:11434)
SELECT ai.ollama_embed('llama2', 'sample text');

-- Specify custom host
SELECT ai.ollama_embed('llama2', 'sample text', host => 'http://ollama-server:11434');
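
If many queries target the same non-default server, you can avoid repeating the URL by storing it once per session. This uses only standard PostgreSQL machinery; the myapp.ollama_host setting name is an arbitrary example:
-- Store the host once per session in a custom setting
SELECT set_config('myapp.ollama_host', 'http://ollama-server:11434', false);

-- Read it back wherever a host is needed
SELECT ai.ollama_embed('llama2', 'sample text',
                       host => current_setting('myapp.ollama_host'));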
