Skip to main content
Early access 1.8.0 Count the number of times a value appears in a column, using the probabilistic count-min sketch data structure and its associated algorithms. For applications where a small error rate is tolerable, this can result in huge savings in both CPU time and memory, especially for large datasets. The count-min sketch produces a biased estimator of frequency. It might overestimate the item count, but it can’t underestimate. You can control the relative error and the probability that the estimate falls outside the error bounds.

Two-step aggregation

This group of functions uses the two-step aggregation pattern. Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function. Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions. The two-step aggregation pattern has several advantages:
  1. More efficient because multiple accessors can reuse the same aggregate
  2. Easier to reason about performance, because aggregation is separate from final computation
  3. Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
  4. Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
To learn more, see the blog post on two-step aggregates.

Samples

Estimate counts for specific values

Create a count-min sketch and estimate how many times specific user IDs appear:
WITH sketch AS (
    SELECT toolkit_experimental.count_min_sketch(
        user_id::text,
        0.01,  -- 1% error tolerance
        0.01   -- 1% probability of exceeding error bounds
    ) AS cms
    FROM user_events
)
SELECT
    toolkit_experimental.approx_count(cms, 'user123') AS user123_count,
    toolkit_experimental.approx_count(cms, 'user456') AS user456_count
FROM sketch;

Available functions

Aggregate

Accessor

  • approx_count(): estimate the number of times a value appears in a count-min sketch