freq_agg allows you to specify a minimum frequency, and mcv_agg allows you to specify the target number of values to keep.
To estimate the absolute number of times a value appears, use count_min_sketch.
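As a rough illustration of the difference between the two aggregates, the sketch below builds one of each over the same column. The table `value_test` and its integer column `value` are hypothetical, and depending on your Toolkit version these functions may need to be schema-qualified with `toolkit_experimental`.

```sql
-- Assumes a hypothetical table: value_test(value INTEGER).
-- On some Toolkit versions, prefix the functions with toolkit_experimental.

-- freq_agg: track values estimated to make up at least 5% of the input.
SELECT freq_agg(0.05, value) FROM value_test;

-- mcv_agg: track roughly the 10 most common values.
SELECT mcv_agg(10, value) FROM value_test;
```

Both queries return an intermediate aggregate rather than a readable result, so in practice you would pass the aggregate to an accessor.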
Two-step aggregation
This group of functions uses the two-step aggregation pattern. Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function. Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions. The two-step aggregation pattern has several advantages:
- More efficient because multiple accessors can reuse the same aggregate
- Easier to reason about performance, because aggregation is separate from final computation
- Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
- Possible to perform retrospective analysis even when the underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
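The pattern can be sketched as follows: the aggregate is built once in a common table expression, and two accessors then read from it. The table `value_test`, its column `value`, and the looked-up value `3` are assumptions made for illustration only.

```sql
-- Assumes a hypothetical table: value_test(value INTEGER).
-- On some Toolkit versions, prefix the functions with toolkit_experimental.

-- Step 1: build the intermediate aggregate once.
WITH agg AS (
    SELECT freq_agg(0.05, value) AS f
    FROM value_test
)
-- Step 2: apply several accessors to the same aggregate.
SELECT
    min_frequency(f, 3) AS min_freq_of_3,
    max_frequency(f, 3) AS max_freq_of_3
FROM agg;
```

Because both accessors read the same intermediate aggregate, the table is scanned only once.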
Samples
Get the 5 most common values from a table
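A minimal sketch of what this sample might look like. The table name `value_test`, the row count of 100,000, and the 0.05 minimum frequency are assumptions, not values given on this page; the data generation follows the description of the test data in this sample.

```sql
-- Hypothetical test table; name and row count are illustrative.
CREATE TABLE value_test(value INTEGER);

-- Integer square roots of random numbers in the range 0 to 400.
INSERT INTO value_test
SELECT floor(sqrt(random() * 400))
FROM generate_series(1, 100000);

-- Build a frequency aggregate, then ask it for the 5 most common values.
-- On some Toolkit versions, prefix the functions with toolkit_experimental.
SELECT topn(freq_agg(0.05, value), 5)
FROM value_test;
```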
This test uses a table of randomly generated data. The values used are the integer square roots of a random number in the range 0 to 400.
Generate a table with frequencies of the most commonly seen values
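One way this sample's query might be written, assuming a hypothetical table `value_test` with an integer column `value`. The 0.05 argument is the minimum frequency cutoff. The output column names `min_freq` and `max_freq` reflect my understanding of what `into_values()` returns and may differ between Toolkit versions.

```sql
-- Assumes a hypothetical table: value_test(value INTEGER).
-- On some Toolkit versions, prefix the functions with toolkit_experimental.

-- Expand the aggregate into one row per tracked value,
-- with its estimated frequency bounds.
SELECT value, min_freq, max_freq
FROM into_values(
    (SELECT freq_agg(0.05, value) FROM value_test)
);
```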
Return values that represent more than 5% of the input.
Available functions
Aggregates
freq_agg(): aggregate data into a space-saving aggregate with a minimum frequency cutoff
mcv_agg(): aggregate data into a space-saving aggregate with a target number of values
Accessors
into_values(): return the values and their estimated frequencies from a frequency aggregate
max_frequency(): get the maximum frequency of a value from a frequency aggregate
min_frequency(): get the minimum frequency of a value from a frequency aggregate
topn(): get the N most common values from a frequency aggregate
Rollup
rollup(): combine multiple frequency aggregates
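As a sketch of how a rollup might be used, the query below builds one frequency aggregate per day and then combines them to get the most common values over the whole period. The table `readings` and its columns `ts` and `value` are hypothetical. `time_bucket` is the standard TimescaleDB bucketing function.

```sql
-- Assumes a hypothetical table: readings(ts TIMESTAMPTZ, value INTEGER).
-- On some Toolkit versions, prefix the Toolkit functions with toolkit_experimental.

-- One intermediate aggregate per day.
WITH daily AS (
    SELECT
        time_bucket('1 day', ts) AS day,
        freq_agg(0.05, value) AS f
    FROM readings
    GROUP BY day
)
-- Combine the daily aggregates, then read the top 5 values from the result.
SELECT topn(rollup(f), 5)
FROM daily;
```

The same shape applies when the daily aggregates are stored in a continuous aggregate instead of being computed in a common table expression.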