Skip to main content
Since 1.3.0 Perform linear regression analysis, for example to calculate correlation coefficient and covariance, on two-dimensional data. You can also calculate common statistics, such as average and standard deviation, on each dimension separately. These functions are similar to the PostgreSQL statistical aggregates, but they include more features and are easier to use in continuous aggregates and window functions. The linear regressions are based on the standard least-squares fitting method. These functions work on two-dimensional data. To work with one-dimensional data, for example to calculate the average and standard deviation of a single variable, see the one-dimensional stats_agg functions.

Two-step aggregation

This group of functions uses the two-step aggregation pattern. Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function. Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions. The two-step aggregation pattern has several advantages:
  1. More efficient because multiple accessors can reuse the same aggregate
  2. Easier to reason about performance, because aggregation is separate from final computation
  3. Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
  4. Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
To learn more, see the blog post on two-step aggregates.

Samples

Calculate regression and statistical properties

Create a statistical aggregate that summarizes daily statistical data about two variables, val2 and val1, where val2 is the dependent variable and val1 is the independent variable. Use the statistical aggregate to calculate the average of the dependent variable and the slope of the linear-regression fit:
WITH t AS (
    SELECT
        time_bucket('1 day'::interval, ts) AS dt,
        stats_agg(val2, val1) AS stats2D
    FROM foo
    WHERE id = 'bar'
    GROUP BY time_bucket('1 day'::interval, ts)
)
SELECT
    average_x(stats2D),
    slope(stats2D)
FROM t;

Available functions

Aggregate

  • stats_agg(): aggregate data into an intermediate statistical aggregate form for further calculation

Accessors for y variable statistics

  • average_y(): calculate the average of the dependent variable from a statistical aggregate
  • stddev_y(): calculate the standard deviation of the dependent variable from a statistical aggregate
  • variance_y(): calculate the variance of the dependent variable from a statistical aggregate
  • skewness_y(): calculate the skewness of the dependent variable from a statistical aggregate
  • kurtosis_y(): calculate the kurtosis of the dependent variable from a statistical aggregate
  • sum_y(): calculate the sum of the dependent variable from a statistical aggregate

Accessors for regression analysis

  • corr(): calculate the correlation coefficient from a statistical aggregate
  • covariance(): calculate the covariance from a statistical aggregate
  • determination_coeff(): calculate the coefficient of determination (R²) from a statistical aggregate
  • slope(): calculate the slope of the linear regression line from a statistical aggregate
  • intercept(): calculate the y-intercept of the linear regression line from a statistical aggregate
  • x_intercept(): calculate the x-intercept of the linear regression line from a statistical aggregate

Accessors for aggregate information

  • num_vals(): get the number of values contained in a statistical aggregate

Rollup

  • rollup(): combine multiple two-dimensional statistical aggregates

Mutator

  • rolling(): create a rolling window aggregate for use in window functions