Window functions - do more with your SQL

Bjørn Hamre

Frode Borlaug

Half-day workshop - in English

Description

SQL is a highly expressive and rich language offering incredible features at a declarative level. One of the coolest features are window functions, whose coolness is in no proportion to their incredibly low popularity. The low popularity can only be due to developers being oblivious of this cool stuff. SQL has changed a lot since SQL-92, which is what we learn in school and tend to use. Window functions was part of the SQL-2003 standard but have seen a slow adoption rate - we want to change this. As big (and small) data become more and more important to all businesses, we believe that window functions and Common Table Expressions (CTEs) are very useful tools for (almost) every developer, not only data scientists and analytics.

Window functions define a frame or window of rows with a given length around the current row, and performs a calculation across the set of data in the window. This makes comparing values or ranking them across rows extremely efficient and easy with no need for post-processing in a reporting tool.

The whole idea behind window functions is to allow you to process several values of the result set at a time: you see through the window some peer rows and are able to compute a single output value from them, much like when using an aggregate function. 

This workshop will have three parts:

Part 1:
Introduction to Common Table Expression (CTE).

WITH … AS … SQL

CTE is useful on its own, but window function queries tend to be easier to write and understand if they are expressed with a CTE.

Introduction to basic Window Functions Syntax.

SELECT … OVER ( [PARTITION BY] ) WHERE

Offset functions

Many reports need to compare some value (revenue, stock price) with an earlier time period, e.g. revenue same month last year. Show how this can be expressed with offset functions like lead and lag. 

Exercises

Walk-through of solution proposals

 

Part 2:
Aggregate functions with order by, partitioning and framing

SELECT … OVER ( … [PARTITION BY] [ORDER BY] [FRAME_CLAUSE] ) WHERE

Aggregate used in window functions will be calculated over the defined window/partition and returned as a column in each row, not reduced to a single row as in “ordinary” SQL.

Use partition by to define the window over which the calculation is done (per customer, per month and so on).

Use order by and framing to create reports with moving or running aggregates.

Exercises

Walk-through of solution proposals

 

Part 3:
Introduction to ranking functions and top-n problems.

Find top n values per partition (top 3 customers by sales, top 3 departments by revenue, and so on). 

Show different ways of ranking the data

Unique rank: 1, 2, 3, 4

Rank with ties and no gaps: 1, 2, 2, 3

Rank with ties and gaps: 1, 2, 2, 4

 

Exercises

Walk-through of solution proposals and wrap-up

 

Part “Bonus”:

Island Problems

Identifying ranges ranges of existing values in a sequence.

Gap Problems

Identifying ranges of missing values in a sequence.

 

Intended audience and requirements

Participants must have a basic knowledge of SQL (SELECT, FROM, WHERE, GROUP BY, ORDER BY, aggregate functions like sum, avg, count, and joins.

Primarily for: Developers, Others

Participant requirements: Participants must have a SQL editor installed as this is a practical workshop. Database will be provided as a docker image, sql-file to import into a local database or provided by us - details will be provided later. All examples and exercises will work on PostgreSQL.