MS SQL 2005: window functions

Books and articles about SQL

Сменить язык на: Русский

25 April 2024 21:59:31

Main

Database fundamentals

Articles about MS SQL

Articles about Oracle

Articles about PostgreSQL

Additions for sql-ex.ru

• Oracle optimization

• Common articles

• Additional materials

• Errors in exercises on database "Computer firm"

• Errors in exercises on database "Ships"

• Errors in exercises on database "Recycling firm"

• Errors in exercises on database "Aeroflot"

• Database scripts

• DML exercises

• Feedback

• Developers & Thanks

MS SQL 2005: window functions

Ivan Bodjagin (Merle) (Source)
The summary by Karasyova N.V. with examples from training databases.

The main difference between analytic and scalar functions is that analytic function deals with prepared sample.

First the sample is produced, all unions, conditions WHERE, GROUP BY, HAVING are executed - all, except ordering, and only afterward the analytic function is used for the practically ready data set. That is why analytic functions may be used in query selection list or in ordering conditions only.

Generally, the work principle of analytic functions can be presented in the following way. Let us suppose we have the resulting data set derived by the way described above – all instructions are executed except ordering. So-called “window” is applied onto each record of this set.

“Window” size and position are determined in accordance with some analytic condition (this, in fact, explains the name of these functions – window functions).

Some other records from the same data set are included into this virtual window. The “window” may be formed in such way that it will include not necessarily the adjacent records, but practically any records of the data set.

And the current record, which was used to form the “window”, may not even be included.

When the “window” has been formed, analytic function calculates an aggregated value based on records in the “window”, and then goes to the next record. New “window” is formed for this record; new value is calculated, and so on for every record of data set. Size and position of the window may change from record to record.

In the strict sense, practically everything that can be produced by using analytic function can be produced using common T-SQL, but it will be less convenient and often not as efficient.

There are two types of analytical functions in MS SQL Server 2005 –aggregate and ranking functions.

Aggregate functions

Since an analytical function returns aggregate result of handling data, common aggregate function may act as analytical.

The difference is that ordinary aggregates reduce detailed elaboration, while in the case of analytical functions the level of detail is not reduced. This also relates to other types of analytic functions. Let’s compare the results of two queries (“Painting” data base).

In the first query, COUNT is a common aggregate:

SELECT B_Q_ID,B_V_ID,COUNT(*)C
FROM UTB
GROUP BY B_Q_ID,B_V_ID

Result
B_Q_ID   B_V_ID   C
-------------------
1 1   2
2 2   1
3 3   1
1 4   1
2 5        1
3 6   1
1 7   1
2 8   1
3 9   1

But in the next query, COUNT is analytic function:

SELECT B_DATETIME, B_Q_ID, B_V_ID,
COUNT(*)OVER(PARTITION BY B_Q_ID,B_V_ID) C
FROM UTB

Result
B_DATETIME            B_Q_ID B_V_ID     C
-----------------------------------------
2003-01-01 01:12:01.000 1 1 2
2003-06-23 01:12:02.000 1  1 2
2003-01-01 01:12:05.000 1  4 1
2003-01-01 01:12:08.000 1 7 1
2003-01-01 01:12:03.000 2 2 1
2003-01-01 01:12:06.000 2 5 1
2003-01-01 01:12:09.000 2 8 1
2003-01-01 01:12:04.000 3 3 1
2003-01-01 01:12:07.000 3 6 1
2003-01-01 01:12:10.000 3 9 1

We can see that the server does not warn on the column B_DATETIME in the sample, in spite of absence of this column from the aggregate function or from grouping. At least one of these conditions has to be met for “usual” aggregates - otherwise ambiguity appears. But this limitation is not applied to analytic aggregates because the level of detail is not reduced. We can see this in the second example – the result of the aggregate function is repeated for every record in the group.

Syntax is simple. The following construction is used after the function

OVER ( [ PARTITION BY value_expression , ... [ n ] ])

- where [n] is a list of fields in group without aliases or expressions. This is the way to organize a “window” for analytic function.

All records grouped by columns from list [n] are included in the “window”. This grouping does the same as GROUP BY operator, but with two distinctions.

Firstly, this grouping is applied to organized selection; secondly, it is applied only to the aggregate before construction OVER(…), not to all columns.

If we want to use two analytic functions, we must use two constructions OVER(…) separately for each function.

In the strict sense, the result of a query with an analytic function equals to the result of an “old-fashioned” query:

SELECT A.B_DATETIME,A.B_Q_ID,A.B_V_ID,C
FROM UTB A
JOIN
(SELECT B_Q_ID,B_V_ID,COUNT(*) C
FROM UTB
GROUP BY B_Q_ID,B_V_ID) B
ON A.B_Q_ID=B.B_Q_ID AND A.B_V_ID=B.B_V_ID

Ranking functions

Ranking functions are introduced with analytic queries besides of usual aggregates. These functions return the rank for every record in the “window”. Rank is the number corresponding to the position or “weight” of the record relative to other records from the same “window”. The “window” is organized in the same way as the one used for aggregate functions. But we must specify the sequence of records in the “window” by using the construction ORDER BY.

Depending on the function we use, some records may have the same rank.

Ranking functions are not determinate.

There are 4 ranking functions:

ROW_NUMBER()

This function enumerates records according to specified order in the “window”. But if we omit the section PARTITION BY in construction OVER, we would enumerate all records in specified order. The enumeration always starts from 1.

RANK()

This function is used for ranking records in the “window”, and if a column for grouping is not specified the entire sample is recognized as a “window”. This is the same enumeration that ROW_NUMBER() produces and it starts from 1. Identical records get identical numbers, and the next different record gets the same number as if ROW_NUMBER() has been used and all previous records have got their own unique numbers. So a gap is produced, which is equal to the quantity of identical records minus 1.

DENSE_RANK()

This function produces “thick” ranking, in other words it does the same without the gap in enumeration.

NTILE()

This function divides records in the “window” into specified quantity of groups. It returns the number of the group for every record that belongs to this group. Group enumeration starts from 1.

If quantity of records is not divisible by quantity of groups, two group types are produced. First group type has the quantity of records that differs from the quantity of records for second group type by 1.

The following query can be executed to demonstrate ranking functions’ differences:

SELECT B_Q_ID, B_V_ID,
  ROW_NUMBER() OVER(PARTITION BY B_Q_ID ORDER
                              BY B_V_ID DESC) N_Row,
  RANK() OVER(PARTITION BY B_Q_ID ORDER
                              BY B_V_ID DESC)RANK,
  DENSE_RANK() OVER(PARTITION BY B_Q_ID ORDER
                              BY B_V_ID DESC)DENSE_RANK,
  NTILE(2) OVER(PARTITION BY B_Q_ID ORDER BY B_V_ID DESC)NTILE
FROM UTB WHERE B_Q_ID = 4

Результат
B_Q_ID      B_V_ID  N_Row  RANK  DENSE_RANK  NTILE
----------- ------- ------ ----- ----------- ------
4           37      1      1     1           1
4           37      2      1     1           1
4           10      3      3     2           2

Main page

Print Print version

Usage of any materials of this site is possible only under condition of mandatory allocation of the direct link to a site http://www.sqlbooks.ru on each page where used materials are placed.
Main	Articles	Books