Main page
MS SQL 2005: window functions
Ivan Bodjagin (Merle) (Source )
The summary by Karasyova
N.V. with examples from training databases.
The main difference between analytic and scalar functions is that analytic function
deals with prepared sample.
First the sample is produced, all unions, conditions WHERE, GROUP BY, HAVING are
executed - all, except ordering, and only afterward the analytic function is used
for the practically ready data set. That is why analytic functions may be used in
query selection list or in ordering conditions only.
Generally, the work principle of analytic functions can be presented in the following
way. Let us suppose we have the resulting data set derived by the way described
above – all instructions are executed except ordering. So-called “window” is applied
onto each record of this set.
“Window” size and position are determined in accordance with some analytic condition
(this, in fact, explains the name of these functions – window functions).
Some other records from the same data set are included into this virtual window.
The “window” may be formed in such way that it will include not necessarily the
adjacent records, but practically any records of the data set.
And the current record, which was used to form the “window”, may not even be included.
When the “window” has been formed, analytic function calculates an aggregated value
based on records in the “window”, and then goes to the next record. New “window”
is formed for this record; new value is calculated, and so on for every record of
data set. Size and position of the window may change from record to record.
In the strict sense, practically everything that can be produced by using analytic
function can be produced using common T-SQL, but it will be less convenient and
often not as efficient.
There are two types of analytical functions in MS SQL Server 2005 –aggregate and
ranking functions.
Aggregate functions
Since an analytical function returns aggregate result of handling data, common aggregate
function may act as analytical.
The difference is that ordinary aggregates reduce detailed elaboration, while in
the case of analytical functions the level of detail is not reduced. This also relates
to other types of analytic functions. Let’s compare the results of two queries (“Painting”
data base).
In the first query, COUNT is a common aggregate:
SELECT B_Q_ID,B_V_ID,COUNT(*)C
FROM UTB
GROUP BY B_Q_ID,B_V_ID
Result
B_Q_ID B_V_ID C
-------------------
1 1 2
2 2 1
3 3 1
1 4 1
2 5 1
3 6 1
1 7 1
2 8 1
3 9 1
But in the next query, COUNT is analytic function:
SELECT B_DATETIME, B_Q_ID, B_V_ID,
COUNT(*)OVER(PARTITION BY B_Q_ID,B_V_ID) C
FROM UTB
Result
B_DATETIME B_Q_ID B_V_ID C
-----------------------------------------
2003-01-01 01:12:01.000 1 1 2
2003-06-23 01:12:02.000 1 1 2
2003-01-01 01:12:05.000 1 4 1
2003-01-01 01:12:08.000 1 7 1
2003-01-01 01:12:03.000 2 2 1
2003-01-01 01:12:06.000 2 5 1
2003-01-01 01:12:09.000 2 8 1
2003-01-01 01:12:04.000 3 3 1
2003-01-01 01:12:07.000 3 6 1
2003-01-01 01:12:10.000 3 9 1
We can see that the server does not warn on the column B_DATETIME in the sample,
in spite of absence of this column from the aggregate function or from grouping.
At least one of these conditions has to be met for “usual” aggregates - otherwise
ambiguity appears. But this limitation is not applied to analytic aggregates because
the level of detail is not reduced. We can see this in the second example – the
result of the aggregate function is repeated for every record in the group.
Syntax is simple. The following construction is used after the function
OVER ( [ PARTITION BY value_expression , ... [ n ] ])
- where [n] is a list of fields in group without aliases or expressions. This is
the way to organize a “window” for analytic function.
All records grouped by columns from list [n] are included in the “window”. This
grouping does the same as GROUP BY operator, but with two distinctions.
Firstly, this grouping is applied to organized selection; secondly, it is applied
only to the aggregate before construction OVER(…), not to all columns.
If we want to use two analytic functions, we must use two constructions OVER(…)
separately for each function.
In the strict sense, the result of a query with an analytic function equals to the
result of an “old-fashioned” query:
SELECT A.B_DATETIME,A.B_Q_ID,A.B_V_ID,C
FROM UTB A
JOIN
(SELECT B_Q_ID,B_V_ID,COUNT(*) C
FROM UTB
GROUP BY B_Q_ID,B_V_ID) B
ON A.B_Q_ID=B.B_Q_ID AND A.B_V_ID=B.B_V_ID
Ranking functions
Ranking functions are introduced with analytic queries besides of usual aggregates.
These functions return the rank for every record in the “window”. Rank is the number
corresponding to the position or “weight” of the record relative to other records
from the same “window”. The “window” is organized in the same way as the one used
for aggregate functions. But we must specify the sequence of records in the “window”
by using the construction ORDER BY.
Depending on the function we use, some records may have the same rank.
Ranking functions are not determinate.
There are 4 ranking functions:
ROW_NUMBER()
This function enumerates records according to specified order in the “window”. But
if we omit the section PARTITION BY in construction OVER, we would enumerate all
records in specified order. The enumeration always starts from 1.
RANK()
This function is used for ranking records in the “window”, and if a column for grouping is not specified the entire sample is recognized as a “window”. This is the same enumeration that ROW_NUMBER() produces and it starts from 1. Identical records get identical numbers, and the next different record gets the same number as if ROW_NUMBER() has been used and all previous records have got their own unique numbers. So a gap is produced, which is equal to the quantity of identical records minus 1.
DENSE_RANK()
This function produces “thick” ranking, in other words it does the same without the gap in enumeration.
NTILE()
This function divides records in the “window” into specified quantity of groups.
It returns the number of the group for every record that belongs to this group.
Group enumeration starts from 1.
If quantity of records is not divisible by quantity of groups, two group types are
produced. First group type has the quantity of records that differs from the quantity
of records for second group type by 1.
The following query can be executed to demonstrate ranking functions’ differences:
SELECT B_Q_ID, B_V_ID,
ROW_NUMBER() OVER(PARTITION BY B_Q_ID ORDER
BY B_V_ID DESC) N_Row,
RANK() OVER(PARTITION BY B_Q_ID ORDER
BY B_V_ID DESC)RANK,
DENSE_RANK() OVER(PARTITION BY B_Q_ID ORDER
BY B_V_ID DESC)DENSE_RANK,
NTILE(2) OVER(PARTITION BY B_Q_ID ORDER BY B_V_ID DESC)NTILE
FROM UTB WHERE B_Q_ID = 4
Ðåçóëüòàò
B_Q_ID B_V_ID N_Row RANK DENSE_RANK NTILE
----------- ------- ------ ----- ----------- ------
4 37 1 1 1 1
4 37 2 1 1 1
4 10 3 3 2 2
Main page