MS SQL 2005: window functions
Ivan Bodjagin (Merle) (Source)
The summary by Karasyova
N.V. with examples from training databases.
The main difference between analytic and scalar functions is that analytic function
deals with prepared sample.
First the sample is produced, all unions, conditions WHERE, GROUP BY, HAVING are
executed - all, except ordering, and only afterward the analytic function is used
for the practically ready data set. That is why analytic functions may be used in
query selection list or in ordering conditions only.
Generally, the work principle of analytic functions can be presented in the following
way. Let us suppose we have the resulting data set derived by the way described
above Ц all instructions are executed except ordering. So-called УwindowФ is applied
onto each record of this set.
УWindowФ size and position are determined in accordance with some analytic condition
(this, in fact, explains the name of these functions Ц window functions).
Some other records from the same data set are included into this virtual window.
The УwindowФ may be formed in such way that it will include not necessarily the
adjacent records, but practically any records of the data set.
And the current record, which was used to form the УwindowФ, may not even be included.
When the УwindowФ has been formed, analytic function calculates an aggregated value
based on records in the УwindowФ, and then goes to the next record. New УwindowФ
is formed for this record; new value is calculated, and so on for every record of
data set. Size and position of the window may change from record to record.
In the strict sense, practically everything that can be produced by using analytic
function can be produced using common T-SQL, but it will be less convenient and
often not as efficient.
There are two types of analytical functions in MS SQL Server 2005 Цaggregate and
Since an analytical function returns aggregate result of handling data, common aggregate
function may act as analytical.
The difference is that ordinary aggregates reduce detailed elaboration, while in
the case of analytical functions the level of detail is not reduced. This also relates
to other types of analytic functions. LetТs compare the results of two queries (УPaintingФ
In the first query, COUNT is a common aggregate:
GROUP BY B_Q_ID,B_V_ID
B_Q_ID B_V_ID C
1 1 2
2 2 1
3 3 1
1 4 1
2 5 1
3 6 1
1 7 1
2 8 1
3 9 1
But in the next query, COUNT is analytic function:
SELECT B_DATETIME, B_Q_ID, B_V_ID,
COUNT(*)OVER(PARTITION BY B_Q_ID,B_V_ID) C
B_DATETIME B_Q_ID B_V_ID C
2003-01-01 01:12:01.000 1 1 2
2003-06-23 01:12:02.000 1 1 2
2003-01-01 01:12:05.000 1 4 1
2003-01-01 01:12:08.000 1 7 1
2003-01-01 01:12:03.000 2 2 1
2003-01-01 01:12:06.000 2 5 1
2003-01-01 01:12:09.000 2 8 1
2003-01-01 01:12:04.000 3 3 1
2003-01-01 01:12:07.000 3 6 1
2003-01-01 01:12:10.000 3 9 1
We can see that the server does not warn on the column B_DATETIME in the sample,
in spite of absence of this column from the aggregate function or from grouping.
At least one of these conditions has to be met for УusualФ aggregates - otherwise
ambiguity appears. But this limitation is not applied to analytic aggregates because
the level of detail is not reduced. We can see this in the second example Ц the
result of the aggregate function is repeated for every record in the group.
Syntax is simple. The following construction is used after the function
OVER ( [ PARTITION BY value_expression , ... [ n ] ])
- where [n] is a list of fields in group without aliases or expressions. This is
the way to organize a УwindowФ for analytic function.
All records grouped by columns from list [n] are included in the УwindowФ. This
grouping does the same as GROUP BY operator, but with two distinctions.
Firstly, this grouping is applied to organized selection; secondly, it is applied
only to the aggregate before construction OVER(Е), not to all columns.
If we want to use two analytic functions, we must use two constructions OVER(Е)
separately for each function.
In the strict sense, the result of a query with an analytic function equals to the
result of an Уold-fashionedФ query:
FROM UTB A
(SELECT B_Q_ID,B_V_ID,COUNT(*) C
GROUP BY B_Q_ID,B_V_ID) B
ON A.B_Q_ID=B.B_Q_ID AND A.B_V_ID=B.B_V_ID
Ranking functions are introduced with analytic queries besides of usual aggregates.
These functions return the rank for every record in the УwindowФ. Rank is the number
corresponding to the position or УweightФ of the record relative to other records
from the same УwindowФ. The УwindowФ is organized in the same way as the one used
for aggregate functions. But we must specify the sequence of records in the УwindowФ
by using the construction ORDER BY.
Depending on the function we use, some records may have the same rank.
Ranking functions are not determinate.
There are 4 ranking functions:
This function enumerates records according to specified order in the УwindowФ. But
if we omit the section PARTITION BY in construction OVER, we would enumerate all
records in specified order. The enumeration always starts from 1.
This function is used for ranking records in the УwindowФ, and if a column for grouping is not specified the entire sample is recognized as a УwindowФ. This is the same enumeration that ROW_NUMBER() produces and it starts from 1. Identical records get identical numbers, and the next different record gets the same number as if ROW_NUMBER() has been used and all previous records have got their own unique numbers. So a gap is produced, which is equal to the quantity of identical records minus 1.
This function produces УthickФ ranking, in other words it does the same without the gap in enumeration.
This function divides records in the УwindowФ into specified quantity of groups.
It returns the number of the group for every record that belongs to this group.
Group enumeration starts from 1.
If quantity of records is not divisible by quantity of groups, two group types are
produced. First group type has the quantity of records that differs from the quantity
of records for second group type by 1.
The following query can be executed to demonstrate ranking functionsТ differences:
SELECT B_Q_ID, B_V_ID,
ROW_NUMBER() OVER(PARTITION BY B_Q_ID ORDER
BY B_V_ID DESC) N_Row,
RANK() OVER(PARTITION BY B_Q_ID ORDER
BY B_V_ID DESC)RANK,
DENSE_RANK() OVER(PARTITION BY B_Q_ID ORDER
BY B_V_ID DESC)DENSE_RANK,
NTILE(2) OVER(PARTITION BY B_Q_ID ORDER BY B_V_ID DESC)NTILE
FROM UTB WHERE B_Q_ID = 4
B_Q_ID B_V_ID N_Row RANK DENSE_RANK NTILE
----------- ------- ------ ----- ----------- ------
4 37 1 1 1 1
4 37 2 1 1 1
4 10 3 3 2 2