In this tutorial, we will see as how to use Row_Number function with lots of practicals.
Table of Content
- Introduction
- Objective
- Daily Row_Number usage
- Generate sequential numbers on the fly
- Helps in Pagination
- Helps to find duplicates
- Helps to find out rows from a selected range
- Helps to generate number table on the fly with recursive CTE
- An alternate for TOP clause
- An alternate for Distinct clause
- Helps to sort record and insert into view from the source
- Helps to find appropriate record when use in conjunction with Aggregate function treated as Analytical function
- Helps to find the last inserted row
- References
- Conclusion
Though too late in writing an article on this topic, still I feel that it's worth to have so.Because in our day to day activities, we use this function so much
that it indeed needs some special attention and in this article we will look into as how in our regular programing area the Row_Number function, which got introduce in Sql Server 2005
has make a permanent place in helping the developers to write queries in an easy and comfortable way that seemed to be a little harder when we had to simulate the
same in lower versions.
In this tutorial we will look into the day to day usages (at least the one which I encounter so far....It can be even more so suggestions are always welcome) of Row_Number function with examples.This article will not measure the performance of the two approaches but will show at least one example of doing the same in Sql Server 2000 and how easy it is to do the same using Row_Number function of Sql Server 2005.
In the following paras, we will look into the facts as where Row_Number finds it's usage
Suppose we have a table as under
Declare @t Table(Name Varchar(20),Age int)
Insert Into @t
Select 'Name1', 20 Union All Select 'Name2', 21 Union All
Select 'Name3', 22 Union All Select 'Name4', 23 Union All
Select 'Name5', 24
Select *
From @t
/* Result */
Name Age
Name1 20
Name2 21
Name3 22
Name4 23
Name5 24
Objective: To generate sequential numbers on the fly
Sql Server 2000 approach:
Case 1: Using Co-Related Subquery and an Aggregate Function
Select
(
Select Count(*)
From @t t2
Where t2.Name <= t1.Name
And t2.Age <= t1.Age
) As Rn
,t1.Name
,t1.Age
From @t t1
OR
Select
(
Select Max(1)
From @t t2
Where t2.Name <= t1.Name
And t2.Age <= t1.Age
) As Rn
,t1.Name
,t1.Age
From @t t1
OR
Select
(
Select Sum(1)
From @t t2
Where t2.Name <= t1.Name
And t2.Age <= t1.Age
) As Rn
,t1.Name
,t1.Age
From @t t1
/* Result */
Rn Name Age
1 Name1 20
2 Name2 21
3 Name3 22
4 Name4 23
5 Name5 24
Case 2: Using identity column
Declare @t Table(Rn Int Identity(1,1), Name Varchar(20),Age int)
Insert Into @t
Select 'Name1', 20 Union All Select 'Name2', 21 Union All
Select 'Name3', 22 Union All Select 'Name4', 23 Union All
Select 'Name5', 24
Select *
From @t
/* Result */
Rn Name Age
1 Name1 20
2 Name2 21
3 Name3 22
4 Name4 23
5 Name5 24
Case 3: Using inner join
Select
Rn = COUNT(*)
,t1.Name
,t1.Age
From @t t1
Join @t t2 On t1.Name >= t2.Name
Group By t1.Name,t1.Age
Order By t1.Name,t1.Age
/* Result */
Rn Name Age
1 Name1 20
2 Name2 21
3 Name3 22
4 Name4 23
5 Name5 24
Sql Server 2005 Row_Number approach
Select Rn =Row_Number() Over(Order by (Select 1)),t.*
From @t t
/* Result */
Rn Name Age
1 Name1 20
2 Name2 21
3 Name3 22
4 Name4 23
5 Name5 24
Paging is a very common implementation in most of the applications for displaying the records. Now this can be done either at the client side application or at the server side application. But doing so in the client side will increase the load on the client application as fetching the whole records set and keeping them into the memory, then choosing the records within the range will give a serious performance impact. On the other hand, if it can be done at the database side, then the client application will get only those records in which they will be interested in at that point of time and hence the client application’s performance will boost.
Let us first set up the environment for performing the demonstration.We will populate 100 records to a tblPagingExperiment table
--Drop the table tblPagingExperiment if it exists
IF OBJECT_ID('tblPagingExperiment','U') IS NOT NULL BEGIN
DROP TABLE tblPagingExperiment
END
GO
-- Create the table
Create Table tblPagingExperiment (
[Person ID] Int Identity
,[Person Name] Varchar(100)
,Age Int
,DOB Datetime
,Address Varchar(100)
)
GO
-- Populate 100 data to the table
Insert into tblPagingExperiment
Select
'Person Name' + CAST(Number AS VARCHAR)
, Number
,DATEADD(D,Number, '1900-01-01')
,'Address' + CAST(Number AS VARCHAR)
From master..spt_values
Where Type = 'p'
And Number Between 1 and 100
-- Project the records
Select *
From tblPagingExperiment
Objective: To skip first 10 rows and display next 20 rows.i.e. from row number 11 to 30
Paging using Sql Server 2000 approach
-- Variable to hold the offset value
Declare @RowSkip As int
-- Variable to hold the fetch value
Declare @RowFetch As int
--Set the value of rows to skip
Set @RowSkip = 10
--Set the value of rows to fetch
Set @RowFetch = 20
--If the #Temp object exists in the tempdb, then drop it
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
BEGIN
Drop Table #Temp
END
--Create a temporary table
Create Table #Temp
(
Rn int Identity
,[Person ID] int
,[Person Name] Varchar(50)
,Age int
,DOB datetime
,Address Varchar(100)
)
-- Insert the records into the Temporary table
Insert Into #Temp([Person ID],[Person Name],Age,DOB,Address)
Select [Person ID],[Person Name],Age,DOB,Address
From tblPagingExperiment
-- Fetch the records from 11 to 30
Select
[Person ID]
,[Person Name]
,Age
,DOB
,Address
From #Temp
Where Rn Between (@RowSkip+1) And (@RowSkip+ @RowFetch)
Paging using Sql Server 2005 Row_Number approach
-- Variable to hold the offset value
Declare @RowSkip As int
-- Variable to hold the fetch value
Declare @RowFetch As int
--Set the value of rows to skip
Set @RowSkip = 10
--Set the value of rows to fetch
Set @RowFetch = 20
;With Cte As
(
Select
rn=ROW_NUMBER()
Over(Order by (Select 1) /* generating some dummy column*/ )
,*
From tblPagingExperiment
)
-- Fetch the records from 11 to 30
Select
[Person ID]
,[Person Name]
,Age
,DOB
,Address
From Cte
Where rn Between (@RowSkip+1)And (@RowSkip+ @RowFetch)
/* Result */
Person ID Person Name Age DOB Address
11 Person Name11 11 1900-01-12 00:00:00.000 Address11
12 Person Name12 12 1900-01-13 00:00:00.000 Address12
13 Person Name13 13 1900-01-14 00:00:00.000 Address13
14 Person Name14 14 1900-01-15 00:00:00.000 Address14
15 Person Name15 15 1900-01-16 00:00:00.000 Address15
16 Person Name16 16 1900-01-17 00:00:00.000 Address16
17 Person Name17 17 1900-01-18 00:00:00.000 Address17
18 Person Name18 18 1900-01-19 00:00:00.000 Address18
19 Person Name19 19 1900-01-20 00:00:00.000 Address19
20 Person Name20 20 1900-01-21 00:00:00.000 Address20
21 Person Name21 21 1900-01-22 00:00:00.000 Address21
22 Person Name22 22 1900-01-23 00:00:00.000 Address22
23 Person Name23 23 1900-01-24 00:00:00.000 Address23
24 Person Name24 24 1900-01-25 00:00:00.000 Address24
25 Person Name25 25 1900-01-26 00:00:00.000 Address25
26 Person Name26 26 1900-01-27 00:00:00.000 Address26
27 Person Name27 27 1900-01-28 00:00:00.000 Address27
28 Person Name28 28 1900-01-29 00:00:00.000 Address28
29 Person Name29 29 1900-01-30 00:00:00.000 Address29
30 Person Name30 30 1900-01-31 00:00:00.000 Address30
Consider the below table
Declare @t Table(name varchar(20), email varchar(20))
insert into @t values ('name1', '1stEmail@email.com')
insert into @t values ('name1', '2ndEmail@email.com')
insert into @t values ('name1', '3rdEmail@email.com')
insert into @t values ('name2', '1stEmail@email.com')
insert into @t values ('name3', '1stEmail@email.com')
insert into @t values ('name3', '2ndEmail@email.com')
insert into @t values ('name4', '1stEmail@email.com')
Select *
From @t
/* Result */
name email
name1 1stEmail@email.com
name1 2ndEmail@email.com
name1 3rdEmail@email.com
name2 1stEmail@email.com
name3 1stEmail@email.com
name3 2ndEmail@email.com
name4 1stEmail@email.com
Sql server 2000 approach to find the duplicate email entries
SELECT name, EmailCount = count(email)
FROM @t
GROUP BY name
HAVING count(*) > 1
/* Result */
name EmailCount
name1 3
name3 2
Sql server 2005 Row_Number approach to find the duplicate email entries i.e. Persons having multiple emails
Select distinct name,count(email)+1
From
(
Select
Rn = Row_Number() Over(Partition By name Order By (Select 1))
,name,email
From @t
)x
Where Rn > 1
Group By name
/* Result */
name EmailCount
name1 3
name3 2
We are using the partition by clause to perform the partition which will yield the below
Rn name email
1 name1 1stEmail@email.com
2 name1 2ndEmail@email.com
3 name1 3rdEmail@email.com
1 name2 1stEmail@email.com
1 name3 1stEmail@email.com
2 name3 2ndEmail@email.com
1 name4 1stEmail@email.com
From the result generated we can make out that, Rn with value > 1 has multiple entries.
Suppose we have a table as under
Declare @t Table(Name Varchar(20),Age int)
Insert Into @t
Select 'Name1', 20 Union All Select 'Name2', 21 Union All
Select 'Name3', 22 Union All Select 'Name4', 23 Union All
Select 'Name5', 24
Select *
From @t
/* Result */
Name Age
Name1 20
Name2 21
Name3 22
Name4 23
Name5 24
Objective: To select the third and fourth row
Sql Server 2000 approach:
Select Name,Age
From(
Select
(
Select Count(*)
From @t t2
Where t2.Name <= t1.Name
And t2.Age <= t1.Age
) As Rn
,t1.Name
,t1.Age
From @t t1
)x
Where x.Rn Between 3 And 4
/* Result */
Name Age
Name3 22
Name4 23
Sql Server 2005 Row_Number approach:
Select Name,Age
From(
Select Rn = Row_Number() Over(Order By (select 1)),*
From @t
)x
Where x.Rn Between 3 And 4
First we are generating the sequence in the inner query and then filtering them out in the outer query
Sometimes we need to generate a Number table or Tally table. The importance of that can be found here
In Sql Server 2000, we use to do so by using While loop
--If the #Temp object exists in the tempdb, then drop it
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
BEGIN
Drop Table #Temp
END
--Create a temporary table
Create Table #Temp
(
Number int
)
Declare @i int,@maxLimit int
Set @i = 1
Set @maxLimit = 100
-- Polutae the table using while loop
While( @i <= @maxLimit)
Begin
Insert Into #Temp(Number) Values(@i)
Set @i = @i + 1
End
Select *
From #Temp
Sql Server 2005 Row_Number approach with Recursive CTE to generate Tally/Number table:
;With NumTable As
(
Select Rn = Row_Number() Over ( Order By (Select 1))
Union All
Select Rn + 1
From NumTable Where Rn < 100
)
Select *
From NumTable
To get the Top N records, we use the TOP fucntion for achieving so till Sql Server 2000
Declare @t Table(Name Varchar(20),Age int)
Insert Into @t
Select 'Name1', 20 Union All Select 'Name2', 21 Union All
Select 'Name3', 22 Union All Select 'Name4', 23 Union All
Select 'Name5', 24
Select Top 2 *
From @t
/* Result */
Name Age
Name1 20
Name2 21
Fetches the Top 2 records
We can use Row_Numnber as an alternative way to of TOP
Select Name,Age
From(
Select Rn = Row_Number() Over(Order By (select 1)),*
From @t
)x
Where x.Rn <= 2
/* Result */
Name Age
Name1 20
Name2 21
Consider the below table
Declare @t Table(Name Varchar(20),Age int)
Insert Into @t
Select 'Name1', 20 Union All Select 'Name1', 20 Union All
Select 'Name2', 22 Union All Select 'Name2', 22 Union All
Select 'Name3', 22 Union All Select 'Name3', 23
Select *
From @t
/* Result */
Name Age
Name1 20
Name1 20
Name2 22
Name2 22
Name3 22
Name3 23
Sql server 2000 approach to get distinct records
Select
Distinct Name,Age
From @t
/* Result */
Name Age
Name1 20
Name1 21
Name2 22
Name3 22
Name3 23
Sql server 2005 Row_Number approach to simulate the Distinct clause
Select Name,Age
From
(Select
Rn = Row_Number() Over(Partition By Name,Age Order By (Select 1))
,Name,Age
From @t)x
Where Rn = 1
/* Result */
Name Age
Name1 20
Name1 21
Name2 22
Name3 22
Name3 23
We are using the partition by clause to perform the partition which will yield the below
Rn Name Age
1 Name1 20
1 Name1 21
1 Name2 22
2 Name2 22
1 Name3 22
1 Name3 23
From the result generated we can make out that, Rn with value 1 are the only unique records.
Suppose we want to insert some record into a View from a physical table after performing a sort operation as shown under
Create View TestView
As
Select Geo_Id, Description
From dbo.Geo
Order By Description
It throws error
Msg 1033, Level 15, State 1, Procedure TestView, Line 5
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
We can eliminate this by using a Row_Number funciton at this juncture
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[TestView]'))
DROP VIEW [dbo].[TestView]
GO
Create View TestView
As
Select Geo_Id, Description
From(
Select
Rn = Row_Number() Over(Order By Description)
,Geo_Id
, Description
From dbo.Geo
)x
GO
-- Project the records
Select *
From TestView
Suppose we have a requirement where we need to calculate teh points earned by an individual in a game as display the Name,TotalPointsEarned and EmailAddress as shown under
Declare @t Table(Name varchar(20), PointsEarned int,EmailAddress varchar(50))
insert into @t values ('name1', 30, '1stEmail@email.com')
insert into @t values ('name1', 10, '2ndEmail@email.com')
insert into @t values ('name1', 50, '3rdEmail@email.com')
insert into @t values ('name2',60, '1stEmail@email.com')
insert into @t values ('name3',70, '1stEmail@email.com')
insert into @t values ('name3',10, '2ndEmail@email.com')
insert into @t values ('name4',34, '1stEmail@email.com')
Select
Name
,TotalPointEarned = Sum(PointsEarned) Over(Partition By Name)
,EmailAddress
From @t
/* Result */
Name TotalPointEarned EmailAddress
name1 90 1stEmail@email.com
name1 90 2ndEmail@email.com
name1 90 3rdEmail@email.com
name2 60 1stEmail@email.com
name3 80 1stEmail@email.com
name3 80 2ndEmail@email.com
name4 34 1stEmail@email.com
As can be figure out that though the expected result is obtained, but the records are been repeated.Inorder to get the desired output, we need the help of Row_Number function as under
;With Cte As
(
Select
Rn = Row_Number() Over(Partition By Name Order By(Select 1))
,Name
,TotalPointEarned = Sum(PointsEarned) Over(Partition By Name)
,EmailAddress
From @t
)
Select
Name
,TotalPointEarned
,EmailAddress
From Cte
Where Rn = 1
/* Result */
Name TotalPointEarned EmailAddress
name1 90 1stEmail@email.com
name2 60 1stEmail@email.com
name3 80 1stEmail@email.com
name4 34 1stEmail@email.com
Inside the CTE we use the Row_Number function to get the records partitioned by Name where we are able to figure out the dulplicate entries(Rn > 1)
Rn Name TotalPointeEarned EmailAddress
1 name1 90 1stEmail@email.com
2 name1 90 2ndEmail@email.com
3 name1 90 3rdEmail@email.com
1 name2 60 1stEmail@email.com
1 name3 80 1stEmail@email.com
2 name3 80 2ndEmail@email.com
1 name4 34 1stEmail@email.com
We just filteredd out those records outside the Cte
N.B.~ We have seen that, only the first Emails has come in the output and leaving the others.Interesting readers will find the solution to do the same.Leaving it as an exercise.Hint: Email should be comma separated.
Select LastInsertedRow = Max(Rn)
From(
Select Rn = Row_Number()Over(Order By (Select 1))
From @t
)x
Other alternatives may be MAX,Count(*),Output.Inserted,Scope_Identity(), @@Identity etc.
ROW_NUMBER
So in this tutorial, we have seen atleast ten usages of Row_Number() function.Hope this will be helpful for many people.I am not claiming that there cannot be more applicability of this function.Astute readers please drop a line of comment of other usages for this function and I will include that in this list.
Thanks for reading