Let's learn Percentile_Disc Function of SQL Server 2012 (Denali)

Niladri.Biswas
Posted by in Sql Server category on for Beginner level | Points: 250 | Views : 3919 red flag

In this article we will learn Percentile_Disc Function of SQL Server 2012 (Denali)

Introduction

Sql Server 2012 has brought a lot of new functions for the T-Sql developers. In this article we will look into Percentile_Disc Function.

Purpose

An inverse distribution function which takes a percentile value and a sort specification and returns an element from the set.It accepts any numeric data-type or any non-numeric data-type that can be implicitly converted to a numeric data-type as it's argument and returns the same data-type as the numeric data-type of the argument.

Syntax

Percentile_Disc ( numeric_literal ) within group(Order by sort_expression [asc|desc]) OVER ([partition_by_clause])

Logic behind Percentile_Disc

Well this function can be defined as under

Percentile_Disc = Cumulative_Distribution  * N 

Where,

N = Total number of rows or records.

e.g. We have a set with 10 elements. We need to find the Percentile_Disc of the 7th row.Let us first find the Cumulative_distribution as

So, Cumulative_Distribution = (7)/(10) = 0.7

So, Percentile_Disc = Cumulative_Distribution * N = 0.7 * 10 = 7

The range of values return by this function is between 0 to 1 inclusive. The return data type is always a positive number.

A case study

Let us first create the environment

 IF EXISTS (SELECT * FROM sys.objects WHERE name = N'MatchTable' AND type = 'U')
    DROP TABLE MatchTable
GO
SET ANSI_NULLS ON
GO
--Create the table
CREATE TABLE [dbo].[MatchTable](
	[MatchID] [int] IDENTITY(1,1) NOT NULL,
	[MatchGroup] [varchar](8) NULL,
	[MatchBetween] [varchar](50) NULL,
	[ScheduleDate] [date] NULL
) ON [PRIMARY]
GO

--Insert records
Insert Into MatchTable Values
('Group-A','India VS Australia','08/14/2011')
,('Group-A','India VS Pakistan','08/15/2011')
,('Group-A','India VS Newzealand','08/16/2011')
,('Group-A','Australia VS Pakistan','08/17/2011')
,('Group-A','Australia VS Newzealand','08/18/2011')
,('Group-A','Newzealand VS Pakistan','08/19/2011')
,('Group-B','USA VS WestIndies','08/20/2011')
,('Group-B','USA VS Ireland','08/21/2011')
,('Group-B','USA VS Bangaladesh','08/22/2011')
,('Group-B','WestIndies VS Ireland','08/23/2011')
,('Group-B','WestIndies VS Bangaladesh','08/24/2011')
,('Group-B','Ireland VS Bangaladesh','08/25/2011')

-- Project the records
Select * From MatchTable

Now let us run the below query

SELECT [MatchID]
      ,[MatchGroup]
      ,[MatchBetween]
      ,[ScheduleDate]
      ,PercentileDisc =PERCENTILE_DISC(.6) WITHIN GROUP (ORDER BY MatchID)OVER(PARTITION BY [MatchGroup])
      ,Cume_Dist() Over(Partition By [MatchGroup] Order By MatchID) As CumilativeDistribution
     ,PercentileDiscByFormula = Cume_Dist() Over(Partition By [MatchGroup] Order By MatchID) * (6)	 
  FROM MatchTable
 

Now let us analyze the 4th Record i.e. Match Id = 4. We have a total of 6 records in Group-A hance N = 6. And the cumulative distribution was 0.666666666666667.

Now if we put these values into the formula, we get

Percentile_Disc = 0.666666666666667 * 6 = 4.000000000000002 ~ = 4

How to implement the same in SQL SERVER 2005/2008?

Here is an attempt to simulate Percentile_Disc function in SQL Server 2005/2008

;With Count_CTE AS
(
	Select [MatchGroup] ,N = COUNT([MatchGroup])	
	From MatchTable
	Group By [MatchGroup]
)
,Rank_CTE AS
(
	Select 		
		[MatchID]
		,[MatchGroup]
		,[MatchBetween]
		,[ScheduleDate]
		,R = Rank() Over(Partition By [MatchGroup] Order by MatchID)		
	From MatchTable
)
Select r.[MatchID]
		,r.[MatchGroup]
		,r.[MatchBetween]
		,r.[ScheduleDate]
		,CumilativeDistribution = CAST((r.R)  AS DECIMAL(10,2)) / CAST((c.N ) AS DECIMAL(10,2))
		,TotalRecord = c.N
		,PercentileDisc = Round((CAST((r.R)  AS DECIMAL(10,2)) / CAST((c.N ) AS DECIMAL(10,2))) * c.N,0)
		
From Rank_CTE r
Join Count_CTE c On r.[MatchGroup] = c.[MatchGroup]

Result

First we are getting the Cumulative Distribution and Total Record Count by Match and then putting that into the formula, we are getting the result

You may need to refer to my Cumulative Distribution article for getting an understanding as what it is.

Conclusion

SQL Server 2012 (Denali) seems to be very mature and promising and has embedded with many new functions.In this article we have looked into the Percentile_Disc function, it's internal formula and how we can do the same in lower version.Hope the article will be useful.

Page copy protected against web site content infringement by Copyscape

About the Author

Niladri.Biswas
Full Name: Niladri Biswas
Member Level: Platinum
Member Status: Member
Member Since: 10/25/2010 11:04:24 AM
Country: India
Best Regards, Niladri Biswas
http://www.dotnetfunda.com
Technical Lead at HCL Technologies

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)