Why Clustered Index (Primary Key) key Should be as Narrow as possible

Neerajprasadsharma
Posted by in Sql Server category on for Intermediate level | Points: 250 | Views : 698 red flag

In this article we will learn why a Clustered Index should be as narrow as possible.


 Download source code for Why Clustered Index (Primary Key) key Should be as Narrow as possible

Introduction

Narrow Clustered keys are preferred because of two main reasons, first Clustered key always goes to Non-Clustered Index leaf level. If the Clustered key is very wide than it will increase the page size of all the non-clustered indexes on that table, more page size means more IO and more CPU cycles. Second, Wide Clustered Index will lead more pages even in the clustered index, even clustered Index seeks has to travel using the intermediate pages to fetch the rows.

To demonstrate this with example, we will create two tables first would be narrow clustered index and second would be undesired  not narrow Clustered Index, we will look into the fact in details why primary key should be narrow. As this article is part of the series so the ideal clustered key is already created so in this part of an article we will create another table whose Clustered index won`t be narrow then  we will compare both the ideal and not narrow indexes and try to identify why people say clustered index (Clustered Primary Key) should be narrow.

This demonstration will be in practical not just theory.
Our both tables will have the same amount of data and index, we will have a just one difference in both table and that will be the definition of a primary key.

On the ideal table, we have a primary key on one integer type column and on  notnarrow table, we will have a primary key on Primarykey, Keycol, SearchCol and SomeData columns and we have made some changes in data types as well.
Please note we are using multiple column to make the index fat, but choosing the large data type would affect in the same fashion, since it is just a small table with 500000 rows, with large and real data problem increases.


Demo Script Below:
------------------------------------CREATE DUMMY TABLE


CREATE TABLE [DBO].[NotNarrowTable] (
Primarykey BIGINT NOT NULL , 
Keycol VARCHAR(50) NOT NULL
,SearchCol INT not  null
,SomeData char(20)not  null ,

Keycol2 VARCHAR(50) NOT NULL
,SearchCol2 INT
,SomeData2 char(20),


Keycol3 VARCHAR(50) NOT NULL
,SearchCol3 INT
,SomeData3 char(20) )



------------------------------------CREATE DUMMY TABLE


------------------------------------INSERT DATA IN DUMMY TABLE

INSERT INTO [DBO].[NotNarrowTable]
SELECT 
n ,

n,
n
,'Some text..'
,
n,
n
,'Some text..'

,

n,
n
,'Some text..'

FROM Numbers

------------------------------------INSERT DATA IN DUMMY TABLE
------------------------------------ADD PRIMARY KEY IN DUMMY TABLE

ALTER TABLE DBO.NotNarrowTable
ADD CONSTRAINT PK_NotNarrowTable PRIMARY KEY (Primarykey,Keycol,SearchCol,SomeData)
------------------------------------ADD PRIMARY KEY IN DUMMY TABLE





Create nonclustered index NC_NotNarrow_Keycol on NotNarrowTable(Keycol,SearchCol,SearchCol2)

Create nonclustered index NC_NotNarrow_SearchCol on NotNarrowTable(SearchCol,Keycol,SomeData)

Create nonclustered index NC_NotNarrow_SomeData on NotNarrowTable(SomeData,Keycol,SearchCol)

Create nonclustered index NC_NotNarrow_Keycol2 on NotNarrowTable(Keycol2,SearchCol,SearchCol2)

Create nonclustered index NC_NotNarrow_SearchCol2 on NotNarrowTable(SearchCol2,Keycol,SomeData)

Create nonclustered index NC_NotNarrow_SomeData2 on NotNarrowTable(SomeData2,Keycol,SearchCol)

To demonstrate this example more clear we will create one additional non clustered index on SearchCol3 column, so let us create that extra column and see the indexes for both tables.

Create Nonclustered index NC_IdealCI_SearchCol3 on IdealCI(SearchCol3);
Create Nonclustered index NC_NotNarrow_SearchCol3 on NotNarrowTable(SearchCol3);
Indexes are created on both tables, column SearchCol3.  Now let us check the number of pages in both tables using sys.dm_db_index_physical_stats  DMV.

SELECT SUM (Page_Count *8 ) Space_Used_KB

FROM sys.dm_db_index_physical_stats 
(db_id('TutorialSQLServer'), object_id ('NotNarrowTable' ), null ,null,'DETAILED')


SELECT SUM (Page_Count *8 ) Space_Used_KB

FROM sys.dm_db_index_physical_stats 
(db_id('TutorialSQLServer'), object_id ('IdealCI' ), null ,null,'DETAILED')




Look at the size of the tables used by both the table, 233928 KB space used by NotNarrow table and  177152 KB space taken by Ideal table.
 Everything is similar in the tables, number of records, number of indexes in the table and indexes definition, there is just a one difference that both tables have different clustered index definition and that has cost us 66776 KB extra more data pages.

Now let us see the physical stats for newly create indexes of both tables.


SELECT  MAX(Name) Name ,  MAX (s.index_id) Index_ID ,SUM (Page_Count *8 ) Space_Used_KB from sys.indexes s


Cross Apply

(
SELECT 
OBJECT_ID,
OBJECT_NAME(OBJECT_ID) Table_Name, index_id,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation ,avg_page_space_used_in_percent Page_Density ,page_count,Record_count

FROM sys.dm_db_index_physical_stats 
(db_id('TutorialSQLServer'), object_id ('NotNarrowTable' ), null ,null,'DETAILED')
) ps
where 
s.object_id= ps.object_id and 
s.index_id = ps.index_id and
s.Name = 'NC_NotNarrow_SearchCol3'



SELECT MAX(Name) Name ,  MAX (s.index_id) Index_ID ,SUM (Page_Count *8 ) Space_Used_KB from sys.indexes s

Cross Apply

(
SELECT 
OBJECT_ID,
OBJECT_NAME(OBJECT_ID) Table_Name, index_id,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation ,avg_page_space_used_in_percent Page_Density ,page_count,Record_count

FROM sys.dm_db_index_physical_stats 
(db_id('TutorialSQLServer'), object_id ('IdealCI' ), null ,null,'DETAILED')
) ps
where 
s.object_id= ps.object_id and 
s.index_id = ps.index_id and
s.Name = 'NC_IdealCI_SearchCol3'




The NotNarrow table`s NC_NotNarrow_SearchCol3 index required 25904 KB of data to build the index, whereas  table IdealCI`s index NC_IdealCI_SearchCol3 required 6960 KB of data, as said above more pages = more space=negative impact on performance.
As we don`t believe in only theory we will look into the index using DBCC IND and DBCC  PAGE command.
Since index id is 8 for both newly created indexes in two tables, so let us look inside the both indexes using DBCC IND and DBCC PAGE command if you new to both you can look the last part in the series.

DBCC IND ('TutorialSQLServer', 'NotNarrowTable', 8);



To understand and interpret the result better we have copied only required columns and changed the order of columns and pasted it into an excel file, if you are using SQL Server
2012 and above, you can use this dmv, Now let us look into the first leaf level page in the index and remember first pages do not have any previous page in the index.
Now let us find the first page and see inside the first page of the index.
Image

DBCC PAGE('TutorialSQLServer',1,145808,3) WITH TABLERESULTS

Look at the result, we have created an index on just one search column, but there are four extra columns and if you look closely these four columns are part of the primary key (Unique Clustered Index) column, remember why non-clustered index has to be having all the primary key columns, just to maintain the link with the clustered index and because our primary key is not narrow that has made the non clustered index fat and that is the reason why NotNarrow table`s  non cluster index has more pages that IdealCI index.



How much this decision cost us?

In this section we will use the execution plan to see, what the Query Processor thinks about the costs of this decision, choosing the not narrow clustered index cost us, In the query we will simply count the number of rows in the both tables the look into the execution plan:

SET STATISTICS IO ON

SELECT COUNT(*) FROM NotNarrowTable WITH (INDEX (8)) 

SELECT COUNT(*) FROM IDEALCI WITH(INDEX (8))



Both plan shapes are identical, but as per estimation collected by the Query Processor, execution plan with notnarrow table cost more than double compare to dbo.ideal table and if you look closely into the cost related results, you will find everything is same in the both execution plans except the IO  cost.
On the searchcol3 index it is showing 2.38238 units and on the bottom execution plan its 0.643866 and that cost has changed the overall cost of the plan.
However, these costs are just a tool of measurement of execution plan quality,
and once on a specific machine use to be the number of seconds on a specific machine you can look at here for more details.
Now let us see the logical reads for both tables.





You can see in the in the above image NotNarrowTable table required 3239 logical reads to satisfy the query, whereas IdealCI required only 871 data pages for the same about of data.

If you have any question about this topic yo can ask in the comment section.
Non clustered indexes are a bit different than clustered index and if you want to learn more, here are two link,link you should visit written by Kalen Delaney.

 Closing lines and Summary
In the above example we have exaggerated by using a composite column primary key, to make primary key fat, choosing wrong data type can affect not dramatic as above, but it results more pages in the index. 

We have seen above how bad it can be to make a clustered index not narrow, but it is not a hard and fast rule. 
Yes, most of the time it's true, but it depend and it should be tested because your database system can be different than mine, your table structure and business needs can be different than mine, your purpose can be different than mine so always test yourself and verify. But yes, this is general recommendation.
. I am attaching the less exaggerated test int vs bigint as primary key you can download it and test and verify the effect is more and less are same as stated above.

Question For Readers:

You might have known that clustered index should be unique if it won't, storage engine will make it unique for you right? But what about non clustered index is it true for non clustered index as well?


Quote
Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking. Don't settle. As with all matters of the heart, you'll know when you find it. -Steve Jobs






Page copy protected against web site content infringement by Copyscape

About the Author

Neerajprasadsharma
Full Name: NEERAJ Sharma
Member Level: Bronze
Member Status: Member
Member Since: 5/13/2016 8:42:37 AM
Country: India
I write technical articles mainly on SQL Server Query Optimizer for more information look at my BIO.

Neeraj Prasad Sharma is a SQL Server developer who started his work as a dot net programmer. He loves SQL Server query optimizer`s capability to process the queries optimally. For the last six years he has been experimenting and testing Query Optimizer default behaviour and if something goes wrong his goal is to identify the reason behind it and fix it. I write technical article here: https://www.sqlshack.com/author/neeraj/ https://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=12524731 https://www.mssqltips.com/sqlserverauthor/243/neeraj-prasad-sharma-/

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)