This article lays the foundations for my upcoming series on which we will learn why the clustered index should be Static, Unique, ever increasing and narrow. To understand and compare we are first creating an ideal primary key (Clustered Index) remember by default SQL Server creates clustered index on the primary key.
In this article we will create an ideal Clustered Index and we will look into the internals of indexes.
And trust me its easy topic.
Built the Table
In this first article, we are creating a table named with DBO.IdealCI, we will create a primary key, rest columns are random column, we will create some Non Clustered Index on those columns as well this is needed because it will be required to continue the series.
In the below script we are inserting 500000 rows in the DBO.
IdealCI table with the help of
number table. If you don`t have number table in your database, you can use
this code to populate the number table.
If you have no idea about number table, then I suggest read this link to understand how powerful the number tables are to perform some development work efficiently, if you want to learn more about number table you can visit
this link.
Script below for creating the sample table:
CREATE TABLE [DBO].[IdealCI] (
Primarykey int NOT NULL ,
Keycol VARCHAR(50) NOT NULL
,SearchCol INT
,SomeData char(20),
Keycol2 VARCHAR(50) NOT NULL
,SearchCol2 INT
,SomeData2 char(20),
Keycol3 VARCHAR(50) NOT NULL
,SearchCol3 INT
,SomeData3 char(20) )
INSERT INTO [DBO].[IdealCI]
SELECT
n ,
n,
n
,'Some text..'
,
n,
n
,'Some text..'
,
n,
n
,'Some text..'
FROM Numbers
ALTER TABLE DBO.[IdealCI]
ADD CONSTRAINT PK_IdealCI PRIMARY KEY (Primarykey)
Create nonclustered index NC_IdealCI_Keycol on [IdealCI](Keycol,SearchCol,SearchCol2)
Create nonclustered index NC_IdealCI_SearchCol on [IdealCI](SearchCol,Keycol,SomeData)
Create nonclustered index NC_IdealCI_SomeData on [IdealCI](SomeData,Keycol,SearchCol)
Create nonclustered index NC_IdealCI_Keycol2 on [IdealCI](Keycol2,SearchCol,SearchCol2)
Create nonclustered index NC_IdealCI_SearchCol2 on [IdealCI](SearchCol2,Keycol,SomeData)
Create nonclustered index NC_IdealCI_SomeData2 on [IdealCI](SomeData2,Keycol,SearchCol)
Fasten the seat belt, sit back and enjoy I am going to reveal the magic :)
Little Background On Indexes (Clustered and Non Clustered Only)
Indexes are stored in the B tree structure, in the SQL Server root page which is the entry point when queries read the data* (except the allocation order scan). This b tree structure helps in performing the read the data faster as it required to touch only those data pages which only contain the requested data. Look at the image below this is the logical representation on indexes:

Note: in the above image data pages are looking bigger than the index pages, but they are not both pages are actually 8 KB in size.
Surprised to see the NULLs in the indexes, well, that’s how SQL Server has implemented the B tree, Null value is actually the starting point of the non data pages (leaf level)
For example, if storage engine has to return the row against the key value 23, first it will check in the root page, in our example, there are only two intermediate pages so there are only two values is in the root page first starting point which is null second is key value 61, because 23 is lesser than 61 so it will go to the location of null pointer this is the first step.
In second step again storage engine will check 21, since 21 key value is greater than 20 and lesser than 40. It will go to the page where 20 key values `s data page.
At the third step, it will start looking for the 21 key value when it will find it, it will return the result back to the user. Look at the image below:

Further readings about indexes
here.
Back to the our example, the Dbo
.idial table has built, so let us get in and see some internals of indexes, in below script will use
sys.dm_db_index_physical_stats DMV to get some index related information.
SELECT OBJECT_NAME(OBJECT_ID) Table_Name, index_id,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation ,avg_page_space_used_in_percent Page_Density ,page_count,Record_count
FROM sys.dm_db_index_physical_stats
(db_id('tutorialsqlserver'), object_id ('IdealCI' ), null ,null,'DETAILED')
where
index_type_desc='CLUSTERED INDEX'

Look at the output of the above index id it is 1 that means it`s a Clustered Index, there are 3 levels of index, level zero is always contains the data pages, avg_page_space_used_in_percent column means the average percentage of data space occupied in the pages also known as page density, Page_count column tells how many pages are in that index level and Record_count tells how many records are in that index level.
If you look closely the page_count and the record_count you will find in there are 12 records in the root level (highest index level is always the Root Page) and exactly 12 pages are in the index_level 1 (intermediate page) and index level has 6903 records that's matched to the page count to index level 0 (always data pages).
Well, that is not the coincidence, because every record in the parent level index points to its child level page.
The more important term which we will use in the series all the time is the fragmentation and is a nice article on this topic so I am taking the summary from it see below:
"There are two different types of fragmentation in SQL Server: Internal and External. Internal fragmentation is the result of index pages taking up more space than needed.
It is like having a book where some of the pages are left blank; we do not know what pages are blank until we read the entire book and same applies for SQL Server, which has to read all the pages in the index wasting extra-time and server resources in the empty pages. External fragmentation occurs when the pages are not contiguous on the index. Following the book analogy,it is like having a book where pages are not ordered in a logical way (page 1, then page 2, then page 3 and so on) causing you to go back and forward to compound the information and make sense of the reading. Heavily used tables, that contains fragmented indexes will impact your database performance.
"
If you have any question fire in the comment box.
Look at the above output again of last ran DMV again and see there is almost no internal fragmentation in the index 0 (data pages) and no need to worry about root page and intermediate pages.
Look at the external fragmentation in the above index, there is no fragmentation, its perfect, Yes zero is perfect.
Note: if your test tables showing some fragmentation then
rebuild the table and check again.
Index Internals
Now we will look inside of the index using the undocumented IND command. There is a documented DMV in the SQL Server 2012 and above which shows the same result with more information, since I am using an old laptop which has SQL Server 2008 so I will use the IND command you can learn more about this IND command
here and
here.
----------- DBCC IND ('DBNANE', 'Table Name', Index ID);
DBCC IND ('tutorialsqlserver', 'IdealCI', 1);
Tons of information here, I am coping only required columns and pasting it into the Excel file for further investigation, you can download the copy of the excel file from the bottom of the article.
We will first investigate the root page, root page is the highest level page so we will look for higher level in our example which is 2.
Look at the page id of the root level page, we need this so let us copy it “65770”.
Let us look at the intermediate pages, in our example index level 1 is an intermediate level index. Image below:
How you will identify first and last page in the intermediate pages, simple the page id does not have any previous page it displays as zero, that is the first page of the index and the page which does not have any next page that is the last page.
In the intermediate pages there are 12 pages and the page ids is in the incremental order, all the page ids are next to each other look how the previous page and next page are in order.
However, the third page is missing from the sequence, this page is actually the root page.
In my testing I found that the third page of the child intermediate index usually is the root page.
Now let us look into the index 0.
Look into the index level 0 the leaf level or data pages.
There 6903 data pages those are sorted and linked in the incremented order that is why there is no fragmentation at all in the data pages.
SQL Server Page internals
In the next section we will go one step further and look what is inside these pages, so let us start with the Root Page using the DBCC PAGE command.
For more information about the DBCC Page Command look
this link.
--Root Page
---- DBCC PAGE('DBname',1,PageId,3) WITH TABLERESULTS
DBCC PAGE('tutorialsqlserver',1,65770,3) WITH TABLERESULTS
Look at the second tab which contain all the information about the child pages in the root page, in case of root page its the child pages are next level intermediate pages. Remember, we have seen these pages id in the output of IND command, but there is some additional information we can see which primary key is associated with which child page.
There are 12 pages in the intermediate level so there are 12 rows in the output, this same info we have seen in the output of sys.dm_db_index_physical_stats DMV.
The first child of the root page will be which have column name row 0 and the last child will be the last row of the result in our case its
11. There is a primary key (key) column, which contain the information of key value,
So we can conclude from the output that Null is associated with the page ID 65768 and
Primarykey value 46321 is associated with the page id 65769. etc etc.
Null actually is the starting point of the index.
Let us look into the first child of the root page which is Null with the PageId 65768.
--first child page
DBCC PAGE('tutorialsqlserver',1,65768,3) WITH TABLERESULTS
Again, look at the first child primarykey value its NULL and the pageid associated with this id is 65704.
The next row, primarykey value is "82" and pageid is 65705, this suggest all the keys those are lesser than "82" will be able to find in Null value`s respective pageid.
For example, you are looking for value 23 what would happen:
Step 1: Storage engine will check this value in the root page (65770), since 23 is lesser than the value 46321 so storage engine will go to the page 65768.
Step 2: Again storage engine will check this value 23 on this page (65768), since 23 is lesser than the value 82 so storage engine will go to the page 65705.
Now let us go to inside of the first data child page 65705 look at the image below:

Again, look at the first child primarykey value it`s NULL and the pageid associated with this id is 65704.
The next row, primarykey value is "82" and pageid is 65705, this suggest all the keys those are lesser than "82" will be able to find in Null value`s respective pageid.
For example, you are looking for value 23 what would happen:
Step 1: Storage engine will check this value in the root page (65770), since 23 is lesser than the value 46321 so storage engine will go to the page 65768.
Step 2: Again storage engine will check this value 23 on this page (65768), since 23 is lesser than the value 82 so storage engine will go to the page 65704.
Now let us go to inside of the first data child page 65704 look at the command and result below:
DBCC PAGE('tutorialsqlserver',1, 65704,3) WITH TABLERESULTS
Because it is a data page so unlike index pages you will see only one result set, in SQL Server 2008, the actual user rows with values can seen approx from row number 49 onwards look at the image above:
Let's continue our example:
Step 3: we were looking for primarykey 28 and the storage engine on the data page so it will read the page entries and return the row against the primary key 23. Look into the image below:
So how many pages we read?
1st the root page, 2nd the intermediate page and 3 rd the data page.
So total 3 pages.
Don`t you trust me? NO. Good, because trust no one test yourself. Let us use the above example and search value 23 against primary key, and look into the output of STATISTICS IO.
SET statistics io on
Select * from [DBO].[IdealCI] where primarykey =23
If still you find indexing concepts and internal difficult to understand, Let us know in the comment section.
"Stay hungry. Stay foolish."