SQL Server Index Internals with Example | Indexes In SQL Server

Neerajprasadsharma
Posted by in Sql Server category on for Advance level | Points: 250 | Views : 9376 red flag

In this article we will learn the internals of indexes and fragmentation


 Download source code for SQL Server Index Internals with Example | Indexes In SQL Server

Introduction

This article lays the foundations for my upcoming series on which we will learn why the clustered index should be Static, Unique,  ever increasing and narrow.  To understand and compare we are first creating an ideal primary key (Clustered Index) remember by default SQL Server creates clustered index on the primary key.
In this article we will create an ideal Clustered Index and we will look into the internals of indexes.
This series of article can be seen as the third  level of my earlier work on the dotnetfunda first level is about primary key, which was the introduction of primary key, then the second level is some question and answer on the Primary key and this is the third level. After completion of the entire series, we will link all series, for greater benefit to the SQL community.


Indexes in SQL Server

Most of the people are confused about the indexes because they don't have the exact mental picture about it. When I say people, it`s actually me six years back with shallow knowledge, my reaction use to be like on internals:


And trust me its easy topic.

Built the Table 
In this first article, we are creating a table named with DBO.IdealCI, we will create a primary key,  rest columns are random column, we will create some Non Clustered Index on those columns as well this is needed because it will be required to continue the series.
In the below script we are inserting 500000 rows in the DBO. IdealCI  table with the help of number table. If you don`t have number table in your database, you can use this code to populate the number table.


If you have no idea about number table, then I suggest read this link to understand how powerful the number tables  are to perform some development work efficiently, if you want to learn more about number table you can visit this link.

Script below for creating the sample table:


CREATE TABLE [DBO].[IdealCI] (
Primarykey int NOT NULL , 
Keycol VARCHAR(50) NOT NULL
,SearchCol INT
,SomeData char(20),

Keycol2 VARCHAR(50) NOT NULL
,SearchCol2 INT
,SomeData2 char(20),


Keycol3 VARCHAR(50) NOT NULL
,SearchCol3 INT
,SomeData3 char(20) )



INSERT INTO [DBO].[IdealCI]
SELECT 
n ,

n,
n
,'Some text..'
,
n,
n
,'Some text..'

,

n,
n
,'Some text..'

FROM Numbers


ALTER TABLE DBO.[IdealCI]
ADD CONSTRAINT PK_IdealCI PRIMARY KEY (Primarykey)


Create nonclustered index NC_IdealCI_Keycol on [IdealCI](Keycol,SearchCol,SearchCol2)

Create nonclustered index NC_IdealCI_SearchCol on [IdealCI](SearchCol,Keycol,SomeData)

Create nonclustered index NC_IdealCI_SomeData on [IdealCI](SomeData,Keycol,SearchCol)

Create nonclustered index NC_IdealCI_Keycol2 on [IdealCI](Keycol2,SearchCol,SearchCol2)

Create nonclustered index NC_IdealCI_SearchCol2 on [IdealCI](SearchCol2,Keycol,SomeData)

Create nonclustered index NC_IdealCI_SomeData2 on [IdealCI](SomeData2,Keycol,SearchCol)


Fasten the seat belt, sit back and enjoy I am going to reveal the magic :)


Little Background On  Indexes (Clustered and Non Clustered Only)

Indexes are stored in the B tree structure, in the SQL Server root page which is the entry point when queries read the data* (except the allocation order scan). This b tree structure helps in performing the read the data faster as it required to touch only those data pages which only contain the requested data.  Look at the image below this is the logical representation on indexes:


Note: in the above image data pages are looking bigger than the index pages, but they are not both pages are actually 8 KB in size.

Surprised to see the NULLs in the indexes, well, that’s how SQL Server has implemented the B tree, Null value is actually the starting point of the non data pages (leaf level)
For example, if storage engine has to return the row against the key value 23, first it will check in the root page, in our example, there are only two intermediate pages so there are only two values is in the root page first starting point which is null second is key value 61, because 23 is lesser than 61 so it will go to the location of null pointer this is the first step. 
In second step again storage engine will check 21, since 21 key value is greater than 20 and lesser than 40. It will go to the page where 20 key values `s data page.
At the third step, it will start looking for the 21 key value when it will find it, it will return the result back to the user. Look at the image below:

Further readings about indexes here.

Back to the our example, the Dbo.idial table has built, so let us get in and see some internals of indexes, in below script will use sys.dm_db_index_physical_stats DMV to get some index related information.

SELECT OBJECT_NAME(OBJECT_ID) Table_Name, index_id,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation ,avg_page_space_used_in_percent Page_Density ,page_count,Record_count
FROM sys.dm_db_index_physical_stats 
(db_id('tutorialsqlserver'), object_id ('IdealCI' ), null ,null,'DETAILED')

where 
index_type_desc='CLUSTERED INDEX'



Look at the output of the above index id it is 1 that means it`s a Clustered Index, there are 3 levels of index, level zero is always contains the data pages, avg_page_space_used_in_percent column means the average percentage of data space occupied in the pages also known as page density, Page_count column tells how many pages are in that index level and Record_count tells how many records are in that index level.
If you look closely the page_count and the  record_count you will find in there are 12 records in the root level (highest index level is always the Root Page) and exactly 12 pages are in the  index_level  1 (intermediate page) and index level  has 6903 records that's matched to the page count to index level 0 (always data pages).
Well, that is not the coincidence, because  every record in the parent level index points to its child level page.
The more important term which we will use in the series all the time is the fragmentation and  is a nice article on this topic so I am taking the summary from it see below:

"There are two different types of fragmentation in SQL Server: Internal and External.  Internal fragmentation is the result of index pages taking up more space than needed.
  It is like having a book where some of the pages are left blank; we do not know what pages are blank until we read the entire book and same applies for SQL Server, which has to read all the pages in the index wasting extra-time and server resources in the empty pages. External fragmentation occurs when the pages are not contiguous on the index. Following the book analogy,it is like having a book where pages are not ordered in a logical way (page 1, then page 2, then page 3 and so on) causing you to go back and forward to compound the information and make sense of the reading. Heavily used tables, that contains fragmented indexes will impact your database performance. 
"
For more details please visit the article

If you have any question fire in the comment box.
Look at the above output again of last ran DMV again and see there is almost no internal fragmentation in the index 0 (data pages) and no need to worry about root page and intermediate pages.
Look at the external fragmentation in the above index, there is no fragmentation, its perfect, Yes zero is perfect.
Note: if your test tables showing some fragmentation then rebuild the table and check again.

Index Internals   

Now we will look inside of the index using the undocumented IND command. There is a documented DMV in the SQL Server 2012 and above which shows the same result with more information, since I am using an old laptop which has SQL Server 2008 so I will use the IND command you can learn more about this IND command here and here.


----------- DBCC IND ('DBNANE', 'Table Name', Index ID);

DBCC IND ('tutorialsqlserver', 'IdealCI', 1);



Tons of information here, I am coping only required columns and pasting it into the Excel file for further investigation, you can download the copy of the excel file from the bottom of the article.
We will first investigate the root page, root page is the highest level page so we will look for higher level in our example which is 2.



Look at the page id of the root level page, we need this so let us copy it “65770”.
Let us look at the intermediate pages, in our example index level 1 is an intermediate level index. Image below:




How you will identify first and last page in the intermediate pages, simple the page id does not have any previous page it displays as zero, that is the first page of the index and the page which does not have any next page that is the last page.
 In the intermediate pages there are 12 pages and the page ids is in the incremental order, all the page ids are next to each other look how the previous page and next page are in order.
However, the third page is missing from the sequence, this page is actually the root page.
In my testing I found that the third page of the child intermediate index usually is  the root page.
Now let us look into the index 0.




Look into the index level 0 the leaf level or data pages.
 There 6903 data pages those are sorted and linked in the incremented order that is why there is no fragmentation at all  in the data pages.

SQL Server Page internals


In the next section we will go one step further and look what is inside these pages, so let us start with the Root Page using the DBCC PAGE command.
For more information about the DBCC Page Command look this link.


--Root Page
---- DBCC PAGE('DBname',1,PageId,3) WITH TABLERESULTS
DBCC PAGE('tutorialsqlserver',1,65770,3) WITH TABLERESULTS


Look at the second tab which contain all the information about the child pages in the root page, in case of root page its the child pages are next level intermediate pages. Remember, we have seen these pages id in the output of IND command, but there is some additional information  we can see which primary key is associated with which child page.
There are 12 pages in the intermediate level so there are 12 rows in  the output, this same info we have seen in the output of sys.dm_db_index_physical_stats DMV.
The first child of the root page will be which have column name row 0 and the last child will be the last row of the result in our case its
11. There is a primary key (key) column, which contain the information of key value,
So we can conclude from the output that Null is associated with the page ID 65768 and 
Primarykey value 46321 is associated with the page id 65769. etc etc.
Null actually is the starting point of the index. 


Let us look into the first child of the root page which is Null with the PageId  65768.

--first child page
DBCC PAGE('tutorialsqlserver',1,65768,3) WITH TABLERESULTS




Again, look at the first child primarykey value its NULL and the pageid associated with this id is  65704.
The next row,  primarykey value is "82" and pageid is 65705, this suggest all the keys those are lesser than "82" will be able to find in Null value`s respective pageid.

For example, you are looking for value 23 what would happen:
Step 1:   Storage engine will check this value in the root page (65770), since 23 is lesser than the value 46321 so storage engine will go to the page 65768.
Step 2:  Again storage engine will check this value 23 on this   page (65768), since 23 is lesser than the value 82 so storage engine will go to the page 65705.

Now let us go to inside of the first data child page 65705 look at the image below:



Again, look at the first child primarykey value it`s NULL and the pageid associated with this id is  65704.
The next row,  primarykey value is "82" and pageid is 65705, this suggest all the keys those are lesser than "82" will be able to find in Null value`s respective pageid.

For example, you are looking for value 23 what would happen:
Step 1:   Storage engine will check this value in the root page (65770), since 23 is lesser than the value 46321 so storage engine will go to the page 65768.
Step 2:  Again storage engine will check this value 23 on this   page (65768), since 23 is lesser than the value 82 so storage engine will go to the page 65704.

Now let us go to inside of the first data child page 65704 look at the command and result below:

DBCC PAGE('tutorialsqlserver',1, 65704,3) WITH TABLERESULTS


Because it is a data page so unlike index pages you will see only one result set, in SQL Server 2008, the actual user rows with values can seen  approx from row number 49 onwards look at the image above:


Let's continue our example:
Step 3:  we were looking for primarykey 28 and the storage engine on the data page so it will read the page entries and return the row against the primary key 23. Look into the image below:



So how many pages we read?
1st the root page, 2nd the intermediate page and 3 rd the data page.
 So total 3 pages.
Don`t you trust me? NO. Good, because trust no one test yourself. Let us use the above example and search value 23 against primary key, and look into the output of STATISTICS IO.

SET statistics io on
Select * from [DBO].[IdealCI] where primarykey =23

If still you find indexing concepts and internal difficult to understand, Let us know in the comment section.


 "Stay hungry. Stay foolish."







Page copy protected against web site content infringement by Copyscape

About the Author

Neerajprasadsharma
Full Name: Neeraj Prasad Sharma
Member Level: Bronze
Member Status: Member
Member Since: 5/13/2016 8:42:37 AM
Country: India
Contact for Free SQL Server Performance Consulting and Training for you or your Organization.

Neeraj Prasad Sharma is a SQL Server developer who started his work as a dot net programmer. He loves SQL Server query optimizer`s capability to process the queries optimally. For the last six years he has been experimenting and testing Query Optimizer default behaviour and if something goes wrong his goal is to identify the reason behind it and fix it. I write technical article here: https://www.sqlshack.com/author/neeraj/ https://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=12524731 https://www.mssqltips.com/sqlserverauthor/243/neeraj-prasad-sharma-/

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)