SQL Server Index Internals with Example | Indexes In SQL Server

Posted by Neeraj Prasad Sharma in Sql Server category on 5/28/2017 for Advance level | Points: 250 | Views : 12919

Post Article |

Search Articles |

Articles Home

In this article we will learn the internals of indexes and fragmentation

Download source code for SQL Server Index Internals with Example | Indexes In SQL Server

Introduction

This article lays the foundations for my upcoming series on which we will learn why the clustered index should be Static, Unique, ever increasing and narrow. To understand and compare we are first creating an ideal primary key (Clustered Index) remember by default SQL Server creates clustered index on the primary key.

In this article we will create an ideal Clustered Index and we will look into the internals of indexes.

This series of article can be seen as the third level of my earlier work on the dotnetfunda first level is about primary key, which was the introduction of primary key, then the second level is some question and answer on the Primary key and this is the third level. After completion of the entire series, we will link all series, for greater benefit to the SQL community.

Indexes in SQL Server

Most of the people are confused about the indexes because they don't have the exact mental picture about it. When I say people, it`s actually me six years back with shallow knowledge, my reaction use to be like on internals:

And trust me its easy topic.

Built the Table

In this first article, we are creating a table named with DBO.IdealCI, we will create a primary key, rest columns are random column, we will create some Non Clustered Index on those columns as well this is needed because it will be required to continue the series.

In the below script we are inserting 500000 rows in the DBO. IdealCI table with the help of number table. If you don`t have number table in your database, you can use this code to populate the number table.

If you have no idea about number table, then I suggest read this link to understand how powerful the number tables are to perform some development work efficiently, if you want to learn more about number table you can visit this link.

Script below for creating the sample table:

CREATE TABLE [DBO].[IdealCI] (
Primarykey int NOT NULL , 
Keycol VARCHAR(50) NOT NULL
,SearchCol INT
,SomeData char(20),

Keycol2 VARCHAR(50) NOT NULL
,SearchCol2 INT
,SomeData2 char(20),


Keycol3 VARCHAR(50) NOT NULL
,SearchCol3 INT
,SomeData3 char(20) )



INSERT INTO [DBO].[IdealCI]
SELECT 
n ,

n,
n
,'Some text..'
,
n,
n
,'Some text..'

,

n,
n
,'Some text..'

FROM Numbers


ALTER TABLE DBO.[IdealCI]
ADD CONSTRAINT PK_IdealCI PRIMARY KEY (Primarykey)


Create nonclustered index NC_IdealCI_Keycol on [IdealCI](Keycol,SearchCol,SearchCol2)

Create nonclustered index NC_IdealCI_SearchCol on [IdealCI](SearchCol,Keycol,SomeData)

Create nonclustered index NC_IdealCI_SomeData on [IdealCI](SomeData,Keycol,SearchCol)

Create nonclustered index NC_IdealCI_Keycol2 on [IdealCI](Keycol2,SearchCol,SearchCol2)

Create nonclustered index NC_IdealCI_SearchCol2 on [IdealCI](SearchCol2,Keycol,SomeData)

Create nonclustered index NC_IdealCI_SomeData2 on [IdealCI](SomeData2,Keycol,SearchCol)

Fasten the seat belt, sit back and enjoy I am going to reveal the magic :)

Little Background On Indexes (Clustered and Non Clustered Only)

Indexes are stored in the B tree structure, in the SQL Server root page which is the entry point when queries read the data* (except the allocation order scan). This b tree structure helps in performing the read the data faster as it required to touch only those data pages which only contain the requested data. Look at the image below this is the logical representation on indexes:

Note: in the above image data pages are looking bigger than the index pages, but they are not both pages are actually 8 KB in size.

Surprised to see the NULLs in the indexes, well, that’s how SQL Server has implemented the B tree, Null value is actually the starting point of the non data pages (leaf level)

For example, if storage engine has to return the row against the key value 23, first it will check in the root page, in our example, there are only two intermediate pages so there are only two values is in the root page first starting point which is null second is key value 61, because 23 is lesser than 61 so it will go to the location of null pointer this is the first step.

In second step again storage engine will check 21, since 21 key value is greater than 20 and lesser than 40. It will go to the page where 20 key values `s data page.

At the third step, it will start looking for the 21 key value when it will find it, it will return the result back to the user. Look at the image below:

Index Internals

Now we will look inside of the index using the undocumented IND command. There is a documented DMV in the SQL Server 2012 and above which shows the same result with more information, since I am using an old laptop which has SQL Server 2008 so I will use the IND command you can learn more about this IND command here and here.

----------- DBCC IND ('DBNANE', 'Table Name', Index ID);

DBCC IND ('tutorialsqlserver', 'IdealCI', 1);

Tons of information here, I am coping only required columns and pasting it into the Excel file for further investigation, you can download the copy of the excel file from the bottom of the article.

We will first investigate the root page, root page is the highest level page so we will look for higher level in our example which is 2.

Look at the page id of the root level page, we need this so let us copy it “65770”.

Let us look at the intermediate pages, in our example index level 1 is an intermediate level index. Image below:

How you will identify first and last page in the intermediate pages, simple the page id does not have any previous page it displays as zero, that is the first page of the index and the page which does not have any next page that is the last page.

In the intermediate pages there are 12 pages and the page ids is in the incremental order, all the page ids are next to each other look how the previous page and next page are in order.

However, the third page is missing from the sequence, this page is actually the root page.

In my testing I found that the third page of the child intermediate index usually is the root page.

Now let us look into the index 0.

Look into the index level 0 the leaf level or data pages.

There 6903 data pages those are sorted and linked in the incremented order that is why there is no fragmentation at all in the data pages.

SQL Server Page internals

In the next section we will go one step further and look what is inside these pages, so let us start with the Root Page using the DBCC PAGE command.

For more information about the DBCC Page Command look this link.

--Root Page
---- DBCC PAGE('DBname',1,PageId,3) WITH TABLERESULTS
DBCC PAGE('tutorialsqlserver',1,65770,3) WITH TABLERESULTS

Look at the second tab which contain all the information about the child pages in the root page, in case of root page its the child pages are next level intermediate pages. Remember, we have seen these pages id in the output of IND command, but there is some additional information we can see which primary key is associated with which child page.

There are 12 pages in the intermediate level so there are 12 rows in the output, this same info we have seen in the output of sys.dm_db_index_physical_stats DMV.

The first child of the root page will be which have column name row 0 and the last child will be the last row of the result in our case its

11. There is a primary key (key) column, which contain the information of key value,

So we can conclude from the output that Null is associated with the page ID 65768 and

Primarykey value 46321 is associated with the page id 65769. etc etc.

Null actually is the starting point of the index.

Let us look into the first child of the root page which is Null with the PageId 65768.

--first child page
DBCC PAGE('tutorialsqlserver',1,65768,3) WITH TABLERESULTS

Again, look at the first child primarykey value its NULL and the pageid associated with this id is 65704.

The next row, primarykey value is "82" and pageid is 65705, this suggest all the keys those are lesser than "82" will be able to find in Null value`s respective pageid.

For example, you are looking for value 23 what would happen:

Step 1: Storage engine will check this value in the root page (65770), since 23 is lesser than the value 46321 so storage engine will go to the page 65768.

Step 2: Again storage engine will check this value 23 on this page (65768), since 23 is lesser than the value 82 so storage engine will go to the page 65705.

Now let us go to inside of the first data child page 65705 look at the image below:

Again, look at the first child primarykey value it`s NULL and the pageid associated with this id is 65704.

The next row, primarykey value is "82" and pageid is 65705, this suggest all the keys those are lesser than "82" will be able to find in Null value`s respective pageid.

For example, you are looking for value 23 what would happen:

Step 1: Storage engine will check this value in the root page (65770), since 23 is lesser than the value 46321 so storage engine will go to the page 65768.

Step 2: Again storage engine will check this value 23 on this page (65768), since 23 is lesser than the value 82 so storage engine will go to the page 65704.

Now let us go to inside of the first data child page 65704 look at the command and result below:

DBCC PAGE('tutorialsqlserver',1, 65704,3) WITH TABLERESULTS

Because it is a data page so unlike index pages you will see only one result set, in SQL Server 2008, the actual user rows with values can seen approx from row number 49 onwards look at the image above:

Let's continue our example:

Step 3: we were looking for primarykey 28 and the storage engine on the data page so it will read the page entries and return the row against the primary key 23. Look into the image below:

So how many pages we read?

1st the root page, 2nd the intermediate page and 3 rd the data page.

So total 3 pages.

Don`t you trust me? NO. Good, because trust no one test yourself. Let us use the above example and search value 23 against primary key, and look into the output of STATISTICS IO.

SET statistics io on
Select * from [DBO].[IdealCI] where primarykey =23

If still you find indexing concepts and internal difficult to understand, Let us know in the comment section.

"Stay hungry. Stay foolish."

futhore reading

visualizing index fragmenation

sys dm db index physical stats transact sql

inside the storage engine using dbcc page and dbcc ind to find out if page splits ever roll back

more undocumented fun dbcc ind dbcc page and off row columns

sql server storage internals part 6 how to read clustered and non clustered index pages

About the Author

Full Name: Neeraj Prasad Sharma
Member Level: Bronze
Member Status: Member
Member Since: 5/13/2016 8:42:37 AM
Country: India
Contact for Free SQL Server Performance Consulting and Training for you or your Organization.

Neeraj Prasad Sharma is a SQL Server developer who started his work as a dot net programmer. He loves SQL Server query optimizer`s capability to process the queries optimally. For the last six years he has been experimenting and testing Query Optimizer default behaviour and if something goes wrong his goal is to identify the reason behind it and fix it. I write technical article here: https://www.sqlshack.com/author/neeraj/ https://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=12524731 https://www.mssqltips.com/sqlserverauthor/243/neeraj-prasad-sharma-/

Bookmark It

Login to vote for this post.

Latest Articles

Comments or Responses

Comment using

(Author doesn't get notification)