In this article you will have a clearer understanding about fragmentation in SQL Server, how it occurs, what are the causes and some problem related to it.
This article is dedicated to SQL Server fragmentation only, if you are following the current long running series, then you know, we wrote in the index internals article about basics of fragmentation, but as the series grows, we need to understand everything about fragmentation as it will help understand the complete series.
Background
Like all my articles, this article will be filled with practical based knowledge only. I will start with the same article I linked when I was trying to explain about SQL Server fragmentation so here as well, please read below.
"There are two different types of fragmentation in SQL Server: Internal and External. Internal fragmentation is the result of index pages taking up more space than needed.
It is like having a book where some of the pages are left blank; we do not know what pages are blank until we read the entire book and the same applies to SQL Server, which has to read all the pages in the index wasting extra-time and server resources in the empty pages. External fragmentation occurs when the pages are not contiguous on the index. Following the book analogy,it is like having a book where pages are not ordered in a logical way (page 1, then page 2, then page 3 and so on) causing you to go back and forward to compound the information and make sense of the reading. Heavily used tables, that contains fragmented indexes will impact your database performance. "
Ok, let us try to understand it with an example, suppose in SQL Server a page can hold only 6 pages, for every 7th row Storage Engine has to assign a new page in the storage devices.
The storage engines always try to insert the SQL server index data page in a contiguous sequential manner, for example:
Suppose a user has a unique index on the ID column and insert data like 2,4,6,8,10,12,16,18,20,22,24,26,28,30,32 then most probably you won't see any external fragmentation (for efficient, faster insert Some time storage engine can create a little external fragmentation, but that is negligible) because Storage Engine takes care of that, but still you will find little internal fragmentation because when the some pages are Not full by the data, technically it is internally fragmented right?
Look into the image below:
Look at how the data in the index are nicely continuing
physically, but what if user insert 3 in the index?
There would be two options first inserts the row into the index at the right
place and rearrange all the index accordingly.
That would be certainly very-very expensive approach, so the better approach
would be to allocate a new page and link
it to Page 1 and 3, move the data accordingly.
That is what Storage
Engine does it allocate a new page for index, move approx half of rows in the new page and link it back.
Look at the image below:
User inserted row 3 in the index, since id 3 should be in
the page 1 and there is no space left on the page so storage engine has allocated page id 4 and
move half of the data in the page id 4. Now technically the index has fragmented internally and
externally.
How internally? Because first page which has a capacity of
containing 5 rows now only contains 3 rows in page id 1 and 4, and externally
because now the pages are not physically contiguous, so if a user requested
rows in the key order, storage engine has to move forward and then come
backward to get IDs in order.
Thus, when the index is
externally fragmented you will have random IO and random IOs are pretty slower
than sequential IOs specially in the conventional rotating disks. I am quoting
BOL on that and want readers to visit the link on that
"Accessing data sequentially is much faster than accessing it
randomly because of the way in which the disk hardware works. The seek
operation, which occurs when the disk head positions itself at the right disk
cylinder to access data requested, takes more time than any other part of the
I/O process. Because reading randomly involves a higher number of seek
operations than does sequential reading, random reads deliver a lower rate of
throughput"
Set up the table
to test fragmentation in
SQL Server
We will try to use the almost same example, what we tried to
explain above, but we will create a large table because Storage Engine doesn`t
take care of fragmentation in the small indexes for obvious reasons.
As we used the number table to
populate the table in this series, again we will use the number table to
populate the table, but this time the table will have even rows to match the
above example.
Script below:
CREATE TABLE [DBO].Fragmented (
Primarykey int NOT NULL,
Keycol VARCHAR(50) NOT NULL
,SearchCol INT
,SomeData char(20),
Keycol2 VARCHAR(50) NOT NULL
,SearchCol2 INT
,SomeData2 char(20),
Keycol3 VARCHAR(50) NOT NULL
,SearchCol3 INT
,SomeData3 char(20))
GO
INSERT INTO [DBO].Fragmented
SELECT
n,
n,
n
,'Some text..'
,
n,
n
,'Some text..'
,
n,
n
,'Some text..'
FROM Numbers WHERE N/2.0=N/2
GO
CREATE UNIQUE CLUSTERED INDEX CI_Fragmented ON [DBO].Fragmented (Primarykey)
GO
Ok, the table has created with even clustered index IDs so
first look at the external fragmentation and internal fragmentation using sys.dm_db_index_physical_stats DMV, in this DMV external fragmentation shows
as “avg_fragmentation_in_percent “column and internal fragmentation show as
“avg_page_space_used_in_percent”.
Let us run the query against our newly created table and
look at the fragmentation.
SELECT OBJECT_NAME(OBJECT_ID) Table_Name, index_id,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation, avg_page_space_used_in_percent Page_Density, page_count,Record_count
FROM sys.dm_db_index_physical_stats
(db_id('tutorialsqlserver'), object_id ('Fragmented'), null, null,'DETAILED')
Where
Index_Level =0 and Index_Id=1
There is no external fragmentation in the index and even all
pages all almost fully occupied nicely.
So, we can start. Earlier we use to paste the DBCC IND
command output into the excel find and then we use to filter the results
accordingly to the requirement, but
since I as passionate in SQL server not in Excel so I decided to create
a table with a couple of stored procedures to extract the result of DBCC IND
command and insert it in the INDOUTPUT
table.
You can get the script from here. So, let us get the DBCC IND command result in the
INDOUTPUT table using our stored procedure "REFRESH_INDOUTPUT".
EXEC REFRESH_INDOUTPUT 'TutorialSQLServer', 'Fragmented', 1
Ok, great the table has refreshed, now let
us look into the first three data pages in the index.
Select Table_Name, PrevPagePID, PagePID, NextPagePID from INDOUTPUT
Where IndexLevel = 0 and PrevPagePID = 0
Select Table_Name, PrevPagePID, PagePID, NextPagePID from INDOUTPUT
Where PagePID=264129
Select Table_Name, PrevPagePID, PagePID, NextPagePID from INDOUTPUT
Where PagePID=264130
Look into the above result all the first three
page id and next
to each other this is
the same condition as we described in the introduction section.Let us go into one step deeper and
look into the index pages using DBCC IND
COMMAND.
DBCC PAGE ('TutorialSQLServer',1,264128,3) WITH TABLERESULTS
GO
DBCC PAGE ('TutorialSQLServer',1,264129,3) WITH TABLERESULTS
GO
DBCC PAGE ('TutorialSQLServer',1,264130,3) WITH TABLERESULTS
GO
We can clearly see in the output
of DBCC IND command, the first data page/leaf level "264128" contains
primary key 2 to 160 with its respective
data from, next data page "264129" contains primarykey 162 to 316 and
data page "264130" contains primarykey 318 to 472 all the pages are
nicely externally and internally defragmented.
So let us fragment it by
inserting the primarykey 3 in the table
and see what happen.
INSERT INTO [DBO].Fragmented
SELECT
n,
n,
n,
'Some text..',
n,
n,
'Some text..',
n,
n
,'Some text..'
FROM Numbers WHERE N=3
GO
Row with primarykey (key value) 3 has inserted into the
table so now as described in introduction section the index should be
fragmented,
so let us again look into the output of
sys.dm_db_index_physical_stats.
SELECT OBJECT_NAME(OBJECT_ID) Table_Name, index_id,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation, avg_page_space_used_in_percent Page_Density, page_count,Record_count
FROM sys.dm_db_index_physical_stats
(db_id('tutorialsqlserver'), object_id ('Fragmented'), null, null,'DETAILED')
Where
Index_Level =0 and Index_Id=1
Look at the result of DMV, now the index
is slightly fragmented, before inserting
the above row DMV output was not showing
any fragmentation and now its showing 0.0579206487112656
percent, and there is a change in page density and page count and offcorse the
record count as well.
So as per out theory the second page should
be out of order and linked to the first page with approx half of rows should be
moved to the new page and this newly created page`s next page should
be linked to the second page of the last result.
So let us again used our
"REFRESH_INDOUTPUT" stored procedure to get the result of IND
command.
EXEC REFRESH_INDOUTPUT 'TutorialSQLServer', 'Fragmented', 1
GO
SELECT Table_Name, PrevPagePID, PagePID, NextPagePID from INDOUTPUT
WHERE IndexLevel = 0 and PrevPagePID = 0
--274014
SELECT Table_Name, PrevPagePID, PagePID, NextPagePID from INDOUTPUT
WHERE PagePID=274014
--264129
SELECT Table_Name, PrevPagePID, PagePID, NextPagePID from INDOUTPUT
WHERE PagePID=264129
Great, as expected the second page is out
of order, then linked back to the ordered next page.
Let us again look inside the all the first
three pages using DBCC PAGE command.
DBCC PAGE ('TutorialSQLServer',1,264128,3) WITH TABLERESULTS
GO
DBCC PAGE ('TutorialSQLServer',1,274014,3) WITH TABLERESULTS
GO
DBCC PAGE ('TutorialSQLServer',1,264130,3) WITH TABLERESULTS
GO
Look at the results, now the first page
contains the newly inserted row, but because there was no space left to insert
a new row in the page,
The Storage Engine has created a new page
moved half of rows into this newly created page and then linked it to the first
and second page, half of the rows moved to a new page with id
"274014" you can see there is no change in data of page id "264129" as it wasn`t required.
Write performance problem with fragmentation (Page Split)
In this section we will look
into a fragmentation related performance problem which occurs during the page
split.
For that we will insert a row and look
into the Statistics IO and transaction log activity using
sys.dm_tran_database_transactions.
In last test we have externally fragmented the table by inserting the 3 in the Unique Clustered Key value so we are cleaning the table and index by deleting key key value and alter that we will Rebuild the table, So the Index can be de-Fragmented again and we can play further:)
Delete DBO.Fragmented
Where Primarykey = 3
GO
Alter Table DBO.Fragmented Rebuild
GO
So let us first insert a new row
in sequence.
SET STATISTICS IO ON
-- transaction log activity
Begin Tran
INSERT INTO [DBO].Fragmented
SELECT
500001,
500001,
500001,
'Some text..',
500001,
500001,
'Some text..',
500001,
500001,
'Some text..'
SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('tutorialsqlserver');
Commit
Look at the output of statistics
IO results, to complete this insert requested SQL Server required 3 logical
reads and this transaction has generated 344 bytes for the transaction log.
Now let us again insert the row,
but this time we will insert a row in between the clustered key so page split can occur.
In the below test we are
inserting 3 in the Clustered Index key column.
-- Transaction log activity
Begin Tran
INSERT INTO [DBO].Fragmented
SELECT
3 ,
3,
3
,'Some text..'
,
3,
3
,'Some text..'
,
3,
3
,'Some text..'
SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('tutorialsqlserver');
Commit
Look into the result of
Statistics IO and
sys.dm_tran_database_transactions DMV , in this transaction page split occurs
and it required 12 pages logical read to complete the transaction and 12148
bytes generated for the transaction log, more transaction logs means more sized
transaction log database, and all the transaction log database should be backed
up/log shipped/mirrored etc to save the database for disaster recovery. Thus more
transaction log means more overhead.
In this particular scenario,
this transaction is almost 4 times expensive compare to the last transaction.
So where ever page split occurs, it takes more time, more IO and generate more
logs, hence become more expensive in all terms, so if the page split can be
avoided, then it should be avoided in all teams.
Summary
In todays article we learnt what is fragmentation, how fragmentation
occurs why it is considered expensive, how SQL Server deals with it. But this
topic hasn`t finished yet, we are coming up with the follow up article which
will give you more insights about the fragmentation till then I have a question
for readers.
Do you know fragmentation doesn`t matter when...
Complete the sentence and win…
What you think about this subject, write in the comment section.
Quote
"That’s been one of my mantras—focus and simplicity. Simple can be harder
than complex; you have to work hard to get your thinking clean to make it
simple."—Steve Jobs