In this article we will try to find the reason behind why primary key should be static and what problems we can face if the primary key is not static.
Introduction
Hello and welcome again to the this running series on the primary key (Clustered Unique Index), if you are new to this series, we recommend you to follow the series to understand the subject better, In this article we will see why the primary key (Clustered Unique Index) should be static, what problems occur if we create an Non Static/update able Primary Key/Unique Clustered Index and how it will affect your servers.
As we used in this series, we will use some DMVs and some undocumented command again so it is not recommended to use the following scripts on the production server.
In the below script we are creating a table with one primary key on primarykey column, one non clustered index and one unique non clustered index on the searchcol column. Why we are creating the same index as unique what the advantages/disadvantages are except then putting a constraint on the column(s), we will see all these with the demos in the below article so let get started and as usually first created a table, but this time we will not insert the primary key 3 so we can update a key to key 3 and see what effects take place.
So here is the demo script below:
--------------------------------------SCRIPT TO GENERATE DUMMY DATA
-----------CREATE DUMMY TABLE
BEGIN TRAN
IF Object_ID ('NotNarrowTable','U') is not null
drop table NotNarrowTable
CREATE TABLE [DBO].[NotNarrowTable] (
Primarykey BIGINT NOT NULL ,
Keycol VARCHAR(50) NOT NULL
,SearchCol INT not null
,SomeData char(20)not null ,
Keycol2 VARCHAR(50) NOT NULL
,SearchCol2 INT
,SomeData2 char(20),
Keycol3 VARCHAR(50) NOT NULL
,SearchCol3 INT
,SomeData3 char(20) )
------------------------------------CREATE DUMMY TABLE START
------------------------------------INSERT DATA IN DUMMY TABLE
INSERT INTO [DBO].[NotNarrowTable]
SELECT TOP (100000)
n ,
n,
n
,'Some text..'
,
n,
n
,'Some text..'
,
n,
n
,'Some text..'
FROM Numbers Where N!=3
/*
Link to generate the number table
--http://www.dotnetfunda.com/codes/show/8271/create-number-table-in-sql-server
*/
------------------------------------INSERT DATA IN DUMMY TABLE
------------------------------------ADD PRIMARY KEY IN DUMMY TABLE
ALTER TABLE DBO.NotNarrowTable
ADD CONSTRAINT PK_NotNarrowTable PRIMARY KEY (Primarykey) With (MaxDop=1)
Create Nonclustered index NC_NotNarrowTable_SearchCol3 on NotNarrowTable(SearchCol3) With (MaxDop=1);
Create UNIQUE Nonclustered index NC_UKNotNarrowTable_SearchCol3 on NotNarrowTable(SearchCol3) With (MaxDop=1) ;
------------------------------------ADD PRIMARY KEY IN DUMMY TABLE
COMMIT
--------------------------------------SCRIPT TO GENERATE DUMMY DATA END
Please look at the above script carefully, we are inserting 100000 rows in the table from 1 to 100001, except the primary key 3 and we have used the Maxdop hint with the index definition, just to guarantee that creation of index should not run in the parallel and give me no fragmentation in the index at all, because building or rebuilding the index can run in parallel depend on the data size and version of your SQL Server, In the parallel Rebuild process index creation runs independently for each thread and merge all the rebuilt, so it leads a minor negligible fragmentation in the index you can read the full story
here.
So the table is built and consists three indexes, let us quickly look into the indexes detail using the
sys.dm_db_index_physical_stats DMV.
SELECT
OBJECT_NAME(IPS.OBJECT_ID) Table_Name, Name,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation ,avg_page_space_used_in_percent Page_Density ,page_count,Record_count
FROM sys.dm_db_index_physical_stats
(db_id(), object_id ('NotNarrowTable' ), null ,null,'DETAILED')AS IPS
INNER JOIN sys.indexes AS i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
Where
index_level=0
Look at the details of the our newly created indexes no index is fragmented, perfect for our demonstration, see the both Non Clustered indexes they seem like they are duplicate, and till now there is no difference at all in the both non clustered index even the page density matches to the 13 th decimal point :), btw, I really don't know why the Storage Engine team finds It's good to show the page density to 13 decimal points.
To understand the topic better we will look directly into the pages using some undocumented ways and again, it is not recommended to use any undocumented command on the production system because it is unsupported by the Microsoft.
We will look first at the all the index pages using the store procedure
REFRESH_INDOUTPUT which is created for this series, for this article we are using SQL Server 2017, in SQL Server 2012 and onwards there is a DMV
SYS.DM_DB_DATABASE_PAGE_ALLOCATIONS which shows the dbcc page output, but below SQL Server 2012 you can`t manipulate the output of the DBCC page command so for this series, we have created a Store Procedure which sends the dbcc ind output in the INDOUTPUT table
and then we can manipulate it.
-------------Clustered Index pages
-- DBNme,TableName,IndexID
EXEC REFRESH_INDOUTPUT 'TutorialSQLServer', 'DBO.NotNarrowTable', 1
SELECT Table_Name ,IndexID, Prevpagepid , PAgePID , Nextpagepid FROM INDOUTPUT P
Where IndexLevel = 0
Order by PagePId
Look at the above result, in the sample result Clustered Index pages are in order and adjacent to each other, in the above table setup script we did not insert the primarykey =3 so whenever we will insert it will move to the very first data page which is "2899544" (to understand the index structure better visit SQL Server Index Internal article)
So, let us look into the first two data pages using DBCC PAGE command.
-----------------------------------------------------------------
------- First Data Page of Clustered INDEX
DBCC PAGE('TutorialSQLServer',1,2899544,3) WITH TABLERESULTS
------ Second Data Page of Clustered INDEX
DBCC PAGE('TutorialSQLServer',1,2899545,3) WITH TABLERESULTS
----------------------------------------------------------------
As expected, this is the first data page of the index and the clustered index is in ascending order (if we do not specify the order of the indexes while creation, it generates in ascending order) so the first data page consists the lowest numeric key values, the very first data page consists of 1 to 79 key values in order and the second data page consist the key values from 80 to 155.
To understand the effect fully let us repeat the same process to both non clustered index as well and see the first two index pages in the index.
------------ Non Unique NonClustered INDEX, INDEX ID=2( STEP 1 )
EXEC REFRESH_INDOUTPUT 'TutorialSQLServer', 'DBO.NotNarrowTable', 2
SELECT Table_Name , Prevpagepid , PAgePID , Nextpagepid FROM INDOUTPUT P
Where IndexLevel = 0 Order by PagePId
So here we go these are the data pages of the index 2 of the DBO.NotNarrowTable table which is Non Clustered non unique on the search col column, which has an integer data type, data pages of non clustered index representation is quite different from the clustered index data pages, header and details are segregated into two result panel and a second panel consist the actual data of the table look at the below GIFs these are first two data pages of the index.
-----------------------------------------------------------------
------- First Data Page of NON UNIQUE NonClustered INDEX
DBCC PAGE('TutorialSQLServer',1,2909608,3) WITH TABLERESULTS --1 to 450 ( STEP 2 )
------ Second Data Page of NON UNIQUE NonClustered INDEX
DBCC PAGE('TutorialSQLServer',1,2909609,3) WITH TABLERESULTS --451 to 899 ( STEP 3 )
----------------------------------------------------------------
We can see in the above GIF, the second result consists the actual data in the index, and the very first data page has two columns first is the searchcol which is our index key column and the second column is primarykey,wait, but we have defined key only on the searchcol column only so why primarykey column exists?
Non Clustered Index always required a pointer back to the Clustered Index that is why in each and every NON Clustered Index has the ID of Clustered Index so every row can identify its associated Clustered Index`s row whenever required,In our case the pointer is a primarykey column because it is a Unique Clustered Index, but if you look closely at the result you will see it is marked as (key), just because we did not specify the unique index.
It is part of the key column basically storage engine required to identify the row uniquely, so if you won`t make any index Unique storage engine somehow has to make it unique.
So in the above example primarykey column is unique and we have created created a non Unique Non Clustered Index on searchcol column, so to make it unique storage engine is using searchcol column along with the primarykey column.
As we can see in the GIF first page consists key from 1 to 450 and second page consist 450 to 889.Ok, nice,let us see the data pages and actual data of Unique Non Clustered Indec as well.
------------ Unique NonClustered Index on searccol INDEX ID = 3 (STEP 1)
EXEC REFRESH_INDOUTPUT 'TutorialSQLServer', 'DBO.NotNarrowTable', 3
SELECT Table_Name , Prevpagepid , PAgePID , Nextpagepid FROM INDOUTPUT P
Where IndexLevel = 0 Order by PagePId
------- First Data Page of UNIQUE NONClustered INDEX (STEP 2)
DBCC PAGE('TutorialSQLServer',1,2918112,3) WITH TABLERESULTS --1 to 450
------ Second Data Page of UNIQUE NONClustered Clustered INDEX (STEP 3)
DBCC PAGE('TutorialSQLServer',1,2918113,3) WITH TABLERESULTS --451 to 899
--------------------------------------------------
In the above GIF we can see the result are matching exactly with the NON Unique NON clustered index.The only noticeable difference is the parimarykey column is not the part of the key just because the Storage Enginencan guaranteed to identify this index uniquely on seachcol, in this index still the primarycol is the part of this non clustered index in the leaf level just to have the linkages with the Clustered Index.
Ok so far we have only build the Index and seen the Index structure, now we can update the primarykey column and then we will see the changes in the Index and how the page split evet occure and it leads to the fragmentation, along with this we will see the Transaction Log activity as well.
So let us update the primarykey column 460 to 3, reason behind choosing 460 to update with 3 is just to move data from second page to first page technically.
-- Transaction LOG Activity
BEGIN TRAN
Update [DBO].[NotNarrowTable] SET PrimaryKey = 3 Where PrimaryKey = 460
SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('TutorialSQLServer');
COMMIT
--
We can clearly see in the above update query we are updating the primarykey column 460 to 3 and in this simple operation just to update a single row generated 18260 bytes of record for transaction log database, just because page split event occurs, size of transaction log generated can be depend on version to version of SQL Server. More transaction log size means a large transaction log database, large transaction database size means it would take more space more time in Mirroring, log shipping, etc for more detail you can look at our article or Paul Randal`s this article.
Ok, so now we are assuming that a page spilt event has occurred, so let us confirm it by using sys.dm_db_index_physical_stats
DMV, Look at below for the query and the result:
SELECT
OBJECT_NAME(IPS.OBJECT_ID) Table_Name, Name,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation ,avg_page_space_used_in_percent Page_Density ,page_count,Record_count
FROM sys.dm_db_index_physical_stats
(db_id(), object_id ('NotNarrowTable' ), null ,null,'DETAILED')AS IPS
INNER JOIN sys.indexes AS i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
Where
index_level=0
Look at the above result set, now this is interesting as we can see in the result Clustered Index and Non Unique Non Clustered Index is Externally fragmented But the Unique Non Clustered Index is still not externally fragmented.
Let us see all three indexes, one by one and try to understand what happened to them by using our Store Procedure and DBCC PAge command again.
EXEC REFRESH_INDOUTPUT 'TutorialSQLServer', 'DBO.NotNarrowTable', 1
SELECT Table_Name ,IndexID, Prevpagepid , PAgePID , Nextpagepid FROM INDOUTPUT P
Where IndexLevel = 0
Order by PagePId
See now the second page is not in order because the page split event has occurred and created an out of order new page and linked to last and the next page and rest pages are in order as it were before, now look at the data inside the pages.
------- First Data Page of Clustered INDEX (After Page Split)
DBCC PAGE('TutorialSQLServer',1,2899544,3) WITH TABLERESULTS
------- Second Data Page of Clustered INDEX (After Page Split)
DBCC PAGE('TutorialSQLServer',1,2900982,3) WITH TABLERESULTS
------ Third Data Page of Clustered INDEX (After Page Split)
DBCC PAGE('TutorialSQLServer',1,2899545,3) WITH TABLERESULTS
We can see in the above GIF that the newly updated row id inserted in the first page and almost half of the rows of first page moved to the newly created out of order page and rest pages are as they were before.
For more details, see our article on
fragmentation.
It was obvious that the above transaction would create an external fragmentation and a page split to the primary key column, but why fragmentation occur in the non unique clustered index and why it did not occur in the non unique clustered index this is the question we need to find the answer, so again we will look into the non unique non clustered index again using our stored procedure and then the data pages.
------------ Non Unique NonClustered INDEX, INDEX ID=2( STEP 1 ) [After Page Split]
EXEC REFRESH_INDOUTPUT 'TutorialSQLServer', 'DBO.NotNarrowTable', 2
SELECT Table_Name , Prevpagepid , PAgePID , Nextpagepid FROM INDOUTPUT P
Where IndexLevel = 0 Order by PagePId
Look at the above image, we can see the third page "2837639" is out of order and don`t forget this non clustered.
The index key column is searchcol column but we have updated the primary key column not the searchcol column, still the page split event occurs, again we will look into the pages but this time with more precision.
-----------------------------------------------------------------
------- First Data Page of NON UNIQUE NONClustered INDEX
DBCC PAGE('TutorialSQLServer',1,2909608,3) WITH TABLERESULTS --1 to 450 ( STEP 2 )
------ Second Data Page of NON UNIQUE NONClustered INDEX
DBCC PAGE('TutorialSQLServer',1,2909609,3) WITH TABLERESULTS --451 to 673 ( STEP 3 )
------ Third Data Page of NON UNIQUE NONClustered INDEX
DBCC PAGE('TutorialSQLServer',1,2910023,3) WITH TABLERESULTS --674 to 899 ( STEP 4 )
----------------------------------------------------------------
Please have a look at the above GIF both columns are marked as key column, so in other works that means both columns are key column there is nothing in the leaf level (non key column), but wait we didn`t specify the primarykey column as key column for the non clustered index so how it became the part of key column?
Actually, all index key values must be the unique key column whether it is a Clustered or Non Clustered Index to identify the row uniquely in the table, so if you won`t make it unique then storage engine has to add something in the key column to make it unique as we have already seen in the above example we
define searchcol column as the Non Clustered Index and since we have not defined, it unique, so the Storage Engine believes seachcol column is not Unique by itself if it was the user have defined it unique so to make this index unique it added primary key column as the second key and made it Composite Index
so when we tried to update the primarykey column and since the primarykey column now the part of key column of the above index it fragmented.
As we updated the primary key column 460 to 3, since this is the new key column to the index, storage engine split the page into two and moved approx the half data from the page to the new data page.
Now let us quickly see the first three data pages of Unique Non Clustered Index, we have already seen above that
the Unique Non Clustered Index is not fragmented.
------------ Unique NonClustered Index on searccol INDEX ID = 3 (STEP 1)
EXEC REFRESH_INDOUTPUT 'TutorialSQLServer', 'DBO.NotNarrowTable', 3
SELECT Table_Name , Prevpagepid , PAgePID , Nextpagepid FROM INDOUTPUT P
Where IndexLevel = 0 Order by PagePId
------- First Data Page of UNIQUE NONClustered INDEX (STEP 2)
DBCC PAGE('TutorialSQLServer',1,2918112,3) WITH TABLERESULTS --1 to 450
------ Second Data Page of UNIQUE NONClustered Clustered INDEX (STEP 3)
DBCC PAGE('TutorialSQLServer',1,2918113,3) WITH TABLERESULTS --451 to 899
Look at the above GIF the data pages are in order page and split didn`t occur just because we have created the searchcol column unique, so now the Storage Engine does not need to add any other column to make this index unique but still the Storage Engine required a pointer to point each row to all available indexes so it has primarykey column in the leaf level not as the key column and we can see in the above gif the leaf column has updated as well .
So always defined the key unique if its unique whenever you don't specify the keyword UNIQUE while creating the Index, storage engine consider it as a non unique index and even query optimizer treat it differently as well, we have shown this in
set statistics IO article.
Conclusion
In the above article we try to find why Primary key (Clustered Unique index) should be static, if we are allowing it to update by user then it can cause the page split and lots of page split event are not good for specially for
OLTP systems it decreases the system performance and cause fragmentation as well well, which is neither good are reading or writing to the database, for more details look our in depth
article on fragmentation.