Missing Index Analysis and Creation.

Deviprasads
Posted by in Sql Server category on for Advance level | Points: 250 | Views : 20990 red flag
Rating: 5 out of 5  
 1 vote(s)

We get little confused, when the word INDEX comes into the picture? We ask so many questions to our-self before creating a non cluster index, as far as the performance is concern.


 Download source code for Missing Index Analysis and Creation.

Introduction:

As we are aware that the non-clustered indexes plays a very vital role in performance. Indexed keys, order of indexed keys, included columns, order of included columns, fill factor etc are the most important factors for an index. Index has a thumb rule of 50-50. Some indexes will make you feel like in heaven and some will give you a reverse feel of hell. 

Two weeks ago, one of my colleague (good friend) NG, asked me a question. How can we analyze and create most efficient indexes on a DB? In one of his DB project, non-clustered indexes were not being created correctly. Most of the tables even did not have a non clustered index. However every table had a clustered index. 

SQL Server 10.0 made it very easy for us by providing DMVs. In earlier versions of SQL Server, profiler was the only option to analyze and find the missing indexes.

Objective:

Here we are going to play around with couple of DMVs to achieve our requirement and get missing index scripts dynamically. 

sys.dm_db_missing_index_details
sys.dm_db_missing_index_group_stats
sys.dm_db_missing_index_groups
sys.dm_db_partition_stats

Note: We will be using AdventureWorks DB for the test purpose here.

Play Around:


What is a missing Index?

SQL server is intelligent enough to analyze our query before executing them on the server. You might have seen some green color texts (Missing Index Details) pops up in execution plans, when there is index a suggestion from SQL server. SQL server gives hint that, this index could have been used to perform the query better. 

Note: Missing index suggestion can only be seen, if you have the execution plan enabled. It's very difficult to check the execution plans for every queries being fired to the DB from application in a prod environment unless with the help of profiler. Now the question is who saves these details and where to find these details?

Answer: sys.dm_db_missing_index_details DMV.

Example: SELECT CustomerID, AccountNumber, CustomerType from Sales.Customer WHERE TerritoryID = 6;


(Pic:1.0)


Looks like the above query is doing a clustered index scan on Sales.Customer table. see we had the actual execution plan ON and that's why we are able to get the missing index detail. This missing index has a performance impact of 91.98% on the whole batch. You can right click and go to 'Missing Index Details.. ' to get the whole script of missing index on a new query window.

Now, we got a little idea about missing indexes. Next, how can we query these information from our DB?

You can do a select * on sys.dm_db_missing_index_details DMV to get missing index details but here We will tweak little bit to generate missing index scripts dynamically with some good analytic information, which will help us to decide whether a new index will be really helpful for the table or not?

Script:
/*    
[Author]: deviprasad   
[Date of Creation]: 08/01/2013   
[Contact]: deviprasad@expertsworld.in   
*/   

SELECT TOP 100 PERCENT DB_NAME() AS [DB Name]
, OBJECT_NAME(dmvMID.OBJECT_ID,dmvMID.database_id) AS [Table Name]
, '/* ' + '
	[Author]: '+ REPLACE(SUSER_SNAME(), '.', ' ') + '
	[Date of Creation]: ' + CONVERT(varchar(10), GETDATE(), 101) + '
	[Contact]: '+ SUSER_SNAME() + '@expertsworld.in
	*/ ' + CHAR(10) + CHAR(10)

	+'CREATE NONCLUSTERED INDEX [IDX_' + OBJECT_NAME(dmvMID.OBJECT_ID,dmvMID.database_id) + '_'
	+ REPLACE(REPLACE(REPLACE(COALESCE(dmvMID.equality_columns,''),', ','_'),'[',''),']','') +
	CASE
		WHEN (dmvMID.equality_columns IS NOT NULL AND dmvMID.inequality_columns IS NOT NULL) THEN '_'
		ELSE ''
	END
	+ REPLACE(REPLACE(REPLACE(COALESCE(dmvMID.inequality_columns,''),', ','_'),'[',''),']','')
	+ ']' 
	+ ' ON ' + dmvMID.statement
	+ ' (' + COALESCE (dmvMID.equality_columns,'')
	+ CASE 
		WHEN (dmvMID.equality_columns IS NOT NULL AND dmvMID.inequality_columns IS NOT NULL) THEN ','
		ELSE
	'' END
	+ COALESCE (dmvMID.inequality_columns, '')
	+ ')' + CHAR(10)
	+ COALESCE (' INCLUDE (' + dmvMID.included_columns + ')', '') + CHAR(10)
	+ 'WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 80)'
	+ ' GO'
	AS [Missing Index SQL Script]
, dmvMIGS.user_seeks as [User Seeks]
, CAST(dmvMIGS.avg_user_impact AS varchar)+ ' %'AS [Estimated Impact]
, dmvMIGS.last_user_seek AS [Last User Seek]
--, dmvMIGS.avg_total_user_cost AS [Estimated Cost on disk]
, (SELECT COUNT(*) from sys.indexes where OBJECT_ID = dmvMID.OBJECT_ID and type_desc = 'NONCLUSTERED') AS [NC Indexes]
, OBJECTPROPERTY(dmvMID.object_id, 'tableHasClustIndex') as [Is Clust Idx]
,(SELECT
CAST (SUM(
 CASE
 WHEN ps.usedpages > ps.pages THEN (ps.usedpages - ps.pages)
 ELSE 0
 END * 8) AS varchar(100)) + ' KB' AS indexsize
 FROM sys.indexes i
 INNER JOIN (
 SELECT
 OBJECT_ID,
 index_id,
 SUM (used_page_count) usedpages,
 SUM (
 CASE
 WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
 ELSE lob_used_page_count + row_overflow_used_page_count
 END
 )pages
 FROM sys.dm_db_partition_stats
 WHERE object_id = dmvMID.OBJECT_ID
 GROUP BY object_id, index_id
 ) ps on i.index_id = ps.index_id
 WHERE i.object_id = dmvMID.OBJECT_ID) as [Total Index Size]

FROM sys.dm_db_missing_index_groups as dmvMIG
INNER JOIN sys.dm_db_missing_index_group_stats dmvMIGS
	ON dmvMIGS.group_handle = dmvMIG.index_group_handle
INNER JOIN sys.dm_db_missing_index_details dmvMID
	ON dmvMIG.index_handle = dmvMID.index_handle
WHERE dmvMID.database_ID = DB_ID()
--AND OBJECT_NAME(dmvMID.OBJECT_ID,dmvMID.database_id) LIKE '%Customer%'
ORDER BY --[NC Indexes] ASC, 
[Estimated Impact] DESC 



(Pic 1.1)

[DB Name]: Data base name on which you fired the given query for missing indexes.
[Table Name]: Non clustered index needs to be created on this table.

[Missing Index SQL Script]: Auto generated script for missing index.
[User Seek]: How many times SQL server found this missing index or how many times the query being fired.
[Estimated Impact]: How much performance impact will be gained on the cost of the missing index.
[Last User Seek]: Last time the query was fired or last time SQL server suggested us about the missing index.

[NC Index]: How many non-clustered indexes are already available on the table?
[Is Clust Index]: Is there a clustered index available on the table or not?
[Total Index Size]: Total index size in KB on datafile/disk including clustered index.


(Pic 1.2)

Wow! we got the index script without really coding it!!

The above index script is generated in [Missing Index SQL Script] column. I have gone ahead and created the same non-clustered index on Sales.Customer table by this script. Now lets analyze the index in detail.

Index key is created on TerritoryID which is a equality column (=) in where clause. Index keys can be also created on non-equality columns like >, <,  != etc in where clause.

Included columns are our select list with proper order. 

Check this out, after a non clustered index on Sales.Customer table.


(Pic: 1.3)

Wow Index seek!! The same query is performing healthier than previous. We see the index seek on the non clustered index, we just created. That's a good sign however!

Now compare (Pic 1.0) with (1.3). check the subtree cost and operator cost gone down drastically (0.0999411 to 0.010185) That means we got a huge performance improvement.

Notes:


1. Records in any DMVs gets cleared out on recycle of SQL Server service or Server reboot.
2. Not all missing indexes are helpful.
3. Much more analysis required during index creation.
4. You must have access on DMVs to fire this query.

Conclusion:


Non-clustered index plays a very big role when performance is in consideration. BTW, this little script helped my friend NG, to get missing indexes on his DB project. Normally developers do not think about indexes deeply. If you don't know which index needs to be created to tune your DB, then this is the best option for you to start ahead. Once again these indexes are in result of SQL Server's own analysis!!!

To test this, get a backup copy of your production DB and fire this query on that to get a best result.

This same article can be found at my personal blog www.sqlindia.com

(Caution: Do not test this on your production environment directly.)

Reference:


MSDN and MSSQLTIPS


Page copy protected against web site content infringement by Copyscape

About the Author

Deviprasads
Full Name: Deviprasad sahoo
Member Level: Starter
Member Status: Member
Member Since: 6/8/2012 3:40:15 AM
Country: India
S Devi Prasad
http://www.sqlindia.com
currently working as ASE in an USA based health care company.

Login to vote for this post.

Comments or Responses

Posted by: Neeraaj.Sharma on: 8/13/2013 | Points: 25
congrats
keep posting good material...

Posted by: Neeraaj.Sharma on: 8/13/2013 | Points: 25
And +1 for index size.

Posted by: Deviprasads on: 8/13/2013 | Points: 25
Thanks Neeraaj for your kind words!!
Posted by: Bandi on: 8/14/2013 | Points: 25
Hi Neeraj,
Can you clarify my doubt?
I have 100 tables in database. Each table has clustered index on primary key column....
Is it necessary to have non-clustered index on same primary key column for each table?
Posted by: Deviprasads on: 8/16/2013 | Points: 25
As I have mentioned in my article.. A deep analysis is required when a non-clustered index is in consideration. Yes most of the time a non-clustered index on primary key column gives good performance improvement with some included columns (having in most where clauses). A non-clustered index seek is always better than clustered index scan.

Thanks,
Deviprasads

Login to post response

Comment using Facebook(Author doesn't get notification)