Tips to write a better TSQL Query for enhancing the performance of Stored Procedure - Part I

Rajnilari2015
Posted by in Sql Server category on for Beginner level | Points: 250 | Views : 1617 red flag
Rating: 4.5 out of 5  
 2 vote(s)

Many times it so happens that we write a query but that is not optimized and henceforth it takes a long time to get execute. This is a series of posts that will address various problems of bad way of writing a TSQL program and what is the suggested way of writing which on the other hand will boost the query performance.
Recommendation
Read Find values from outside braces using T-SQL and XQuery before this article.

Introduction

Many times it so happens that we write a query but that is not optimized and henceforth it takes a long time to get execute. This is a series of posts that will address various problems of bad way of writing a TSQL program and what is the suggested way of writing which on the other hand will boost the query performance.In this article we will look into some tips as how we can write a better TSql query for enhancing the performance of Stored Procedure.

Rules to follow and Points to remember while writing a TSQL query

  1. Never name the user defined store procedures with sp_.

    They are system defined and mostly resides under the master db. So if we write a user defined stored procedure by the name sp_ the query engine will first search the Stored Procedure inside the master db and if not found then it will search in the current session db. This brings unnecessary round trip. Better to use some other naming convention as usp_.E.g. instead of writing sp_GetEmployeesDetail, let's write usp_GetEmployeesDetail

  2. Avoid using * in SELECT statement.

    E.g never write

     SELECT * FROM TableName

    Here * indicates all columns. When we write SELECT *, we are asking the query engine to send back all the columns which we may not need in our application. The way of unnecessary columns makes for more data inflow from the database server to the client which on the other hand slows access and increases the load on the client machines thereby causes more more time to travel across the network. Not only that, using SELECT * refrains the query from applying the covering index. Consider, that we have 5 columns in a table and we have applied the covering index for 3 columns.Now we added 2 more columns to the underlying table.This would cause the query optimizer to ignore the optimized covering index and will go for a Full Table scan.

    Another reason could be that we have two tables where both of them having ID column as their primary key and we have performed a join operation between the tables like

    				Select *
    				From table1
    				Join table2  ON table1.ID = table2.ID
    			

    In such as case, it will be difficult for the data consumer to understand which ID column pertains to whom.

    The above statement is equally applicable for the views when applied on the tables if the underlying table structures change(s)

    A better approach is to return only those columns that are needed as

    				Select 
    					T1ID = t1.ID
    					,T2ID = t2.ID
    					,t1.Col1
    					,t1.Col2
    					,t2.Col3
    				From table1 t1
    				Join table2 t2  ON t1.ID = t2.ID
    			

    Here t1 and t2 are the table alias.Also, whenever the query engine encounters SELECT *, it converts the * to columns name before query executes.

  3. Use EXISTS instead of IN clause

    EXISTS informs if a query returned any results as soon as a match is found.E.g.

    	SELECT ColumnName(s)
    	FROM TABLENAME t1
    	WHERE
    		-- the statemet turn out true as soon as a match is found
    		EXISTS ( SELECT COLUMNNAME(S) 
    			FROM TABLENAME t2
    			WHERE t1.ID = t2.ID
    			)
    	

    In comparison, IN compares one value to several values e.g.

    		SELECT ColumnName(s)
    		FROM TABLENAME t1
    		WHERE t1.ID IN(SELECT ID FROM FROM TABLENAME t2)
    	

    Also EXISTS clause uses INDEX at the time of fetching records and is there by faster than IN which on the other hand does not.

    EXISTS is a kind of correlated subquery while in the case of IN clause the sub query is evaluated first and then joined to the original table.

    As a final note, though we went in favour of EXIST and opposed to use IN , however, as a general thumb of rule, use EXIST when the sub-query results is very large.In clause is best served when we have a static list to pass like

    	SELECT ColumnName(s)
    	FROM TABLENAME t1
    	WHERE t1.ID IN(1,2,3 4,5 6,7,8,9,0)
    	
  4. Avoid using User Defined functions.

    User defined functions are blackboxes for the query optimizer.Let us evaluate the statement

    Let us write a simple SPLIT function that will split the comma seperated strings

    CREATE FUNCTION [dbo].[Splitter]
    (
       @List NVARCHAR(MAX),
       @delimiter NVARCHAR(100)
    ) RETURNS @t TABLE (Element NVARCHAR(MAX))
    AS
     BEGIN
       DECLARE @xml XML
       SET @xml = N'<t>' + REPLACE(@List,@delimiter,'</t><t>') + '</t>'
    
    		   INSERT INTO @t(Element)
    		   SELECT  r.value('.','varchar(MAX)') as Element
    		   FROM  @xml.nodes('/t') as records(r)
    	RETURN
     END
     

    Now invoke the function as under

     SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    GO
    DECLARE
    	 @List VARCHAR(8000),
    	 @delimiter varchar(10)
    
    SELECT @List = REPLICATE('1,2,3,4,5,6,7,8,9,',500) + ',' + REPLICATE('1,2,3,4,5,6,7,8,9,',500)
    SET @delimiter = ','
    SELECT * FROM [dbo].[Splitter](@List,@delimiter)
    GO
    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;
    

    If we look into the query plan, we will get the below

    As can be figure out that, it's hardly impossible to know that execution plan for the TVF.

    Now let us look into the statistics

    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
    
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
    
    (4001 row(s) affected)
    Table '#B955F13E'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
    (1 row(s) affected)
    
     SQL Server Execution Times:
       CPU time = 156 ms,  elapsed time = 248 ms.
    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
     

    We can make out that, there are 8 logical reads and the SQL Server Execution Times: CPU time = 156 ms, elapsed time = 248 ms.

    Now let us re-write the same program without using function

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    GO
    DECLARE 
    		@xml XML, 
    		@List VARCHAR(8000),
    		@delimiter varchar(10)
    SELECT @List = REPLICATE('1,2,3,4,5,6,7,8,9,',500) + ',' + REPLICATE('1,2,3,4,5,6,7,8,9,',500)
    SET @delimiter = ','
    SET @xml = cast(('<t>'+replace(@List, @delimiter, '</t><t>')+'</t>') AS XML)
    SELECT C.value('.', 'VARCHAR(MAX)') as Element FROM @xml.nodes('t') as X(C)
    GO
    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;
     

    If we look into the query plan, we will get the below

    It is explaining at what stage what plan has been generated and the cost.If any improvement is needed, the query optimizer is providing the hint too.

    Now let us look into the statistics

    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
    
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
    
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
    
    (1000 row(s) affected)
    
    (1 row(s) affected)
    
     SQL Server Execution Times:
       CPU time = 31 ms,  elapsed time = 79 ms.
    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
    

    We can make out that, the SQL Server Execution Times: CPU time = 31 ms, elapsed time = 79 ms.

    We can infer that there has been 5% improvemnt in the CPU time and the elapsed time has been reduced by almost 2%

  5. Maintain small Clustered Index

    It is advisible to maintain as small clustered index as much as possible since the fields used in clustered index may also used in nonclustered index.Data in the database is also stored in the order of clustered index.So, maintaining a huge clustered index on a table with a large number of rows increases the size drastically.

References/Suggested Reading

  1. Why is SELECT * considered harmful?
  2. Effective Clustered Indexes

Conclusion

Though there are no standard rules that follow X then Y and Z will yield a better query ensuring performance enhancement but there are some tips that we need to follow while writing the query and will produce better performance.We have started our journey to the realm of query performace optimization.Let's hit upon some more points at Part II.Stay tuned and hope you all have enjoyed this article.Thanks for reading.

Recommendation
Read Perform Android Like Search in TSQL after this article.
Page copy protected against web site content infringement by Copyscape

About the Author

Rajnilari2015
Full Name: Niladri Biswas (RNA Team)
Member Level: Platinum
Member Status: Member,Microsoft_MVP,MVP
Member Since: 3/17/2015 2:41:06 AM
Country: India
-- Thanks & Regards, RNA Team


Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)