You might already seen plenty of articles on the same topic, but today we will see all this with simple explational and an example to understand what actually the SET STATISTICS IO ON tells us?
Introduction
We use SET STATISTICS IO ON to view the activity of data pages in the storage (buffer pool,disk), in SQL Server a data page is 8 KB sized smallest unit, so 1 data page means 8 KB size of data, so any activity which displays the number of pages in the statistics you are actually n* 8 KB of data activity whereas N is referred as number of pages.
In this article we will see how to read and interpret the output of SET STATISTICS IO ON with examples. If you look the output of STATISTICS IO from the left to right, you will see SCAN COUNT first, but for better understanding the output of STATISTICS IO we will start with Logical Reads.
SET STATISTICS IO ON - LOGICAL READS
First, we will look the definition of logical read from the
book online"Number of pages read from the data cache " This one line definition is perfect and make sense, actually all reads are logical read. Confused? The processor can not read the data directly from the secondary storage, all the data pages have to bring on to the primary memory (RAM), this is how the computers work, there is a nice article on that
How Computers Work: The CPU and Memory.
Whenever we required the data to read or write, it has to be in the data cache and logical reads is the metric of STATISTICS IO which tells how many pages are read to complete the action, thus Logical Reads = Data Cache Reads.
First, we need a dummy table with data to understand the concept, we will use
this link to populate the test table in our database.
SELECT OBJECT_NAME(OBJECT_ID) Table_Name, index_id,index_type_desc,index_level,
avg_fragmentation_in_percent fragmentation ,avg_page_space_used_in_percent Page_Density ,page_count,Record_count
FROM sys.dm_db_index_physical_stats
(db_id('tutorialsqlserver'), object_id ('DummyTable' ), null ,null,'DETAILED')
Look into the results of the DMV sys.dm_db_index_physical_stats , the index is not fragmented at all and page density is close to 100% of data pages. If you have no knowledge or very good knowledge about indexes, we recommend you to look into the article about indexes internal.
Now so let`s get started and run a query against our DummyTable with STATISTICS IO.
SET STATISTICS IO ON
SELECT * FROM DummyTable WHERE Primarykey=9999
Look at the Logical Read counter in Statistics IO output, it shows 3 Logical Reads, that means only 3 pages are required to fulfill the above request from the index structure. If you want to learn more about the index structure and how it works please visit our previous article on
index internal.
Let us tweak a query little bit and request all the rows in the range between 9999 and 10020.
SELECT * FROM DummyTable WHERE Primarykey>9999 and Primarykey < 10020
The above query returns 20 rows, and again the logical reads are same as was in the first query, only 3 pages are required to satisfy the above query. Let us again run the query for 9999 literal value but this time against keycol column, which has a varchar data type and it is not indexed column.
Select * from DummyTable where keycol=9999
Look at the output the logical reads, to satisfy the above query, the Query Optimizer has required all the data pages in the buffer cache because we have not created an index against the keycol column.
Let us test again the Logical Reads but In the below test we will write some random different queries and see the output of STATISTICS IO.
select 1 from DummyTable
select 1 from DummyTable where somedata='Some text.. '
select * from DummyTable order by Primarykey
Look at the logical reads for all three queries, all three queries required to scan all the data pages of the index, that is why logical reads are reported equally for all three above query.
Thus, from the above examples we can conclude that, 8 KB pages which required to fulfill the user request is shown as the logical read in the output of STATISTICS IO, from the above tests and example and explanation we learn that we can simply replace the word logical read to data cache read as well.
Let us move to scan count now.
SET STATISTICS IO ON – SCAN COUNT
Again, before we start this section we will first look into what book online say about this
"Number of seeks/scans started after reaching the leaf level in any direction to retrieve all the values to construct the final dataset for the output. Scan count is 0 if the index used is a unique index or clustered index on a primary key and you are seeking for only one value. For example WHERE Primary_Key_Column = <value
>.Scant count is 1 when you are searching for one value using a non-unique clustered
index which is defined on a non-primary key column. This is done to check for duplicate values for the key value that you are searching for. For example WHERE Clustered_Index_Key_Column = <value
>.Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key." –
Book OnlineActually scan count is zero when its a singleton seek, everything else count as in a numeric value. Isn`t it a book on line kind of explanation:)? First, let us try to understand what is singleton seek?
When we query against the unique index (unique clustered index, unique key, clustered primary key) and query processor knows that only maximum 1 row can be return against the query, its singleton seek.
Still, you find it difficult to understand no problem let us try to understand with examples:
We will test this against the clustered primary key.
SELECT * FROM DummyTable WHERE Primarykey = 1
See above Scan Count is zero because the above query is against Unique Clustered Index. Let us modify the above query slightly, but it will be same as above but slightly modified and check the Scan Count, primarykey is an integer type column, so greater than zero is lesser than two (0 and <2) is equivalent to one, right, so let us see the query and check the Scan Count again.
SELECT Primarykey FROM DummyTable WHERE Primarykey > 0 and Primarykey<2
Look at the output of Statistics IO, Scan Count is 1, even this query is logically equivalent to equals to 1 query. We can argue that above query should be converted into singleton seek, scan count should not recorded for this value no? For humans, it should be singleton seek, but on the other way human should not write this query if they are looking for exactly one value :).
If we ignore the literal values in the query, then the query is a range query we want some data based on some range, this is called Range Scan, and because it's not singleton seek for the Query Processor and it is reported as Scan Count 1.
Let`s run another query, which would be again for value 1, see the below query and think what should be the scan count.
SELECT Primarykey FROM DummyTable WHERE Primarykey >= 1 and Primarykey <=1 Option (Recompile)
We use Option (Recompile) to discard the similar cached plan and force the compiler to generate a new execution plan for new literal value. To get more information about option recompile you can look at
this link.Back to the above query, it seems like it's also a range scan right? But look below the image scan count is zero why?
Actually, there is an optimization process called simplification, which rewrite the query when required, it has converted the above predicate into equality predicate. Look at the execution plan with converted predicate below:
I promised to write a full length article on this topic in one of my articles on sqlshak.com soon, till then there are some good article online you can visit
this link.
Let us run another query and try to understand the scan count with one more example, we will write two different queries on a primarykey column to get a value from 1 to 10. Look at the query and output of logical read below:
SELECT Primarykey FROM DummyTable WHERE Primarykey in (1,2,3,4,5,6,7,8,9,10)
SELECT Primarykey FROM DummyTable WHERE Primarykey >=1 and Primarykey <=10
See the difference between scan count and logical IO bottom query is 10 times better than the top query in terms of logical reads and scan count. The reason behind the difference of logical read is that the Query Optimizer sees both queries differently first query is against multiple literal values and next one is "range scan" first query travels 10 times from root to leaf level and bottom query just ones. First query ran in the index tree somewhat like
From index root to leaf level.
Get Row for Primarykey = 1 Return
Get Row for Primarykey = 2 Return
Get rows for Primarykey = 3 Return
................... Upto 10
And second query ran in the index tree from root to leaf level somewhat like :
Get Rows for Primarykey >= 1 and Primarykey <= 10 Return ,
If you want to learn more about how Query Processor reads rows against the multiple literal value you can see my this in depth article on the same topic but for scans.
So are you thinking that query optimizer should able to convert the IN query to Range Scan as well. Think what if range are not in range? Or what if the IN query had literal values from the extreme right and left leaf level, like in (1,2,3, 49999,5000).
offcorse it's not that difficult to solve these problems by the Query Optimizer, but it has to come with the executable execution plan in very short span of time, so it would not good to push the Query Optimizer to do all this stuff when we can do it ourself NO? Paul White has written a full length
article on than you can visit.
there is another article written by Amit Banerjee on
Scan Count, you can visit here to read more about it.
SET STATISTICS IO ON – PHYSICAL READS AND READ AHEAD READS
As
usual, we will look into was
bol says about that "Number of pages read from disk."
." –
Book Online .
You won`t see much on physical read on the internet because when Developers/DBAs wanted to know some information from the Statistics IO they usually see the logical reads as performance counter but that doesn`t mean physical read has no significance. Physical reads tells how may pages requested by the Query Processor from the storage engine and storage engine retrieves it from secondary storage (hard disk) and put in the primary storage system, made them available to read.
When data isn`t in the buffer cache, so all reads are technically physical access of data, but requesting row by row from the physical storage is very expensive so there is mechanism called Read Ahead Read mechanism of SQL Server, which anticipates bunch of rows that might be required by the query so Read Ahead Read mechanism places pages into the buffer pool, even before requested by the Query Processor.
Let us see this in the example:
In the below test we are cleaning all the buffer pool using
DBCC Dropcleanbuffers command so this command is not recommended on production server. Please test in developer system only.
DBCC Dropcleanbuffers With NO_Infomsgs
SELECT * FROM DummyTable WHERE Primarykey <10000
Look above only 3 physical reads (places pages into buffer pool), the rest is read ahead read. We have 3 physical reads, 132 read ahead reads and 135 are logical reads actually logical reads here are equivalent of physical + Read Ahead Reads but this is not always true, we will see this in below example and logical reads are not equals to physical + read ahead reads.
DBCC DropCleanBuffers With NO_Infomsgs
SELECT * FROM DummyTable WHERE primarykey <500000
Look at the output above here logical reads are not equal to physical +read ahead reads I have observed this when Read Ahead Reads place a large amount of pages in the buffer cache in my opinion when read ahead, read and place pages into buffer cache that time some pages are counted by both +physical reads and read ahead read as well, As I can`t verify that so I can`t confirm that.
We usually use the physical read counters to check data were already in cache or fetching from the secondary storage unit. There is a trace Flag available to disable the read ahead read mechanism and its a documented Trace Flag 652, but not recommended on the production servers. Let us disable the read ahead reads mechanism and we see will how much time took the above query to complete.
DBCC DropCleanBuffers With NO_Infomsgs
SET STATISTICS TIME ON
Dbcc Traceon(652)
SELECT * FROM DummyTable WHERE keycol <2
GO
DBCC Traceoff(652)
DBCC DropCleanBuffers With NO_Infomsgs
SELECT * FROM DummyTable WHERE keycol <2
GO
First query without Read Ahead Read mechanism completed in 6.9 seconds, and the second query with Read Ahead Read mechanism completed in 1.6 seconds, such a huge benefit of this mechanism, see the above physical read and logical reads, there is no read ahead read but still physical reads are lesser than logical reads. physical reads are not accurate.
Lately I wrote an article on
Read Ahead Read Testing if you want to read more about read ahead read you can look into it.
SET STATISTICS IO ON FOR LOB DATA TYPE
Data pages read against LOB data type displays as LOB logical reads, LOB physical reads and LOB
read ahead reads. To demonstrate it, we will create a different table and fill this with lob data types
.the example below has taken from
kimberly l tipp blog.
CREATE TABLE TestLobTable
(
c1 int identity,
c2 char(8000) default 'this is a test',
c3 varchar(max) NULL
);
GO
INSERT dbo.TestLobTable DEFAULT VALUES;
GO 1000 -- create an 80MB table
CREATE UNIQUE CLUSTERED INDEX TestLobTableCL
ON TestLobTable (c1);
GO
UPDATE TestLobTable
SET c3 = REPLICATE (convert(varchar(max), 'ABC'), 8000) -- creates a 24 row
WHERE c1 % 17 = 0; -- only 58 rows
GO
SET STATISTICS TIME OFF
DBCC DropCleanBuffers With NO_Infomsgs
SELECT * FROM TestLobTable
Look at lob logical reads column it shows 1030 reads, it is same as regular logical reads number of pages read from the buffer cache to fulfill the user request, so as Lob Physical read.
Try to answer the below question. Run this query on your test machine
DBCC DropCleanBuffers With NO_Infomsgs
SELECT TOP (1)Primarykey from DummyTable where keycol in (1)
Why the read ahead reads placed 128 pages into cache when not required by the query at all?
Summary:
Logical Reads: Number of pages read from the data cache
Scan Count: Everything else other than singleton seek, for example range scan, index scan , table scan and seeking multiple rows in the index, etc.
Physical Reads: Number of pages placed in buffer cache requested by the Query Processor.
Read Ahead Read: Number of anticipated pages that may required by the query processor.
LOB Logical Reads: Number of LOB pages read from the data cache.
LOB Physical Reads: Number of LOB pages placed on buffer cache requeste by the query processor.
LOB Read Ahead Read: Number of anticipated LOB pages that may required by the query processor.
This topic hasn`t finished yet, soon we will come up with another article to add more to this topic.
A beautiful quote from Steve Jobs:
"Your time is limited so don’t waste it living someone else’s life. Don’t be trapped by dogma, which is living with the results of other people’s thinking. Don’t let the noise of others’ opinions drown out your own inner voice. And most important, have the courage to follow your heart and intuition, they somehow already know what you truly want to become. Everything else is secondary."