Tags: Index

Quick and Dirty Index Info

Categories: News, Professional, Scripts, SSC
Tags: ,
Comments: No Comments
Published on: November 1, 2011

From time to time, somebody needs to find some information about the indexes in the database.  As DBA’s, there are a few tools at your disposal to look up that information.  The required/requested information is not always the same.  That said, you can usually adapt a query quickly to find what you need.

Where to Look

SQL 2005 and up have provided us with some very useful means to gain insight into the indexes in our databases.  The following DMOs is a small representation of that information.

  • sys.dm_db_missing_index_details
  • sys.dm_db_missing_index_group_stats
  • sys.dm_db_missing_index_groups
  • sys.dm_db_index_usage_stats
  • sys.dm_db_index_physical_stats

Add to that, there are some catalog views to help with finding information relevant to indexes in the databases.

  • sys.indexes
  • sys.index_columns

With just these few objects, there is a wealth of information at your fingertips.  I often find myself querying for information about indexes.  I also find myself being asked how to find this information on a relatively frequent interval.  Usually, I just end up writing out the query again (depends on mood and day).

Rewriting the query doesn’t take too much time when you know what you need to query.  That said, sometimes it is nice to have a base query ready to go.  From this query you can add/remove items as you see fit to get the information that you really need.

For instance, here is a query to get some of the fragmentation information.  I would probably just re-use this query over and over where applicable – only changing the parts that mattered to get the necessary result set.

SELECT OBJECT_NAME(ps.OBJECT_ID),I.name,index_level,index_type_desc
		,I.is_hypothetical,ps.page_count,ps.record_count,ps.avg_fragmentation_in_percent
		,ps.ghost_record_count
		,ps.compressed_page_count
	FROM sys.dm_db_index_physical_stats(DB_ID(),null,null,null,'detailed') ps
		INNER Join sys.indexes I
			ON ps.OBJECT_ID = I.OBJECT_ID
			And ps.index_id = I.index_id
	ORDER BY OBJECT_NAME(ps.OBJECT_ID),ps.index_id,ps.index_level

There are many types of queries we could write for the different relevant information pertaining to our indexes.  It is all a matter of knowing where to start.  I hope that knowing about these DMOs and catalog views, it will help shorten your efforts to retrieving this information.

Table Sizes

Comments: 2 Comments
Published on: May 3, 2010

In SQL Server there are two stored procedures that help one to determine the size of a table.  One is documented and more widely popular, the other is not.  These two stored procedures are sp_spaceused and sp_MStablespace.  Dukagjin Maloku wrote a blog article about the the two stored procedures in March of this year.  I don’t intend to rehash what he has covered.  What I intend is to look at the internals of the two stored procedures.  This will be the first of a multi-part series.

Why?

I saw a question asking how to find the table sizes for all of the tables in a database.  A few scripts were provided (one using sp_MStablespace and one using sp_spaceused).  I had been using the msTablespace version for quite some time and had never questioned it’s accuracy.  That changed when the question evolved into why do they (the two scripts) show the same results?  So, I decided I needed to figure out what is happening inside both procedures to see if I could determine a reason for the difference.  I will be comparing these on SQL Server 2008.

Comparison

sp_spaceused

This stored procedure has been mostly updated to use the new DMVs available since SQL 2005.  It currently relies heavily on sys.dm_db_partition_stats.  This procedure determines the size of the indexes by subtracting the number of pages from the number of used pages.  The idea here is that the number of Data Pages will be less than the total number of used pages.  This stored procedure does still rely on a soon to be deprecated object called sysfiles.  The replacement for that object would be sys.database_files.  The functionality would be somewhat different, in regards to how it is used in this procedure – the results are the same.  Despite that, this proc does not look to be on the replacement block any time soon.

sp_MStablespace

This stored procedure on the contrary does look like it will be deprecated, though I was unable to find information to back that up.  When delving into the procedure, one can quickly spot that it relies on sysindexes and sysobjects for much of its heavy lifting.  The sizing information will be coming from sys.partitions, sys.allocation_units and sysindexes.  This procedure uses sysindexes mostly for that sizing information and then calculates the data size based on clustered indexes and heaps.  Then to calculate the remaining index size, it subtracts that size from the total for all of the indexes on the specified table.  The math is a bit different and can be the reason for the difference.  This procedure seems to be in need of a decent sized overhaul if expected to not be deprecated.

Findings

After looking at the code and then running some tests with each of these procedures, I was able to corroborate the claim that they don’t return the same results.  With that in hand, I decided to take a stab at writing something different to help create a consistent return.  Thus I will be sharing some queries I have been working on for each of these.  The desired goal is that both would return the same results.  Based on the consistency of information between the DMVs and the necessary (and some bonus) elements for these queries, that may be a bit more difficult to do.  For the sp_MSTablespace query, the crux of the query will be the sys.dm_db_index_physical_stats function, while the sp_spaceused query may continue to rely on sys.dm_db_partition_stats or just sys.partitions.  Both queries may rely on a combination of these latter two DMVs, tune in to find out.

BLOB Index Columns

Tags: , , ,
Comments: 1 Comment
Published on: April 27, 2010

For the April T-SQL Tuesday, I blogged about a report to find information on the BLOBs in a database.  I have since seen a request to add to that script.  The addition would add some good information concerning the columns involved in a BLOB index.  This information is to find all of the columns that are involved in the index that includes a BLOB in the index.

Base Script

In that article I posted a script to help arrive at the final report.  There were a couple of things required for the setup.  I am including all of that information here in a single script.

CREATE TABLE #indstats (
        indstatsid INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
        ,database_id BIGINT
        ,index_id BIGINT
        ,IndexSizeMB DECIMAL(16,1)
        ,OBJECT_ID	BIGINT
);
 
INSERT INTO #indstats (database_id,index_id,OBJECT_ID,IndexSizeMB)
        SELECT database_id,index_id,OBJECT_ID
                ,CONVERT(DECIMAL(16,1)
                ,(SUM(ps.avg_record_size_in_bytes * ps.record_count) / (1024.0 * 1024))) AS IndexSizeMB
        FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
        GROUP BY database_id,OBJECT_ID,index_id;
 
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
	,TableName = OBJECT_NAME(p.OBJECT_ID)
        ,IndexName = i.name
        ,LOBUsedPages = a.used_pages
        ,LOBTotalPages = a.total_pages
        ,LOBDataSizeMB = a.used_pages * 8/1024
        , ps.IndexSizeMB
        , (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
        , (us.user_updates) AS UserUpdates
        , us.last_user_update AS LastUpdate
        , CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
                / CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
        ,a.type_desc AS AllocUnitType
        FROM sys.allocation_units a
                INNER Join sys.partitions p
                        ON p.partition_id = a.container_id
                        And a.type = 2                                  --LOB data is stored in pages of type Text/Image
                LEFT Outer Join sys.dm_db_index_usage_stats us
					ON us.OBJECT_ID = p.OBJECT_ID
                        And us.index_id = p.index_id
                        And us.database_id = DB_ID()
                LEFT Outer Join #indstats ps
                        ON us.index_id = ps.index_id
                        And us.database_id = ps.database_id
                        And us.OBJECT_ID = ps.OBJECT_ID
                LEFT Outer Join sys.indexes i
                        ON i.OBJECT_ID = p.OBJECT_ID
                        And i.index_id = p.index_id
        WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
                --And a.data_pages > 0
                --And filegroup_name(a.data_space_id) = 'Primary'
UNION
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
        ,TableName = OBJECT_NAME(p.OBJECT_ID)
        ,IndexName = i.name
        ,LOBUsedPages = a.used_pages
        ,LOBTotalPages = a.total_pages
        ,LOBDataSizeMB = a.used_pages * 8/1024
        , ps.IndexSizeMB
        , (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
        , (us.user_updates) AS UserUpdates
        , us.last_user_update AS LastUpdate
        , CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
                / CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
        ,a.type_desc AS AllocUnitType
        FROM sys.allocation_units a
                INNER Join sys.partitions p
                        ON p.hobt_id = a.container_id
                        And a.type = 3                                  --Overflow data is stored in pages of type Text/Image
                LEFT Outer Join sys.dm_db_index_usage_stats us
                        ON us.OBJECT_ID = p.OBJECT_ID
                        And us.index_id = p.index_id
                        And us.database_id = DB_ID()
                LEFT Outer Join #indstats ps
                        ON us.index_id = ps.index_id
                        And us.database_id = ps.database_id
						And us.OBJECT_ID = ps.OBJECT_ID
                LEFT Outer Join sys.indexes i
						ON i.OBJECT_ID = p.OBJECT_ID
						And i.index_id = p.index_id
        WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
                --And filegroup_name(a.data_space_id) = 'Primary'
                --And a.data_pages > 0
        ORDER BY TableName ASC,a.type_desc;
 
GO
 
DROP TABLE #indstats
GO

In this script, I made a slight alteration from the article I posted.  In that article, I somehow missed a change to the script I had been testing.  That change is in the Temp table that I created (to properly support the Join statements on each side of the Union Select statement).  I simply added the object_id.

The Change

CREATE TABLE #indstats (
        indstatsid INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
        ,database_id BIGINT
        ,index_id BIGINT
        ,IndexSizeMB DECIMAL(16,1)
        ,OBJECT_ID	BIGINT
);
 
INSERT INTO #indstats (database_id,index_id,OBJECT_ID,IndexSizeMB)
	SELECT database_id,index_id,OBJECT_ID
		,CONVERT(DECIMAL(16,1)
		,(SUM(ps.avg_record_size_in_bytes * ps.record_count) / (1024.0 * 1024))) AS IndexSizeMB
	FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
	GROUP BY database_id,OBJECT_ID,index_id;
;
WITH IndexStuff AS (
    SELECT
      icol.OBJECT_ID
      ,i.name
      ,icol.index_id
      ,STUFF(
		(
			SELECT ',' + c.name AS [TEXT()] --text should be lower case
				FROM sys.index_columns ic
					INNER Join sys.columns c
						ON ic.OBJECT_ID = c.OBJECT_ID
						AND ic.column_id = c.column_id
				WHERE ic.OBJECT_ID = icol.OBJECT_ID
					AND ic.index_id = icol.index_id
			ORDER BY ic.key_ordinal
			FOR XML PATH('')
	    ), 1, 1, '') AS ColList
	FROM sys.index_columns icol
		INNER Join sys.indexes i
			ON icol.OBJECT_ID = i.OBJECT_ID
			AND icol.index_id = i.index_id
    GROUP BY icol.OBJECT_ID, i.name, icol.index_id
)
 
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
		,TableName = OBJECT_NAME(p.OBJECT_ID)
		,IndexName = i.name
		,LOBUsedPages = a.used_pages
		,LOBTotalPages = a.total_pages
		,LOBDataSizeMB = a.used_pages * 8/1024
		, ps.IndexSizeMB
		, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
		, (us.user_updates) AS UserUpdates
		, us.last_user_update AS LastUpdate
		, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
			/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL)
				AS RatioRequestsToUpdates
		,a.type_desc AS AllocUnitType
		,s.collist	--Added for Column Output
	FROM sys.allocation_units a
		INNER Join sys.partitions p
			ON p.partition_id = a.container_id
			And a.type = 2
		LEFT Outer Join sys.dm_db_index_usage_stats us
			ON us.OBJECT_ID = p.OBJECT_ID
			And us.index_id = p.index_id
			And us.database_id = DB_ID()
		LEFT Outer Join #indstats ps
			ON us.index_id = ps.index_id
			And us.database_id = ps.database_id
			And us.OBJECT_ID = ps.OBJECT_ID
		LEFT Outer Join sys.indexes i
			ON i.OBJECT_ID = p.OBJECT_ID
			And i.index_id = p.index_id
		LEFT Outer Join IndexStuff s
			ON s.OBJECT_ID = i.OBJECT_ID
			And s.index_id = i.index_id
	WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
UNION
	SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
		,TableName = OBJECT_NAME(p.OBJECT_ID)
		,IndexName = i.name
		,LOBUsedPages = a.used_pages
		,LOBTotalPages = a.total_pages
		,LOBDataSizeMB = a.used_pages * 8/1024
		, ps.IndexSizeMB
		, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
		, (us.user_updates) AS UserUpdates
		, us.last_user_update AS LastUpdate
		, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
			/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL)
				AS RatioRequestsToUpdates
		,a.type_desc AS AllocUnitType
		,s.ColList	--Added for Column output
	FROM sys.allocation_units a
		INNER Join sys.partitions p
			ON p.hobt_id = a.container_id
			And a.type = 3
		LEFT Outer Join sys.dm_db_index_usage_stats us
			ON us.OBJECT_ID = p.OBJECT_ID
			And us.index_id = p.index_id
			And us.database_id = DB_ID()
		LEFT Outer Join #indstats ps
			ON us.index_id = ps.index_id
			And us.database_id = ps.database_id
			And us.OBJECT_ID = ps.OBJECT_ID
		LEFT Outer Join sys.indexes i
			ON i.OBJECT_ID = p.OBJECT_ID
			And i.index_id = p.index_id
		LEFT Outer Join IndexStuff s
			ON s.OBJECT_ID = i.OBJECT_ID
			And s.index_id = i.index_id
	WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
		--And filegroup_name(a.data_space_id) = 'Primary'
		--And a.data_pages > 0
	ORDER BY TableName ASC,a.type_desc;
 
GO
 
DROP TABLE #indstats
GO

The guts of the change to add in the columns for this script comes with the following segment of code.

WITH IndexStuff AS (
    SELECT
      icol.OBJECT_ID
      ,i.name
      ,icol.index_id
      ,STUFF(
		(
			SELECT ',' + c.name AS [TEXT()] --text should be lower case
				FROM sys.index_columns ic
					INNER Join sys.columns c
						ON ic.OBJECT_ID = c.OBJECT_ID
						AND ic.column_id = c.column_id
				WHERE ic.OBJECT_ID = icol.OBJECT_ID
					AND ic.index_id = icol.index_id
			ORDER BY ic.key_ordinal
			FOR XML PATH('')
	    ), 1, 1, '') AS ColList
	FROM sys.index_columns icol
		INNER Join sys.indexes i
			ON icol.OBJECT_ID = i.OBJECT_ID
			AND icol.index_id = i.index_id
    GROUP BY icol.OBJECT_ID, i.name, icol.index_id
)

With that snippet, I also needed to Join it to the select statements, and thus it was added on both sides of the Union statement as illustrated in the next example.

LEFT Outer Join IndexStuff s
			ON s.OBJECT_ID = i.OBJECT_ID
			And s.index_id = i.index_id

In the above CTE, you will note that I used the STUFF function along with a FOR XML Path statement.  The combination of these statements allows one to concatenate a list into a comma separated list as I have done with the ColList (column list) column.  I also want to note here that I am using the text() keyword along with the FOR XML Path.  There are other methods of returning information back to the screen when using FOR XML Path.  I chose to use the text() in this case because I am just returning a concatenated list of columns that really only should read as text.  If I were returning a SQL statement, I would choose a different method to make the text more readable.

Conclusion

I chose to make this subtle change via the CTE due to the ease of understanding and readability of the code for me.  By illustrating the columns involved in an index that is on a BLOB column, one can gain greater insight into the use of the database.  I am glad that this change was requested because it makes sense to me.

I hope you find it useful.

Edit: Made a correction to the first script

Index Stats Duplication

Tags: ,
Comments: 2 Comments
Published on: April 21, 2010

I came across a recent posting about seeing multiple entries in sys.dm_db_index_usage_stats for the same index.  This kind of behavior can be somewhat concerning since that view should have a unique entry for each index.  As described by Microsoft, this view

“Returns counts of different types of index operations and the time each type of operation was last performed.”

You can read more about it here.  Further in the same document, one will read

“When an index is used, a row is added to sys.dm_db_index_usage_stats if a row does not already exist for the index. When the row is added, its counters are initially set to zero.”

Thus, I started checking to see if I could reproduce the same sort of results.  While investigating, I determined that I should Join that view to the sys.indexes view.  Since the sys.dm_db_index_usage_stats view does not contain an Index Name, I felt it necessary to be able to see the name and not just a number.

My first run at the query to try and reproduce the results is displayed below.

SELECT DB_NAME(s.database_id) AS DBNAME, OBJECT_NAME(s.OBJECT_ID) AS ObjectName
	, i.name AS IndexName
	,user_seeks, user_scans,user_lookups,user_updates,system_seeks, system_scans
	FROM sys.dm_db_index_usage_stats s
		INNER Join sys.indexes i
			ON i.OBJECT_ID = s.OBJECT_ID
			And i.index_id = s.index_id
			And s.database_id = DB_ID()
	WHERE OBJECTPROPERTY(s.OBJECT_ID,'ISMSShipped') = 0
	GROUP BY s.database_id,s.OBJECT_ID,i.name,s.index_id, user_seeks, user_scans
		,user_lookups,user_updates,system_seeks, system_scans

Well this appears to produce results that are consistent with the description offered by Microsoft.  So I wanted to verify and added a windowing function into the mix.

SELECT DB_NAME(s.database_id) AS DBNAME, OBJECT_NAME(s.OBJECT_ID) AS ObjectName
	, i.name AS IndexName
	,user_seeks, user_scans,user_lookups,user_updates,system_seeks, system_scans
	,Row_number() OVER (partition BY s.OBJECT_ID,s.index_id ORDER BY s.index_id) AS RowNum
	FROM sys.dm_db_index_usage_stats s
		INNER Join sys.indexes i
			ON i.OBJECT_ID = s.OBJECT_ID
			And i.index_id = s.index_id
			And s.database_id = DB_ID()
	WHERE OBJECTPROPERTY(s.OBJECT_ID,'ISMSShipped') = 0
	GROUP BY s.database_id,s.OBJECT_ID,i.name,s.index_id, user_seeks, user_scans
		,user_lookups,user_updates,system_seeks, system_scans

You will notice the line with Row_Number().  This gives me a count of each time that IndexName appears in the system view.  Well, now having verified further, I still have results consistent with the documentation from Microsoft.  I decided to work backwards from this script (yeah I know).  I will reduce the number of fields I have returned and try to get a duplication on a base level and then re-expand the fields being gathered.  The next stab at this looked like the following script.

SELECT DB_NAME(s.database_id) AS DBNAME,s.database_id, OBJECT_NAME(s.OBJECT_ID) AS ObjectName, i.name AS IndexName
	,Row_number() OVER (partition BY s.OBJECT_ID,s.index_id ORDER BY s.index_id) AS RowNum
	FROM sys.dm_db_index_usage_stats s
		INNER Join sys.indexes i
			ON i.OBJECT_ID = s.OBJECT_ID
			And i.index_id = s.index_id
			And s.database_id = DB_ID()
	WHERE OBJECTPROPERTY(s.OBJECT_ID,'ISMSShipped') = 0

So for this revision, I simply removed any of the counter fields and the group by clause.  I was hoping that my Group By would reveal the answer.  Lo and behold, I am still unable to reproduce the results.  Ok, time to trim a little bit more as I work backward trying to create the same results.

SELECT OBJECT_NAME(s.OBJECT_ID) AS ObjectName, i.name AS IndexName
	,Row_number() OVER (partition BY s.OBJECT_ID,s.index_id ORDER BY s.index_id) AS RowNum
	FROM sys.dm_db_index_usage_stats s
		INNER Join sys.indexes i
			ON i.OBJECT_ID = s.OBJECT_ID
			And i.index_id = s.index_id
	WHERE OBJECTPROPERTY(s.OBJECT_ID,'ISMSShipped') = 0

This time, I removed a Join condition of s.database = db_id() and the database_id and DatabaseName.  When I run the query now, I start to see the same results.  So the immediate indicator to me is that there is a duplication of indexes but not within the same database.  The duplication appears across different databases.  To verify, I know need to expand my query just a bit.

SELECT DB_NAME(s.database_id) AS DBNAME,s.database_id, OBJECT_NAME(s.OBJECT_ID) AS ObjectName, i.name AS IndexName
	,Row_number() OVER (partition BY s.OBJECT_ID,s.index_id ORDER BY s.index_id) AS RowNum
	FROM sys.dm_db_index_usage_stats s
		INNER Join sys.indexes i
			ON i.OBJECT_ID = s.OBJECT_ID
			And i.index_id = s.index_id
	WHERE OBJECTPROPERTY(s.OBJECT_ID,'ISMSShipped') = 0

Here, I was able to verify that the places where RowNum did not equal one – I had an index in a different database by the same name.  Not only is the index name the same, but the Object holding that index is also the same in the other database.  Now, if I want, I can expand my query all the way back to the original query knowing that I verified that the entries in sys.dm_db_index_usage_stats are unique for each index.  The conclusion I would draw from this exercise is that a query can have an impact on the results displayed.  One should check and then double check to make sure the results jive.  Try the query a few different ways and see if the results are consistent.  If not, what factor in the query was changed that changed your results?

Or is it?

While exploring this further I decided to verify some of the objects in different databases.  For instance, I might see the object being reported in the ReportServer database, and then also in msdb and then again in another database.  But the object only exists in ReportServer.  Why is this happening?  My thoughts on this are currently just an educated guess.  It appears that a connection is established in one database and then makes a call to an object in a second database and causing a use against the actual index in the call to the second database.  The record gets inserted into the metadata tables that this view pulls from with each database that was touched – but each time with the object_id and index_id of the second database.  Does that make sense?

Index Info

Categories: News, Professional, Scripts
Tags: , ,
Comments: 3 Comments
Published on: March 31, 2010

I recently came across an interesting script to help show index info and usage.  As I began to dissect the script, I noticed that the script was intended for SQL 2005 and beyond but was using SQL 2000 compatibility views.  I decided I would take it upon myself to update the script to use the 2005 objects and not the compatibility objects.

The design of the script is to give a basic view of the usage of the index and then try to evaluate the cost of that index.  If the index updates far outweigh the queries that use it, then the index may not be a useful index.  Furthermore, the index cost probably outweighs its use.

The original query employed the use of dbo.sysobjects and dbo.sysindexes.  The reason for the use of dbo.sysindexes was merely to calculate the size of the index.  The use of dbo.sysobjects seemed like an oversight.  The area that I ran into a sticky point was with calculating the size of the index.  The problem was in figuring out a method to execute the entire query and have it perform as well as the original query.  The first attempt yielded a query that was absolutely abysmal.

WITH EqColumns AS (
SELECT OBJECT_ID,index_id,COUNT(index_id) AS IndCount
	FROM sys.index_columns
	WHERE is_included_column = 0
	GROUP BY OBJECT_ID,Index_id
), IncludeColumns AS (
SELECT OBJECT_ID,index_id,COUNT(index_id) AS IndCount
	FROM sys.index_columns
	WHERE is_included_column = 1
	GROUP BY OBJECT_ID,Index_id
)
 
SELECT OBJECT_NAME(a.OBJECT_ID) AS TableName
	, c.name AS IndexName
	, c.type_desc IndexType
	, CASE c.is_unique
		WHEN 1 THEN
			CASE is_primary_key
				WHEN 1 THEN 'Primary Key'
				ELSE 'Unique'
			END
		ELSE
			CASE c.is_unique_cOnstraint
				WHEN 1 THEN 'Unique Constraint'
				ELSE 'Performance'
			END
		END 'IndexUsage'
	, e.name 'FileGroup'
	, (	SELECT CONVERT(DECIMAL(16,1),(SUM(ps.avg_record_size_in_bytes * ps.record_count) / (1024.0 * 1024)))
	FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
	WHERE ps.database_id = a.database_id and ps.index_id = a.index_id)  AS IndexSizeMB
	, isnull(ec.IndCount,0) AS EqualityColumns
	, isnull(ic.IndCount,0) AS IncludedColumns
	, (a.user_seeks + a.user_scans + a.user_lookups) AS UserRequests
	, (a.user_updates) AS UserUpdates
	, a.last_user_update AS LastUpdate
	, CAST(a.user_seeks + a.user_scans + a.user_lookups AS REAL) / CAST(CASE a.user_updates WHEN 0 THEN 1 ELSE a.user_updates END AS REAL) AS RatioRequestsToUpdates
	FROM sys.dm_db_index_usage_stats a
		--Inner Join #indstats ps
		--	On a.index_id = ps.index_id
		--	And a.database_id = ps.database_id
		INNER Join sys.objects AS o
			ON (a.OBJECT_ID = o.OBJECT_ID)
		INNER Join sys.indexes AS c
			ON (a.OBJECT_ID = c.OBJECT_ID and a.index_id = c.index_id)
		INNER Join sys.filegroups e
			ON c.data_space_id = e.data_space_id
		LEFT Outer Join EqColumns ec
			ON a.OBJECT_ID = ec.OBJECT_ID
			And a.index_id = ec.index_id
		LEFT Outer Join IncludeColumns ic
			ON a.OBJECT_ID = ic.OBJECT_ID
			And a.index_id = ic.index_id
	WHERE o.type = 'U'							-- exclude system tables
		And c.type <> 0							-- exclude HEAPs
		And c.is_disabled = 0					-- Only active indexes
		And a.database_id = DB_ID()				-- for current database Only
	ORDER BY RatioRequestsToUpdates DESC

The method I am using to calculate the size whilst using the SQL 2005 objects was to use the function sys.dm_db_index_physical_stats.  I employed this directly as a subquery in the above posted query.  This version takes in excess of 7 minutes to return results.  I moved on from it to find a better method.

The next attempt was to move the subquery into a CTE.

WITH indexstats AS (
SELECT database_id,index_id,CONVERT(DECIMAL(16,1),(SUM(ps.avg_record_size_in_bytes * ps.record_count) / (1024.0 * 1024))) AS IndexSizeMB
	FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
GROUP BY database_id,index_id
), EqColumns AS (
SELECT OBJECT_ID,index_id,COUNT(index_id) AS IndCount
	FROM sys.index_columns
	WHERE is_included_column = 0
	GROUP BY OBJECT_ID,Index_id
), IncludeColumns AS (
SELECT OBJECT_ID,index_id,COUNT(index_id) AS IndCount
	FROM sys.index_columns
	WHERE is_included_column = 1
	GROUP BY OBJECT_ID,Index_id
)
 
SELECT OBJECT_NAME(a.OBJECT_ID) AS TableName
	, c.name AS IndexName
	, c.type_desc IndexType
	, CASE c.is_unique
		WHEN 1 THEN
			CASE is_primary_key
				WHEN 1 THEN 'Primary Key'
				ELSE 'Unique'
			END
		ELSE
			CASE c.is_unique_cOnstraint
				WHEN 1 THEN 'Unique Constraint'
				ELSE 'Performance'
			END
		END 'IndexUsage'
	, e.name 'FileGroup'
	, ps.IndexSizeMB
	, isnull(ec.IndCount,0) AS EqualityColumns
	, isnull(ic.IndCount,0) AS IncludedColumns
	, (a.user_seeks + a.user_scans + a.user_lookups) AS UserRequests
	, (a.user_updates) AS UserUpdates
	, a.last_user_update AS LastUpdate
	, CAST(a.user_seeks + a.user_scans + a.user_lookups AS REAL) / CAST(CASE a.user_updates WHEN 0 THEN 1 ELSE a.user_updates END AS REAL) AS RatioRequestsToUpdates
	FROM sys.dm_db_index_usage_stats a
		INNER Join indexstats ps
			ON a.index_id = ps.index_id
			And a.database_id = ps.database_id
		INNER Join sys.objects AS o
			ON (a.OBJECT_ID = o.OBJECT_ID)
		INNER Join sys.indexes AS c
			ON (a.OBJECT_ID = c.OBJECT_ID and a.index_id = c.index_id)
		INNER Join sys.filegroups e
			ON c.data_space_id = e.data_space_id
		LEFT Outer Join EqColumns ec
			ON a.OBJECT_ID = ec.OBJECT_ID
			And a.index_id = ec.index_id
		LEFT Outer Join IncludeColumns ic
			ON a.OBJECT_ID = ic.OBJECT_ID
			And a.index_id = ic.index_id
	WHERE o.type = 'U'							-- exclude system tables
		And c.type <> 0							-- exclude HEAPs
		And c.is_disabled = 0					-- Only active indexes
		And a.database_id = DB_ID()				-- for current database Only
	ORDER BY RatioRequestsToUpdates DESC

This version returns in about thirty seconds.  Still not acceptable but substantially better.  The original query returned in roughly three seconds.  It is also important to note that the accuracy of both queries is equal.  Since both return the same result set, the only focus is on performance.  The delay in this query completely revolves around the use of the function.  Armed with the knowledge that the real delay is in the use of that function, there needs to be alternative method to pull back that information.  I decided to use a temp table to stage the data.  Using this temp table I could test directly the efficiency of the query.

CREATE TABLE #indstats (indstatsid INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,database_id BIGINT,index_id BIGINT,IndexSizeMB DECIMAL(16,1))
 
INSERT INTO #indstats (database_id,index_id,IndexSizeMB)
	SELECT database_id,index_id,CONVERT(DECIMAL(16,1),(SUM(ps.avg_record_size_in_bytes * ps.record_count) / (1024.0 * 1024))) AS IndexSizeMB
	FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
	GROUP BY database_id,index_id
;
WITH EqColumns AS (
SELECT OBJECT_ID,index_id,COUNT(index_id) AS IndCount
	FROM sys.index_columns
	WHERE is_included_column = 0
	GROUP BY OBJECT_ID,Index_id
), IncludeColumns AS (
SELECT OBJECT_ID,index_id,COUNT(index_id) AS IndCount
	FROM sys.index_columns
	WHERE is_included_column = 1
	GROUP BY OBJECT_ID,Index_id
)
 
SELECT OBJECT_NAME(a.OBJECT_ID) AS TableName
	, c.name AS IndexName
	, c.type_desc IndexType
	, CASE c.is_unique
		WHEN 1 THEN
			CASE is_primary_key
				WHEN 1 THEN 'Primary Key'
				ELSE 'Unique'
			END
		ELSE
			CASE c.is_unique_cOnstraint
				WHEN 1 THEN 'Unique Constraint'
				ELSE 'Performance'
			END
		END 'IndexUsage'
	, e.name 'FileGroup'
	, ps.IndexSizeMB
	, isnull(ec.IndCount,0) AS EqualityColumns
	, isnull(ic.IndCount,0) AS IncludedColumns
	, (a.user_seeks + a.user_scans + a.user_lookups) AS UserRequests
	, (a.user_updates) AS UserUpdates
	, a.last_user_update AS LastUpdate
	, CAST(a.user_seeks + a.user_scans + a.user_lookups AS REAL) / CAST(CASE a.user_updates WHEN 0 THEN 1 ELSE a.user_updates END AS REAL) AS RatioRequestsToUpdates
	FROM sys.dm_db_index_usage_stats a
		INNER Join #indstats ps
			ON a.index_id = ps.index_id
			And a.database_id = ps.database_id
		INNER Join sys.objects AS o
			ON (a.OBJECT_ID = o.OBJECT_ID)
		INNER Join sys.indexes AS c
			ON (a.OBJECT_ID = c.OBJECT_ID and a.index_id = c.index_id)
		INNER Join sys.filegroups e
			ON c.data_space_id = e.data_space_id
		LEFT Outer Join EqColumns ec
			ON a.OBJECT_ID = ec.OBJECT_ID
			And a.index_id = ec.index_id
		LEFT Outer Join IncludeColumns ic
			ON a.OBJECT_ID = ic.OBJECT_ID
			And a.index_id = ic.index_id
	WHERE o.type = 'U'							-- exclude system tables
		And c.type <> 0							-- exclude HEAPs
		And c.is_disabled = 0					-- Only active indexes
		And a.database_id = DB_ID()				-- for current database Only
	ORDER BY RatioRequestsToUpdates DESC
 
DROP TABLE #indstats

The population of the temp table consumes 97% of the query cost.  The prior version of the query also showed a cost of 97% associated directly with the function.  If I populate the temp table prior to the execution of the remainder of the script, then I can compare performance of this script versus the original.  When comparing, this version runs in about 1 second.

The next steps from here would be to employ a method that populates a table in an “Admin” database.  By using a pre-populated table I can maintain good performance to gain the information concerning Index Usage.

Index your FK

Categories: News, Professional, Scripts
Comments: No Comments
Published on: March 4, 2010

A few months ago I read an article from SQLServerCentral.com about some Foreign Key gotchas.  Since that article, I have seen several people asking about whether a foreign key should be indexed or not.  I have seen the benefits of having FKs indexed as well as the agony of not indexing them.  The article discusses some of the benefits.  In summary, it will help to avoid deadlocks, reduce locks, and improve performance.  My intent is not to go into depth on the article, and to focus on a couple of scripts to help identify which tables need indexes on the foreign keys.  This article is a part of my Key documentation series.

If you browse the internet, you will likely be able to find several scripts to accomplish the same goal.  There is one script that I have seen that will generate the script for you to create the index on the foreign key.  Many scripts are similar – some perform slightly different things.  I thought about evaluating some of the scripts and comparing them.  I have found some to be less accurate than others.  I decided after some thought that I would just share my scripts.  I will show the evolution of the script that I now use.

Rev 1

This was just to hammer out a solution that would get me the results I sought.  This version uses some objects that are to be deprecated.  Thus should only really be used on Servers that are SQL 2000.

SELECT o.name AS TableName,c.name AS FKColumn FROM sys.sysreferences r
	INNER Join sys.syscolumns c
		ON r.fkeyid = c.id
		and r.fkey1 = c.colid
	INNER Join sys.sysobjects o
		ON c.id = o.id
	LEFT Outer Join sys.sysindexkeys k
		ON c.id = k.id
		And c.colid = k.colid
	WHERE k.id IS null
	ORDER BY o.name

This works fine and with limitations.  Obviously there was the limitation of using objects that are scheduled to be deprecated.  The second limitation is that it is not 100% accurate.  This script does not pull all of the Foreign Keys that are missing Indexes.  This is the same sort of issue that I found with several scripts on the internet.  That said, this query is rather efficient.  When compared to future versions, it was 8% less costly and about 150ms faster.  However, it does consume more disk IO and more CPU time.

With these limitations in mind, I decided to create a second query.

Rev 2

SELECT
	OBJECT_NAME(FK.parent_object_id) AS FKTable
	,OBJECT_NAME(FK.constraint_object_id) AS FKName
	,C.name FKColumn
FROM sys.foreign_key_columns FK
	INNER JOIN sys.columns C
		ON FK.parent_object_id = C.OBJECT_ID
		AND FK.Parent_column_id = C.column_id
	INNER Join sys.objects O
		ON FK.parent_object_id = o.OBJECT_ID
	LEFT OUTER JOIN sys.index_columns ic
		ON FK.parent_object_id = ic.OBJECT_ID
		AND FK.Parent_column_id = ic.column_id
WHERE ic.OBJECT_ID IS null

This query is less limited than the first.  Here, I have improved the performance substantially over the first query and the execution plan cost is lower.  I have implemented the use of SQL 2005 objects, decreased Disk IO, and decreased run time.  CPU time remains about the same as the previous example.  Still one limitation that is pretty glaring.  I am not returning all Foreign Keys that are missing an index.

Moving onto attempt number three.

Rev 3

As I was testing version three of this query, I noticed there was another item that I needed.  I ran the tests in a database with multiple schemas and had difficulty locating the tables that were in the list.  When looking at the expanded view in SSMS, the tables are grouped by schema and listed alphabetically within that schema.  When I realized where the table was, it dawned on me the need to include the schema in the query results.  Adding this little piece of information will save a few seconds when trying to verify the information in the report.

SELECT
	schema_name(o.schema_id) + '.' + OBJECT_NAME(FK.parent_object_id) AS FKtable
	,OBJECT_NAME(FK.constraint_object_id) AS FKName
	,C.name FKColumn
FROM sys.foreign_key_columns FK
	INNER JOIN sys.columns C
		ON FK.parent_object_id = C.OBJECT_ID
		AND FK.Parent_column_id = C.column_id
	INNER Join sys.objects O
		ON FK.parent_object_id = o.OBJECT_ID
	--Inner Join sys.schemas s
		-- On s.schema_id = o.schema_id
	LEFT OUTER JOIN sys.index_columns ic
		ON FK.parent_object_id = ic.OBJECT_ID
		AND FK.Parent_column_id = ic.column_id
		And ic.index_column_id = FK.constraint_column_id
WHERE ic.OBJECT_ID IS null
ORDER BY FKtable

In the code, note that I have opted to use the schema_name() function rather than use the Join on the sys.schemas view.  By using the function, the script performs a little better.  There is another addition I made to the script.

LEFT OUTER JOIN sys.index_columns ic
		ON FK.parent_object_id = ic.OBJECT_ID
		AND FK.Parent_column_id = ic.column_id
		And ic.index_column_id = FK.constraint_column_id

An additional condition was added to the Join.  With the addition of this condition, I am now able to pull back the additional foreign keys that are missing indexes.  In addition to this additional condition, I have added an Order By, which will slow it down a bit.  The offset to the decrease there is that the data is more manageable for verification.

Conclusion

There are many methods to arrive at this data.  The difference in performance between the methods I wrote and evolved was negligible.  The most important concept in the end is having a solution that is accurate and timely to provide this data.  All of these solutions finish in under a second for me.  None of them place a significant impact on the server, and thus I can choose for accuracy over performance in this instance.

This exercise is a part of getting to know your database.  Finding and documenting is very helpful when it comes time to troubleshoot.  Once this information is garnered, one needs to evaluate whether or not to place an index on that foreign key.

SQLFool IDX Defrag Scripts

Categories: News, Professional, Scripts
Comments: No Comments
Published on: January 26, 2010

I just found out that Michelle Ufford (@sqlfool) has updated her Index Defrag scripts.  Right now she is looking for Beta testers.  Anybody that is interested could find more info at:

http://sqlfool.com/2010/01/index-defrag-script-updates-beta-testers-needed/

Indexes Out the Window II

Categories: News, Professional
Comments: 2 Comments
Published on: January 25, 2010

In Part I, I discussed some of the peculiarities and troubleshooting done in relation to a peculiar execution plan.  In this post, I will discuss some of what I did to further tune the query.

First, a little more background into the design of this query.  This query uses some dynamic SQL to try and build a dynamic where clause based on the value passed to a single parameter.  I need to be able to create three different where clauses that could essentially create a range-scan on a date field.  In two of the three queries could be an inequality date comparison rather than an equality comparison.  The specific requirements are to have the report return 1 of the following: 1) 2010 data only, 2) < 2010 data, and 3) all data.  This was initially accomplished with a query similar to:

Case when convert(varchar(4), @RecordDate, 112) = '2010' then ' convert(varchar(4), RecordDate, 112) = ''2010'''
	when convert(varchar(4), @RecordDate, 112) < '2010' then ' isnull(RecordDate,''1/1/2000'') < ''1/1/2010'''
	else ' isnull(RecordDate,''1/1/2001'') > ''1/1/2000'''
End

One of the first comments might be, “That query is not SARG-able.”  True it is not a SARG query.  That will be the last item to be evaluated.  The slowness of the query in this case has less to do with the SARG-ability of it than it does with the inequality scan.  The puzzle in the query was “How to make the query an equality comparison” rather than an < or > comparison.  The above query uses the indexes as discussed in the prior post – it is just not as fast as it could be.

It is at this point that I am reminded of an integral part of my last two presentations at my local PASS meetings.  And it dawns on me that an understanding of the data is truly necessary.  Most of the RecordDate values are Null since there was no backfill on the Column after it’s implementation.  Prior to it’s implementation there was no Date control on the table.  The table is in a warehouse and only contains data since the beginning of 2009.  It is also necessary to understand that there is a minimal chance that there will be record additions to the table after 2010.  Thus we now have a method to be able to get the equality comparisons.  The following might satisfy this need:

Next Rev   
Case when convert(varchar(4), @RecordDate, 112) = '2010' then ' convert(varchar(4), RecordDate, 112) = ''2010'''
	when convert(varchar(4), @RecordDate, 112) < '2010' then ' dateadd(yy, datediff(yy, 0, isnull(RecordDate,''1/1/2009'')), 0) = ''1/1/2009'' '
	else ' isnull(RecordDate,''1/1/2009'') = isnull(RecordDate,''1/1/2009'') '
End

This change is now using a Clustered Index seek.  This clustered Index seek is the most costly piece of the Execution plan.  However, the overall runtime of the query has improved by 1 sec.

Thinking I could still improve the query, I decided that a new column called ReportYear would help out the cause.  The change would allow for the query to be SARG-able, the column would have a default on it to change RecordDate to the first day of the year recorded in the RecordDate field.  I figured this idea would also still permit me to do an Eqaulity search in my Where clause.  Let’s see the results:

Wow, I have just come full circle back to the First Post in this series.  I just lost the clustered Index seek, but I did not lose any of the performance.  To see if I gained anything by making the query SARG-able, let’s look at the IO stats between the two.

First IO Stats

Second IO Stats

Top image represents the First Query and the bottom image represents the Second Query.  We have an improvement of about 600,000 logical reads.  That is a substantial improvement.  That improvement comes even with an Index Scan on an Index that is non-covering.  Now, to tune the index.  Much the same as the first article, I added an index to the SalesOpportunity table on LastUpdate and SalesOppID.  This creates a covering index and removes the hash match join in favor of the merge join in the execution plan.  Another effect is the IO Stats as displayed below.

Second Query Index

We can see here that there was once again an improvement made in the logical reads.  Another ~10,000 logical reads improvement.  Overall, both queries run in the same amount of time – but the second query uses less resources.  Again, if I change the clustered index on this table – I am certain I would see an improvement.

I am still in search of the reason behind the use of a bad index for this query.

Execution Plans can be downloaded here.

Indexes Out the Window

Categories: News, Professional
Comments: 1 Comment
Published on: January 22, 2010

I ran into a peculiarly puzzling situation with a query I have been tuning of late.  The query runs some aggregations and performs a range scan based on a date value passed to the query.  The primary target for this query is to be consumed by reporting services.  The query executes (from a stored proc) three times in a single report.  This is another case were the report currently outperforms the previous version by several fold.  I just want it to run a little faster – it is an executive report afterall.

While tuning the query itself, I removed some of the aggregations (not consumed by the report and not needed for accuracy), and that bought me a small performance gain.  Next was to remove a couple of “or” operations.  Again, that only bought me a small performance gain.

The next step was to check my indexing.  Looking at the maintenance plan for the query, indexes were being used – or so it seemed.  Based on the execution plan, the query was performing two Index Scans prior to a Hash Match.  In the first Index Scan, the query was expecting an Output of two columns (SalesOppID and LastUpdate) from IDX_SalesOpportunity_SalesStatusIDSalesContactID.  This is depicted in the next two images.

This makes a little sense since the index does contain the LastUpdate column.  But the index is not covering, and we do not have a Key lookup.

CREATE NONCLUSTERED INDEX [IDX_SalesOpportunity_SalesStatusIDSalesContactID] ON [dbo].[SalesOpportunity]
(
	[SalesStatusID] ASC,
	[SalesContactID] ASC,
	[ProspectID] ASC,
	[CampaignID] ASC,
	[LastUpdate] ASC
) ON [PRIMARY]

Out of curiosity, I decided to see what would happen if I commented out the lines that required the LastUpdate field.  I ended up with the following:

Interesting, the query optimizer decided that we should use IDX_LeadTypeID which has the following definition:

CREATE NONCLUSTERED INDEX [IDX_LeadTypeID] ON [dbo].[SalesOpportunity]
(
	[LeadTypeID] ASC
) ON [PRIMARY]

Once again, I have a query that is using a bad index.  There is no key lookup and the index is not covering.  I find this very peculiar.  Thus I proceeded to test a little more.  I dropped that Index and reran the query.  With each group of tests, it is important to note, I cleared the ProcCache.  After dropping the index, my execution plan showed no performance increase and still showed a funny result.  The query moved onto a new index that was not covering and there was no key lookup.

Now, let’s make this a little more interesting.  I do have a Clustered Index on SalesOppID – defined as follows:

ALTER TABLE [dbo].[SalesOpportunity] ADD  CONSTRAINT [PK_SalesOpportunity] PRIMARY KEY CLUSTERED
(
	[SalesOppID] ASC
)WITH (
	PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF
	, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90
) ON [PRIMARY]

I am puzzled as to why my query will not use the Clustered Index.  I decided to do the standard stuff – DBCC CheckDB, DBCC UpdateUsage etc.  There is no corruption, and the query still tries to perform the same way.  I could recreate the clustered index on the LastUpdate field.  Since there are Foreign Key relationships that I do not want to drop and recreate, I will just create an NC index on the appropriate two fields.  After doing that, I get the following:

The index is covering and pulling the correct columns.  With this index the query is about 3 times faster than previously.  Some of that is due to the change from a Hash Match to a Merge Join from the two tables (there is still a hash match but it is after the merge and is now less costly).

The moral is, even though an index may be used by the query – it may not be a good index choice.  I am still working on tuning this query, I think I can squeeze better performance out of it.

Execution Plans can be downloaded here.

page 1 of 1
Calendar
February 2012
M T W T F S S
« Jan    
 12345
6789101112
13141516171819
20212223242526
272829  
Follow me on Google+
Jason Brimhall

In 246 people's circles

Add to circlesi
Content
Categories

Categories

Now Reading

Now Reading

Planned books:

Current books:

  • ChiRunning: A Revolutionary Approach to Effortless, Injury-Free Running

    ChiRunning: A Revolutionary Approach to Effortless, Injury-Free Running by Danny Dreyer, Katherine Dreyer

  • Advanced Marathoning – 2nd Edition

    Advanced Marathoning – 2nd Edition by Peter Pfitzinger, Scott Douglas

  • SQL Server MVP Deep Dives

    SQL Server MVP Deep Dives by Nielsen Paul, Delaney Kalen, Machanic Adam, Tripp Kimberly, Randal Paul, Low Greg

  • A World Without Heroes (Beyonders)

    A World Without Heroes (Beyonders) by Brandon Mull

Recent books:

View full Library

SQLHelp

SQLHelp


Welcome , today is Saturday, February 4, 2012