Archives: May 2010

May UG Meeting

Categories: News, Professional
Tags:
Comments: No Comments
Published on: May 11, 2010

This is just a quick note to remind any interested parties that the May S3OLV meeting will be held this thursday (May 13, 2010).  We have a couple of presentations lined up to teach us about signing your digital certificates and also to learn about some of the DMVs in 2005 and 2008.

The meeting is held at the Learning Center (777 Rainbow) and is from 18H00 to 20H30.

We hope to see you there.

Marathon Season

Categories: Running
Tags:
Comments: No Comments
Published on: May 7, 2010

Last week I announced my desired marathon schedule.  This week the next phase is coming together nicely.  In order to be able to run the St. George marathon, you need to be selected in the lottery.

The lottery results were announced today.  I will be running the St. George marathon.  This is a nice relief and a good reinforcement to make sure training goes as needed.

SQL 2005 TableSpace

Comments: 4 Comments
Published on: May 5, 2010

In the first post of this series I highlighted and described two stored procedures that are shipped from Microsoft.  These stored procedures use different methods to display information about a table in a SQL Server database.  In this post, I want to delve a little deeper into the sp_MStablespace stored procedure.  In doing this, I plan to discover what causes the reporting difference between it and sp_spaceused.  I will also unveil a script that can be used in a Set-based fashion to return this information, unlike the RBAR method required forsp_MStablespace.  Like I stated in the first post, this analysis will be based on SQL 2008.

Problem

When one uses sp_MStablespace, one may be assuming that it will return the same information that sp_spaceused would return.  While this procedure is undocumented and less widely used, the design is that it will give table and index information for the table one provides as an input into the procedure.  In order to gain this information for more than one table (using this procedure), one must implement a looping mechanism.

Looping

I have been using a script that was passed along to me many years ago that helps to find the table information for all of the tables in a database.  I have seen a similar script passed around on the internet in several locations.  This script uses another Microsoft procedure to help loop through the tables and provide this sought-after information.  The name of that procedure is sp_msforeachtable.  Despite the looping mechanism employed, the script is pretty quick and dependable.

-- Create the temp table for further querying
CREATE TABLE #temp(
	rec_id		INT IDENTITY (1, 1),
	table_name	VARCHAR(128),
	nbr_of_rows	INT,
	data_space	DECIMAL(15,2),
	index_space	DECIMAL(15,2),
	total_size	DECIMAL(15,2),
	percent_of_db	DECIMAL(15,12),
	db_size		DECIMAL(15,2))
 
-- Get all tables, names, and sizes
EXEC sp_msforeachtable @command1="insert into #temp(nbr_of_rows, data_space, index_space) exec sp_mstablespace '?'",
			@command2="update #temp set table_name = '?' where rec_id = (select max(rec_id) from #temp)"
 
-- Set the total_size and total database size fields
UPDATE #temp
SET total_size = (data_space + index_space)/1024, db_size = (SELECT SUM(data_space + index_space)/1024 FROM #temp)
 
-- Set the percent of the total database size
UPDATE #temp
SET percent_of_db = (total_size/db_size) * 100
 
-- Get the data
SELECT *
FROM #temp
--ORDER BY nbr_of_rows DESC
ORDER BY total_size DESC
 
-- Comment out the following line if you want to do further querying
DROP TABLE #temp

The script is pretty straight forward in its design and goals.  This looping can be resolved with a set based solution that works faster in most of my test cases.

Inconsistencies

The second problem noted was that the data returned from the sp_MStablespace proc does not always match the data returned by sp_spaceused.  This is a substantially different problem to tackle.  I poured over the two procedures for hours trying to find what the problem was that was causing the difference.  My findings are somewhat surprising – they are both accurate and correct.  How can they both be correct when both return different values?  I hope to also explain that between this post and the subsequent post that will discuss sp_spaceused.

Discovery

In the first article, I mentioned some of the differences and subtleties of this stored procedure.  I also made an assumption that this procedure will be deprecated or at least massively changed.  As I discuss the internals of this procedure, I hope that assumption will be validated.

The objects employed in sp_MStablespace are sysobjects, sysindexes, sys.indexes, sys.partitions, and sys.allocation_units.  This is actually not too bad of a start for rewriting this procedure due to the use of sys.indexes, sys.partitions, and sys.allocation_units.  Let’s take a look at the use of the old and soon to be deprecated objects: sysobjects and sysindexes.

sysobjects

The role that this object plays is actually quite minor.  This object should easily be replaced by sys.objects.  The sysobjects view is being used to set the @id variable to the object id that is found in the sysobjects view.  If the tablename that was passed into the proc was not found in sysobjects, then the procedure should raise an error.

sysindexes

The real heavy lifting of this procedure is being doing through the sysindexes view.  From sysindexes, we are trying to determine the index size and the table size.  The calculation is based on dpages and used.  By summing those two fields, we will determine the size of the data in the table.  Through the used value, we will determine the size of the index by subtracting the data size from the value we get from the used value.  There are some additional qualifiers on those formulas to limit which index types will be calculated in which size calculation (i.e. datasize would be blobs, heaps and clustered indexes only).  A further calculation is used in this procedure via the use of the spt_values table.  The table is being used as a lookup to assign a pagesize value based on the value returned for the low attribute of “WINDOWS/NT.”  One could have chosen to hard code this since it is just calculating the page size and we know that a page size is ~8k.

Caveat

Now that I have explained what is happening with those old deprecated objects, the caveat is that the procedure has a bit of logic in there to check for the version of SQL Server.  Depending on the version, you could end up running the calculations based off the new objects.  However, that does not eliminate the dependency on the compatibility objects.  The same kind of logic is used to calculate the size of the data, but the indexes are still calculated via the sysindexes view.  I find myself asking “Why Bother?”  Take a look for yourself.  I don’t see this procedure being released to a SQL 2000 environment, but it was written for backward compatibility.  That seems like a waste of resources and development time to me.  Either leave it as it was or rewrite the whole thing – that seems to make more sense to me than to make this procedure backwards compatible with the release of SQL 2005.  The only thing I could imagine is that it was done with the intent of being able to be used in a Central Management server type of configuration.

Revision

I was looking to write a query that would replace the query I posted at the beginning of this article.  As I delved into this, I knew I wanted something that would return consistent results with sp_spaceused.  I wanted to have a reliable query that was quick and used the 2005 objects and DMVs.  What I ended up with was something a bit bigger than the original query.

In SQL 2005, one can find a lot of information through the use of sys.dm_db_index_physical_stats.  This function is quite handy in many situations where sysindexes was used previously.  This function can be used to help determine index size as well as fragmentation levels as a couple of examples.  For this new script, I needed to return all of the same data as the old script.  I also wanted to include some additional information that I felt was beneficial to having a good insight into the database.  So let’s break this script up again like I have shown in the past.  I am reusing some code that I have demonstrated a few times for other scripts.

Part I

/* Part I */
--Drop Table #indstats
IF exists (SELECT * FROM tempdb.sys.objects WHERE name like '%#indstats%')
BEGIN
	DROP TABLE tempdb.dbo.#indstats
END
 
BEGIN
CREATE TABLE #indstats (
         IndStatsID INT PRIMARY KEY CLUSTERED
        ,database_id BIGINT
        ,index_id BIGINT
        ,IndexSizeMB DECIMAL(16,1)
        ,OBJECT_ID	BIGINT
);
END
 
INSERT INTO #indstats (IndStatsID,database_id,index_id,OBJECT_ID,IndexSizeMB)
        SELECT Row_Number() OVER (ORDER BY OBJECT_ID) AS IndStatsID
				,database_id,index_id,OBJECT_ID
				,CONVERT(DECIMAL(19,2),(SUM(ps.page_count))) * 8 /1024 AS IndexSizeMB					--for accuracy
			FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
			GROUP BY database_id,OBJECT_ID,index_id;

I have one major departure here from how I populated this temp table in the past.  I have opted to use the windowing function Row_Number rather than an Identity value.  I made this change due to the fewer resources required to run it.  Using an Identity could require more disk IO than using the Row_Number.  I want this to run quickly and not hamper system responsiveness or system resources.  In observed testing, this change helps improve overall performance by 50%.  I tested this against SQL 2008 and SQL 2005 running different patches and on different hardware.  The overall improvement in performance was consistent.

Part II

/* Part II */
DECLARE @dbsize DECIMAL(19,2)
 
SET NOCOUNT ON
 
/*
**  Summary data.
*/
BEGIN
	SELECT @dbsize = SUM(CONVERT(DECIMAL(19,2),CASE WHEN type = 0 THEN SIZE ELSE 0 END)) * 8/1024
		FROM sys.database_files
 
END

This bit is quite simply just so I could provide the total data file size (excluding log file).  Spoiler alert:  This piece of code is also used in my new script for sp_spaceused.

Part III

/* Part III */
BEGIN
	WITH RegData AS (
		SELECT a.container_id,p.OBJECT_ID,p.index_id,us.database_id
			,FileGroupName = FILEGROUP_NAME(a.data_space_id)
			,TableName = OBJECT_NAME(p.OBJECT_ID)
			,NumRows = p.ROWS
			,UsedPages = IsNull(a.used_pages,0)
			,TotalPages = IsNull(a.total_pages,0)
			,DataSizeMB = CONVERT(DECIMAL(19,2),IsNull(a.used_pages,0)) * 8/1024
			,IndexSizeMB = CASE WHEN ps.index_id < 2 THEN 0 ELSE ps.IndexSizeMB END
			,UserRequests = IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0)
			,UserUpdates = IsNull(us.user_updates,0)
			,LastUpdate = IsNull(us.last_user_update,null)
			,RatioRequestsToUpdates = CAST(IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0) AS REAL)
				/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL)
		FROM sys.allocation_units a
			INNER Join sys.partitions p
				ON p.hobt_id = a.container_id
				And a.type = 1
			LEFT Outer Join sys.dm_db_index_usage_stats us
				ON us.OBJECT_ID = p.OBJECT_ID
				And us.index_id = p.index_id
				And us.database_id = DB_ID()
			LEFT Outer Join #indstats ps
				ON us.index_id = ps.index_id
				And us.database_id = ps.database_id
				And us.OBJECT_ID = ps.OBJECT_ID
		--WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
	)
	, LOBData AS (
		SELECT a.container_id,p.OBJECT_ID,p.index_id,us.database_id
			,FileGroupName = FILEGROUP_NAME(a.data_space_id)
			,TableName = OBJECT_NAME(p.OBJECT_ID)
			,NumRows = p.ROWS
			,UsedPages = IsNull(a.used_pages,0)
			,TotalPages = IsNull(a.total_pages,0)
			,DataSizeMB = CONVERT(DECIMAL(19,2),IsNull(a.used_pages,0)) * 8/1024
			,IndexSizeMB = CASE WHEN ps.index_id < 2 THEN 0 ELSE ps.IndexSizeMB END
			,UserRequests = IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0)
			,UserUpdates = IsNull(us.user_updates,0)
			,LastUpdate = IsNull(us.last_user_update,null)
			,RatioRequestsToUpdates = CAST(IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0) AS REAL)
				/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL)
		FROM sys.allocation_units a
			INNER Join sys.partitions p
				ON p.partition_id = a.container_id
				And a.type = 2
			LEFT Outer Join sys.dm_db_index_usage_stats us
				ON us.OBJECT_ID = p.OBJECT_ID
				And us.index_id = p.index_id
				And us.database_id = DB_ID()
			LEFT Outer Join #indstats ps
				ON us.index_id = ps.index_id
				And us.database_id = ps.database_id
				And us.OBJECT_ID = ps.OBJECT_ID
		--WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
	)
	, OverFlowData AS (
		SELECT a.container_id,p.OBJECT_ID,p.index_id,us.database_id
			,FileGroupName = FILEGROUP_NAME(a.data_space_id)
			,TableName = OBJECT_NAME(p.OBJECT_ID)
			,NumRows = p.ROWS
			,UsedPages = IsNull(a.used_pages,0)
			,TotalPages = IsNull(a.total_pages,0)
			,DataSizeMB = CONVERT(DECIMAL(19,2),IsNull(a.used_pages,0)) * 8/1024
			,IndexSizeMB = CASE WHEN ps.index_id < 2 THEN 0 ELSE ps.IndexSizeMB END
			,UserRequests = IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0)
			,UserUpdates = IsNull(us.user_updates,0)
			,LastUpdate = IsNull(us.last_user_update,null)
			,RatioRequestsToUpdates = CAST(IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0) AS REAL)
				/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL)
		FROM sys.allocation_units a
			INNER Join sys.partitions p
				ON p.hobt_id = a.container_id
				And a.type = 3
			LEFT Outer Join sys.dm_db_index_usage_stats us
				ON us.OBJECT_ID = p.OBJECT_ID
				And us.index_id = p.index_id
				And us.database_id = DB_ID()
			LEFT Outer Join #indstats ps
				ON us.index_id = ps.index_id
				And us.database_id = ps.database_id
				And us.OBJECT_ID = ps.OBJECT_ID
		--WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
	), IndexSum AS (
		SELECT a.OBJECT_ID
		,AllDataSizeMB = SUM(CASE WHEN a.index_id in (0,1)
							THEN IsNull(a.DataSizeMB,0) +  IsNull(p2.DataSizeMB,0) +  IsNull(p3.DataSizeMB,0)
							ELSE IsNull(p2.DataSizeMB,0) + IsNull(p3.DataSizeMB,0)
							END)
		FROM RegData a
			LEFT Outer Join LOBData p2
				ON p2.container_id = a.container_id
			LEFT Outer Join OverFlowData p3
				ON p3.container_id = a.container_id
		GROUP BY a.OBJECT_ID
	), SummaryInfo AS (
		SELECT
			TableName = MAX(a.TableName)
			,InRowDataSizeMB = SUM(IsNull(a.DataSizeMB,0))
			,LOBDataSizeMB = SUM(IsNull(p2.DataSizeMB,0))
			,OFlowDataSizeMB = SUM(IsNull(p3.DataSizeMB,0))
			,NumRows = MAX(COALESCE(a.NumRows,p2.NumRows,p3.NumRows,0))
			,AllUsedPages = SUM(IsNull(a.UsedPages,0) + IsNull(p2.UsedPages,0) + IsNull(p3.UsedPages,0))
			,AllPages = SUM(IsNull(a.TotalPages,0) + IsNull(p2.TotalPages,0) + IsNull(p3.TotalPages,0))
			,FreeDataSpace = CONVERT(DECIMAL(19,2),SUM(IsNull(a.TotalPages,0) + IsNull(p2.TotalPages,0) + IsNull(p3.TotalPages,0))
							- SUM(IsNull(a.UsedPages,0) + IsNull(p2.UsedPages,0) + IsNull(p3.UsedPages,0)))* 8 / 1024
			,AllDataSizeMB = MAX(ids.AllDataSizeMB)
			,IndexSizeMB = SUM(IsNull(a.IndexSizeMB,0)) + SUM(IsNull(p2.IndexSizeMB,0)) + SUM(IsNull(p3.IndexSizeMB,0))
			,UserRequests = AVG(IsNull(a.UserRequests,0) + IsNull(p2.UserRequests,0) + IsNull(p3.UserRequests,0))
			,UserUpdates = AVG(IsNull(a.UserUpdates,0) + IsNull(p2.UserUpdates,0) + IsNull(p3.UserUpdates,0))
			,LastUpdate = MAX(COALESCE(a.LastUpdate,p2.LastUpdate,p3.LastUpdate,null))
			,DatabaseSize = @dbsize
		FROM RegData a
			LEFT Outer Join LOBData p2
				ON p2.container_id = a.container_id
			LEFT Outer Join OverFlowData p3
				ON p3.container_id = a.container_id
			LEFT Outer Join sys.indexes i
				ON i.OBJECT_ID = a.OBJECT_ID
				And i.index_id = a.index_id
			LEFT Outer Join IndexSum ids
				ON i.OBJECT_ID = ids.OBJECT_ID
		--WHERE filegroup_name(a.data_space_id) = 'Primary'
		GROUP BY a.OBJECT_ID
	)
	SELECT TableName,InRowDataSizeMB,LOBDataSizeMB,OFlowDataSizeMB,NumRows,AllUsedPages
			,AllPages,FreeDataSpace,AllDataSizeMB,AllDataSizeMB,AllDataSizeMB,IndexSizeMB
			,UserRequests,UserUpdates,LastUpdate,DatabaseSize
			,PercentofDB = ((IndexSizeMB + AllDataSizeMB) / DatabaseSize) * 100
		FROM SummaryInfo
		ORDER BY PercentofDB DESC
END

Did I mention that this was lengthier than the predecessor?  I have five CTEs doing slightly different things here with this query.  The first three are gathering the information for the different Allocation Units and the fifth is summarizing that data into a single set.  The fourth cte is used to calculate the actual data size dependent on type of index employed.  I wanted to be able to display the size of the different Allocation Units so I could get a true feel for how the database looks in a single query (with regards to size and table size).  I also thought it important to be able to determine how much activity that table is seeing.  Knowing if the table is highly active, has BLOB data, and it’s size could potentially help me in determining if that table should be in a separate filegroup and on a separate LUN.

I have also thrown in some additional information that will be consistent with the new script for sp_spaceused.  The additional field here is in regards to free space.  This was mostly used as a check between the two scripts to help in developing the final solutions.  I felt it useful to get an idea of how much space is available of the allocated pages.

The critical calculations are performed based on information retrieved from the used_pages attribute in sys.allocation_units and the index size returned by sys.dm_db_index_physical_stats.

I left a few lines commented out in the script mostly for demonstration purposes.  I wanted to also be able to see just the user objects (IsMsShipped = 0 ), and also to be able to filter by specific filegroup.  Those are helpful pieces based on the task at hand.

Results

This script resolves the issue of returning the information for all tables in a set based fashion and it does it rapidly.  In the cases that it does not perform rapidly, one can break it up into the separate segments should one decide to tweak the column set or query.  The remaining question then is “Does it match the results of sp_spaceused?”.  The answer to that is yes and no.  I will discuss that in further detail in the final installment of this series.  As the query is currently constructed, the answer is no.  However, let it be understood that the two queries are really returning different things and one may perceive them to be returning the same thing.  That is not entirely true.

You can download the script from here.

Table Sizes

Comments: 2 Comments
Published on: May 3, 2010

In SQL Server there are two stored procedures that help one to determine the size of a table.  One is documented and more widely popular, the other is not.  These two stored procedures are sp_spaceused and sp_MStablespace.  Dukagjin Maloku wrote a blog article about the the two stored procedures in March of this year.  I don’t intend to rehash what he has covered.  What I intend is to look at the internals of the two stored procedures.  This will be the first of a multi-part series.

Why?

I saw a question asking how to find the table sizes for all of the tables in a database.  A few scripts were provided (one using sp_MStablespace and one using sp_spaceused).  I had been using the msTablespace version for quite some time and had never questioned it’s accuracy.  That changed when the question evolved into why do they (the two scripts) show the same results?  So, I decided I needed to figure out what is happening inside both procedures to see if I could determine a reason for the difference.  I will be comparing these on SQL Server 2008.

Comparison

sp_spaceused

This stored procedure has been mostly updated to use the new DMVs available since SQL 2005.  It currently relies heavily on sys.dm_db_partition_stats.  This procedure determines the size of the indexes by subtracting the number of pages from the number of used pages.  The idea here is that the number of Data Pages will be less than the total number of used pages.  This stored procedure does still rely on a soon to be deprecated object called sysfiles.  The replacement for that object would be sys.database_files.  The functionality would be somewhat different, in regards to how it is used in this procedure – the results are the same.  Despite that, this proc does not look to be on the replacement block any time soon.

sp_MStablespace

This stored procedure on the contrary does look like it will be deprecated, though I was unable to find information to back that up.  When delving into the procedure, one can quickly spot that it relies on sysindexes and sysobjects for much of its heavy lifting.  The sizing information will be coming from sys.partitions, sys.allocation_units and sysindexes.  This procedure uses sysindexes mostly for that sizing information and then calculates the data size based on clustered indexes and heaps.  Then to calculate the remaining index size, it subtracts that size from the total for all of the indexes on the specified table.  The math is a bit different and can be the reason for the difference.  This procedure seems to be in need of a decent sized overhaul if expected to not be deprecated.

Findings

After looking at the code and then running some tests with each of these procedures, I was able to corroborate the claim that they don’t return the same results.  With that in hand, I decided to take a stab at writing something different to help create a consistent return.  Thus I will be sharing some queries I have been working on for each of these.  The desired goal is that both would return the same results.  Based on the consistency of information between the DMVs and the necessary (and some bonus) elements for these queries, that may be a bit more difficult to do.  For the sp_MSTablespace query, the crux of the query will be the sys.dm_db_index_physical_stats function, while the sp_spaceused query may continue to rely on sys.dm_db_partition_stats or just sys.partitions.  Both queries may rely on a combination of these latter two DMVs, tune in to find out.

«page 2 of 2
Calendar
May 2010
M T W T F S S
« Apr   Jun »
 12
3456789
10111213141516
17181920212223
24252627282930
31  
Follow me on Google+

In 0 people's circles

Add to circlesi
Content
Categories

Categories

Now Reading

Now Reading

Planned books:

Current books:

  • ChiRunning: A Revolutionary Approach to Effortless, Injury-Free Running

    ChiRunning: A Revolutionary Approach to Effortless, Injury-Free Running by Danny Dreyer, Katherine Dreyer

  • Advanced Marathoning – 2nd Edition

    Advanced Marathoning – 2nd Edition by Peter Pfitzinger, Scott Douglas

  • SQL Server MVP Deep Dives

    SQL Server MVP Deep Dives by Nielsen Paul, Delaney Kalen, Machanic Adam, Tripp Kimberly, Randal Paul, Low Greg

  • A World Without Heroes (Beyonders)

    A World Without Heroes (Beyonders) by Brandon Mull

Recent books:

View full Library

SQLHelp

SQLHelp


Welcome , today is Thursday, May 17, 2012