SQL 2005 TableSpace

Comments: 4 Comments
Published on: May 5, 2010

In the first post of this series I highlighted and described two stored procedures that are shipped from Microsoft.  These stored procedures use different methods to display information about a table in a SQL Server database.  In this post, I want to delve a little deeper into the sp_MStablespace stored procedure.  In doing this, I plan to discover what causes the reporting difference between it and sp_spaceused.  I will also unveil a script that can be used in a Set-based fashion to return this information, unlike the RBAR method required forsp_MStablespace.  Like I stated in the first post, this analysis will be based on SQL 2008.

Problem

When one uses sp_MStablespace, one may be assuming that it will return the same information that sp_spaceused would return.  While this procedure is undocumented and less widely used, the design is that it will give table and index information for the table one provides as an input into the procedure.  In order to gain this information for more than one table (using this procedure), one must implement a looping mechanism.

Looping

I have been using a script that was passed along to me many years ago that helps to find the table information for all of the tables in a database.  I have seen a similar script passed around on the internet in several locations.  This script uses another Microsoft procedure to help loop through the tables and provide this sought-after information.  The name of that procedure is sp_msforeachtable.  Despite the looping mechanism employed, the script is pretty quick and dependable.

[codesyntax lang="tsql"]

[/codesyntax]

The script is pretty straight forward in its design and goals.  This looping can be resolved with a set based solution that works faster in most of my test cases.

Inconsistencies

The second problem noted was that the data returned from the sp_MStablespace proc does not always match the data returned by sp_spaceused.  This is a substantially different problem to tackle.  I poured over the two procedures for hours trying to find what the problem was that was causing the difference.  My findings are somewhat surprising – they are both accurate and correct.  How can they both be correct when both return different values?  I hope to also explain that between this post and the subsequent post that will discuss sp_spaceused.

Discovery

In the first article, I mentioned some of the differences and subtleties of this stored procedure.  I also made an assumption that this procedure will be deprecated or at least massively changed.  As I discuss the internals of this procedure, I hope that assumption will be validated.

The objects employed in sp_MStablespace are sysobjects, sysindexes, sys.indexes, sys.partitions, and sys.allocation_units.  This is actually not too bad of a start for rewriting this procedure due to the use of sys.indexes, sys.partitions, and sys.allocation_units.  Let’s take a look at the use of the old and soon to be deprecated objects: sysobjects and sysindexes.

sysobjects

The role that this object plays is actually quite minor.  This object should easily be replaced by sys.objects.  The sysobjects view is being used to set the @id variable to the object id that is found in the sysobjects view.  If the tablename that was passed into the proc was not found in sysobjects, then the procedure should raise an error.

sysindexes

The real heavy lifting of this procedure is being doing through the sysindexes view.  From sysindexes, we are trying to determine the index size and the table size.  The calculation is based on dpages and used.  By summing those two fields, we will determine the size of the data in the table.  Through the used value, we will determine the size of the index by subtracting the data size from the value we get from the used value.  There are some additional qualifiers on those formulas to limit which index types will be calculated in which size calculation (i.e. datasize would be blobs, heaps and clustered indexes only).  A further calculation is used in this procedure via the use of the spt_values table.  The table is being used as a lookup to assign a pagesize value based on the value returned for the low attribute of “WINDOWS/NT.”  One could have chosen to hard code this since it is just calculating the page size and we know that a page size is ~8k.

Caveat

Now that I have explained what is happening with those old deprecated objects, the caveat is that the procedure has a bit of logic in there to check for the version of SQL Server.  Depending on the version, you could end up running the calculations based off the new objects.  However, that does not eliminate the dependency on the compatibility objects.  The same kind of logic is used to calculate the size of the data, but the indexes are still calculated via the sysindexes view.  I find myself asking “Why Bother?”  Take a look for yourself.  I don’t see this procedure being released to a SQL 2000 environment, but it was written for backward compatibility.  That seems like a waste of resources and development time to me.  Either leave it as it was or rewrite the whole thing – that seems to make more sense to me than to make this procedure backwards compatible with the release of SQL 2005.  The only thing I could imagine is that it was done with the intent of being able to be used in a Central Management server type of configuration.

Revision

I was looking to write a query that would replace the query I posted at the beginning of this article.  As I delved into this, I knew I wanted something that would return consistent results with sp_spaceused.  I wanted to have a reliable query that was quick and used the 2005 objects and DMVs.  What I ended up with was something a bit bigger than the original query.

In SQL 2005, one can find a lot of information through the use of sys.dm_db_index_physical_stats.  This function is quite handy in many situations where sysindexes was used previously.  This function can be used to help determine index size as well as fragmentation levels as a couple of examples.  For this new script, I needed to return all of the same data as the old script.  I also wanted to include some additional information that I felt was beneficial to having a good insight into the database.  So let’s break this script up again like I have shown in the past.  I am reusing some code that I have demonstrated a few times for other scripts.

Part I

[codesyntax lang="tsql"]

[/codesyntax]

I have one major departure here from how I populated this temp table in the past.  I have opted to use the windowing function Row_Number rather than an Identity value.  I made this change due to the fewer resources required to run it.  Using an Identity could require more disk IO than using the Row_Number.  I want this to run quickly and not hamper system responsiveness or system resources.  In observed testing, this change helps improve overall performance by 50%.  I tested this against SQL 2008 and SQL 2005 running different patches and on different hardware.  The overall improvement in performance was consistent.

Part II

[codesyntax lang="tsql"]

[/codesyntax]

This bit is quite simply just so I could provide the total data file size (excluding log file).  Spoiler alert:  This piece of code is also used in my new script for sp_spaceused.

Part III

[codesyntax lang="tsql"]

[/codesyntax]

Did I mention that this was lengthier than the predecessor?  I have five CTEs doing slightly different things here with this query.  The first three are gathering the information for the different Allocation Units and the fifth is summarizing that data into a single set.  The fourth cte is used to calculate the actual data size dependent on type of index employed.  I wanted to be able to display the size of the different Allocation Units so I could get a true feel for how the database looks in a single query (with regards to size and table size).  I also thought it important to be able to determine how much activity that table is seeing.  Knowing if the table is highly active, has BLOB data, and it’s size could potentially help me in determining if that table should be in a separate filegroup and on a separate LUN.

I have also thrown in some additional information that will be consistent with the new script for sp_spaceused.  The additional field here is in regards to free space.  This was mostly used as a check between the two scripts to help in developing the final solutions.  I felt it useful to get an idea of how much space is available of the allocated pages.

The critical calculations are performed based on information retrieved from the used_pages attribute in sys.allocation_units and the index size returned by sys.dm_db_index_physical_stats.

I left a few lines commented out in the script mostly for demonstration purposes.  I wanted to also be able to see just the user objects (IsMsShipped = 0 ), and also to be able to filter by specific filegroup.  Those are helpful pieces based on the task at hand.

Results

This script resolves the issue of returning the information for all tables in a set based fashion and it does it rapidly.  In the cases that it does not perform rapidly, one can break it up into the separate segments should one decide to tweak the column set or query.  The remaining question then is “Does it match the results of sp_spaceused?”.  The answer to that is yes and no.  I will discuss that in further detail in the final installment of this series.  As the query is currently constructed, the answer is no.  However, let it be understood that the two queries are really returning different things and one may perceive them to be returning the same thing.  That is not entirely true.

You can download the script from here.

4 Comments - Leave a comment
  1. Dugi says:

    Very nice explanation and very professional review, I like it and you show us much in deep info about these two SYS SPs!

  2. Are you not finding significant disk impact when running sys.dm_db_index_physical_stats? Especially on larger tables and when using DETAILED.

    • Jason Brimhall says:

      There has been some, and a different mode might provide a relief from that impact. The DETAILED mode seems to provide the best accuracy for the size. I might have to reconsider that with much larger databases (500GB +), which I don’t have access to right now.

  3. [...] The last post in the series on finding the sizes of your tables showed us how we could find that size via a set-based method similar to using sp_MStablespace.  In that post I showed you how to find the sizes using the SQL 2005 system objects rather than the scheduled to be deprecated objects.  You can read more about it here. [...]

Leave a comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">








Calendar
May 2010
M T W T F S S
« Apr   Jun »
 12
3456789
10111213141516
17181920212223
24252627282930
31  
Content
SQLHelp

SQLHelp


Welcome , today is Thursday, October 23, 2014