Table Space – CS Part Deux

Categories: News, Professional, Scripts, SSC
Comments: No Comments
Published on: November 21, 2011

Another script, another day.  And as promised, I am providing an update to the Table Space script that followed the sp_MStableSpace script.  Not a lot more to be said about this one, since much was said already.

I updated the script for those CS collations.  I also provided the update to show the % of DB column to be driven based on the data file usage information.

[codesyntax lang=”tsql”]

[/codesyntax]

Phew, I finally took care of some of those somedays that have been nagging me.  Sure, there has been a someday that has evolved due to that – but that is a good thing.

It helps that I also need these scripts to be CS.  Add to that, that I need to use them more frequently and it was a perfect opportunity to do a little housecleaning.

Re-purpose my TableSize Script

Comments: No Comments
Published on: March 3, 2011

Last Year I introduced a couple of scripts that I worked on.  Those scripts developed into a series covering the comparison of some methods and MS provided stored procs that could help you in the retrieval of table sizes in SQL Server.

I pulled out one of those scripts recently in order to find what tables, in an R and D database, were consuming alot of space.  While running that script, I realized that it could easily be used to help me with an additional task.  I decided that this script could help me deduce the top 10 biggest tables in a database that doesn’t necessarily need to have that data.  Or maybe, I just need to clean out the data so I can test populating the database.  This script is predicated on a lack of foreign keys – but can easily be adapted.

So, in all of its glory:

[codesyntax lang=”tsql”]

[/codesyntax]

I think the script, along with prior explanations, is pretty straight forward.  This can quickly help reset those LARGE tables for continued testing.  Of course, that is predicated on you not already having a script for that, and that you don’t know what tables need to be reset (maybe you are in a large team).

This is just one example of a script that can be useful for more than what it was first designed to do.  As DB professionals, we often come across situations were a prior script can easily be repurposed for the task at hand.  Knowing that, and how to do it, is an important tool in your toolbox.

Space Used

Comments: 1 Comment
Published on: May 25, 2010

At long last I am bringing the next installment in this mini-series.  You can find the rest of the articles by perusing the last article I wrote on the topic.  That article was a correction to the TableSpace script about which I had already blogged.

As you will recall, this series was designed to compare and contrast two MS provided stored procedures.  Those procedures are sp_spaceused and sp_MStablespace.  They both calculate table sizes and information but they do not always arrive at the same result.  It was the difference in results that prompted this little series of blogs.  In this article I will be examining sp_spaceused, or at least the re-write I did for this proc.  There will likely be one more article in the series.  That article will be to compare performance and wrap up the series.

Problem

As was stated in the Tablespace article, there are two main problems with this procedure.  The first is that some measure of looping must be used, if one is to use the stored procedure as is, to gather all of the information for all of the tables in a database.  This problem can be rectified by understanding the inner workings of the stored procedure.  The second issue is that it does not return data that is consistent with that of sp_MStablespace.  I would like for the data to be consistent between the two scripts that may be used.  If the data is the same, then it will only be a matter of preference as to which script one decides to use.  This should help eliminate doubt.

Looping

Much the same as needs to be done with the sp_MStablespace procedure, a loop can be used to return the data for all tables.  Based on my experience, the looping query does not take long to execute.  It really is somewhat unnecessary though.  Here is a sample of how such a loop might be done.

[codesyntax lang=”tsql”]

[/codesyntax]

In this version of the loop, it must be noted that an additional step must be taken due to the nature of the data that is returned by sp_spaceused.  This procedure returns ‘ KB’ in the output.  This additional data must be treated so proper calculations can be performed.

Inconsistencies

As stated in the aforementioned article on sp_MStablespace, the data returned between the two does not always match (in regards to table size).  The reason for this is due to a difference in calculation and it seems intent.  Thus both procs can be deemed accurate and correct.  However, that does not breed any higher level of confidence in the return data from both procedures.  I hope to elaborate somewhat on these subtle differences.

Discovery

I am using the stored procedure from SQL 2008 as the basis for my analysis.  One of the first things I have noted is that the procedure has been mostly updated in support of the new SQL 2008 objects.  The one exception is that it still uses dbo.sysfiles to try and determine the database size and logsize.  Beyond that, the proc makes use of the system views sys.partitions, sys.allocation_units, sys.internal_tables, sys.objects and sys.dm_db_partition_stats.

sys.objects

The use of this view is primarily to get the object_id of the table that was passed in to the stored procedure.  For this particular version, this can be eliminated since we are trying to get a solution that does not require looping.  For a later version, I will look into reincorporating that view into the script to allow either a single table or all tables to be returned.

sysfiles

As mentioned already, sysfiles is present in order to retrieve the database size and the logfile size.  This can be replaced by sys.database_files.  In testing, I have found that using sys.database_files is more consistent and it is also more accurate than using sysfiles.

Revision

Since this procedure was already mostly up to date with the new objects for 2005 / 2008, most of it is reusable and the changes are less dramatic than they were for the counterpart sp_MStablespace.  Much the same as the script for MStablespace, I did employ the use of CTEs to get the data that I wanted.  I use three CTEs to retrieve all of the necessary information.  I am also using the same calculations that were used in the original procedure that determined table size.  The exception being in the database size and the log size.  For ease of consumption, I will once again break the script up into smaller chunks.

Part I

[codesyntax lang=”tsql”]

[/codesyntax]

This is pretty straightforward here.  I am retrieving and summing the values needed for the database size and the log size.

Part II

[codesyntax lang=”tsql”]

[/codesyntax]

Note here that I have left code to remove MSShipped objects from the total.  This is a placeholder simply for if somebody would like to test that result.  The only thing other than that to explain is the internal_types.  I am checking for a range of internal table types just as is done in the original stored procedure.  A quick rundown of what those types is as follows (and can be found on MSDN):  202 = xml index nodes, 204 = fulltext_catalog_map, 211 = FULLTEXT_AVDL, 213 = FULLTEXT_DOCID_STATUS, 214 = FULLTEXT_INDEXED_DOCID, 215 = FULLTEXT_DOCID_FILTER, 216 = FULLTEXT_DOCID_MAP, and I was unable to locate what type 212 is.  I presume it is another FullText type.  For those types, the data is not counted as part of the data but is aggregated into the index size.

The final point with this script to note is in regards to an additional comment I left in the script.  The comment is in regards to how the size is calculated and I wanted to be certain that it was explained how that relates.  In the MStablespace script, I use sys.dm_db_index_physical_stats to help determine the index and table size.  My comment here relates to that DMF in that the spaceused version is equivalent to the max_record_size field in that DMF.  This procedure and script determine the size of a table based on the maximum size and of a page and then multiplies that across all pages.  The MStablespace script is more equivalent to calculating based on average fill of those pages.  The MStablespace script can easily be modified to support the max size and thus bring both scripts into harmony.  Beyond those points, I think the rest of the script is pretty self-explanatory.

Results

This script now permits us to get the full size of all tables in a database without the need for looping through a single stored procedure.  We also see that at this point we have outlined why the two Microsoft stored procedures returned different results.  With both scripts now returning the same data, and doing so consistently we can now begin to evaluate performance measures.

TableSpace Update

Comments: 11 Comments
Published on: May 19, 2010

The last post in the series on finding the sizes of your tables showed us how we could find that size via a set-based method similar to using sp_MStablespace.  In that post I showed you how to find the sizes using the SQL 2005 system objects rather than the scheduled to be deprecated objects.  You can read more about it here.

I needed to go back and revisit that post and perform a little spring cleaning on it.  I noticed after publication that the script had multiple columns by the same name that were unintended.  I discovered this as I was prepping and reviewing for my presentation at our SSSOLV (PASS UG) meeting last week.  During that review I came across another bug.  The script was not consistently returning the index size for the tables.  After reviewing the script more closely, against more databases, I found the problem and fixed it.  The change was rather simple – I changed the Join from the #indstats table to go directly to sys.partitions rather than through sys.dm_db_index_usage_stats.  I was able to spot this one due to running the query against one of my databases that is seldom used and thus the indexes were less likely to have statistics populated in that DMV.

So, here is the updated query.  Rather than breaking out into sections like I did in that last article, I am just pasting the script in its entirety.

[codesyntax lang=”tsql”]

[/codesyntax]

At some point in the future, I intend on modifying this query to make it more flexible in output, as well as to make it into a stored procedure.

SQL 2005 TableSpace

Comments: 4 Comments
Published on: May 5, 2010

In the first post of this series I highlighted and described two stored procedures that are shipped from Microsoft.  These stored procedures use different methods to display information about a table in a SQL Server database.  In this post, I want to delve a little deeper into the sp_MStablespace stored procedure.  In doing this, I plan to discover what causes the reporting difference between it and sp_spaceused.  I will also unveil a script that can be used in a Set-based fashion to return this information, unlike the RBAR method required forsp_MStablespace.  Like I stated in the first post, this analysis will be based on SQL 2008.

Problem

When one uses sp_MStablespace, one may be assuming that it will return the same information that sp_spaceused would return.  While this procedure is undocumented and less widely used, the design is that it will give table and index information for the table one provides as an input into the procedure.  In order to gain this information for more than one table (using this procedure), one must implement a looping mechanism.

Looping

I have been using a script that was passed along to me many years ago that helps to find the table information for all of the tables in a database.  I have seen a similar script passed around on the internet in several locations.  This script uses another Microsoft procedure to help loop through the tables and provide this sought-after information.  The name of that procedure is sp_msforeachtable.  Despite the looping mechanism employed, the script is pretty quick and dependable.

[codesyntax lang=”tsql”]

[/codesyntax]

The script is pretty straight forward in its design and goals.  This looping can be resolved with a set based solution that works faster in most of my test cases.

Inconsistencies

The second problem noted was that the data returned from the sp_MStablespace proc does not always match the data returned by sp_spaceused.  This is a substantially different problem to tackle.  I poured over the two procedures for hours trying to find what the problem was that was causing the difference.  My findings are somewhat surprising – they are both accurate and correct.  How can they both be correct when both return different values?  I hope to also explain that between this post and the subsequent post that will discuss sp_spaceused.

Discovery

In the first article, I mentioned some of the differences and subtleties of this stored procedure.  I also made an assumption that this procedure will be deprecated or at least massively changed.  As I discuss the internals of this procedure, I hope that assumption will be validated.

The objects employed in sp_MStablespace are sysobjects, sysindexes, sys.indexes, sys.partitions, and sys.allocation_units.  This is actually not too bad of a start for rewriting this procedure due to the use of sys.indexes, sys.partitions, and sys.allocation_units.  Let’s take a look at the use of the old and soon to be deprecated objects: sysobjects and sysindexes.

sysobjects

The role that this object plays is actually quite minor.  This object should easily be replaced by sys.objects.  The sysobjects view is being used to set the @id variable to the object id that is found in the sysobjects view.  If the tablename that was passed into the proc was not found in sysobjects, then the procedure should raise an error.

sysindexes

The real heavy lifting of this procedure is being doing through the sysindexes view.  From sysindexes, we are trying to determine the index size and the table size.  The calculation is based on dpages and used.  By summing those two fields, we will determine the size of the data in the table.  Through the used value, we will determine the size of the index by subtracting the data size from the value we get from the used value.  There are some additional qualifiers on those formulas to limit which index types will be calculated in which size calculation (i.e. datasize would be blobs, heaps and clustered indexes only).  A further calculation is used in this procedure via the use of the spt_values table.  The table is being used as a lookup to assign a pagesize value based on the value returned for the low attribute of “WINDOWS/NT.”  One could have chosen to hard code this since it is just calculating the page size and we know that a page size is ~8k.

Caveat

Now that I have explained what is happening with those old deprecated objects, the caveat is that the procedure has a bit of logic in there to check for the version of SQL Server.  Depending on the version, you could end up running the calculations based off the new objects.  However, that does not eliminate the dependency on the compatibility objects.  The same kind of logic is used to calculate the size of the data, but the indexes are still calculated via the sysindexes view.  I find myself asking “Why Bother?”  Take a look for yourself.  I don’t see this procedure being released to a SQL 2000 environment, but it was written for backward compatibility.  That seems like a waste of resources and development time to me.  Either leave it as it was or rewrite the whole thing – that seems to make more sense to me than to make this procedure backwards compatible with the release of SQL 2005.  The only thing I could imagine is that it was done with the intent of being able to be used in a Central Management server type of configuration.

Revision

I was looking to write a query that would replace the query I posted at the beginning of this article.  As I delved into this, I knew I wanted something that would return consistent results with sp_spaceused.  I wanted to have a reliable query that was quick and used the 2005 objects and DMVs.  What I ended up with was something a bit bigger than the original query.

In SQL 2005, one can find a lot of information through the use of sys.dm_db_index_physical_stats.  This function is quite handy in many situations where sysindexes was used previously.  This function can be used to help determine index size as well as fragmentation levels as a couple of examples.  For this new script, I needed to return all of the same data as the old script.  I also wanted to include some additional information that I felt was beneficial to having a good insight into the database.  So let’s break this script up again like I have shown in the past.  I am reusing some code that I have demonstrated a few times for other scripts.

Part I

[codesyntax lang=”tsql”]

[/codesyntax]

I have one major departure here from how I populated this temp table in the past.  I have opted to use the windowing function Row_Number rather than an Identity value.  I made this change due to the fewer resources required to run it.  Using an Identity could require more disk IO than using the Row_Number.  I want this to run quickly and not hamper system responsiveness or system resources.  In observed testing, this change helps improve overall performance by 50%.  I tested this against SQL 2008 and SQL 2005 running different patches and on different hardware.  The overall improvement in performance was consistent.

Part II

[codesyntax lang=”tsql”]

[/codesyntax]

This bit is quite simply just so I could provide the total data file size (excluding log file).  Spoiler alert:  This piece of code is also used in my new script for sp_spaceused.

Part III

[codesyntax lang=”tsql”]

[/codesyntax]

Did I mention that this was lengthier than the predecessor?  I have five CTEs doing slightly different things here with this query.  The first three are gathering the information for the different Allocation Units and the fifth is summarizing that data into a single set.  The fourth cte is used to calculate the actual data size dependent on type of index employed.  I wanted to be able to display the size of the different Allocation Units so I could get a true feel for how the database looks in a single query (with regards to size and table size).  I also thought it important to be able to determine how much activity that table is seeing.  Knowing if the table is highly active, has BLOB data, and it’s size could potentially help me in determining if that table should be in a separate filegroup and on a separate LUN.

I have also thrown in some additional information that will be consistent with the new script for sp_spaceused.  The additional field here is in regards to free space.  This was mostly used as a check between the two scripts to help in developing the final solutions.  I felt it useful to get an idea of how much space is available of the allocated pages.

The critical calculations are performed based on information retrieved from the used_pages attribute in sys.allocation_units and the index size returned by sys.dm_db_index_physical_stats.

I left a few lines commented out in the script mostly for demonstration purposes.  I wanted to also be able to see just the user objects (IsMsShipped = 0 ), and also to be able to filter by specific filegroup.  Those are helpful pieces based on the task at hand.

Results

This script resolves the issue of returning the information for all tables in a set based fashion and it does it rapidly.  In the cases that it does not perform rapidly, one can break it up into the separate segments should one decide to tweak the column set or query.  The remaining question then is “Does it match the results of sp_spaceused?”.  The answer to that is yes and no.  I will discuss that in further detail in the final installment of this series.  As the query is currently constructed, the answer is no.  However, let it be understood that the two queries are really returning different things and one may perceive them to be returning the same thing.  That is not entirely true.

You can download the script from here.

Table Sizes

Comments: 2 Comments
Published on: May 3, 2010

In SQL Server there are two stored procedures that help one to determine the size of a table.  One is documented and more widely popular, the other is not.  These two stored procedures are sp_spaceused and sp_MStablespace.  Dukagjin Maloku wrote a blog article about the the two stored procedures in March of this year.  I don’t intend to rehash what he has covered.  What I intend is to look at the internals of the two stored procedures.  This will be the first of a multi-part series.

Why?

I saw a question asking how to find the table sizes for all of the tables in a database.  A few scripts were provided (one using sp_MStablespace and one using sp_spaceused).  I had been using the msTablespace version for quite some time and had never questioned it’s accuracy.  That changed when the question evolved into why do they (the two scripts) show the same results?  So, I decided I needed to figure out what is happening inside both procedures to see if I could determine a reason for the difference.  I will be comparing these on SQL Server 2008.

Comparison

sp_spaceused

This stored procedure has been mostly updated to use the new DMVs available since SQL 2005.  It currently relies heavily on sys.dm_db_partition_stats.  This procedure determines the size of the indexes by subtracting the number of pages from the number of used pages.  The idea here is that the number of Data Pages will be less than the total number of used pages.  This stored procedure does still rely on a soon to be deprecated object called sysfiles.  The replacement for that object would be sys.database_files.  The functionality would be somewhat different, in regards to how it is used in this procedure – the results are the same.  Despite that, this proc does not look to be on the replacement block any time soon.

sp_MStablespace

This stored procedure on the contrary does look like it will be deprecated, though I was unable to find information to back that up.  When delving into the procedure, one can quickly spot that it relies on sysindexes and sysobjects for much of its heavy lifting.  The sizing information will be coming from sys.partitions, sys.allocation_units and sysindexes.  This procedure uses sysindexes mostly for that sizing information and then calculates the data size based on clustered indexes and heaps.  Then to calculate the remaining index size, it subtracts that size from the total for all of the indexes on the specified table.  The math is a bit different and can be the reason for the difference.  This procedure seems to be in need of a decent sized overhaul if expected to not be deprecated.

Findings

After looking at the code and then running some tests with each of these procedures, I was able to corroborate the claim that they don’t return the same results.  With that in hand, I decided to take a stab at writing something different to help create a consistent return.  Thus I will be sharing some queries I have been working on for each of these.  The desired goal is that both would return the same results.  Based on the consistency of information between the DMVs and the necessary (and some bonus) elements for these queries, that may be a bit more difficult to do.  For the sp_MSTablespace query, the crux of the query will be the sys.dm_db_index_physical_stats function, while the sp_spaceused query may continue to rely on sys.dm_db_partition_stats or just sys.partitions.  Both queries may rely on a combination of these latter two DMVs, tune in to find out.

page 1 of 1






Calendar
December 2014
M T W T F S S
« Nov    
1234567
891011121314
15161718192021
22232425262728
293031  
Content
SQLHelp

SQLHelp


Welcome , today is Saturday, December 20, 2014