Space Used

Comments: 1 Comment
Published on: May 25, 2010

At long last, here is the next installment in this mini-series.  You can find the rest of the articles by perusing the last article I wrote on the topic, which was a correction to the TableSpace script about which I had already blogged.

As you will recall, this series was designed to compare and contrast two Microsoft-provided stored procedures: sp_spaceused and sp_MStablespace.  Both calculate table size information, but they do not always arrive at the same result.  It was that difference in results that prompted this little series of posts.  In this article I will be examining sp_spaceused, or at least the rewrite I did of that proc.  There will likely be one more article in the series, which will compare performance and wrap things up.

Problem

As was stated in the TableSpace article, there are two main problems with this procedure.  The first is that, if one uses the stored procedure as is, some measure of looping is required to gather the information for all of the tables in a database.  This problem can be rectified by understanding the inner workings of the stored procedure.  The second is that it does not return data consistent with that of sp_MStablespace.  I would like the data to be consistent between the two scripts; if the data is the same, then which script one uses becomes purely a matter of preference.  This should help eliminate doubt.

Looping

Much the same as with the sp_MStablespace procedure, a loop can be used to return the data for all tables.  In my experience, the looping query does not take long to execute, though it really is somewhat unnecessary.

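Here is a minimal sketch of how such a loop might be done; the temp table layout, cursor, and names are illustrative rather than a verbatim script.

[codesyntax lang="tsql"]
SET NOCOUNT ON;

-- Holding table matching the six columns sp_spaceused returns for an object
CREATE TABLE #SpaceUsed
    (TableName   sysname
    ,NumRows     bigint
    ,ReservedKB  varchar(50)
    ,DataKB      varchar(50)
    ,IndexSizeKB varchar(50)
    ,UnusedKB    varchar(50));

DECLARE @TableName nvarchar(776);

DECLARE TableCursor CURSOR FAST_FORWARD FOR
    SELECT QUOTENAME(SCHEMA_NAME(schema_id)) + N'.' + QUOTENAME(name)
      FROM sys.tables;

OPEN TableCursor;
FETCH NEXT FROM TableCursor INTO @TableName;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One call per table; the output is captured via INSERT ... EXEC
    INSERT INTO #SpaceUsed
        EXEC sp_spaceused @TableName;

    FETCH NEXT FROM TableCursor INTO @TableName;
END

CLOSE TableCursor;
DEALLOCATE TableCursor;

-- sp_spaceused returns sizes as strings such as '128 KB',
-- so the suffix must be stripped before doing any math
SELECT TableName
      ,NumRows
      ,CAST(REPLACE(ReservedKB, ' KB', '') AS bigint)  AS ReservedKB
      ,CAST(REPLACE(DataKB, ' KB', '') AS bigint)      AS DataKB
      ,CAST(REPLACE(IndexSizeKB, ' KB', '') AS bigint) AS IndexSizeKB
      ,CAST(REPLACE(UnusedKB, ' KB', '') AS bigint)    AS UnusedKB
  FROM #SpaceUsed
 ORDER BY CAST(REPLACE(ReservedKB, ' KB', '') AS bigint) DESC;

DROP TABLE #SpaceUsed;
[/codesyntax]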

Note that this version of the loop requires an additional step due to the nature of the data returned by sp_spaceused: the procedure appends ' KB' to its size output, and that suffix must be stripped before proper calculations can be performed.

Inconsistencies

As stated in the aforementioned article on sp_MStablespace, the data returned by the two procedures does not always match (with regard to table size).  The reason is a difference in calculation and, it seems, in intent.  Thus both procs can be deemed accurate and correct.  However, that does not inspire any higher level of confidence in the data returned by either procedure.  I hope to elaborate somewhat on these subtle differences.

Discovery

I am using the stored procedure from SQL 2008 as the basis for my analysis.  One of the first things I noted is that the procedure has been mostly updated to support the new SQL 2008 objects.  The one exception is that it still uses dbo.sysfiles to try to determine the database size and log size.  Beyond that, the proc makes use of the system views sys.partitions, sys.allocation_units, sys.internal_tables, sys.objects, and sys.dm_db_partition_stats.

sys.objects

This view is used primarily to get the object_id of the table passed into the stored procedure.  For this particular version, it can be eliminated, since we are after a solution that does not require looping.  In a later version, I will look into reincorporating the view so the script can return either a single table or all tables.

sysfiles

As mentioned already, sysfiles is present in order to retrieve the database size and the log file size.  It can be replaced by sys.database_files.  In testing, I have found sys.database_files to be both more consistent and more accurate than sysfiles.

Revision

Since this procedure was already mostly up to date with the new objects for 2005 / 2008, most of it is reusable, and the changes are less dramatic than they were for its counterpart, sp_MStablespace.  Much the same as in the MStablespace script, I employed CTEs to get the data I wanted; I use three CTEs to retrieve all of the necessary information.  I am also using the same table size calculations that the original procedure used, the exceptions being the database size and the log size.  For ease of consumption, I will once again break the script up into smaller chunks.

Part I

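A sketch of this first chunk, swapping dbo.sysfiles for sys.database_files (the CTE and column names are illustrative):

[codesyntax lang="tsql"]
/* File sizes are reported in 8 KB pages, so multiply by 8
   and divide by 1024 to express them in MB */
;WITH FileSizes AS (
    SELECT SUM(CASE WHEN type = 0 THEN size ELSE 0 END) * 8 / 1024.0 AS DBSizeMB
          ,SUM(CASE WHEN type = 1 THEN size ELSE 0 END) * 8 / 1024.0 AS LogSizeMB
      FROM sys.database_files
)
SELECT DBSizeMB
      ,LogSizeMB
  FROM FileSizes;
[/codesyntax]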

This part is pretty straightforward.  I am retrieving and summing the values needed for the database size and the log size.

Part II

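A sketch of the second chunk follows; the CTE names and final column list are illustrative, and the row-count piece (which the proc gets from sys.dm_db_partition_stats) is omitted for brevity.

[codesyntax lang="tsql"]
;WITH PageStats AS (
    /* Same aggregation the original proc performs: anything that is not
       in-row data counts toward used pages; only in-row data pages of the
       heap or clustered index count as data */
    SELECT p.object_id
          ,SUM(a.total_pages) AS ReservedPages
          ,SUM(a.used_pages)  AS UsedPages
          ,SUM(CASE WHEN a.type <> 1 THEN a.used_pages
                    WHEN p.index_id < 2 THEN a.data_pages
                    ELSE 0 END) AS DataPages
      FROM sys.partitions p
      JOIN sys.allocation_units a
        ON a.container_id = p.partition_id
     GROUP BY p.object_id
)
,InternalStats AS (
    /* Internal table types 202, 204, and 211-216 (xml index nodes and the
       fulltext types) roll up to the parent table's index size */
    SELECT it.parent_id AS object_id
          ,SUM(a.total_pages) AS ReservedPages
          ,SUM(a.used_pages)  AS UsedPages
      FROM sys.internal_tables it
      JOIN sys.partitions p
        ON p.object_id = it.object_id
      JOIN sys.allocation_units a
        ON a.container_id = p.partition_id
     WHERE it.internal_type IN (202, 204, 211, 212, 213, 214, 215, 216)
     GROUP BY it.parent_id
)
SELECT OBJECT_NAME(ps.object_id) AS TableName
      ,(ps.ReservedPages + ISNULL(i.ReservedPages, 0)) * 8 AS ReservedKB
      ,ps.DataPages * 8 AS DataKB
      ,(ps.UsedPages + ISNULL(i.UsedPages, 0) - ps.DataPages) * 8 AS IndexSizeKB
  FROM PageStats ps
  LEFT JOIN InternalStats i
    ON i.object_id = ps.object_id
 WHERE OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
   --AND OBJECTPROPERTY(ps.object_id, 'IsMSShipped') = 0  /* placeholder: exclude MSShipped objects */
 ORDER BY ReservedKB DESC;
[/codesyntax]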

Note here that I have left code in place to remove MSShipped objects from the total; it is a placeholder in case somebody would like to test that result.  The only other thing to explain is the internal_types.  I am checking for a range of internal table types, just as is done in the original stored procedure.  A quick rundown of those types (which can be found on MSDN): 202 = xml index nodes, 204 = fulltext_catalog_map, 211 = FULLTEXT_AVDL, 213 = FULLTEXT_DOCID_STATUS, 214 = FULLTEXT_INDEXED_DOCID, 215 = FULLTEXT_DOCID_FILTER, and 216 = FULLTEXT_DOCID_MAP.  I was unable to locate what type 212 is; I presume it is another FullText type.  For these types, the pages are not counted as part of the data size but are aggregated into the index size.

The final point to note concerns an additional comment I left in the script about how the size is calculated; I want to be certain the relationship is explained.  In the MStablespace script, I use sys.dm_db_index_physical_stats to help determine the index and table sizes.  My comment relates to that DMF: the spaceused version is equivalent to using the max_record_size field from the DMF.  That is, this procedure and script determine the size of a table based on the maximum size of a page and multiply that across all pages, while the MStablespace script effectively calculates based on the average fill of those pages.  The MStablespace script can easily be modified to use the max size and thus bring both scripts into harmony.  Beyond those points, I think the rest of the script is pretty self-explanatory.

Results

This script now permits us to get the full size of all tables in a database without looping through a single stored procedure.  We have also now outlined why the two Microsoft stored procedures return different results.  With both scripts returning the same data, and doing so consistently, we can begin to evaluate performance.

Content Copy Followup

Comments: 2 Comments
Published on: May 24, 2010

Last week I posted an article about plagiarism; this is a follow-up to that.  After that post went up, Steinar quickly removed the copied content.  The tone of my article may have been a bit harsh, and my feeling on the subject has shifted, though only slightly.  Steinar was trying to do something good; he just went about it the wrong way.  The issue for me is not the intention but the posting of entire articles for consumption by somebody else.  Each article did carry a category with the original author's name, but that was not evident when first reading it.  Had I not known the articles were mine, I might have missed the category at the end.  Knowing the article was mine, I looked harder for some sort of attribution.

Here is where my opinion has shifted slightly.  Steinar was only plagiarizing inadvertently (prior to my initial post I was a little uncertain on this point); I don't think he was blatantly intending to do harm.  Despite that, intentional or not, it was plagiarism.  Has he corrected his actions?  YES!  The speed with which he reacted to the emails he received tells me it was unintentional.  Do I still take issue with it?  NO!  He apologized, corrected the wrong, and we are moving on.  I also must apologize for plastering his email on my blog; I have removed the email address and should have done so sooner.

This morning I received an email from Steinar about working with him to post some content for his aggregation.  I responded with some questions about how he plans to manage the aggregation.  I hope we can work something out that is beneficial for both of us.

The blog site that Steinar runs is www.sqlblogs.net.  Go check it out.

SQL Book Review

Categories: News, Professional
Comments: 5 Comments
Published on: May 21, 2010

Recently I received a new opportunity via email.  Steve Jones at SQLServerCentral sent me an email to see if I would be interested in helping review a book and write about it on my blog.  I told Steve I would be happy to help out.

The book is titled Defensive Database Programming and is currently available at RedGate.  It was authored by Alex Kuznetsov and runs 10 chapters and 376 pages.  I will be writing between 1 and 10 articles concerning this book.  I would like to dive in, check things out, and give it a good, fair shake.

After reading the author bio, I added Alex to my blogroll, if for no other reason than that he runs ultramarathons and is a SQL Server database guy.  You can find Alex's blog here.

One last tidbit for now, the book had a technical review performed by Hugo Kornelis.

Plagiarized or Copied

Comments: 5 Comments
Published on: May 20, 2010

Today I ran into an interesting website.  While trying to locate information concerning a SQL Saturday event, I came across one of my blog entries on another site.  I syndicate to SQLServerCentral and SQLServerpedia and am fully aware that my blogs appear verbatim on each of those sites.  That is expected; both parties and I agreed to the terms and conditions of syndicating my blog there.  This site was neither of them.  My blogs that appear on this site could be found here (as of the writing of this article).

I had to ponder it for a bit.  The intent of the site, as they state, is to pull the best of the blogs into a single site.  That may seem a little flattering to somebody new to the blogging world.  I decided to ask some peers what they thought of the site; you can find that question here and follow the conversation.  Flattery or not, it is theft of content, as Buck Woody brought up in his blog on the topic after word spread.  Thanks to Gail Shaw for forwarding it on twitter.  A lot of people have had their blogs copied to this site, none of them by permission.  Emails are being generated and sent to the admin of the site (email address removed).  If your content has been stolen, I suggest you do the same.

Personally, if a request came to me asking permission to syndicate my material on that blog site, I would consider it.  But please don't just post my content without proper attribution or without having requested my permission.

New SQL Standard Article

Categories: News, Professional
Comments: 1 Comment
Published on: May 20, 2010

A new article for the SQL Standard is now available.  You can find the article here.  The article is by Sam Bendayan and covers the topic of set-based programming.

Check it out.

TableSpace Update

Comments: 11 Comments
Published on: May 19, 2010

The last post in the series on finding the sizes of your tables showed how to find those sizes via a set-based method similar to using sp_MStablespace.  In that post I showed you how to find the sizes using the SQL 2005 system objects rather than the objects scheduled for deprecation.  You can read more about it here.

I needed to go back and perform a little spring cleaning on that post.  I noticed after publication that the script had multiple unintended columns with the same name.  I discovered this while prepping and reviewing for my presentation at our S3OLV (PASS UG) meeting last week.  During that review I came across another bug: the script was not consistently returning the index size for the tables.  After reviewing the script more closely, against more databases, I found the problem and fixed it.  The change was rather simple – I changed the join from the #indstats table to go directly to sys.partitions rather than through sys.dm_db_index_usage_stats.  I was able to spot this by running the query against one of my seldom-used databases, where the indexes were less likely to have statistics populated in that DMV.

So, here is the updated query.  Rather than breaking it out into sections as I did in the last article, I am pasting the script in its entirety.

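What follows is a condensed sketch of the corrected query rather than the verbatim script; the temp table and CTE names are illustrative, but it shows the shape of the fix, with #indstats joined straight to sys.partitions.

[codesyntax lang="tsql"]
-- Run the DMF once and stash the page counts per object/index/partition
SELECT ps.object_id
      ,ps.index_id
      ,ps.partition_number
      ,SUM(ps.page_count) AS page_count
  INTO #indstats
  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ps
 GROUP BY ps.object_id, ps.index_id, ps.partition_number;

;WITH TableSizes AS (
    SELECT p.object_id
          ,SUM(CASE WHEN p.index_id < 2 THEN s.page_count ELSE 0 END) * 8 AS DataKB
          ,SUM(CASE WHEN p.index_id >= 2 THEN s.page_count ELSE 0 END) * 8 AS IndexKB
          ,SUM(CASE WHEN p.index_id < 2 THEN p.rows ELSE 0 END) AS NumRows
      FROM #indstats s
      -- the fix: join straight to sys.partitions instead of going
      -- through sys.dm_db_index_usage_stats
      JOIN sys.partitions p
        ON p.object_id = s.object_id
       AND p.index_id = s.index_id
       AND p.partition_number = s.partition_number
     GROUP BY p.object_id
)
SELECT OBJECT_SCHEMA_NAME(ts.object_id) AS SchemaName
      ,OBJECT_NAME(ts.object_id) AS TableName
      ,ts.NumRows
      ,ts.DataKB
      ,ts.IndexKB
      ,ts.DataKB + ts.IndexKB AS TotalKB
  FROM TableSizes ts
 WHERE OBJECTPROPERTY(ts.object_id, 'IsUserTable') = 1
 ORDER BY ts.DataKB DESC;

DROP TABLE #indstats;
[/codesyntax]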

At some point in the future, I intend to modify this query to make its output more flexible, as well as to turn it into a stored procedure.

Exercise Recap

Categories: Running
Comments: 1 Comment
Published on: May 17, 2010

I am now in full swing with my two-a-days.  It took me a little longer than planned to get into the swing, but I have been pretty steady for a little over a week now.  Some days are easier than others.  Some days it is one cardio session and one strength session; some days it is two cardio sessions.  I don't do two strength sessions a day.  Today I was thinking about doing a third session, but I am too wasted after two harder cardio sessions than I have done in a long time.  It is also getting a little more tiring as the heat goes up.  Today was a little cooler than the past few days, but the humidity was elevated (woot), which added a different element from what I have been getting used to.

I feel pretty good about the direction I am heading with my exercise.  Someday I will figure out how to drag myself out of bed earlier for an exercise session or training run when there is nobody else to run or exercise with.  Race days and group runs are a different animal; it is easy to get up in the morning on those days.

May 2010 S3OLV Meeting Recap

Comments: 1 Comment
Published on: May 14, 2010

On the evening of May 13th, S3OLV held its monthly meeting.  We had some really good discussion, and got to participate in a couple of good presentations.

Charley (Chapter President) gave a presentation on his current SAN migration and some of the sticking points he is experiencing.  To say they are experiencing mild frustration would be putting it lightly.  We joked about it a bit, but it raised some good discussion points.  The primary point is SAN configuration and doing it correctly.  The secondary point is not to make hasty decisions or decisions based on knee-jerk reactions.  The final point is to test.  They are testing currently – and that is good.  The principal problem is that disk performance has decreased by a factor of two.

Charley also showed us some of his dabbling in burning his own circuit boards and programming the processors for the board.  Cool stuff.

I gave the second presentation, showing a few different methods for quickly finding the sizes of all of the tables in a database.  That presentation goes hand in hand with the series I am currently writing on the same subject.

It was a good meeting and fun was had.  The discussion was both light and serious at times.  We even had a carrion and a howling flying monkey visit us.

Tune in next month for more about S3OLV.

T-SQL Tuesday #006: A Blobbing We Will Go

Comments: 1 Comment
Published on: May 11, 2010

Once again it looks like I was early for the T-SQL Tuesday event.  Last month I submitted my entry on reporting, which was that month's topic, and if you recall I was a week early.  As it turns out, this month I am a full month early: we have the distinct pleasure of writing about BLOBs, and last month I wrote about reporting on BLOBs.  This month's event is hosted by Michael Coles.

This month the topic is a little more difficult for me.  As luck would have it, I have written a fair share of articles about BLOB data, or at least involving BLOBs, over the last month.  What could I write that is distinct from what I have already written?  Or could I slide by with simply writing a recap of some of those posts?  Hmmmmm…

Well, luckily I know of something I can add to the meme on the topic of LOB or BLOB data.  Oh, and it is more than just the fish you see to the left.

Did You Know…

…that in the DMVs for SQL 2005 and SQL 2008 there is more than one place to find information about the size of your LOB data?  It goes without saying that there is more than one way to find information about LOBs in your database.  I have outlined a couple of methods already in previous posts (and in another post coming up that will conclude the table size procs series I started).

One method for finding a little information about LOBs is through a function I have been using for various purposes: sys.dm_db_index_physical_stats.  This function returns a column called alloc_unit_type_desc.  If you want to learn more about the function and the column, I recommend you read what Microsoft has to say about them on this page.  In short, the column indicates what type of data is stored in each allocation unit for the specified database and object, alongside the function's various other stats.

That’s all fine and dandy…

Since we know (from reading that Microsoft document and from past experience) that we can retrieve various pieces of information from this function – page counts, allocation unit types, record counts, and average record sizes – it stands to reason that we can also determine the size of our BLOB infrastructure with it.  The question is how to calculate the size, and which mode to use to do it.

Weapons of Choice

Given the length of time the function can take to run, one may want a few options, and it is desirable to use the most efficient method to retrieve the data.  To find it, I decided to test each of the function's modes for accuracy and speed.  These modes are Limited (the default), Sampled, and Detailed.

The setup for each test is essentially the same script; the only difference is the mode specified.  After running each of the scripts, I found the results somewhat surprising.  I will be examining execution time, IO statistics, execution plans, and execution cost.  I will also note that I am dumping the results of each function call into a temp table so I can run multiple queries against that data without having to wait for the function to run again.

Limited

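A sketch of the Limited run (the temp table name is illustrative; the grouping follows the description below):

[codesyntax lang="tsql"]
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Dump the DMF output once so later queries don't have to re-run it
SELECT database_id
      ,object_id
      ,index_id
      ,alloc_unit_type_desc
      ,page_count
  INTO #PhysStatsLimited
  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED');

-- Group into usable buckets per allocation unit type, object, index, database
SELECT DB_NAME(database_id) AS DatabaseName
      ,OBJECT_NAME(object_id) AS TableName
      ,index_id
      ,alloc_unit_type_desc
      ,SUM(page_count) * 8 / 1024.0 AS SizeMB
  FROM #PhysStatsLimited
 GROUP BY database_id, object_id, index_id, alloc_unit_type_desc;
[/codesyntax]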

The query is pretty straightforward and self-explanatory.  I am querying the function for all tables in the database in Limited fashion, and grouping the results into usable groupings based on allocation unit type, object, index, and database.  From this query, I see a CPU time of 3062 ms and an elapsed time of 8809 ms.  I have also taken the sum across all objects returned, for a value of 1163.5 MB for this database.  At this point I have not broken the sum out by allocation unit type.

Sampled

For the sampled run, I used the same query with minor changes.

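In sketch form (the temp table name is again illustrative):

[codesyntax lang="tsql"]
SELECT database_id
      ,object_id
      ,index_id
      ,alloc_unit_type_desc
      ,page_count
  INTO #PhysStatsSampled
  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED');

SELECT DB_NAME(database_id) AS DatabaseName
      ,OBJECT_NAME(object_id) AS TableName
      ,index_id
      ,alloc_unit_type_desc
      ,SUM(page_count) * 8 / 1024.0 AS SizeMB
  FROM #PhysStatsSampled
 GROUP BY database_id, object_id, index_id, alloc_unit_type_desc;
[/codesyntax]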

The only changes are the mode and the name of the temp table.  For this query, I get a CPU time of 3203 ms, an elapsed time of 5768 ms, and a total for the objects of 11678.2 MB.  If I were to stop at this point, I would conclude that the actual size is larger than what I just saw with Limited, and I could expect the CPU and elapsed times to be longer.  The bit of a surprise is that the Sampled run is faster overall than the Limited option.

Detailed

Same script as the prior two tests; again, the only changes are the temp table name and the use of the Detailed mode.  When I run the Detailed query, the CPU time jumps to 12313 ms, the elapsed time jumps to 29191 ms, and the total size decreases to 11635 MB.  What?  The total size decreased, and it is more in line with what we see from the Limited option.  With that knowledge, one might conjecture that simply running the function in Limited mode would suffice for these queries.  The speed of that option is comparable to the Sampled run, and the data appears to be more accurate (since Sampled is just an estimate).
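The corresponding sketch:

[codesyntax lang="tsql"]
SELECT database_id
      ,object_id
      ,index_id
      ,alloc_unit_type_desc
      ,page_count
  INTO #PhysStatsDetailed
  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED');

-- Same grouping query as before, pointed at the Detailed dump
SELECT DB_NAME(database_id) AS DatabaseName
      ,OBJECT_NAME(object_id) AS TableName
      ,index_id
      ,alloc_unit_type_desc
      ,SUM(page_count) * 8 / 1024.0 AS SizeMB
  FROM #PhysStatsDetailed
 GROUP BY database_id, object_id, index_id, alloc_unit_type_desc;
[/codesyntax]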

Conclusion

The execution plan and query cost for each run were identical in my testing; the real differences were in execution time.  I can now take this data and modify the queries slightly to return just the rows that report size information for the allocation unit type of LOB_DATA.  This will give me a better understanding of the sizes of that kind of data in my environment, and of the environment in general.
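For instance, against the Limited dump from earlier (a sketch; the temp table name matches the one assumed above):

[codesyntax lang="tsql"]
SELECT OBJECT_NAME(object_id) AS TableName
      ,SUM(page_count) * 8 / 1024.0 AS LobSizeMB
  FROM #PhysStatsLimited
 WHERE alloc_unit_type_desc = 'LOB_DATA'
 GROUP BY object_id
 ORDER BY LobSizeMB DESC;
[/codesyntax]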

Exceptional DBA

Categories: News, Professional
Comments: No Comments
Published on: May 11, 2010

It is once again that time of year.  It is time to nominate the DBA in your life for the Exceptional DBA of the Year Award.  This award is sponsored by RedGate.  You can read more about it here or here.

For many of us, a very high bar is set, either by ourselves or by our employer.  In our type of job, great things are inherently expected.  If you are able to exceed your own expectations, you may be an exceptional DBA.  I don't think you necessarily need to exceed your own expectations, though; for this type of award, meeting them, or coming close, may be just what is necessary.  Another measure might be whether you are doing your job any better this year than you were last year.

Even if you do not feel you are an exceptional DBA, what are you doing in your job right now that is helping to save the business some money?  What are you doing to make it appear as if all is running smoothly?  What are you doing outside of your job to contribute to the SQL community?  Those are just a few things to ponder in deciding whether you may be an exceptional DBA, nominated for this award or not.

If you know somebody, nominate them.  If you think you are an exceptional DBA it is up to you to determine whether you want to self-nominate or not.  If you are doing your job with high efficiency, it is in part your responsibility to make sure somebody knows what you are doing.

As far as the award goes, some think it may be a bit arrogant to self-nominate.  That is not entirely true – though don't dismiss the concern either.  As I said, you do need to let somebody know what you are doing and that you are doing it well – tactfully.  If you do nominate yourself, you are still reviewed by the panel of judges, and they have the final say on who receives the award.  In the end, with regard to this award, it really is your peers who say whether or not you are exceptional.
