What’s the Size of that Index?

Categories: News, Professional, Scripts, SSC
Comments: 1 Comment
Published on: July 27, 2015

I have previously written about finding the big space consumers within the database. I even shared scripts in those articles on how to find those big consumers (read all about it here).

That is grand when it comes to quickly finding which objects happen to be consuming the greatest amount of space within the database. It really helps to quickly assess where some extra TLC may need to be spent. But what about more specific details? What if you wished to figure out the size of each of the indexes within that object?

Well, the solution is pretty simple. I have another script to help determine which indexes are the big space consumers within the specified table for the given database.

 

First, let’s see what that script is.
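Or at least a close approximation of it. What follows is a minimal sketch of the approach rather than the full production script, and the table name is a hypothetical placeholder.

[codesyntax lang="tsql"]
DECLARE @TableName sysname = N'dbo.MyBigTable';	-- hypothetical table name

SELECT OBJECT_NAME(ps.object_id) AS TableName
	, CASE WHEN GROUPING(i.name) = 1 THEN '*** Table Total ***'
		ELSE ISNULL(i.name, 'HEAP') END AS IndexName
	, SUM(ps.reserved_page_count) * 8 / 1024.0 AS ReservedMB
	, SUM(ps.used_page_count) * 8 / 1024.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
	INNER JOIN sys.indexes i
		ON i.object_id = ps.object_id
		AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID(@TableName)
GROUP BY GROUPING SETS ((ps.object_id, i.name), (ps.object_id))
ORDER BY GROUPING(i.name), SUM(ps.reserved_page_count) DESC;
[/codesyntax]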

As you can see, I am pulling a few different measures of space usage from the dynamic management view sys.dm_db_partition_stats. This tells me how much space is being reserved for each index as well as how much space is actually being used by each index. Then I use a grouping set to summarize that size information so I can compare the total index size back to the size I would have seen from the table space script (discussed lightly earlier).

Why might I want to do something like this?

That is a really good question! The first obvious answer is the off chance that you discover (again, from the use of the previous script) that one of your terabyte tables holds only 150GB of data, and the rest of that space consumption comes from all of the indexes on that table.

When I see that a table has more space allocated to indexes than to data, I might be curious which indexes are contributing to that consumption. In addition, I can scan the numbers much faster to see if there are multiple indexes that are the exact same size. If there are, I might want to evaluate and dig a little deeper into the definitions of those indexes. We would all just love to have 850GB of indexes on a table, only to find that three or four of those indexes are exact duplicates.

From there, I might even decide to look at the usage stats for the indexes in question. Why go about it in this fashion? Well, on some databases it is much faster to run a few quick queries than to try and run a big query against the entire database to get all of the index usage stats. I am sure you are familiar with those times when trying to query physical and operational stats takes hours to complete.

But But But…

I can hear you thinking right now. Another question has popped into your head: what about sp_spaceused? That old proc works just fine for this as well. But there is something to be said for knowing what contributes to the size of an index and being able to work through how to calculate it. Additionally, with a minor tweak, this query can provide that (detailed) information very quickly for all objects in the database.

Tweaked query
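The tweak is minor. Dropping the single-table filter and grouping by each object lets the same calculation run database-wide, as in this sketch (again an approximation, not the original script):

[codesyntax lang="tsql"]
SELECT OBJECT_SCHEMA_NAME(ps.object_id) + '.' + OBJECT_NAME(ps.object_id) AS TableName
	, CASE WHEN GROUPING(i.name) = 1 THEN '*** Table Total ***'
		ELSE ISNULL(i.name, 'HEAP') END AS IndexName
	, SUM(ps.reserved_page_count) * 8 / 1024.0 AS ReservedMB
	, SUM(ps.used_page_count) * 8 / 1024.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
	INNER JOIN sys.indexes i
		ON i.object_id = ps.object_id
		AND i.index_id = ps.index_id
WHERE OBJECT_SCHEMA_NAME(ps.object_id) <> 'sys'	-- skip system objects
GROUP BY GROUPING SETS ((ps.object_id, i.name), (ps.object_id))
ORDER BY OBJECT_NAME(ps.object_id), GROUPING(i.name);
[/codesyntax]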

 

Missing Indexes Script v2

Comments: 11 Comments
Published on: May 15, 2014


Frequently we hear the analogy that <insert item here> is like opinions: everybody has one and not all of them are good (some may stink).

Well, this may just be another one of those <items>.  Whether it stinks or not may depend on your mileage.

I had shared a similar script back in January 2012 and wanted to share something a little more current.  As is the case for many DB professionals, I am always tweaking (not twerking) and refining the script to try and make it more robust and a little more accurate.

This version does a couple of things differently than the previous version.  For one, it runs against a single database at a time (the prior version looped through all of the databases with a less refined query).  Another significant difference is that this query is designed to pull information about the missing indexes and execution statistics from multiple places.  I felt this could prove more advantageous and useful than pulling the information from just one place.

Here is the current working script.

The following script gets altered on display: n.VALUE is displayed, but in the code it is actually n.value.  The display is wrong, but the code as presented in the editor is correct.  If copying from this page, please change the upper-cased “VALUE” in the XML segment to “value” so it will work.  A download of the script has been added at the end.

[codesyntax lang=”tsql”]

[/codesyntax]
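Since the full script is lengthy, here is a condensed sketch of the DMV portion only. The plan-cache XML piece (the part containing n.value) is omitted here; the complete script is in the download below.

[codesyntax lang="tsql"]
SELECT DB_NAME() AS DatabaseName
	, OBJECT_NAME(mid.object_id) AS TableName
	, mid.equality_columns
	, mid.inequality_columns
	, mid.included_columns
	, migs.user_seeks + migs.user_scans AS UserHits
	, migs.avg_user_impact	-- estimated percentage benefit
	, migs.avg_total_user_cost * migs.avg_user_impact
		* (migs.user_seeks + migs.user_scans) AS ImpactScore
FROM sys.dm_db_missing_index_details mid
	INNER JOIN sys.dm_db_missing_index_groups mig
		ON mig.index_handle = mid.index_handle
	INNER JOIN sys.dm_db_missing_index_group_stats migs
		ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
ORDER BY ImpactScore DESC;
[/codesyntax]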

As has been the case in the past, this is not the end query.  I have a backlog of updates I want to make to the query and try to create something even more useful from it.

As a bit of fair caution: just because an index shows up as a recommended implementation from this script does not mean you should blanket implement it.  Also, like the previous script, the higher the impact value, the more probable it is that the index will have a positive impact on your system.

In addition to the impact, I also look at the “Similar Queries” and “Similar Plans” numbers from the result set.  These values can sometimes lead to an understanding that there are other queries or other plans that could also benefit from the creation of the index.  Your mileage may vary.

Download script Missing_Index.

Quick and Dirty Index Info

Categories: News, Professional, Scripts, SSC
Comments: No Comments
Published on: November 1, 2011

From time to time, somebody needs to find some information about the indexes in the database.  As DBAs, we have a few tools at our disposal to look up that information.  The required/requested information is not always the same.  That said, you can usually adapt a query quickly to find what you need.

Where to Look

SQL 2005 and up have provided us with some very useful means to gain insight into the indexes in our databases.  The following DMOs are a small representation of that information.

  • sys.dm_db_missing_index_details
  • sys.dm_db_missing_index_group_stats
  • sys.dm_db_missing_index_groups
  • sys.dm_db_index_usage_stats
  • sys.dm_db_index_physical_stats

Add to that, there are some catalog views to help with finding information relevant to indexes in the databases.

  • sys.indexes
  • sys.index_columns

With just these few objects, there is a wealth of information at your fingertips.  I often find myself querying for information about indexes, and I also find myself being asked how to find this information at fairly frequent intervals.  Usually, I just end up writing out the query again (depends on mood and day).

Rewriting the query doesn’t take too much time when you know what you need to query.  That said, sometimes it is nice to have a base query ready to go.  From this query you can add/remove items as you see fit to get the information that you really need.

For instance, here is a query to get some of the fragmentation information.  I would probably just re-use this query over and over where applicable – only changing the parts that mattered to get the necessary result set.

[codesyntax lang=”tsql”]

[/codesyntax]
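A base query of that sort might look like the following sketch (the thresholds are arbitrary starting points, not recommendations):

[codesyntax lang="tsql"]
SELECT OBJECT_NAME(ips.object_id) AS TableName
	, i.name AS IndexName
	, ips.index_type_desc
	, ips.avg_fragmentation_in_percent
	, ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
	INNER JOIN sys.indexes i
		ON i.object_id = ips.object_id
		AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10	-- arbitrary threshold
	AND ips.page_count > 100	-- ignore tiny indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;
[/codesyntax]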

There are many types of queries we could write for the different relevant information pertaining to our indexes.  It is all a matter of knowing where to start.  I hope that knowing about these DMOs and catalog views will help shorten your effort in retrieving this information.

Table Sizes

Comments: 2 Comments
Published on: May 3, 2010

In SQL Server there are two stored procedures that help one to determine the size of a table.  One is documented and more widely popular, the other is not.  These two stored procedures are sp_spaceused and sp_MStablespace.  Dukagjin Maloku wrote a blog article about the two stored procedures in March of this year.  I don’t intend to rehash what he has covered.  What I intend is to look at the internals of the two stored procedures.  This will be the first of a multi-part series.

Why?

I saw a question asking how to find the table sizes for all of the tables in a database.  A few scripts were provided (one using sp_MStablespace and one using sp_spaceused).  I had been using the sp_MStablespace version for quite some time and had never questioned its accuracy.  That changed when the question evolved into: why don’t they (the two scripts) show the same results?  So, I decided I needed to figure out what is happening inside both procedures to see if I could determine a reason for the difference.  I will be comparing these on SQL Server 2008.

Comparison

sp_spaceused

This stored procedure has mostly been updated to use the new DMVs available since SQL 2005.  It currently relies heavily on sys.dm_db_partition_stats.  The procedure determines the size of the indexes by subtracting the number of data pages from the number of used pages; the idea is that the number of data pages will be less than the total number of used pages, and the difference is the space consumed by indexes.  This stored procedure does still rely on a soon-to-be-deprecated object called sysfiles.  The replacement for that object would be sys.database_files.  The functionality is somewhat different but, in regards to how it is used in this procedure, the results are the same.  Despite that, this proc does not look to be on the replacement block any time soon.
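To illustrate that math, here is a sketch of the core calculation as I read it out of the proc (the table name is a hypothetical placeholder):

[codesyntax lang="tsql"]
DECLARE @objname nvarchar(776) = N'dbo.MyTable';	-- hypothetical table name

SELECT SUM(ps.reserved_page_count) * 8 AS ReservedKB
	, SUM(CASE WHEN ps.index_id < 2	-- heap or clustered index = the data
			THEN ps.in_row_data_page_count + ps.lob_used_page_count
				+ ps.row_overflow_used_page_count
			ELSE ps.lob_used_page_count + ps.row_overflow_used_page_count
		END) * 8 AS DataKB
	, (SUM(ps.used_page_count)
		- SUM(CASE WHEN ps.index_id < 2
			THEN ps.in_row_data_page_count + ps.lob_used_page_count
				+ ps.row_overflow_used_page_count
			ELSE ps.lob_used_page_count + ps.row_overflow_used_page_count
		END)) * 8 AS IndexKB	-- used pages minus data pages
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID(@objname);
[/codesyntax]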

sp_MStablespace

This stored procedure, by contrast, does look like it will be deprecated, though I was unable to find information to back that up.  When delving into the procedure, one can quickly spot that it relies on sysindexes and sysobjects for much of its heavy lifting.  The sizing information comes from sys.partitions, sys.allocation_units and sysindexes.  This procedure uses sysindexes mostly for that sizing information and then calculates the data size based on clustered indexes and heaps.  Then, to calculate the remaining index size, it subtracts that data size from the total for all of the indexes on the specified table.  The math is a bit different and can be the reason for the difference in results.  This procedure seems to be in need of a decent-sized overhaul if it is not to be deprecated.

Findings

After looking at the code and then running some tests with each of these procedures, I was able to corroborate the claim that they don’t return the same results.  With that in hand, I decided to take a stab at writing something different to help create a consistent return.  Thus I will be sharing some queries I have been working on for each of these.  The desired goal is that both would return the same results.  Based on the consistency of information between the DMVs and the necessary (and some bonus) elements for these queries, that may be a bit difficult to do.  For the sp_MStablespace query, the crux of the query will be the sys.dm_db_index_physical_stats function, while the sp_spaceused query may continue to rely on sys.dm_db_partition_stats or just sys.partitions.  Both queries may rely on a combination of these latter two DMVs – tune in to find out.

BLOB Index Columns

Comments: 1 Comment
Published on: April 27, 2010

For the April T-SQL Tuesday, I blogged about a report to find information on the BLOBs in a database.  I have since seen a request to add to that script.  The addition would provide some good information concerning the columns involved in a BLOB index – that is, all of the columns that participate in an index that includes a BLOB column.

Base Script

In that article I posted a script to help arrive at the final report.  There were a couple of things required for the setup.  I am including all of that information here in a single script.

[codesyntax lang=”tsql”]

[/codesyntax]

In this script, I made a slight alteration from the article I posted.  In that article, I somehow missed a change to the script I had been testing.  That change is in the Temp table that I created (to properly support the Join statements on each side of the Union Select statement).  I simply added the object_id.

The Change

[codesyntax lang=”tsql”]

[/codesyntax]

The guts of the change to add in the columns for this script comes with the following segment of code.

[codesyntax lang=”tsql”]

[/codesyntax]

With that snippet, I also needed to Join it to the select statements, and thus it was added on both sides of the Union statement as illustrated in the next example.

[codesyntax lang=”tsql”]

[/codesyntax]

In the above CTE, you will note that I used the STUFF function along with a FOR XML Path statement.  The combination of these allows one to concatenate a set of rows into a comma-separated list, as I have done with the ColList (column list) column.  I also want to note here that I am using the text() keyword along with FOR XML Path.  There are other methods of returning information back to the screen when using FOR XML Path.  I chose text() in this case because I am just returning a concatenated list of columns that really only needs to read as text.  If I were returning a SQL statement, I would choose a different method to make the text more readable.
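As a standalone illustration of that pattern (the table name is a hypothetical placeholder):

[codesyntax lang="tsql"]
SELECT i.name AS IndexName
	, STUFF((SELECT ', ' + c.name AS [text()]
			FROM sys.index_columns ic
				INNER JOIN sys.columns c
					ON c.object_id = ic.object_id
					AND c.column_id = ic.column_id
			WHERE ic.object_id = i.object_id
				AND ic.index_id = i.index_id
			ORDER BY ic.key_ordinal
			FOR XML PATH('')), 1, 2, '') AS ColList
FROM sys.indexes i
WHERE i.object_id = OBJECT_ID(N'dbo.MyBlobTable');	-- hypothetical table name
[/codesyntax]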

Conclusion

I chose to make this subtle change via the CTE because, for me, it keeps the code easy to understand and read.  By illustrating the columns involved in an index that includes a BLOB column, one can gain greater insight into the use of the database.  I am glad that this change was requested because it makes sense to me.

I hope you find it useful.

Edit: Made a correction to the first script

Index Stats Duplication

Comments: 2 Comments
Published on: April 21, 2010

I came across a recent posting about seeing multiple entries in sys.dm_db_index_usage_stats for the same index.  This kind of behavior can be somewhat concerning since that view should have a unique entry for each index.  As described by Microsoft, this view

“Returns counts of different types of index operations and the time each type of operation was last performed.”

You can read more about it here.  Further in the same document, one will read

“When an index is used, a row is added to sys.dm_db_index_usage_stats if a row does not already exist for the index. When the row is added, its counters are initially set to zero.”

Thus, I started checking to see if I could reproduce the same sort of results.  While investigating, I determined that I should join that view to the sys.indexes catalog view.  Since sys.dm_db_index_usage_stats does not contain an index name, I felt it necessary to be able to see the name and not just a number.

My first run at the query to try and reproduce the results is displayed below.

[codesyntax lang=”tsql”]

[/codesyntax]
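That first pass was something along these lines (a sketch of the shape of the query, not the exact original):

[codesyntax lang="tsql"]
SELECT DB_NAME(s.database_id) AS DatabaseName
	, OBJECT_NAME(s.object_id, s.database_id) AS ObjectName
	, i.name AS IndexName
	, SUM(s.user_seeks) AS UserSeeks
	, SUM(s.user_scans) AS UserScans
	, SUM(s.user_lookups) AS UserLookups
	, SUM(s.user_updates) AS UserUpdates
FROM sys.dm_db_index_usage_stats s
	INNER JOIN sys.indexes i
		ON i.object_id = s.object_id
		AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
GROUP BY s.database_id, s.object_id, i.name;
[/codesyntax]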

Well, this appears to produce results that are consistent with the description offered by Microsoft.  So I wanted to verify further and added a windowing function into the mix.

[codesyntax lang=”tsql”]

[/codesyntax]
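Again in sketch form, the windowing addition looks like this:

[codesyntax lang="tsql"]
SELECT DB_NAME(s.database_id) AS DatabaseName
	, OBJECT_NAME(s.object_id, s.database_id) AS ObjectName
	, i.name AS IndexName
	, ROW_NUMBER() OVER (PARTITION BY i.name ORDER BY s.object_id) AS RowNum
	, SUM(s.user_seeks) AS UserSeeks
	, SUM(s.user_scans) AS UserScans
FROM sys.dm_db_index_usage_stats s
	INNER JOIN sys.indexes i
		ON i.object_id = s.object_id
		AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
GROUP BY s.database_id, s.object_id, i.name;
[/codesyntax]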

You will notice the line with Row_Number().  This numbers each occurrence of an IndexName in the system view, giving me a count of how many times it appears.  Well, now having verified further, I still have results consistent with the documentation from Microsoft.  I decided to work backwards from this script (yeah, I know).  I will reduce the number of fields returned, try to get a duplication at a base level, and then re-expand the fields being gathered.  The next stab at this looked like the following script.

[codesyntax lang=”tsql”]

[/codesyntax]

So for this revision, I simply removed the counter fields and the Group By clause.  I was hoping that my Group By would reveal the answer.  Lo and behold, I am still unable to reproduce the results.  OK, time to trim a little bit more as I work backward trying to create the same results.

[codesyntax lang=”tsql”]

[/codesyntax]

This time, I removed a Join condition of s.database_id = db_id(), along with the database_id and DatabaseName fields.  When I run the query now, I start to see the same results.  So the immediate indicator to me is that there is a duplication of indexes – but not within the same database.  The duplication appears across different databases.  To verify, I now need to expand my query just a bit.

[codesyntax lang=”tsql”]

[/codesyntax]

Here, I was able to verify that in the places where RowNum did not equal one, I had an index by the same name in a different database.  Not only is the index name the same, but the object holding that index is also the same in the other database.  Now, if I want, I can expand my query all the way back to the original query, knowing that I have verified that the entries in sys.dm_db_index_usage_stats are unique for each index.  The conclusion I would draw from this exercise is that the query itself can have an impact on the results displayed.  One should check and then double check to make sure the results jibe.  Try the query a few different ways and see if the results are consistent.  If not, what factor in the query was changed that changed your results?

Or is it?

While exploring this further, I decided to verify some of the objects in different databases.  For instance, I might see the object being reported in the ReportServer database, and then also in msdb, and then again in another database.  But the object only exists in ReportServer.  Why is this happening?  My thoughts on this are currently just an educated guess.  It appears that a connection is established in one database and then makes a call to an object in a second database, causing a use against the actual index in that second database.  The record gets inserted into the metadata tables that this view pulls from for each database that was touched – but each time with the object_id and index_id of the second database.  Does that make sense?

Index Info

Categories: News, Professional, Scripts
Comments: 3 Comments
Published on: March 31, 2010

I recently came across an interesting script to help show index info and usage.  As I began to dissect the script, I noticed that it was intended for SQL 2005 and beyond but was using SQL 2000 compatibility views.  I decided to take it upon myself to update the script to use the 2005 objects rather than the compatibility objects.

The design of the script is to give a basic view of the usage of the index and then try to evaluate the cost of that index.  If the index updates far outweigh the queries that use it, then the index may not be a useful index, and its cost probably outweighs its benefit.

The original query employed the use of dbo.sysobjects and dbo.sysindexes.  The reason for the use of dbo.sysindexes was merely to calculate the size of the index.  The use of dbo.sysobjects seemed like an oversight.  The area where I ran into a sticky point was calculating the size of the index.  The problem was figuring out a method to execute the entire query and have it perform as well as the original query.  The first attempt yielded a query that was absolutely abysmal.

[codesyntax lang=”tsql”]

[/codesyntax]

The method I used to calculate the size while using the SQL 2005 objects was the function sys.dm_db_index_physical_stats.  I employed this directly as a subquery in the above posted query.  This version takes in excess of 7 minutes to return results.  I moved on from it to find a better method.

The next attempt was to move the subquery into a CTE.

[codesyntax lang=”tsql”]

[/codesyntax]

This version returns in about thirty seconds.  Still not acceptable, but substantially better.  The original query returned in roughly three seconds.  It is also important to note that the accuracy of both queries is equal; since both return the same result set, the only focus is on performance.  The delay in this query revolves completely around the use of the function.  Armed with the knowledge that the real delay is in that function, there needs to be an alternative method to pull back the information.  I decided to use a temp table to stage the data.  Using this temp table, I could test the efficiency of the query directly.

[codesyntax lang=”tsql”]

[/codesyntax]
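In sketch form, the temp table approach looks like this (staging the expensive function call once, then joining to it):

[codesyntax lang="tsql"]
-- stage the expensive function call once
SELECT ips.object_id
	, ips.index_id
	, ips.page_count
INTO #IndexSize
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips;

-- then join to the usage stats without re-invoking the function
SELECT OBJECT_NAME(i.object_id) AS TableName
	, i.name AS IndexName
	, ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) AS UserReads
	, ISNULL(s.user_updates, 0) AS UserWrites
	, x.page_count * 8 AS IndexSizeKB
FROM sys.indexes i
	INNER JOIN #IndexSize x
		ON x.object_id = i.object_id
		AND x.index_id = i.index_id
	LEFT OUTER JOIN sys.dm_db_index_usage_stats s
		ON s.object_id = i.object_id
		AND s.index_id = i.index_id
		AND s.database_id = DB_ID();

DROP TABLE #IndexSize;
[/codesyntax]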

The population of the temp table consumes 97% of the query cost.  The prior version of the query also showed a cost of 97% associated directly with the function.  If I populate the temp table prior to the execution of the remainder of the script, then I can compare performance of this script versus the original.  When comparing, this version runs in about 1 second.

The next steps from here would be to employ a method that populates a table in an “Admin” database.  By using a pre-populated table, I can maintain good performance while gaining the information concerning index usage.

Index your FK

Categories: News, Professional, Scripts
Comments: No Comments
Published on: March 4, 2010

A few months ago I read an article from SQLServerCentral.com about some Foreign Key gotchas.  Since that article, I have seen several people asking whether a foreign key should be indexed or not.  I have seen the benefits of having FKs indexed as well as the agony of not indexing them.  The article discusses some of the benefits: in summary, indexing helps to avoid deadlocks, reduce locks, and improve performance.  My intent is not to go into depth on the article, but to focus on a couple of scripts to help identify which tables need indexes on their foreign keys.  This article is a part of my Key documentation series.

If you browse the internet, you will likely find several scripts that accomplish the same goal.  There is one script I have seen that will even generate the statement to create the index on the foreign key.  Many scripts are similar – some perform slightly different things.  I thought about evaluating some of the scripts and comparing them; I have found some to be less accurate than others.  I decided after some thought that I would just share my scripts and show the evolution of the script that I now use.

Rev 1

This was just to hammer out a solution that would get me the results I sought.  This version uses some objects that are to be deprecated, and thus should really only be used on servers running SQL 2000.

[codesyntax lang=”tsql”]

[/codesyntax]

This works, but with limitations.  Obviously there is the limitation of using objects that are scheduled to be deprecated.  The second limitation is that it is not 100% accurate: this script does not pull all of the Foreign Keys that are missing indexes.  This is the same sort of issue that I found with several scripts on the internet.  That said, this query is rather efficient.  When compared to future versions, it was 8% less costly and about 150ms faster.  However, it does consume more disk IO and more CPU time.

With these limitations in mind, I decided to create a second query.

Rev 2

[codesyntax lang=”tsql”]

[/codesyntax]

This query is less limited than the first.  Here, I have improved the performance substantially over the first query, and the execution plan cost is lower.  I have implemented the use of SQL 2005 objects, decreased disk IO, and decreased run time.  CPU time remains about the same as the previous example.  Still, one limitation remains that is pretty glaring: I am not returning all Foreign Keys that are missing an index.

Moving onto attempt number three.

Rev 3

As I was testing version three of this query, I noticed there was another item that I needed.  I ran the tests in a database with multiple schemas and had difficulty locating the tables that were in the list.  When looking at the expanded view in SSMS, the tables are grouped by schema and listed alphabetically within that schema.  When I realized where the table was, it dawned on me that I needed to include the schema in the query results.  Adding this little piece of information will save a few seconds when trying to verify the information in the report.

[codesyntax lang=”tsql”]

[/codesyntax]
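A simplified sketch of where Rev 3 ended up (single-column foreign keys assumed; the real script handles more):

[codesyntax lang="tsql"]
SELECT SCHEMA_NAME(t.schema_id) AS SchemaName
	, t.name AS TableName
	, fk.name AS ForeignKeyName
	, c.name AS FKColumn
FROM sys.foreign_keys fk
	INNER JOIN sys.foreign_key_columns fkc
		ON fkc.constraint_object_id = fk.object_id
	INNER JOIN sys.tables t
		ON t.object_id = fk.parent_object_id
	INNER JOIN sys.columns c
		ON c.object_id = fkc.parent_object_id
		AND c.column_id = fkc.parent_column_id
	LEFT OUTER JOIN sys.index_columns ic
		ON ic.object_id = fkc.parent_object_id
		AND ic.column_id = fkc.parent_column_id
		AND ic.key_ordinal = 1	-- the FK column should lead an index
WHERE ic.object_id IS NULL
ORDER BY SchemaName, TableName;
[/codesyntax]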
In the code, note that I have opted to use the schema_name() function rather than a Join on the sys.schemas view.  By using the function, the script performs a little better.  There is another addition I made to the script.

[codesyntax lang=”tsql”]

[/codesyntax]

An additional condition was added to the Join.  With this condition, I am now able to pull back the additional foreign keys that are missing indexes.  I have also added an Order By, which will slow the query down a bit; the offset to that decrease is that the data is more manageable for verification.

Conclusion

There are many methods to arrive at this data.  The difference in performance between the methods I wrote and evolved was negligible.  The most important thing in the end is having a solution that is accurate and timely in providing this data.  All of these solutions finish in under a second for me.  None of them places a significant impact on the server, and thus I can choose accuracy over performance in this instance.

This exercise is a part of getting to know your database.  Finding and documenting is very helpful when it comes time to troubleshoot.  Once this information is garnered, one needs to evaluate whether or not to place an index on that foreign key.

SQLFool IDX Defrag Scripts

Categories: News, Professional, Scripts
Comments: No Comments
Published on: January 26, 2010

I just found out that Michelle Ufford (@sqlfool) has updated her Index Defrag scripts.  Right now she is looking for beta testers.  Anybody interested can find more info at:

http://sqlfool.com/2010/01/index-defrag-script-updates-beta-testers-needed/

Indexes Out the Window II

Categories: News, Professional
Comments: 2 Comments
Published on: January 25, 2010

In Part I, I discussed some of the peculiarities and troubleshooting done in relation to a peculiar execution plan.  In this post, I will discuss some of what I did to further tune the query.

First, a little more background on the design of this query.  This query uses some dynamic SQL to build a dynamic where clause based on the value passed to a single parameter.  I need to be able to create three different where clauses that could essentially create a range-scan on a date field.  Two of the three queries could involve an inequality date comparison rather than an equality comparison.  The specific requirements are to have the report return one of the following: 1) 2010 data only, 2) pre-2010 data, and 3) all data.  This was initially accomplished with a query similar to:

[codesyntax lang=”sql” title=”Initial Where”]

[/codesyntax]
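In sketch form, that initial construction was something like the following.  The ISNULL wrapper is my assumption about what defeated SARG-ability; the table and column names follow the examples later in this post.

[codesyntax lang="tsql"]
DECLARE @DateOption tinyint = 1;	-- hypothetical parameter: 1 = 2010 only, 2 = pre-2010, 3 = all data
DECLARE @sql nvarchar(max);

SET @sql = N'SELECT so.SalesOppID, so.LastUpdate, so.RecordDate
FROM dbo.SalesOpportunity so
WHERE 1 = 1';

IF @DateOption = 1
	SET @sql = @sql + N' AND ISNULL(so.RecordDate, ''19000101'') >= ''20100101''';
IF @DateOption = 2
	SET @sql = @sql + N' AND ISNULL(so.RecordDate, ''19000101'') < ''20100101''';
-- @DateOption = 3 adds no predicate and returns all data

EXEC sys.sp_executesql @sql;
[/codesyntax]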

One of the first comments might be, “That query is not SARG-able.”  True, it is not SARG-able.  That will be the last item to be evaluated.  The slowness of the query in this case has less to do with its SARG-ability than with the inequality scan.  The puzzle in the query was how to make it an equality comparison rather than an < or > comparison.  The above query uses the indexes as discussed in the prior post – it is just not as fast as it could be.

It is at this point that I am reminded of an integral part of my last two presentations at my local PASS meetings.  And it dawns on me that an understanding of the data is truly necessary.  Most of the RecordDate values are Null since there was no backfill on the column after its implementation, and prior to its implementation there was no date control on the table.  The table is in a warehouse and only contains data since the beginning of 2009.  It is also necessary to understand that there is a minimal chance of record additions to the table after 2010.  Thus we now have a method to get the equality comparisons.  The following might satisfy this need:

[codesyntax lang=”sql” title=”Next Rev”]

[/codesyntax]

This change now uses a Clustered Index seek.  That seek is the most costly piece of the execution plan; however, the overall runtime of the query has improved by 1 second.

Thinking I could still improve the query, I decided that a new column called ReportYear would help the cause.  The change would allow the query to be SARG-able: the column would be derived from RecordDate, holding the first day of the year recorded in the RecordDate field.  I figured this idea would also still permit me to do an Equality search in my Where clause.
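One way to sketch that change (the original describes a default, but a default cannot reference another column, so a persisted computed column is my assumption of the shape):

[codesyntax lang="tsql"]
-- a persisted computed column holding the first day of RecordDate's year
ALTER TABLE dbo.SalesOpportunity
	ADD ReportYear AS DATEADD(YEAR, DATEDIFF(YEAR, 0, RecordDate), 0) PERSISTED;

CREATE NONCLUSTERED INDEX IX_SalesOpportunity_ReportYear
	ON dbo.SalesOpportunity (ReportYear)
	INCLUDE (SalesOppID);

-- the predicate then becomes a SARG-able equality:
-- WHERE so.ReportYear = '20100101'
[/codesyntax]

Let’s see the results: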

Wow, I have just come full circle back to the First Post in this series.  I just lost the clustered Index seek, but I did not lose any of the performance.  To see if I gained anything by making the query SARG-able, let’s look at the IO stats between the two.

First IO Stats

Second IO Stats

The top image represents the first query and the bottom image represents the second query.  We have an improvement of about 600,000 logical reads.  That is a substantial improvement, and it comes even with an Index Scan on an index that is non-covering.  Now, to tune the index.  Much the same as in the first article, I added an index to the SalesOpportunity table on LastUpdate and SalesOppID.  This creates a covering index and removes the hash match join in favor of a merge join in the execution plan.  Another effect shows in the IO stats, as displayed below.

Second Query Index

We can see here that there was once again an improvement in the logical reads – another ~10,000 logical reads.  Overall, both queries run in the same amount of time, but the second query uses fewer resources.  Again, if I change the clustered index on this table, I am certain I would see an improvement.

I am still in search of the reason behind the use of a bad index for this query.

Execution Plans can be downloaded here.
