Linked Servers and Stats

Linked Servers

A linked server is a fabulous feature in SQL Server to help accomplish various data tasks between local and remote servers. There is a time and a place for the use of linked servers. Unfortunately, some are of the belief that a linked server is the solution to all problems including the way to end world hunger.

You may recall that I have written about linked servers on a few occasions. Usually that is due to the relationship between a poorly performing process and the linked server. Here is a prime example, here or here. On the other hand, here are some examples of using a linked server in a much better light, here and here.

Using a linked server is entirely up to you. If you choose to use a linked server, you will need to understand there are risks involved – beyond just the documentation and performance nightmares it can cause. Today, I will share one of those fabulous gotchas – mostly because it has cropped up more than once with my clients.

Remote Stats

On more than one occasion I have had an emergency request because everything was broken. The "everything" in almost every incident is an SSIS package that is failing with error messages. The error message will typically have text similar to the following:

Could not locate statistics ‘_WA_Sys_00000015_346C780E’ in the system catalogs.

Due to the error, the package fails processing and grinds to a halt. When diving into the package it is discovered that the missing stats happen to be coming from a linked server query. This raises a big bright blaring alarm for me. Why is the SSIS package accessing the data via a linked server? This is rather counter-productive and definitely contrary to what is desired from a performance perspective.

You can certainly see why this is a sudden emergency, right? Absolutely nothing is working anymore, right? Well, at least that is the way it seems to the person reporting the issue. Understanding that their realm of work has come to a halt is important in order to reach a resolution more quickly. Knowing that the point of failure is not as grand in scale as claimed is critical because it can help you keep your cool while assessing the real problem.

Since the problem presents itself when accessing a specific object or even set of objects across a linked server, we have our scope of the issue scaled down quite a bit already. Now, what do we do from here? Obviously the statistic with that name is missing, right? Don’t be so hasty to assume it is actually missing. Let’s verify that the stat is indeed missing first. Even better – let’s just eliminate as many variables as possible. We can do this by querying the affected objects directly from SSMS.

Easy Troubleshooting for the DBA


For me, it is essential to eliminate variables when we run into a failure. In this case, I have just a few major variables that can potentially be causing this problem. Those variables are:

  1. SSIS
  2. Linked Server
  3. The query itself
  4. Security
  5. Bad Plan
  6. The stat is legitimately missing

I can easily assess the legitimacy of each variable through various quick tests. To eliminate or validate the “query” variable, I can issue a very simple query to retrieve data from the affected object. So let’s eliminate variables 1 and 3 in one fell swoop.
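The simplest such test is a direct query across the linked server from SSMS. Here is a sketch of what I mean, with placeholder names for the linked server and remote table (the client's actual names were different):

```sql
-- Hypothetical names: [REMSRV] is the linked server, RemoteDB.dbo.SomeTable
-- is the remote object the SSIS package was reading.
SELECT TOP (10) *
FROM [REMSRV].RemoteDB.dbo.SomeTable;
```

If this bare-bones select fails with the same error, neither SSIS nor query complexity can be the culprit.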

This query, in this scenario, results in the following:

This, while disappointing, is actually quite productive. This has eliminated two variables for me. I now know that SSIS is not related to the problem. I also know that query complexity is not related to the problem. There are still a few variables left to either validate or eliminate. Since I know the problem occurs when querying via linked server, let’s try querying the remote server direct (not via the linked server).

Well, while that does not entirely eliminate or validate any variables, it does tell me that the problem is still specifically related to the linked server. What if I try to use OPENQUERY instead of the traditional linked server query?
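A sketch of the OPENQUERY variant, using the same placeholder names as before:

```sql
-- The query text is shipped to [REMSRV] and executed there as if it were
-- a local query on that server; only the results come back.
SELECT *
FROM OPENQUERY([REMSRV], 'SELECT TOP (10) * FROM RemoteDB.dbo.SomeTable');
```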

Wow, look at that! Ok, so this is a bit of trickery because I have told SQL Server to execute that query on the linked server as if it were a "local" query on that server. This does work without error and definitely pushes the findings toward the conclusion that this is a linked server problem.

While the openquery version works, I do still need to eliminate some variables from the problem. One of the variables is security. Since the error mentions sp_table_statistics2_rowset, I googled about for that proc and found some mentions that there may be some column denies related to the stats in question that are giving the linked server some major fits. Or it could be insufficient permissions to execute DBCC SHOW_STATISTICS. I highly doubt this to be an issue since the openquery version works while using the same pass-through authentication of the linked server that the traditional linked server query would use.

In order to eliminate security as a potential cause, the test is simple (while it could be more complex, I went for the jugular to just eliminate the theory as quickly as possible) – I will add my account as the pass through account (which is a sysadmin on the remote server) and then query the linked server all over again. Suffice it to say, there was no change in the result – the error persisted.

This does not just yet eliminate the security variable because there could be a cached plan somewhere. So, just to be sure, I chose to flush the cache on both the linked server and the local server. Running the simple “Select *” query all over again yielded no difference in the results. I can now safely say that the problem is related neither to a cached plan nor to security. At this point, I set the linked server security back to the way it was. I have effectively eliminated all variables but 2 and 6 (linked server and the stat is missing).

Let’s eliminate the missing stat variable right now.
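One way to check, sketched here (run in the remote database, directly or via OPENQUERY; object names will vary), is to look for the stat named in the error in sys.stats and build the drop statement at the same time:

```sql
-- Confirms the auto-created stat actually exists on the source server,
-- and generates a DROP STATISTICS statement alongside it.
SELECT  s.name AS StatName,
        OBJECT_NAME(s.object_id) AS TableName,
        'DROP STATISTICS '
            + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + '.'
            + QUOTENAME(OBJECT_NAME(s.object_id)) + '.'
            + QUOTENAME(s.name) AS DropStmt
FROM    sys.stats s
WHERE   s.auto_created = 1
  AND   s.name = '_WA_Sys_00000015_346C780E';
```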

Undoubtedly you have noticed that I built a drop statement into the result set from this particular query. That aside, the green highlighted row is the very stat that was producing the error. This stat is most certainly available on the source server (linked server). This entirely eliminates the sixth variable because the stat is not missing.

This brings us to the final step – see what happens when we drop the stat and try the query again. I have the drop statement ready to roll, so let’s just run that and see what happens. My risk here is very small. This is an auto-generated stat and will be recreated if needed. After dropping the stat, I run the simple “Select *” query across the linked server again and guess what? It works perfectly fine.

In my opinion, if the problem is with a system generated stat, we can skip a few of these steps and go straight to dropping the stat and trying again.

That said, the better solution in this case would be to do one of two things.

  1. Modify the Package to not use the linked servers in this way and create a data source connection for each of the different server connections.
  2. Modify the Package to use OPENQUERY style queries.

My preference between those two options would be to create specific data sources. It seems very silly to pull data across the linked server to then be consumed via SSIS package.

I have just illustrated what I would classify as a nuisance problem related to linked servers. Simplifying the approach as I described with the two potential solutions would alleviate this nuisance while also giving an inevitable performance boost.

What’s the Size of that Index?

Categories: News, Professional, Scripts, SSC
Comments: 1 Comment
Published on: July 27, 2015

I have previously written about finding what the big space consumers are within the database. I even shared scripts in those articles on how to find those big consumers (read all about it here).

That is grand when it comes to quickly finding which objects happen to be consuming the greatest amount of space within the database. It really helps to quickly assess where some extra TLC may need to be spent. But what about more specific details? What if you wished to figure out the size of each of the indexes within that object?

Well, the solution is pretty simple. I have another script to help determine which indexes are the big space consumers within the specified table for the given database.


First, let’s see what that script is.
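A sketch of the approach (the table name dbo.MyTable is a placeholder; substitute the table you are investigating):

```sql
-- Index-level space usage for one table, with a GROUPING SETS total row.
SELECT  OBJECT_NAME(ps.object_id) AS TableName,
        i.name AS IndexName,
        SUM(ps.reserved_page_count) * 8 / 1024.0 AS ReservedMB,
        SUM(ps.used_page_count) * 8 / 1024.0 AS UsedMB
FROM    sys.dm_db_partition_stats AS ps
        INNER JOIN sys.indexes AS i
            ON  i.object_id = ps.object_id
            AND i.index_id  = ps.index_id
WHERE   ps.object_id = OBJECT_ID('dbo.MyTable')
GROUP BY GROUPING SETS ((ps.object_id, i.name), (ps.object_id))
ORDER BY ReservedMB DESC;
```

The row where IndexName comes back NULL is the grouping-set total for the whole table, which should tie back to the number from the table space script.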

As you can see, I am parsing a few different measures on space usage from the dynamic management view sys.dm_db_partition_stats. This will tell me information such as how much space is being reserved for each index as well as how much space is actually being used on each index. Then I use a grouping set to summarize that size information so I can then compare the total index size back to the size I would have seen from the table space script (discussed lightly earlier).

Why might I want to do something like this?

That is a really good question! The first obvious answer to such a question is on the off chance that you discover (again, from the use of the previous script) that one of your terabyte tables is actually only 150GB of data and the rest of that space consumption comes from all of the indexes on that table.

When I see that a table has more space allocated due to indexes than due to data, I might be curious which indexes are contributing to that consumption. In addition, I can scan the numbers much faster to see if maybe there are multiple indexes that are exactly the same size. If they are, I might want to evaluate and dig a little deeper into the definitions of those indexes. Nobody wants to have 850GB of indexes on a table only to find that three or four of those indexes are exact duplicates.

From there, I might even decide to look at the usage stats for the indexes in question. Why go about it in this fashion? Well, on some databases it is much faster to run a few quick queries than to try and run a big query against the entire database to get all of the index usage stats. I am sure you are familiar with those times when trying to query physical and operational stats takes hours to complete.

But But But…

I can hear you thinking right now. Another question has popped into your head. What about sp_spaceused? That old proc does work just fine for this as well. But there is something valuable about knowing what contributes to the size of an index and working through how to calculate it yourself. Additionally, with a minor tweak, this query can provide that (detailed) information very quickly for all objects in the database.

Tweaked query
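A hedged sketch of that tweak: drop the single-table filter and group across every user object (aliases here are mine, not necessarily the original script's):

```sql
-- Same measures, tweaked to cover every user object in the database.
SELECT  OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName,
        OBJECT_NAME(ps.object_id) AS TableName,
        i.name AS IndexName,
        SUM(ps.reserved_page_count) * 8 / 1024.0 AS ReservedMB,
        SUM(ps.used_page_count) * 8 / 1024.0 AS UsedMB
FROM    sys.dm_db_partition_stats AS ps
        INNER JOIN sys.indexes AS i
            ON  i.object_id = ps.object_id
            AND i.index_id  = ps.index_id
WHERE   OBJECTPROPERTY(ps.object_id, 'IsMSShipped') = 0
GROUP BY ps.object_id, i.name
ORDER BY ReservedMB DESC;
```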


Missing Indexes Script v2

Comments: 11 Comments
Published on: May 15, 2014


Frequently we hear the analogy that <insert item here> is like opinions, everybody has one and not all of them are good (some may stink).

Well, this may just be another one of those <items>.  Whether it stinks or not may depend on your mileage.

I had shared a similar script back in January 2012 and wanted to share something a little more current.  As is the case for many DB professionals, I am always tweaking (not twerking) and refining the script to try and make it more robust and a little more accurate.

This version does a couple of things differently than the previous version.  For one, this is a single database at a time (the prior version looped through all of the databases with a less refined query).  Another significant difference is that this query is designed to try and pull information from multiple places about the missing indexes and execution statistics.  I felt this could prove more advantageous and useful than to just pull the information from one place.

Here is the current working script.

The following script gets altered on display.  n.VALUE is displayed but in the code it is actually n.value.  The code display is wrong but it is correct in the code as presented in the editor.  If copying from this page, please change the U-cased “VALUE” in the XML segment to “value” so it will work.  A download of the script has been added at the end.

[codesyntax lang=”tsql”]
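In case the embedded code does not render, here is a simplified sketch of the DMV portion only; the plan-cache piece that shreds XML (where the n.value reference lives) is omitted:

```sql
-- Missing index suggestions for the current database, scored by a common
-- (not official) impact formula. Column choices are a simplification.
SELECT  DB_NAME(mid.database_id) AS DatabaseName,
        OBJECT_NAME(mid.object_id, mid.database_id) AS TableName,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns,
        migs.user_seeks,
        migs.avg_user_impact,
        migs.avg_total_user_cost * migs.avg_user_impact
            * (migs.user_seeks + migs.user_scans) AS ImpactScore
FROM    sys.dm_db_missing_index_details AS mid
        INNER JOIN sys.dm_db_missing_index_groups AS mig
            ON mig.index_handle = mid.index_handle
        INNER JOIN sys.dm_db_missing_index_group_stats AS migs
            ON migs.group_handle = mig.index_group_handle
WHERE   mid.database_id = DB_ID()
ORDER BY ImpactScore DESC;
```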


As has been the case in the past, this is not the end query.  I have a backlog of updates I want to make to the query and try to create something even more useful from it.

As a bit of fair caution: just because an index shows up as a recommended implementation from this script, it does not mean you should blanket implement it. Also, like the previous script, the higher the impact value, the more probable it is that the index will have a positive impact on your system.

In addition to the impact, I also look at the "Similar Queries" and "Similar Plans" numbers from the result set.  These values can sometimes lead to an understanding that there are other queries or other plans that could also benefit from the creation of the index.  Your mileage may vary.

Download script Missing_Index.

Quick and Dirty Index Info

Categories: News, Professional, Scripts, SSC
Tags: ,
Comments: No Comments
Published on: November 1, 2011

From time to time, somebody needs to find some information about the indexes in the database.  As DBAs, we have a few tools at our disposal to look up that information.  The required/requested information is not always the same.  That said, you can usually adapt a query quickly to find what you need.

Where to Look

SQL 2005 and up have provided us with some very useful means to gain insight into the indexes in our databases.  The following DMOs are a small representation of that information.

  • sys.dm_db_missing_index_details
  • sys.dm_db_missing_index_group_stats
  • sys.dm_db_missing_index_groups
  • sys.dm_db_index_usage_stats
  • sys.dm_db_index_physical_stats

Add to that, there are some catalog views to help with finding information relevant to indexes in the databases.

  • sys.indexes
  • sys.index_columns

With just these few objects, there is a wealth of information at your fingertips.  I often find myself querying for information about indexes.  I also find myself being asked how to find this information on a relatively frequent interval.  Usually, I just end up writing out the query again (depends on mood and day).

Rewriting the query doesn’t take too much time when you know what you need to query.  That said, sometimes it is nice to have a base query ready to go.  From this query you can add/remove items as you see fit to get the information that you really need.

For instance, here is a query to get some of the fragmentation information.  I would probably just re-use this query over and over where applicable – only changing the parts that mattered to get the necessary result set.

[codesyntax lang=”tsql”]
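In case the embedded code does not render, here is a base sketch of such a fragmentation query; the thresholds and the scan mode are the parts I typically change to suit the situation:

```sql
-- Fragmentation overview for the current database. 'LIMITED' is the
-- cheapest scan mode; filters keep trivially small indexes out of the way.
SELECT  OBJECT_NAME(ips.object_id) AS TableName,
        i.name AS IndexName,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
        INNER JOIN sys.indexes AS i
            ON  i.object_id = ips.object_id
            AND i.index_id  = ips.index_id
WHERE   ips.avg_fragmentation_in_percent > 10
  AND   ips.page_count > 100
ORDER BY ips.avg_fragmentation_in_percent DESC;
```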


There are many types of queries we could write for the different relevant information pertaining to our indexes.  It is all a matter of knowing where to start.  I hope that knowing about these DMOs and catalog views will help shorten your efforts in retrieving this information.

Table Sizes

Comments: 2 Comments
Published on: May 3, 2010

In SQL Server there are two stored procedures that help one to determine the size of a table.  One is documented and more widely popular, the other is not.  These two stored procedures are sp_spaceused and sp_MStablespace.  Dukagjin Maloku wrote a blog article about the two stored procedures in March of this year.  I don’t intend to rehash what he has covered.  What I intend is to look at the internals of the two stored procedures.  This will be the first of a multi-part series.


I saw a question asking how to find the table sizes for all of the tables in a database.  A few scripts were provided (one using sp_MStablespace and one using sp_spaceused).  I had been using the msTablespace version for quite some time and had never questioned its accuracy.  That changed when the question evolved into: why don’t they (the two scripts) show the same results?  So, I decided I needed to figure out what is happening inside both procedures to see if I could determine a reason for the difference.  I will be comparing these on SQL Server 2008.



sp_spaceused

This stored procedure has been mostly updated to use the new DMVs available since SQL 2005.  It currently relies heavily on sys.dm_db_partition_stats.  This procedure determines the size of the indexes by subtracting the number of data pages from the number of used pages.  The idea here is that the number of Data Pages will be less than the total number of used pages.  This stored procedure does still rely on a soon to be deprecated object called sysfiles.  The replacement for that object would be sys.database_files.  The functionality would be somewhat different in regards to how it is used in this procedure, but the results are the same.  Despite that, this proc does not look to be on the replacement block any time soon.
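The gist of that math can be sketched directly against sys.dm_db_partition_stats (dbo.MyTable is a placeholder): used pages that are not data pages are counted as index pages.

```sql
-- index_id 0/1 rows hold the data pages (in-row, LOB, row-overflow);
-- everything else that is used counts as index space.
SELECT  SUM(ps.used_page_count) * 8 AS UsedKB,
        SUM(CASE WHEN ps.index_id < 2
                 THEN ps.in_row_data_page_count
                      + ps.lob_used_page_count
                      + ps.row_overflow_used_page_count
                 ELSE 0 END) * 8 AS DataKB,
        (SUM(ps.used_page_count)
           - SUM(CASE WHEN ps.index_id < 2
                      THEN ps.in_row_data_page_count
                           + ps.lob_used_page_count
                           + ps.row_overflow_used_page_count
                      ELSE 0 END)) * 8 AS IndexKB
FROM    sys.dm_db_partition_stats AS ps
WHERE   ps.object_id = OBJECT_ID('dbo.MyTable');
```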


sp_MStablespace

This stored procedure, on the contrary, does look like it will be deprecated, though I was unable to find information to back that up.  When delving into the procedure, one can quickly spot that it relies on sysindexes and sysobjects for much of its heavy lifting.  The sizing information comes from sys.partitions, sys.allocation_units and sysindexes.  This procedure uses sysindexes mostly for that sizing information and then calculates the data size based on clustered indexes and heaps.  Then, to calculate the remaining index size, it subtracts that size from the total for all of the indexes on the specified table.  The math is a bit different and can be the reason for the difference.  This procedure seems to be in need of a decent sized overhaul if it is not to be deprecated.


After looking at the code and then running some tests with each of these procedures, I was able to corroborate the claim that they don’t return the same results.  With that in hand, I decided to take a stab at writing something different to help create a consistent return.  Thus I will be sharing some queries I have been working on for each of these.  The desired goal is that both would return the same results.  Based on the consistency of information between the DMVs and the necessary (and some bonus) elements for these queries, that may be a bit more difficult to do.  For the sp_MSTablespace query, the crux of the query will be the sys.dm_db_index_physical_stats function, while the sp_spaceused query may continue to rely on sys.dm_db_partition_stats or just sys.partitions.  Both queries may rely on a combination of these latter two DMVs, tune in to find out.

BLOB Index Columns

Tags: , , ,
Comments: 1 Comment
Published on: April 27, 2010

For the April T-SQL Tuesday, I blogged about a report to find information on the BLOBs in a database.  I have since seen a request to add to that script.  The addition would add some good information concerning the columns involved in a BLOB index: that is, all of the columns that participate in an index that includes a BLOB column.

Base Script

In that article I posted a script to help arrive at the final report.  There were a couple of things required for the setup.  I am including all of that information here in a single script.

[codesyntax lang=”tsql”]


In this script, I made a slight alteration from the article I posted.  In that article, I somehow missed a change to the script I had been testing.  That change is in the Temp table that I created (to properly support the Join statements on each side of the Union Select statement).  I simply added the object_id.

The Change

[codesyntax lang=”tsql”]


The guts of the change to add in the columns for this script comes with the following segment of code.

[codesyntax lang=”tsql”]


With that snippet, I also needed to Join it to the select statements, and thus it was added on both sides of the Union statement as illustrated in the next example.

[codesyntax lang=”tsql”]


In the above CTE, you will note that I used the STUFF function along with a FOR XML Path statement.  The combination of these statements allows one to concatenate a list into a comma separated list as I have done with the ColList (column list) column.  I also want to note here that I am using the text() keyword along with the FOR XML Path.  There are other methods of returning information back to the screen when using FOR XML Path.  I chose to use the text() in this case because I am just returning a concatenated list of columns that really only should read as text.  If I were returning a SQL statement, I would choose a different method to make the text more readable.
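The concatenation pattern described can be sketched against sys.index_columns as follows (the original script's names and scope differ; this shows only the pattern):

```sql
-- For each index, build a comma-separated list of its key columns.
-- STUFF strips the leading comma; FOR XML PATH('') with text() flattens
-- the rows into a single string.
SELECT  i.object_id,
        i.index_id,
        i.name AS IndexName,
        STUFF((SELECT ',' + c.name AS [text()]
               FROM   sys.index_columns AS ic
                      INNER JOIN sys.columns AS c
                          ON  c.object_id = ic.object_id
                          AND c.column_id = ic.column_id
               WHERE  ic.object_id = i.object_id
                 AND  ic.index_id  = i.index_id
               ORDER BY ic.key_ordinal
               FOR XML PATH('')), 1, 1, '') AS ColList
FROM    sys.indexes AS i
WHERE   i.index_id > 0;
```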


I chose to make this subtle change via the CTE due to the ease of understanding and readability of the code for me.  By illustrating the columns involved in an index that is on a BLOB column, one can gain greater insight into the use of the database.  I am glad that this change was requested because it makes sense to me.

I hope you find it useful.

Edit: Made a correction to the first script

Index Stats Duplication

Tags: ,
Comments: 2 Comments
Published on: April 21, 2010

I came across a recent posting about seeing multiple entries in sys.dm_db_index_usage_stats for the same index.  This kind of behavior can be somewhat concerning since that view should have a unique entry for each index.  As described by Microsoft, this view

“Returns counts of different types of index operations and the time each type of operation was last performed.”

You can read more about it here.  Further in the same document, one will read

“When an index is used, a row is added to sys.dm_db_index_usage_stats if a row does not already exist for the index. When the row is added, its counters are initially set to zero.”

Thus, I started checking to see if I could reproduce the same sort of results.  While investigating, I determined that I should Join that view to the sys.indexes view.  Since the sys.dm_db_index_usage_stats view does not contain an Index Name, I felt it necessary to be able to see the name and not just a number.

My first run at the query to try and reproduce the results is displayed below.

[codesyntax lang=”tsql”]


Well this appears to produce results that are consistent with the description offered by Microsoft.  So I wanted to verify and added a windowing function into the mix.

[codesyntax lang=”tsql”]
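In case the embedded code does not render, here is a sketch of that windowing version (the counter field list is abbreviated from the original):

```sql
-- Usage stats joined to sys.indexes, with ROW_NUMBER() counting how many
-- times each index name appears in the view for this database.
SELECT  DB_NAME(s.database_id) AS DatabaseName,
        OBJECT_NAME(i.object_id) AS TableName,
        i.name AS IndexName,
        s.user_seeks, s.user_scans, s.user_lookups, s.user_updates,
        ROW_NUMBER() OVER (PARTITION BY i.name
                           ORDER BY s.database_id) AS RowNum
FROM    sys.dm_db_index_usage_stats AS s
        INNER JOIN sys.indexes AS i
            ON  i.object_id = s.object_id
            AND i.index_id  = s.index_id
WHERE   s.database_id = DB_ID();
```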


You will notice the line with Row_Number().  This gives me a count of each time that IndexName appears in the system view.  Well, now having verified further, I still have results consistent with the documentation from Microsoft.  I decided to work backwards from this script (yeah I know).  I will reduce the number of fields I have returned and try to get a duplication on a base level and then re-expand the fields being gathered.  The next stab at this looked like the following script.

[codesyntax lang=”tsql”]


So for this revision, I simply removed any of the counter fields and the group by clause.  I was hoping that my Group By would reveal the answer.  Lo and behold, I am still unable to reproduce the results.  Ok, time to trim a little bit more as I work backward trying to create the same results.

[codesyntax lang=”tsql”]


This time, I removed the Join condition of s.database_id = db_id(), along with the database_id and DatabaseName fields.  When I run the query now, I start to see the same results.  So the immediate indicator to me is that there is a duplication of indexes, but not within the same database.  The duplication appears across different databases.  To verify, I now need to expand my query just a bit.

[codesyntax lang=”tsql”]


Here, I was able to verify that in the places where RowNum did not equal one, I had an index in a different database by the same name.  Not only is the index name the same, but the Object holding that index is also the same in the other database.  Now, if I want, I can expand my query all the way back to the original query, knowing that I verified that the entries in sys.dm_db_index_usage_stats are unique for each index.  The conclusion I would draw from this exercise is that a query can have an impact on the results displayed.  One should check and then double check to make sure the results jibe.  Try the query a few different ways and see if the results are consistent.  If not, what factor in the query was changed that changed your results?

Or is it?

While exploring this further I decided to verify some of the objects in different databases.  For instance, I might see the object being reported in the ReportServer database, and then also in msdb and then again in another database.  But the object only exists in ReportServer.  Why is this happening?  My thoughts on this are currently just an educated guess.  It appears that a connection is established in one database and then makes a call to an object in a second database, registering a use against the actual index in that second database.  The record gets inserted into the metadata tables that this view pulls from for each database that was touched, but each time with the object_id and index_id from the second database.  Does that make sense?

Index Info

Categories: News, Professional, Scripts
Tags: , ,
Comments: 3 Comments
Published on: March 31, 2010

I recently came across an interesting script to help show index info and usage.  As I began to dissect the script, I noticed that the script was intended for SQL 2005 and beyond but was using SQL 2000 compatibility views.  I decided I would take it upon myself to update the script to use the 2005 objects and not the compatibility objects.

The design of the script is to give a basic view of the usage of the index and then try to evaluate the cost of that index.  If the index updates far outweigh the queries that use it, then the index may not be a useful index.  Furthermore, the index cost probably outweighs its use.

The original query employed the use of dbo.sysobjects and dbo.sysindexes.  The reason for the use of dbo.sysindexes was merely to calculate the size of the index.  The use of dbo.sysobjects seemed like an oversight.  The sticky point I ran into was figuring out a method to execute the entire query and have it perform as well as the original query.  The first attempt yielded a query that was absolutely abysmal.

[codesyntax lang=”tsql”]


The method I am using to calculate the size whilst using the SQL 2005 objects was to use the function sys.dm_db_index_physical_stats.  I employed this directly as a subquery in the above posted query.  This version takes in excess of 7 minutes to return results.  I moved on from it to find a better method.

The next attempt was to move the subquery into a CTE.

[codesyntax lang=”tsql”]


This version returns in about thirty seconds.  Still not acceptable, but substantially better.  The original query returned in roughly three seconds.  It is also important to note that the accuracy of both queries is equal.  Since both return the same result set, the only focus is on performance.  The delay in this query completely revolves around the use of the function.  Armed with the knowledge that the real delay is in the use of that function, there needs to be an alternative method to pull back that information.  I decided to use a temp table to stage the data.  Using this temp table I could test directly the efficiency of the query.

[codesyntax lang=”tsql”]
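A sketch of the temp table staging approach (structure and names here are mine, not necessarily the original script's):

```sql
-- Stage the expensive function call once, then join to the temp table.
IF OBJECT_ID('tempdb..#IndexSize') IS NOT NULL
    DROP TABLE #IndexSize;

SELECT  ips.object_id,
        ips.index_id,
        SUM(ips.page_count) * 8 / 1024.0 AS SizeMB
INTO    #IndexSize
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
GROUP BY ips.object_id, ips.index_id;

-- The usage/cost portion of the script then reads #IndexSize rather than
-- calling the function inline.
SELECT  OBJECT_NAME(isz.object_id) AS TableName,
        i.name AS IndexName,
        isz.SizeMB
FROM    #IndexSize AS isz
        INNER JOIN sys.indexes AS i
            ON  i.object_id = isz.object_id
            AND i.index_id  = isz.index_id;
```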


The population of the temp table consumes 97% of the query cost.  The prior version of the query also showed a cost of 97% associated directly with the function.  If I populate the temp table prior to the execution of the remainder of the script, then I can compare performance of this script versus the original.  When comparing, this version runs in about 1 second.

The next steps from here would be to employ a method that populates a table in an “Admin” database.  By using a pre-populated table I can maintain good performance to gain the information concerning Index Usage.

Index your FK

Categories: News, Professional, Scripts
Comments: No Comments
Published on: March 4, 2010

A few months ago I read an article about some Foreign Key gotchas.  Since that article, I have seen several people asking whether a foreign key should be indexed or not.  I have seen the benefits of having FKs indexed as well as the agony of not indexing them.  The article discusses some of the benefits.  In summary, it will help to avoid deadlocks, reduce locks, and improve performance.  My intent is not to go into depth on the article, but to focus on a couple of scripts to help identify which tables need indexes on the foreign keys.  This article is a part of my Key documentation series.

If you browse the internet, you will likely be able to find several scripts to accomplish the same goal.  There is one script that I have seen that will generate the script for you to create the index on the foreign key.  Many scripts are similar – some perform slightly different things.  I thought about evaluating some of the scripts and comparing them.  I have found some to be less accurate than others.  I decided after some thought that I would just share my scripts.  I will show the evolution of the script that I now use.

Rev 1

This was just to hammer out a solution that would get me the results I sought.  This version uses some objects that are to be deprecated, and thus it should only really be used on servers that are running SQL 2000.

[codesyntax lang=”tsql”]


This works fine, within limitations.  Obviously there was the limitation of using objects that are scheduled to be deprecated.  The second limitation is that it is not 100% accurate.  This script does not pull all of the Foreign Keys that are missing Indexes.  This is the same sort of issue that I found with several scripts on the internet.  That said, this query is rather efficient.  When compared to future versions, it was 8% less costly and about 150ms faster.  However, it does consume more disk IO and more CPU time.

With these limitations in mind, I decided to create a second query.

Rev 2

[codesyntax lang=”tsql”]


This query is less limited than the first.  Here, I have improved the performance substantially over the first query, and the execution plan cost is lower.  I have implemented the use of SQL 2005 objects, decreased disk IO, and decreased run time.  CPU time remains about the same as the previous example.  Still, one glaring limitation remains: I am not returning all Foreign Keys that are missing an index.

Moving onto attempt number three.

Rev 3

As I was testing version three of this query, I noticed there was another item that I needed.  I ran the tests in a database with multiple schemas and had difficulty locating the tables that were in the list.  When looking at the expanded view in SSMS, the tables are grouped by schema and listed alphabetically within that schema.  When I realized where the table was, it dawned on me that I needed to include the schema in the query results.  Adding this little piece of information will save a few seconds when trying to verify the information in the report.

[codesyntax lang=”tsql”]
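A sketch of the Rev 3 shape (my reconstruction, with hypothetical aliases): foreign key columns are left-joined to index columns, and rows with no match are FKs lacking a supporting index. The key_ordinal condition on the Join is the sort of additional condition discussed below.

```sql
-- FKs with no index leading on the referencing column. Requiring
-- key_ordinal = 1 treats a column that is only buried mid-index as
-- unsupported, which surfaces additional missing FKs.
SELECT  SCHEMA_NAME(o.schema_id) AS SchemaName,
        OBJECT_NAME(fkc.parent_object_id) AS TableName,
        c.name AS ColumnName,
        fk.name AS FKName
FROM    sys.foreign_key_columns AS fkc
        INNER JOIN sys.foreign_keys AS fk
            ON fk.object_id = fkc.constraint_object_id
        INNER JOIN sys.objects AS o
            ON o.object_id = fkc.parent_object_id
        INNER JOIN sys.columns AS c
            ON  c.object_id = fkc.parent_object_id
            AND c.column_id = fkc.parent_column_id
        LEFT JOIN sys.index_columns AS ic
            ON  ic.object_id   = fkc.parent_object_id
            AND ic.column_id   = fkc.parent_column_id
            AND ic.key_ordinal = 1
WHERE   ic.object_id IS NULL
ORDER BY SchemaName, TableName;
```

Note the use of schema_name() against the object's schema_id rather than a Join to sys.schemas.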

In the code, note that I have opted to use the schema_name() function rather than use the Join on the sys.schemas view.  By using the function, the script performs a little better.  There is another addition I made to the script.

[codesyntax lang=”tsql”]


An additional condition was added to the Join.  With the addition of this condition, I am now able to pull back the additional foreign keys that are missing indexes.  I have also added an Order By, which will slow it down a bit.  The offset to that slowdown is that the data is more manageable for verification.


There are many methods to arrive at this data.  The difference in performance between the methods I wrote and evolved was negligible.  The most important concept in the end is having a solution that is accurate and timely to provide this data.  All of these solutions finish in under a second for me.  None of them place a significant impact on the server, and thus I can choose for accuracy over performance in this instance.

This exercise is a part of getting to know your database.  Finding and documenting is very helpful when it comes time to troubleshoot.  Once this information is garnered, one needs to evaluate whether or not to place an index on that foreign key.

SQLFool IDX Defrag Scripts

Categories: News, Professional, Scripts
Comments: No Comments
Published on: January 26, 2010

I just found out that Michelle Ufford (@sqlfool) has updated her Index Defrag scripts.  Right now she is looking for Beta testers.  Anybody that is interested could find more info at:
