A linked server is a fabulous feature in SQL Server to help accomplish various data tasks between local and remote servers. There is a time and a place for the use of linked servers. Unfortunately, some are of the belief that a linked server is the solution to all problems including the way to end world hunger.
You may recall that I have written about linked servers on a few occasions. Usually that is due to the relationship of a poorly performing process and the linked server. Here is a prime example here or here. On the other hand, here are some examples of using a linked server in a much better light here and here.
Using a linked server is entirely up to you. If you choose to use a linked server, you will need to understand there are risks involved – beyond just the documentation and performance nightmares it can cause. Today, I will share one of those fabulous gotchas – mostly because it has cropped up more than once with my clients.
On more than one occasion I have had an emergency request because everything was broken. The everything in almost every incident is an SSIS package that is failing with error messages. The error message will typically have text similar to the following:
Could not locate statistics ‘_WA_Sys_00000015_346C780E’ in the system catalogs.
Due to the error, the package fails processing and grinds to a halt. When diving into the package it is discovered that the missing stats happen to be coming from a linked server query. This raises a big bright blaring alarm for me. Why is the SSIS package accessing the data via a linked server? This is rather counter-productive and definitely contrary to what is desired from a performance perspective.
You can certainly see why this is a sudden emergency right? Absolutely nothing is working anymore, right? Well, at least that is the way it seems for the person reporting the issue. Understanding that their realm of work has come to a halt is important in order to get to a resolution more quickly. Knowing that the point of failure is not as grande scale as claimed is critical because it can help you keep your cool while assessing the real problem.
Since the problem presents itself when accessing a specific object or even set of objects across a linked server, we have our scope of the issue scaled down quite a bit already. Now, what do we do from here? Obviously the statistic with that name is missing, right? Don’t be so hasty to assume it is actually missing. Let’s verify that the stat is indeed missing first. Even better – let’s just eliminate as many variables as possible. We can do this by querying the affected objects directly from SSMS.
Easy Troubleshooting for the DBA
For me, it is essential to eliminate variables when we run into a failure. In this case, I have just a few major variables that can potentially be causing this problem. Those variables are:
- Linked Server
- The query itself
- Bad Plan
- The stat is legitimately missing
I can easily assess the legitimacy of each variable through various quick tests. To eliminate or validate the “query” variable, I can issue a very simple query to retrieve data from the affected object. So let’s eliminate variables 1 and 3 in one fell swoop.
SELECT * FROM [MyLinkedServer].[DBName].[dbo].[WAXWORKLINE];
This, while disappointing, is actually quite productive. This has eliminated two variables for me. I now know that SSIS is not related to the problem. I also know that query complexity is not related to the problem. There are still a few variables left to either validate or eliminate. Since I know the problem occurs when querying via linked server, let’s try querying the remote server direct (not via the linked server).
Well, while that does not entirely eliminate or validate any variables, it does tell me that the problem is still specifically related to the linked server. What if I try to use OPENQUERY instead of the traditional linked server query?
Wow, look at that? Ok, so this is a bit of trickery because I have told SQL Server to execute that query on the linked server as if it were a “local” query on that server. This does work without error and is definitely pushing the findings to be more conclusive that it is a linked server problem.
While the openquery version works, I do still need to eliminate some variables from the problem. One of the variables is security. Since the error mentions sp_table_statistics2_rowset, I googled about for that proc and found some mentions that maybe there are some column denies related to the stats in question that is giving the linked server some major fits. Or it could also be insufficient permissions to execute DBCC SHOW_Statistics. I highly doubt this to be an issue since the openquery version works while using the same pass through authentication of the linked server that the traditional linked server query would use.
In order to eliminate security as a potential cause, the test is simple (while it could be more complex, I went for the jugular to just eliminate the theory as quickly as possible) – I will add my account as the pass through account (which is a sysadmin on the remote server) and then query the linked server all over again. Suffice it to say, there was no change in the result – the error persisted.
This does not just yet eliminate the security variable because there could be a cached plan somewhere. So, just to be sure, I chose to flush the cache on both the linked server and the local server. Running the simple “Select *” query all over again yielded no difference in the query results. I can now safely say that the problem is not related to a cached plan nor is it related to the security. At this point, I set the linked server security back to the way it was. I have effectively eliminated all variables but 2 and 6 (linked server and the stat is missing).
Let’s eliminate the missing stat variable right now.
Undoubtedly you have noticed that I built a drop statement into the result set from this particular query. That aside, the green highlighted row is the very stat that was producing the error. This stat is most certainly available on the source server (linked server). This entirely eliminates the sixth variable because the stat is not missing.
This brings us to the final step – see what happens when we drop the stat and try the query again. I have the drop statement ready to roll, so let’s just run that and see what happens. My risk here is very small. This is an auto-generated stat and will be recreated if needed. After dropping the stat, I run the simple “Select *” query across the linked server again and guess what? It works perfectly fine.
In my opinion, we can just skip a few of these steps if the problem is on a system generated stat and just go straight to dropping the stat and trying again.
That said, the better solution in this case would be to do one of two things.
- Modify the Package to not use the linked servers in this way and create a data source connection for each of the different server connections.
- Modify the Package to use OPENQUERY style queries.
My preference between those two options would be to create specific data sources. It seems very silly to pull data across the linked server to then be consumed via SSIS package.
I have just illustrated what I would classify as a nuisance problem related to linked servers. Simplifying the approach as I described with the two potential solutions would alleviate this nuisance while also giving an inevitable performance boost.
Parting thought: One observation of the “fix” that involves dropping the stats is that on busy systems it is a very short lived kluge. I have seen the problem re-surface within five minutes on busy systems. The stats are not really a problem, it is just how the linked server connection interfaces with the stats.