The Mystery of Missing Backup Files

Comments: 1 Comment
Published on: December 10, 2018

Yoink – Backups Gone

Not only is this mystery of missing backup files worth discussing now, but has been a mystery worth talking about previously (albeit in slightly different form). One such exploration into this mystery can be read here. And as a requirement for exploring any mystery some tools are necessary to be able to explore the clues that might be revealed. For this mystery, we will be utilizing this script along with our acutely developed sense of perception, ability to fact-find and our ability to troubleshoot.

Without further ado and all of our tools in hand, let’s begin!

Peter Deebweehay and Tommy Tsisamin work at Global Exchange Wholesale Retailers (any similarity to real entities is wholly unintentional and completely fictitious). Both Peter and Tommy are diligent hard working individuals. Peter happens to be a DBA and is stubborn about ensuring databases are backed up regularly. He has carefully planned out the disaster recovery scenarios and has nit-pickingly designed the backup strategy to ensure business continuity.

Tommy on the other hand is just as head-strong about performing his daily duties as the systems administrator. Tommy is responsible for the virtual machines, network infrastructure, operating systems, domain, mail services and desktop machines (OS to hardware). Tommy has also carefully planned out all of the disaster recovery needs of the business for all of the servers and specific workstations of individuals that might be pertinent to the continuity of the business.

Tommy and Peter get along together glowingly. The two work together any time there is a new SQL Server to be stood up and have worked together previously many times to ensure good performance of the SQL Servers on both physical and virtual machines.

Lately users have been complaining about a performance slowdown. The slowdown passes quickly but it keeps coming back. Peter starts to look into the reports and is slowly developing a pattern. He notices that the slowdown occurs every hour. Like a good DBA, Peter decides to implement an Extended Event session to capture the queries and server activity that occurs for a period of 3 hours to hopefully capture data that could indicate what the offender is.

After the three hour window has passed, Peter pulls out the captured data and starts looking it over. He notices that every hour there is a backup that occurs against this database. Peter knows that shouldn’t be happening because it is out of cycle with what he had implemented. To be absolutely certain, Peter decides to check the SQL Agent jobs and the Windows task scheduler. Nothing indicating a backup in either location. Next Peter pulls out his trusty backup history script to confirm the data he pulled from the XE session data. When he starts looking at the data, his jaw drops as he sees something that looks like the following.

Missing backup files

Looking at these results, Peter starts looking in the Backup share he had created for these backups and can’t find anything representing those backups with the GUIDs in the BackupPath result. Then he ponders for a minute. Peter knows he has seen something like this before and then recalls where he saw it. Those Full backups with the GUID as the path are coming from outside of SQL Server.

Peter is curious if Tommy could possibly help with the new-found mystery on this server. When Peter presents the information to Tommy, Peter is met with another jaw-dropping moment. Tommy, with a wry smile and pride in himself, explains to Peter exactly what those GUIDs happen to be (not knowing the connection). Tommy explains that he has set the virtual machine to perform an incremental backup every hour. This incremental backup is represented to SQL Server as a Full database backup and produces a backup that equates to .005MB. With this data, Peter is curious to know if Tommy is quiescing the database(s) prior to doing this incremental backup. As it turns out, Tommy was not. Peter was mildly frustrated but he was able to work out a better solution with Tommy. Without quiescing the database, these full backups become questionable as to the usability to actually recover this server (an incremental OS backup is not the same as a Database Differential backup).

As far as the database is concerned, these GUID backups are much the same as a missing backup. Any backup that is taken that a DBA cannot explain or find would also be a missing backup.

What does this mean for you?

In the case of Peter and Tommy, this meant that there was a bit of a mis-communication and a bit of a performance hit for these off cycle backups. Had Peter implemented differential backups, the recovery plan he had implemented would be for naught because each of those full backups caused by the incremental backup for the VM would break his differential chain. Also, if Peter had attempted to recover the database to a different server, he might have discovered this sooner and before the users had reported the performance issues that led him to the discovery. Not only is a backup plan essential, but monitoring of the logs and backups as well as routine practice restores is necessary.

After reading this article, enjoy more articles similar in nature such as this series on Extended Events or this series called “Back to Basics

Capture the Flag – The Trace Flag

Many people work in a tightly controlled environment. Every change requires a change control form and approval to make a change. In some environments, things are more lax, but you still wish you could tell when certain changes were made. That is even true in some of those tightly controlled environments. You know what I am talking about. Despite the controls in place there always seems to be some change that is unknown or even some cowboy that makes changes as s/he sees fit and forgets to notify people or follow process.

Then come the times when you are new to an environment and you really need to know some basic information about the servers you have just inherited. Well, truth be told, you don’t necessarily have to be new to a job to inherit some new servers. Sometimes, blackbox servers end up getting dropped into your realm of responsibility.

When you run into something like this, you really need to have some means to figure out what changed, when it changed and who changed it. Having that information as a base, you would then have the tools to inquire as to why it might have been changed. One of the things that would be nice to know is around the trace flags that can be enabled in SQL Server. There are trace flags (or knobs) for so many different things within SQL Server that it would be nice to get a handle on them if they do happen to be enabled or even disabled.

As a DBA in SQL Server, you should know by now that there are multiple different ways to do all sorts of things within SQL Server. Today we will discuss three different methods that you could use to perform this investigation for yourself.

Report Style

Did you know that Management Studio comes with a dashboard report? It does! In that report, one will find a pretty decent amount of overview information as a first pass to get to know your server. If you were to right click the instance within SSMS, you can browse to the Reports/Standard Reports submenus. From there you can select the Dashboard report. The path to the reports is similar to what is displayed in the following screen capture.

The report that is of interest is circled in red in the preceding image.

If you have any trace flags enabled (globally) you can see those listed in the report similarly to the following image:

Once again, I have encircled the relevant data in red. From this instance that I queried, I currently have one trace flag enabled. As it stands, this particular trace flag is the one that disables the noise of successful backups from being recorded to your log.

This is one particularly easy method to retrieve that information. A benefit from this method is that you will be able to see some graphs and charts and so forth by using the report.

SQL Query

As logic would dictate, it would seem that we should be able to trap that information through the use of our most frequently used tool – TSQL. Case in point is that the report itself requires the use of TSQL to produce the dataset for the nice display we would see when the report is rendered.

Using similar code to what is used to produce the Dashboard report, we can get a decent understanding of the trace flags that might be employed on your particular instance.  Let’s take a quick look at what we could do.

As you can see, that code is terribly simple. The crux of the code boils down to a DBCC command to return the results of TRACESTATUS. Granted, this is a bit over the top and designed to dump the data into a table for later consumption (for good reason). You can legitimately get away with simply running DBCC TRACESTATUS. The results of this would produce the following for the same instance of SQL Server shown in the Server Dashboard report we already explored. Those results look like the following.

queryout

So far, so good. If I had several trace flags enabled globally on this instance then I would see every single one of them in the result set just displayed. Cool!

To this point, this is wonderful for anything that might be currently enabled on the instance. Should any of those trace flags be disabled shortly after being enabled, or even be disabled months after being enabled but before you run either of the two preceding methods, then you are up a creek without a paddle. That is a problem. How would you trap that cowboy sysadmin or cowboy DBA in that case? Well, you could go with a full blown audit. Or you could try a similar approach but on a somewhat smaller scale.

If you are paying close attention, you are probably thinking to yourself that the query just demonstrated seems more complex than it need be.  After all, the documentation for DBCC TRACESTATUS says that I can run that command with a -1 parameter and just have it return the global trace flags.  And in the script I am showing, I filter on Global = 1.  Well, as it turns out, the -1 option in DBCC TRACESTATUS does not work as documented.  The -1 parameter does not return the appropriate result set and that is reproducible.  If you were curious, the MSDN reference is here.  The use of that parameter value still returns session flags as well.

Extended Event

Oh my, we are talking about something really abstract and difficult now! No, not really.  This is a pretty straightforward approach and easy enough to implement. Within XE, there is an event called trace_flag_changed. If you were curious, you could use a query such as the following to find any events that might be related. The key is finding the correct key terms to find what you need/want.

Running that query will return just one result. You might find a few more if you expand your search criteria to include the term “flag” within the description field. It’s up to you to explore a bit more. As the description and name imply for this particular event, you can presume correctly that it tracks when a TF is either enabled or disabled (the state of the TF changes). Let’s call this a mini audit for a very targeted type of event.

Knowing the name of the event, let’s go ahead and look at a possible setup for an event session to trap data:

This gets us to a point where we can now trap all changes to a trace flag status (enabled/disabled). At this point, a bit of a caution is that the event appears to fire twice for every time it occurs. There is one case where it will not produce a double result for each change. That case happens to be if the TF is already set to the same setting to which you are attempting to change it to. So if it is enabled, trying to enable it again just records one event to fire. However, trying to enable or disable the TF when it is in the opposite state, you will see two occurrences of the event firing. Here is a sample output to help show what happens.

xe_results Trace Flag

Displayed in the preceding image, you will notice both scenarios. In the scenario where the TF was not enabled, and I attempted to enable it, you will see two entries. And then when I disabled that TF after enabling it, I once again got a double entry. A corresponding example is encircled by the green. Now, when I tried to disable a TF that was not currently enabled, you see there is a single entry. That occurrence is marked via the orange circle.

A problem does present itself here with this result set and the use of XE.  How in the heck did I get the results from the XE Session so I could peruse it? Here is a sample script that will help parse this session data.

Now isn’t that really helpful?

Conclusion

Today I covered three different means to help isolate the trace flags in use or the changes to the state of those trace flags. This is good information that any DBA should want to know. It is information to help get an understanding of the health and operating environment of your instance. It is also a means to help identify what might be changing under your nose!

For more uses of Extended Events, I recommend my series of articles designed to help you learn XE little by little.

Need a little help with extensive audits of your servers, check out these articles.

Auditing when Database Files Change

As a Database Administrator, something that should be part of your database audit is monitoring the growth of files. That means tracking when the log file grows and tracking when the data file(s) grow(s). Many will argue that there is nothing worse than not knowing when a database-related file changed in size except maybe when it rapidly and constantly grows until the disk is filled.

Beyond just trapping when these events occur, the DBA usually also wants to know what caused the file to grow or shrink. This has been a problem that has plagued many a DBA over time. Sure there are plenty of methods to assist in the capture of such things. I even wrote about that quite some time ago here. In that article, I gave the tools to help track when a transaction log grew. Today, I am looking to share a different method on how to trap the executing SQL that causes a file to grow. And unlike that previous method, this method will also help to track when a file decreases in size.

I do want to point out that there is an alternate method to capture the shrink events if you wish to read more. I wrote previously about using the default trace to capture when a shrink occurs within the instance. This can be captured thanks to the default trace. You can read all about that over here. Despite the availability of the information within the default trace, I think the method I share here will be lightweight enough that it won’t hurt to have it running—at least occasionally.

Enter the Database Audit

If you have become accustomed to reading my articles, you will probably surmise that there is some degree of setup that needs to be done before I get to the meat of the matter. That holds true today. It just wouldn’t be right to share a method to do something without a harness to show how to get there. So, let’s get the setup out of the way.

Since we will be wanting to create something that will track when a file changes in size, we will need to have a baseline of the size of the files for the newly created Sandbox2 database. Let’s capture that with the following query. Small note on this query is that I am including the tempdb related file sizes. You don’t need to include that, but it may be of interest for future testing.

That is all the setup that is needed at this point. We are now ready to implement a tool that can help us monitor these changes. This tool is lightweight and pretty efficient at capturing the info that would be essential for the exceptional DBA to keep on top of the changes occurring in his/her environment. This tool comes to us in the form of an extended event. Let’s do a little checking into what is available before we dive into the actual XE session itself.

First, when you are curious, if there is an extended event that may be sufficient for your needs, you should check the event store to see what is there. We can do that via a query similar to the following.

This query may return something along the lines of what we see in the following image:

xe_events_2k8

From that list, we can immediately see that we have some options to help us try and capture what is happening with our databases when our back is turned. From here, we can query to find what kind of data is captured with each of these events. To do that, we simply need to run a query such as the following query:

Looking at the available data points in each of these sessions can prove promising. It is indeed promising enough that we can now proceed with the creation of an XE session.

I decided to just focus on the two events that included file_size_changed in the name of the event. You can also see that I chose to send this to two targets. You would be fine to just send this to an asynchronous file target. Just bear in mind that you do need to specify a path for the file_target that does exist or an error will be thrown. I have also specified that this session will restart on server startup and then I turned the session to the started state.

Does it work?

That is a really good question. Now that we have a test database and a session to trap events, all we need is to figure out how to test it. We would want to do a few different things to test—like grow and shrink a file in some way. And that is exactly what we are going to do at this point.

First test, since we have a really small empty database, is to try and force the database to grow by inserting some data. We will do that with this next query.

I didn’t post the numbers earlier for my files for the Sandbox2 database, but they were small. I had a 2MB data file and a 1mb log file. Now, when I look at the results from that last query that included the file size information, I will see a database that has grown a fair amount.

Database audit newfilesize

That should be enough of a change to have triggered something in our extended event session. Let’s take a look at the session. To do that, I am going to pull out a query to help parse the XML and see what has happened. Here is that query:

Despite having two targets in my session, we will only cover the query that helps parse the data from the asynchronous file target. When I run that query against the files that exist for this session I get a result set of 90 records on my system. That probably seems like a ton of results for such a small increase in the database. As it would happen, I left the growth settings at the default growth increments (don’t do that in a production system) and this means I get a ton of growth activities at very small numbers (1mb for the data file and 10% for the log file). Here is a sample of the output:

xe_output_grow

You can see how this might be helpful for when a process runs hog wild in the environment. What about going in the other direction though? What if we need to know about when the database is shrunk or when a file is shrunk? Well, let’s do that too. Let’s try the following query.

Again, I check for a baseline on the files to see if the file sizes changed. In this case, you can run that final query and compare or just trust me on it. Having shrunk both files in the Sandbox2 database, let’s check the XE session data again:

xe_output_shrink

Check that out! We have captured the shrink events too! There are multiple shrink events in this case only because I ran the shrink statements multiple times. This is excellent news for everybody that is trying to keep an eye on these database size changes. You can see in the XE session that I applied the sql_text() action. I have done this so I will be able to see what query caused the growth or shrink event to occur. Even better news is that this event session is perfect for those of you still on SQL 2008.

If you enjoyed this article, check out some of the follow-up articles: Data Growth Audits or Mysterious Growth.

Azure Data Studio and XEvents

Azure Data Studio (ADS) is getting all sorts of love and attention these days. So much so that they have finally gotten around to adding Extended Events (XE) to the tool – sort of. Now we have the power to run traces on SQL Server via ADS.

The presence of XE in ADS comes via an extension and comes with a few other caveats. I will explore the extension for XE available in ADS in this article and discuss some of the caveats. As you read the article, it might be helpful to go ahead and download ADS if you do not already have it.

History

Roughly 10 years ago Microsoft felt it necessary to introduce a cool tool called Extended Events. Soon after they decided deprecate the features called “Profiler” and “Trace”. Unfortunately the page with the deprecation announcement is no longer available, but some evidence of how long it has been deprecated is available here.

The deprecation announcement remains in effect (and online) for all versions since SQL Server 2012. It just may be difficult to find the 2012 announcement as we roll into newer releases of SQL Server.

Now, we have XE Profiler (or XEvent profiler depending on your release of SSMS – read more here). Profiler is deprecated and now we have some confusion in SSMS as to what is Profiler since we are now using that term with the “XE Profiler” feature.

Now enter ADS. XE is not included with ADS by default. You have to install an extension to gain access to the feature. So, the first thing you will need to do is visit the extensions node and then search for “SQL Server Profiler”. I can hear you right now. It isn’t even using any part of the real feature name anymore – they are just calling it the same exact thing as the deprecated feature.

And yes, my heart breaks a little more every time I see “SQL Server Profiler”. We have been teaching Database Professionals for years to use Extended Events and not SQL Server Profiler. And they have been adopting that change in rather large numbers. This just seems like it will cause so much more confusion. Nevertheless, once you have selected the extension, look to the right hand side and you will see a screen similar to this.

After installing the extension and the reloading ADS we are ready to start using this extension. For the remainder of this article, I will just refer to it as “XE extension”.

Where did it go?

After the reload of ADS, finding the XE extension is not really that easy. If you read the info page where you clicked install, there is some info there on how to access it. In short, on a Windows machine Alt-P will be your friend. Where you use that key combination is not your friend though. If you are in a script for instance and hit that key combo, no connection will be made to your server – even if your script is connected.

In addition to that not-so-obvious message, there is a more obvious message box that pops up in the bottom right corner letting you know a connection could not be established. Unfortunately, the problem can’t be resolved from this screen. Just close the tab and try again from the instance connection as shown here.

While a bit annoying, I can manage with that little caveat – just as long as I remember between uses what I did. After the XE extension is open, you should see a screen similar to the following.

In the preceding picture, I show three indicators of a connection being established. The top right corner in the example is difficult to tell that there is a server name there but it is. In my example I am just using the shorthand notation to connect to my server or “.” (a dot) which connects me to the localhost instance.

The top left indicator is a drop down list of all XE sessions I have on the server.

Beyond that, I don’t find it terribly useful. I can’t edit a session or script it from this tool yet.

Managing a session doesn’t appear to be possible at this point, so let’s try to create a new session and see what happens.

Sweet! The create session does something useful. I can create a new session. Unfortunately, all I can do is use one of three templates similar to the XE Profiler tool in SSMS. The full feature XE GUI tool has a much more complete list of templates and possibilities ever since SQL Server 2012. I documented that in this article.

Unfortunately, I have no use for the three default templates. So, for me, this tool drives me back to needing to use TSQL to create my sessions if I want to use ADS. Here is the big takeaway from that statement. The use of TSQL has been the biggest detractor for most Data Professionals when using XE. They want a full featured GUI. So, you are using ADS and must create an XE session, you will need to pull out your TSQL skills and probably need to pull down some of my helper scripts to get you going. The alternative would be to use SSMS where there is a full featured GUI that is more powerful than you might think.

Conclusion

There surely will continue to be more development around this idea of an XE style profiler. More development generally means that the product will mature and get better over time. This article shows how there is more being added to the feature to try and give you better control over the tool. We love control so the addition of these options is actually a good thing. Is it enough to sway me away from using the already established, more mature, and high performing tools that have been there for several generations? Nope! I will continue to use TSQL and the GUI tools available for XE that predated the XEvent Profiler.

Some say that data professionals really want the “Profiler” tool. In my opinion, that is certainly not the majority and now calling XE by the name “Profiler” is going to cause confusion at the least. Some say that maybe this tool needs to integrate a way to shred XML faster. To that, I say there are methods already available for that such as Powershell, the live data viewer, the Target Data viewer, or even my tools I have provided in the 60 day series.

How Long is That Event Taking

Knowing just how long an event takes is a common requirement when troubleshooting. Sometimes, figuring out the unit of time is a bit troublesome. Is it milliseconds, seconds or microseconds?

It is really easy to mistakenly use milliseconds when microseconds is required. It is also really easy to forget that one event is measured in seconds while another might be measured in milliseconds. To add to the confusion, what if the unit of measure changes between one version of SQL Server and the next?

While none could think it should be easy and consistent to figure out time, sometimes it just takes a little more effort. The same unit of time just isn’t applicable for every type of event. That is not just true in SQL Server but in life in general. You wouldn’t want to use hours when timing muzzle velocity, but hours could be entirely applicable to a surgery or training seminar.

Where does that leave us SQL Geeks when looking at timing of internal events inside of SQL Server? Well, we either need to do a little digging or we could simply read the rest of this article to find a simple script that can do the bulk of the work for us.

When dealing with events inside of SQL Server, the tool of choice to use is Extended Events (XE). If you are unfamiliar with XE, I really encourage you to take a look at these resources to become more familiar with the tool.

Time

None of us want to translate or interpret time incorrectly. Imagine the CIO looming over your shoulder asking how long before the database is back online. If you miscalculate the time and tell him 30 seconds when it really is 5 hours, the level of frustration and anger probably becomes exponentially worse.

While that may seem like a bit of an exaggeration, it’s not an experience any DBA likes to encounter. On a slightly smaller scale, when a developer asks how long a piece of code is running in production, you do want to be as accurate as possible. If the code takes 5 minutes to execute, the developer needs to know it was 5 minutes and not a miscalculated 3.9 seconds. 3.9 seconds may be entirely within the realm of acceptable for the project and thus be dismissed by the development team.

While trapping the precise time and calculating it may seem trivial, it is important (as previously mentioned) to use the correct time unit. There are many events within XE that provide a time unit of measure. The unit of measure is different through most of the events so a different calculation may be required depending on what you are troubleshooting at the moment.

Thankfully, there is some means to figure out if the time measurement is in seconds, milliseconds, microseconds or something different. We just need to pull it all out from the metadata views. Here is how we get to that data.

Holy crap! That doesn’t look easy. That is a ton more code than you might have expected. True. However, I like to have as much information as possible at my finger tips. In this case, I want to know the SQL Server version, channels, and search terms (keywords) related to the event.

Why have the extra data? When troubleshooting, it is nice to know if there are other events that might be related in nature that could shed a bit more light on the problem from a different angle. The use of keywords (think google search) and channels is perfect for helping me find those other events.

Here is what a snippet of that data might look like.

With this snippet, I can see there are multiple different units of measure but there are also multiple different keywords. These keywords can be essential to the troubleshooting process.

You will also see from that image that there are some events that don’t define the time unit very clearly. In fact it is just a null value for the description. That is a bit of a problem – but significantly less troublesome than not knowing the unit of measure for any of the events.

Conclusion

Figuring out the correct unit of time measurement can mean the difference between accurate troubleshooting or leaping down the wrong path. Figuring out the time units is made easier when you are able to query the metadata efficiently and have all of the pertinent details at your fingertips.

Extended Events is a powerful tool to help in troubleshooting and tuning your environment. I recommend investing a little time in reading the 60 day series about Extended Events. This is not a short series but is designed to provide an array of topics to help learn the tool over time. Don’t forget to go back and read the companion article showing how to audit these events via the default trace.

Monitor Database Offline Events

The other day, I shared an article showing how to audit database offline events via the default trace. Today, I will show an easier method to both audit and monitor for offline events. What is the difference between audit and monitor? It largely depends on your implementation, but I generally consider an audit as something you do after the fact. Monitor is a little more proactive.

Hopefully, a database being taken offline is a known event and not a surprise. Occasionally there are gremlins, in the form of users with too many permissions, that tend to do very strange things to databases and database servers.

Having read the previous article, you already know one method to try and find these database offline anomalies. That method may not be the most sleek solution nor most reliable given the possibility that events can quickly roll out of your default trace files. The better more reliable method is use Extended Events (XE) to monitor explicitly for those types of events. If you are unfamiliar with XE, I really encourage you to take a look at these resources to become more familiar with the tool.

Monitor

Beyond the power of XE to be able to better diagnose problems and trace events in your server, there is the ability to monitor for specific events as well. I won’t go into details about how to monitor with XE until a later article, but suffice it say I can monitor for Events to occur and immediately alert necessary parties to get more immediate action. This is quite some power for a built in tool and it is better than event notifications or agent alerts when it comes to ease of use and reliability.

As I look to monitor for these odd unplanned database offline events, I have several events within XE that can provide the requisite information: sqlserver.object_altered, sqlserver.database_started, sqlserver.database_stopped, and sqlserver.errorlog_written. I can hear you asking already “Wait, this seems to be a bit like a drill sergeant – very overbearing!”

Yes, it is probably a bit excessive for this session. However, I prefer to be comprehensive and the ability to link events together so I can better understand if it is a single one-off or if there is a bigger problem with the entire instance. Databases being stopped, started or set to offline should be rare and far between really. With that rarity in mind, the session should be relatively quiet.

If I have that session running and then take a database online/offline or vise versa, I will see something very similar to this output.

Starting from the bottom and working my way up, I can see that a command was issues to bring the 👻s database ONLINE. The very first thing that occurs is the request is written to the error log. Then I see that the database is in the stopped state. Next a message that the database is starting up (because it was stopped). Then we see two events for object_altered (similar to the default trace) due to the begin and commit phases of that transaction.

After that database was brought ONLINE, you can see that I immediately took 👻s back offline – starting with the errorlog event, then a stopped event and the object_altered begin and commit events.

Capturing each of the events I noted previously, not only gives me a complete picture of the event, it also can help me to identify if something happens in between the various “expected” events. If I use this session in my monitoring setup, then I can be quickly alerted to problems with a database as well as have the archive of the events to go back in time and AUDIT or troubleshoot the event of a database being offline or unable to come online.

With this XE Session running, I can be more confident that I have trapped and correlated the correct events in each of the sources. Using the default trace method, I have to make some highly likely correlations but there is still some “magic” involved. With the XE session, you will be far less likely to see any of those events roll out of the log as well. I can’t underscore the importance of that fact enough. The data will be there when you need it!

Conclusion

We all aspire to having a perfect database environment where nothing surprising or unexpected happens. Unfortunately, that is the desire of dreams and fairy tales. The unexpected will happen. A database can unexpectedly be taken offline. Are you prepared to address the problem fully to the CTO should it happen? This XE session can help you with that.

Extended Events is a powerful tool to help in troubleshooting and tuning your environment. I recommend investing a little time in reading the 60 day series about Extended Events. This is not a short series but is designed to provide an array of topics to help learn the tool over time. Don’t forget to go back and read the companion article showing how to audit these events via the default trace.

Extended Events File Initialization Failure

It should come as no surprise that I write a lot of articles about Extended Events (XE). This happens to be another article on Extended Events. Truth be told, this article is hopefully something that is more of an edge case scenario. Well, I sure hope that is the case and that it is not a common problem.

One of the recommended methods to trap payload data in an XE session is via the use of the event_file target. Sending data to a file has numerous benefits such as being able to take the trace and evaluate the trace file from a different machine (locally to that machine).

Every once in a blue moon you just may run into various issues with the event_file such as explained here or here. Though slightly different, the net effect is quite similar and should be treated with roughly the same kind of troubleshooting steps.

Configuration Error

As luck would have it, I ran into one of these rare opportunities to troubleshoot an error occurring on a client server. Truth be told, I was unfamiliar with the actual error at first. Here is that error.

Error: 25602, Severity: 17, State: 22.

The preceding error was scraped out of the SQL Server error log. Obviously a little more detail was needed because this error is far from useful without that detail. Looking a little deeper, I found some errors like this.

Msg 25602, Level 17, State 22, Line 43
The target, “5B2DA06D-898A-43C8-9309-39BBBE93EBBD.package0.event_file”,
encountered a configuration error during initialization. Object cannot be added to the event session.

Some very good clues are actually contained in that particular message. Some of these clues include the following: a) the term “target”, b) the term “event_file”, and c) the phrase “event session.” Ok, I get it at this point. One of my Extended Event Sessions I had put on the server and used previously was broken. But, since it had been working and I know I had fetched data from it, I found myself puzzled as to why it might be busted.

The next logical thing to do at this point was to test the various sessions that are stopped and try to figure out which one is causing the problem and see if the error is reproduced. Finally upon finding the session that is failing, I ran into the complete message.

Msg 25602, Level 17, State 22, Line 43
The target, “5B2DA06D-898A-43C8-9309-39BBBE93EBBD.package0.event_file”,
encountered a configuration error during initialization. Object cannot be added to the event session.
The operating system returned error 3: ‘The system cannot find the path specified.
‘ while creating the file ‘C:\Database\XEDROPME\SVRLoginAudit_0_131650006658030000.xel’.

The additional info that I needed is in bold text in the previous text. So, for some reason, there is a problem with the path for the XE trace file output. Going out to the file system to check it out, I found that the client in this case decided to delete the entire folder. How does that happen? Well, it does! When it happens, the XE traces will start to fail and you will no longer capture the intended trace data. Let’s take a look at a simulated reproduction of this issue.

First, I will create a session and then start the session, then validate the session exists and then stop the session.

And here is what I see on my test server when I validate the session exists.

Perfect so far. Now, let’s make that trace output directory disappear. For this demo, you might note that I had created the directory as “C:\Database\XEDROPME”. My intent in the name was obviously to notify the world that the folder was to be dropped.

That statement is easy enough and is performed from my test environment for those getting weary of the use of xp_cmdshell. Now, let’s try to start that session that we knew was previously running.

I re-formatted the output of the error for ease of readability. Otherwise, the output in the preceding image is what will happen when the output directory is dropped. The fix is rather simple at this point – put the folder structure back into place. To read an introductory post about checking if a session exists or not on your server, check this out (a more advanced post is coming soon).

Conclusion

From time to time we will run into various problems supporting Extended Events. This is bound to happen more frequently as we support more varied environments with more hands in the kitchen (so to speak). We need to learn that even small changes can have a ripple effect to other things that may be running on the server. It is worthwhile to perform a little due diligence and clean things up as we make changes – or at minimum to observe the system for a time to ensure no unintended consequences have occurred.

Extended Events is a powerful tool to help in troubleshooting and tuning your environment. I recommend investing a little time in reading the 60 day series about Extended Events. This is not a short series but is designed to provide an array of topics to help learn the tool over time.

Profiler for Extended Events: Quick Settings

Not long ago, I wrote a rather long article about a new-ish feature within SQL Server Management Studio (SSMS) that impacted Extended Events. You can read that book – here! The XEvents Profiler feature is one of those things that you may or may not use. If you consider using the feature, I do believe it is important that you research it a bit and try to learn the pros and cons first.

With that there is a little more about the feature that the aforementioned book did not cover. In fact, this information has pretty much gone ignored and mostly stays hidden under the covers.

Settings

As of SSMS 17.4 we have been given the ability to control XEvents Profiler just a tiny bit more. For what it is worth, we as Database Professionals love to be able to control our database environment. So this teeny tiny bit of new control ability is potentially a huge win, right?

If you are the controlling type, or maybe just the curious type, you will be pleased to know that under “Options” from the Tools menu in SSMS, Microsoft has tucked some new control options to help you configure XEvents Profiler – to a degree. If you open options, you will see this new node.

If you expand the “XEvent Profiler” node (circled in red), you will discover the “options” node. If you click on this “options” node and do a quick comparison (in SSMS 17.4 and SSMS 17.5) you will also find that you don’t need t expand the “XEvent Profiler” node at all because the options are listed in the right hand pane for both nodes and they are exactly the same. So, choose one or the other and you will end up at the same place.

The options that you currently have are:

  • Stop Session on Viewer Closed
  • Toolbar commands stop and restart

You can either set these options to True or False. I recommend you play with them a bit to discover which you really prefer. That said, I do prefer to have the “Stop Session on Viewer Closed” set to true. There is “profiler” in the name of the feature afterall. And if you have read the “book” I wrote about this feature, you would know that the filtering offered by the default sessions of this feature basically turn on the fire hose effect and can have a negative impact on your server. Are you sure you want a profiler style fire hose running on your production server?

Conclusion

There surely will continue to be more development around this idea of an XE style profiler. More development generally means that the product will mature and get better over time. This article shows how there is more being added to the feature to try and give you better control over the tool. We love control so the addition of these options is actually a good thing. Is it enough to sway me away from using the already established, more mature, and high performing tools that have been there for several generations? Nope! I will continue to use TSQL and the GUI tools available for XE that predated the XEvent Profiler.

Some say this is a way of bridging the gap. In my opinion, that gap was already bridged with the GUI that has been available for several years. Some say that maybe this tool needs to integrate a way to shred XML faster. To that, I say there are methods already available for that such as Powershell, the live data viewer, the Target Data viewer, or even my tools I have provided in the 60 day series.

I would challenge those that are still unfamiliar with the XE GUI (out for nearly 6 years now) to go and read some of my articles or articles by Jonathan Kehayias about the power that is in XE as well as some of the power in the GUI.

Checking the Health of your CLR

Comments: 1 Comment
Published on: January 16, 2018

The Common Language Runtime (CLR) is the fundamental nerve center of the Microsoft .NET Framework. It is this nerve center that provides the execution environment for all .NET Framework code. One may sometimes refer to CLR code as managed code.

CLR hosted from within SQL Server (CLR Integration), provides the ability to create stored procedures, triggers, user-defined functions, user-defined types, and user-defined aggregates. And because managed code compiles to native code, we sometimes see a bit of a performance gain for various scenarios.

It is this bit of a performance gain, that we may see more CLR integration use within our SQL environments. And with increased CLR there will be an increased chance for something to go a little haywire. If things can (and will go haywire), it is important to have a means to be able to monitor them.

If you know me, you are probably aware that when it comes to monitoring a problem, I will most probably recommend a solution that involves a little bit of XE. If you don’t know me, here is a hint: some of those monitoring tools can be found in this series.

CLR

I had a client reach out recently because they were having all sorts of issues with their CLR procs. The client is losing some sleep, a little weight, and a fair amount of hair trying to figure out what is causing their CLR nightmares. This client has magic CLR. The CLR works fabulous or a day or three. Then suddenly the CLR procs just disappear and have to be re-created on the server. Granted this seems more like a problem of somebody did something they didn’t realize they were doing and poof the CLR is gone.

So, what can we do about this? The answer is quite simple. I am going to use an extended event session that will monitor the various aspects of CLR from a couple of different angles. First, I need to try and find events that fit my needs. I can do that with the queries and instructions found here.

As I query the XEvent catalog, I can see that there is a limited set of options and they can be seen in the following image.

Most of the events can be found in the sqlclr package and in the debug channel. The exception in both cases being the “assembly_load” event. I will go ahead and add all of those events to my session except for the garbage collection event.

The next step is to help cover my bases and see if somebody might be changing the objects and causing the odd behavior. I will do that with the object_created, object_altered, and object_deleted events. The caveat here is I will limit the scope of those events to only look for CLR specific changes. I will be able to do that by filtering on object_type in each of those three events. Here is a look at those object types.

 

In Extended Events, we see that we can monitor for object changes in the various different objects such as “TRIGASM”,”PROCASM”, and “FNTABASM”. What do these map to though? Well, here is the answer to that question!

  • 16724 TRIGASM = Assembly (CLR) DML trigger
  • 17232 PROCASM = Assembly (CLR) stored-procedure
  • 21313 ASM = CLR Assembly
  • 21318 FNSCLASM = Assembly (CLR) scalar-function
  • 21574 FNTABASM = Assembly (CLR) table-valued function
  • 17985 AGG — Assembl Aggregate function (CLR)

With all of this rolled together, we can now assemble the XE Session.

Once deployed, I can go ahead and follow the instructions here to test different CLR managed code objects.

The Wrap

Having a tool to be able to monitor CLR health will be essential as you deploy more and more managed code within SQL Server. Extended Events offers a great lightweight means to do just that. This article has shown how to deploy a session that will capture the various changes with CLR objects within our database environment. In addition, you will be able to capture various conditions related to performance or problems with the managed code. You may even recognize some of the CLR events from the system_health session.

If you are interested in learning more about Extended Events, I recommend you read my “60 day” series of articles on Extended Events. The series continues to grow and covers a pretty decent depth and breadth on the topic.

Sharepoint Diagnostics and XE

Comments: 2 Comments
Published on: January 3, 2018

One of the all-time greatest and most beloved applications among DBAs happens to be Sharepoint. Most of us would be lying if we said that we loved Sharepoint and the kind of performance issues it can cause on a SQL Server.

When you have an application that comes with a notoriety of causing performance pain, you will want to have some tools or find some tools that can help you find the pain points. Lucky for the savvy DBA, there are tools to be able to capture this kind of information. Hint: some of those tools can be found in this series.

Tools

One of the best tools for capturing performance impacting data is Extended Events. Extended Events is simply a tool that everybody needs to learn. And in this case, it provides an opportunity to support and troubleshoot Sharepoint as well as SQL Server.

Truth be told, the default XE session for Sharepoint does not come installed when you install Sharepoint. But, there is a session that gets installed when you configure SPDiag for your Sharepoint needs. I am not going to dive deeply into that tool or the Sharepoint tooling. That would be well outside the use and scope of this post. Here is the default session created by that tool.

As you can see, there is not much special about the session. This is a basic session to capture statements queries executed against the server. In fact, you will probably note it is similar to both of the default XE Profiler sessions I discussed here. The first caveat here is the application filter for Sharepoint apps. That said, if you are running multiple Sharepoint sites and configure SPDiag on both, and both share the same backend database server, guess what happens? Yup, you guessed it – you get two Sharepoint sessions that capture the exact same data.

The second caveat here is a fun thing about the service that gets created to support the Sharepoint XE Sessions. The SPDiag will alter each of your XE (sharepoint related) sessions many times every few seconds. Every alter of the sessions will be to either stop or restart the session. This seems rather unnecessary and huge miss. There are not many misses with XE, but there can be a misuse of XE similar to all of the abuses of profiler and this is one of those abuses in my opinion. You can easily discover the constant changes to these sessions via an XE session that I will be sharing in a future post. I would recommend you stay tuned for that article and also watch the 60 Day Series for an update that includes that post.

The next caveat here is that this particular session is only configured to go to the ring_buffer. What is the problem there? Well, the ring_buffer means the trace data is volatile and you can easily lose it. In addition, this particular session has been known to cause some memory issues on many installations. You can check for a memory issue by watching for a growing MEMORYCLERK_XE memory count. You can watch it with a query like the following.

In my opinion, I see no good value in running the session as currently built. I would just rather build a better XE session and leave it running instead of constantly stopping and starting the XE session.

The Wrap

In this article I have introduced you to a quick session setup that comes from using the SPDiag tool that could plausibly be useful in the troubleshooting of various different problems (most probably performance related) with Sharepoint. This assuredly is in the absence of a better alternative such as a custom session that you write yourself.

If we browse the data made available to us, we can see that there is adequate info to help us convert all of those SQL Traces to the high-performing XE versions that we should be using these days. I encourage you to browse the capabilities of XE as you work towards converting your profiler based traces to the much friendlier and more powerful counterparts.

If you are interested in learning more about Extended Events, I recommend you read my “60 day” series of articles on Extended Events. The series continues to grow and covers a pretty decent depth and breadth on the topic.

«page 1 of 10

Calendar
December 2018
M T W T F S S
« Nov    
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Welcome , today is Wednesday, December 12, 2018