SQL Servers Black Box Recorder – system_health

Yesterday i shared the first article in a three part series about the SQL Black Box Recorder, that you can read here, with a promise of at least two more posts. The first article in the series laid the groundwork about the black box recorder in SQL Server. Furthermore, there was a brief introduction into the three components of the black box recorder. The three components as laid out in that article are:

  • Default Trace
  • system_health Extended Event Session
  • sp_server_diagnostics procedure

The first article went into deeper detail about the first leg of this three-legged black box recorder – the default trace. In this article, the focus will be directed to the next leg of the black box recorder – or the system_health Extended Event Session. If by some measure you are in the dark about what Extended Events is, then I recommend you read my personal blog series on the subject. There are numerous articles stepping through the subject in easy-to-digest fashion. You can find many of the articles in the following table of contents – here.

The focus of this article will not be to introduce the topic of Extended Events. Rather, it will be to introduce the system_health session and dive into it a fair bit.

SQL Black Box Recorder

What is the system_health session?

Beyond being a component of the black box for SQL Server, what exactly is this event session? The system_health is much as the name implies – it is a “trace” that attempts to gather information about various events that may affect the overall health of the SQL Server instance.

The event session will trap various events related to deadlocks, waits, clr, memory, schedulers, and reported errors. To get a better grasp of this, let’s take a look at the event session makeup based on the available metadata in the dmvs and catalog views.

In addition to the types of events I mentioned, there are also a few more interesting things to discover from the results of this particular query. For instance, the actions that are being employed help to gather even more information such as the callstack, tsql_callstack, database id, and sql text of the query that was executing at the moment of the event that is trapped.

One very peculiar event that is trapped is the sp_server_diagnostics_component_result event. Remember from earlier in this article that I noted the third leg of the black box record happens to be a very similarly named session – “sp_server_diagnostics procedure”. When I cover this third leg, it will make a little more sense. Understand, for now, that your suspicion of it being a bit more integral to the black box (than merely 1/3 of the overall black box) is confirmed.

This is a good set of information that can be trapped. Furthermore, this is a good set of information which can give a fairly good diagnosis of several different potential problems within your SQL Server Instance! (Yes it may seem repetitive, but I wanted to underscore the importance of the data that can be trapped.) That said, compare the types of events to those of the default trace. Take note of how many events actually overlap between the two components of the black box recorder.

For ease of comparison, let’s simplify that previous query to get a set of events to read without the duplication due to the actions on each event.

And since I am feeling particularly generous, I have lumped the events from the two traces together in the following query:

That should help to compare the 17 events from the system_health session and the 34 from the default trace with a little more ease and speed. The events from the two sessions are rather complimentary to each as they help to construct the SQL Server Black Box recorder.

How to use this session?

With the events trapped by this session, there are a few ways to use the trapped data that should stand out pretty easily. To me, the quick application of the data from this session can come from the review of deadlocks, reported errors, or even waits analysis. Using the deadlock as probably the most common problem people will want to troubleshoot, this information is suddenly much more valuable and easy to access (no more need for that trace flag). Since the system_health is now trapping the deadlock by default, one could quickly access the deadlock graph from the event data and proceed to troubleshooting. More information on getting that deadlock graph can be found here and here.

Learn to use the data made available through this member of the black box recorder. Becoming familiar with this tool (and additionally with Extended Events) will make you a rock-star in the office. The information needed to troubleshoot most problems is right at your fingertips through the use of the different pieces of the black box recorder. If it is not available there, then more than likely it can be made available through a bit of tinkering with Extended Events. Get to know the tools and be that rock-star DBA.

For more uses of Extended Events, I recommend my series of articles designed to help you learn XE little by little.

Interested in seeing the power of XE over Profiler? Check this one out!

This has been the fifth article in the 2018 “12 Days of Christmas” series. For a full listing of the articles, visit this page.

1 Comment - Leave a comment
  1. […] Part two looks at the system_health Extended Event session: […]

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.



Calendar
December 2018
M T W T F S S
« Nov   Jan »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Welcome , today is Monday, January 21, 2019