Monitor Database Offline Events

The other day, I shared an article showing how to audit database offline events via the default trace. Today, I will show an easier method to both audit and monitor for offline events. What is the difference between audit and monitor? It largely depends on your implementation, but I generally consider an audit as something you do after the fact. Monitor is a little more proactive.

Hopefully, a database being taken offline is a known event and not a surprise. Occasionally there are gremlins, in the form of users with too many permissions, that tend to do very strange things to databases and database servers.

Having read the previous article, you already know one method to try and find these database offline anomalies. That method may not be the most sleek solution nor most reliable given the possibility that events can quickly roll out of your default trace files. The better more reliable method is use Extended Events (XE) to monitor explicitly for those types of events. If you are unfamiliar with XE, I really encourage you to take a look at these resources to become more familiar with the tool.

Monitor

Beyond the power of XE to be able to better diagnose problems and trace events in your server, there is the ability to monitor for specific events as well. I won’t go into details about how to monitor with XE until a later article, but suffice it say I can monitor for Events to occur and immediately alert necessary parties to get more immediate action. This is quite some power for a built in tool and it is better than event notifications or agent alerts when it comes to ease of use and reliability.

As I look to monitor for these odd unplanned database offline events, I have several events within XE that can provide the requisite information: sqlserver.object_altered, sqlserver.database_started, sqlserver.database_stopped, and sqlserver.errorlog_written. I can hear you asking already “Wait, this seems to be a bit like a drill sergeant – very overbearing!”

Yes, it is probably a bit excessive for this session. However, I prefer to be comprehensive and the ability to link events together so I can better understand if it is a single one-off or if there is a bigger problem with the entire instance. Databases being stopped, started or set to offline should be rare and far between really. With that rarity in mind, the session should be relatively quiet.

If I have that session running and then take a database online/offline or vise versa, I will see something very similar to this output.

Starting from the bottom and working my way up, I can see that a command was issues to bring the 👻s database ONLINE. The very first thing that occurs is the request is written to the error log. Then I see that the database is in the stopped state. Next a message that the database is starting up (because it was stopped). Then we see two events for object_altered (similar to the default trace) due to the begin and commit phases of that transaction.

After that database was brought ONLINE, you can see that I immediately took 👻s back offline – starting with the errorlog event, then a stopped event and the object_altered begin and commit events.

Capturing each of the events I noted previously, not only gives me a complete picture of the event, it also can help me to identify if something happens in between the various “expected” events. If I use this session in my monitoring setup, then I can be quickly alerted to problems with a database as well as have the archive of the events to go back in time and AUDIT or troubleshoot the event of a database being offline or unable to come online.

With this XE Session running, I can be more confident that I have trapped and correlated the correct events in each of the sources. Using the default trace method, I have to make some highly likely correlations but there is still some “magic” involved. With the XE session, you will be far less likely to see any of those events roll out of the log as well. I can’t underscore the importance of that fact enough. The data will be there when you need it!

Conclusion

We all aspire to having a perfect database environment where nothing surprising or unexpected happens. Unfortunately, that is the desire of dreams and fairy tales. The unexpected will happen. A database can unexpectedly be taken offline. Are you prepared to address the problem fully to the CTO should it happen? This XE session can help you with that.

Extended Events is a powerful tool to help in troubleshooting and tuning your environment. I recommend investing a little time in reading the 60 day series about Extended Events. This is not a short series but is designed to provide an array of topics to help learn the tool over time. Don’t forget to go back and read the companion article showing how to audit these events via the default trace.

T-SQL Tuesday #108: New Horizons Beyond SQL Server

Comments: No Comments
Published on: November 13, 2018

There comes a point in one’s career when a change is requisite. Big or small there always seems to be a tipping point that mandates some sort of change. Maybe the change is an entirely new career field. Maybe the change is adapting to the ever improving features of a specific software or product. Maybe, that change means learning a tangential technology.

This is precisely the goal Malathi Mahadevan (b | t) seems to have envisioned for the 108th installment of TSQL Tuesday.

If you are interested in reading the original invite, you can find that here.

So the challenge for this T-SQL Tuesday is – pick one thing you want to learn that is not SQL Server. Write down ways and means to learn it and add it as another skill to your resume. If you are already learning it or know it – explain how you got there and how it has helped you. Your experience may help many others looking for guidance on this.”

Personally, I am not one to settle, so learning and improving are important. New technologies, changes in technologies, new features, tangential technologies – they are ways to continue to learn and improve – most of the time. Sometimes, a new technology offers a good change of pace and offers an exit from something that is becoming too standard, while providing an entrance to something different, difficult, exciting and expanding.

Through the year (2018), I created a few goals for myself around some of these new or different technologies:

  1. Become proficient at MySQL (maybe even certify who knows)
  2. Become proficient at PowerShell
  3. Work towards the TCM (I got to busy with 1 and 2 to even accord any time towards this one)

Proficient is sort of a vague term because it can have a different meaning to different people. For me, I will describe what I have been doing to become proficient in both PoSh and MySQL.

PowerShell

A shout out is absolutely necessary for Adam Machanic (twitter) for picking the right blog meme that has been able to survive so long in the SQLFamily. This party has helped many people figure out fresh topics as well as enabled them to continue to learn.

I have dabbled in PowerShell over the years – just enough to “be dangerous” as some may say. I wouldn’t call it proficient and probably not even dangerous really. I was able to muddle my way through being able to create some basic scripts to perform certain tasks.

This rudimentary ability just isn’t enough to be able to put the skill on a resume (imho). It certainly wasn’t enough skill to be able to manage a large group of servers and perform various enterprise type tasks. My objective was to be able to create a set of tools for myself that I could use and have repeatable success at great ease.

I sat down with my first set of various tasks I wanted to be able to perform and worked on them as I set time aside to learn PoSh better – every day. It took several weeks and by no means am I on the same level as Rob Sewell (b | t) or Chrissy LeMaire (b | t) or Sean McCown (b | t). That said, I do feel I am far more capable in PoSh now than I was last year.

This is a skill that I plan to continue to hone. I am learning every time I pick it up and try to do something different. It is good that I am able to keep learning. Next year, I plan on being able to say I am more proficient than I am now. I am also hopeful to be able to be good enough to properly contribute to the dbaChecks project. I do also hope to share some of the scripts I have created on my blog as well.

Here are some of my first dabbles with powershell that I have integrated into other blog posts. Yes, they are very rudimentary.

MySQL

Learning MySQL is probably not too big of a stretch to be honest. I certainly think it is far more similar to SQL Server in many regards than PoSh. After all, it is just another little DBMS and does happen to be rather popular.

I wanted to pick up MySQL so I could support clients that have it installed here there and everywhere in their environments. Many clients have more than one DBMS platform and it is good to understand and be able to administer multiple platforms with a high level of competence. Unfortunately, MySQL comes with a fair amount of gotchas. There are serious limitations depending on version and flavor. Some clients may be stuck on a rather old version of MariaDB (akin to SQL 2000). This means developing a broad set of scripts and skills quickly and on the fly.

I have a ways to go in my learning due to the varied flavors of MySQL but I am getting there. I do feel pretty comfortable hopping in and troubleshooting performance issues and doing a quick health assessment at this point. I would call that proficient. Similar to what I said about PoSh, I plan on being able to say next year that I am more proficient. More opportunity with this platform is what lends itself to better proficiency.

TSQL2sDay150x150The Wrap

I believe in continuous integration / improvement when it comes to personal growth and development. It is necessary to keep your personal skills sharp as well as keep yourself marketable.

Oh, and if you are interested in some of my community contributions (which according to Jens Vestargaard is an awesome contribution), read this series I have published.

Audit Database Offline Events

Categories: News, Professional, Scripts, SQLBP, SSC
Comments: No Comments
Published on: November 13, 2018

Knowing when an event occurred within the database environment is a very important thing. Being able to act quickly when certain events occur is equally as important. Sometimes, we may not find out about an event for a few days or weeks and then we are asked to figure out the who, when, why and how of the event. Hopefully there was enough foresight in place to plan for such a request to travel back in time.

Recently while networking at a large database related conference, I had the chance to discuss some of this with a semi-desperate DBA. In his environment, apparently a database can be taken offline at random times without the involvement of the DBA team. He was looking for a solution to prevent it from happening. For me, the simplest solution is to revoke permissions from the person doing it. If you can’t revoke the permissions, then at least capture the pertinent information to document who and when the databases are being taken offline.

Auditing and Facts

I am a huge fan of auditing the database environment to gather facts. There is comfort that can be gained in knowing that if a change happens to a database or server under your purview, that you can report some facts back up the chain when a post-mortem is required. I recently shared another example of such an auditing scenario – here.

There is a technique in that article that may be of interest for you if you are restricted to implementing this kind of audit. I won’t go into detail about the poor man audit techniques in this article, so it is worth reading that previous article.

When looking to find data about a database being taken offline, one could capture the data from the default trace or from the SQL Server Error Log. While this article will show how to capture this data from both of those sources, it is important to understand that the data just might not be available in either location depending on settings and activity of the server combined with how old the offline event was.

Lets first take a look at the default trace. That one should be easy enough.

And a snippet of some possible output looks like the following:

If you look closely at the output, you will probably notice that I don’t get a lot of information. I find out that there was an ALTER event to a database but I do not know what the event was nor do I see the text from the SQL statement that caused the change. So, at best, I get a partial view of the picture by looking at just the default trace.

With that in mind, let’s look at the error log and see if that provides better information.

This is pretty cool here. I am checking all of the log files available on SQL Server and scanning for a search term of “offline.”

Ok, this shows a little bit of promise. I can see that the database

was indeed set to offline (or “option OFFLINE” was set to ON) with a spid and a timestamp. Unfortunately, the data shown here is far from being super useful in a post-mortem. The error log does not show who took the database offline.

What if, there was a way to combine these two queries into a single result set and better correlate the results into something highly useful? Something like that seems like it could be possible given we have a spid and timestamp in each result set.

Let’s try it and see.

If I use the timestamp from each result and the spid from each result set, I can surely join the two results together. As you can see, that is exactly what I did. I had to format the spid and timestamp a little bit – but that is ok. With that little bit of magic, I get a result set similar to the following now.

Now, I can easily see who put the database into offline mode. I can also tell you what the statement was when they executed the offline. I also know what the timestamp was when the database was put offline. This is usable data now when it comes time for a post-mortem or fact finding request.

The Wrap

Is this the most eloquent method to trap this kind of data? Absolutely not, but it is usable. When it comes to something more sleek and modern, I would recommend a better tool such as Extended Events. I will show how to do this same task with greater ease and reliability through the use of XE in my next article.

Oh, if you are interested in any of my other articles about Auditing or Extended Events, I recommend you read start in here (for Auditing) or here (for XE).

Refresh SQL Modules

As is true in most facets of life, things tend to get stale and old. Sometimes this staleness can be visibly represented as the wrinkles on your face. Other times, as with SQL Server, it may mean that a stored procedure or view stops working.

Unfortunately, when it comes to SQL Server, the symptoms are not as readily visible as aging lines would be. In SQL Server, the symptoms may be as random and difficult to recognize as a stored procedure just suddenly stops working.

What makes this even more difficult is that the stored procedure (that is no longer working) may appear to be entirely unchanged. These failures can occur when other changes have occurred to the system as well as when no change has occurred. Imagine the joys you could experience while trying to troubleshoot this kind of problem.

If you were keen, you would probably implement any number of Extended Event Sessions to help troubleshoot the issue. If you are unfamiliar, you might want a refresher course on how to setup an XE session which you could read from one of these articles, here and here.

If you are curious, there are more XE articles on my blog – here.

Metadata

There are many possible causes for the metadata to become fouled up inside the SQL modules on the database server. Invariably, according to the interested parties, nothing has changed in months! Right or wrong, there is still a problem to fix. Fortunately, the problem is easy enough to fix.

Sure, all of the modules could be re-deployed from source control. But, given that the modules don’t to appear to have lost any of the code within them, re-deploying code seems to be a bit of overkill. We don’t need to go that far. We can simply run sp_refreshsqlmodule for the stored modules (procs, triggers, functions, views etc). This would be pretty simple with the following script:

In this script, I am just going to refresh all modules in the database (Adventureworks2014 in this case). I could modify the script to trim it down to a specific module or set of modules. Instead, I leave that as homework for you.

That said, I do have a slightly different alternative that looks for any tables changed on a specific date. After finding those changed tables, then I update all modules related to the changed table(s). Here is how that would look:

From here, suppose you want to check for a range of dates where tables were modified. Again, I will leave that as homework for you.

After executing these scripts, I will have an output similar to the following:

This leaves you with one more step to perform – copy the values from the RefreshStmt column to a new query window and execute the statements.

The Wrap

In this article I showed a very simple solution to a problem that plagues some environments – out of date metadata. Forcing a module refresh will often resolve these types of issues.

With such a beautifully simple solution, I could have grouped this article into my “Back to Basics” series, but I did not. That said, there are some pretty interesting articles in the series including (but not limited to): Backups, backup history and user logins. Check them out!

Cannot Use the Special Principal – Back to Basics

I recently had a client call me out of the blue because he happened to be getting an error while trying to add a user to a database role. The error he was getting was “Cannot use the special principal ‘dbo’.”

This error has probably cropped up on me more than a few times. And on more than a few occasions, I have forgotten about the previous experiences. Some of that is because the fix is rather easy and after a few times seeing it, muscle memory takes over and you just fix it without thinking about it too much.

Until you get to that muscle memory moment though, you may flounder a bit trying this and that and failing then proceeding on to a level of frustration that has you losing precious hair.

As luck would have it, this is an article about security and principals and is similar in nature to some other articles I wrote about some fundamental misconceptions about permissions here and here.

I do hope that there is something you will be able to learn from this basics article. If you are curious, there are more basics articles on my blog – here.

Meet Prince Apole and Rolle…

Adding a user to the db_datareader database fixed role is a pretty simple task. Most of us can likely do that in our sleep. Even using the GUI is usually pretty reliable to do that. Every now again though, somebody has decided to get tricky on us. Maybe a mistake was made somewhere in a setting on the server and nobody has caught it because nothing was “broken” – until it was.

In the aforementioned case, I was asked to help resolve the issue and I found that there was a problem in how the database owner was set. Not only was it a problem in the current database but in 12 other databases on the same server. The systems admin was at wits end. He was dealing with something that was just not in his knowledge-base yet. I remember being in the same boat – so no big deal there. We talked about some of the things he had tried and how none of it was working. I am going to recreate the same basic scenario along with some of the attempted fixes in this article.

First, we need to create a database (best to break a database designed to be broken instead of an existing one used for something else already).

That is pretty straight forward – the database will be created with the data files in the default directories on your SQL Server instance. In addition, a login called mydomain\svc_dummy will be created as a windows login.

Now let’s try to set the owner of the database and then add the svc_dummy account to the datareader role.

There is a point of interest here. I said I was going to add svc_dummy to the datareader role – not dbo. Well, I set the database owner in the preceding step to svc_dummy so it basically became dbo. When I try to perform the role addition in the GUI and then script the change, this is the script that is produced. I will show why in a few moments.

The execution of the second part of the script results in the following:

Msg 15405, Level 16, State 1, Line 18

Cannot use the special principal ‘dbo’.

That is obviously not going to work. Let’s try fixing the script and add the svc_dummy principal instead of dbo.

Unfortunately, this results in the following:

Msg 15151, Level 16, State 1, Line 22

Cannot add the principal ‘mydomain\svc_dummy’, because it does not exist or you do not have permission.

Well, maybe the problem is because the user doesn’t exist then? Let’s try to create the user and see what happens.

Now we should see this message:

Msg 15063, Level 16, State 1, Line 32

The login already has an account under a different user name.

Oy vey. We seem to be going in circles. Nothing is working. The user is there but not really there. Let’s try to drop the user and just try to clean things up and start over.

I hope you see the problem with this one. Trying to drop dbo. I dunno but we should see an error here – and we do get an error.

Msg 15150, Level 16, State 1, Line 27

Cannot drop the user ‘dbo’.

Let’s fix the user then and try to drop the svc_dummy user instead of dbo.

Which in turn generates yet another error.

Msg 15151, Level 16, State 1, Line 52

Cannot drop the user ‘mydomain\svc_dummy’, because it does not exist or you do not have permission.

If I can’t resolve the problem by changing the user in the database, maybe I can just blow it out of the water by dropping the server login.

Yet another failure message will ensue. This time the message is:

Msg 15174, Level 16, State 1, Line 55

Login ‘mydomain\svc_dummy’ owns one or more database(s). Change the owner of the database(s) before dropping the login.

So far we have been able to skirt around the problem and generate six different error messages. The last one kind of gives us the best information on what we could do to resolve the issue. The login owns a database and therefore, we need to undo that ownership. Before we do that, let’s take a look at the database principal ‘dbo’.

We already know that svc_dummy is mapped to a user in the DummyDB database. We also know that we cannot add the svc_dummy user because of that prior mapping. We have also learned that when scripting the permissions change from the gui on the svc_dummy login and then generate the script it scripts out the user ‘dbo’. Due to this, let’s look in the sys.database_principals view at the dbo user and see what it tells us.

See how the dbo database principal says it is mapped to a windows account type? With this in mind, let’s join to the sys.server_principals and see what windows account is mapped to the dbo database user.

Now we see a bit more clearly. Combined with the error messages and the principal information for both the login and the user, we have a better view of the puzzle now. Changing the database owner indeed mapped the windows account to dbo for us and is now restricting us to certain activities when trying to manage permissions for the windows login in the database. From here, we can easily fix the issue by changing the database owner, creating a user mapped to the windows login and then adding that principal to the datareader role.

And if we run that script for svc_dummy we will see a successful execution as shown here.

The Wrap

In this article I took a rather long route to a simple fix. It’s easy to try each of the steps I showed in this article thinking it will help. It isn’t illogical to try some of those steps. They just don’t work unfortunately. In the end, getting to know the settings in the database and what the errors are really trying to get at is most helpful. Sometimes, it just takes a few more steps to get to the real meaning of the error.

This has been another post in the back to basics series. Other topics in the series include (but are not limited to): Backups, backup history and user logins.

page 1 of 1

Calendar
November 2018
M T W T F S S
« Jul    
 1234
567891011
12131415161718
19202122232425
2627282930  

Welcome , today is Saturday, November 17, 2018