Event Tracing for Windows Target

There are many useful targets within SQL Server’s Extended Events. Of all of them, the most daunting is probably the Event Tracing for Windows (ETW) target. The ETW target is new territory for most DBAs, which means spending a lot of time learning the technology and figuring out the little nuances and difficulties it can present.

With all of that in mind, I feel this is a really cool feature, and it can be useful in bringing together the groups that most commonly butt heads in IT (Ops, DBAs, Devs) by creating a common set of trace data and facts. There may be more on that later!

Target Rich

The ETW target is a trace file that can be merged with other ETW logs from Windows or applications (if they have enabled this kind of logging). You can easily see many of the default ETW traces that are running (or can be run) in Windows via Perfmon, or from the command line with a command like the following.
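Something like this will do it (logman ships with Windows):

rem list the Event Trace Sessions currently running on this machine
logman query -ets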

And from the GUI…

Finding the traces is not really the difficult part with this type of trace. The difficult parts (I believe) come down to learning something new and different, and that Microsoft warns that you should have a working knowledge of it first (almost like a big flashing warning that says “Do NOT Enter!”). Let’s try to establish a small knowledge base about this target to ease some of the discomfort you may now have.

One can query the DMVs to get a first look at what some of the configurations may be for this target (optional and most come with defaults already set).
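A sketch of such a query, using the XE metadata DMVs:

SELECT oc.name AS config_name,
       oc.column_value AS default_value,
       oc.description
FROM sys.dm_xe_objects o
INNER JOIN sys.dm_xe_object_columns oc
    ON o.name = oc.object_name
    AND o.package_guid = oc.object_package_guid
WHERE o.object_type = 'target'
    AND o.name = 'etw_classic_sync_target'
    AND oc.column_type = 'customizable';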

Six configurations in total are available for the ETW target. In the query results (just above) you will see that the default value for each configuration option is displayed. For instance, the default_xe_session_name has a default value of XE_DEFAULT_ETW_SESSION. I like to change default names and file paths, so when I see a name such as that, rest assured I will change it. (Contrary to popular belief, the path and session name default values can certainly be changed.)

As I go forward into creating an XE session using the ETW target, it is important to understand that only one ETW session can exist. This isn’t a limitation of SQL Server per se; rather, it is a combination of the use of the ETW classic target (for backwards compatibility) and Windows OS constraints. If the ETW target is used in more than one XE session on the server (even in a different SQL Server instance), then all of them will use the same trace target in Windows (the consumer). This can cause a bit of confusion if several sessions are running concurrently.

My recommendation here is to use a very precise and targeted approach when dealing with the ETW target. Only run it for a single XE session at a time. This will make your job of correlating and translating the trace much easier.

The ETW target is a synchronous target; it does NOT support asynchronous publication. Because the target consumes events synchronously, if multiple sessions define the same event, the event is consumed just a single time by the ETW target. This is a good thing!

Two more tidbits about the ETW target before creating an event session and looking at more metadata. The default path for the target is %TEMP%\<filename>.etl. This is not defined in the configuration properties but is hardwired. Any ideas why one might want to specify a different path? I don’t like to use the temp directory for anything other than transient files that are disposable at any time!

Whether you change the directory from the default or leave it be, understand that it cannot be changed after the session starts – even if other sessions use the same target and are started later. However, if you flush the session and stop it, then you can change it. I do recommend that it be changed!

The second tidbit is that, beyond the classic provider used by this target, ETW also has a manifest-based provider. Should Extended Events (XE) be updated to use the manifest-based provider, some of the nuances of translating the trace data will disappear (a future article will cover ntrace and xperf – stay tuned). For now, understand that viewing the ETW trace data is not done via SQL Server methods; rather, you need to view it with another tool. This is due to the fact that ETW is an OS-level trace and not a SQL Server trace.

Session Building

If it is not clear at this point, when creating an XE session that utilizes the ETW target, two traces are, in essence, created. One trace is a SQL Server (XE) trace that can be evaluated within SQL Server. The second trace is an ETW trace that is outside the realm of SQL Server and thus requires new skills in order to review it. Again, this second trace can be of extreme help because it is more easily merged with other ETW traces (think merging Perfmon data with a SQL trace).

When I create a session with an ETW target, it would not be surprising to see that I have two targets defined. One target will be the ETW target, and the second may be a file target or any of the others if it makes sense. Creating two targets is not requisite for the XE session to be created; the XE data will still be present in the live data stream even without a SQL-related target.

Before creating a session, I need to cover a couple of possible errors that won’t be easy to find on Google.

Msg 25641, Level 16, State 0, Line 101
For target, “5B2DA06D-898A-43C8-9309-39BBBE93EBBD.package0.etw_classic_sync_target”, the parameter “default_etw_session_logfile_path” passed is invalid.
The operating system returned error 5 (ACCESS_DENIED) while creating an ETW tracing session.
Ensure that the SQL Server startup account is a member of the ‘Performance Log Users’ group and then retry your command.

I received this error message even with my service account being a member of the “Performance Log Users” Windows group. I found that I needed to grant the service account explicit permissions on the logging directory that I had specified.

Msg 25641, Level 16, State 0, Line 105
For target, “5B2DA06D-898A-43C8-9309-39BBBE93EBBD.package0.etw_classic_sync_target”, the parameter “default_xe_session_name” passed is invalid.
The default ETW session has already been started with the name ‘unknown’.
Either stop the existing ETW session or specify the same name for the default ETW session and try your command again.

This error was more difficult than the first and probably should have been easier. I could not find the session called ‘unknown’, try as I might. Then it occurred to me (sheepishly) that the path probably wanted a file name too. If you provide a path but not a filename for the trace file, then this error will nag you.

I found both error cases to be slightly misleading but resolvable quickly enough.
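With those hiccups resolved, here is a sketch of the session I am using (the login event, session name, and paths are my illustrative choices; note that the ETW path includes a file name):

CREATE EVENT SESSION AuditLogins
ON SERVER
ADD EVENT sqlserver.login(
    ACTION (sqlserver.client_hostname, sqlserver.nt_username, sqlserver.username))
ADD TARGET package0.etw_classic_sync_target
    (SET default_etw_session_logfile_path = N'C:\Database\XE\AuditLoginsETW.etl'),
ADD TARGET package0.event_file
    (SET filename = N'C:\Database\XE\AuditLogins.xel');
GO
ALTER EVENT SESSION AuditLogins ON SERVER STATE = START;
GO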

The session is pretty straightforward here. I am just auditing logins that occur on the server and sending them to both the ETW and event_file targets. To validate that the session is created, and that the ETW session is indeed not present in SQL Server, I have the following script.
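A minimal sketch of that validation:

-- the XE session is defined and running
SELECT name FROM sys.server_event_sessions WHERE name = 'AuditLogins';
SELECT name FROM sys.dm_xe_sessions WHERE name = 'AuditLogins';
-- but no separate session for the ETW consumer shows up in either view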

Despite the absence of the ETW session from SQL Server, I can still easily find it (again, either from a shell or from the Perfmon GUI). Here is what I see when checking for it from a shell.
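For example (assuming the default ETW session name was left in place):

rem show details for the ETW session created on behalf of the XE session
logman query "XE_DEFAULT_ETW_SESSION" -ets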

Even though the session (or session data) is not visible from SQL Server, I can still find out a tad more about the target from the XE related DMVs and catalog views.
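A sketch of that metadata query:

SELECT s.name AS session_name,
       t.target_name,
       t.execution_count,
       CAST(t.target_data AS XML) AS target_data
FROM sys.dm_xe_sessions s
INNER JOIN sys.dm_xe_session_targets t
    ON s.address = t.event_session_address
WHERE t.target_name = 'etw_classic_sync_target';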

Running that query will result in something similar to this:

The Wrap

I have just begun to scratch the surface of the ETW target. This target can provide plenty of power for troubleshooting when used in the right way. The difficulty may seem to be getting to the point of knowing what the right way is. This target may not be suitable for most troubleshooting issues – unless you really need to correlate real Windows metrics to SQL metrics and demonstrate to Joe Sysadmin that what you are seeing in SQL truly does correlate to certain conditions inside of Windows. Try it out, learn from it, and figure out the right niche for you. In the interim, stay tuned for a follow-up article dealing with other tools and ETW.

For more uses of Extended Events, I recommend my series of articles designed to help you learn XE little by little.

Interested in seeing the power of XE over Profiler? Check this one out!

This has been the tenth article in the 2018 “12 Days of Christmas” series. For a full listing of the articles, visit this page.

How to: File Target use in Extended Events

I have been doing a lot of posts of late about extended events but have failed to share some of the essential building blocks. I am not looking to do a series of definitions, but just share a couple of quick tips and examples here and there that will make the use of extended events a little easier to undertake.

The first one to tackle is the use of the file target. Using a file target is not difficult from a creation point of view (event session creation). But some basics are critical, or you could end up with fewer follicles.

Let’s take a quick peek at some of these tips.

File System Prep

This first little tip comes from a painful experience. It is common sense to only try and create files in a directory that exists, but sometimes that directory has to be different on different systems. Then comes a little copy and paste of the last code used that worked. You think you are golden but forgot that one little tweak for the directory to be used. Oops.

Take this example session. When you run this code, you will not see an error (don’t change anything).
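Here is a sketch of such a session (the event is incidental; the point is that the C:\Database\XEs directory does not exist on this instance):

CREATE EVENT SESSION FileTargetDemo
ON SERVER
ADD EVENT sqlserver.sql_statement_completed
ADD TARGET package0.event_file
    (SET filename = N'C:\Database\XEs\FileTargetDemo.xel');
GO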

However, if you try to start the session, you will receive a pretty little error. Here is the code to try and start the session, along with its associated error.
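A start attempt along these lines will reproduce it:

ALTER EVENT SESSION FileTargetDemo ON SERVER STATE = START;
GO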

Msg 25602, Level 17, State 22, Line 25
The target, “5B2DA06D-898A-43C8-9309-39BBBE93EBBD.package0.event_file”, encountered a configuration error during initialization. Object cannot be added to the event session. The operating system returned error 3: ‘The system cannot find the path specified.’ while creating the file ‘C:\Database\XEs\FileTargetDemo_0_130670484928250000.xel’.

If you double check in the SSMS GUI, you will see that the session is there. It is just stopped. You can also check that the session exists if you recall my query from a previous post where I show the status of the session.

If you take it a step further and try to read the log file that should be created, you will be greeted once more with a reminder that there is a problem. Try this (sans change again).
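Something like the following (note the wildcard in the file name):

SELECT CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file('C:\Database\XEs\FileTargetDemo*.xel', NULL, NULL, NULL);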

And you will receive the following message:

Msg 25718, Level 16, State 3, Line 31
The log file name “C:\Database\XEs\FileTargetDemo*.xel” is invalid. Verify that the file exists and that the SQL Server service account has access to it.

The beauty here is that all is not lost. These are easy problems to get around and resolve. All it takes is to adjust the file path to be the correct path or a path that exists (to which you have permissions).

Read the Log File

If you have been reading this article, then this might seem like a bit of redundancy. That is by design. DBAs love redundancy – well, at least when it provides a means of recovery.

Unlike reading from the ring_buffer, reading from the file target requires different code. The file target is stored on disk in a binary representation that a function converts to XML for human readability. So to read a file target, you would need to do something like the following.
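A sketch of that pattern, shredding a couple of common attributes out of the XML:

SELECT x.ed.value('(@name)[1]', 'varchar(128)') AS event_name,
       x.ed.value('(@timestamp)[1]', 'datetime2') AS event_time
FROM (
    SELECT CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file('C:\Database\XEs\FileTargetDemo*.xel', NULL, NULL, NULL)
) AS src
CROSS APPLY src.event_data.nodes('/event') AS x(ed);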

And you definitely do not want to try the following with a file target. That is unless of course you are looking to go bald at an earlier than expected age in this profession.
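For reference, the thing not to do looks like this – the ring_buffer style of query pointed at a file target:

-- wrong tool for the job: target_data for a file target holds only file metadata
SELECT CAST(t.target_data AS XML) AS target_info
FROM sys.dm_xe_sessions s
INNER JOIN sys.dm_xe_session_targets t
    ON s.address = t.event_session_address
WHERE s.name = 'FileTargetDemo'
    AND t.target_name = 'event_file';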

If you find yourself doing this version, all you will get for results is a short XML string similar to the following:
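The shape of that string is roughly like this (illustrative, not captured verbatim):

<EventFileTarget truncated="0">
  <Buffers logged="1" dropped="0" />
  <File name="C:\Database\XEs\FileTargetDemo_0_130670484928250000.xel" />
</EventFileTarget>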

This is not what you would be looking for when trying to read the event session data. That said, it at least provides some information that could be of use. You will see that buffers are tracked and the file name is tracked. For this last demo, I corrected the FileTargetDemo session to use a directory on the file system that existed and to which I had the appropriate level of permissions.

I just covered two quick tips to help make life a little bit easier when dealing with file targets and extended event sessions. I will try to continue little tips like this over the coming months. I hope you have found this useful and that you will indeed be able to put it to use.


For more uses of Extended Events, I recommend my series of articles designed to help you learn XE little by little.

Interested in an article demonstrating the power of XE over Profiler? Check this one out!

This has been the third article in the 2018 “12 Days of Christmas” series. For a full listing of the articles, visit this page.

Encrypting a Database Never Finishes

Categories: News, Professional, Security, SSC
Published on: December 20, 2018

There is plenty of legislation and regulation in place these days that strongly suggest the encryption of data within a database.  In SQL Server, we have the ability to comply with these regulations in a couple of different ways.  We can choose to encrypt at the column level or we can choose to encrypt the entire database using TDE (transparent data encryption).

Given these opportunities and methods of encrypting a database and protecting data, some clients will want to comply with the regulation and move forward with encryption.  Every now and again, the client might run into something they can enable themselves.  TDE happens to be one of those things.  Enabling TDE is pretty easy.  Sometimes things run very smoothly and sometimes there is a problem.  Unfortunately, the problem may not always be immediately evident.

This is one of those times.  If you enable a setting in a database, you would expect it to be in effect and to be working.  Sometimes, with TDE, that change doesn’t take effect. What actually happens is the switch is thrown, the database tries to comply, but something prevents the database from completing the directive it was just given.  As luck would have it, I had the opportunity to assist recently with such a case.

Infinitely Encrypting a Database

Before diving too far down into the problem, let’s take a look at how we can gauge and measure the encryption progress.  Microsoft has provided a dynamic management view to see and track the status and progress of encryption that we have tried to enable.  This view is sys.dm_database_encryption_keys.  Within that view there are a few important and useful fields.  One field is the encryption thumbprint, another is the encryption state, and another tells us the percent that has been completed.

With a little TSQL from the toolbelt, we can pull back the essential information to monitor the encryption progress.
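A sketch of such a status query:

SELECT DB_NAME(dek.database_id) AS database_name,
       dek.encryptor_thumbprint,
       dek.percent_complete,
       CASE dek.encryption_state
           WHEN 0 THEN 'No database encryption key present, no encryption'
           WHEN 1 THEN 'Unencrypted'
           WHEN 2 THEN 'Encryption in progress'
           WHEN 3 THEN 'Encrypted'
           WHEN 4 THEN 'Key change in progress'
           WHEN 5 THEN 'Decryption in progress'
           WHEN 6 THEN 'Protection change in progress'
       END AS encryption_state_desc
FROM sys.dm_database_encryption_keys dek;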

You’ll notice that I have a case statement in there to transform the encryption_state values into a human friendly form.  Sadly, the human friendly values are not stored in the database and are only documented in MSDN / BOL.

Now on to the problem at hand—the encryption that starts but never ends.  Given that I have a database named “Published”, I can enable encryption with the following script.
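A sketch of that script (the master key password and certificate name are illustrative, and assume neither exists yet):

USE master;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'SuperDuperPassword$1';  -- illustrative
GO
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE Demo Certificate';   -- illustrative name
GO
USE Published;
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDECert;
GO
ALTER DATABASE Published SET ENCRYPTION ON;
GO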

After issuing the encryption command, I can then pull up my “status” script and see how things are rolling.

In this example, and for the sake of this article, this database will remain at 0% complete for days and months if allowed to remain that way.  Notice that it still thinks it is progressing, but nothing is actually getting done.  This kind of result can also be seen after a database reaches 99% complete.  At some point, something occurred that caused grief for the encrypting process.  As soon as that happens, the report will revert back to 0% complete and just stall indefinitely.  The types of things that can cause this vary from corruption to index rebuilds occurring at the same time as the attempt at encryption.

When a stall such as this occurs, the fix could be as simple as restarting the encryption with the statement previously used.  Here that is again (bearing in mind to substitute the appropriate database name for the one I used in this demo):
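That statement, once more:

ALTER DATABASE Published SET ENCRYPTION ON;
GO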

In some cases, that won’t work.  And sometimes you just may want to proceed very cautiously.  In those events, you can get your database back on track to being encrypted by using the following:
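A sketch of that approach (trace flag 5004 is the commonly cited flag for pausing the TDE encryption scan; treat it as undocumented and test it somewhere safe first):

DBCC TRACEON(5004, -1);   -- halt the stalled encryption scan
GO
DBCC TRACEOFF(5004, -1);  -- release the scan
GO
ALTER DATABASE Published SET ENCRYPTION ON;  -- restart the scan cleanly
GO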

The use of this trace flag forces the page scan, used to encrypt the database, to stop (even though it already appears to have stopped).

If I check my encryption status again, I should see the percentages growing until 100% is achieved and eventually the state that declares the database as being encrypted.  This is demonstrated in the following three screenshots.

And finally we have an encrypted database.

Conclusion

Transparent Data Encryption is useful in helping get a database into compliance with several regulations.  It is also fairly easy to implement.  Occasionally, it may only appear to have completed successfully.  When the SQL statement completes, TDE is not entirely in place.  Though it is “enabled”, it may still be waiting for the encryption phase to complete.  Once the statement to encrypt the database has been issued, it is important to follow up with a bit of monitoring to ensure the database progresses to the encrypted state.  If that doesn’t happen, you could end up in a state where you think the database is encrypted and it actually just stalled.

If you enjoy reading about security and encryption, check out these articles related to security and audits.

Synonyms in SQL Server – Good and Bad

When SQL Server 2005 was released, a nifty little feature was included called synonyms.  Despite being around since SQL Server 2005, I think this feature is often under-utilized or, more importantly, it is implemented in a very bad way.

Today I want to share a couple of examples.  We will take a look at examples of both good and bad implementations of synonyms.

First, let’s take a look at the syntax for creating a synonym.  Per BOL (and for the fun of it, here is the 2005 link).
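The syntax boils down to this (paraphrased from BOL):

CREATE SYNONYM [ schema_name_1. ] synonym_name FOR <object>

<object> ::=
{
    [ server_name.[ database_name ].[ schema_name_2 ]. | database_name.[ schema_name_2 ]. | schema_name_2. ] object_name
}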

So a sample implementation of a Synonym could be something like the following.
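For instance (all names here are illustrative):

CREATE SYNONYM dbo.synBigStageTable
FOR StagingDB_DEV.dbo.BigStageTable;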

Before we delve into that sample synonym, let’s look at an example of a really bad implementation.

The BAD

While working with a client, I received a request to look into why a linked server query was failing.  (This isn’t necessarily the bad, just hang in there for a bit.)  The message was something like this:

The OLE DB provider “SQLNCLI10” for linked server “blahblahblah” indicates that either the object has no columns or the current user does not have permissions on that object.

The error message seems at least semi-descriptive and gives a starting point.  In this case, I decided to verify the linked server was created properly, verified that the permissions were done properly and even tested the linked server.  On the source (linked) and destination server (let’s call the Source server ServerA and the Destination server we will call ServerB), I verified that permissions were in place for each database to be touched.  Still no dice!

Well, let’s go take a look and see if that referenced table actually exists. It did not! Does it exist as a view? It did not! Alas, the table existed as a synonym. This is where it gets wonky. In looking at the definition of the synonym, I found that the table defined in the synonym had a linked server table as its source. To top things off, the linked server pointed back to the originating server that was coming across the link in the first place. So yes: ServerB initiated a query against ServerA in order to pull data back to ServerB. But the data needed (as defined by the vendor) was supposedly available on ServerA. In reality, that data was sitting on ServerB the whole time.

At any rate, thanks to having a synonym for each and every table sitting on ServerA that referenced a table across a linked server on ServerB, we had mass confusion.  In the end, the query was far simpler to execute by just pulling it from the originating query server (ServerB).

This implementation of a synonym was not the best. All it did was cause confusion, create documentation inaccuracies, and delay the developer from accomplishing her task. Do you really need thousands of synonyms in your database? Do you need all of them going across a linked server? If you do, did you ever think about the potential for performance issues? (The vendor in this case used those linked servers and synonyms to perform a data conversion that took 36 hours each time for a rather small dataset – ouch!)

On the other Hand

Imagine, if you will, two databases sitting on the same box.  One database will be for your user data, and the other for “staging” data for processes such as those related to ETL.  Imagine further that, for some inane reason, the naming standard of your databases must include the suffix denoting the environment of the database.

Now picture an SSIS package that must utilize tables from both databases in data sources somehow.  At least one of the tables has millions of records.  And both of the tables have nearly 100 columns.  Even just pulling in the minimum amount of data using a query from each source can cause memory issues.  Two separate data sources means you will likely have a sort transformation (for each source) as well as a join transformation.

Trying to reduce the amount of data in the larger table source could be done via TSQL.  But to reference a database in one environment versus another environment means a code change with each deployment (due to the TSQL – think three part naming).  So you have been hampered by the environment.  Or have you?

By using a synonym in this situation, the data can be joined in a TSQL data source by referencing that synonym. Let’s look back at the sample synonym posted earlier in this article.
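Here it is again for convenience:

CREATE SYNONYM dbo.synBigStageTable
FOR StagingDB_DEV.dbo.BigStageTable;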

You can see that this synonym follows the same sort of naming standards as was just laid out in the preceding scenario.  If I create a synonym in each environment by the same name, and referencing the appropriate environment named database, I have just opened up a performance tweak for my SSIS datasource.

By implementing this slight tweak, I have been able to gain a 10x improvement in package performance. I am now requiring SSIS to ingest fewer records and thus chew up less memory. Fewer transformations are required, and the package can move straight into the transformations that matter rather than tinkering with extra transformations just to get the data into a usable state.

There are other benefits within SSIS to using synonyms for databases on the same server as well.  Especially when dealing with this kind of naming standard that requires the databases to be named differently in each environment.

Conclusion

How you use a synonym can be a huge asset or it can be a significant dampener to performance.  There are benefits and uses for these nifty little things.  Check them out and let us know how you have been able to put synonyms to use to benefit you.

Check out some of these other articles on synonyms here and here.

Maintenance Plan Owner – Back to Basics

We all inherit things from time to time through our profession.  Sometimes we inherit some good things, sometimes we inherit some things that are not so good.  Other times we inherit some things that are just plain annoying.  Yet other times, we inherit things that may be annoying and we probably just haven’t discovered them yet.

Dizzying, I know.

Inheritance

Have you ever taken over a server that had several maintenance plans on it?  Have you ever really checked who the owner of those plans is?  Or, maybe you had a failing job relating to one of these maintenance plans and you changed the job owner, but did you really fix the root cause?  That could be one of those things that you inherited that could be annoying but you just don’t know it yet.

Step by Step

No this is not New Kids on the Block (I think I just threw up in my mouth thinking that).

Let’s create a generic maintenance plan and see what happens.

The first thing we do is navigate to Maintenance Plans under the Management menu in Management Studio.


Right-click the Maintenance Plans folder and select New Maintenance Plan… from the context menu.  This will prompt us with the following dialog box.

In this box, we can type a name for this Maintenance Plan that is to be created.  I chose MaintPlanOwner, since that is the topic of this article.

After clicking OK on this dialog box, you will be presented with a blank canvas with which to design your maintenance plan.  I have chosen a simple task for the purposes of this article.

I will create a subplan named Statistics and add the Update Statistics task to the canvas.

You can see this illustrated to the left.  I chose to update the statistics on all databases and left all other options as the default option – for simplicity of this article.

At this point, the only thing left to do is to save this Maintenance Plan.  Once the plan is saved, then we can move on to the next step – some fun with TSQL.


Fun with TSQL

This is the stage of the article where we get to play with TSQL and investigate at a high level the Maintenance Plan we just created.

Within the msdb database, we have some system tables that store information about SSIS packages, DTS packages, and Maintenance Plans.  We will be investigating from a SQL 2008 and SQL 2005 standpoint (it changed in 2005 and then again in 2008).

In SQL 2005, we can query the sysdtspackages90 and sysdtspackagefolders90 to gain insight into who owns these Maintenance Plans.  In SQL 2008 and up, we can query sysssispackages and sysssispackagefolders to gain the same insight.  These system tables are within the msdb database.

In SQL Server, we can use the following to find that I am now the owner of that maintenance plan we just created.
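Here is a simplified sketch of that query for SQL 2008 and up (the dynamic SQL version that also supports 2005 is omitted for brevity):

USE msdb;
GO
SELECT p.name AS MaintPlanName,
       sp.name AS PlanOwner
FROM dbo.sysssispackages p
INNER JOIN sys.server_principals sp
    ON p.ownersid = sp.sid
INNER JOIN dbo.sysssispackagefolders f
    ON p.folderid = f.folderid
WHERE f.foldername = N'Maintenance Plans';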

Notice that in this query, I reach out to the sys.server_principals catalog view.  I did this to retrieve the name of the owner of the package found in the sysdtspackages90 or sysssispackages table, respective to the version of SQL Server.  My full version runs a dynamic SQL query to support both views depending on the version of SQL Server; the sketch above shows the simplified form.  I figured this might be a tad more helpful than the previous version here. This query would yield the following result set for the new “Maintenance Plan” that was just created.

Caveat

Let’s assume that this package is scheduled via a SQL Agent job on a production server.  I then get moved to a different department and no longer have permissions on this particular production server.  The job will start failing due to the principal not having access.  One fix would be to change the owner of the job.

That will work.  However, there is a problem with that fix.  As soon as somebody opens and saves the Maintenance Plan, the owner of the job will revert back to the owner of the Maintenance Plan.  When that happens, then the job will fail again.

A permanent fix is needed.  The permanent fix is to change the owner of the Maintenance Plan.  The following will change the owner to ‘sa’ for both SQL 2005 and SQL 2008 (and up).

SQL 2005
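A sketch (direct updates to msdb system tables warrant care and a tested backup):

USE msdb;
GO
UPDATE dbo.sysdtspackages90
SET ownersid = SUSER_SID('sa')
WHERE name = 'MaintPlanOwner';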

SQL 2008
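And the same idea against the newer table:

USE msdb;
GO
UPDATE dbo.sysssispackages
SET ownersid = SUSER_SID('sa')
WHERE name = 'MaintPlanOwner';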

Now if you run the code used earlier to investigate, you will find that the owner has indeed changed.  The results of that query should be similar to the following.

There you have it.  No more hair tugging over something as benign as the owner of a Maintenance Plan.  This is one of those things that should be looked at as soon as you inherit a new server.

The Wrap

In this article I took a rather long route to a simple fix. It’s easy to treat the symptom, changing the job owner, and think the problem is solved. Unfortunately, that change doesn’t stick. In the end, getting to know where the Maintenance Plan metadata is stored, and who really owns the plan, is most helpful. Sometimes, it just takes a few more steps to get to the root of the problem.

This has been another post in the back to basics series. Other topics in the series include (but are not limited to): Backups, backup history and user logins.

SHUTDOWN SQL Server

Categories: News, Professional, Security, SSC
Published on: December 3, 2018

Recently a friend by the name of Chris Bell (blog | twitter) wrote about an easy way to disrupt SQL Server.  That disruption comes in the form of the SHUTDOWN TSQL command.  You can read what Chris wrote from his article here.

Granted, you do need to have elevated permissions such as sysadmin, serveradmin or securityadmin.  I include securityadmin even though Chris did not because a securityadmin can create an account and grant that account sysadmin permissions.  And in Chris’ article he only discusses the threat to the SQL Server process. When I read the article, I wondered to myself if the threat stopped there. I also wondered to what extent could I cause disruption. Yes, the gremlin in me did start to come out.

When I say I was curious what level of disruption could be caused, I really wanted to know if I could reboot the server from within SQL Server or even if I could simply shut down the entire server. Well, you can certainly bounce the server from within a TSQL script – if you have adequate permissions (or know how to elevate your permissions).

The first step is rather simple: check to see if xp_cmdshell is enabled. If it is not, then enable it.
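Roughly:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;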

Now that xp_cmdshell is enabled, let’s bounce the server.
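Here is a sketch of that code (the server name check is illustrative; shutdown /r reboots after the /t delay):

IF @@SERVERNAME = N'MySandbox'  -- hypothetical sandbox name, change to taste
BEGIN
    EXEC xp_cmdshell 'shutdown /r /t 60 /c "SQL Server initiated reboot"';
END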

Now this code will not work on your server as-is. I have coded it to be my sandbox server. That said, if the server name matches you should get a prompt informing you that SQL Server will shut down. And that shutdown will happen after 60 seconds.

If you happen to see a message reporting “Access Denied”, that is also easy to circumvent – well, at least up to SQL Server 2016. Prior to SQL Server 2016, I could utilize xp_cmdshell to add the service account (let’s say it is NT Service\MSSQLSERVER) to the local Administrators group through the use of net localgroup /add. However, in SQL Server 2016, you will continue to get an access denied message (even if you try to use a proxy account). Here is an example of that exploit as well (prior to 2016):
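A sketch of that pre-2016 exploit (demo only, obviously):

EXEC xp_cmdshell 'net localgroup Administrators "NT Service\MSSQLSERVER" /add';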

Let’s say you have done your duty and changed the service account off of the default local service to a domain account or a local account (not an NT Service account), but you decided to add that account to the local Administrators group. You have actually opened yourself up to plenty of other problems. Even in SQL 2016, if the service account for SQL Server is in the local Administrators group, then the shutdown SQL Server example shared here will work and force a shutdown.

So, in the end, please be mindful of the service account in use. And be mindful of the level of permissions that service account has been granted at the OS level. If you are not careful, you could have some rogue developer or intruder shutting you down before you know it.

Now would be a good time to go and review your security. Check for users with too much access. Here are some articles that can help with that (here and here).

Monitor Database Offline Events

The other day, I shared an article showing how to audit database offline events via the default trace. Today, I will show an easier method to both audit and monitor for offline events. What is the difference between audit and monitor? It largely depends on your implementation, but I generally consider an audit as something you do after the fact. Monitor is a little more proactive.

Hopefully, a database being taken offline is a known event and not a surprise. Occasionally there are gremlins, in the form of users with too many permissions, that tend to do very strange things to databases and database servers.

Having read the previous article, you already know one method to try and find these database offline anomalies. That method may not be the most sleek solution, nor the most reliable, given the possibility that events can quickly roll out of your default trace files. The better, more reliable method is to use Extended Events (XE) to monitor explicitly for those types of events. If you are unfamiliar with XE, I really encourage you to take a look at these resources to become more familiar with the tool.

Monitor

Beyond the power of XE to better diagnose problems and trace events in your server, there is the ability to monitor for specific events as well. I won’t go into details about how to monitor with XE until a later article, but suffice it to say I can monitor for events to occur and immediately alert the necessary parties so more immediate action can be taken. This is quite some power for a built-in tool, and it is better than event notifications or agent alerts when it comes to ease of use and reliability.

As I look to monitor for these odd unplanned database offline events, I have several events within XE that can provide the requisite information: sqlserver.object_altered, sqlserver.database_started, sqlserver.database_stopped, and sqlserver.errorlog_written. I can hear you asking already “Wait, this seems to be a bit like a drill sergeant – very overbearing!”

Yes, it is probably a bit excessive for this session. However, I prefer to be comprehensive, and the ability to link events together helps me better understand whether it is a single one-off or a bigger problem with the entire instance. Databases being stopped, started, or set to offline should really be few and far between. With that rarity in mind, the session should be relatively quiet.
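Here is a sketch of such a session (the session name and file path are my illustrative choices):

CREATE EVENT SESSION DatabaseStateChange
ON SERVER
ADD EVENT sqlserver.database_started(
    ACTION (sqlserver.username, sqlserver.client_hostname, sqlserver.sql_text)),
ADD EVENT sqlserver.database_stopped(
    ACTION (sqlserver.username, sqlserver.client_hostname, sqlserver.sql_text)),
ADD EVENT sqlserver.errorlog_written(
    ACTION (sqlserver.username, sqlserver.sql_text)),
ADD EVENT sqlserver.object_altered(
    ACTION (sqlserver.username, sqlserver.sql_text))
ADD TARGET package0.event_file
    (SET filename = N'C:\Database\XE\DatabaseStateChange.xel')
WITH (STARTUP_STATE = ON);
GO
ALTER EVENT SESSION DatabaseStateChange ON SERVER STATE = START;
GO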

If I have that session running and then take a database online/offline or vice versa, I will see something very similar to this output.

Starting from the bottom and working my way up, I can see that a command was issued to bring the 👻s database ONLINE. The very first thing that occurs is the request is written to the error log. Then I see that the database is in the stopped state. Next is a message that the database is starting up (because it was stopped). Then we see two events for object_altered (similar to the default trace) due to the begin and commit phases of that transaction.

After that database was brought ONLINE, you can see that I immediately took 👻s back offline – starting with the errorlog event, then a stopped event and the object_altered begin and commit events.

Capturing each of the events I noted previously, not only gives me a complete picture of the event, it also can help me to identify if something happens in between the various “expected” events. If I use this session in my monitoring setup, then I can be quickly alerted to problems with a database as well as have the archive of the events to go back in time and AUDIT or troubleshoot the event of a database being offline or unable to come online.

With this XE Session running, I can be more confident that I have trapped and correlated the correct events in each of the sources. Using the default trace method, I have to make some highly likely correlations but there is still some “magic” involved. With the XE session, you will be far less likely to see any of those events roll out of the log as well. I can’t underscore the importance of that fact enough. The data will be there when you need it!

Conclusion

We all aspire to having a perfect database environment where nothing surprising or unexpected happens. Unfortunately, that is the desire of dreams and fairy tales. The unexpected will happen. A database can unexpectedly be taken offline. Are you prepared to address the problem fully to the CTO should it happen? This XE session can help you with that.

Extended Events is a powerful tool to help in troubleshooting and tuning your environment. I recommend investing a little time in reading the 60 day series about Extended Events. This is not a short series but is designed to provide an array of topics to help learn the tool over time. Don’t forget to go back and read the companion article showing how to audit these events via the default trace.

Audit Database Offline Events

Categories: News, Professional, Scripts, SQLBP, SSC
Published on: November 13, 2018

Knowing when an event occurred within the database environment is a very important thing. Being able to act quickly when certain events occur is equally as important. Sometimes, we may not find out about an event for a few days or weeks and then we are asked to figure out the who, when, why and how of the event. Hopefully there was enough foresight in place to plan for such a request to travel back in time.

Recently while networking at a large database related conference, I had the chance to discuss some of this with a semi-desperate DBA. In his environment, apparently a database can be taken offline at random times without the involvement of the DBA team. He was looking for a solution to prevent it from happening. For me, the simplest solution is to revoke permissions from the person doing it. If you can’t revoke the permissions, then at least capture the pertinent information to document who and when the databases are being taken offline.

Auditing and Facts

I am a huge fan of auditing the database environment to gather facts. There is comfort that can be gained in knowing that if a change happens to a database or server under your purview, that you can report some facts back up the chain when a post-mortem is required. I recently shared another example of such an auditing scenario – here.

There is a technique in that article that may be of interest for you if you are restricted to implementing this kind of audit. I won’t go into detail about the poor man audit techniques in this article, so it is worth reading that previous article.

When looking to find data about a database being taken offline, one could capture the data from the default trace or from the SQL Server Error Log. While this article will show how to capture this data from both of those sources, it is important to understand that the data just might not be available in either location depending on settings and activity of the server combined with how old the offline event was.

Let’s first take a look at the default trace. That one should be easy enough.
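A sketch of that default trace query:

DECLARE @path NVARCHAR(260);

-- resolve the rolling default trace files to the base log.trc path
SELECT @path = REVERSE(SUBSTRING(REVERSE([path]),
           CHARINDEX(CHAR(92), REVERSE([path])), 260)) + N'log.trc'
FROM sys.traces
WHERE is_default = 1;

SELECT gt.StartTime, gt.SPID, gt.LoginName, gt.HostName, gt.DatabaseName
FROM sys.fn_trace_gettable(@path, DEFAULT) gt
INNER JOIN sys.trace_events te
    ON gt.EventClass = te.trace_event_id
WHERE te.name = N'Object:Altered'
    AND gt.ObjectType = 16964  -- 16964 = database
ORDER BY gt.StartTime DESC;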

And a snippet of some possible output looks like the following:

If you look closely at the output, you will probably notice that I don’t get a lot of information. I find out that there was an ALTER event to a database but I do not know what the event was nor do I see the text from the SQL statement that caused the change. So, at best, I get a partial view of the picture by looking at just the default trace.

With that in mind, let’s look at the error log and see if that provides better information.
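A sketch of that search, walking every available log:

DECLARE @logcount INT;
DECLARE @loglist TABLE (ArchiveNo INT, LogDate DATETIME, LogSizeBytes BIGINT);
DECLARE @errlog TABLE (LogDate DATETIME, ProcessInfo NVARCHAR(64), LogText NVARCHAR(MAX));

INSERT INTO @loglist
    EXEC xp_enumerrorlogs;

SELECT @logcount = MAX(ArchiveNo) FROM @loglist;

WHILE @logcount >= 0
BEGIN
    INSERT INTO @errlog
        EXEC xp_readerrorlog @logcount, 1, N'offline';
    SET @logcount -= 1;
END

SELECT LogDate, ProcessInfo, LogText
FROM @errlog
ORDER BY LogDate;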

This is pretty cool here. I am checking all of the log files available on SQL Server and scanning for a search term of “offline.”

Ok, this shows a little bit of promise. I can see that the database was indeed set to offline (or “option OFFLINE” was set to ON), with a spid and a timestamp. Unfortunately, the data shown here is far from being super useful in a post-mortem. The error log does not show who took the database offline.

What if, there was a way to combine these two queries into a single result set and better correlate the results into something highly useful? Something like that seems like it could be possible given we have a spid and timestamp in each result set.

Let’s try it and see.
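A sketch of that combined query (trimming both timestamps to the second makes the match possible):

DECLARE @path NVARCHAR(260);
DECLARE @errlog TABLE (LogDate DATETIME, ProcessInfo NVARCHAR(64), LogText NVARCHAR(MAX));

SELECT @path = REVERSE(SUBSTRING(REVERSE([path]),
           CHARINDEX(CHAR(92), REVERSE([path])), 260)) + N'log.trc'
FROM sys.traces
WHERE is_default = 1;

INSERT INTO @errlog
    EXEC xp_readerrorlog 0, 1, N'offline';

SELECT gt.LoginName, gt.HostName, gt.StartTime, el.LogText
FROM sys.fn_trace_gettable(@path, DEFAULT) gt
INNER JOIN @errlog el
    ON el.ProcessInfo = N'spid' + CAST(gt.SPID AS NVARCHAR(10))
    AND CONVERT(VARCHAR(19), el.LogDate, 120) = CONVERT(VARCHAR(19), gt.StartTime, 120)
ORDER BY gt.StartTime DESC;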

If I use the timestamp from each result and the spid from each result set, I can surely join the two results together. As you can see, that is exactly what I did. I had to format the spid and timestamp a little bit – but that is ok. With that little bit of magic, I get a result set similar to the following now.

Now, I can easily see who put the database into offline mode. I can also tell you what the statement was when they executed the offline. I also know what the timestamp was when the database was put offline. This is usable data now when it comes time for a post-mortem or fact finding request.

The Wrap

Is this the most elegant method to trap this kind of data? Absolutely not, but it is usable. When it comes to something more sleek and modern, I would recommend a better tool such as Extended Events. I will show how to do this same task with greater ease and reliability through the use of XE in my next article.

Oh, if you are interested in any of my other articles about Auditing or Extended Events, I recommend you start here (for Auditing) or here (for XE).

Cannot Use the Special Principal – Back to Basics

I recently had a client call me out of the blue because he happened to be getting an error while trying to add a user to a database role. The error he was getting was “Cannot use the special principal ‘dbo’.”

This error has probably cropped up on me more than a few times. And on more than a few occasions, I have forgotten about the previous experiences. Some of that is because the fix is rather easy and after a few times seeing it, muscle memory takes over and you just fix it without thinking about it too much.

Until you get to that muscle memory moment though, you may flounder a bit, trying this and that, failing, and then proceeding on to a level of frustration that has you losing precious hair.

As luck would have it, this is an article about security and principals and is similar in nature to some other articles I wrote about some fundamental misconceptions about permissions here and here.

I do hope that there is something you will be able to learn from this basics article. If you are curious, there are more basics articles on my blog – here.

Meet Prince Apole and Rolle…

Adding a user to the db_datareader database fixed role is a pretty simple task. Most of us can likely do that in our sleep. Even using the GUI is usually pretty reliable for that. Every now and again though, somebody has decided to get tricky on us. Maybe a mistake was made somewhere in a setting on the server and nobody has caught it because nothing was “broken” – until it was.

In the aforementioned case, I was asked to help resolve the issue and I found that there was a problem in how the database owner was set. Not only was it a problem in the current database but in 12 other databases on the same server. The systems admin was at wits end. He was dealing with something that was just not in his knowledge-base yet. I remember being in the same boat – so no big deal there. We talked about some of the things he had tried and how none of it was working. I am going to recreate the same basic scenario along with some of the attempted fixes in this article.

First, we need to create a database (best to break a database designed to be broken instead of an existing one used for something else already).
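Here is a sketch of that setup (the names mirror the scenario; mydomain\svc_dummy presumes such a Windows account exists, so swap in one from your own domain):

USE master;
GO
CREATE DATABASE DummyDB;
GO
CREATE LOGIN [mydomain\svc_dummy] FROM WINDOWS;
GO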

That is pretty straightforward – the database will be created with the data files in the default directories on your SQL Server instance. In addition, a login called mydomain\svc_dummy will be created as a Windows login.

Now let’s try to set the owner of the database and then add the svc_dummy account to the datareader role.
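First the owner change:

ALTER AUTHORIZATION ON DATABASE::DummyDB TO [mydomain\svc_dummy];
GO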

There is a point of interest here. I said I was going to add svc_dummy to the datareader role – not dbo. Well, I set the database owner in the preceding step to svc_dummy so it basically became dbo. When I try to perform the role addition in the GUI and then script the change, this is the script that is produced. I will show why in a few moments.
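This is roughly what the GUI scripts out:

USE DummyDB;
GO
ALTER ROLE db_datareader ADD MEMBER [dbo];
GO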

The execution of the second part of the script (the role addition) results in the following:

Msg 15405, Level 16, State 1, Line 18

Cannot use the special principal ‘dbo’.

That is obviously not going to work. Let’s try fixing the script and add the svc_dummy principal instead of dbo.
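Like so:

USE DummyDB;
GO
ALTER ROLE db_datareader ADD MEMBER [mydomain\svc_dummy];
GO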

Unfortunately, this results in the following:

Msg 15151, Level 16, State 1, Line 22

Cannot add the principal ‘mydomain\svc_dummy’, because it does not exist or you do not have permission.

Well, maybe the problem is because the user doesn’t exist then? Let’s try to create the user and see what happens.
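Something like this:

USE DummyDB;
GO
CREATE USER [mydomain\svc_dummy] FOR LOGIN [mydomain\svc_dummy];
GO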

Now we should see this message:

Msg 15063, Level 16, State 1, Line 32

The login already has an account under a different user name.

Oy vey. We seem to be going in circles. Nothing is working. The user is there but not really there. Let’s try to drop the user and just try to clean things up and start over.
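The attempted cleanup:

USE DummyDB;
GO
DROP USER [dbo];
GO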

I hope you see the problem with this one. Trying to drop dbo. I dunno but we should see an error here – and we do get an error.

Msg 15150, Level 16, State 1, Line 27

Cannot drop the user ‘dbo’.

Let’s fix the user then and try to drop the svc_dummy user instead of dbo.
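Take two:

USE DummyDB;
GO
DROP USER [mydomain\svc_dummy];
GO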

Which in turn generates yet another error.

Msg 15151, Level 16, State 1, Line 52

Cannot drop the user ‘mydomain\svc_dummy’, because it does not exist or you do not have permission.

If I can’t resolve the problem by changing the user in the database, maybe I can just blow it out of the water by dropping the server login.
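One more swing of the hammer:

USE master;
GO
DROP LOGIN [mydomain\svc_dummy];
GO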

Yet another failure message will ensue. This time the message is:

Msg 15174, Level 16, State 1, Line 55

Login ‘mydomain\svc_dummy’ owns one or more database(s). Change the owner of the database(s) before dropping the login.

So far we have been able to skirt around the problem and generate six different error messages. The last one kind of gives us the best information on what we could do to resolve the issue. The login owns a database and therefore, we need to undo that ownership. Before we do that, let’s take a look at the database principal ‘dbo’.

We already know that svc_dummy is mapped to a user in the DummyDB database. We also know that we cannot add the svc_dummy user because of that prior mapping. We have also learned that when scripting the permissions change from the GUI for the svc_dummy login, the generated script references the user ‘dbo’. Due to this, let’s look in the sys.database_principals view at the dbo user and see what it tells us.
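A quick peek:

USE DummyDB;
GO
SELECT name, principal_id, type_desc, sid
FROM sys.database_principals
WHERE name = N'dbo';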

See how the dbo database principal says it is mapped to a Windows account type? With this in mind, let’s join to sys.server_principals and see what Windows account is mapped to the dbo database user.
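Joining on the sid makes the mapping plain:

SELECT dp.name AS database_user,
       dp.type_desc,
       sp.name AS mapped_login
FROM DummyDB.sys.database_principals dp
INNER JOIN sys.server_principals sp
    ON dp.sid = sp.sid
WHERE dp.name = N'dbo';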

Now we see a bit more clearly. Combined with the error messages and the principal information for both the login and the user, we have a better view of the puzzle now. Changing the database owner indeed mapped the windows account to dbo for us and is now restricting us to certain activities when trying to manage permissions for the windows login in the database. From here, we can easily fix the issue by changing the database owner, creating a user mapped to the windows login and then adding that principal to the datareader role.
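A sketch of that fix (handing ownership to sa here purely for the demo):

USE master;
GO
ALTER AUTHORIZATION ON DATABASE::DummyDB TO sa;
GO
USE DummyDB;
GO
CREATE USER [mydomain\svc_dummy] FOR LOGIN [mydomain\svc_dummy];
GO
ALTER ROLE db_datareader ADD MEMBER [mydomain\svc_dummy];
GO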

And if we run that script for svc_dummy we will see a successful execution as shown here.

The Wrap

In this article I took a rather long route to a simple fix. It’s easy to try each of the steps I showed in this article thinking it will help. It isn’t illogical to try some of those steps. They just don’t work unfortunately. In the end, getting to know the settings in the database and what the errors are really trying to get at is most helpful. Sometimes, it just takes a few more steps to get to the real meaning of the error.

This has been another post in the back to basics series. Other topics in the series include (but are not limited to): Backups, backup history and user logins.

Quickly Change SQL Job Owners

It is not unusual to find a server where some random user created a bunch of jobs to be run by SQL Agent. Sometimes, the user creating the job(s) sets themselves as the owner of the job. There are certain cases where this behavior is hard to avoid, like when creating a maintenance plan.

And of course, there are times when the user just doesn’t know any better. There is, of course, the rare occasion when setting the job owner to be oneself makes the most sense, but that is few and far between in the grand scheme. Usually, you will want a non-expiring account, such as a service account or a principal without “logon” permissions, to be the owner.

The primary reason being simple – humans have an expiration date for every job they will ever have. When that expiration occurs, you may end up with any number of unwanted side effects. Unwanted side effects is exactly what we try to avoid in our jobs run via SQL Agent.

No Expiration Date

There are two basic means to change the owner of every job on your server. You can open each job one by one and set the owner to an acceptable principal; this method is rather tedious, and you will be fighting off boredom if you have a few hundred jobs on the server. Or you can change the job owners group by group (set-based theory). This second method can be far less tedious and far more efficient, and it is by far my preferred method. Let’s take a look at how to make all of these changes in groups.

There are three basic sections to this script. First I fetch what should be changed, then I make the change, and lastly I verify the change. If the change doesn’t look right, then I can rollback the change. If the change is what I expected, then I can commit the change. Those are the broad strokes.

At a more detailed glimpse, I have setup a few variables to compare what I want to change, what the new job owner should be and then I fetch the sid of that new job owner. In my example, I am setting everything to ‘sa’. Why? Because it is easy for the sake of the example in the article – nothing more!

Since sometimes the owner of the job may only have access to the SQL instance via a Domain Group, I also take advantage of a couple of functions to double check that it is the correct account. These functions I am using are SUSER_SID() and SUSER_SNAME().
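Here is a sketch of the whole pattern (the current owner name is illustrative; commit only after the verify step looks right):

USE msdb;
GO
DECLARE @CurrentOwner SYSNAME = N'mydomain\departed_user',  -- hypothetical departing owner
        @NewOwnerSid  VARBINARY(85) = SUSER_SID(N'sa');

BEGIN TRANSACTION;

-- fetch what should be changed
SELECT j.name, SUSER_SNAME(j.owner_sid) AS current_owner
FROM dbo.sysjobs j
WHERE SUSER_SNAME(j.owner_sid) = @CurrentOwner;

-- make the change
UPDATE dbo.sysjobs
SET owner_sid = @NewOwnerSid
WHERE SUSER_SNAME(owner_sid) = @CurrentOwner;

-- verify the change
SELECT j.name, SUSER_SNAME(j.owner_sid) AS job_owner
FROM dbo.sysjobs j
ORDER BY j.name;

-- ROLLBACK TRANSACTION;  -- if the verify looks wrong
-- COMMIT TRANSACTION;    -- if it looks right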

When all is done as I am expecting, then I should see something similar to the following.

Since the change is what I expect, then at this point I would proceed with the commit transaction statement.

The Wrap

As you can see, making job ownership changes at group scale instead of one by one is pretty easy. This only takes a matter of seconds to run against hundreds of jobs. That same kind of task done one at a time could easily take more than 40 minutes. I am not sure I want to spend that much time on such an innocuous task. I hope you are now able to use what you have learned to improve your skills and become a rock-star DBA. ENJOY!


This has been another post in the back to basics series. Other topics in the series include (but are not limited to): Backups, backup history and user logins.
