Database Offline – Redux

I have written a couple of articles showing how to audit database offline events via the default trace as well as via Extended Events. These are great for a more proactive approach. Sometimes, though, the proactive approach is just not feasible for the given circumstances. A good example of this is when you inherit servers (e.g. new client, new project, new job) and there are databases on the server that were taken offline prior to your inheritance.

In a situation like this, you need to find out when the databases were taken offline, but you are somewhat limited by the data available for your research. The default trace has rolled beyond the retention point for the events it trapped. Setting up an Extended Events session would be less than helpful given that the databases are already offline. So we must find an alternative way to determine approximately when the databases were taken offline.

Alternatives

If your predecessor leaves you with a database that has been taken offline and no documentation about it, there is an alternative way to find the approximate time it was taken offline – using T-SQL. Granted, you could explore the file system and make the same sort of guess based on the file modified date, but data people generally prefer some sort of script, ideally one native to their language (T-SQL).

So, let’s take a look at a viable option for figuring out the mystery behind your database that has been set offline with no good info surrounding it.
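
Since the exact script matters less than the idea, here is a hedged sketch of the approach: compare the differential_base_time recorded in sys.master_files against a UTC timestamp for any offline database. The column choices and filters here are my assumptions.

-- Hedged sketch: approximate when an offline database was taken offline using
-- metadata that survives in sys.master_files. differential_base_time is NULL
-- if no full backup has ever been taken for the database.
SELECT d.name AS DatabaseName
	, d.state_desc
	, mf.differential_base_time
	, DATEDIFF(DAY, mf.differential_base_time, GETUTCDATE()) AS DaysSinceDiffBase
FROM sys.databases d
	INNER JOIN sys.master_files mf
		ON d.database_id = mf.database_id
WHERE d.state_desc = 'OFFLINE'
	AND mf.file_id = 1;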

Running this query, I see the following on my test server:

Note that this query uses GETUTCDATE. The differential_base_time column in sys.master_files is in UTC time. Thus, in order to compare properly, we need to ensure we use a UTC datestamp for the comparison. Also of note, this query doesn’t work if there is no full database backup for the database in question. Who doesn’t back up their databases though, right? No, seriously, you really should.

Since this is my test server and I am well prepared, I just so happen to have an XEvent session running that tracks the offline events that I can use to compare the results.

Yes, there is a bit of variance between the actual database offline event and the differential_base_time value. This variance is due to the timing of the backup relative to the actual offline event. Again, this is about finding an approximate time of when the database was taken offline, as a viable alternative when other data is not really available.

Conclusion

We all aspire to having a perfect database environment where nothing surprising or unexpected happens. Unfortunately, that is the stuff of dreams and fairy tales. The unexpected will happen. A database can be offline at the time you take over responsibility for it. You will be asked when it was taken offline. This script will help you reach a reasonable deduction about that offline event in the absence of better data.

Extended Events is a powerful tool to help in troubleshooting and tuning your environment. I recommend investing a little time in reading the 60 day series about Extended Events. This is not a short series but is designed to provide an array of topics to help learn the tool over time. Don’t forget to go back and read the companion article showing how to audit these events via the default trace.

Database Recovery Monitoring with XE

One of the greatest benefits of Extended Events (XE) is how the tool simplifies some otherwise difficult tasks.

Recently, I published a rewrite of my database recovery progress report script. That script touches on both the error log and some DMVs, along with some fuzzy logic to join the data sets together. That script may not be the most complex script out there, but it is far more complex than using the power of XE.

Database recovery (crash recovery) is a nerve-wracking situation under the wrong conditions. It can be as bad as a root canal, and at times just as necessary to endure the pain. When the business is waiting on you, and you are waiting on the server to finish crash recovery, you feel nervous at best. If you can be of some use and provide some information back to the business, that anxiety dissipates and the business becomes calmer as well. While the previous script can help you get that information easily enough, I want to introduce the easiest method currently available to capture that information.

If you are interested in a history lesson first, here are the first couple of versions of the aforementioned script (here and here).

Discovery First

As always, I like to explore the event repository to see if there is an event that may be applicable to my situation. This can be done via a T-SQL script or from the XE GUI. I will almost always break out my scripts to figure out whether an event is there or not.
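
A simple, hedged sketch of such a discovery query against the XE metadata might look like this:

-- Hedged sketch: search the XE metadata for events related to database recovery.
SELECT xo.name
	, xo.description
FROM sys.dm_xe_objects xo
WHERE xo.object_type = 'event'
	AND xo.name LIKE '%database_recovery%'
ORDER BY xo.name;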

This query will yield any events that match my description. In this case, I am looking for events related to “database_recovery”. This search will yield four relevant events we can use to track our database recovery progress. Those events are shown in the following image (with the event names being circled in green).

If I explore the events a little more, I will eventually come across an attribute in the database_recovery_progress_report event that leads to a map. This map is called recovery_phase. For me, that is an interesting attribute/map and makes me want to look at it further.
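
To peek at the values behind that map, a quick hedged sketch against sys.dm_xe_map_values does the trick:

-- Hedged sketch: list the possible values of the recovery_phase map.
SELECT mv.name
	, mv.map_key
	, mv.map_value
FROM sys.dm_xe_map_values mv
WHERE mv.name = 'recovery_phase'
ORDER BY mv.map_key;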

Things are coming together a little bit now. We all know (or should know) that there are the analysis, redo and undo phases to crash recovery. This aligns with that knowledge and throws in a couple more phases of the recovery process.

So, now we know there are four relevant events for us to use and that one of the events will tell us specifically which phase of recovery is currently processing. We have enough information that an event session can now be built.
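
Here is a hedged sketch of such a session. The session name, target, and exact action list are my assumptions, and the original session captures additional database_recovery* events beyond the two shown here.

-- Hedged sketch of a database recovery monitoring session. Adjust the events,
-- actions, and target to taste; STARTUP_STATE = ON keeps the session running
-- after an instance restart, which is exactly when you need it.
CREATE EVENT SESSION [DBRecoveryMonitor] ON SERVER
ADD EVENT sqlserver.database_recovery_progress_report (
	ACTION (package0.event_sequence, sqlserver.client_app_name, sqlserver.client_hostname,
		sqlserver.database_id, sqlserver.database_name, sqlserver.session_id,
		sqlserver.sql_text, sqlserver.username)
),
ADD EVENT sqlserver.database_recovery_times (
	ACTION (package0.event_sequence, sqlserver.client_app_name, sqlserver.client_hostname,
		sqlserver.database_id, sqlserver.database_name, sqlserver.session_id,
		sqlserver.sql_text, sqlserver.username)
)
ADD TARGET package0.event_file (SET filename = N'DBRecoveryMonitor.xel')
WITH (MAX_MEMORY = 4096KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
	MAX_DISPATCH_LATENCY = 30 SECONDS, STARTUP_STATE = ON);
GO

ALTER EVENT SESSION [DBRecoveryMonitor] ON SERVER STATE = START;
GO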

You may notice that I have thrown a lot of actions, including the kitchen sink, at this event session. Some of that is for consistency across sessions and some of it is simply for exploratory wants (and not needs). Feel free to add or remove actions from this list as you explore the use of this session in your environment.

Here is what that session produces on my test server with a simple stop/start of the SQL Server instance.

In the preceding image, I have the different events circled in red. I have also added the event_sequence action so I can see the relationship between these events as they progress from one to the next. If you note the items circled in green (and connected by the green arrow), you will see a couple of different things such as the trace message, the database name, the database id, and the recovery time remaining. Circled in blue are the “de-stress” items that let us know that the recovery is 100% complete.

Wrap

SQL Server recovery is a safeguard to protect the data in the event of an unexpected failure. The recovery process is necessary and has several phases to roll through in order to bring the database back online.

Just because you need to have patience during the crash recovery process does not mean you have to work hard to get a status of the process or that you need to stress throughout the process. This XE event session will take a lot of work and stress out of the process. I would recommend having this lightweight session running on the server with the startup state set to enabled. This will make your job easier and definitely can make you look like a rockstar DBA.

This article has demonstrated the power of Extended Events. For a lot more reading on the topic, here is a list of over 100 articles.

Making Databases Personal

TSQL Tuesday

The second Tuesday of the month comes to us a little early this month. That means it is time again for another group blog party called TSQLTuesday. This party that was started by Adam Machanic has now been going for long enough that changes have happened (such as Steve Jones (b | t) managing it now). For a nice long read, you can find a nice roundup of all TSQLTuesdays over here.

The Why?

This month, Todd Kleinhans (b | t) invites us to share something personal about ourselves. Well sort of. The invitation to each of us is to share a little about databases that we use in our personal lives or that we have created in our personal lives to help us in some fashion or another.

I think this is a really cool idea. Data and databases are very integrated into my life. The integration is enough that it would be difficult to un-blur the line where personal life and professional life begin when talking about data and databases.

Take a moment and reflect on the importance of data and databases within your life. At a minimum, you have taken the approach that data is your profession of choice and hopefully it is a career for you and not just a job.

If you have taken it down the career path, then you probably study your craft in your personal time to try and make yourself better at your trade-craft. Often times, if you are like me, these studies lead to trial and error, experiments, and possibly rabbit holes. If data is just a J-O-B, it is quite likely none of this will apply to you. Those are the types that fall into a category Steve Jones wrote about recently, and I would urge you to change that sooner rather than later. (Side note, I read that article as if it applied to me directly because I really do need to learn and practice more about my craft too!)

Personal Databases

Over the years, I have had sooo many personal databases that I have created for one reason or another. Some still need to be finished. Here are some examples of what I use a database for in my personal life.

Finances: A simple little database to track (sort of like a checkbook yes) credits and debits. More importantly, it tracked bills and due dates with dollar amounts for each bill. This is useful both for reminders for bills but also to see if there are any noteworthy trends with spending or increased costs. Now, many banking (and credit card) sites offer the same types of services – so long as the spending is done with their card.

Exercise: Back in the day, I tracked all of my activities in a database. I would track the duration, activity type, effort level and various biometric measures after each activity. Once again, this is something that is now tracked by so many different apps and databases that it seems counter productive to maintain my own system in favor of something that works more easily on a wearable device.

I have also previously done databases to inventory books, movies, track personal studies and link thoughts about studies from day to day, or even track routine chores (calendar reminders don’t offer much for tracking details). The short of it is, a database can be created to manage so many things as a part of everyday personal life.

 

Wrapping it Up

If you can use a database to create automation routines to help sustain your work life, you can certainly do the same to sustain your personal life. Databases are an essential piece of both my personal and professional life. Not only do I use them to create automation (such as this or this) for things, but I stand databases up to test numerous things from corruption to recovery to performance tuning to help improve my personal abilities and craft in the realm of data.

 

Database In Recovery Redux


One of the more gut-wrenching experiences of a DBA is to have a database get stuck in recovery. This typically happens when the server was rebooted unexpectedly and typically coincides with a rather large transaction.

The reboot and sudden stop of the transaction require the transaction to be replayed or undone. In a worst-case scenario, this could take hours or even days. Sitting there and waiting for it to complete just makes you sweat.

This happened to a client recently, so I dusted off an old script (which you can find if you click this link) and started investigating and monitoring the server. While waiting and investigating, I decided to take on my own challenge (quoted hereafter) from the first time I posted this query (here).

Unfortunately, this query does not demonstrate the time remaining for the rollback nor the percent complete without needing to query the error log.  Those would be awesome additions if you know how to do it (and let me know), other than via the error log.

I did not remove the dependency on the error log; rather, I leveraged that dependency to get more information from the DMVs and link it to the error log messages, giving a bigger, more complete picture of the recovery process from a simple query.

This script is now enhanced from the 2014 version and works through 2017.
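
For illustration, here is a hedged, simplified sketch of that approach: read the recovery messages from the error log and pair them with the recovery requests visible in sys.dm_exec_requests. The real script does considerably more parsing; the matching logic here is my assumption.

-- Hedged sketch: pair error log recovery messages with DMV data for a quick
-- recovery progress picture.
DECLARE @ErrorLog TABLE (LogDate DATETIME, ProcessInfo NVARCHAR(64), LogText NVARCHAR(MAX));

INSERT INTO @ErrorLog (LogDate, ProcessInfo, LogText)
EXEC xp_readerrorlog 0, 1, N'Recovery of database', NULL;

SELECT r.session_id
	, DB_NAME(r.database_id) AS DatabaseName
	, r.command
	, r.percent_complete
	, r.estimated_completion_time / 60000.0 AS EstimatedMinutesRemaining
	, lastmsg.LogDate AS LastLogMessageTime
	, lastmsg.LogText AS LastLogMessage
FROM sys.dm_exec_requests r
	OUTER APPLY (
		SELECT TOP (1) el.LogDate, el.LogText
		FROM @ErrorLog el
		WHERE el.LogText LIKE '%' + DB_NAME(r.database_id) + '%'
		ORDER BY el.LogDate DESC
	) lastmsg
WHERE r.command LIKE 'DB STARTUP%';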

Wrap

SQL Server recovery is a safeguard to protect the data in the event of an unexpected failure. The recovery process is necessary and has several phases to roll through in order to bring the database back online. It will require patience and this script can help achieve that patience.

Indexed Views – Performance Panacea or Plight

Indexed View(s)

An indexed view is a view where the result set of the query (the view definition) is materialized, in lieu of the virtual result set of a standard (non-indexed) view. Many times, an indexed view is created to help improve performance. Far too often, it is created without consideration for its costs. In this article, I hope to cover some of the more important costs that are frequently overlooked when considering an indexed view as a performance panacea.

The Setup

To demonstrate the overlooked costs of indexed views, I have decided to use the AdventureWorks2014 database (if you don’t have this sample database, you can get your copy here). I will create a view in that database and then I will add a few indexes to that view. Prior to sharing the entire setup, and like all good DBAs, I need to first grab a baseline. For this baseline, I am doing nothing terribly complex. I will just grab table sizes and storage information for the entire database.
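
Something along these lines works for a quick baseline (a hedged sketch; exactly which columns you capture is a matter of preference):

-- Hedged sketch: baseline of object sizes within the current database.
SELECT s.name + '.' + o.name AS ObjectName
	, SUM(ps.reserved_page_count) * 8 / 1024.0 AS ReservedMB
FROM sys.dm_db_partition_stats ps
	INNER JOIN sys.objects o
		ON ps.object_id = o.object_id
	INNER JOIN sys.schemas s
		ON o.schema_id = s.schema_id
WHERE o.is_ms_shipped = 0
GROUP BY s.name, o.name
ORDER BY ReservedMB DESC;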

Here are my results for the size of the objects within the AdventureWorks2014 database:

These results show that the largest object in the database happens to be the Person.Person table at just about 30MB. Not terribly large, but the database is not terribly large as a whole. Let’s see what happens when I add a materialized view, based on the Person.Person table, to the mix. Here is the definition of that view along with the indexes that I will add to help the performance of some essential production related queries.
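
The definition below is a hedged approximation of that setup rather than a verbatim copy: a schema-bound view over Person.Person that carries all thirteen columns (including the two XML columns), a unique clustered index to materialize it, and a supporting nonclustered index.

-- Hedged sketch of the demo view and its indexes.
CREATE VIEW dbo.vPerson
WITH SCHEMABINDING
AS
SELECT p.BusinessEntityID
	, p.PersonType
	, p.NameStyle
	, p.Title
	, p.FirstName
	, p.MiddleName
	, p.LastName
	, p.Suffix
	, p.EmailPromotion
	, p.AdditionalContactInfo	-- XML column
	, p.Demographics			-- XML column
	, p.rowguid
	, p.ModifiedDate
FROM Person.Person p;
GO

CREATE UNIQUE CLUSTERED INDEX CIX_vPerson ON dbo.vPerson (BusinessEntityID);
GO

CREATE NONCLUSTERED INDEX IX_vPerson_LastName_FirstName ON dbo.vPerson (LastName, FirstName);
GO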

After executing all of that code to create this new view with its indexes, I have the following size results:

The creation of this view has chewed up a bunch of storage. It has jumped right up to the number two spot on the biggest objects list within this database. You can see the differences by comparing the highlighted rows to the previous image. The vPerson view is highlighted in red in this second image to help point it out quickly.

Surely this must be a contrived example and people don’t really do this in the real world, right? The answer to that is simply: NO! It DOES happen. I see situations like this all too often. Far too often, large text fields are added to an indexed view to make retrieval faster. I have mimicked that by adding in two XML columns from the Person.Person table. This is definitely overkill because a simple join back to the table based on the BusinessEntityID would get me those two columns. All I have effectively done is duplicated data being stored and I have achieved that at the low low cost of increased storage of 25% for this small database. If you are curious, the column count between the Person.Person table and this new view is 13 columns each.

I call that 25% increase in storage a significant cost. An increase of 25% in storage for a single materialized view is not an uncommon occurrence. I have seen multi-terabyte databases with 25% of the total data storage caused by a single view. If you are tight on storage, any unexpected data volume increase can cause non-linear growth in data requirements simply because of the presence of indexed views. Take that into consideration when presented with the option to either performance tune the code or take a shortcut and create an indexed view.

Band-aid Performance

I alluded to the use of indexed views as a cog to help improve performance. This is done by taking the query that sucks and turning it into a view. The thought is that it will run faster because an index is created specifically for that data set. Generally, the performance gains from using an indexed view to camouflage a bad query are seen when the query is complex. Unfortunately, the query I provided for this article is not terribly complex. Due to the simplicity of the first example, let’s look first at the execution plan for another query that runs terribly slowly but is also not terribly complex:

There are a few table spools and index scans. The most costly operator of the bunch appears to be a merge join with a many-to-many join operation. The indexed view in this case does give me a very significant improvement on this query (look at that estimated cost and then the estimated number of rows). What if I could tune the query a little and avoid all of that storage cost? Let’s see!

Consider the base query for this new view:
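
The actual base query is not reproduced here, so treat the following as a hypothetical stand-in that shows the same anti-pattern (AdventureWorks tables picked purely for illustration):

-- Hedged, hypothetical illustration: joins wrapped in unnecessary trim
-- functions on numeric key columns, forcing implicit conversions.
SELECT soh.SalesOrderID
	, soh.OrderDate
	, sod.ProductID
	, sod.LineTotal
FROM Sales.SalesOrderHeader soh
	INNER JOIN Sales.SalesOrderDetail sod
		ON LTRIM(RTRIM(soh.SalesOrderID)) = LTRIM(RTRIM(sod.SalesOrderID))
	INNER JOIN Production.Product p
		ON LTRIM(RTRIM(sod.ProductID)) = LTRIM(RTRIM(p.ProductID));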

Looking at the query, you may be asking yourself why use so many trim functions on each of the joins? This is an extreme that can be easily fixed. None of the fields in the joins should need to be trimmed since they are all numeric in this case. Starting there and removing all of the function use in the joins should provide significant improvements in query speed and without the use of a view. Testing it out would yield similar performance results to the first view in this article. This small change would cause the query to complete in 1 second compared to about 90 seconds. That is a huge gain, and without the added cost of increased storage use.

Do I really see stuff like this in the field? Yes! Spend a little time fixing the root of the problem rather than looking for a quick fix, and performance can be spectacular. There are times, though, when an indexed view may be an absolute requirement, and that is perfectly fine. There are times when it is warranted and there is no way around it. If an indexed view is an absolute must, then there are a couple more considerations to be taken into account.

Disappearing Act

I showed previously that the indexed view requires storage space and showed that the new view created quickly claimed its place as the second largest object within the database. Now, to stage the next segment, I will show once again that the indexes are present and consuming disk space. This can be shown via the following query:
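
A hedged sketch of such a check (substitute the name of your indexed view):

-- Hedged sketch: indexes on the view and the space they consume.
SELECT OBJECT_NAME(i.object_id) AS ViewName
	, i.name AS IndexName
	, SUM(ps.used_page_count) * 8 / 1024.0 AS UsedMB
FROM sys.indexes i
	INNER JOIN sys.dm_db_partition_stats ps
		ON i.object_id = ps.object_id
		AND i.index_id = ps.index_id
WHERE i.object_id = OBJECT_ID('dbo.vPerson')	-- replace with the second demo view
GROUP BY i.object_id, i.name;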

And the results of that query should look something like the following (quick reminder that I created a new indexed view for the second demo and these results will be based on that second view):

Now that we know the index is present and consuming space, it is time to make a change to the view and see what happens. I am proceeding under the premise that I have determined that including the two XML columns in the view is completely unnecessary and is the cause of too much space consumption. I can reap the benefits of the view at a fraction of the cost if I ALTER the view and remove those columns. So I will proceed by issuing the following ALTER statement:
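
A hedged sketch of that change, shown here against the first demo view since the second view’s definition is not reproduced (the idea is simply to redefine the view without the XML columns):

-- Hedged sketch: redefine the view without the two XML columns.
ALTER VIEW dbo.vPerson
WITH SCHEMABINDING
AS
SELECT p.BusinessEntityID
	, p.PersonType
	, p.NameStyle
	, p.Title
	, p.FirstName
	, p.MiddleName
	, p.LastName
	, p.Suffix
	, p.EmailPromotion
	, p.rowguid
	, p.ModifiedDate
FROM Person.Person p;
GO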

Take note of how long this ALTER command takes to complete – nearly instantaneous.  That rocks! Right? Let’s look at the indexes to make sure they are still in working order and don’t require a defrag. Based on that, I will re-run that query to check the indexes and sizes – for giggles.

This time when the query completes, I get a nice little surprise: I no longer have any indexes on that view. Changing the view definition instantly drops any indexes that were built on that view. If this is done in a production system, imagine the implications and problems that could arise.

Conclusion

In this article, I have covered indexed views and three different considerations that are often overlooked when looking to implement a materialized view. The three considerations for indexed views, as discussed, are: storage, changes to the view, and the band-aid effect. Overlooking the cost of storage can have serious repercussions on the production environment. Implementing an indexed view without attempting to tune the code first could lead to overlooking the storage cost and could end up just being a waste of resources. The last tidbit is that any change to an indexed view will drop the indexes. That is an easily overlooked feature of indexed views. If you forget to recreate the indexes on the view after making changes, you could be looking at a production outage.

There are a couple more critical considerations for indexed views that are often overlooked as well. These additional considerations are blocking, deadlocking, and maintenance. These considerations will be discussed further in a future article.

For more reading enjoyment, try the Back to Basics or XEvents series.

Connect To SQL Server – Back to Basics

The first critical task any data professional should ever learn how to do is how to connect to SQL Server. Without a connection to SQL Server, there is barely anything one could do to be productive in the data professional world (for SQL Server).

Yet, despite the commonality of this requirement and ease of the task, I find that there is frequent need to retrain professionals on how to connect to SQL Server. This connection could be attempted from any of the current options but for some reason it still perplexes many.

Let’s look at just how easy it is (should be) to connect to SQL Server (using both SQL Auth and Windows Auth).

Simplicity

First let’s take a look at the dialog box we would see when we try to connect to a server in SQL Server Management Studio (SSMS).

Circled in red we can see the two types of authentication I would like to focus on: “Windows Authentication” and “SQL Server Authentication”. These are both available from the dropdown box called Authentication. The default value here is “Windows Authentication”.

If I choose the default value for authentication or “Windows Authentication”, I only need to click on the connect button at the bottom of that same window. Upon closer inspection, it becomes apparent that the fields “User name:” and “Password:” are disabled and cannot be edited when this default value is selected. This is illustrated in the next image.

Notice that the fields circled in red are greyed out along with their corresponding text boxes. This is normal and is a GOOD thing. Simply click the connect button circled in green and then you will be connected if you have permissions to the specific server in the connection dialog.

Complicating things a touch is the “SQL Server Authentication”. This is where many people get confused. I see many people trying to enter Windows credentials in this authentication type. Sure, we are authenticating to SQL Server, but the login used in this method is not a Windows account. The account to be used for this type of authentication is strictly the type that is created explicitly inside of SQL Server. This is a type of login where the principal name and the password are both managed by SQL Server.

Let’s take a look at an example here.

Notice here that the text boxes are no longer greyed out and I can type a login and password into the respective boxes. Once done, I then click the “Connect” button and I will be connected (again, presuming I have been granted access and I typed the user name and password correctly).

What if I attempt to type a Windows username and password into these boxes?

If I click connect on the preceding image, I will see the following error.

This is by design and to be expected. The authentication methods are different. We should never be able to authenticate to SQL Server when selecting the “SQL Server Authentication” and then providing Windows credentials. Windows credentials must be used with the “Windows Authentication” option. If you must run SSMS as a different Windows user, then I recommend reading this article.
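
As a side note, once connected you can verify which scheme your session actually used. Here is a hedged sketch against the connection DMVs:

-- Hedged sketch: check the authentication scheme of the current session.
-- auth_scheme returns NTLM or KERBEROS for Windows Authentication and SQL
-- for SQL Server Authentication.
SELECT s.session_id
	, s.login_name
	, c.auth_scheme
FROM sys.dm_exec_sessions s
	INNER JOIN sys.dm_exec_connections c
		ON s.session_id = c.session_id
WHERE s.session_id = @@SPID;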

The Wrap

Sometimes what may be ridiculously easy for some of us may be mind-blowing to others. Sometimes we may use what we think are common terms only to see eyes start to glaze over and roll to the backs of people’s heads. This just so happens to be one of those cases where launching an app as a different principal may be entirely new to the intended audience. In that vein, it is worthwhile to take a step back and “document” how the task can be accomplished.

Connecting to SQL Server is ridiculously easy. Despite the ease, I find myself helping “Senior” level development and/or database staff with this very task.

If you feel the need to read more connection related articles, here is an article and another on the topic.

This has been another post in the back to basics series. Other topics in the series include (but are not limited to): Backups, backup history and user logins.

An Experiment with Deadlocks

Everything can be fixed with a query hint (*cough* directive), right? If a certain process is consistently causing deadlocks, a simple ROWLOCK hint can be added to prevent it, right?

Well, for whatever reason, there seems to be a myth out there that when deadlocks come a-knocking, you can just throw this little directive at them and all will be well. Today, we get to test that and show what will continue to happen.

First, let’s look at what ROWLOCK actually means:

Specifies that row locks are taken when page or table locks are ordinarily taken.

This seems like a fabulous idea if the deadlocks are occurring against a table involving a massive update. Let’s take a look at a small update scenario involving just a handful of records. For the setup, we will use the same setup used in a previous article by Wayne.
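
Since that setup belongs to Wayne’s article, here is a hedged stand-in in the same spirit (table and column names are my assumptions): two small heap tables with five rows each, and two sessions updating the same row in opposite table order inside open transactions.

-- Hedged sketch of a deadlock repro. Run the two session blocks in separate
-- query windows, interleaving the statements so each session holds a lock the
-- other one needs.
CREATE TABLE dbo.Table1 (ID INT NOT NULL, col1 INT NULL, col2 INT NULL);
CREATE TABLE dbo.Table2 (ID INT NOT NULL, col1 INT NULL, col2 INT NULL);

INSERT INTO dbo.Table1 (ID, col1, col2) VALUES (1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5);
INSERT INTO dbo.Table2 (ID, col1, col2) VALUES (1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5);
GO

-- Session 1
BEGIN TRANSACTION;
UPDATE dbo.Table1 SET col1 = 10, col2 = 10 WHERE ID = 3;
-- pause here, run the first statement of session 2, then continue
UPDATE dbo.Table2 SET col1 = 10, col2 = 10 WHERE ID = 3;
COMMIT TRANSACTION;

-- Session 2
BEGIN TRANSACTION;
UPDATE dbo.Table2 SET col1 = 20, col2 = 20 WHERE ID = 3;
UPDATE dbo.Table1 SET col1 = 20, col2 = 20 WHERE ID = 3;
COMMIT TRANSACTION;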

Looking at the code, we can see there are only five rows in each of the tables. In addition, an update will be performed to both col1 and col2 in each table for one specific row. So we are keeping this to a singleton type of update, and we are able to force a deadlock by using this setup. Not only do we see that a deadlock will occur consistently, we would see the following in the sys.dm_tran_locks DMV as well as in the deadlock graphs.

In my environment, I used sessions 51 and 54 consistently for the deadlock repro. In each of the tests, each spid obtained page locks as well as RID locks (row locks) that were waiting on each other in order to perform an update. This is what we saw when I ran the setup without the ROWLOCK directive. What if I use the ROWLOCK directive? Are you guessing already that there would be no change, because the deadlock occurs on the lock held by each update while it waits for the other to complete? Let’s take a look at that too!

The only change to this setup is that the ROWLOCK directive has been added to the update statements. Examining the sys.dm_tran_locks DMV reveals the same locks being held as was seen without the directive. This shouldn’t be too big of a surprise since the updates are against a single row.

In addition to the same locks being held, we continue to experience the same deadlock problem. Using an Extended Events session to trap deadlock information (similar to the article previously discussed), we can pull out some pretty good info. Let’s examine some of the deadlock data trapped by an XE session.
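
The full query is more involved than what fits here, so the following is a hedged, simplified sketch that pulls the basics from a ring_buffer target (the session name and the presence of the sql_text action are assumptions); the full version described in the next paragraph also matches up the lock_deadlock and xml_deadlock_report events.

-- Hedged sketch: shred deadlock events captured by an XE session.
SELECT xed.event_data.value('(@name)[1]', 'varchar(64)') AS EventName
	, xed.event_data.value('(@timestamp)[1]', 'datetime2') AS EventTime
	, xed.event_data.query('(data[@name="xml_report"]/value/deadlock)[1]') AS EventDeadlockGraph
	, xed.event_data.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS SqlText
FROM (
	SELECT CAST(st.target_data AS XML) AS target_data
	FROM sys.dm_xe_sessions s
		INNER JOIN sys.dm_xe_session_targets st
			ON s.address = st.event_session_address
	WHERE s.name = 'DeadlockMonitor'	-- assumed session name
		AND st.target_name = 'ring_buffer'
) AS tgt
CROSS APPLY tgt.target_data.nodes('RingBufferTarget/event') AS xed(event_data);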

The results from this query will show us the deadlock graph and the event data, as well as several other pieces of data already parsed from the session data for you. Looking at the session data, one can see that the sql_text from each of the queries demonstrates both the ROWLOCK directive and the directive-free versions of the query. In this query, you can also see that I did a little black magic to match up the two event types from the event session (lock_deadlock and xml_deadlock_report). Then I was able to join the two together to produce one row per deadlock event and to see the sql_text with the deadlock graph on one row. Otherwise, the sql_text does not come through with the xml_deadlock_report event. I leave the rest of the query to the reader to discover and question.

From the EventDeadlockGraph column, we could click the cell and take a close look at the XML generated for the deadlock event. Further, if I choose to save the xml as an XDL file and then reopen it in SSMS, I can see the deadlock graphical report as shown in the following.

We see that row locks are still in effect for the update coming from both sides. This further supports that the directive really is just a waste of time in trying to combat this type of deadlock. This is one of those cases where the best option would be to optimize the code and work things out without trying to take a shortcut.

Wrapping Up

Look to optimize the code instead of trying to take a shortcut. In addition, take a look at the deadlocks, the locks held, and the code to get a better understanding of what is truly happening.

This article demonstrates briefly the power of Extended Events while diving into deadlocks. For more on using Extended Events, start reading here! This article may also be of interest.

The Curiously Large CASE Statement

 

A powerful little expression in SQL Server is the CASE statement. The CASE statement has the ability to “transform” data per business rules to help better understand the data in a given query. In its simplest form, the CASE statement evaluates a list of conditions (in order) and returns the first matching result expression.

Most uses of the CASE statement are benign enough that one wouldn’t even bat an eye at them. Every now and again, along comes a nifty little query that contains over 300 conditions in the CASE statement. Something that extreme certainly deserves a second (third or even fourth) look. In this article, I am going to explore one such CASE statement in an effort to squeeze more efficiency out of it.

300!

300+ conditions may be a bit hard to imagine. I know I had to do a double take at the list more than once. Immediately upon seeing such an awful piece of code (not pulling punches there), I really wanted to dump the conditions/evaluations into a table and join to it. Eventually, I did put it into a table, but not before catching some metrics. Let’s take a look at what this thing looked like from the execution plan properties perspective.

In the preceding image, note the query map in the scrollbar. I have the case statement circled in green. Yes – it is really that large. Next, I have “cached plan size”, “estimated number of rows”, and “estimated subtree cost” circled in red. Each of these shows something that is a little alarming to me. First, the plan size is 48MB! Next, the cost of the query is a little elevated at 51 and change. Last is what seems to be a high value for the estimated number of rows to be returned (only ~6k actually get returned).

On the plus side, the query only takes 381ms, or does it? The query actually takes 3 seconds to complete despite the timing in the properties window. Sometimes, we have to take some of these values in the properties window with a grain of salt.

What if I try to run this query on SQL Server 2017 instead of SQL Server 2014? Let’s take a look at what that might look like.

Firstly, we have a few properties that seem to indicate that we have moved in a slightly better direction with this particular query. The cached plan size has dropped all the way down to just 37MB. The estimated cost dropped to 36 and change, and the elapsed time dropped to 353ms. Nothing terribly earth-shattering there – but just by changing to SQL Server 2017, in this case, we see a bit of an improvement.

That said, take a look at the compile memory and compile time. Memory is 30,544kb and the time is 1,324ms. Aha – we see why this query takes 3 seconds on SQL 2014 and roughly 2 seconds on SQL 2017. High compile times. That memory for the compile is too high for my liking as well so let’s go ahead and take a look at why that might be a little high. It’s time to look at the plan XML and the converted tree for the query (because it is fun).

This is where we get a little geeky with the CASE statement. In the execution plan XML (and the converted tree we will see in a bit), every evaluation criteria for the case statement must have some sort of instruction on how and what should be done. In the case of the XML, we have multiple scalar operators to denote the CASE statement and then each of the evaluations for the CASE statement.

Once we enter the CASE statement, the first thing we see is that there is an “OR” operation (circled in red). Then for each of the values to be compared, we will see an “Equality” operation (circled in green) followed by the value to be evaluated. For just the three values I have in the image, notice how that immediately bloats the execution plan XML for this query.

In the preceding image, we see the converted tree for this query. We see the same pattern evolve here. There is a scalar operator for each value to be evaluated. In this image, I just have the equality comparisons. However, there is also an OR operator that is defined further up in this converted tree. Could this possibly be the reason for the bloated compile time and compile memory?

Let’s evolve that query just a bit and use a table and a join for that particular case statement to see what happens.
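
The revised query is not reproduced verbatim, but the pattern looks roughly like this hedged, hypothetical sketch (table and column names are placeholders):

-- Hedged sketch: replace a massive CASE expression with a lookup table and a join.
CREATE TABLE dbo.CodeTranslation (
	SourceCode VARCHAR(20) NOT NULL PRIMARY KEY
	, TranslatedValue VARCHAR(100) NOT NULL
);

CREATE TABLE dbo.FactTable (SomeKey INT NOT NULL, SourceCode VARCHAR(20) NOT NULL);

-- Load the 300+ condition/result pairs from the CASE expression once.
INSERT INTO dbo.CodeTranslation (SourceCode, TranslatedValue)
VALUES ('A01', 'Some business meaning')
	, ('A02', 'Another business meaning');

-- The query then joins to the lookup instead of evaluating the CASE per row.
SELECT f.SomeKey
	, ct.TranslatedValue
FROM dbo.FactTable f
	LEFT JOIN dbo.CodeTranslation ct
		ON ct.SourceCode = f.SourceCode;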

Immediately, we can see that the cached plan size improves dramatically! The cached plan size is down to 1.1MB. The compile memory is down to ~4MB and the compile time is down to 174ms. These are all significant improvements but now we see an interesting problem with the memory grant. This query is requesting over 400MB for the memory grant. That should be sounding sirens everywhere. Also of note is the estimated number of rows has dropped to ~28k (a significant improvement in estimate). This query is completing in under 1 second at this point compared to the 3 seconds otherwise but that memory grant is worrying me (and hopefully you too).

For giggles, I went ahead and added a columnstore index to this table (for good reasons beyond the scope of this article). Let’s see what that did.

Notice how the memory grant has dropped to just over 28MB? This is an acceptable improvement for this query. Also note the query execution time improved by a decent amount. This query timing for this query is similar with both a warm cache or cold cache at this point.

Remember that note about the XML and converted tree bloat? With the case statement, the query plan was in excess of 37MB and the revised version is sitting at about 1.1MB. Looking at the quantity of text in each, I see 5819 lines of XML in the version with the case statement and only 3755 lines of XML in the table version of the query (and more room for optimization). The converted tree is more dramatic in that the case statement version of the converted tree requires 3157 lines of instruction and the version that uses a table instead only has 1353 lines of instruction.

What does any of that matter? It helps to illustrate that this 300+ condition filled CASE statement is causing bloat and wasting space along with execution resources.

Wrap

If you ever find yourself writing a query that requires an excessive amount of conditions to be evaluated in a CASE statement, you should reconsider the query design. Too many conditions can lead to plan bloat and excessive memory grants. Keep an eye out for stuff like that!

Thanks for reading! This has been a pseudo deep dive into a really ugly CASE statement. If you would like to read other articles that are less of a deep dive, you can check out the Back to Basics articles here, or the Extended Events articles here.

Disk Space and SQL Server

Disk Space in SQL Server

One of the frequently required job functions of the database administrator is to track disk space consumption. Whether this requirement comes from management or from a learning opportunity after a production outage, the need exists.

As a hard working DBA, you want to make sure you hit all of the notes to make management sing your praises. Knowing just when the database may fill the drives and prevent a production outage just happens to be one of those sharp notes that could result in a raise and management singing hallelujah. The problem is, how do you do it from within SQL Server? You are just a DBA after all and the disk system is not your domain, right?

Trying to figure it out, you come across a pretty cool function within SQL Server. The name of the function is sys.dm_os_volume_stats. Bonus! This is an excellent discovery, right? Let’s see just how it might work. First a sample query:
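
A hedged sketch of that first attempt (my approximation of the sample) looks like this:

-- Hedged sketch: free space per volume via sys.dm_os_volume_stats.
SELECT DISTINCT vs.volume_mount_point
	, vs.logical_volume_name
	, vs.total_bytes / 1024 / 1024 AS TotalMB
	, vs.available_bytes / 1024 / 1024 AS FreeSpace
FROM sys.master_files mf
	CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) vs;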

If I run that on my local system, I might end up with something that looks like the following:

Looking at the image, you may be wondering why I have highlighted a couple of things. You may also be wondering why I used the word “might” in the previous paragraph. The reasoning will become more evident as we progress. For now, you resolve to continue testing the script, so you execute it again and end up with something that may look like the following (for the same server):

Whoa! What just happened there? Why are there two listings for the C: drive? Why does each register a different value for the FreeSpace column? In addition, without any additional usage on the C: drive (as verified through other tools), the FreeSpace value changes between executions as well as within the same execution. This is problematic, so you continue testing:

And yet again!

This can’t be correct, can it? Just for giggles, let’s modify it a bit to see if there are any additional clues. Hopefully, the following changed script will help shed some light on this:
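
The change is simply to expose which database file each row came from (again, a hedged approximation of the modified script):

-- Hedged sketch: add database and file detail to see where the differing
-- FreeSpace values originate.
SELECT DISTINCT DB_NAME(mf.database_id) AS DatabaseName
	, mf.name AS FileName
	, vs.volume_mount_point
	, vs.available_bytes / 1024 / 1024 AS FreeSpace
FROM sys.master_files mf
	CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) vs
ORDER BY vs.volume_mount_point, FreeSpace;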

This script yields the following potential results:

Look at the different highlighted areas! There are three different values for FreeSpace for the C: drive in this particular execution. The case of the C: drive plays no role in whether the value is recorded differently or not. This seems to be more of a bug within the dynamic management function. From execution to execution, using this particular method, one could end up with duplicate entries but distinct values. The sort order of the results could also differ from one execution to the next (though we could fix that).

All of these tests were run on my local machine and I really do only have one C: drive. I should never receive multiple entries back for any drive. If using this particular DMF to track space usage, it could be somewhat problematic if the duplicate drive data pops up. How do we get around it, you ask? Here is another example that I have that has not yet produced this duplication:
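
Here is a hedged sketch of that approach (my reconstruction of the idea, not the exact script):

-- Hedged sketch: use CTEs and ROW_NUMBER to return a single row per drive.
WITH VolStats AS (
	SELECT DISTINCT vs.volume_mount_point
		, vs.total_bytes / 1024 / 1024 AS TotalMB
		, vs.available_bytes / 1024 / 1024 AS FreeSpaceMB
	FROM sys.master_files mf
		CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) vs
)
, Deduped AS (
	SELECT vs.volume_mount_point
		, vs.TotalMB
		, vs.FreeSpaceMB
		, ROW_NUMBER() OVER (PARTITION BY UPPER(vs.volume_mount_point)
							ORDER BY vs.FreeSpaceMB DESC) AS RowNum
	FROM VolStats vs
)
SELECT d.volume_mount_point
	, d.TotalMB
	, d.FreeSpaceMB
FROM Deduped d
WHERE d.RowNum = 1;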

Using this version of the script is not terribly more complex, but it will prove to be more reliable. You can see I used some CTEs to provide a little trickery and ensure that I limit my results. What if it is a mount point or a non-standard drive letter? I have not tested that. Let me know how that goes. As you can see, I am restricting the drive selection by using the row_number function against the drive letter.

For an alternative, reliable method to find your disk space consumption, I would recommend something different because it is tried and tested: using a WMI call to fetch the data. Samples are provided as follows:
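
One hedged way to issue that WMI call without leaving T-SQL is to shell out to wmic via xp_cmdshell (which must be enabled and may be restricted in your environment); the original samples may well use PowerShell instead.

-- Hedged sketch: WMI volume information via xp_cmdshell and wmic.
EXEC sys.xp_cmdshell 'wmic volume get DriveLetter, Capacity, FreeSpace';
EXEC sys.xp_cmdshell 'wmic logicaldisk get Caption, Size, FreeSpace';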

Easy peasy, right? Have at it and try tracking your disk space.

Thanks for reading! This has been another article in the Back to Basics series. You can read many more here. For other interesting articles about database space – check here or here.

Passion, Challenges, and SQL


TSQL Tuesday

The second Tuesday of the month comes to us a little early this month. That means it is time again for another group blog party called TSQLTuesday. This party that was started by Adam Machanic has now been going for long enough that changes have happened (such as Steve Jones (b | t) managing it now). For a nice long read, you can find a nice roundup of all TSQLTuesdays over here.

The Why?

Long-time friend Andy Leonard (b | t) invites us this month to do a little checkup on ourselves and talk about the “why” around what we do. This could be a very easy topic for some. Equally, it could be a very difficult topic for those same people at different times in their lives. Therein lies the problem: the topic is simple in nature, but it sure requires firm reflection on yourself and what you have been doing.

The problem for me is less about the “why” behind what I do, and more about how to stretch it out into something more than a few sentences. Think! Think! Think!

Challenges

One of my biggest reasons why I do what I do boils down to the challenges that I frequently get to encounter. There is a wild satisfaction in working on a very difficult and challenging task, product, tool, profession, skill, etc. This satisfaction often involves reward and a sense of accomplishment.

The challenge can be anything from how to effectively communicate with a difficult person, a tough to find internals problem in SQL Server that could be causing a performance issue, or taking over a project and turning it back from the edge of failure and onto a track of success. Sometimes, the challenge may be as simple as converting a pathetic cursor into a set based approach and gaining an improvement of 100x in performance.

I really do enjoy some of the puzzles (challenges) that I get to work on routinely. This gives me an opportunity to improve my skillset as well as continue to learn. Being able to continually improve is a great motivation for me. The frequent challenges and continual opportunity to learn present a great opportunity to evolve one’s self and career. In a constantly changing world, being able to naturally and easily evolve your personal career is a bonus!

Passion

“Do what you love and you will never work a day in your life.” This is a common saying in the United States. Agree or disagree – there is some truth to it. Being able to do something one loves makes the really hard days a lot easier. Knowing I may be able to solve a complex problem makes it easier to face the day.

I really enjoy the opportunity to face difficult challenges and resolve those challenges. The passion to solve these puzzles with data doesn’t end there. I also really do enjoy the opportunity to learn which brings up two other challenges that help me learn: speaking and writing.

By putting myself out there regularly to speak and write, I am becoming a better technical person. I am becoming better equipped to solve many of the puzzles I face. Those are great benefits. That said, I don’t feel I could get out there and talk about something about which I wasn’t passionate. I have learned to become passionate about writing and speaking – though I still have plenty of room for improvement (just as I do in my quest to become a good DBA).

Wrapping it Up

I really do enjoy the challenges I get to face on a frequent basis in the world of data. This is the big “WHY” for me to continue my progress in this career.

Find something you are passionate about and strive to envelop your career with as many opportunities to do that thing. If that means accepting some less wanted tasks in order to do more of the thing you love, it could very well be worth it!
