The Curiously Large CASE Statement

 

A powerful little expression in SQL Server is the CASE statement. The CASE statement has the ability to “transform” data per business rules to help better understand the data in a given query. In the simplest form, the CASE statement is a statement that evaluates a list of conditions (in order) and returns the first matching result expressions.

Most uses of the CASE statement are benign enough that one wouldn’t even bat an eye at them. Every now and again, along comes a nifty little query that contains over 300 conditions in the CASE statement. Something that extreme certainly deserves a second (third or even fourth) look. In this article, I am going to explore one such CASE statement in an effort to squeeze more efficiency out of it.

300!

300+ conditions may be a bit hard to imagine. I know I had to double take the list more than once. Immediately upon seeing such an awful piece of code (not pulling punches there), I really wanted to dump the conditions/evaluations into a table and join to it. Eventually, I did put it into a table but not before catching some metrics. Let’s take a look at what this thing looked like from the execution plan properties perspective.

In the preceding image, note the query map in the scrollbar. I have the case statement circled in green. Yes – it is really that large. Next, I have “cached plan size”, “estimated number of rows”, and “estimated subtree cost” circled in red. Each of these shows something that is a little alarming to me. First, the plan size is 48MB! Next, the cost of the query is a little elevated at 51 and change. Last is what seems to be a high value for the estimated number of rows to be returned (only ~6k actually get returned).

On the plus side, the query only takes 381ms, or does it? The query actually takes 3 seconds to complete despite the timing in the properties window. Sometimes, we have to take some of these values in the properties window with a grain of salt.

What if I try to run this query on SQL Server 2017 instead of SQL Server 2014? Let’s take a look at what that might look like.

Firstly, we have a few properties that seem to indicate that we have moved in a slightly better direction with this particular query. The cached plan size has dropped all the way down to just 37MB. The estimated cost dropped to 36 and change and the elapsed time dropped to 353ms. Nothing terribly earth shattering there – but just changing to SQL Server 2017, in this case, we see a bit of an improvement.

That said, take a look at the compile memory and compile time. Memory is 30,544kb and the time is 1,324ms. Aha – we see why this query takes 3 seconds on SQL 2014 and roughly 2 seconds on SQL 2017. High compile times. That memory for the compile is too high for my liking as well so let’s go ahead and take a look at why that might be a little high. It’s time to look at the plan XML and the converted tree for the query (because it is fun).

This is where we get a little geeky with the CASE statement. In the execution plan XML (and the converted tree we will see in a bit), every evaluation criteria for the case statement must have some sort of instruction on how and what should be done. In the case of the XML, we have multiple scalar operators to denote the CASE statement and then each of the evaluations for the CASE statement.

Once we enter the CASE statement, the first thing we see is that there is an “OR” operation (circled in red). Then for each of the values to be compared, we will see an “Equality” operation (circled in green) followed by the value to be evaluated. For just the three values I have in the image, notice how that immediately bloats the execution plan XML for this query.

In the preceding image, we see the converted tree for this query. We see the same pattern evolve here. There is a scalar operator for each value to be evaluated. In this image, I just have the equality comparisons. However, there is also an OR operator that is defined further up in this converted tree. Could this possibly be the reason for the bloated compile time and compile memory?

Let’s evolve that query just a bit and use a table and a join for that particular case statement to see what happens.

Immediately, we can see that the cached plan size improves dramatically! The cached plan size is down to 1.1MB. The compile memory is down to ~4MB and the compile time is down to 174ms. These are all significant improvements but now we see an interesting problem with the memory grant. This query is requesting over 400MB for the memory grant. That should be sounding sirens everywhere. Also of note is the estimated number of rows has dropped to ~28k (a significant improvement in estimate). This query is completing in under 1 second at this point compared to the 3 seconds otherwise but that memory grant is worrying me (and hopefully you too).

For giggles, I went ahead and added a columnstore index to this table (for good reasons beyond the scope of this article). Let’s see what that did.

Notice how the memory grant has dropped to just over 28MB? This is an acceptable improvement for this query. Also note the query execution time improved by a decent amount. This query timing for this query is similar with both a warm cache or cold cache at this point.

Remember that note about the XML and converted tree bloat? With the case statement, the query plan was in excess of 37MB and the revised version is sitting at about 1.1MB. Looking at the quantity of text in each, I see 5819 lines of XML in the version with the case statement and only 3755 lines of XML in the table version of the query (and more room for optimization). The converted tree is more dramatic in that the case statement version of the converted tree requires 3157 lines of instruction and the version that uses a table instead only has 1353 lines of instruction.

What does any of that matter? It helps to illustrate that this 300+ condition filled CASE statement is causing bloat and wasting space along with execution resources.

Wrap

If you ever find yourself writing a query that requires an excessive amount of conditions to be evaluated in a CASE statement, you should reconsider the query design. Too many conditions can lead to plan bloat and excessive memory grants. Keep an eye out for stuff like that!

Thanks for reading! This has been a pseudo deep dive into a really ugly CASE statement. If you would like to read other articles that are less of a deep dive, You can read check out the Back to Basics articles here, or the Extended Events articles here.

Disk Space and SQL Server

Disk Space in SQL Server

One of the frequently required job functions of the database administrator is to track disk space consumption. Whether this requirement comes from management or from a learning opportunity after a production outage, the need exists.

As a hard working DBA, you want to make sure you hit all of the notes to make management sing your praises. Knowing just when the database may fill the drives and prevent a production outage just happens to be one of those sharp notes that could result in a raise and management singing hallelujah. The problem is, how do you do it from within SQL Server? You are just a DBA after all and the disk system is not your domain, right?

Trying to figure it out, you come across a pretty cool function within SQL Server. The name of the function is sys.dm_os_volume_stats. Bonus! This is an excellent discovery, right? Let’s see just how it might work. First a sample query:

If I run that on my local system, I might end up with something that looks like the following:

Looking at the image you may be wondering to yourself right now why I have highlighted a couple of things. You may also be wondering why I used the word “might” in the previous paragraph as well. The reasoning will become more evident as we progress. For now, you have resolved to continue testing the script so execute it again and end up with something that may look like the following (for the same server):

Whoa! What just happened there? Why are there two listings for the C: drive? Why does each register a different value for the FreeSpace column? In addition without any additional usage on the C drive (as verified through other tools) the FreeSpace is changing between executions as well as within the same execution. This is problematic, so you continue testing:

And yet again!

This can’t be correct, can it? Just for giggles let’s modify it just a bit to see if there are any additional clues. Using the following changed script, hopefully a clue will help shed some light on this:

This script yields the following potential results:

Look at the different highlighted areas! There are three different values for FreeSpace for the C: drive in this particular execution. The case of the C: drive plays no role in whether the value is recorded differently or not. This seems to be more of a bug within the dynamic management function. From execution to execution, using this particular method, one could end up with duplicate entries but distinct values. The sort of the execution could be returned differently (though we could fix that).

All of these tests were run on my local machine and I really do only have one C: drive. I should never receive multiple entries back for any drive. If using this particular DMF to track space usage, it could be somewhat problematic if the duplicate drive data pops up. How do we get around it, you ask? Here is another example that I have that has not yet produced this duplication:

Using this version of the script is not terribly more complex, but it will prove to be more reliable. You can see I used some CTEs to provide a little trickery and ensure that I limit my results. What if it is a mount point or a non-standard drive letter? I have not tested that. Let me know how that goes. As you can see, I am restricting the drive selection by using the row_number function against the drive letter.

For alternative reliable methods to find your disk space consumption, I would recommend something different because it is tried and tested. I would recommend using a wmi call to fetch the data. Samples are provided as follows:

Easy peasy, right? Have at it and try tracking your disk space.

Thanks for reading! This has been another article in the Back to Basics series. You can read many more here. For other interesting articles about database space – check here or here.

Passion, Challenges, and SQL

Comments: 1 Comment
Published on: February 12, 2019

TSQL Tuesday

The second Tuesday of the month comes to us a little early this month. That means it is time again for another group blog party called TSQLTuesday. This party that was started by Adam Machanic has now been going for long enough that changes have happened (such as Steve Jones (b | t) managing it now). For a nice long read, you can find a nice roundup of all TSQLTuesdays over here.

The Why?

Long time friend Andy Leonard (b | t) invites us this month to do a little checkup on ourselves and talk about the “why” around what we do. This could be a very easy topic for some. Equally, this could be a very difficult topic for those same people at different times in their lives. Thus the problem, the topic is simple in nature but sure requires a firm reflection on self and what you have been doing.

The problem for me is less about the “why” behind what I do, and more about how to stretch it out into something more than a few sentences. Think! Think! Think!

Challenges

One of my biggest reasons why I do what I do, boils down to the challenges that I frequently get to encounter. There is a wild satisfaction to working on a very difficult and challenging task, product, tool, profession, skill, etc. This satisfaction often involves reward and a sense of accomplishment.

The challenge can be anything from how to effectively communicate with a difficult person, a tough to find internals problem in SQL Server that could be causing a performance issue, or taking over a project and turning it back from the edge of failure and onto a track of success. Sometimes, the challenge may be as simple as converting a pathetic cursor into a set based approach and gaining an improvement of 100x in performance.

I really do enjoy some of the puzzles (challenges) that I get to work on routinely. This gives me an opportunity to improve my skillset as well as continue to learn. Being able to continually improve is a great motivation for me. The frequent challenges and continual opportunity to learn presents a great opportunity to evolve ones self and career. In a constantly changing world, being able to naturally and easily evolve your personal career is a bonus!

Passion

“Do what you love and you will never work a day in your life.” This is a common saying in the United States. Agree or disagree – there is some truth to it. Being able to do something one loves makes the really hard days a lot easier. Knowing, I may be able to solve a complex problem makes it easier to face the day.

I really enjoy the opportunity to face difficult challenges and resolve those challenges. The passion to solve these puzzles with data doesn’t end there. I also really do enjoy the opportunity to learn which brings up two other challenges that help me learn: speaking and writing.

By putting myself out there regularly to speak and write, I am becoming a better technical person. I am becoming better equipped to solve many of the puzzles I face. Those are great benefits. That said, I don’t feel I could get out there and talk about something about which I wasn’t passionate. I have learned to become passionate about writing and speaking – though I still have plenty of room for improvement (just as I do in my quest to become a good DBA).

TSQL2sDay150x150Wrapping it Up

I really do enjoy the challenges I get to face on a frequent basis in the world of data. This is the big “WHY” for me to continue my progress in this career.

Find something you are passionate about and strive to envelop your career with as many opportunities to do that thing. If that means accepting some less wanted tasks in order to do more of the thing you love, it could very well be worth it!

Interview Trick Questions

Today, I am diverging from the more technical posts that I routinely share. Instead, as the title suggests, I want to dive into a something a little more fun.

Anybody that has interviewed for a job has most likely run into the trick question. Some interviewers like to throw out multiple trick questions all in an effort to trip up the candidate and get the candidate to doubt him/her self. Sure, there can be some benefit to throwing out a trick question or four. One such benefit would be to see how the candidate performs under pressure (see them squirm).

The downside to throwing out trick questions, in my opinion, would be that you can turn a serious candidate into an uninterested candidate. So, when throwing out the tricks, tread carefully.

Let’s take a look at an interview trick question candidate. This is a more technical question and is designed to make you think a little bit. Before reading on to see the answer, I implore that you try to answer the question for yourself legitimately.

How can you insert data into two tables using a single statement without the use of triggers, service broker or some other behind-the-scenes feature?

Are you thinking about it?

Do you have your answer yet?

Now that you have your answer, go ahead and continue reading.

Is your answer to this question something along the lines of “You can’t do that and this is just a trick question”?

Well, honestly, it is a bit of a trick question. But I assure you, you can certainly perform an insert into multiple tables from a single statement. Here is one such setup that demonstrates how you can do this:

Do you see how I was able to perform that insert into multiple tables? The trick is in using the OUTPUT clause. This little feature in SQL Server can be of great use for things such as building multiple staging tables during an ETL process.

Here is that little trick again just to highlight it.

Conclusion

There are cases when an interview trick question is suitable. It is when the purported question is truly more technical than trick and is really trying to evaluate your depth and knowledge of SQL Server. The puzzle during the interview boils down to figuring out when it is a trick and when it might not be. Then from there, work your way through possible solutions. But don’t be afraid to admit when you haven’t got a clue. That will be far more impressive than to try and flim-flam the interviewer.

I invite you to share your trick questions in the comments.  Also, how did you solve this particular trick question?

 

Thanks for reading! This has been another article in the Back to Basics series. You can read many more here.

Defaults In msdb Database

Today is a day to discuss defaults. It started with the day being TSQL Tuesday and having a topic of “Say No to Defaults.” You can read more about that from the invite – here. I already participated in the party but did also want to discuss defaults a little bit more. That said, this article is not participating in the blog party. That would seem a bit silly.

While, this post is not a part of the party, the defaults to be discussed are fairly important. I have seen severe consequences due to these defaults being ignored and not changed. So today, in addition to my earlier article (you can read it here), I implore you to make some fundamental changes to your production servers with regards to various defaults.

A Trio of msdb Defaults

There aren’t really that many defaults within the msdb database that must be changed, are there? I mean, seriously, beyond the defaults that are available to every database, what could possibly be unique to this database that could have a severe consequence?

I am so glad you asked!

The defaults in the msdb database are more about what is missing than what is actually there. By default, this database is missing quite a few things that could be deemed critical to your environment.

Let’s start with an easy one – Indexes

There are a few out there that may disagree, but the proof really is in the effect on performance for backup jobs and such. I have three indexes I like to put on every instance. I have seen the implementation of these indexes aid in improved job times as well as aid in reduced time to “clean” up the database.

Easy enough. These indexes are very straight forward and pretty small in the grand scheme of things. But if the index can help improve performance by a factor of 10, then I am in favor of them (and I have seen that performance gain).

Now that we have some supporting indexes to help a bit with performance, we should take a look at the next item. This one can help with job performance as well as help with keeping the msdb database nice and trim.

Data Pruning

I have walked into client instances that had backup history dating all the way back to 2005 and included two-three full backups a day per database with quarter-hourly log backups. Oh and this was for an instance containing well north of 200 databases. Can you say sluggish backups and sluggish msdb overall?

The fix is very easy! Not only do I recommend pruning the backup history, but also the job history, mail history and maintenance plan history (eew – if you use those things). Think about it – do you really need to know that Job XYZ ran successfully in 2006 and only took 15 seconds? This is 2015 and that kind of data is probably not pertinent at this point.

The pruning of this data is not enabled by default! You have to configure this for each of the servers under your purview. Luckily, this is easy to do!

If you use this code sample, be sure to adjust the number of days shown in the retention to match your specific needs.

Now we have addressed a couple of defaults in msdb that can impact your performance. We are tidying up the database and in a much happier state these days. There is one more default, though, that is really critical to your data’s well being. This one is set within the msdb database but it really is for all of your databases!

Configuring Alerts!

I’m not talking about just any alerts. There are some very specific alerts that really should be configured. These are the alerts that can help you intervene to minimize corruption.

If you haven’t faced a problem with corruption – you will. It is only a matter of time. Corruption happens. When it happens, the earlier one can intervene, usually the better the outcome. Every minute counts, so why not try to reduce that time as much as possible?

This one is not terribly difficult to implement. I happen to have a query ready to go for that as well. All that needs to be done is a minor adjustment to the alert email address:

Wrap

Wow! Now there are three quick defaults that must be changed on every server. These defaults will help improve performance as well as help you stay on top of things when they start to go south (corruption). With timely notifications, and better performance, your servers will be happier, healthier, and longer lasting.

Thanks for reading! This has been another article in the Back to Basics series. You can read many more here.

«page 3 of 129»

Calendar
May 2019
M T W T F S S
« Apr    
 12345
6789101112
13141516171819
20212223242526
2728293031  

Welcome , today is Saturday, May 25, 2019