T-SQL Tuesday #43 – Hello, Operator?

Comments: 2 Comments
Published on: June 11, 2013

 


This month’s TSQL-Tuesday is hosted by Rob Farley (blog | twitter), who became a Microsoft Certified Master for SQL Server 2008 this past December (you know, that elitist club that should wear Jedi robes to Summit 2013). As this month’s host, Rob has selected the topic of this month’s world-wide T-SQL Tuesday blogging party to be about Plan Operators – specifically, one or more Plan Operators that have piqued your interest. With this guideline in mind, I have picked a few operators. Some you may know, and some may be completely new to you. Some are indicators of poor performance, and some may be indicators of spectacular performance.

 

telephone_operator1

First things First

What is an operator?  Well for that, let’s just cheat a bit and go straight to the book definition on the topic.  From BOL we see the following:

Operators describe how SQL Server executes a query or a Data Manipulation Language (DML) statement. The query optimizer uses operators to build a query plan to create the result specified in the query, or to perform the operation specified in the DML statement. The query plan is a tree consisting of physical operators.

You can read more on what an operator is or does from this link.
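If you want to see the operators for a query of your own, one quick way (among several) is to ask SQL Server to return the actual plan along with the results:

[codesyntax lang="sql" title="Viewing plan operators"]

-- Returns the actual execution plan as XML alongside the query results;
-- open the XML in SSMS to browse the individual plan operators.
set statistics xml on;

select top (10) name, object_id
from sys.objects
order by name;

set statistics xml off;

[/codesyntax]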

Lesser Known Operators

In this group, I will cover some operators that operate largely in anonymity.  Let’s play a little game and see if you recognize any of these operators.  I will show you some operator icons and give a description, and you see if you can name the operator.  After all of these operators, I will give you the actual names.  No peeking – Honor System here!

para_nolabel

1.  This operator consumes multiple streams and produces multiple streams of work.  Think of it as a workhorse and an indicator that a lot is getting accomplished.  Imagine being able to consume a ton of work and consequently be able to produce a lot of work or information.

while_nolabel

 

2.  This operator indicates a repeating cycle.  In other words, it can be quite repetitive and it can be never-ending.

cursor_nolabel

 

3.  This one is a bit more difficult.  This occurs when a suitable icon cannot be displayed.  It could very well be that the suitable icon is NSFW.  This potentially represents a problem, but it does not always.

fetch_nolabel

 

4.  This operator represents the retrieval of something more.  Work is being done and a retrieval is required.  This could be done anywhere from one to thousands of times in a given work cycle.

refresh

 

5.  Related to the prior operator, this operator can be done multiple times within a work cycle.  This represents that data has to be retrieved again in some fashion.

So how well do you think you did?

These are all valid operator icons within graphical execution plans.  I am willing to bet you have had more frequent exposure to these operators than you may want to admit.  For instance, the first operator has a stigma about it in graphical execution plans that makes everybody think it is a really bad thing.  But I am here to tell you how wrong you are.

Let’s compare answers and see why you may be really wrong about that first operator.

If you answered “Repartition Streams” or “Parallelism” for operator number 1, you are wrong.  You see, this operator is actually the following.

Parallelism

If you don’t know who Paul White is and the work he has done to educate everybody on execution plans and operators, then you probably need to crawl out from under that rock.  Quite simply, Paul takes in so much information and puts out just as much information – but it is high quality information.  Go check out his stuff at his blog.

If you are 0 for 1, don’t feel bad.  That was a toughy.  I think they get a bit easier from here.

The next operator indicated a repeating cycle.  Who thinks this is a While operator based on the T-SQL While loop?  If you noticed, I was careful to bypass wording about language constructs.  In this case, it is actually the following.

10x1

 

If you run into this kind of operator, you have my deepest sympathies.  This is a purported Senior DBA with the same year of experience repeated ten times.  This can often cause you quite a few gray hairs.  If you are running into this operator, you could certainly do with some optimization.

Are you 0 for 2 at this point?  That’s ok, I think you may be getting the gist of this game now.

Let’s move on to that third operator.  I am willing to bet that 75% of the participants are really close on this one.  This is a cursor of sorts.  The orange color of the icon helps to give it away.  But this cursor comes about from a workplace outage.  Let’s take a look at what this really is.

devcursor

Sure we hate to let people into the production environment.  Sometimes they do stupid things and it breaks the production environment.  Then you have some unhappy DBA cursing up a storm as he fixes it and goes on his merry way.

Next is operator number four in this little quiz.  The hint on this one was that there was a fetch somewhere in the mix.  That should make it very easy at this juncture.

Jr

How often do you task your Jr DBA as the gopher of the team?  This is a critical operator within the team.  Many times the senior does not have time for various tasks and s/he must assign some things to others to get them accomplished.  Jr go fetch this.  Jr go do that.  Jr get the disk space reports.  This operator in an execution plan may indicate a potential tuning effort should be engaged.  But when we are talking about it as the Jr DBA operator, it is an operator indicating that we might be running at a higher efficiency level – if you are distributing tasks to him/her appropriately.

I hope if you are 0 for 4 at this point that you will get this one.  This is potentially the best operator I have in the bunch.

SrTeaching

If you are a Senior DBA and are not involved with this operator at all, then turn in your keyfob and report to security.  This is absolutely critical.  Mentor the junior, repeat the cycle, and teach the junior some more.  This is definitely an iterative operation but is an absolute must in the DBA world.

As a Senior DBA, part of the job is to teach the junior and then to use the operator from item 4 and send them on their way to perform various tasks.  Sometimes, we lose track of this operator and become too wrapped up in our own set of tasks because we can just do it faster.  Teach the Jr and then maybe you will have more time for other projects in the future.

Joking Aside

There are several operators that I really like to dive into in SQL Server.  I won’t claim to know nearly as much as Paul White on the topic, but I enjoy it nonetheless.  There are three operators that I tend to enjoy bringing up in interviews, and I have even started a series on them.  The last two articles in the series will be published soon.  Yes, I was a slacker and took a long time to finish parts two and three, but they are nearly through edits and ready to send to SSC.  You can read the first one here.

If you have any questions about the use of Paul White as an Operator, he and I discussed it first, and I received his blessing to proceed.

In Vs. Inner Join

Categories: News, Professional
Comments: No Comments
Published on: February 15, 2010

This is tightly related to another of my forays into tuning some slowly/poorly performing processes.  This process came across my desk due to some reporting inaccuracies.  The report is derived from data in a downstream process that is based on the data provided from this top level process.  More precisely, this is an ETL job that pumps data into the warehouse for consumption by several other downstream ETL jobs and reports.  In order to effectively troubleshoot, I wanted to follow the data all the way back to the source.  I had to verify that all was working correctly from end to end in the process chain for this report.  I ended up tuning a script that was a critical piece of this ETL process.  (Oh and the data inconsistency was also corrected here and in the downstream report – there were a couple of consistency problems).

So, let’s start with examples of some of the problem areas with the script.  First, it took an average of 12 minutes to execute.  This slowness was consistent whether it was executed from the DTS package or from SSMS.  Second, it was not a stored proc.  This particular script is used in more than one location, so if any changes need to be made to the logic, one must find all of those locations rather than simply edit a stored proc.  Third, it relied on sub-queries and correlated sub-queries.  Fourth, it used date comparison operators in the Join statements.  These were additional Join conditions that I needed to test for relevance and impact on the result set.  Fifth, it made persistent use of the NOLOCK hint; in this scenario, it was used as an optimization ploy.  That should be enough to discuss about this script.  There were other “problems” with the rest of the process related to this script; suffice it to say I corrected some of those issues too and am working to correct the rest.

I am going to begin with the third problem stated.  The first issue will lend itself to being resolved by the end of this article as a part of the overall process.  The second issue will be the last item treated; by the end, it should simply be understood that I put the query into a proc and altered the DTS package to call the proc instead of running the script.

Sub-Queries

What is a sub-query?  A sub-query is a query that is found attached to a Join clause or in the Where clause.  It is, as the name implies, a query that is run as a part of an outer/parent query.  From msdn, we learn that:

“many queries can be evaluated by executing the subquery once and substituting the resulting value or values into the WHERE clause of the outer query.”

This doesn’t sound so bad.  The query could be executed once and then return the necessary results to the where or join conditions of the outer query.  In my scenario, the sub-queries are run twice each.  They are a part of a Union select statement and the same query will appear on both sides of the Union.

[codesyntax lang="sql" title="One of My subqueries"]

left join (select SSN, avg(PBO) PBO, min(ConDate) ConDate
, cast(convert(varchar(8),min(RecordDate),112) as smalldatetime) RecordDate
from dbo.DOE_CONCampaign_Funded (nolock)
where ServicerID = 1234
group by SSN) f2 on t.SSN = f2.SSN and ss.SubmitDate < f2.ConDate

[/codesyntax]

In this example we see a few of the items that I listed.  Since I am running this on a SQL Server 2005 box, I decided to move each of those sub-queries to a CTE.  This change gave me an improvement in time of about three minutes. (Even as I test currently, I am seeing that the original query is now taking in excess of 23 minutes – ouch!)
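As a rough sketch of that change (only the sub-query itself is shown above, so the outer tables and aliases below are hypothetical stand-ins), the sub-query becomes a CTE that is defined once and then joined to:

[codesyntax lang="sql" title="Sub-query moved to a CTE (sketch)"]

-- Sketch only: dbo.SomeBaseTable and dbo.SomeSubmissionTable are hypothetical
-- stand-ins for the real tables behind the t and ss aliases.
-- (The nolock hint from the original fragment is omitted here.)
;with Funded as
(
    select SSN, avg(PBO) as PBO, min(ConDate) as ConDate
        , cast(convert(varchar(8), min(RecordDate), 112) as smalldatetime) as RecordDate
    from dbo.DOE_CONCampaign_Funded
    where ServicerID = 1234
    group by SSN
)
select t.SSN, f2.PBO, f2.ConDate, f2.RecordDate
from dbo.SomeBaseTable t
    inner join dbo.SomeSubmissionTable ss
        on ss.SSN = t.SSN
    left join Funded f2
        on t.SSN = f2.SSN
        and ss.SubmitDate < f2.ConDate;

[/codesyntax]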

Now on to that Union statement.  I modified the Union to be a Union All.  Without specifying All, a Distinct is performed on the combined result set.  I first needed to determine if I truly needed a Distinct operation, or if I could use the Union All statement without altering the result set.  Testing both versions, I ended up with the same result set.  Based on this, I am pretty confident that I can proceed with using Union All, producing another time savings of about one minute on this query.  This is good news.
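The difference is easy to demonstrate with a pair of throwaway selects: Union de-duplicates the combined rows, while Union All just appends them.

[codesyntax lang="sql" title="Union vs. Union All"]

-- Union performs an implicit Distinct across the combined rows...
select 1 as Val
union
select 1 as Val;      -- one row

-- ...while Union All simply appends the second result set.
select 1 as Val
union all
select 1 as Val;      -- two rows

[/codesyntax]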

Correlated Sub-Queries

Next up is the use of the correlated subqueries.  This query uses a correlated subquery on each side of the union statement.  The correlated sub-query looks like the following for the top half of the statement.

[codesyntax lang="sql"]

[/codesyntax]

I recently had to add the last 5 campaigns to this list.  I see a couple of problems with this.  First off, it is cumbersome to edit code to fix these hard-coded values; some code somewhere will likely get missed.  The second pain point was the slowness this caused my query.  The resolution was to create a “Control” table to which these values would be added.  The code would be corrected in all locations and would use a Join instead of the correlated sub-query.  What does this buy me?  My performance now becomes very consistent and is less than 20 seconds without any further indexing.  One more thing to note here is that the IO Stats are going to be different for multiple reasons at this point.  The original IO Stats above only show the first four campaigns listed in the correlated sub-query.  I am currently working with the entire set of campaigns listed therein.

The replacement for the correlated sub-query is an Inner Join back to the control table.

[codesyntax lang="sql"]

[/codesyntax]

This is the top half of the Union Select statement.  The bottom half is slightly different.  The control table has an additional field to help designate the type of campaign.  This means less code rework and easier maintainability on my part.  If I want to exclude a campaign from the query, I simply remove it from the table.
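A minimal sketch of that pattern follows; the control table name, its columns, and the select list are all assumptions, but the shape is a plain Inner Join in place of the correlated sub-query and its hard-coded list of campaigns.

[codesyntax lang="sql" title="Control table join (sketch)"]

-- Hypothetical control table: one row per campaign that should be included,
-- plus a type column used to distinguish the two halves of the Union.
create table dbo.CampaignControl
(
    CampaignID int not null primary key
    , CampaignType varchar(20) not null
);

-- Top half of the Union: the hard-coded campaign list becomes a join to the control table.
-- (CampaignID on the funded table is an assumed column name.)
select f.SSN, f.PBO, f.ConDate
from dbo.DOE_CONCampaign_Funded f
    inner join dbo.CampaignControl cc
        on cc.CampaignID = f.CampaignID
        and cc.CampaignType = 'Funded';

[/codesyntax]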

Extraneous Join Conditions

In this query there is the use of date fields in the Outer Join clauses.  The dates are used as comparisons only and needed to be tested for their contribution to the query.  At this point the testing is simple – run the query as is, and then rerun it with each of those date comparisons commented out in the Join clauses.

As-is query

[codesyntax lang="sql"]

[/codesyntax]

Changed query

[codesyntax lang="sql"]

[/codesyntax]

I was a little surprised to see that both forms of the query returned the same result set.  In light of that, the fact that there appears to be no visible performance gain by eliminating these Join Conditions is not astonishing to me.  Thus I can leave them or eliminate them – they don’t add or subtract from the query performance for my current purposes.  I decided to remove them anyway.  At this point, the IO Stats remain unchanged from the previous sampling.  I would now need to examine indexing to improve performance.
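One quick way to confirm that two forms of a query return the same rows is to run Except in both directions; zero rows each way means the outputs match. A tiny illustration with stand-in queries:

[codesyntax lang="sql" title="Comparing result sets with EXCEPT"]

-- Stand-in queries only; substitute the as-is and changed queries here.
-- Note: Except compares distinct rows, so duplicate counts are not checked.
select Val
from (select 1 as Val union all select 2 union all select 3) as AsIs
except
select Val
from (select 1 as Val union all select 2 union all select 3) as Changed;
-- zero rows: nothing in the as-is results is missing from the changed results

select Val
from (select 1 as Val union all select 2 union all select 3) as Changed
except
select Val
from (select 1 as Val union all select 2 union all select 3) as AsIs;
-- zero rows in the other direction as well, so the result sets match

[/codesyntax]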

Conclusion

This exercise is yet another one of those opportunities to performance tune a process.  This one had been left alone because it “was working” and people wanted to leave well enough alone.  When processing takes that long, well enough is not good enough.  The systems need to be able to move on to other processing tasks and use resources in a more efficient manner than running the same process for in excess of 12 minutes (24 minutes today during testing to get data for this article).  This is only a piece of the puzzle in tuning this particular process.  Among other adjustments, this query was tuned, altered, put into a stored proc, and the nolock hints were removed.  In the end, it returns the correct data set and does it in 1/72nd of the time it took earlier today and 1/36th of the average time.  I would say that is a substantial cost savings.

Indexes Out the Window II

Categories: News, Professional
Comments: 2 Comments
Published on: January 25, 2010

In Part I, I discussed some of the peculiarities and troubleshooting done in relation to a peculiar execution plan.  In this post, I will discuss some of what I did to further tune the query.

First, a little more background on the design of this query.  This query uses some dynamic SQL to build a dynamic where clause based on the value passed to a single parameter.  I need to be able to create three different where clauses that could essentially create a range scan on a date field.  Two of the three cases could involve an inequality date comparison rather than an equality comparison.  The specific requirements are to have the report return one of the following: 1) 2010 data only, 2) < 2010 data, and 3) all data.  This was initially accomplished with a query similar to:

[codesyntax lang="sql" title="Initial Where"]

[/codesyntax]
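As a rough sketch of that initial, parameter-driven approach (the table name, parameter name, and option encoding below are all assumptions), the idea is a single parameter deciding which date filter gets appended to the dynamic where clause:

[codesyntax lang="sql" title="Parameter-driven where clause (sketch)"]

-- Sketch only: dbo.SomeFactTable and @ReportOption are hypothetical names.
declare @ReportOption int
    , @sql nvarchar(max);

set @ReportOption = 1;   -- 1 = 2010 data only, 2 = data prior to 2010, 3 = all data

set @sql = N'select count(*) from dbo.SomeFactTable';   -- stand-in for the real report query

if @ReportOption = 1
    set @sql = @sql + N' where year(RecordDate) = 2010';   -- equality on a function: not SARG-able
else if @ReportOption = 2
    set @sql = @sql + N' where year(RecordDate) < 2010';   -- the inequality case
-- @ReportOption = 3 appends no filter at all and returns everything

exec sp_executesql @sql;

[/codesyntax]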

One of the first comments might be, “That query is not SARG-able.”  True, it is not SARG-able.  That will be the last item to be evaluated.  The slowness of the query in this case has less to do with its SARG-ability than it does with the inequality scan.  The puzzle in the query was “How to make the query an equality comparison” rather than a < or > comparison.  The above query uses the indexes as discussed in the prior post – it is just not as fast as it could be.

It is at this point that I am reminded of an integral part of my last two presentations at my local PASS meetings, and it dawns on me that an understanding of the data is truly necessary.  Most of the RecordDate values are Null since there was no backfill on the column after its implementation.  Prior to its implementation there was no date control on the table.  The table is in a warehouse and only contains data since the beginning of 2009.  It is also necessary to understand that there is a minimal chance of record additions to the table after 2010.  Thus we now have a method to get the equality comparisons.  The following might satisfy this need:

[codesyntax lang="sql" title="Next Rev"]

[/codesyntax]

With this change, the query is now using a Clustered Index Seek.  This Clustered Index Seek is the most costly piece of the execution plan.  However, the overall runtime of the query has improved by one second.

Thinking I could still improve the query, I decided that a new column called ReportYear would help out the cause.  The change would allow the query to be SARG-able; the column would have a default on it to change RecordDate to the first day of the year recorded in the RecordDate field.  I figured this idea would also still permit me to do an Equality search in my Where clause.  A rough sketch of the idea follows, and then let’s see the results:
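The sketch below uses a persisted computed column as one way to get the described behavior (a plain Default constraint cannot reference another column); the table name is an assumption.

[codesyntax lang="sql" title="ReportYear column (sketch)"]

-- Sketch only: ReportYear holds the first day of the year from RecordDate,
-- which allows an equality (and SARG-able) predicate in the Where clause.
alter table dbo.SomeFactTable
    add ReportYear as dateadd(year, datediff(year, 0, RecordDate), 0) persisted;

-- The 2010-only filter now becomes a simple equality comparison.
select count(*)
from dbo.SomeFactTable
where ReportYear = '20100101';

[/codesyntax]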

Wow, I have just come full circle back to the First Post in this series.  I just lost the clustered Index seek, but I did not lose any of the performance.  To see if I gained anything by making the query SARG-able, let’s look at the IO stats between the two.

First IO Stats

Second IO Stats

The top image represents the First Query and the bottom image represents the Second Query.  We have an improvement of about 600,000 logical reads.  That is a substantial improvement, and it comes even with an Index Scan on an index that is non-covering.  Now, to tune the index.  Much the same as in the first article, I added an index to the SalesOpportunity table on LastUpdate and SalesOppID.  This creates a covering index and removes the hash match join in favor of the merge join in the execution plan.  Another effect is on the IO Stats, as displayed below.

Second Query Index

We can see here that there was once again an improvement made in the logical reads.  Another ~10,000 logical reads improvement.  Overall, both queries run in the same amount of time – but the second query uses less resources.  Again, if I change the clustered index on this table – I am certain I would see an improvement.
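For reference, the covering index described above might look something like the following; the index name is an assumption, while the key columns come straight from the description.

[codesyntax lang="sql" title="Covering index (sketch)"]

-- Covering index for the SalesOppID / LastUpdate output described above.
create nonclustered index IX_SalesOpportunity_LastUpdate_SalesOppID
    on dbo.SalesOpportunity (LastUpdate, SalesOppID);

[/codesyntax]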

I am still in search of the reason behind the use of a bad index for this query.

Execution Plans can be downloaded here.

Indexes Out the Window

Categories: News, Professional
Comments: 1 Comment
Published on: January 22, 2010

I ran into a peculiarly puzzling situation with a query I have been tuning of late.  The query runs some aggregations and performs a range scan based on a date value passed to the query.  The primary target for this query is to be consumed by Reporting Services.  The query executes (from a stored proc) three times in a single report.  This is another case where the report currently outperforms the previous version by severalfold.  I just want it to run a little faster – it is an executive report, after all.

While tuning the query itself, I removed some of the aggregations (not consumed by the report and not needed for accuracy), and that bought me a small performance gain.  Next was to remove a couple of “or” operations.  Again, that only bought me a small performance gain.

The next step was to check my indexing.  Looking at the execution plan for the query, indexes were being used – or so it seemed.  Based on the execution plan, the query was performing two Index Scans prior to a Hash Match.  In the first Index Scan, the query was expecting an output of two columns (SalesOppID and LastUpdate) from IDX_SalesOpportunity_SalesStatusIDSalesContactID.  This is depicted in the next two images.

This makes a little sense since the index does contain the LastUpdate column.  But the index is not covering, and we do not have a Key lookup.

[codesyntax lang="sql" title="IDX_SalesOpportunity_SalesStatusIDSalesContactID"]

[/codesyntax]

Out of curiosity, I decided to see what would happen if I commented out the lines that required the LastUpdate field.  I ended up with the following:

Interesting, the query optimizer decided that we should use IDX_LeadTypeID which has the following definition:

[codesyntax lang="sql" title="IDX_LeadTypeID"]

[/codesyntax]

Once again, I have a query that is using a bad index.  There is no key lookup and the index is not covering.  I find this very peculiar.  Thus I proceeded to test a little more.  I dropped that index and reran the query.  It is important to note that with each group of tests I cleared the ProcCache.  After dropping the index, my execution plan showed no performance increase and still showed a funny result.  The query moved on to a new index that was not covering, and there was no key lookup.
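Clearing the plan cache between test runs was presumably done with something along these lines (on a test system only, since it flushes every cached plan on the instance):

[codesyntax lang="sql" title="Clearing the plan cache"]

-- Flush all cached plans so each test run gets a freshly compiled plan.
dbcc freeproccache;

[/codesyntax]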

Now, let’s make this a little more interesting.  I do have a Clustered Index on SalesOppID – defined as follows:

[codesyntax lang="sql" title="PK_SalesOpportunity"]

[/codesyntax]

I am puzzled as to why my query will not use the Clustered Index.  I decided to do the standard stuff – DBCC CheckDB, DBCC UpdateUsage etc.  There is no corruption, and the query still tries to perform the same way.  I could recreate the clustered index on the LastUpdate field.  Since there are Foreign Key relationships that I do not want to drop and recreate, I will just create an NC index on the appropriate two fields.  After doing that, I get the following:

The index is covering and pulling the correct columns.  With this index the query is about 3 times faster than previously.  Some of that is due to the change from a Hash Match to a Merge Join from the two tables (there is still a hash match but it is after the merge and is now less costly).

The moral is, even though an index may be used by the query, it may not be a good index choice.  I am still working on tuning this query; I think I can squeeze better performance out of it.

Execution Plans can be downloaded here.
