Adventures with NOLOCK

Categories: News, Professional, SSC
Comments: 4 Comments
Published on: June 15, 2015

Some of the beauty of being a database professional is the opportunity to deal with our friend NOLOCK.  For one reason or another, this query directive (yes, I am calling it a directive* and not a hint) is loved and idolized by vendors, applications, developers, and upper management alike.  The reasons for this vary from one place to the next, as I have found, but it always seems to boil down to the perception that it makes queries run faster.

And yes, queries do sometimes run faster with this directive.  That is, until they are found to be the head blocker, or until it turns out they don’t run any faster than well-written TSQL.  But we are not going to dive into those issues at this time.

A gem that I recently encountered with NOLOCK was rather welcome.  Not because of the inherent behavior or anomalies that can occur through the use of NOLOCK, but rather because of a discovery made while evaluating an execution plan.  Working with Microsoft SQL Server 2012 (SP1) – 11.0.3128.0 (X64), I came across something that I would rather see more consistently.  Let’s take a look at this example execution plan:

[Execution plan image: NoLock]

First is a look at the plan to see if you can see what I saw.

[Execution plan query text image: Read Uncommitted]

And now, we can see it clear as day.  In this particular case, SQL Server decided to remind us that the use of this directive allows uncommitted reads to occur, so it throws that directive into the query text of the execution plan as well.  This is awesome!  In short, it is a visual reminder that the NOLOCK directive, while it may be fast at times, is a direct route to potentially bad results because it tells the database engine to read uncommitted data.

How cool is that?  Sadly, I could only reproduce it on this one version of SQL Server so far.  If you can reproduce that type of behavior, please share in the comments which version you used and what you did.  For me, database settings and server settings had no effect on this behavior.  No trace flags were in use, so no effect there either.  One thing of note: in my testing, this did not happen when querying a table directly; it only occurred when querying against a view (complexity and settings produced no difference in effect for me).
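
If you would like to try to reproduce it, a minimal sketch of the kind of test I ran looks like the following; the object names are purely hypothetical, and your results may well differ by build.

[codesyntax lang=”tsql”]

CREATE TABLE dbo.TestTable (ID INT IDENTITY(1,1) PRIMARY KEY, SomeValue VARCHAR(50));
GO
CREATE VIEW dbo.vTestTable
AS
SELECT ID, SomeValue
FROM dbo.TestTable;
GO
-- Run with the actual execution plan enabled, then inspect the query text
-- captured in the plan to see whether the directive surfaces there
SELECT ID, SomeValue
FROM dbo.vTestTable WITH (NOLOCK);

[/codesyntax]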

* I would like to make it a goal for every database professional to call this a DIRECTIVE instead of a hint.  A hint implies that you may have a choice about the use of the option specified.  And while NOLOCK does not entirely eliminate locks in the queries, it is an ORDER to the optimizer to use it.  Therefore, it is more of a directive than a mere suggestion.
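
For reference, the directive is simply a synonym for the READUNCOMMITTED table hint, and its effect is the same as reading the table under the READ UNCOMMITTED isolation level (dbo.SomeTable below is only a placeholder name):

[codesyntax lang=”tsql”]

-- NOLOCK and READUNCOMMITTED are synonyms
SELECT * FROM dbo.SomeTable WITH (NOLOCK);
SELECT * FROM dbo.SomeTable WITH (READUNCOMMITTED);

-- Both behave like reading the table under the READ UNCOMMITTED isolation level
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM dbo.SomeTable;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

[/codesyntax]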

Index your FK

Categories: News, Professional, Scripts
Comments: No Comments
Published on: March 4, 2010

A few months ago I read an article from SQLServerCentral.com about some Foreign Key gotchas.  Since that article, I have seen several people asking whether a foreign key should be indexed or not.  I have seen the benefits of having FKs indexed as well as the agony of not indexing them.  The article discusses some of the benefits.  In summary, indexing foreign keys helps avoid deadlocks, reduces locking, and improves performance.  My intent is not to go into depth on the article, but to focus on a couple of scripts that help identify which tables need indexes on their foreign keys.  This article is a part of my Key documentation series.

If you browse the internet, you will likely be able to find several scripts to accomplish the same goal.  There is one script that I have seen that will generate the script for you to create the index on the foreign key.  Many scripts are similar – some perform slightly different things.  I thought about evaluating some of the scripts and comparing them.  I have found some to be less accurate than others.  I decided after some thought that I would just share my scripts.  I will show the evolution of the script that I now use.

Rev 1

This was just to hammer out a solution that would get me the results I sought.  This version uses some objects that are slated for deprecation, and thus it should really only be used on servers running SQL 2000.

[codesyntax lang=”tsql”]

[/codesyntax]
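
As a rough illustration of that style of approach (a sketch only, not the original script), a query built on the deprecated sysforeignkeys and sysindexkeys objects might look something like this:

[codesyntax lang=”tsql”]

-- Sketch: foreign key columns that do not appear in any index
SELECT  OBJECT_NAME(fk.constid)       AS ForeignKeyName,
        OBJECT_NAME(fk.fkeyid)        AS TableName,
        COL_NAME(fk.fkeyid, fk.fkey)  AS ColumnName
FROM    sysforeignkeys fk
        LEFT JOIN sysindexkeys ik
            ON  ik.id = fk.fkeyid
            AND ik.colid = fk.fkey
WHERE   ik.id IS NULL

[/codesyntax]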

This works fine, but with limitations.  The obvious limitation is the use of objects that are scheduled for deprecation.  The second limitation is that it is not 100% accurate: this script does not pull all of the Foreign Keys that are missing Indexes.  This is the same sort of issue that I found with several scripts on the internet.  That said, this query is rather efficient.  When compared to future versions, it was 8% less costly and about 150ms faster.  However, it does consume more disk IO and more CPU time.

With these limitations in mind, I decided to create a second query.

Rev 2

[codesyntax lang=”tsql”]

[/codesyntax]
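
Again as a sketch rather than the script itself, the same idea expressed against the SQL 2005 catalog views might look like this:

[codesyntax lang=”tsql”]

-- Sketch: the same check using the newer catalog views
SELECT  OBJECT_NAME(fkc.constraint_object_id)                 AS ForeignKeyName,
        OBJECT_NAME(fkc.parent_object_id)                     AS TableName,
        COL_NAME(fkc.parent_object_id, fkc.parent_column_id)  AS ColumnName
FROM    sys.foreign_key_columns fkc
        LEFT JOIN sys.index_columns ic
            ON  ic.object_id = fkc.parent_object_id
            AND ic.column_id = fkc.parent_column_id
WHERE   ic.object_id IS NULL;

[/codesyntax]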

This query is less limited than the first.  Here, I have improved the performance substantially over the first query, and the execution plan cost is lower.  I have implemented the use of SQL 2005 objects, decreased disk IO, and decreased run time.  CPU time remains about the same as the previous example.  Still, one limitation remains that is pretty glaring: I am not returning all Foreign Keys that are missing an index.

Moving onto attempt number three.

Rev 3

As I was testing version three of this query, I noticed there was another item that I needed.  I ran the tests in a database with multiple schemas and had difficulty locating the tables in the list.  When looking at the expanded view in SSMS, the tables are grouped by schema and listed alphabetically within that schema.  When I finally found the table, it dawned on me that I needed to include the schema in the query results.  Adding this little piece of information will save a few seconds when trying to verify the information in the report.

[codesyntax lang=”tsql”]

[/codesyntax]

In the code, note that I have opted to use the schema_name() function rather than use the Join on the sys.schemas view.  By using the function, the script performs a little better.  There is another addition I made to the script.

[codesyntax lang=”tsql”]

[/codesyntax]

An additional condition was added to the Join.  With that condition in place, I am now able to pull back the additional foreign keys that are missing indexes.  I have also added an Order By, which will slow it down a bit; the offset to that decrease is that the data is more manageable for verification.
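
Pulling those pieces together, a sketch of the final shape might look like the following.  The extra join condition shown here (matching only the leading column of an index) is just one plausible form of such a condition, not necessarily the exact one used, and the rest of the query is likewise illustrative.

[codesyntax lang=”tsql”]

-- Sketch: schema added via schema_name(), an extra condition on the join,
-- and an Order By to make verification easier
SELECT  SCHEMA_NAME(o.schema_id)                              AS SchemaName,
        OBJECT_NAME(fkc.parent_object_id)                     AS TableName,
        COL_NAME(fkc.parent_object_id, fkc.parent_column_id)  AS ColumnName,
        OBJECT_NAME(fkc.constraint_object_id)                 AS ForeignKeyName
FROM    sys.foreign_key_columns fkc
        INNER JOIN sys.objects o
            ON  o.object_id = fkc.parent_object_id
        LEFT JOIN sys.index_columns ic
            ON  ic.object_id = fkc.parent_object_id
            AND ic.column_id = fkc.parent_column_id
            AND ic.key_ordinal = 1    -- only count indexes that lead with the FK column
WHERE   ic.object_id IS NULL
ORDER BY SchemaName, TableName, ColumnName;

[/codesyntax]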

Conclusion

There are many methods to arrive at this data.  The difference in performance between the methods I wrote and evolved was negligible.  The most important concept in the end is having a solution that provides this data accurately and in a timely fashion.  All of these solutions finish in under a second for me.  None of them places a significant load on the server, and thus I can choose accuracy over performance in this instance.

This exercise is a part of getting to know your database.  Finding and documenting these unindexed foreign keys is very helpful when it comes time to troubleshoot.  Once this information is garnered, one needs to evaluate whether or not to place an index on each foreign key.

In Vs. Inner Join

Categories: News, Professional
Comments: No Comments
Published on: February 15, 2010

This is tightly related to another of my forays into tuning some slowly/poorly performing processes.  This process came across my desk due to some reporting inaccuracies.  The report is derived from data in a downstream process that is based on the data provided from this top level process.  More precisely, this is an ETL job that pumps data into the warehouse for consumption by several other downstream ETL jobs and reports.  In order to effectively troubleshoot, I wanted to follow the data all the way back to the source.  I had to verify that all was working correctly from end to end in the process chain for this report.  I ended up tuning a script that was a critical piece of this ETL process.  (Oh and the data inconsistency was also corrected here and in the downstream report – there were a couple of consistency problems).

So, let’s start with examples of some of the problem areas in the script.  First, it took an average of 12 minutes to execute.  This slowness was consistent whether it was executed from the DTS package or from SSMS.  Second, it was not a stored proc.  This particular script is used in more than one location, so if any changes need to be made to the logic, one must find all of those locations rather than edit a single stored proc.  Third, it relied on sub-queries and correlated sub-queries.  Fourth, it used date comparison operators in the Join clauses; these were additional Join conditions that I needed to test for relevance and impact on the result set.  Fifth, it made persistent use of the NOLOCK hint, which in this scenario was used as an optimization ploy.  That should be enough to discuss about this script.  There were other “problems” with the rest of the process related to this script; suffice it to say I corrected some of those issues too and am working to correct the rest.

I am going to begin with the third problem stated.  The first issue will lend itself to being resolved by the end of this article as a part of the overall process.  The second issue will be the last item treated; suffice it to say that by the end I put the query into a stored proc and altered the DTS package to call the proc instead of running the script.

Sub-Queries

What is a sub-query?  A sub-query is a query that is found attached to a Join clause or in the Where clause.  It is, as the name implies, a query that is run as a part of an outer/parent query.  From MSDN, we learn that:

“many queries can be evaluated by executing the subquery once and substituting the resulting value or values into the WHERE clause of the outer query.”

This doesn’t sound so bad.  The query could be executed once and then return the necessary results to the where or join conditions of the outer query.  In my scenario, the sub-queries are run twice each.  They are a part of a Union select statement and the same query will appear on both sides of the Union.

[codesyntax lang=”sql” title=”One of My subqueries”]

-- Derived table: aggregate funded records per SSN, joined with a date comparison
left join (select SSN,
                  avg(PBO) PBO,
                  min(ConDate) ConDate,
                  cast(convert(varchar(8), min(RecordDate), 112) as smalldatetime) RecordDate
           from dbo.DOE_CONCampaign_Funded (nolock)
           where ServicerID = 1234
           group by SSN) f2
    on t.SSN = f2.SSN
    and ss.SubmitDate < f2.ConDate  -- date comparison as a join condition (see below)

[/codesyntax]

In this example we see a few of the items that I listed.  Since I am running this on a SQL Server 2005 box, I decided to move each of those sub-queries to a CTE.  This change gave me an improvement in time of about three minutes. (Even as I test currently, I am seeing that the original query is now taking in excess of 23 minutes – ouch!)
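
As a sketch of that change (built only from the fragment shown above; the full outer query is elided and the NOLOCK hint is dropped for brevity), the derived table becomes a CTE along these lines:

[codesyntax lang=”sql”]

-- Sketch: the former derived-table subquery pulled up into a CTE
;WITH Funded AS
(
    SELECT  SSN,
            AVG(PBO) AS PBO,
            MIN(ConDate) AS ConDate,
            CAST(CONVERT(VARCHAR(8), MIN(RecordDate), 112) AS SMALLDATETIME) AS RecordDate
    FROM    dbo.DOE_CONCampaign_Funded
    WHERE   ServicerID = 1234
    GROUP BY SSN
)
SELECT  f2.SSN, f2.PBO, f2.ConDate, f2.RecordDate
FROM    Funded f2;
-- In the real query the CTE is referenced from both halves of the Union, e.g.:
--     left join Funded f2
--         on t.SSN = f2.SSN
--         and ss.SubmitDate < f2.ConDate

[/codesyntax]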

Now on to that Union statement.  I modified the Union to be a Union All.  Without specifying the All, a Distinct is performed across the combined result set.  I needed to determine first if I truly needed to perform a Distinct operation, or if I could use the Union All statement without altering the result set.  Testing both versions, I ended up with the same result set.  Based on this I am pretty confident that I can proceed with using Union ALL, producing another time savings of about one minute on this query.  This is good news.

Correlated Sub-Queries

Next up is the use of correlated sub-queries.  This query uses a correlated sub-query on each side of the Union statement.  The correlated sub-query looks like the following for the top half of the statement.

[codesyntax lang=”sql”]

[/codesyntax]

I recently had to add the last 5 campaigns to this list.  I see a couple of problems with this.  First off, it is cumbersome to edit code to fix these hard-coded values; some code somewhere will likely get missed.  The second pain point was the slowness this caused my query.  The resolution was to create a “Control” table to which these values would be added.  The code would be corrected in all locations to use a Join instead of the correlated sub-query.  What does this buy me?  My performance now becomes very consistent and is less than 20 seconds without any further indexing.  One more thing to note here is that the IO Stats are going to be different for multiple reasons at this point.  The original IO Stats above only show the first four campaigns listed in the correlated sub-query; I am currently working with the entire set of campaigns.

The replacement for the correlated sub-query is an Inner Join back to the control table.

[codesyntax lang=”sql”]

[/codesyntax]

This is the top half of the Union Select statement.  The bottom half is slightly different: the control table has an additional field to help designate the type of campaign.  This means less code rework and easier maintainability on my part.  If I want to exclude a campaign from the query, I simply remove it from the table.
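
As a hedged sketch of the idea (the control table definition, its column names, and the CampaignID column on the funded table are all hypothetical here, not the actual schema):

[codesyntax lang=”sql”]

-- Sketch: a control table replaces the hard-coded campaign list
CREATE TABLE dbo.CampaignControl
(
    CampaignID   INT          NOT NULL PRIMARY KEY,
    CampaignType VARCHAR(20)  NOT NULL   -- used by the bottom half of the Union
);

-- The correlated sub-query becomes a simple Inner Join back to the control table
SELECT  f.SSN, f.PBO, f.ConDate
FROM    dbo.DOE_CONCampaign_Funded f
        INNER JOIN dbo.CampaignControl cc
            ON cc.CampaignID = f.CampaignID   -- CampaignID on the funded table is assumed
WHERE   f.ServicerID = 1234;

[/codesyntax]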

Extraneous Join Conditions

In this query, date fields are used in the Outer Join clauses.  The dates are used for comparisons only and needed to be tested for their contribution to the query.  At this point the testing is simple: run the query as is, then rerun it with each of those date comparisons commented out of the Join clauses.

As-is query

[codesyntax lang=”sql”]

[/codesyntax]

Changed query

[codesyntax lang=”sql”]

[/codesyntax]
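
Using the join fragment shown earlier purely as an illustration (it is only a fragment, not a runnable query), the test amounts to commenting out the date comparison and re-running:

[codesyntax lang=”sql”]

-- As-is: the date comparison participates in the outer join
left join Funded f2
    on t.SSN = f2.SSN
    and ss.SubmitDate < f2.ConDate

-- Changed: the date comparison commented out for the test run
left join Funded f2
    on t.SSN = f2.SSN
    --and ss.SubmitDate < f2.ConDate

[/codesyntax]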

I was a little surprised to see that both forms of the query returned the same result set.  In light of that, the fact that there appears to be no visible performance gain from eliminating these Join conditions is not astonishing to me.  Thus I can leave them or eliminate them; they neither add to nor subtract from the query performance for my current purposes.  I decided to remove them anyway.  At this point, the IO Stats remain unchanged from the previous sampling.  I would now need to examine indexing to improve performance.

Conclusion

This exercise is yet another one of those opportunities to performance tune a process.  This one had been left alone because it “was working” and people wanted to leave well enough alone.  When processing takes that long, well enough is not good enough.  The systems need to be able to move on to other processing tasks and use resources in a more efficient manner than running the same process for in excess of 12 minutes (24 minutes today, during testing to get data for this article).  This is only a piece of the puzzle in tuning this particular process.  Among other adjustments, this query was tuned, altered, put into a stored proc, and the nolock hints were removed.  In the end, it returns the correct data set and does it in 1/72nd of the time it took earlier today and 1/36th of the average time.  I would say that is a substantial cost savings.
