Late work night, business dinner, and thus really late to the gym to get some exercise.
33 minutes: 13 minutes at high intensity (HR 185 BPM), 20 minutes at moderate intensity (HR 160 BPM).
Finally back to the Gym.
32 minutes at a relatively high intensity (7min miles) / HR 174 BPM. And then lower intensity at 143 BPM.
Feels good once again.
I just found out that Michelle Ufford (@sqlfool) has updated her index defrag scripts. Right now she is looking for beta testers. Anybody who is interested can find more info at:
In a previous post, I questioned how much Twitter could really do for me. After some comments made, and some other blog posts that I have read – I decided to take the dive.
Andy Warren and Glenn Berry both blogged about their Twitter experiences after my last post on the subject. Both bring up valid points that make sense to me. Other people have also mentioned good reasons to tweet. The biggest reason I can find is that it can help build your brand. I am trying to start a blog (well, continue it at this point), increase viewership/readership, and create my personal brand. One comment helped me realize another aspect of Twitter – the social network integrates into blogs quite nicely. Another person told me he was going to spread the word on Twitter about this blog – that is pretty cool. So if others will spread the word for me, why shouldn’t I use the same tools to help myself out too?
So the only thing left to do now is to mention that I have updated my blog to integrate with my Twitter account, added an icon you can click to follow me, and that my Twitter handle is sqlrnnr.
In Part I, I discussed some of the peculiarities and troubleshooting done in relation to a peculiar execution plan. In this post, I will discuss some of what I did to further tune the query.
First, a little more background on the design of this query. It uses dynamic SQL to build a dynamic WHERE clause based on the value passed to a single parameter. I need to be able to create three different WHERE clauses that essentially create a range scan on a date field. Two of the three could require an inequality date comparison rather than an equality comparison. The specific requirements are to have the report return one of the following: 1) 2010 data only, 2) pre-2010 data, or 3) all data. This was initially accomplished with a query similar to:
[code lang="sql"]
Case
	When convert(varchar(4), @RecordDate, 112) = '2010'
		Then ' convert(varchar(4), RecordDate, 112) = ''2010'''
	When convert(varchar(4), @RecordDate, 112) < '2010'
		Then ' isnull(RecordDate,''1/1/2000'') < ''1/1/2010'''
	Else ' isnull(RecordDate,''1/1/2001'') > ''1/1/2000'''
End
[/code]
One of the first comments might be, “That query is not SARG-able.” True, it is not SARG-able, but that will be the last item to be evaluated. The slowness of the query in this case has less to do with SARG-ability than it does with the inequality scan. The puzzle was how to make the comparison an equality rather than a < or > comparison. The above query uses the indexes as discussed in the prior post – it is just not as fast as it could be.
It is at this point that I am reminded of an integral part of my last two presentations at my local PASS meetings, and it dawns on me that an understanding of the data is truly necessary. Most of the RecordDate values are NULL, since there was no backfill on the column after its implementation. Prior to its implementation there was no date control on the table. The table is in a warehouse and only contains data since the beginning of 2009. It is also important to understand that there is minimal chance of record additions to the table after 2010. Thus we now have a method to get the equality comparisons. The following might satisfy this need:
[code lang="sql"]
Case
	When convert(varchar(4), @RecordDate, 112) = '2010'
		Then ' convert(varchar(4), RecordDate, 112) = ''2010'''
	When convert(varchar(4), @RecordDate, 112) < '2010'
		Then ' dateadd(yy, datediff(yy, 0, isnull(RecordDate,''1/1/2009'')), 0) = ''1/1/2009'' '
	Else ' isnull(RecordDate,''1/1/2009'') = isnull(RecordDate,''1/1/2009'') '
End
[/code]
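As an aside, here is a rough Python sketch (illustrative only – this is not part of the actual report code) of what the dateadd/datediff arithmetic does: it truncates any date to January 1 of its year and substitutes a default for NULL, which is what turns the range comparison into an equality.

```python
from datetime import date

def year_start(record_date, default=date(2009, 1, 1)):
    """Truncate a date to January 1 of its year, defaulting NULL (None)."""
    d = record_date if record_date is not None else default
    return date(d.year, 1, 1)

# A NULL RecordDate and any 2009 date both truncate to 2009-01-01,
# so the predicate can be an equality comparison instead of a range scan.
print(year_start(date(2009, 7, 15)))  # 2009-01-01
print(year_start(None))               # 2009-01-01
```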
This change now produces a clustered index seek, and that seek is the most costly piece of the execution plan. However, the overall runtime of the query has improved by one second.
Thinking I could still improve the query, I decided that a new column called ReportYear would help the cause. The change would allow the query to be SARG-able; the column would have a default on it that stores the first day of the year recorded in the RecordDate field. I figured this idea would also still permit me to do an equality search in my WHERE clause. Let’s see the results:
Wow, I have just come full circle back to the first post in this series. I just lost the clustered index seek, but I did not lose any of the performance. To see if I gained anything by making the query SARG-able, let’s look at the IO stats between the two.
The top image represents the first query and the bottom image represents the second. We have an improvement of about 600,000 logical reads – a substantial improvement, even with an index scan on a non-covering index. Now, to tune the index. Much the same as in the first article, I added an index to the SalesOpportunity table on LastUpdate and SalesOppID. This creates a covering index and replaces the hash match join with a merge join in the execution plan. The effect on the IO stats is displayed below.
We can see here that there was once again an improvement in the logical reads – another ~10,000. Overall, both queries run in the same amount of time, but the second query uses fewer resources. Again, if I were to change the clustered index on this table, I am certain I would see a further improvement.
I am still in search of the reason behind the use of a bad index for this query.
Execution Plans can be downloaded here.
I ran into a peculiarly puzzling situation with a query I have been tuning of late. The query runs some aggregations and performs a range scan based on a date value passed to it. The primary consumer of this query is Reporting Services. The query executes (from a stored proc) three times in a single report. This is another case where the report currently outperforms the previous version by several fold. I just want it to run a little faster – it is an executive report, after all.
While tuning the query itself, I removed some of the aggregations (not consumed by the report and not needed for accuracy), and that bought me a small performance gain. Next I removed a couple of “or” operations. Again, that only bought me a small gain.
The next step was to check my indexing. Looking at the execution plan for the query, indexes were being used – or so it seemed. The query was performing two index scans prior to a hash match. In the first index scan, the query expected an output of two columns (SalesOppID and LastUpdate) from IDX_SalesOpportunity_SalesStatusIDSalesContactID. This is depicted in the next two images.
This makes a little sense, since the index does contain the LastUpdate column. But the index is not covering, and yet there is no key lookup.
[code lang="sql"]
CREATE NONCLUSTERED INDEX [IDX_SalesOpportunity_SalesStatusIDSalesContactID]
ON [dbo].[SalesOpportunity]
(
	[SalesStatusID] ASC,
	[SalesContactID] ASC,
	[ProspectID] ASC,
	[CampaignID] ASC,
	[LastUpdate] ASC
) ON [PRIMARY]
[/code]
Out of curiosity, I decided to see what would happen if I commented out the lines that required the LastUpdate field. I ended up with the following:
Interesting, the query optimizer decided that we should use IDX_LeadTypeID which has the following definition:
[code lang="sql"]
CREATE NONCLUSTERED INDEX [IDX_LeadTypeID]
ON [dbo].[SalesOpportunity]
(
	[LeadTypeID] ASC
) ON [PRIMARY]
[/code]
Once again, I have a query that is using a bad index. There is no key lookup and the index is not covering. I find this very peculiar, so I proceeded to test a little more. I dropped that index and reran the query. (It is important to note that with each group of tests I cleared the procedure cache.) After dropping the index, my execution plan showed no performance increase and still showed a funny result: the query moved on to a new index that was not covering, and there was still no key lookup.
Now, let’s make this a little more interesting. I do have a Clustered Index on SalesOppID – defined as follows:
[code lang="sql"]
ALTER TABLE [dbo].[SalesOpportunity]
ADD CONSTRAINT [PK_SalesOpportunity] PRIMARY KEY CLUSTERED
(
	[SalesOppID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
	IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
	ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
[/code]
I am puzzled as to why my query will not use the clustered index. I decided to do the standard stuff – DBCC CHECKDB, DBCC UPDATEUSAGE, etc. There is no corruption, and the query still behaves the same way. I could recreate the clustered index on the LastUpdate field, but since there are foreign key relationships that I do not want to drop and recreate, I will just create a nonclustered index on the appropriate two fields. After doing that, I get the following:
The index is covering and pulling the correct columns. With this index the query is about three times faster than before. Some of that is due to the change from a hash match to a merge join between the two tables (there is still a hash match, but it comes after the merge and is now less costly).
The moral is that even though an index may be used by the query, it may not be a good index choice. I am still working on tuning this query; I think I can squeeze better performance out of it.
Execution Plans can be downloaded here.
Recently I was asked to alter a report to pull different, more meaningful data. The particular report was pulling data from the data warehouse. It had recently been modified to use a stored procedure (and subsequently improved performance tenfold). Due to this change, the report started displaying differently: it was now paginating on the different matrices. Due to this minor display change, the creative wheels started turning and new requirements started developing from the real needs of the users of the report.
Through the quick review process, the requirements evolved from displaying current-week and previous-week data to displaying data grouped by pay period. If possible, the goal would be to correlate the data to specific pay periods; if that proved too complex or too time-consuming, then sorting by weeks would be adequate. Correlating the data to current and previous weeks would be rather simple – the data was already present, and the code was conducive to making this correlation. So as a milestone, this task was completed first, as quickly as possible. Doing this also made it possible to jump more quickly onto the more puzzling requirement – a desirable, yet unnecessary, requirement requested by the business.
Based on this, I came up with some enhanced requirements that would better define this request.
When reviewing these requirements, I was puzzled as to how to accomplish the task. I did not want physical structures that required continued maintenance and pay period updates. I would do that if it was absolutely necessary. I was hoping to achieve something that required little-to-no maintenance, was fast, accurate, and provided the end-user the desired results.
Thinking about it for a bit, I came across a few ideas, but each evaporated when I found a flaw in it. Soon a quick way to do it dawned on me: I could use a recursive CTE, one known good pay period start date, and some date logic. Date logic by itself did not seem sufficient, since the pay periods were for specific ranges.
Thus I came up with the following CTE:
[code lang="sql" smarttabs="true"]
With periodstarts (StartID, StartDate, EndDate) As (
	Select 1 As StartID, @StartDate, dateadd(wk, datediff(wk, 0, @StartDate) + 2, 0) - 1
	Union All
	Select StartID + 1,
		DATEADD(wk, DATEDIFF(wk, 0, StartDate) + 2, 0) As StartDate,
		dateadd(wk, datediff(wk, 0, EndDate) + 2, 0) - 1 As EndDate
	From periodstarts
	Where DATEADD(wk, DATEDIFF(wk, 0, StartDate) + 2, 0) > StartDate
		And StartID < 105
)
-- Note: the consuming query needs an OPTION (MAXRECURSION 0) hint (or a value
-- above 104), since this recurses past the default limit of 100.
[/code]
This CTE gives me the ability to create four years’ worth of pay periods on the fly. There are likely other ways of accomplishing the same task, but this one suits me very well. To build the table, I start with a known valid pay period start date. From there, I can create the start and end dates of each of the pay periods over the next four years. To calculate the dates for each period beyond the initial seed date, I used a method shown by Lynn Pettis in one of his blog posts. In my scenario, each pay period is two weeks; to calculate the ending day of a pay period, I just subtract one day from the result of adding two weeks to the start date. Verifying the data, I can see that I have start and end dates that correlate correctly to the pay periods.
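As a sanity check of the logic, the series the CTE generates can be sketched outside of T-SQL (Python here, purely for illustration; the seed date below is a hypothetical example, and it is assumed to be a valid period start):

```python
from datetime import date, timedelta

def pay_periods(seed_start, count=104):
    """Generate biweekly (id, start, end) pay periods from a known-good seed."""
    periods = []
    start = seed_start
    for period_id in range(1, count + 1):
        end = start + timedelta(days=13)   # two weeks minus one day
        periods.append((period_id, start, end))
        start = end + timedelta(days=1)    # next period starts the following day
    return periods

periods = pay_periods(date(2010, 1, 4))
print(periods[0])  # first period: 2010-01-04 through 2010-01-17
```

Each period's end date is the start date plus two weeks, less one day, exactly as in the CTE; four years' worth is 104 periods.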
The next step was to integrate the above into the query and thus be able to correctly assign data to either the previous pay period or the current pay period. I was able to accomplish this through two left joins, and this was the tricky part. I initially created the CTE with only start dates and no end dates. This proved more difficult than I desired: the table joins started getting too complex and convoluted for what I had envisioned. I decided it would be much simpler to also include the EndDate in the CTE, drastically improving readability and ease of design for the query. That tricky part was now mostly overcome, and I was able to associate some of the records. However, I was getting stumped on the previous-pay-period records. After trying a few things, I realized how easy the fix was: a simple change to use ISNULL in the second left join resolved the issue.
So now, my Join code is something like this:
[code lang="sql"]
INNER JOIN clc.dbo.employees e
	ON e.ntloginname = MIS.NTLoginName
	AND e.lc_flag = 1
	AND e.activeflag = 1
	AND e.AdminRights = 0
LEFT OUTER JOIN PeriodStarts PS
	ON MIS.RecordDate BETWEEN PS.StartDate AND PS.EndDate
	AND @CurrentDate BETWEEN PS.StartDate AND PS.EndDate
LEFT OUTER JOIN PeriodStarts Prev
	ON MIS.RecordDate BETWEEN Prev.StartDate AND Prev.EndDate
	AND IsNull(PS.StartDate, 0) = 0
[/code]
Since the only tying factor between my data and the pay periods is a RecordDate, I needed to compare the RecordDate to the StartDate-to-EndDate range. This works better than I had expected. As expected, I incur some cost to create the “PayDay” table on the fly via the CTE. I also take a hit for the date comparisons, since I can only compare on a range and not do an actual equality. The query executes across two databases (one SQL 2005, the other SQL 2000) and returns in about 200ms, without any index tuning.
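The logic of those two left joins, sketched in Python for illustration (the period ranges below are hypothetical examples, not real pay periods): a record lands in the “current” bucket when its RecordDate and the report’s current date fall inside the same range; a match against an earlier range marks it as “previous”.

```python
from datetime import date

# Hypothetical pay period ranges, standing in for the CTE's output.
periods = [
    (date(2010, 1, 4), date(2010, 1, 17)),
    (date(2010, 1, 18), date(2010, 1, 31)),
]

def classify(record_date, current_date):
    """Mirror the two LEFT OUTER JOINs: bucket a record by pay period."""
    for start, end in periods:
        if start <= record_date <= end:
            return "current" if start <= current_date <= end else "previous"
    return None  # RecordDate falls outside any known period

print(classify(date(2010, 1, 20), date(2010, 1, 25)))  # current
print(classify(date(2010, 1, 10), date(2010, 1, 25)))  # previous
```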
If I tune indexes in one of the tables (93% of the total query cost comes from this table), I expect to see some improvement. Since the table only has a clustered index, I started by creating an index on NTLoginName, RecordDate, Product, LVCRequested, and ProspectID. I know I didn’t divulge the entire query, so some of this is coming out of the blue; however, those fields were in the SQL 2000 database and were required outputs for this query. By adding a new nonclustered index, I was able to turn the clustered index scan into an index seek. For this part of the query, that reduced the cost from 93% to 33%. Logical reads on the table dropped from about 5,000 to 76 – another substantial savings. Total execution time is down to about 140ms.
All in all, this is a good solution for the requirements at hand. Revisiting the self-defined requirements:
The only requirement that may be questionable is #3. I do need to pass a date into the proc to make this whole thing work, and that date must be a known good pay period start date. However, I have set a default so that the date will populate to one that I know is good. #2 is achieved, since I did not create a permanent physical lookup table. The nice takeaway from this exercise has been the improvement in the query once again. Though the query is doing something a bit more difficult than before, performance is better. It was also a nice exercise in thinking outside the box.
Here is the full execution plan of the revised query.
In the spirit of this section (Running), I will be posting my exercise along with any other Running related material I deem appropriate.
Today I made it to the gym during lunch. It was a short session – but well worth it.
32 minutes Treadmill and Elliptical @ HR of 172 (about)
I feel much better having done it.
In a day and age when information abounds and social networking is the only way to communicate (a bit of an exaggeration, I know), it seems to be a rarity when one does not conform. I just happen to be one of those rarities.
I don’t currently find the need to tweet. I don’t know how I would find any more time in the day to add one more thing to do – particularly when tweeting seems like it would be time intensive for any purpose I would deem worthwhile. I tried the Facebook thing for a bit and found it to be just a gossip network. I do participate in LinkedIn, which works nicely for me because it is a professional network. Tweeting seems like it could go either way – gossip or professional. Twitter does seem a bit more difficult to integrate into the work environment than LinkedIn and could prove to be a big distraction.
Tweeting could be a useful activity and may enter my tool belt sometime in the future – but not until it seems more conducive to my current activities. Having recently started my blog, I am finding that there is a substantial time requirement to get it up and running, tweak it, and then write the posts. Some posts are quick and easy, while others take more time and preparation. Just the other day, my wife (Krista) called this blog my “other child,” due to me splitting my free time between the blog and the children.
Also recently, Krista commented on my priorities. She said that I didn’t even have myself listed among them. If I don’t have time for myself in my own set of priorities, how could I possibly add another time detractor to the list? She also said that my priorities seem like 1) work, 2) work, 3) work – but then she said she was joking. And she was. That doesn’t make it any easier to swallow. I try to keep things well rounded between work, family (wife and children), church, community, and service. The work, work, work does have some validity to it, since I am also trying to get my personal business going alongside full-time employment.
I imagine someday that only one of those two jobs will exist and there will be more time for tweeting. I do think though, that it would be used mostly as a professional tool – even with the limitations that tweeting does exhibit.
I am curious to know how other DBAs who tweet manage it. How do you find tweeting integrates professionally?
Ok, so it’s not until November – but I am leaving on a Jet Plane nonetheless. It will have been four years since the last time I attended PASS Summit. This year, my ticket is already punched.
Missing this Conference each of the last few years has stung quite a bit. I really enjoyed the Summit I was able to attend, and I learned a lot. Getting to Summit 2010 has not been an easy task. It took a lot of persuasion, negotiating, and begging (sort of).
Attending PASS Summit is worth the expense, and worth the effort of getting the employer to help foot the bill. Not only will the attendee benefit from this conference, but the employer stands to gain substantially from it as well. The 2009 Summit published an ROI chart to help explain some of the benefits (the 2010 ROI is here).
There are quite a few people I look forward to meeting (Steve Jones, Grant Fritchey, Gail Shaw, Jeff Moden, Paul Randal, Bob Hovious, and other SSCites, to name a few). I hope they will be in attendance. There is a great deal that can be learned from these people and this conference.
Between now and then, I will be watching for the schedule and speaker lineup. I want to get a feel for the topics that will be presented and have my seminar preferences nailed down well in advance.
After the Summit, I hope to be able to share enough knowledge at work to prove the ROI. I also intend to take what I learn back to my local PASS user group, so they can benefit from it as well.
I hope to see you there (at PASS Summit 2010).