DBA MD

Categories: News, Professional
Tags:
Comments: No Comments
Published on: February 24, 2010

I just had the opportunity to spend some time with my four-year-old son at the doctor’s office. It was a painful experience, for my son as well as myself – but for different reasons. He was playing on the monkey bars, lost his grip and fell. A natural reaction when falling is to put out your hands to catch yourself (which in many cases is not the best idea, but we do it anyway), which he did. For him, the pain began there when he injured his wrist. He wasn’t in writhing pain, and he was able to handle it without ibuprofen or Tylenol. Since he was still complaining about it after an hour or so, we called the doctor to get an appointment.

We arrived just before our appointed time (I left work to help out, expecting the visit to be reasonable since we had an appointment) and took our place in the empty waiting room after signing the register. Forty minutes after our appointment time, I finally asked for an estimate. My pain had already begun, and my son’s pain was becoming more evident through the wailing and whining. Finally we were called back, and they believed that he was in pain. The family practitioner was nice and professional. She seemed quite knowledgeable and seemed to care that he was in pain. “Let’s get some x-rays taken,” she said. She then proceeded to instruct us to go to the hospital to get the x-rays. What? No x-ray machine at the office. Ok, so not all that unheard of, but I had become accustomed to being able to get x-rays at our previous doctors’ offices.

Down to the hospital we went.  We had to fill out all the paperwork again, sit and wait again, and pay another co-pay (that is ridiculous).  Once again the waiting room was empty and we had another forty minute wait.  My son did great while getting the x-rays, but his patience was long expired by this point (mine was too but I hid it better).  It took us all of two minutes to get the x-rays taken and be on our way back to the doctor’s office.

Once again we were waiting. This time there was one person in the waiting area, and we sat for another 30 minutes. When we finally got called back again, we were in the room for a total of 10 minutes before we were headed home for the day. By then we had an appointment to see the orthopedic specialist on the other side of town – that same day at 13H00. And then we were told we would need to get a copy of the digital x-rays that had been taken (why we had to get a copy of the films when they were digital – who knows). Thus another trip to the radiologist.

While waiting all this time, I pondered on what it would be like in the DBA world if we were like doctors.  Imagine using cursors and loops that cause long time delays, and being specialized in what we do.  The DBA department could have a stored procedure writer, an index creator, a schema creator, a performance tuning specialist, a report writer, a corruption specialist, etc.  In the medical profession it is somewhat necessary.  Is it necessary in the database world?

Envision the scenario where a business user comes to the department requesting that a report be written. The report writer couldn’t do it by himself. He would need to spec out the requirements and submit them to the proc writer. The proc writer would write the stored procedure and then submit it to the performance tuning specialist, who would invoke the indexing specialist. Then the proc could be given back to the report writer, who would work on producing the report based on the proc. If a problem were encountered, the report writer would iterate through the loop of the DBA department until the report was working as desired. Then, finally, it would get back to the end user. This is a bit how I felt with the doctor. I felt like I was stuck in a loop that was deadly slow and consumed far too many resources.

That kind of system doesn’t convey information sharing and teamwork to me. There are plenty of cases where having people who perform different functions within the database team is necessary and efficient. In those systems there is usually knowledge sharing and cross-training. A report writer may write a stored procedure and performance tune it himself in some cases – which keeps things reasonably efficient. I just hope that the DBA profession does not become hyper-specialized like the medical field, such that huge delays and endless loops are forced upon the end user.

Why All The Fuzz?

Categories: News, Professional
Comments: 4 Comments
Published on: February 22, 2010

I really must thank Steve for his editorial on FizzBuzz. It seemed like a really good topic for some testing and comparison. Then the comments started rolling in on the topic, which provided more information and ideas to use for this article. As trivial as it may seem, the topic brings up some very pertinent information for us in the Data Bizz. The primary objective of this article is to compare and contrast methods and hopefully provide evidence for using one method over another. The underlying objective is performance tuning your code.

FizzBuzz is a trivial thing to dispute. The objective is to weed out those who can from those who can’t at the most basic level. However, with the FizzBuzz, the interviewer has the opportunity to get more insight into the prospective employee. Before delving into the technical aspects, ponder the non-technical aspects first. This sort of test helps to determine whether the candidate can quickly assess requirements and is willing to gather more information, and it offers a glimpse (though a very minimal one) of organizational skills, mettle under pressure, and character. Keep in mind that all of these things are only a piece of the puzzle. There still remain the technical skills and “team fit”.

This FizzBuzz requires that one write some code to generate a list of numbers. The list will have multiples of 3 replaced by Fizz, multiples of 5 replaced by Buzz, and multiples of both 3 and 5 replaced by FizzBuzz. Those are the base requirements. The unstated requirements are the ones that top-tier candidates will accommodate instinctively: performance, scalability, maintainability, and a set-based approach (since this is TSQL and SQL Server is optimized for set-based coding). Sometimes this test is implemented with a requirement to print out the results. For my purposes, I will take that to simply mean display the results; the methods for the two would be distinctly different.

I will explore a few different methods that may be used in achieving the results.  Some answers are better than others.  The first is a nice long example by Gus (posted in the comments for the SSC editorial).

[codesyntax lang=”tsql”]

[/codesyntax]

This was created in jest. However, there are some important things to note in the code. First, the use of a table variable is not necessary, and that table variable carries an additional attribute that is unnecessary even if one decided to use a temp table or table variable. That may be a minor thing, but it can be considered sloppy coding, inattention to detail, and poor use of resources. Second, a while loop is used to populate that table. Third, a cursor is used to loop through the numbers in the table to do the comparison for the FizzBuzz test. The script was also hard-coded with the number of records to use; to make it scalable and maintainable, one should parameterize those values. These criticisms are nothing new to Gus. It is important to note that he coded this query in this fashion intentionally, and he pointed out some of the criticisms himself.
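Gus’s full listing is long, so here is a compressed sketch of the same anti-pattern – my condensed illustration, not his code – showing the table variable with its extra attribute, the while loop used to populate it, the cursor used to evaluate each row, and the hard-coded record count:

[codesyntax lang=”tsql”]

DECLARE @Numbers TABLE (Num INT, Filler CHAR(1));   -- the extra attribute is not needed
DECLARE @i INT;
SET @i = 1;

WHILE @i <= 100                                     -- hard-coded record count
BEGIN
    INSERT INTO @Numbers (Num, Filler) VALUES (@i, 'x');
    SET @i = @i + 1;
END

DECLARE @Num INT;
DECLARE NumCursor CURSOR FOR SELECT Num FROM @Numbers;
OPEN NumCursor;
FETCH NEXT FROM NumCursor INTO @Num;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one SELECT (and one execution plan) per iteration
    SELECT CASE WHEN @Num % 15 = 0 THEN 'FizzBuzz'
                WHEN @Num % 3  = 0 THEN 'Fizz'
                WHEN @Num % 5  = 0 THEN 'Buzz'
                ELSE CAST(@Num AS VARCHAR(10)) END AS Result;
    FETCH NEXT FROM NumCursor INTO @Num;
END
CLOSE NumCursor;
DEALLOCATE NumCursor;

[/codesyntax]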

Will this code work?  Yes it will.  It produces the following IO stats and a looping execution plan (1 plan for each iteration).

For 100 records, this takes roughly five seconds to run on my server. 100 records is a very small test, so larger tests are needed to gauge performance and scalability. Scaling up to one million records, this query was still running after six hours.

Next up is the following query:

[codesyntax lang=”tsql”]

[/codesyntax]

This version loops through the numbers, assigns the value to be printed, and then prints it. See what happens when the counter hits 15? We end up assigning a value and then overwriting that value. This version also just prints the result rather than returning a result set. For 100 records the execution is not bad. There are no IO stats and no execution plan. However, running this for one million records takes forty-five seconds. If I needed a larger result set, I would also need to be careful of the INT variable overflowing. This one would also need a change in order to use a variable for the number of records to build.
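A sketch of this style (my reconstruction, not the original posting) shows both the print-only output and the overwriting of the assigned value at multiples of 15:

[codesyntax lang=”tsql”]

DECLARE @i INT, @Output VARCHAR(10);
SET @i = 1;

WHILE @i <= 100                              -- hard-coded; a variable upper bound would improve maintainability
BEGIN
    SET @Output = CAST(@i AS VARCHAR(10));
    IF @i % 3 = 0  SET @Output = 'Fizz';
    IF @i % 5 = 0  SET @Output = 'Buzz';
    IF @i % 15 = 0 SET @Output = 'FizzBuzz'; -- at 15 we overwrite the value we just assigned
    PRINT @Output;                           -- printed, not returned as a result set
    SET @i = @i + 1;
END

[/codesyntax]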

Next is a pretty nice looking CTE version.  There are many renditions of this one.

[codesyntax lang=”tsql”]

[/codesyntax]

This query is far better than the previous queries. Though we use a CTE here, we are still using procedural programming – this is a recursive CTE. The code is also lacking in maintainability and scalability. Of lesser magnitude is the use of OPTION (MAXRECURSION 100) – just a little trick that should be noted. The recursive definition has a limiting where clause, making the MAXRECURSION hint unnecessary for so few records. Since this performs so well with 100 records, let’s proceed to the one million test. Here is where we run into another trick that must be used: MAXRECURSION needs to be specified with a 0. Doing this permits the query to reach one million. The query now takes 15 seconds to complete. Here are the IO Stats and Execution plan.
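A typical rendition of this approach, adjusted for the one million record test, might look something like the following sketch (mine, not one of the posted versions):

[codesyntax lang=”tsql”]

DECLARE @Limit INT;
SET @Limit = 1000000;

WITH Nums (Num) AS
(
    SELECT 1
    UNION ALL
    SELECT Num + 1
    FROM Nums
    WHERE Num < @Limit                    -- the limiting where clause in the recursive member
)
SELECT CASE WHEN Num % 15 = 0 THEN 'FizzBuzz'
            WHEN Num % 3  = 0 THEN 'Fizz'
            WHEN Num % 5  = 0 THEN 'Buzz'
            ELSE CAST(Num AS VARCHAR(10)) END AS FizzBuzz
FROM Nums
OPTION (MAXRECURSION 0);                  -- 0 lifts the default limit of 100 recursion levels

[/codesyntax]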

Now that I have shared some procedural based methods to accomplish this “simple” task, let’s explore some of the potential set-based methods.  The first uses a windowing function to create our number set.

[codesyntax lang=”tsql”]

[/codesyntax]

First thing to note is the reduction in code here. The solution is very simple and runs very rapidly. By using a cross join against sys.columns we are able to create a result set. This cross join has limitations. Not all sys.columns tables are created equal, and thus we would have to add another cross join in order to test one million records. Also, note that the FizzBuzz test was changed to use modulo 15. That was also documented in the short note at the end of the line. Thus the alterations for the query are as follows (to test one million records).

[codesyntax lang=”tsql”]

[/codesyntax]

This query will return in ~14 seconds.  It is a set-based solution.  It is not quite yet to that upper echelon though.  Some modifications need to be made to make it scalable and maintainable.  Also note that I am still employing the use of sys.columns.  There are better methods for that.  One such is to use master.sys.all_columns.  Another is to use a Numbers table.  Before going into the use of the numbers table, here are the IO stats and exec plan.
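The general shape of that windowing-function approach, scaled for the one million record test by adding the second cross join, is roughly the following (a sketch under those assumptions, not the exact query):

[codesyntax lang=”tsql”]

DECLARE @Limit INT;
SET @Limit = 1000000;

SELECT CASE WHEN N.Num % 15 = 0 THEN 'FizzBuzz'    -- modulo 15 covers multiples of both 3 and 5
            WHEN N.Num % 3  = 0 THEN 'Fizz'
            WHEN N.Num % 5  = 0 THEN 'Buzz'
            ELSE CAST(N.Num AS VARCHAR(10)) END AS FizzBuzz
FROM (SELECT TOP (@Limit)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Num
      FROM sys.columns c1
           CROSS JOIN sys.columns c2               -- the second cross join added to reach one million rows
     ) N;

[/codesyntax]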

Now for the numbers table.  I will jump straight to the one million records testing.  The query is not substantially different than the previous query.

[codesyntax lang=”tsql”]

[/codesyntax]

The runtime for this query is about the same as the previous query.  IO Stats and Exec Plan show a slightly different story.  Still, note that this query is not quite the scalable or maintainable query I would like to see.

As you can see, Logical reads increased, and the execution plan is slightly better.  The cost of this new query is lower and makes it a little more desirable – despite the higher logical reads.
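Assuming a pre-populated numbers table (dbo.Numbers with a Num column is a hypothetical name here), the query reduces to something like this sketch:

[codesyntax lang=”tsql”]

DECLARE @Limit INT;
SET @Limit = 1000000;

SELECT CASE WHEN n.Num % 15 = 0 THEN 'FizzBuzz'
            WHEN n.Num % 3  = 0 THEN 'Fizz'
            WHEN n.Num % 5  = 0 THEN 'Buzz'
            ELSE CAST(n.Num AS VARCHAR(10)) END AS FizzBuzz
FROM dbo.Numbers n                        -- hypothetical numbers table, already populated to one million rows
WHERE n.Num <= @Limit;

[/codesyntax]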

We have now progressed to a point where we can start playing a bit. The idea now is to fine tune and tweak what I have to see if it can be made better. The remainder of the queries and testing will be in the next article. There is still plenty to cover. 🙂

I’m givin’ ‘er all she’s got!

Categories: News, Professional
Comments: No Comments
Published on: February 21, 2010

Or am I?

As I proceed down the path of a consolidation project, I have taken time to pause and ponder the approach. One driving goal is to shrink our footprint. Another is to improve performance and make better use of hardware and software resources. I am looking to make sure the appropriate consolidations are made while leaving servers as-is where that is the better choice.

From time to time we have the opportunity to choose between distributed processing and a more centralized approach.  In my experience, this decision has come about as a means to improve performance.  Distributed processing is a viable option when done correctly.  I have seen this work in the past and I have seen it fail miserably.  I have had the opportunity of reversing these types of solutions via server consolidation on two occasions.

What are some considerations when deciding to employ a distributed model or even a consolidated model?  Denny Cherry has a good article about the considerations for moving to a consolidated model.  You can read that article here.  I think some of those very considerations should be used when electing to not consolidate.  Both sides of the equation must be evaluated.  If you can meet the criteria, then consolidate.  If not, then have some very good reasons to consolidate.  Conversely, if looking to distribute, there must be some very sound reasons as to why one would elect that option.

To illustrate why not to distribute, I will share the two experiences I mentioned.

Experience 1

Performance for the primary database server had become horrendously unacceptable. On a server that hosted a million or more users a day, something had to be done. Complaints from the client were common and frequent, and the problems were noticeable from in-house processes. The decision was made to distribute processing to three servers. Two of the servers were “report” servers and the third was the primary transaction server. Transaction replication was used to copy the data to the report servers. The application was configured to use the various servers through various means. Each of the servers had linked servers created to help process requests from the application. It was common for queries to join tables across the linked servers to produce the results necessary. After these changes, the complaints about the application timing out and performance being horrendous were just as common and frequent.

When I started, the first thing to do was to start tuning queries. I had gotten the run-down on how bad things were and started implementing a plan to correct the problems. All of the servers involved in this process maintained >= 70% processor utilization at all times. If a process or query had gone awry, then we would see spikes to 100%. Each server was running SQL Server Enterprise Edition with databases in SQL 2000 compatibility mode.

Experience 2

As luck would have it, this was a very similar situation to the one just described. This scenario is different in that the number of users was far fewer. The number of servers involved, though, was considerably greater. Each server was running Enterprise Edition. There was some mix of SQL 2000 and SQL 2005. One major difference was the level of user access. Some power users were known to have access to the databases with the ability to write their own queries. On a regular basis they would grind the server to its knees. In cases like this, new servers were frequently deployed to try to counteract the problem.

Another big difference between the two is the lack of replication in this experience – at least not a traditional replication method. Databases were backed up and restored to two different servers from the production databases on a daily basis. The benefit here is that the backups get tested daily. That should be done anyway, but implementing it as a means to improve performance is not the wisest decision. It proved to be quite costly.

Solution

In both cases, the solution was very much the same. Performance tuning had been neglected, and it was the key to reducing costs. What performance tuning was necessary in these cases? In both cases, the lack of clustered indexes was rampant. Both cases employed the use of nolock as a means of performance tuning their queries. In each case where nolock was used, the query was analyzed and tuned to perform several times faster. These are some pretty basic techniques to use in order to minimize cost and improve performance. Throwing hardware at the situation is not always the best solution.
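As a quick illustration, one way to spot the first problem (tables with no clustered index) is to look for heaps in the catalog views on SQL Server 2005 and later – a minimal sketch:

[codesyntax lang=”sql”]

-- List user tables that have no clustered index (heaps)
SELECT SCHEMA_NAME(o.schema_id) AS SchemaName,
       o.name                   AS TableName
FROM sys.indexes i
     INNER JOIN sys.objects o ON o.object_id = i.object_id
WHERE i.index_id = 0            -- index_id 0 means the table is a heap
  AND o.type = 'U'              -- user tables only
ORDER BY SchemaName, TableName;

[/codesyntax]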

In the first scenario, employing these simple techniques, we were able to reduce processor utilization to 35% with peaks to 55%. That was prior to removing the replication and linked servers and consolidating the servers into one server (thereby reducing cost). After merging all three servers into one, we saw the one server use 15% processor on a persistent basis with spikes to 35%. That change speaks to the overhead of the cross-server joins and replication. It was a massive improvement over the usual 100% processor utilization.

In the second scenario, simply combining the servers back into one, tuning the queries, and implementing controls on end-user queries run directly against the server was all that was needed. This amounted to a reduction of about ten servers. Included in that reduction is the reduction in Enterprise licenses.

Conclusion

So when should I not distribute the processing across servers?  Do not employ a distributed processing approach when performance tuning has not been done.  Do not employ a distributed processing approach when it is political in nature (i.e. end-users who write their own queries).  Employing a distributed method under either of these two scenarios is only driving up cost.  Furthermore, it becomes a management headache for the database team.  The team must maintain the processes involved to keep the distributed approach running.  This can also render a database team less efficient due to more work.

If it is appropriate and due diligence has been done, then go ahead and employ a distributed processing methodology. As long as the work has been done to get the databases finely tuned, keep the customer happy, and abide by policy – there is nothing wrong with it.

It is the wrong approach when there has been no forethought, no foresight, no planning, no due diligence, or just a general wasting of money. Simply throwing hardware at a problem is not a good solution. These “wrong” methods are just a few ways to try to mask the real problem, and masking the problem never solves it. It takes time and troubleshooting patience to delve into performance problems. When they are solved, though, at no additional cost, it feels pretty good and looks pretty good for you. This is how a DBA can say “I’m givin’ ‘er all she’s got!” without it just being a quote.

May I help you?

Categories: News, Professional
Comments: No Comments
Published on: February 17, 2010

With the Vancouver games underway, I have been reflecting on the 2002 Winter Olympic Games. Those were the greatest winter games ever – with or without the figure skating scandal. Upon completion of those games, I set out to try to volunteer for the Torino games. The Vancouver games snuck up on me. So why were they the greatest games? Because I was a volunteer.

I had the privilege of being a French Linguist/Translator at the Medals Plaza.  Our tasks there varied from day to day.  But we got to see most of the medalists, talk to many of them, talk to dignitaries, and see some pretty good fireworks and concerts.  I thoroughly enjoyed it and would do it again and again if opportunity presented itself.  Being an Olympics Volunteer for games held in a country other than your home country can be quite pricey.  For those that are able to continue to volunteer internationally despite the cost – kudos.

Besides the Olympics, I have been pondering the act of “service.” One can give service in many ways, and many of them are far more subtle than the grandeur of the Olympics. Some find service opportunities through religion. Others find service through community involvement. Still others may perform service through professional affiliations, big events, or because a judge told them to do it.

I think that giving service through community, church, or relief efforts pretty much explains itself. How can one give service through professional affiliations? As a DBA, I am affiliated with PASS, SQLServerCentral (loosely), and my local Users Groups. I don’t get paid by any of these affiliations except in increased knowledge. What can one do through these affiliations?

I think it first needs to be understood that none of these professional communities can function without member participation. People throughout these communities give freely of their time, talent and skills to help each other. It’s not easy, and most have other responsibilities to tend to daily. I think I summed up how to participate quite nicely in my article about Users Group Participation. The same kinds of principles apply to PASS and forum participation. PASS is currently looking for volunteers for Summit, and I understand there will be more opportunities later to volunteer at Summit to help that event run smoothly. Forum participation can be performed at various levels. One can “troll” or one can actively engage in the forums for the benefit of the community.

The important thing to remember in all of these types of service is that it is voluntary. If you give of yourself freely to help improve your community, professional community, religious community or some other community – your rewards will not be flashy but will be well worth it.

DBA.Sleep(8hrs)

Categories: News, Professional
Tags:
Comments: No Comments
Published on: February 16, 2010

I have been pondering recently what helps me to sleep at night. Or, conversely, what prevents me from sleeping at night. This is different from the calls in the middle of the night or the cell phone buzzing wildly on the nightstand from database or system generated alerts. There are work related stresses and there are stresses unrelated to work that can contribute to how well one sleeps.

For me, a few things keep me restless through the night. Some of these might be an unsolved problem at work, a flood of thoughts just as I am trying to wind down for the night, or even stewing over (contemplating) a particular event of the day. How do I resolve these issues and get past the sleeplessness? I have to do something different for each one. I will discuss, in no particular order, the techniques that work for me (with relative success).

Solve the Problem

It sounds pretty simple and straightforward. It doesn’t matter how complex the issue truly is; I have to reach some level of satisfaction and resolution so that I can rest. I don’t like to leave things undone, unsolved, or at a less than desirable break point. My wife says I am like a dog with a bone about things like that. I like to be able to devote my attention to the moment. If a project or task is left in an undesirable state, I end up being distracted by its incompleteness.

Solving the problem can be as painless as jotting down some notes.  It can be as painful as not sleeping for a couple of days.  Whatever it takes, get your mind at peace with your tasks.  This doesn’t necessarily mean work tasks.  This can apply to personal life tasks as well.

Drain the Brain

When flooded with a deluge of thoughts just as you lie down (or worse, when you awaken in the middle of the night) – get them out of your head and down on paper. By writing down as many thoughts as I can, I stop thinking about them and rest peacefully. Just braindump out to paper or to the laptop or something. It doesn’t matter what the thoughts are (usually it is a list of things that I need to give more attention to over the coming days), just get them written. By writing out the thoughts, they are less likely to be forgotten. You give yourself an opportunity to relax knowing that there is a plan of attack for each of those thoughts that kept you awake.

This is a technique that I have heard about using since I was in high school.  It is something I am told that writers tend to use.  I hadn’t realized how useful it was until just a few nights ago.  It is something that I have employed from time to time.  However, lately I have had a lot more work to do and a lot more projects that tie directly into my time.   This particular night, I was somewhat fatigued but very restless.  Pages of thoughts came flooding in.  I decided that I could either sit there and figure them out, or go and write them out as I have done in the past.  Once they were all written down – zonk.  Less than five minutes and I was out.

Act Honorably

I am throwing this one in because it is sometimes said, “I wonder how he sleeps at night.” This is usually in reference to something somebody has done that may seem questionable. Did you say something offensive? Did you really screw up at work? Did you act with respect, honor, and dignity? Was there something left unsaid, or said, that left you puzzled and needing answers?

When dealing with co-workers, forums, professionals, family and friends, one needs to act and speak with consistency. By keeping the story straight and telling the truth in all scenarios, it is easier to sleep at night. If ever found to be at fault for something, owning up to it quickly helps to alleviate stress that may prevent sleep. More than that, though, is how you treat your fellow human beings. Treating others how you would prefer to be treated is a great sleep-aid. Apologizing when necessary, giving credit where credit is due, and treating others with kindness and respect go a long way toward a good night’s rest.

Work it Out

This one is simple too. I think it ties in well with the others, and it is why I am writing this currently. I was thinking about something I wanted to do – write a blog post prompted by some recent events that made me ponder the topic. So what am I doing? I am working out this blog post, writing down my thoughts and solving the problem. This one can take different forms in different scenarios. For me, sometimes it simply means drawing up a more detailed plan/outline based on the thoughts that have been written. By devising a plan and organizing thoughts into something more meaningful, I am able to assure myself that I have done something positive about the situation.

Conclusion

So why did I want to cover this topic? I have found myself working later into the evenings trying to accomplish goals as well as work projects. And then after those late evenings, I sometimes find myself continuing to ponder some of the goals or projects. The worst is dreaming about an unsolved problem at work. I have also been thinking about professional conduct in an online world. These are the kinds of things that can be avoided with a little preparation, forethought, and action.

Pirate Latitudes

Categories: Book Reviews
Tags:
Comments: No Comments
Published on: February 16, 2010

This is a book that was published posthumously.  Michael Crichton wrote this novel and left it “finished” in his attic.  I picked it up thinking it might be a good read.  Having read it now, I think I know why Crichton left it unpublished.

The book reads, to me, more like a director’s narrative of a screenplay. It is disjointed in how the events are described, and inadequate attention seems to have been given to describing the scenes. The action scenes were few and far between. It took twenty chapters for this book to start picking up pace, and then it quickly lost it for another five chapters or so.

Crichton did seem to do a nice job of blending fantasy with what could have been historical events. I don’t buy into the Kraken, but at least it gave a little action to a dragging spot in the book. The book had half a dozen climaxes that were supposed to build to the final climax. Unfortunately, that final climax dropped dead.

The book reads very predictably until the final act.  I must say I did not expect Hunter to be imprisoned – but could have figured on it.

I can’t really say much more about the book other than that it would make a better movie than it was a book. I only finished reading it because I had started it and wanted to have read the whole thing to give it a fair shake in this review.

Grade C-

In Vs. Inner Join

Categories: News, Professional
Comments: No Comments
Published on: February 15, 2010

This is tightly related to another of my forays into tuning slow or poorly performing processes. This process came across my desk due to some reporting inaccuracies. The report is derived from data in a downstream process that is based on the data provided by this top-level process. More precisely, this is an ETL job that pumps data into the warehouse for consumption by several other downstream ETL jobs and reports. In order to troubleshoot effectively, I wanted to follow the data all the way back to the source. I had to verify that all was working correctly from end to end in the process chain for this report. I ended up tuning a script that was a critical piece of this ETL process. (Oh, and the data inconsistency was also corrected here and in the downstream report – there were a couple of consistency problems.)

So, let’s start with examples of some of the problem areas with the script. First, it took an average of 12 minutes to execute. This slowness was consistent whether it was executed from the DTS package or from SSMS. Second, it was not a stored proc. This particular script is used in more than one location; if any changes need to be made to the logic, one must find all of the locations rather than edit a single stored proc. Third, it relied on sub-queries and correlated sub-queries. Fourth, it used date comparison operators in the Join clauses – additional Join conditions that I needed to test for relevance and impact on the result set. Fifth, it made persistent use of the NOLOCK hint, which in this scenario was used as an optimization ploy. That should be enough to discuss about this script. There were other “problems” with the rest of the process related to this script; suffice it to say I corrected some of those issues too and am working to correct the rest.

I am going to begin with the third problem. The first issue will lend itself to being resolved by the end of this article as a part of the overall process. The second issue will be the last item treated; it should be understood that by the end I put the query into a proc and altered the DTS package to call the proc instead of running the script.

Sub-Queries

What is a sub-query? A sub-query is a query that is found attached to a Join clause or in the Where clause. It is, as the name implies, a query that is run as a part of an outer/parent query. From MSDN, we learn that:

“many queries can be evaluated by executing the subquery once and substituting the resulting value or values into the WHERE clause of the outer query.”

This doesn’t sound so bad.  The query could be executed once and then return the necessary results to the where or join conditions of the outer query.  In my scenario, the sub-queries are run twice each.  They are a part of a Union select statement and the same query will appear on both sides of the Union.

[codesyntax lang=”sql” title=”One of My subqueries”]

left join (select SSN, avg(PBO) PBO, min(ConDate) ConDate
, cast(convert(varchar(8),min(RecordDate),112) as smalldatetime) RecordDate
from dbo.DOE_CONCampaign_Funded (nolock)
where ServicerID = 1234
group by SSN) f2 on t.SSN = f2.SSN and ss.SubmitDate < f2.ConDate

[/codesyntax]

In this example we see a few of the items that I listed.  Since I am running this on a SQL Server 2005 box, I decided to move each of those sub-queries to a CTE.  This change gave me an improvement in time of about three minutes. (Even as I test currently, I am seeing that the original query is now taking in excess of 23 minutes – ouch!)
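Roughly speaking, the sub-query shown above moves into a CTE like the following sketch. The CTE itself comes straight from the snippet; the outer tables here are hypothetical stand-ins for the real driving tables:

[codesyntax lang=”sql”]

;WITH Funded AS
(
    SELECT SSN,
           AVG(PBO)     AS PBO,
           MIN(ConDate) AS ConDate,
           CAST(CONVERT(VARCHAR(8), MIN(RecordDate), 112) AS SMALLDATETIME) AS RecordDate
    FROM dbo.DOE_CONCampaign_Funded
    WHERE ServicerID = 1234
    GROUP BY SSN
)
SELECT t.SSN, f2.PBO, f2.ConDate, f2.RecordDate
FROM dbo.Trans t                                        -- hypothetical driving table
     INNER JOIN dbo.Submissions ss ON ss.SSN = t.SSN    -- hypothetical table supplying SubmitDate
     LEFT JOIN Funded f2
            ON  t.SSN = f2.SSN
            AND ss.SubmitDate < f2.ConDate;

[/codesyntax]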

Now on to that Union statement. I modified the Union to be a Union All. Without specifying ALL, a Distinct is performed on the combined result. I needed to determine first whether I truly needed to perform a Distinct operation, or whether I could use the Union All statement without altering the result set. Testing both versions, I ended up with the same result set. Based on this I am pretty confident that I can proceed with using Union All, producing another time savings of about 1 minute on this query. This is good news.
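The difference is easy to see in miniature: UNION performs an implicit DISTINCT across the combined rows, while UNION ALL simply appends them.

[codesyntax lang=”sql”]

SELECT 1 AS Val UNION     SELECT 1;   -- one row: UNION removes the duplicate
SELECT 1 AS Val UNION ALL SELECT 1;   -- two rows: UNION ALL skips the distinct operation

[/codesyntax]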

Correlated Sub-Queries

Next up is the use of the correlated subqueries.  This query uses a correlated subquery on each side of the union statement.  The correlated sub-query looks like the following for the top half of the statement.

[codesyntax lang=”sql”]

[/codesyntax]

I recently had to add the last 5 campaigns to this list. I see a couple of problems with this. First off, it is cumbersome to edit code to fix these hard-coded values – some code somewhere will likely get missed. The second pain point was the slowness this caused my query. The resolution was to create a “Control” table to which these values would be added. The code would be corrected in all locations and would use a Join instead of the correlated sub-query. What does this buy me? My performance now becomes very consistent and is less than 20 seconds without any further indexing. One more thing to note here is that the IO Stats are going to be different for multiple reasons at this point. The original IO Stats above only show the first four campaigns listed in the correlated sub-query, whereas I am currently working with the entire set of campaigns listed therein.

The replacement for the correlated sub-query is an Inner Join back to the control table.

[codesyntax lang=”sql”]

[/codesyntax]

This is the top half of the Union Select statement. The bottom half is slightly different: the table has an additional field to help designate the type of campaign. This means less code rework and easier maintainability on my part. If I want to exclude a campaign from the query – remove it from the table.
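As a hedged sketch of the change (the table and column names here are hypothetical, not the actual schema), the hard-coded campaign list moves into a control table and the correlated sub-query becomes an inner join:

[codesyntax lang=”sql”]

-- Hypothetical control table replacing the hard-coded campaign list
CREATE TABLE dbo.CampaignControl
(
    CampaignID   INT         NOT NULL PRIMARY KEY,
    CampaignType VARCHAR(20) NOT NULL    -- lets the bottom half of the Union filter by campaign type
);

-- The correlated sub-query goes away; the query joins to the control table instead
SELECT f.SSN, f.PBO, f.ConDate
FROM dbo.DOE_CONCampaign_Funded f
     INNER JOIN dbo.CampaignControl cc
             ON cc.CampaignID = f.CampaignID;   -- assumes a CampaignID column on the funded table

[/codesyntax]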

Extraneous Join Conditions

In this query there is the use of date fields in the Outer Join clauses. The dates are used for comparisons only and needed to be tested for their contribution to the query. At this point the testing is simple – run the query as is and then rerun it with each of those date comparisons commented out of the Join clauses.

As-is query

[codesyntax lang=”sql”]

[/codesyntax]

Changed query

[codesyntax lang=”sql”]

[/codesyntax]

I was a little surprised to see that both forms of the query returned the same result set.  In light of that, the fact that there appears to be no visible performance gain by eliminating these Join Conditions is not astonishing to me.  Thus I can leave them or eliminate them – they don’t add or subtract from the query performance for my current purposes.  I decided to remove them anyway.  At this point, the IO Stats remain unchanged from the previous sampling.  I would now need to examine indexing to improve performance.
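Using the join shown earlier as the example, the test amounts to running the query with and without the date predicate on the outer join – a sketch of the two forms:

[codesyntax lang=”sql”]

-- As-is: the date comparison is an additional condition on the outer join
LEFT JOIN Funded f2
       ON  t.SSN = f2.SSN
       AND ss.SubmitDate < f2.ConDate

-- Changed: the date comparison commented out for the test run
LEFT JOIN Funded f2
       ON  t.SSN = f2.SSN
       -- AND ss.SubmitDate < f2.ConDate

[/codesyntax]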

Conclusion

This exercise is yet another one of those opportunities to performance tune a process. This one had been left alone because it “was working” and people wanted to leave well enough alone. When processing takes that long, well enough is not good enough. The systems need to be able to move on to other processing tasks and use the resources more efficiently than running the same process for in excess of 12 minutes (24 minutes today during testing to get data for this article). This is only a piece of the puzzle in tuning this particular process. This query was tuned, altered, put into a stored proc, and the nolock hints removed as some of the adjustments. In the end, it returns the correct data set and does it in 1/72nd of the time it took earlier today and 1/36th of the average time. I would say that is a substantial cost savings.

PayPeriod II

Categories: News, Professional
Tags: ,
Comments: No Comments
Published on: February 13, 2010

I recently blogged about a solution I had decided to use in order to solve a problem related to PayPeriod matching. The result needed to meet a few requirements. One of those requirements was to arrive at the results without the use of a table; I did not want to create a table that might need to be maintained down the road. I also wanted to give myself a little more time to make sure the database being used in the warehouse was not involved in one of the ETL processes that actually restores the database from a different system. Another requirement was to produce an aggregate for the current payperiod as well as for the prior payperiod. This information was to be consumed by Reporting Services for a report. I now realize that, at the time, I had insufficient data for this to work as desired. Thus, I needed to revisit the solution and make a couple of adjustments.

The first adjustment to be made was the creation of a table and the elimination of the CTE.  Though the CTE performed very rapidly in every single test I threw at it, it bogged down during our month-end processing.  That is another process that will be revised shortly and will not be so resource intensive nor will it be so time intensive.  Anyway, that is a topic for another discussion.  Present circumstances required an update to the proc that I created in order to make it perform better and regain, on a more long-term basis, the performance the CTE showed during testing and the first couple of weeks it lived in production prior to month-end processing.  I went ahead and created the table for the payperiods.

Not only did I create the table for performance reasons, but it also simplified my query later in the proc. The aggregates for the previous payperiod needed an easier way to be retrieved. There was also an inaccuracy in my query; with more data, I was able to spot it. All previous payperiods were being lumped into the previous payperiod – though I only wanted the immediate previous payperiod and nothing more. This caused the aggregates to be incorrect. The solution could still have been achieved through the use of the CTE, however I wanted to simplify it a little and produce a faster result.

My final solution does not eliminate all CTEs – merely the PayPeriod table population CTE. Now I use a CTE to retrieve the current payperiod and then recursively pull in the previous payperiod. The table was created exactly like the CTE, with an ID field, PeriodStart and PeriodEnd. I decided the simplest method to ensure I would only aggregate on the two payperiods in question was to only pull those two payperiods into the query. I wanted to be certain that I could only have two periods in play at any time.

[codesyntax lang=”sql”]

[/codesyntax]
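The CTE takes roughly the following shape – a sketch I reconstructed from the description above (using the ID, PeriodStart, and PeriodEnd columns), not the original proc:

[codesyntax lang=”sql”]

;WITH TwoPeriods AS
(
    -- anchor: the pay period that contains today
    SELECT TOP 2 ID, PeriodStart, PeriodEnd
    FROM dbo.PayPeriods
    WHERE GETDATE() BETWEEN PeriodStart AND PeriodEnd
    ORDER BY PeriodStart DESC

    UNION ALL

    -- recursive member: step back to the immediately prior period by ID
    SELECT p.ID, p.PeriodStart, p.PeriodEnd
    FROM dbo.PayPeriods p
         INNER JOIN TwoPeriods c ON p.ID = c.ID - 1
)
SELECT TOP 2 ID, PeriodStart, PeriodEnd    -- TOP stops the demand-driven recursion after two rows
FROM TwoPeriods;

[/codesyntax]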

With this method, you can see that I perform a top 2 operation in the base query from the PayPeriods table. Without the recursive definition on this query, the base query will only return 1 record. With the recursion, it will only return two records. In addition to that change, I changed the Left Joins later in the query to the following:

[codesyntax lang=”sql”]

[/codesyntax]

And then one final change of note.  I changed the aggregation on the Previous PayPeriod to the following:

[codesyntax lang=”sql”]

[/codesyntax]

This was much simpler than what I was trying to use previously.  I also found a nice side effect of using the top clause in the base query of the CTE.  When using the top in a recursive query, it appears that 

[codesyntax lang=”sql”]

[/codesyntax]

is no longer necessary. I tested this and retested to verify the results. It is just another way of controlling a recursive CTE in SQL Server.

I was happy with the first query that I came up with to meet this requirement. I am much more satisfied with this revision. Query times are <= 1 sec and other performance indicators are positive. Of course, using the table, I can now put indexes on the date ranges, which should help query performance somewhat as well.

Conclusion

Despite meeting the requirements in the last article, and the query being pretty cool in performing what it did – sometimes it really is better to test other methods. Even with the need to maintain this table (maybe), the consistent performance gains and accuracy outweigh the desire to not create that table. It is a good idea to test multiple methods in some cases to ensure the best decision is made. It was a good exercise to come back to this one and redo the query – I learned at least one new trick (really a few). Learning something new made it worthwhile.

The Lightning Thief

Categories: Book Reviews
Comments: No Comments
Published on: February 12, 2010

I am putting this movie review in with the book reviews, though technically it is not a book.  I figure it works well there since I read the book and really liked the book.

A lot of artistic liberty was taken with the screenplay for this book. Anybody who has read the book should go into watching this movie with the understanding that it is an almost entirely different story. Some characters were changed, major pieces of the story were changed, and many characters and events were omitted entirely.

The movie was roughly two hours long and did not feel like two hours. Adjusting to the movie being a different story than the book had to occur very early in the movie – otherwise this could have been a long nitpicking affair. I liked some of the changes; others were less acceptable. Again though, it was a different story. There is no “claiming” event where Poseidon claims Percy, which I felt was an essential piece of the story. The camp was a forest campground rather than a farm. There was also no Dionysus and no Oracle (thus no official quest). However, I think they made the story work.

The ending was substantially different than the book. I really do not see how they would do a next movie in the story should they decide to do one. If they make a movie to go along with book two, I would be very interested in seeing how they decide to do it.

Overall I would give the movie 3.5 stars if it must be rated. The rating is due to the adjustment necessary for the different story and trying to get the book clear of present thought in order to enjoy the movie. Oh, and Uma was horrendous. Her best acting was after she was beheaded. Would I watch it again? Yes, I will watch it again, and I will likely buy the DVD / Blu-Ray when it comes out. That may be swayed if it has no French language track though.

Relationships

Categories: News, Professional
Comments: 2 Comments
Published on: February 8, 2010

This month Rob Farley is hosting TSQL-Tuesday #3.  The topic is Relationships and he has left it wide open for interpretation so long as it can be tied to the theme and SQL Server.

At first my thought was to republish one of the articles in my series on Primary Keys and Foreign Keys (Part III has links to the first two).  I decided that would not quite fit the spirit of the event.  Thus a mere mention about the topics will suffice.

A few more thoughts rattled around in my head: work relationships, professional relationships, spouse relationships, and Parent-Child relationships. There are quite a few paths one could take with this topic. I think discussing multiple relationships would work best for my purposes.

Let’s start with the Husband-Wife relationship. My wife keeps me going in my profession, even though my being a DBA is a love/hate relationship for her. She loves that I enjoy doing what I am doing. She hates that it consumes so much time. I must admit there are times that I hate it too – specifically when it dampens my plans to build my relationship with my wife. One particular instance was our anniversary (2008). Last minute urgent needs came up from a client that needed to be fixed. My employer knew I had plans and that it was my anniversary. Unfortunately, I was the only DBA, since the other DBA had recently left and we had not filled his position yet. It was an extremely difficult pill to swallow since it was not a production issue. It was a new development issue that popped onto my desk at 5 and took me well past 9 to resolve.

That brings us to work relationships. Some of that is being covered by Sean and Jennifer McCown, thus I will not dwell too heavily on it. Suffice it to say that one must learn the difference between loyalty and being walked on. If you feel that your employer is taking advantage of you, maybe it is time to find somewhere better. Sometimes it is best to do that sooner rather than later. The worst thing that could happen is to have your work turn into a hate-fest. Don’t let your employer dampen your mood on being a SQL Server DBA. Despite the anniversary incident, I still really like that employer and I really like the environment that I helped to build. It was a very difficult decision to leave. The decision was made in part for the opportunity to set off on my own. I’m still working on that.

Setting off on my own and starting my own business is a difficult process. It means more networking, more professional relationships, more self marketing, longer hours, and less time with family. What helps here is that it is a goal to achieve and my wife supports it. Supporting my children with an uncertain client base is a very difficult adventure to undertake. Not seeing my children on a regular basis is also difficult. Having that Parent-Child relationship remain a Primary relationship rather than become a Foreign relationship is the ideal path for me.

And now we come back to how this relates to SQL Server. Databases are full of tables that should represent different data and characteristics. We tie those into a governing table from time to time, and call it a Parent table. Life is much the same. When you have children, they have their own unique characteristics. Sure, some of their make-up is similar to mine, but their characteristics and traits are their own. My children have their own unique traits, characteristics and identities. There is no doubt they are mine – but they are their own little beings. That may be a little too loose a tie between family and SQL – but it is valid. I would also venture to say that, like a well tuned database, not all settings, rules or methods work the same for all databases – just as not all methods and rules work the same for all children. It is a constant process. One must continually improve their skills and methods to be as efficient as possible while maximizing the best results.

Happy Valentines (Relationships) Day
