Tags: TSQL Tuesday

TSQL Tuesday #26 or #23 – Identity Crisis

Comments: 1 Comment
Published on: January 10, 2012

The first opportunity of this New Year to participate in TSQLTuesday, we have been invited by David Howard (blog) to take a second shot at a previous TSQLTuesday.

This second shot is giving me fits.  I have no clue if it is TSQLTuesday 26 or if it is TSQLTuesday 23.  Is it some sort of amoeba of both TSQLTuesdays combined?

While I try to figure that out, how about we look at some pictures of what the New Year might have looked like as it was rung in.  Let’s begin with Paris.

Oooh…Aaaaahhh…Those are quite impressive.

Next up on our tour is where a twin resides for a famous lady.  How did they ring in the New Year in New York City?

 

 

 

And now, let’s DBCC Timewarp to the other side of the world.  Here is what you might have seen if you were in Sydney Australia.

Ahhh.  Yes, that did the trick.  This little diversion sure gave me enough time to think about which TSQLTuesday this is.  Et merci a Stuart Ainsworth (Blog | Twitter).  Le Sujet qu’il a propose est celui laquelle dont je vais parler ce mois.  Dans le TSQLTuesday qu’il a organise, il nous a invite de parler a propos des JOINS.

J’ai completement rate cet occasion de parler a propos de “Joins” en participantes dans TSQLTuesday.  Voyez, TSQLTuesday 23 etait tenu pendant le premier semain au lieu de deuxieme semain ce fois ci.  Et, maintenant, je vais terminer cet article en Anglais.  I was going to write the whole thing in French, but will save that for another time.  I should have a second chance to do that someday.

This second chance, gives me the opportunity to finally talk about a topic that has been on my to-blog list for quite some time.  I hope this post will show some different ways of joining in TSQL.  They are certainly methods I had never considered – until it was required.

Business Requirement

I have some tables that I need query.  One of the tables has lookup information with a bitmask applied to the id field.  Another table references this table but the ids can be a direct match or an indirect match to the id of the lookup table.  For this case, they will always only differ by the value of 1 if the two values do not directly match.  No other match possibility is considered for this example.

Based on this need, the solution dictates some sort of bitwise math.  I have several examples of how a join could be written to accomplish this primary objective.  I am only going to show the possible ways of performing this.  In my environment these all yield the same results and the data is unique and large enough (4.2 million records).  I will compare performance of these different queries in a later post as I demonstrate a query tuning method to drive the query down from nine seconds to 100ms or less.  For the record, I would choose any of queries 5, 6, or 7 for this particular need based on plan cost and performance.

The JOINS

First up is the ANSI style INNER Join using addition in one of the conditions as well as an OR to fulfill the second portion of the business requirement.

PRINT 'Query 1 -- Join with Or and source+1'
------
SELECT TOP 1000
		PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
		,US.Source,Us.SourceID, PPV.SourceID
	FROM	RumorView PPV
		INNER Join SourceType US
			ON (PPV.SourceID = US.SourceID
				Or PPV.SourceID = US.SourceID+1)

This is probably the easiest to understand and it performs well enough.  Until running into this business requirement, I hadn’t considered putting an OR in the JOIN conditions.  But it makes sense considering that an AND can be used there.

Next is a NON-ANSI style of JOIN.

PRINT 'Query 2 -- Non-Ansi Join with Or and source+1'
------
SELECT TOP 1000
		PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
		,US.Source,Us.SourceID, PPV.SourceID
	FROM	RumorView PPV, SourceType US
	WHERE (PPV.SourceID = US.SourceID
				Or PPV.SourceID = US.SourceID+1)

Through 2008 R2, this works just as well as the ANSI JOIN already shown.  I haven’t tested in SQL 2012 but I do know that the NON-ANSI syntax of *= (for example) no longer works.  I am not a big fan of this style JOIN because it is far too easy to end up with a Cartesian product.

Another type of JOIN that I like is the use of APPLY.

PRINT 'Query 3 -- Cross Apply with Or and source+1'
------
SELECT TOP 1000
		PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
		,US.Source,Us.SourceID, PPV.SourceID
	FROM	RumorView PPV
		Cross Apply SourceType US
	WHERE (PPV.SourceID = US.SourceID
				Or PPV.SourceID = US.SourceID+1)

This particular code segment is the equivalent of the first query shown.  This is the last in the set of using basic math and an OR in the JOIN conditions.  The remaining queries all rely on bitwise operations to perform the JOIN.  Again, until this particular need, I had never even considered using a bitwise operation in a JOIN.  First in this series is the NON-ANSI style JOIN.

PRINT 'Query 4 -- Non-Ansi with COALESCE and Bit compare'
------
SELECT TOP 1000
		PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
		,US.Source,Us.SourceID, PPV.SourceID
	FROM	RumorView PPV, SourceType US
	WHERE (PPV.SourceID|1 = COALESCE(US.SourceID|1,Us.SourceID))

The big change here is in the where clause.  Notice the use of COALESCE and the first comparison value in that COALESCE.  This is called a BITWISE OR.  From MSDN: “The bits in the result are set to 1 if either or both bits (for the current bit being resolved) in the input expressions have a value of 1; if neither bit in the input expressions is 1, the bit in the result is set to 0.”

So I am comparing the bit values of 1 and the SourceID.  The SourceID from RumorView will create a match meeting the requirements put forth thanks in large part to the BIT OR operation being performed on both sides of the equality in the WHERE clause.  It is also worth mentioning that the COALESCE is completely unnecessary in this query but it I am leaving it as a pseudo reference point for the performance tuning article that will be based on these same queries.

Next on tap is the CROSS Apply version.

PRINT 'Query 5 -- Cross with Bit compare'
------
SELECT TOP 1000
		PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
		,US.Source,Us.SourceID, PPV.SourceID
	FROM	RumorView PPV
		Cross Apply SourceType US
	WHERE (PPV.SourceID|1 = US.SourceID|1)

And the last two queries that the optimizer equate to the same query.

PRINT 'Query 6 -- Join with Bit compare'
------
SELECT TOP 1000
		PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
		,US.Source,Us.SourceID, PPV.SourceID
	FROM	RumorView PPV
		INNER Join SourceType US
			ON (PPV.SourceID|1 = US.SourceID|1)
------
PRINT 'Query 7 -- Join with ISNULL and Bit compare'
------
SELECT TOP 1000
		PPV.RumorID,PPV.PersonRumorID,PPV.PersonID
		,US.Source,Us.SourceID, PPV.SourceID
	FROM	RumorView PPV
		INNER Join SourceType US
			ON (PPV.SourceID|1 = ISNULL(US.SourceID|1,Us.SourceID))

The query optimizer in this case is smart and eliminates the ISNULL.  These two queries use the same exact plan, have the same cost and the same execution statistics.  The version with COALESCE is considered more expensive and takes longer to run than these queries.  It is also important to note that the Cross Apply Join also produces the exact same plan as these two queries.

Conclusion

So there you have it.  Many different ways to write the JOIN for this little query.  Performance and results may vary.  It is good to have a few different ways of writing this particular JOIN.  During my testing, it was evident that various methods performed better under different circumstances (such as how the indexes were configured – which will be discussed in the follow-up article).

Notes: Names have been concealed to protect the innocent ;) .  Also, the fireworks images are all links to external sites.  I have no affiliation with those sites…disclaimer yada yada yada…I am not responsible for content on those sites but they can have the credit for the images.

T-SQL Tuesday #025 – Holiday Gifts

Comments: No Comments
Published on: December 13, 2011

Tis the Season

It is the season for TSQL Tuesday.  More importantly it is the season for giving and reflection.  And whether you celebrate Christmas or Chanukkah or Kawanzaa or TSQLTuesday because you believe or simply because of tradition, it is a good time to reflect and help somebody else.

Amidst the hustle and bustle, Allen White (Blog|Twitter) is hosting at least one party during this season.  And with all of our hustle and bustle, many of us will be attending at least one party this year (as evidenced by this post).

Allen’s party theme is an invitation to “Share Your Tricks.”  Well, in the spirit of the Holidays, I want to share some tricks and tips.  They can be my gifts to you during the holidays.  And maybe they can help you give to somebody else.

Tricks

ObjectProperty()

I find myself using this frequently in queries.  A popular use for myself is to use it to find the value of ‘IsMSShipped’.  But that is only one possible use for this function, there are many many more.  Check out MSDN to see more power!!

Here is a relatively meager example.

SELECT OBJECT_NAME(OBJECT_ID) AS TabName,create_date,type,type_desc
	FROM sys.objects
	WHERE OBJECTPROPERTY(OBJECT_ID,'isMSShipped') = 0
		AND OBJECTPROPERTY(OBJECT_ID,'isTable') = 1

ServerProperty()

I like to use this function regularly as well.  Sometimes it is handy to break out information about the server into a columnar set.  I can find useful information from Service Pack level to whether the Instance is Clustered or not.  Here is a quick sample.

SELECT 	SERVERPROPERTY('ProductVersion') AS ProductVersion
		,SERVERPROPERTY('ProductLevel') AS SPLevel
		,SERVERPROPERTY('Edition') AS Edition
		,SERVERPROPERTY('ResourceVersion') AS ResourceDBVersion
		,SERVERPROPERTY('InstanceName') AS InstanceName
		,SERVERPROPERTY('IsClustered') AS Clustering
		,SERVERPROPERTY('LicenseType') AS LicenseType
		,SERVERPROPERTY('NumLicenses') AS NumLicenses
		,SERVERPROPERTY('Collation') AS CollationLevel

Primes

Now for a little bit of fun stuff.  It was suggested on Twitter last night that I show a TSQL solution for generating prime numbers.  Thanks to Adam Mikolaj (Twitter) for this suggestion.  I am not going to explain it other than to say that this will help generate the prime numbers between 1 and 1000.

WITH Units ( nbr ) AS (
	    SELECT Number AS nbr
			FROM (VALUES (0),(1),(2), (3), (4), (5), (6), (7),(8), (9) )AS X(number))
, nums (Number) AS (
	SELECT u3.nbr * 100 + u2.nbr * 10 + u1.nbr + 1
      FROM Units u1, Units u2, Units u3
     WHERE u3.nbr * 100 + u2.nbr * 10 + u1.nbr + 1 <= 1000)
 
SELECT 1 AS Primes
	UNION All
SELECT 2
	UNION All
SELECT n.Number AS Number
	FROM nums c,nums n
	WHERE c.Number < n.Number
        AND c.Number <> n.Number
        And c.Number between 2 and 35
    GROUP BY n.Number
    HAVING MIN(n.Number % c.Number) > 0
    ORDER BY 1

Tips

No Changes

Don’t plan any rollouts during the holiday season.  Try to have a production freeze implemented.  For many companies this is a busy season.  To further the point, many employees like to take a vacation during this time period.  With a lighter staff, key personal may be out of touch should an emergency occur due to a rollout.  So minimize your stress and minimize the chances of an emergency and don’t do any changes (excepting emergency fixes) during this time of year.  It is a good time to catch up on other items on your to-do list.

Take a Time-Out

Don’t be too busy for the important things in life.  Some of these things include self and family.  Don’t get yourself going too fast for too long that you miss out on the good stuff.  Don’t be soooo busy that you have no time for yourself to relax.  Recharge your batteries by taking a time-out.  Yes!  Give yourself a timeout.  Spend time with your spouse and kids.  If you don’t have a spouse or kids, then spend some time with a friend or #sqlfriend or #sqlfamily.

Help somebody in need

This is closely intertwined with the previous topic.  Give something of yourself to help somebody else.  A little joke, a little smile, a little service will go a long way (just as much for you as the other person).  If it means donating a toy, some food, some cash – do it.

T-SQL Tuesday #024: Prox ‘n’ Funx

Comments: No Comments
Published on: November 8, 2011

I find myself just now sitting down to write this as the deadline for TSQL Tuesday 24 fast approaches.  Fortunately I started planning out what I wanted to cover several days ago.

I could have tried writing this last night – but I ended up falling asleep at the keyboard.  Drowsy typing can be a very dangerous thing.  So it is a good thing I didn’t start then.

Alas, I digress.  This month, a TSQL guru (BRAD SCHULZ), is at the helm.  He has challenged us to write about procs and functions.  We get decide which direction we take it.

I thought about this and figured there could be a few functions and paths to take on this.  So let’s get started and see where this bird-walk can take us.

Function for which I am thankful

I decided that this is a topic that must be discussed.  It is the month of Thanksgiving afterall.

For this topic, I want to give a shout out to Jeff Moden for a function that he wrote about and published.  This function has proved useful to solve some performance issues.

In a recent case where this function helped, I was a bit reluctant because it just didn’t seem possible that the function already in place could have that big of an impact.  Guess what, it did.  The function that Jeff did was his string splitter.  The strings in question were short and each value was not very big.  Despite that, the string splitter was chewing up resources.

After implementing the new string splitter, we saw immediate performance gains.  Using this function has paid off big dividends and has saved me a grundle of time.  With time being more valuable as time passes, I am very thankful to Jeff for the work he did on his function.

Function I can’t live without

That might be overstating it a bit.  The reality is, there are some functions that I use far more in my scripting and troubleshooting than others.  One function, above all others, seems to come into use more frequently.

This function is a dynamic management object and is called dm_db_index_physical_stats.  There are several parameters to this function and there are several uses for it as I have seen and demonstrated on my blog.

To learn more on the parameters, read this.  As MSDN states, this function “Returns size and fragmentation information for the data and indexes of the specified table or view.”  So you can ascertain quickly that the most likely use of this index is in index fragmentation analysis.

I have written about this function for index analysis, table size analysis, BLOB index analysis, and finding ghost records.  I even referenced it when doing a table compression analysis.

You can check out all of those articles here.

There is just so much good info that can be gleamed from this function and so many ways to use it.  That is why I find myself coming back to it time and time again.

Funcs for Fun

There are many functions that are useful.  Some are more interesting than others.  Some, I use because I find it fun delving into the internals of SQL Server.  If the function can provide useful information and I learn something along the way, then it is pretty fun.

Here are some of those functions that I have enjoyed using.

sys.fn_physloccracker

sys.fn_virtualfilestats

sys.dm_exec_query_plan

sys.dm_exec_sql_text

sys.dm_io_virtual_file_stats

You can find good information on most of these in MSDN.  For sys.fn_physloccracker, you may want to rely on this from Paul Randal.  You can even read what little I wrote about it a few months ago.  I came across this little function while working on a pet project (that I still need to finish – d’oh).

Conclusion

Functions are very handy.  They can also prove to be the cause of poor performance (such as was the case with that string splitter).  When used appropriately, you can provide well-performing SQL as well as some very handy information.

T-SQL Tuesday #22 – Data Presentation

Tags: ,
Comments: No Comments
Published on: September 13, 2011

Tuesday is upon us.  It also happens to be the second Tuesday of the month and you know what that means.  It is time for TSQL Tuesday.  This month, Robert Pearl, a friend, is hosting.  Read his invitation here.

As the invitation suggests, this topic is wide open.  There are many ways to present data.  Some of those methods may be better than others.  There are a few things of note with Data Presentation that will impact how much better your Data Presentation will be.

The items I will discuss that will help your data be better presented are: Performance, Accuracy, Display, and Business Requirements.  I will use a few scripts to progress through each of these topics.  Let’s start with Performance.

Performance

Why is performance included in this discussion?  How does performance relate to data presentation?  Well, have you ever had an end-user complain that a report was utterly useless because it was too slow?  If the report is too slow, it won’t matter if the data is accurate in the eyes of some.  They needed to have the data yesterday and it simply took too long.

Here is an example of a query that could be optimized a bit more (admittedly this query does not perform soooo slowly that a user would give up on it – by the end you will see that it could perform better).

DECLARE @StartDate DATETIME = '2011-09-08'
		,@EndDate DATETIME = '2013-11-09'
		,@numMonths TINYINT
DECLARE @calendarTable TABLE (monthNum TINYINT, monthDays TINYINT)
 
SET @numMonths = (SELECT DATEDIFF(m,@StartDate,@EndDate) + 1)
 
WHILE @numMonths >= 1
        BEGIN
                INSERT INTO @calendarTable SELECT MONTH(@StartDate), DATEDIFF(d, @startDate, DATEADD(m, 1, DATEADD(d, 1 - DAY(@StartDate), @StartDate)))
                SET @numMonths = @numMonths - 1
                SET @StartDate = (SELECT DATEADD(m, 1, DATEADD(d, 1 - DAY(@StartDate), @StartDate)))
                --SELECT @StartDate
        END
 
SELECT * FROM @calendarTable
 
Go

The requirements for this script are simple.  Provide the number of days in a month in a given date range.  If the starting date provided is not the first of the month, then we must only provide the number of days from that date to the end of the month.

As you can see, this script utilizes a looping mechanism to provide this information.  The loop inserts into a table variable one record at a time.  When examining the execution plan and the execution time on this query, one would see that the insert is the most expensive part of the query.  One would also find that this query does take a fair amount of time to run – despite its’ simplicity.  On my machine, it takes about 1.3 seconds to execute.

Certainly, if this were a more complicated query, one would see that this type of query could cause some delays in data rendering and subsequently cause grief for the end-user and you.

There is one more issue with the provided query in that it doesn’t meet all requirements.  I neglected to mention that the date format needs to accept date/month/year format.  Yes it is nitpicking, but it was a requirement and an invalid date is far too easy to submit with this query.

Accuracy

There should be no question on how accuracy affects data presentation.  No matter how pretty the rest of the data/report may look, wrong data will render the report useless in the eyes of many business users.

Again, this example is not extreme – but it does create sufficient concern that the query should not be used – unless fixed.

SET DATEFORMAT DMY
 
DECLARE @StartDate      DATE = '05/06/2011'
        ,@EndDate       DATE = '31/08/2012'
 
;
WITH getmonths AS (
        SELECT CAST(CONVERT(VARCHAR,'01/' + CAST(MonthNum AS VARCHAR) + '/' + CAST(yr.yr AS VARCHAR)) AS DATE) AS Moy
                FROM ( VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) Mnths (MonthNum)
                Cross Apply (SELECT YEAR(@StartDate) AS yr
                                                UNION
                                                SELECT YEAR(@EndDate) AS yr
                                        ) yr
), inputdates AS (
        SELECT dates FROM (VALUES (@StartDate),(@EndDate)) D (Dates)
        )
 
SELECT DATENAME(m,g.moy) AS MonthInRange,YEAR(g.moy) AS YrInRange
                ,DATEPART(d,DATEADD(mm, DATEDIFF(mm, 0, g.moy)+1, 0)-1)
                        - CASE WHEN DATEPART(d,id.Dates) = DATEPART(d,DATEADD(mm, DATEDIFF(mm, 0, g.moy)+1, 0)-1)
                                                THEN 0
                                        WHEN DATEPART(d,g.moy) < DATEPART(d,id.Dates)
                                                THEN DATEPART(d,id.Dates)
                                        ELSE Isnull(DATEPART(d,id.Dates),DATEPART(d,g.moy)-1)
                                END AS DaysInMonthInRange
        FROM getmonths g
                LEFT Outer Join inputdates Id
                        ON DATEPART(m,g.moy) = DATEPART(m,id.Dates)
                        And YEAR(g.moy) = YEAR(id.Dates)
        WHERE g.moy between @StartDate and @EndDate
                Or id.Dates between @StartDate and @EndDate
 
Go

Though this query works faster than the first query, it is not entirely accurate.  This query only supports a max of 12 months.  Also, this query is overly complex.  There is value in keeping this particular query more simple.

Display

This topic could be deemed to be largely a matter of preference.  That said, there is great value in how you display the data to the end user.  Displaying the data involves such things as useful column headings, and meaningful data.  For instance, naming a column “Month” but displaying a year value in it – is not very useful.  This could also overlap with the prior topic of accurate data.

Personally, when I am displaying the month, I like to see month names.  In my experience, displaying the name of the month is more easily recognizable than the month number for many users.

SET STATISTICS TIME ON
SET DATEFORMAT DMY
 
DECLARE @StartDate   DATETIME = '05/06/2011',
        @EndDate     DATETIME = '31/08/2013'
 
BEGIN
	WITH Nbrs_2( n ) AS (SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 0),
			Nbrs_3( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 CROSS JOIN Nbrs_2 n3)
	, tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n)
							   FROM Nbrs_3
							   )
	,cteMonthEnd AS
	(
	 SELECT t.N,
			MonthEnd = DATEADD(mm,DATEDIFF(mm,0,@StartDate) + t.N, 0) - 1
	   FROM Tally t
	  WHERE t.N BETWEEN 1 AND DATEDIFF(mm,@StartDate,@EndDate) + 1
	)
	 SELECT MonthEnd
		,NumDays     = DATEPART(dd,MonthEnd) - CASE WHEN N > 1 THEN 0 ELSE DATEPART(dd,@StartDate) END
	   FROM cteMonthEnd
	   ORDER BY MonthEnd ASC
END

In this case, the Column Heading matches the data and works.  However, a little change could make this query and output more useful to an end user who is quickly scanning over the results.

SET STATISTICS TIME ON
SET DATEFORMAT DMY
 
DECLARE @StartDate   DATETIME = '05/06/2011',
        @EndDate     DATETIME = '31/08/2013'
 
BEGIN
	WITH Nbrs_2( n ) AS (SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 1 UNION all
					SELECT 0),
			Nbrs_3( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 CROSS JOIN Nbrs_2 n3)
	, tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n)
							   FROM Nbrs_3
							   )
	,cteMonthEnd AS
	(
	 SELECT t.N,
			MonthEnd = DATEADD(mm,DATEDIFF(mm,0,@StartDate) + t.N, 0) - 1
	   FROM Tally t
	  WHERE t.N BETWEEN 1 AND DATEDIFF(mm,@StartDate,@EndDate) + 1
	)
	 SELECT DATENAME(m,MonthEnd) AS [MONTH],YEAR(MonthEnd) AS [YEAR]
		,NumDays     = DATEPART(dd,MonthEnd) - CASE WHEN N > 1 THEN 0 ELSE DATEPART(dd,@StartDate) END
	   FROM cteMonthEnd
	   ORDER BY MonthEnd ASC
END

To this point, each query has progressively increased in performance.  This last query (both versions) also matches the need to accurately display the data.

The display of data is closely coupled with the next topic – Business Requirements.

Business Requirements

When looking at the results of the last two queries, which one is more accurate?  Well, that entirely depends on what the business has specified as acceptable output for this query.  In this case, both could be correct or both could be wrong.

It is essential to have the requirements written down and understood by all parties so as to avoid any misconception or misunderstanding of what the query should display.  Matching your output to the specified business requirements will help to provide an overall impression of accuracy and usefulness.  It will also help to achieve faster sign-off by the business.

Conclusion

Data Presentation is not just about the look of a report.  Data Presentation involves performance, accuracy, display and the business requirements.  By giving proper attention to each of these facets, the Data presented will be both remarkable and acceptable to the end-user.

TSQL Tuesday 21 – FAIL FAIL FAIL

Comments: No Comments
Published on: August 10, 2011

It is TSQL Tuesda… err Wedn… err Tuesday for August 2011.  This month the party is a day later and bringing us oodles of failure.

Adam Machanic is hosting this month.  He thought it would be a good idea to poke and prod us while getting us to torture ourselves.  It seems to be a month long theme (as he alluded to in his announcement of the topic).

See, the topic this month requires us to share our failures.  More specifically we are to share our ugly code that we know better than to do.  Crap Code is a failure.  At least we have come to recognize that it was crap code and that it was indeed a failure.  The question now is this:  What have we done about what we learned?

I put a fair amount of thought into this topic.  I could come up with several possible experiences for this topic.  I kept running into a wall though.  That wall was recalling detailed facts and examples of those experiences.  Many of them were documented and left behind with former employers.  I no longer have the “dirt” on those dirty little secrets.

Then it hit me!  I had posted some blog entries about some of my experiences and was certain that something would be applicable.  You know what?  It’s a good thing I have this blog going.  Not only does it help me to learn many things and write better, it serves as a repository for introspection and recollection.  So, I will rely on a couple of posts from the early days of this blog to help write my entry for TSQL Tuesday this month.  And, I hope that what I share and what I learned from my experience will prove helpful to somebody else.

The FAILURE / Crap Code

If you want, you can read the entire first part here then skip to the next heading.  Or you can continue reading as I recount that article.

I had been asked to provide a report that would provide data according to pay periods.  I had no calendar table yet I needed to be able to compare dates passed to the report and match them to this pay period calendar.  Despite requests to the business (the pay periods for this report did not line up with the normal business pay periods) to get a list of those pay periods in order to create a Calendar table, I got nothing.  I did however receive generic instruction as to the general time-lines for these pay periods.  Based on this, I decided to get tricky and try to meet some basic guidelines that I created for myself.

  1. The payperiods need to be accessible to the query
  2. No Physical Lookup Table
  3. Do not hard-code the data in the code.

Based on that, I came up with a pretty neat little solution.

WITH periodstarts (StartID, StartDate,EndDate)AS (
SELECT 1 AS StartID,@StartDate,DATEADD(wk, DATEDIFF(wk, 0, @StartDate) + 2, 0) -1
UNION All
SELECT StartID + 1,DATEADD(wk, DATEDIFF(wk, 0, StartDate) + 2, 0) AS StartDate,DATEADD(wk, DATEDIFF(wk, 0, EndDate) + 2, 0)-1 AS EndDate
FROM PeriodStarts
WHERE DATEADD(wk, DATEDIFF(wk, 0, StartDate) + 2, 0) > StartDate
And Startid < 105

This worked well – for awhile.  Performance degraded over time, then it didn’t seem so cool.

Egg On Face

The update to that first article can be read in full here.  In short, that method really did not work out as well as I had hoped.  In the end, I had to create a table that would need to be maintained – at some point.  I populated the data in the table with 5 years worth of pay periods.  I hope somebody reads the documentation and takes care of that!

With the implementation of the table, I updated the CTE and the code being run by the report.  Performance of this updated version outperformed the first version – even when it was running fast.  I probably could have gotten away with doing an Itzik style numbers CTE in lieu of that recursive CTE – as I look back on it now.  Again, that is another level of learning and could potentially be an improvement on that first round of crap code I offered up for that report.

Conclusion

I learned from that experience.  First, it was a pretty neat little trick.  Secondly, I really should have tested it more.  Thirdly, I can likely still learn from it and improve on it because I am getting better at writing faster code and testing my proposed solutions (and I keep learning new tips/tricks).  Obviously something like this has not deterred me.  Quite the opposite really.  Because of a little failure like this, I work harder to get better at what I do.

T-SQL Tuesday #19 – Disasters & Recovery

Tags:
Comments: 1 Comment
Published on: June 14, 2011

When I first read this article, I thought to myself “I have no clue on this one.  What could I write that would be unique or interesting?”

My Disasters

Have you heard that old axiom that bad things come in sets of three?  For me, I have three such tales and two of them have a correlation to SQL Server (at least in some degree).

Disaster the First – You Can see it Coming

This one is a slow, ongoing process.  We are in a state of watch and wait.  But it at least gives us a chance to prepare and thus be prepared.  We live in an area where a flood plain exists but once every 20-25 years.  It just so happens that this is one of those years.  Our neighborhood is preparing by digging ditches, building dykes and sandbagging the most likely flow areas.  We see it coming and we are making plans.

Disaster the Second – Efforts Didn’t Go According to Plan

Last Friday I was applying a software patch.  As you would have guessed, the patch did not go according to plan.  The patch actually corrupted windows files and caused windows to fail reboots.  I tried to repair the corrupted files (default action with Win7).  I also tried to restore to a previous checkpoint.  Last effort was to re-install Windows.  You know that means I had to re-install all of my apps as well.  Any settings in the apps (such as SSMS) would be lost.

In this case, I should have imaged my machine immediately prior to installing the patch.  In this case, I wanted to retrieve all of my SSMS registered servers.  In order to do that (even though I did not export the registered servers) I needed to locate the prior RegSvr.XML from the backup that the Windows Installation makes of an existing windows install.  In some cases this prior install will be relocated to windows.Old.  Search in that directory for the RegSvr.XML file and copy it to %username%\AppData\Roaming\Microsoft\Microsoft SQL Server\100\Tools\Shell.  This will RECOVER the registered servers in SSMS that you had there prior to the reinstall.  Of course you would first need to reinstall SQL Server Tools prior to attempting this.

Disaster the Third – UnPlanned

After successfully recovering from that minor disaster, I encountered a new disaster only after two days of laptop use.  Upon arriving on-site yesterday morning (and mind you the laptop worked at the hotel) in Chicago (which is out of state for me), my laptop froze, rebooted and came back to “No Boot Disk.”  Well, I was completely unprepared for this midlevel disaster.  I didn’t have my reinstall discs with me.  I did not have my laptop backed up from the Friday disaster – I was still trying to get everything back to normal.  I had but one option – buy a new laptop and try to recover the hard drive later.

In trying to make sure it was not going to boot, I booted the system to a Boot CD (UBCD and a couple others) to be certain.  I was getting Int13h errors trying to find the drive.  I checked CMOS and CMOS did not see the drive.  I bought the new laptop and first tried to boot the new laptop from that HDD.  It too did not recognize it.  The drive was not even spinning up at that time.

Today, I have attached that drive to an external cage and connected via USB.  It’s alive!!  The data is still there but some clusters are corrupt.  That is fine so long as I get the data.  In this case, I don’t normally take daily backups of my laptop drives.  In the future, I will be taking more frequent backups of the laptop drives.  And now I have a backup laptop for cases just like this.

And once this last disaster is recovered, I will be copying settings from apps on that laptop to the new laptop – might save a little time.

T-SQL Tuesday #18 – CTEs

Comments: 3 Comments
Published on: May 10, 2011

To CTE or not to CTE, that is the Question

So my post really has nothing to do with the title.  This is not a post to help you determine whether a CTE is appropriate or not.  Or is it?

This month, we have the 18th installment in the TSQLTuesday series.  We are being hosted by Bob Pusateri  (Blog|@SQLBob) this month.  The essence of the topic this month is around CTEs (common table expressions).  There are a great many uses for a CTE in SQL server and this was a nice addition to the product.

One thing I like about CTEs is how much cleaner the code looks to me.  Another benefit for me is the recursion that is available through the use of a CTE.  An observation about CTEs is that a common use would be to use them to replace inline derived tables (which lends to cleaner looking code for some).

Think Think Think

I gave this topic a good long thought.  As I thought about the topic, I came to the conclusion that I had nothing new or unique on the subject.  I did however have some scripts that I posted once upon a time that would work very well for this topic.  Though it is a bit of a cop out, it is an appropriate solution for this month.

In the case of the CTE that I have chosen, there are multiple CTEs being used.  I use the CTE to recurse through data, and then to recurse that same data again – in reverse.  This particular script was created to traverse through system catalogs and create a hierarchy of table relationships.  I use this hierarchy to better understand the structure of the database and the interrelationship of the data between objects.  It is a cheap way of mapping out the objects in an effort to better understand it.

I had thought about using this script once upon a time for a different TSQLTuesday, but thought better of it that time.  Since the original post is more than a year old, it is a good time to bring it up and use it again.  Without further adieu, you can read about that script and CTE here.  I hope you enjoy.

T-SQL Tuesday #17 – APPLY Knowledge

Comments: 2 Comments
Published on: April 12, 2011

We have another opportunity to write as a part of TSQL Tuesday today.  This month Matt Velic (Blog | Twitter).  Matt has proposed a challenge that was derived from a comment on twitter.  The challenge this month is to write about “Apply.”

Admins Apply Within

As an administrator of databases, do you use Apply?  Not only can Apply be used to help return result sets as a part of your application, it can come in quite handy from an administration point of view too.  It can be used to help retrieve the text of currently executing code or code stored in cache.  Apply can be used to help retrieve index information and it can come in quite handy when performing a security audit.

Some Back Story

Just a few days ago, I posted some scripts to help find role membership of logins at the server level.  I wasn’t satisfied with the query and decided to rewrite it as a part of my entry for this months blog party.  You can read up on that script here.  My dissatisfaction with the query was that it felt klugy.  I wanted to write the query with something other than the Union and to present fewer rows to user.  So, I took another look at the query and purpose for the query and decided to use a combination of Apply and Pivot.

My new take on the script is simple, I will create a result set that will show a 1 for each server role for which a login is a member.  Each login will only have 1 row, opposed to the row per server role membership from the prior query.

The Script

SELECT PrincipalName,CA.[PUBLIC],sysadmin,bulkadmin,securityadmin,serveradmin,setupadmin,processadmin,diskadmin,dbcreator
	FROM    (
		SELECT  SUSER_NAME(SR.role_principal_id) AS ServerRole, SP.name AS PrincipalName
			FROM sys.server_role_members SR
				INNER Join sys.server_principals SP
					ON SR.member_principal_id = SP.principal_id
		) Roles
		PIVOT   (
			COUNT(serverrole)
			FOR serverrole IN (sysadmin,bulkadmin,securityadmin,serveradmin,setupadmin,processadmin,diskadmin,dbcreator)
		) PVT
	Cross Apply (SELECT 1 AS [PUBLIC]) CA

In this example, the bulk of the heavy lifting is done through the pivot.  You will see that the Apply only seems to play a minor role in this script.  It is an important role nonetheless.  All logins (unless you have altered the public role) are also a member of the public role.  That membership is not displayed through the views I have used in this query.  Due to that handling of the public role, and to ensure that the membership in that role is not forgotten, it must be accounted for in the query.

This query will add a new column for the public role to those columns already presented via the pivot.  In this column we will show that every login returned via the query is a member of the public role.  But how is that done?  Using the apply, in really basic terms, acts like a join to this subquery that I use for the public column.  I am taking the value of 1 for column public, and applying that value to all of the results from the Pivot portion of the query.  I am using the Cross version and there is one more version – called Outer.  In my testing, both the Outer and the Cross Apply will return the same results for this particular query.

Some common uses for Apply are in conjunction with TVFs.  Should you use the Apply with a function where the “Join” criteria would be more tightly defined, then the use of Cross V. Outer will produce different results.  Just as with an Outer Join, the Outer would return results for all records in the “outer” or “left” part of the join whether there was a matching record in the TVF or not.  In the same sort of setup, the Cross Apply would only return records that had a match in the TVF.

Conclusion

Again, this is an oversimplification of the Apply statement.  My main goal was to present a use for Apply.  Apply can be a useful tool in the hands of the database administrator.  Examine it and learn what other potential it may have for some of your administration needs.

This is just one piece of the puzzle when performing a security audit.  Getting this information quickly when requested for an audit can be very helpful.  The output format is quite simple and very conducive for many auditors and management.  A simple output report for the Server Roles in your environment is merely one example of the usefulness of Apply.

T-SQL Tuesday #016: Aggregates and Statistics

Comments: No Comments
Published on: March 8, 2011

Data Size Collection and Analysis

By now you should be well acquainted with this phenomenon we call TSQL Tuesday.  This party is being hosted this month by Jes Schultz Borland (Twitter Blog).  She has challenged us to write something about aggregations and aggregation functions in SQL Server.

I stewed on this topic for a while trying to figure out something that would be relevant yet a little unique.  I think that is the real challenge – finding some application of the topic that may be somewhat unique or at least informative for somebody.

It dawned on me finally that I already have a topic in queue waiting to be written.  It was supposed to be a follow-up to my entry for last month.  (You can read that entry here.)  I concluded that entry with an admission that I hurried through the article to get it done in time.  Well, I was hoping to find the time to write the rest of my process – but now it fits quite well with this months theme.

Recap

Quickly, let’s recap what I did in that post and then I will proceed from there to tie this month to last month (as far as TSQL Tuesday goes).

In last months entry, I shared a script (an ugly one) that I created to cycle through all of the tables and columns of a database to get me some relevant data concerning the size of the data in my tables.  Well, actually I took a 2% sample of that information so I could run further statistical analysis.  The end goal was to have relevant data from an existing system and the R&D databases to create appropriate data sizes in the new database prior to releasing it to production.  With all of that data aggregated into a staging table, I was ready to being the next phase.

One thing I did not mention in that prior article was the creation of another table for this aggregation process.  I didn’t mention it because it used much the same process (though considerably faster because it didn’t do the same thing).   This table had a prime objective of collecting the max length of each column of each table of each database.  The structure is simple:

CREATE TABLE [dbo].[DataAnalysis](
	[DatabaseName] [sysname] NOT NULL,
	[TableName] [sysname] NOT NULL,
	[ColumnName] [sysname] NOT NULL,
	[MaxLength] [INT] NULL,
	[ColDataType] [VARCHAR](20) NULL,
	[MaxColLength] [INT] NULL
) ON [PRIMARY]

I mention this table now because I will be using it in my final aggregation.

Statistics and Aggregation

For my data analysis and trending, I wanted to find a simple distribution across quartiles.  A quartile is: One of the three numbers (values) that divide a range of data into four equal parts.  A quartile is used in statistical analysis and is commonly a part of a box plot.  Other statistical values that work well with the quartile include the MAX value.  Since I had usable data to be able to produce my quartile ranges, I used the following query to further aggregate and even used a function that provides the quartile.

SELECT DLA.DatabaseName,DLA.TableName,DLA.ColumnName, DA.ColDataType
	,DLA.ColLens,COUNT(DLA.ColLens) AS NumOccurence
	,CASE WHEN DA.MaxColLength = -1 THEN 2000000 ELSE DA.MaxColLength END AS MaxSupportedLen
	,DA.MaxLength AS MaxDataLen
    ,NTILE(4) OVER (partition BY DLA.DatabaseName,DLA.TableName,DLA.ColumnName ORDER BY DLA.ColLens) AS 'Quartile'
INTO DataStats
FROM DataLenStats_Alt DLA
	LEFT Outer Join DataAnalysis DA
		ON DLA.DatabaseName = DA.DatabaseName
		And DLA.TableName = DA.TableName
		And DLA.ColumnName = DA.ColumnName
	GROUP BY DLA.DatabaseName,DLA.TableName,DLA.ColumnName,DLA.ColLens,DA.MaxColLength,DA.MaxLength, DA.ColDataType
	ORDER BY DLA.DatabaseName,DLA.TableName,DLA.ColumnName,DLA.ColLens

From this script, you can see that I am taking data from the first two staging tables and dumping it into a third table.  With this query I am taking advantage of the NTILE() function that is available in SQL Server (2005 and up).  And since my data that has been accumulated is for more than one table, more than one database, and more than one column – I needed to partition that function based on those attributes.

With this data now available, I could see the trend of the data for any data field that may have been collected.  For instance, if I wanted to figure out the proper size (based on current data) of the phone number in my new database, I could now query the DataStats table like this:

SELECT * FROM DataStats DS
	 WHERE DS.ColumnName like '%phone'

I would then be able to determine where that field exists and the distribution of data across the quartiles and in comparison to the max data size for that field.  This helps to more intelligently assign a data size to fields based on existing data.  I could quickly ascertain that most of the data is within the second quartile (for instance) and that I have few outliers in the third and fourth quartiles and maybe an extreme case where the max is way out of scope in comparison.  At this point I could make an educated judgement call as to an appropriate size based on distribution, outliers and risk.

Conclusion

This exercise was a particularly challenging one.  It was challenging due to the desire to create quartiles for analysis.  I had wanted to break it down into Standard Deviations for further analysis (and still may).  This is highly useful when in the R&D or development phase.  I wouldn’t run the query from the first post on a production system because it is a long running process and can be resource intensive (I need to optimize it more).  This kind of script can really help to get better acquainted with the data as well.  I learned a lot by doing this and am looking forward to how I can improve upon it.

T-SQL Tuesday #15 DBA Automaton

Comments: 1 Comment
Published on: February 8, 2011

I have been incognito for the last couple of weeks and nearly missed TSQL Tuesday this month.  If it weren’t for somebody pinging me requesting the link to the list of upcoming hosts, I would have missed it entirely.  The topic merges well with the work I have been doing for the past few weeks.  In fact, there are so many things that DBA’s do on a regular basis to automate things – there should be plenty of items to cherry pick for a topic.  Unfortunately, that doesn’t make this topic any easier for me.  My decision process for this is partially based on thinking through what might be unique without having read what others may have written.  (While thinking about reading blogs, it would be nice if there was an automated method to ingest all of the blogs into my head without having to iterate through them one at a time – manually.)

This month the party is hosted by Pat Wright (blogtwitter).  Pat has asked us to describe some of the things we automate – or some of our automation that we have implemented.  Often, we hear about DBA’s automating everything under the sun.  Why?  It simplifies the job and creates time to work on other projects.  With all of the automation, I wonder if DBA’s are related to Hephaestus in any fashion.  If we had our way, it seems that our databases would be…Automatons.

A Lesser Automation

Now that I have rambled for a good bit, I guess it is time to get to the meat of the topic.  First, we need to understand automation.  So, what is automation?

  1. The automatic operation or control of equipment, a process, or a system.
  2. The techniques and equipment used to achieve automatic operation or control.

A closely related word to automation is:

  1. Computerization – the control of processes by computer

In other words, for a DBA, automation is the implementation of a process or control for the computer to operate without the DBA doing the work.  This is typically something that is repetitive or menial or tedious or frequently done.  But that is not always the case.  This can also be something that is fun and/or only done once or twice a year.

I have one of those cases where I may use the automation even less often than once a year, or maybe it could come about more frequently.  It all depends on the needs of designing and testing new databases.

This process is to help in properly sizing the database before the database is finally released to production.  With good project requirements, you may have a good idea of what the fields and sizes of those fields should be.  In some cases, you will be getting data from an external source in some fashion or another.  This data does not always come with storage requirements or data size parameters.  You can make guesses at it by looking at the data – but sometimes, something more is required.

It is when more is required that this script comes in a bit handy for me.  The script is ugly, but it does the deed.  The base idea is to retrieve the data length for a sample of data from each of the tables (after import into a staging database).  This is done so I can run statistical analysis on the data later.  And cringe now because I use a nested cursor to get at what I want.

So Here is the script that will load a sample of data from every table in every database for every column and give you the length of the pertinent columns (I have excluded obvious columns such as numeric types and certain LOB types).

IF exists (SELECT name FROM sys.objects WHERE name = 'DataLenStats_Alt')
BEGIN
	DROP TABLE DataLenStats_Alt
END
 
CREATE TABLE DataLenStats_Alt (DatabaseName sysname,TableName SYSNAME, ColumnName SYSNAME, ColLens INT)
 
DECLARE dbfetch CURSOR STATIC FOR
SELECT name
	FROM sys.databases
	WHERE database_id > 4
		And Name <> 'Admin'
	ORDER BY Name
 
OPEN dbfetch;
DECLARE @sql1 VARCHAR(MAX), @dbname VARCHAR(128)
 
FETCH NEXT FROM dbfetch
INTO @dbname;
 
WHILE @@FETCH_STATUS = 0
BEGIN
 
	SET @sql1 = '
	Use ' + @dbname +';
	DECLARE datalens_cursor CURSOR Static FOR
	SELECT Distinct IST.TABLE_NAME,IST.Column_Name
	FROM Information_schema.Columns IST
		Inner Join sys.dm_db_partition_stats AS st
			On object_id(IST.TABLE_NAME) = st.object_id
	WHERE ObjectProperty(object_id(Table_Name),''Ismsshipped'') = 0
		And Table_Name not like ''sysdi%''
		And DATA_TYPE not in (''XML'',''uniqueidentifier'',''image'')
		And isnull(numeric_precision,0)=0
	Group By Table_Name,Column_name
	Having Sum(st.row_count) > 0
	ORDER BY Table_name;
 
	OPEN datalens_cursor;
	declare @sql varchar(max), @table sysname, @column sysname
 
	FETCH NEXT FROM datalens_cursor
		INTO @table,@column;
 
	WHILE @@FETCH_STATUS = 0
	BEGIN
	Set @sql=''''
 
		select @sql=@sql+''select top 2 Percent db_name(),''''''+@table+'''''',''''''+@column+'''''', len(isnull(convert(varchar(max),''+@column+''),''+''0''+'')) from [''+@table+''] ''
							From information_schema.columns
								Where table_name=''''+@table+''''
									And column_name = ''''+@column+''''
 
		Begin Tran
			--print @sql
			Insert Into Admin.dbo.DataLenStats_Alt
			exec(@sql)
		Commit Tran
	FETCH NEXT FROM datalens_cursor
		INTO @table, @column;
	END
 
	CLOSE datalens_cursor
	DEALLOCATE datalens_cursor
	--Checkpoint
	Use Tempdb;
	Checkpoint
	'
EXEC (@sql1)
FETCH NEXT FROM dbfetch INTO @dbname;
PRINT @sql1
END
 
CLOSE dbfetch
DEALLOCATE dbfetch

And yeah, I rushed through this to get it finished up quickly.  I hope to have more time to delve into it later.  At least with this script, I can load a table with adequate data to be able to generate histograms on the data length/size distribution and then make appropriate sizing decisions based on the statistics.  Be warned – though automated it is slow and should not be run on a production server.

page 1 of 4»
Calendar
February 2012
M T W T F S S
« Jan    
 12345
6789101112
13141516171819
20212223242526
272829  
Follow me on Google+
Jason Brimhall

In 246 people's circles

Add to circlesi
Content
Categories

Categories

Now Reading

Now Reading

Planned books:

Current books:

  • ChiRunning: A Revolutionary Approach to Effortless, Injury-Free Running

    ChiRunning: A Revolutionary Approach to Effortless, Injury-Free Running by Danny Dreyer, Katherine Dreyer

  • Advanced Marathoning – 2nd Edition

    Advanced Marathoning – 2nd Edition by Peter Pfitzinger, Scott Douglas

  • SQL Server MVP Deep Dives

    SQL Server MVP Deep Dives by Nielsen Paul, Delaney Kalen, Machanic Adam, Tripp Kimberly, Randal Paul, Low Greg

  • A World Without Heroes (Beyonders)

    A World Without Heroes (Beyonders) by Brandon Mull

Recent books:

View full Library

SQLHelp

SQLHelp


Welcome , today is Saturday, February 4, 2012