Using Synonyms to Extend SSIS

Comments: No Comments
Published on: July 3, 2014

There are a million and one uses for synonyms.  There are at least that many uses for SSIS.  The reality is that not all of those uses are good for you, your data, or your database.

Recently I wrote an article about some good and some bad with synonyms.  You can read that article from my work blog site here.  Today, I just want to expand on that article a little bit.  I glossed over some things pretty quickly in that article that I thought might be fun to explain in a little more detail.

The Bad – a little closer look

First, let’s take a closer look at the bad.  More specifically, in the example I used, there was a vendor that created a synonym for everything.  I really do mean everything.  This is one case where using the word “literally” would be accurate.  On the client database, I could run a quick query such as the following and see over 7300 synonyms.

[codesyntax lang="tsql"]

[/codesyntax]
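If you want to piece together something similar, a minimal sketch against the sys.synonyms catalog view will get you the names and the count (the windowed COUNT stands in for the derived column mentioned below):

[codesyntax lang="tsql"]
/* Sketch: list every synonym in the current database along with
   a derived column showing the total number of synonyms. */
SELECT  s.name              AS SynonymName,
        s.base_object_name,
        COUNT(*) OVER ()    AS TotalSynonyms
FROM    sys.synonyms s
ORDER BY s.name;
[/codesyntax]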

In the case of this client and this vendor, 7300+ synonyms is far too many.  It just led to mass confusion.  If you were to run that query, you might see something like the following image.

[Image: massSynonyms]

I added a “derived” column to show the total count of synonyms and the record name as it relates to that total.  That is a crazy amount of synonyms.  That just makes me think that somebody got into the krazy kool-aid one day, was bored and gave life to a synonym beast.

The Good – With more detail

On the flip side, in the aforementioned article, I talked about synonyms as a means to tweak performance in SSIS.  Normally I would not tout a synonym as a performance tweak.  So I want to caution that the performance gains are specific to SSIS and a handful of those millions of uses for SSIS.

Let’s just begin with a little bit of background.  For that background, some pictures will be really handy.  So here goes.

[Image: SpiceSource]

In the preceding image we see a very simple segment of a data flow.

The data source uses a SQL command to fetch the data required for the data flow.  In the beginning, it is very straightforward.  You probably have some package lying around with something similar to this.

In the following image, we see the SQL command for the data source that was circled in red in the previous image.

[Image: SQLWithout]

In the next image we see a slight tweak to the query.  This time, the query includes a reference to a table that is defined/obfuscated by a synonym.

[Image: SQLWith]

At this point I can hear some of you saying, “Ok, I see what he is doing,” while many others are wondering why I just made the query more complex than the previous example.

Well, as luck would have it, this change serves a few purposes.  1) The data has been staged in a separate database.  That database has a different name in every environment (recall the aforementioned article), so the synonym minimizes code changes when deploying the package.  2) The synonym allows us to confirm that there is data in the stage table and that the data matches a ClientCode in the destination table.  3) Lastly, the synonym reduces my dataset, which reduces memory requirements and also gets the data loaded faster (because it is smaller).
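To make that concrete, here is a minimal sketch of the pattern.  The object names (StageDB, Staging.ClientLoad, dbo.Client, ClientCode) are made up purely for illustration; the point is that the synonym hides the environment-specific database name and the join trims the dataset down to clients that exist in the destination.

[codesyntax lang="tsql"]
/* Sketch only: StageDB, Staging.ClientLoad, dbo.Client and ClientCode
   are hypothetical names used for illustration. */

-- One-time setup: the synonym points at the staging database,
-- which has a different name in every environment.
CREATE SYNONYM dbo.syn_ClientStage
    FOR StageDB.Staging.ClientLoad;
GO

-- SQL command for the SSIS data source: only pull rows whose
-- ClientCode already exists in the destination table.
SELECT  cs.ClientCode,
        cs.SomeMeasure,
        cs.LoadDate
FROM    dbo.syn_ClientStage cs
JOIN    dbo.Client c
    ON  c.ClientCode = cs.ClientCode;
[/codesyntax]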

In addition to this minor tweak, I can also do something like the following.

[Image: WithoutSynonym]

In the preceding image, we see two datasources.  Each datasource is subsequently sorted and then eventually joined.  Much like the previous example, due to naming standards and an effort to try and minimize code changes during deployments, at least one datasource is pulling in too much data.  The data is filtered down due to the Join transformation, but this is not very efficient.

[Image: WithSynonym]

Through the use of a synonym, the datasources can be reduced to a single datasource.  This eliminates the need for the Sort transformations and the Join transformation.  Removing those three transformations reduces memory requirements.  And, like the previous example, since we can trim down the number of records, the data flow will run a little bit faster too.

[Image: BigSQLWith]

As you can see, the code is simple.  Adding a synonym into an existing query is not some super TSQL trick; it just gets referenced like any other table.  Once again, in this case, the synonym is pointing to a table in a staging database.  That table has been loaded as part of an ETL process and now needs to be manipulated a little bit through some other transformations before eventually being inserted into a “target” database.
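As a rough sketch of what that consolidated SQL command can look like (again with made-up object names), the work of the Sort and Join transformations simply moves into the query:

[codesyntax lang="tsql"]
/* Sketch only: object names are placeholders.  The two data sources,
   the two Sorts and the Join collapse into one query. */
SELECT  t.TargetKey,
        t.ClientCode,
        cs.SomeMeasure
FROM    dbo.TargetTable t
JOIN    dbo.syn_ClientStage cs   -- synonym pointing to the staging database
    ON  cs.ClientCode = t.ClientCode
ORDER BY t.TargetKey;
[/codesyntax]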

Conclusion

The technique used here is much the same as when tuning stored procedures or any other TSQL: reduce the datasets to contain JUST the data that is needed for the flow.  To facilitate that reduction down to just the essential data, I employed synonyms.

The reasons for using a synonym in this case were to: 1) restrict data to precisely what was needed, 2) ensure data being loaded was “constrained” by data in the destination table (e.g. only load for a specific client that does exist), and 3) minimize code changes during deployments.

When dealing with databases that serve the same purpose but follow some absurd naming standard that changes the name between environments, it can become cumbersome to maintain code during deployments.  This is particularly true when dealing with cross database joins or lookups.

Auditing and Event SubClasses

Categories: News, Professional, Scripts, SSC
Comments: No Comments
Published on: May 28, 2014

A recent discussion got me to thinking about Auditing.  To be honest, it started with a complaint about some documentation that seemed overly light on the various fields related to auditing as it stands in SQL Server.

In talking to the person who raised the valid concern about the lack of good documentation, I was curious why he suddenly had so many questions about auditing and its functionality within SQL Server.  Reflecting on the answers, it all kind of fell into place with the whole Audit Life Cycle.  I hadn’t previously considered the Life Cycle, but it makes sense to map it out.  Here is a simple rendition of what an audit Life Cycle might entail.

 

 

[Image: AuditCycle_trans]

 

 

In order to audit anything, it is necessary to know what you really want to audit, why you want to audit it and how to achieve those goals within the tools given to you.  In that vein, it makes sense that one would need to study up on the topic to figure out what different things meant within the tool.

Of course, once you start collecting that data, then you also need to figure out how to measure it and then determine whether adjustments to the auditing plan need to be made.  In the end, it boils down to what data is being collected, what you are doing with that data, and what that data represents.

In our simple discussion, the data we were trying to understand was the Event Subclass field in this view (sys.trace_subclass_values) and in this Microsoft document (one of several).  The beauty of this field is that it is not just tied to Auditing; you will also find it in Profiler, server side traces, and Extended Events.

With so little information to help understand what the field values represent, maybe it is better to just turn to the data to help understand what the values might represent or how to interpret them.  To do this, we can query a few catalog views as in the following query.

[codesyntax lang="tsql"]

[/codesyntax]
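A bare-bones version of that kind of query (just the trace catalog views joined together, filtered here to event names containing "Audit") might look like this:

[codesyntax lang="tsql"]
/* Sketch: map trace categories, events and subclass values together. */
SELECT  tc.name     AS CategoryName,
        te.name     AS EventName,
        tsv.trace_column_id,
        tsv.subclass_value,
        tsv.subclass_name
FROM    sys.trace_categories tc
JOIN    sys.trace_events te
    ON  te.category_id = tc.category_id
JOIN    sys.trace_subclass_values tsv
    ON  tsv.trace_event_id = te.trace_event_id
WHERE   te.name LIKE '%Audit%'
ORDER BY tc.name, te.name, tsv.subclass_value;
[/codesyntax]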

With the above query, I can filter down to just the Event Types that have Audit in the name.  Or I could add a different filter so I can try and better understand the different subclasses in a more focused effort.

I hope this helps in your efforts to provide a better auditing or “profiling” type of experience in your environment.

 

Missing Indexes Script v2

Comments: 8 Comments
Published on: May 15, 2014

[Image: missing_idx]

Frequently we hear the analogy that <insert item here> is like opinions, everybody has one and not all of them are good (some may stink).

Well, this may just be another one of those <items>.  Whether it stinks or not may depend on your mileage.

I had shared a similar script back in January 2012 and wanted to share something a little more current.  As is the case for many DB professionals, I am always tweaking (not twerking) and refining the script to try and make it more robust and a little more accurate.

This version does a couple of things differently than the previous version.  For one, this script runs against a single database at a time (the prior version looped through all of the databases with a less refined query).  Another significant difference is that this query is designed to pull information from multiple places about the missing indexes and execution statistics.  I felt this could prove more advantageous and useful than pulling the information from just one place.

Here is the current working script.

The following script gets altered on display: n.value is shown as n.VALUE.  The display is wrong, but the script is correct as written in the editor.  If copying from this page, please change the upper-cased “VALUE” in the XML segment to “value” so it will work.  A download of the script has been added at the end.

[codesyntax lang="tsql"]

[/codesyntax]
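For anyone who wants the general shape of it, this is the skeleton such a script is built around: the three missing index DMVs joined together for the current database, with an impact-style calculation.  The additional information pulled from other places (and the XML handling mentioned above) is left out of this sketch.

[codesyntax lang="tsql"]
/* Sketch: core missing index DMV join for the current database only. */
SELECT  DB_NAME(mid.database_id)                    AS DatabaseName,
        OBJECT_NAME(mid.object_id, mid.database_id) AS TableName,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns,
        migs.user_seeks,
        migs.avg_user_impact,
        migs.avg_total_user_cost,
        migs.user_seeks * migs.avg_user_impact * migs.avg_total_user_cost AS Impact,
        migs.last_user_seek
FROM    sys.dm_db_missing_index_details mid
JOIN    sys.dm_db_missing_index_groups mig
    ON  mig.index_handle = mid.index_handle
JOIN    sys.dm_db_missing_index_group_stats migs
    ON  migs.group_handle = mig.index_group_handle
WHERE   mid.database_id = DB_ID()
ORDER BY Impact DESC;
[/codesyntax]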

As has been the case in the past, this is not the end query.  I have a backlog of updates I want to make to the query and try to create something even more useful from it.

As a bit of fair caution, just because an index shows up as a recommended implementation from this script, it does not mean you should blanket implement it.  Also, like the previous script, the higher the impact value, the more probable the index will have a positive impact on your system.

In addition to the impact, I also look at the “Similar Queries” and “Similar Plans” numbers in the result set.  These values can sometimes lead to an understanding that there are other queries or other plans that could also benefit from the creation of the index.  Your mileage may vary.

Download script Missing_Index.

Table Space updated again

Categories: News, Professional, Scripts, SSC
Comments: No Comments
Published on: March 28, 2014

Today we have another update for an age old script on this site.  You can find the last update to the script here.

This time we have a bit of a bug update.  The details of that bug are in the notes for the script.

[codesyntax lang="tsql"]

[/codesyntax]
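For reference, a stripped-down sketch of a table space report (per-table row counts and reserved/used space pulled from the catalog views for the current database) looks something like this:

[codesyntax lang="tsql"]
/* Sketch: rows, reserved and used space per table in the current database. */
SELECT  s.name  AS SchemaName,
        t.name  AS TableName,
        SUM(CASE WHEN p.index_id IN (0, 1) THEN p.rows ELSE 0 END) AS RowCounts,
        SUM(au.total_pages) * 8 AS ReservedKB,
        SUM(au.used_pages)  * 8 AS UsedKB
FROM    sys.tables t
JOIN    sys.schemas s
    ON  s.schema_id = t.schema_id
JOIN    sys.partitions p
    ON  p.object_id = t.object_id
JOIN    sys.allocation_units au
    ON  au.container_id = p.partition_id
GROUP BY s.name, t.name
ORDER BY ReservedKB DESC;
[/codesyntax]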

Table Hierarchy updated

Recently a need resurfaced to explore the foreign key tree (hierarchy/genealogy) as it related to a specific table within a database.  As I pulled out the script from the repository, I realized there was some unfinished work to be done.  But there was also some polish that needed to be added.  This is an update to the most recent posting of that script.  You can see several revisions in the series at this link or the group here.

Some of the changes involve formatting and labeling.  I added a new column called “Direction” to help understand the relationship of the key to the table in question.  I also changed the FKGenealogy column (formerly called SortCol) to reflect the source table more accurately in the case when the key comes from an ancestor rather than a descendant.  The Level of the FK was also modified to help understand a little better how far away the ancestor is in relationship to the origin table.

A final adjustment also comes in the Genealogy attribute.  Ancestors were all starting at the wrong point in the lineage.  I adjusted that so the lineage is shown from the point in the tree where the ancestor is related, rather than as a root directly off the origin table.

All in all, this script should make more sense to the user than the previous versions.

[codesyntax lang="tsql"]

[/codesyntax]
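To give a flavor of the approach, here is a minimal sketch of the descendant half of that kind of script: a recursive CTE over sys.foreign_keys seeded from a named table.  The seed table name is only an example, and the full script also walks the ancestor direction and builds the FKGenealogy and Direction values described above.

[codesyntax lang="tsql"]
/* Sketch: recursive walk of the tables that reference the seed table,
   directly or through other descendants.  Circular references will
   hit the default MAXRECURSION limit. */
DECLARE @StartTable sysname = N'dbo.Department';  -- example seed table

;WITH FKTree AS
(
    SELECT  fk.name                 AS FKName,
            fk.parent_object_id     AS ChildObjectId,
            fk.referenced_object_id AS ParentObjectId,
            1                       AS FKLevel
    FROM    sys.foreign_keys fk
    WHERE   fk.referenced_object_id = OBJECT_ID(@StartTable)

    UNION ALL

    SELECT  fk.name,
            fk.parent_object_id,
            fk.referenced_object_id,
            t.FKLevel + 1
    FROM    sys.foreign_keys fk
    JOIN    FKTree t
        ON  fk.referenced_object_id = t.ChildObjectId
)
SELECT  OBJECT_SCHEMA_NAME(ParentObjectId) + N'.' + OBJECT_NAME(ParentObjectId) AS ParentTable,
        OBJECT_SCHEMA_NAME(ChildObjectId)  + N'.' + OBJECT_NAME(ChildObjectId)  AS ChildTable,
        FKName,
        FKLevel,
        N'Descendant' AS Direction
FROM    FKTree
ORDER BY FKLevel, ParentTable, ChildTable;
[/codesyntax]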

Last Time CheckDB was Run

Categories: Corner, News, Professional, SSC
Comments: 10 Comments
Published on: January 17, 2013

[Image: Corrupt Pages]

Back in November 2012, I shared a story about checking the last known good checkdb in the boot page of a database.  You can read that story here.

This is an important enough topic that it is worth repeating frequently, if for no other reason than to continue to hammer home how important it is to both run checkdb and know the last time that checkdb was run successfully.

Alas, I am writing to fix a few things with the script that I shared in that last post.

I run this script on every server I touch to get a report for the last known good checkdb for every single database.  I had been running the script flawlessly across many servers without error.  Then it happened.  The script failed with a nasty error.

After a bit of looking, the flaw in my script became apparent.  I had not written the script with CS (case sensitivity) in mind.  I touch so few CS servers that I sometimes forget to check for that.  Slap my hands, and I will work on that going forward.

So here is the update to the script.

[codesyntax lang="tsql"]

[/codesyntax]
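For anyone who wants the gist of the approach, a bare-bones sketch follows.  It leans on the undocumented DBCC DBINFO command and sp_MSforeachdb, so treat it as informational rather than something to build a process around.

[codesyntax lang="tsql"]
/* Sketch: pull dbi_dbccLastKnownGood from the boot page of each database. */
IF OBJECT_ID('tempdb..#CheckDB') IS NOT NULL
    DROP TABLE #CheckDB;

CREATE TABLE #CheckDB
(
    ParentObject varchar(255),
    Object       varchar(255),
    Field        varchar(255),
    Value        varchar(255),
    DbName       sysname NULL
);

EXEC sp_MSforeachdb N'USE [?];
INSERT INTO #CheckDB (ParentObject, Object, Field, Value)
EXEC (''DBCC DBINFO() WITH TABLERESULTS, NO_INFOMSGS'');
UPDATE #CheckDB SET DbName = N''?'' WHERE DbName IS NULL;';

SELECT  DbName,
        MAX(CAST(Value AS datetime)) AS LastKnownGoodCheckDB
FROM    #CheckDB
WHERE   Field = 'dbi_dbccLastKnownGood'
GROUP BY DbName
ORDER BY DbName;
[/codesyntax]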

SSRS Subscriptions Report

As a part of my series leading up to Christmas 2012, I shared a script to Report on SSRS Subscriptions.  It was soon found to have a bug with SQL Server 2008 R2 SP2.  In the comments on that post, I promised to post an updated script.  Here is that update, without the bug.

[codesyntax lang="tsql"]

[/codesyntax]
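As a rough outline of what the report pulls together, here is a sketch against the ReportServer catalog tables.  The real script goes further and parses the delivery extension settings, which is where the value() and text() capitalization issue noted below comes into play.

[codesyntax lang="tsql"]
/* Sketch: subscriptions with their owner, report and last run status.
   Run against the ReportServer database. */
SELECT  c.Path       AS ReportPath,
        c.Name       AS ReportName,
        u.UserName   AS SubscriptionOwner,
        s.Description,
        s.LastStatus,
        s.LastRunTime,
        CONVERT(XML, s.ExtensionSettings) AS DeliverySettings
FROM    dbo.Subscriptions s
JOIN    dbo.[Catalog] c
    ON  c.ItemID = s.Report_OID
JOIN    dbo.Users u
    ON  u.UserID = s.OwnerID
ORDER BY c.Path, c.Name;
[/codesyntax]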

The inline code seen above likes to reformat and will throw an error due to capitalization of the value() function and text().  Download the script here: SSRS_SubscriptionsV1_5

On the Ninth Day…

It’s the end of the world as we know it.  And as the song goes…I feel fine!  But hey, we didn’t start the fire.

Those are a couple of songs that pop into my head every time somebody starts talking doomsday and doomsday prophecies.

If you are reading this, I dare say that the world is still turning.  And that is a good thing because that gives us a chance to talk about the 12 days of pre-Christmas.

Today we will be talking about a tool that can be at the DBAs disposal to help in tracking performance as well as problems.

First, there are a couple of items of housekeeping.  The first item is that I only realized with this post that the post on the first day was wrong.  I had miscalculated my twelve days to end on Christmas day.  That is wrong!  Counting down from that first post on the 13th means the 12th day will end up on December 24th.  Maybe I will throw in a bonus 13th day post, or delay a post for a day, or just go with it.  It will be a mystery!

Second item is naturally the daily recap of the 12 days to date.

On the Ninth Day of pre-Christmas…

My DBA gave to me a Data Collection!

If only my DBA had told me that I would need to have the Management Data Warehouse (MDW) preconfigured.  Well, that is not a problem.  We can handle that too.  For a tutorial on how to set up MDW, I am going to provide a link to one that has been very well written by Kalen Delaney.

The article written by Kalen covers the topic of MDW very well all the way from setting up the MDW, to setting up the Data Collectors, and all the way down to the default reports you can run for MDW.  The MDW and canned Data Collectors can provide some really useful information for your environment.

What I want to share though is a means to add custom data collections to your MDW.  To create a custom collection, we need to rely on two stored procedures provided to us by Microsoft.  Those stored procedures are: sp_syscollector_create_collection_item and sp_syscollector_create_collection_set.  Both of these stored procedures are found in the, you guessed it, msdb database.

Each of these stored procedures has a number of parameters to help in the creation of an appropriate data collection.  When creating a data collection, you will first create the collection set, and then you add collection items to that set.

There are a few notable parameters for each stored procedure that I will cover.  Otherwise, you can refer back to the links for each of the stored procedures to see more information about the parameters.

Starting with the sp_syscollector_create_collection_set stored procedure, I want to point out the @schedule_name, @name, and @collection_mode parameters.  The name is pretty obvious – make sure you have a distinguishable name that is descriptive (my opinion) or at least have good documentation.  The collection mode has two values, cached and non-cached.  As noted in the documentation, you will want to choose one value over the other depending on the intended use of this data collector.  If running continuously, just remember to run in cached mode.  And lastly is the schedule name, which will help determine how frequently the job runs.

Unfortunately, the schedule names are not found in the documentation; rather, you are directed to query the sysschedules table.  To help you find those schedules, here is a quick query.

[codesyntax lang="tsql"]

[/codesyntax]
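Something as simple as this will do the trick:

[codesyntax lang="tsql"]
/* Sketch: list the schedules available to data collection sets. */
SELECT  schedule_id,
        name,
        enabled
FROM    msdb.dbo.sysschedules
--WHERE name LIKE 'CollectorSchedule%'   -- the built-in collector schedules
ORDER BY name;
[/codesyntax]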

Now on to the sp_syscollector_create_collection_item stored procedure.  There are three parameters that I want to lightly touch on.  For the rest, you can refer back to the documentation.  The parameters of interest here are @parameters, @frequency and @collector_type_uid.  Starting with the frequency parameter, this tells the collector how often to upload the data to the MDW if running in cached mode.  Be careful here to select an appropriate interval.  Next is the parameters parameter, which is really the workhorse of the collection item.  In the case of the custom data collector that I will show in a bit, this is where the TSQL query will go.

Last parameter to discuss is the collector type uid.  Like the schedule for the previous proc, the documentation for this one essentially refers you to a system view - syscollector_collector_types.  Here is a quick query to see the different collector types.

[codesyntax lang="tsql"]

[/codesyntax]
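Again, a very simple query gets the job done:

[codesyntax lang="tsql"]
/* Sketch: the collector types and their uids. */
SELECT  collector_type_uid,
        name
FROM    msdb.dbo.syscollector_collector_types;
[/codesyntax]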

The collector type that I will be using for this example is Generic T-SQL Query Collector Type.  A discussion on the four types of collectors can be reserved for another time.

Let’s move on to the example now.  This custom data collector is designed to help troubleshoot deadlock problems.  I want to accomplish that by querying the system_health extended event session.

I can query for deadlock information direct to the system_health session using a query like the following.

[codesyntax lang="tsql"]

[/codesyntax]
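If you want to follow along without the full script, the shape of that query (the common ring_buffer approach) is along these lines:

[codesyntax lang="tsql"]
/* Sketch: pull xml_deadlock_report events out of the system_health
   ring_buffer target, casting each graph to varchar(4000). */
SELECT  CAST(XEventData.xed.query('.') AS varchar(4000)) AS DeadlockGraph
FROM
(
    SELECT  CAST(st.target_data AS XML) AS TargetData
    FROM    sys.dm_xe_session_targets st
    JOIN    sys.dm_xe_sessions s
        ON  s.address = st.event_session_address
    WHERE   s.name = N'system_health'
    AND     st.target_name = N'ring_buffer'
) AS Data
CROSS APPLY TargetData.nodes('//RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData(xed);
[/codesyntax]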

You may notice that I have converted to varchar(4000) from XML.  This is in large part to make sure the results will play nicely with the data collector.  Now to convert that to a query that can be used in the @parameters parameter, we get the following.

[codesyntax lang="tsql"]

[/codesyntax]

With this query, we are loading the necessary schema nodes that correlate to the Data Collector Type that we chose.  Since this parameter is XML, the schema must match or you will get an error.  We are now ready to generate a script that can create a deadlock data collector.

[codesyntax lang="tsql"]

[/codesyntax]
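Putting the pieces together, a sketch of that creation script follows.  The set name, item name, output table name, retention, and the choice of the every-15-minutes shared schedule are all just examples; the XML wrapper follows the TSQLQueryCollector format expected by the Generic T-SQL Query Collector Type.

[codesyntax lang="tsql"]
/* Sketch: names, schedule and retention below are examples only. */
USE msdb;
GO
DECLARE @collection_set_id  INT,
        @collection_set_uid UNIQUEIDENTIFIER,
        @collector_type_uid UNIQUEIDENTIFIER,
        @collection_item_id INT,
        @parameters         XML;

-- 1. Create the collection set (non-cached: collect and upload on the schedule)
EXEC dbo.sp_syscollector_create_collection_set
        @name                  = N'SystemHealthDeadlock',
        @collection_mode       = 1,    -- 1 = non-cached
        @description           = N'Deadlock graphs from the system_health session',
        @days_until_expiration = 14,   -- retention in the MDW
        @schedule_name         = N'CollectorSchedule_Every_15min',
        @collection_set_id     = @collection_set_id OUTPUT,
        @collection_set_uid    = @collection_set_uid OUTPUT;

-- 2. Find the Generic T-SQL Query collector type
SELECT  @collector_type_uid = collector_type_uid
FROM    dbo.syscollector_collector_types
WHERE   name = N'Generic T-SQL Query Collector Type';

-- 3. The query from above, wrapped in the TSQLQueryCollector schema
SET @parameters = N'
<ns:TSQLQueryCollector xmlns:ns="DataCollectorType">
  <Query>
    <Value>
      SELECT CAST(XEventData.xed.query(''.'') AS varchar(4000)) AS DeadlockGraph
      FROM (SELECT CAST(st.target_data AS XML) AS TargetData
              FROM sys.dm_xe_session_targets st
              JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
             WHERE s.name = N''system_health''
               AND st.target_name = N''ring_buffer'') AS Data
      CROSS APPLY TargetData.nodes(''//RingBufferTarget/event[@name="xml_deadlock_report"]'') AS XEventData(xed);
    </Value>
    <OutputTable>systemhealthdeadlock</OutputTable>
  </Query>
</ns:TSQLQueryCollector>';

-- 4. Add the collection item to the set
EXEC dbo.sp_syscollector_create_collection_item
        @collection_set_id  = @collection_set_id,
        @collector_type_uid = @collector_type_uid,
        @name               = N'system_health deadlocks',
        @frequency          = 900,   -- upload interval in seconds (cached mode only)
        @parameters         = @parameters,
        @collection_item_id = @collection_item_id OUTPUT;
[/codesyntax]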

Upon creation, this will create a SQL Agent job with the defined schedule.  Since this is a non-cached data collector set, the Agent job will adhere to the schedule specified and upload data on that interval.

Now all we need to do is generate a deadlock to see if it is working.  It is also a good idea to introduce you to the table that will be created due to this data collector.  Once we create this collector set, a new table will be created in the MDW database.  In the case of this collector set, we have a table with the schema and name of custom_snapshots.systemhealthdeadlock.

This new table will have three columns.  One column represents the DeadlockGraph as we retrieved from the query we provided to the @parameters parameter.  The remaining columns are data collector columns for the collection date and the snapshot id.

Now that we have covered all of that, your favorite deadlock query has had enough time to finally fall victim to a deadlock.  We should also have some information recorded in the custom_snapshots.systemhealthdeadlock table relevant to the deadlock information (if not, it will be there once the agent job runs, or you can run a snapshot from SSMS of the data collector).  With a quick query, we can start looking into the deadlock problem.

[codesyntax lang="tsql"]

[/codesyntax]
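That query can be as simple as the following, run against the MDW database (assuming the standard collector-added column names of collection_time and snapshot_id):

[codesyntax lang="tsql"]
/* Sketch: review the collected deadlock graphs (run in the MDW database). */
SELECT  shd.collection_time,
        shd.snapshot_id,
        CONVERT(XML, shd.DeadlockGraph) AS DeadlockGraph  -- back to XML for easy viewing;
                                                          -- assumes the graph was not truncated
FROM    custom_snapshots.systemhealthdeadlock shd
ORDER BY shd.collection_time DESC;
[/codesyntax]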

This query will give me a few entries (since I went overkill and created a bunch of deadlocks).  If I click the DeadlockGraph cell in the result sets, I can then view the XML of the DeadlockGraph, as in the following.

[codesyntax lang="xml"]

[/codesyntax]

Code to generate deadlock courtesy of “SQL Server 2012 T-SQL Recipes: A Problem-Solution Approach” by Jason Brimhall, Wayne Sheffield et al (Chapter 12, pages 267-268).  If you examine the deadlock graph you will see the code that generated the deadlock.

Since this is being pulled from the RingBuffer target of the system_health session, it can prove useful to store that data into a table such as I have done.  The reason is that the ring buffer can be overwritten, and with good timing on the data collector, we can preserve this information for later retrieval and troubleshooting.  Deadlocks don’t always happen at the most opportune time, and they are even less likely to occur when we are staring at the screen waiting for them to happen.

As you read more about the stored procedures used to create a data collector, you will see that there is a retention parameter.  This helps prevent the table from getting too large on us.  We can also ensure that an appropriate retention is stored for these custom collectors.

Conclusion

Creating a custom data collector can be very handy for a DBA, especially in times of troubleshooting.  These collectors are also quite useful for trending and analysis.  Consider this a tool in the chest for the poor man. ;)

Enjoy and stay tuned!

All scripts and references were for SQL 2008 R2.  The Deadlock script was pulled from the 2012 book, but the script runs the same for the 2008 version of the AdventureWorks database.

On the Eighth Day…

Today’s post is merely an illusion.  The illusion being that we have finally stopped talking about the msdb database.  I’ll explain about that later in this post.

This should be a good addition to the script toolbox for those Mere Mortal DBAs out there supporting their corporate SSRS environment.  Everybody could use a script now and then that helps them better support their environment and perform their DBA duties, right?

No reading ahead now.  We’ll get to the script soon enough.  First, we have a bit of business to cover just as we normally do.

We need to quickly recap the first seven days thus far (after all, the song does a recap with each day).

 

  1. Runaway Jobs – 7th Day
  2. Maintenance Plan Gravage - 6th Day
  3. Table Compression - 5th Day
  4. Exercise for msdb - 4th Day
  5. Backup, Job and Mail History Cleanup - 3rd Day
  6. Service Broker Out of Control - 2nd Day
  7. Maint Plan Logs - 1st Day

On the Eighth Day of pre-Christmas…

My DBA gave to me a means to see Report Subscriptions and their schedules.

One of the intriguing points that we find with having a reporting environment is that we also need to report on that reporting environment.  And one of the nuisances of dealing with a Reporting Services Environment is that data like report subscription schedules is not very human friendly.

Part of the issue is that you need to be fluent with math.  Another part of the issue is that you need to be a little familiar with bitwise operations in SQL Server.  That said, it is possible to get by without understanding both very well.  And as a last resort, there is always the manual method of using Report Manager to check the subscriptions for each of the reports that have been deployed to that server.  Though, I think you will find this to be a bit tedious if you have a large number of reports.

I have seen more than one script that provides the schedule information for the subscriptions without using math and just relying on the bitwise operations.  This tends to produce a lot of repetitive code.  The method works, I’m just not that interested in the repetitive nature employed.

Within SQL Server you should notice that several tables, views, and processes employ powers of 2, or the base 2 (binary) number system.  This is natural since it is so integral within computer science in general.  Powers of 2 translate to binary fairly easily and then integrate very well with bitwise operations.

The following table demonstrates the powers of 2 and conversion to binary.

Power of 2    Value    Binary
0             1        1
1             2        10
2             4        100
3             8        1000
4             16       10000
5             32       100000
6             64       1000000
7             128      10000000
8             256      100000000

Getting the numbers and values between the entries listed above is a matter of addition: we add one power of 2 to another.  So if I need a value of 7, then I need 2^0 + 2^1 + 2^2, which results in a binary value of 0111.  This is where the need for bit comparisons comes into play, so we will use some bitwise operations (read more here) to quickly figure out which values were combined to reach an end value of 7 (so I don’t need to really know a lot of math there ;) ).
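A quick way to see that in action: each bitwise AND below returns the power of 2 when that bit is part of 7, and 0 when it is not.

[codesyntax lang="tsql"]
SELECT  7 & 1 AS Bit0,   -- 1: 2^0 is part of 7
        7 & 2 AS Bit1,   -- 2: 2^1 is part of 7
        7 & 4 AS Bit2,   -- 4: 2^2 is part of 7
        7 & 8 AS Bit3;   -- 0: 2^3 is not
[/codesyntax]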

How does this Apply to Schedules?

This background has everything to do with scheduling in SSRS.  Within the ReportServer database, there is a table called Schedule in the dbo schema.  This table has multiple columns that store pieces of the Subscription Schedule.  The three key columns are DaysofWeek, DaysofMonth and Month.  The values stored in these columns are all sums of the powers of 2 necessary to represent multiple days or months.

For instance, you may see the following values

DaysOfWeek    DaysOfMonth    Month
62            135283073      2575

These values are not friendly to normal, everyday, mere mortal DBAs.  The values from the preceding table are shown below with the human-friendly data they represent.

Column         Raw value      Human-friendly value
DaysOfWeek     62             Monday, Tuesday, Wednesday, Thursday, Friday
DaysOfMonth    135283073      1, 8, 9, 15, 21, 28
Month          2575           January, February, March, April, October, December

That is better for us to understand, but not necessarily better to store in the database.  So, I hope you can see the value of storing it in a numeric representation that is easily deciphered through math and TSQL.

Without further ado, we have a script to report on these schedules without too much repetition.

[codesyntax lang="tsql"]

[/codesyntax]

Consider this as V1 of the script with expected changes coming forth.

I have set this up so a specific report name can be provided or not.  If not provided, the query will return all scheduling information for all reports.

Through the use of a numbers table (done through the CTE), I have been able to create a map table for each of the necessary values to be parsed from the schedule later in the script.  In the creation of that map table, note the use of the POWER function.  This map table was, in my opinion, the critical piece in creating a script that could quickly decipher the values in the schedule and provide something readable to the DBA.
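To give a sense of that approach without the full script, here is a minimal sketch for just the DaysOfWeek column, run against the ReportServer database.  The full script applies the same idea to DaysOfMonth and Month and ties in the report and subscription details.

[codesyntax lang="tsql"]
/* Sketch: decode the DaysOfWeek bitmask using a small map table
   built from a numbers CTE and the POWER function. */
;WITH Nums AS
(
    SELECT TOP (7) ROW_NUMBER() OVER (ORDER BY object_id) - 1 AS n
    FROM sys.objects
),
DayMap AS
(
    SELECT  POWER(2, n) AS DayValue,
            CASE n WHEN 0 THEN 'Sunday'    WHEN 1 THEN 'Monday'
                   WHEN 2 THEN 'Tuesday'   WHEN 3 THEN 'Wednesday'
                   WHEN 4 THEN 'Thursday'  WHEN 5 THEN 'Friday'
                   ELSE 'Saturday' END AS DayName
    FROM    Nums
)
SELECT  s.ScheduleID,
        s.DaysOfWeek,
        STUFF((SELECT ',' + dm.DayName
               FROM   DayMap dm
               WHERE  s.DaysOfWeek & dm.DayValue = dm.DayValue
               ORDER BY dm.DayValue
               FOR XML PATH('')), 1, 1, '') AS DaysOfWeekList
FROM    dbo.Schedule s;
[/codesyntax]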

Conclusion

I did this script because I feel it is important to know what reports are running and when they are running.  Add to that the fact that management also likes to know that information, and there is clear value in it.  The scripts I found on the web used the bitwise operation piece but relied on a lot of repetitive code to determine each weekday and month.

An alternative would be to perform a query against the msdb database since Scheduled reports are done via a SQL Agent job.  I hope you find this report useful and that you can put it to use.

On the Seventh Day…

So far this series has been a bit of fun.  We have discussed a fair amount of useful things to date.  And here we are only on the seventh day.

Today we will be reverting our attention back to the msdb database.  This time not as a means of maintaining the under-carriage but rather to help us try and become better DBAs.

Sometimes to be a better DBA, we have to be a bit proactive and less reactive.

We’ll get to that shortly.  As we have done just about every day so far though, we need to recap real quick what we have to date in the 12 Days of pre-Christmas.

 

  1. Maintenance Plan Gravage – Day 6
  2. Table Compression - Day 5
  3. Exercise for msdb - Day 4
  4. Backup, Job and Mail History Cleanup - 3rd Day
  5. Service Broker Out of Control - 2nd Day
  6. Maint Plan Logs - 1st Day

On the Seventh Day of pre-Christmas…

My DBA gave to me an early Nuclear Fallout detection and warning system.  Ok, maybe not quite that extensive – but it could sure feel like it if something were to slip through the crevasses and the business management started breathing down your neck.

Have you ever had a SQL Job run longer than it should have?  Has the job run long enough that it ran past the next scheduled start time?  These are both symptoms that you might want to monitor for and track in your environment.  If you are watching for it – you should be able to preemptively strike with warning flares sent out to the interested parties.  Then you would be more of the hero than the scapegoat.  And we all want to be on the winning side of that conversation without the confrontation from time to time ;).

Today, I am going to share a quick script that can be used to help monitor for jobs that have waltzed well beyond the next scheduled start time.  First though, there is a script available out there for the jobs that have run beyond normal run times.  You can check out this article by Thomas LaRock to get a good script to check and log long running jobs.

Though the criteria are similar – we do have two different needs that need to be reported on and monitored.  This is why we have this second script.  If you have a job that should run every 15 minutes but you find that the job has been running non-stop for 45 minutes, that is quite a problem.  And many times we should alert on something like that.

So here is the quick script.

[codesyntax lang="tsql"]

[/codesyntax]
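As a sketch of the kind of check being performed (currently running jobs whose next scheduled start time has already come and gone), something along these lines works.  This sketch leans on sysjobactivity; the full script applies additional filtering, including the job_status piece mentioned in the conclusion below.

[codesyntax lang="tsql"]
/* Sketch: jobs that are still running and have blown past their
   next scheduled start time (current Agent session only). */
SELECT  j.name AS JobName,
        ja.start_execution_date,
        ja.next_scheduled_run_date,
        DATEDIFF(MINUTE, ja.start_execution_date, GETDATE()) AS MinutesRunning
FROM    msdb.dbo.sysjobactivity ja
JOIN    msdb.dbo.sysjobs j
    ON  j.job_id = ja.job_id
WHERE   ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
AND     ja.start_execution_date IS NOT NULL        -- the job has started
AND     ja.stop_execution_date IS NULL             -- ...and is still running
AND     ja.next_scheduled_run_date < GETDATE();    -- ...past the next scheduled start
[/codesyntax]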

From here, it is a matter of creating a SQL job to run this script.  The schedule is dependent on job frequency and your expected needs to monitor.  It can very easily be set to run every five minutes.

Once the job is created, it is then a matter of setting up a mechanism to alert.  There are any number of ways to do that as well.  I leave it to any who use the script to determine how best to implement it for their own environment.

Conclusion

This script will catch jobs that are running and have exceeded the next scheduled start time.  This is accomplished through the filters in the WHERE clause.  With a couple of date comparisons, as well as filtering on job_status, we can get the desired results.

 
