T-SQL Tuesday #050: Automating Database Restores

Comments: 1 Comment
Published on: January 14, 2014

Here it is time for the party of the month for the SQL Server acolytes and I was running a bit behind.  Why?  Well, that was due in part to me rushing around trying to do some of what this months topic is.  Some would call that pretty could timing.

TSQL2sDay150x150Hemanth.D (blog|twitter) is hosting the T-SQL Tuesday blogging party this month, and he wants all of us to post about Automation. As Hemanth.D put it on his blog in his own words:

You could write about, what options you would consider when automating something? Where do you draw the line? What are our preferred tools for automation? T-SQL, PowerShell, VBScript or Batch files(?) or maybe just share something that you automated in the last couple of years.

You can read the invite he posted here.

History

As Hemanth.D mentioned in his invitation, this is not the first time this topic has come up for TSQLTuesday.  As it would happen, I also participated in the first go around with my contribution about sizing databases on limited information.  You can read that here.

This time around, I have a little bit of a different topic to approach.  I hadn’t considered this until after having read that Wayne Sheffield wrote about his efforts to verify backup files via script (automation).  You can read what Wayne wrote at this link.

Having read that, it seemed painfully obvious to me that I should go ahead and write about my efforts to automate backup restores.  After all, if you are verifying the backup files existence, you might also want to test the backups to ensure they are good.  Besides, we all need to test our backups anyway, right?

Present

I have a few different methods I have used over the years to automate restores.  In one iteration, the restore operations were hard coded in a procedure that was scheduled to run on a nightly or weekly basis.  It probably was also just hard coded to be specific to a database.  That kind of setup is not super useful except for that one database.

With that in mind, I worked on several iterations to help create a script for myself that would automate the restores of any database, with any number of backup files, to a different server, and not have file paths/names/anything hard-coded.  Well – there ended up being one thing hard-coded but that can be changed easily enough.

I decided on a script that would read the backup path for the most recent backup file from the msdb database of the server where the database was backed up.  I do this via a linked server that can be created/destroyed in the proc or that can reside permanently (I prefer to leave it in place).  Take the filepath of that backup file and restore it to the destination server.  All of this via tsql.

Now a caveat with this solution is that the backup path works best if it is a UNC path.  Even if you are backing up the database to the local server, backing up to UNC means that the restore operation can just grab that path and not encounter errors due to drive mappings (e.g. backed up to D but the D on the restore server is the cd-rom drive).

What if you don’t want to restore the source database with the same name to the new server?  Well, that has also been considered and a parameter can be passed to the stored procedure to allow for a new database name.  What if the default file paths are different?  That consideration has been made too!  All of that said, more testing is always welcome.

The script can be evaluated from here.

Next steps:

With the script, the next things to do would be to create SQL Agent jobs to run the script on a routine basis.  Test the script and verify it.

User of this script assumes all risk.

Day 12 – High CPU and Bloat in Distribution

This is the final installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers
  4. Broken Broker
  5. Peer Identity
  6. Lost in Space
  7. Command ‘n Conquer
  8. Ring in The New
  9. Queries Going Boom
  10. Retention of XE Session Data in a Table
  11. Purging syspolicy

distribution

Working with replication quite a bit with some clients you might run across some particularly puzzling problems.  This story should shed some light on one particularly puzzling issue I have seen on more than one occasion.

In working with a multi-site replication and multi-package replication topology, the cpu was constantly running above 90% utilization and there seemed to be a general slowness even in Windows operations.

Digging into the server took some time to find what might have been causing the slowness and high CPU.  Doing an overall server health check helped point in a general direction.

Some clues from the general health check were as follows.

  1. distribution database over 20GB.  This may not have been a necessarily bad thing but the databases between all the publications weren’t that big.
  2. distribution cleanup job taking more than 5 minutes to complete.  Had the job been cleaning up records, this might not have been an indicator.  In this case, 0 records were cleaned up on each run.

The root cause seemed to be pointing to a replication mis-configuration.  The mis-configuration could have been anywhere from the distribution agent to an individual publication.  Generally, it seems that the real problem is more on a configuration of an individual publication more than any other setting.

When these conditions are met, it would be a good idea to check the publication properties for each publication.  Dive into the distribution database and try to find if any single publication is the root cause and potentially is retaining more replication commands than any other publication.  You can use sp_helppublication to check the publication settings for each publication.  You can check MSrepl_commands in the distribution database to find a correlation of commands retained to publication.

Once having checked all of this information, it’s time to put a fix in place.  It is also time to do a little research before actually applying this fix.  Why?  Well, because you will want to make sure this is an appropriate change for your environment.  For instance, you may not want to try this for a peer-to-peer topology.  In part because one of the settings can’t be changed in a peer-to-peer topology.  I leave that challenge to you to discover in a testing environment.

The settings that can help are as follows.

EXEC SP_CHANGEPUBLICATION 
    @publication = 'somepub', -- put your publication name here 
    @property = 'allow_anonymous', 
    @VALUE = 'false' 
GO 
 
EXEC SP_CHANGEPUBLICATION 
    @publication = 'somepub', -- put your publication name here 
    @property = 'immediate_sync', 
    @VALUE = 'false' 
GO

These settings can have a profound effect on the distribution retention, the cleanup process and your overall CPU consumption.  Please test and research before implementing these changes.

Besides the potential benefits just described, there are other benefits to changing these commands.  For instance, changing replication articles can become less burdensome by disabling these settings.  The disabling of these settings can help reduce the snapshot load and allow a single article to be snapped to the subscribers instead of the entire publication.

Day 11 – Purging syspolicy

This is the eleventh installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers
  4. Broken Broker
  5. Peer Identity
  6. Lost in Space
  7. Command ‘n Conquer
  8. Ring in The New
  9. Queries Going Boom
  10. Retention of XE Session Data in a Table

Garbage-Dump

Did you know there is a default job in SQL Server that is created with the purpose of removing system health phantom records?  This job also helps keep the system tables ,that are related to policy based management, nice and trim if you have policy based management enabled.  This job could fail for one of a couple of reasons.  And when it fails it could be a little annoying.  This article is to discuss fixing one of the causes for this job to fail.

I want to discuss when the job will fail due to the job step related to the purging of the system health phantom records.  Having run into this on a few occasions, I found several proposed fixes, but only one really worked consistently.

The error that may be trapped is as follows:

A job step received an error at line 1 in a PowerShell script.
The corresponding line is ‘(Get-Item SQLSERVER:\SQLPolicy\SomeServer\DEFAULT).EraseSystemHealthPhantomRecords()’.
Correct the script and reschedule the job. The error information returned by PowerShell is:
‘SQL Server PowerShell provider error: Could not connect to ‘SomeServer\DEFAULT’.
[Failed to connect to server SomeServer. -->

The first proposed fix came from Microsoft at this link.  In the article it proposed the root cause of the problem being due to the servername not being correct.  Now that article is specifically for clusters, but I have seen this issue occur more frequently on non-clusters than on clusters.  Needless to say, the advice in that article has yet to work for me.

Another proposed solution I found was to try deleting the “\Default” in the agent job that read something like this.

(Get-Item SQLSERVER:\SQLPolicy\SomeServer\Default).EraseSystemHealthPhantomRecords()

Yet another wonderful proposal from the internet suggested using Set-ExecutionPolicy to change the execution policy to UNRESTRICTED.

Failed “fix” after failed “fix” is all I was finding.  Then it dawned on me.  I had several servers where this job did not fail.  I had plenty of examples of how the job should look.  Why not check those servers and see if something is different.  I found a difference and ever since I have been able to use the same fix on multiple occasions.

The server where the job was succeeding had this in the job step instead of the previously pasted code.

if (‘$(ESCAPE_SQUOTE(INST))’ -eq ‘MSSQLSERVER’) {$a = ‘\DEFAULT’} ELSE {$a = ”};
(Get-Item SQLSERVER:\SQLPolicy\$(ESCAPE_NONE(SRVR))$a).EraseSystemHealthPhantomRecords()

That, to my eyes, is a significant difference.  Changing the job step to use this version of the job step has been running successfully for me without error.

I probably should have referenced a different server instead of resorting to the internet in this case.  And that stands for many things – check a different server and see if there is a difference and see if you can get it to work on a different server.  I could have saved time and frustration by simply looking at local “resources” first.

If you have a failing syspolicy purge job, check to see if it is failing on the phantom record cleanup.  If it is, try this fix and help that job get back to dumping the garbage from your server.

Day 10 – Retention of XE Session Data in a Table

This is the tenth installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers
  4. Broken Broker
  5. Peer Identity
  6. Lost in Space
  7. Command ‘n Conquer
  8. Ring in The New
  9. Queries Going Boom

food-storage

 

Gathering event information is a pretty good thing.  It can do wonders for helping to troubleshoot.  What do you do if you need or want to be able to review the captured information in 3 months or maybe 12 months from now?

Retaining the session data for later consumption is often a pretty essential piece of the puzzle.  There is more than one way to accomplish that goal.  I am going to share one method that may be more like a sledgehammer for some.  It does require that Management Data Warehouse be enabled prior to implementing.

When using MDW to gather and retain the session data, you create a data collector pertinent to the data being collected and retained.  In the following example, I have a data collector that is created to gather deadlock information from the system health session.  In this particular case, I query the XML in the ring buffer to get the data that I want.  Then I tell the collector to gather that info every 15 minutes.  The collection interval is one of those things that needs to be adjusted for each environment.  If you collect the info too often, you could end up with a duplication of data.  Too seldom and you could miss some event data.  It is important to understand the environment and adjust accordingly.

Here is that example.

BEGIN TRANSACTION 
BEGIN Try 
DECLARE @collection_set_id_1 INT 
DECLARE @collection_set_uid_2 UNIQUEIDENTIFIER 
EXEC [msdb].[dbo].[sp_syscollector_create_collection_set] @name=N'system_health_deadlock', @collection_mode=1, 
@description=N'system_health_deadlock', @logging_level=1, @days_until_expiration=14, 
@schedule_name=N'CollectorSchedule_Every_15min', @collection_set_id=@collection_set_id_1 OUTPUT, 
@collection_set_uid=@collection_set_uid_2 OUTPUT 
 
SELECT @collection_set_id_1, @collection_set_uid_2 
 
DECLARE @collector_type_uid_3 UNIQUEIDENTIFIER 
SELECT @collector_type_uid_3 = collector_type_uid FROM [msdb].[dbo].[syscollector_collector_types] 
WHERE name = N'Generic T-SQL Query Collector Type'; 
DECLARE @collection_item_id_4 INT 
EXEC [msdb].[dbo].[sp_syscollector_create_collection_item] 
@name=N'system_health_deadlock'
, @PARAMETERS=N'<ns:TSQLQueryCollector xmlns:ns="DataCollectorType"><Query><Value>
 
SELECT CAST(
                  REPLACE(
                        REPLACE(XEventData.XEvent.value(''(data/value)[1]'', ''varchar(max)''), 
                        '''', ''''),
                  '''','''')
            AS XML) AS DeadlockGraph
FROM
(SELECT CAST(target_data AS XML) AS TargetData
from sys.dm_xe_session_targets st
join sys.dm_xe_sessions s on s.address = st.event_session_address
where name = ''system_health'') AS Data
CROSS APPLY TargetData.nodes (''//RingBufferTarget/event'') AS XEventData (XEvent)
where XEventData.XEvent.value(''@name'', ''varchar(4000)'') = ''xml_deadlock_report'' 
 
</Value><OutputTable>system_health_deadlock</OutputTable></Query></ns:TSQLQueryCollector>', 
@collection_item_id=@collection_item_id_4 OUTPUT
, @frequency=900 --#seconds in collection interval
,@collection_set_id=@collection_set_id_1
, @collector_type_uid=@collector_type_uid_3 
 
SELECT @collection_item_id_4 
 
COMMIT TRANSACTION; 
END Try 
BEGIN Catch 
ROLLBACK TRANSACTION; 
DECLARE @ErrorMessage NVARCHAR(4000); 
DECLARE @ErrorSeverity INT; 
DECLARE @ErrorState INT; 
DECLARE @ErrorNumber INT; 
DECLARE @ErrorLine INT; 
DECLARE @ErrorProcedure NVARCHAR(200); 
SELECT @ErrorLine = ERROR_LINE(), 
@ErrorSeverity = ERROR_SEVERITY(), 
@ErrorState = ERROR_STATE(), 
@ErrorNumber = ERROR_NUMBER(), 
@ErrorMessage = ERROR_MESSAGE(), 
@ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-'); 
RAISERROR (14684, @ErrorSeverity, 1 , @ErrorNumber, @ErrorSeverity, @ErrorState, @ErrorProcedure, 
@ErrorLine, @ErrorMessage); 
 
END Catch; 
 
GO

Looking this over, there is quite a bit going on.  The keys are the following paramaters: @parameters and @interval.  The @parameters parameter stores the XML query to be used when querying the ring buffer (in this case).  It is important to note that the XML query in this case needs to ensure that the values node is capped to a max of varchar(4000) like shown in the following.

WHERE XEventData.XEvent.VALUE(''@name'', ''VARCHAR(4000)'')

With this data collector, I have trapped information and stored it for several months so I can compare notes at a later date.

Day 9 – Queries Going Boom

This is the ninth installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers
  4. Broken Broker
  5. Peer Identity
  6. Lost in Space
  7. Command ‘n Conquer
  8. Ring in The New

Kaboom

Ever see an error like this??

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

Error 8623
Severity 16

That is a beautiful error.  The message is replete with information and gives you everything needed to fix the problem, right?  More than a handful of DBAs have been frustrated by this error.  It’s not just DBAs that this message seems to bother.  I have seen plenty of clients grumble about it too.

The obvious problem is that we have no real information as to what query caused the error to pop.  The frustrating part is that the error may not be a persistent or consistent issue.

Thanks to the super powers of XE (extended events), we can trap that information fairly easily now.  Bonus is that to trap that information, it is pretty lightweight as far as resource requirements go.

Without further ado, here is a quick XE session that could be setup to help trap this bugger.

CREATE EVENT SESSION
    overly_complex_queries
ON SERVER
ADD EVENT sqlserver.error_reported
(
    ACTION (sqlserver.sql_text, sqlserver.tsql_stack, sqlserver.database_id, sqlserver.username)
    WHERE ([severity] = 16
		AND [error_number] = 8623)
)
ADD TARGET package0.asynchronous_file_target
(SET filename = 'C:\Database\XE\overly_complex_queries.xel' ,
    metadatafile = 'C:\Database\XE\overly_complex_queries.xem',
    max_file_size = 10,
    max_rollover_files = 5)
WITH (MAX_DISPATCH_LATENCY = 5SECONDS)
GO
 
-- Start the session
ALTER EVENT SESSION overly_complex_queries
    ON SERVER STATE = START
GO

And now for the caveats.  This session will only work on SQL 2012.  The second caveat is that there are two file paths defined in this session that must be changed to match your naming and directory structure for the output files etc.

Should you try to create this session on SQL Server 2008 (or 2008 R2) instead of SQL Server 2012, you will get the following error.

Msg 25706, Level 16, State 8, Line 1
The event attribute or predicate source, “error_number”, could not be found.

Now that you have the session, you have a tool to explore and troubleshoot the nuisance “complex query” error we have all grown to love.  From here, the next step would be to explore the output.

Day 7 – Command ‘n Conquer

This is the seventh installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers
  4. Broken Broker
  5. Peer Identity
  6. Lost in Space

As a DBA, we sometimes like to shortcut things.  Not shortcutting a process or something of importance.  The shortcuts are usually in the realm of trying to shortcut time, or shortcut the number of steps to perform a task or shortcutting by automating a process.

We seldom like to perform the same task over and over and over again.  Click here, click there, open a new query window, yadda yadda yadda.  When you have 100 or so servers to run the same script against – it could be quite tedious and boring.  When that script is a complete one-off, there probably isn’t much sense in automating it either.

To do something like I just described, there are a few different methods to get it done.  The method I like to use is via SQLCMD mode in SSMS.  Granted, if I were to use it against 100 servers, it would be a self documenting type of script.  I like to use it when setting up little things like replication.

How many times have you scripted a publication and the subscriptions?  How many times have you read the comments?  You will see that the script has instructions to run certain segments at the publisher and then other segments at the subscriber.  How many times have you handed that script to somebody else to run and they just run it on the one server?

Using SQLCMD mode and then adding a CONNECT command in the appropriate places could solve that problem.  The only thing to remember is to switch to SQLCMD mode in SSMS.  Oh and switching to SQLCMD mode is really easy.  The process to switch to SQLCMD mode is even documented.  You can read all about that here.

And there you have it, yet another simple little tidbit to take home and play with in your own little lab.

Day 6 – Lost in Space

This is the sixth installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers
  4. Broken Broker
  5. Peer Identity

Credit: NASA/JPL

One of the more frequently recurring themes I see in my travel and work is the perpetual lack of space.

For instance, every time I fly there is inevitably a handful of people that have at least three carry-on items and at least one of those items is larger than the person trying to “carry” it on the plane.  Imagine watching these people trying to lift 100+ pound bags over their heads to put them into these small confined overhead storage compartments.  We are talking bags that are easily 2-3 times larger than the accepted dimensions, yet somehow this person made it on the plane with such a huge bag for such a tiny space.

Another favorite of mine is watching what appears to be a college student driving home in a GEO Metro.  A peek inside the vehicle might reveal 5 or 6 baskets of soiled laundry and linens.  A look at the vehicle as a whole might reveal a desert caravan’s worth of supplies packed onto the vehicle.  Watching the vehicle for a while you might notice that it can only lumber along at a top speed of 50 mph going downhill and good luck getting back up the hill.  It is just far too over-weighted and over-packed.  The vehicle obviously does not have enough room internally.

In both of these examples we have a limited amount of storage space.  In both of these examples we see people pushing the boundaries of those limitations.  Pushing those boundaries could lead to some unwanted consequences.  The GEO could break down leaving the college student stranded with dirty laundry.  The air-traveler just may have to check their dog or leave it home.

But what happens when people try to push the boundaries of storage with their databases?  The consequences can be far more dire than either of the examples just shared.  What if pushing those boundaries causes an outage and your database is servicing a hospital full of patient information (everything from diagnostics to current allergies – like being allergic to dogs on planes)?  The doctor needs to give the patient some medication immediately or the patient could die.  The doctor only has two choices and one of those could mean death the other could mean life.  All of this is recorded in the patient records but the doctor can’t access those records because the server is offline due to space issues.

Yeah that would pretty much suck.  But we see it all the time.  Maybe nothing as extreme as that case, but plenty of times I have seen business lose money, revenue, and sales because the database was offline due to space.  The company wants to just keep pushing those boundaries.

In one case, I had a client run themselves completely out of disk space.  They wouldn’t allocate anymore space so it was time to start looking to see what could be done to alleviate the issue and get the server back online.

In digging about, I found that this database had 1Tb of the 1.8TB allocated to a single table.  That table had a clustered index built on 6 columns.  The cool thing about this clustered index is that not a single query ever used that particular combination.  Even better was that the database was seldom queried.  I did a little bit of digging and found that there really was a much better clustered index for the table in question.  Changing to the new clustered index reduced the table size by 300GB.  That is a huge chunk of waste.

Through similar exercises throughout the largest tables in the database, I was able to reduce index space waste by 800GB.  Disk is cheap until you can’t have anymore.  There is nothing wrong with being sensible about how we use the space we have been granted.

Thinking about that waste, I was reminded of a great resource that Paul Randal has shared.  You can find a script he wrote, to explore this kind of waste, from this link.  You can even read a bit of background on the topic from this link.

Day 5 – Peer Identity

This is the fifth installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers
  4. Broken Broker

identityIn the digital age it seems we are constantly flooded with articles about identity crises.  From identity theft to mistaken identity.  SQL server is not immune to these types of problems and stories.  Whether SQL Server was housing the data that was stolen ,leading to identity theft, or if SQL Server is having an identity management issue of its own – SQL Server is definitely susceptible to the identity issues.

The beauty of SQL Server is that these identity issues seem to be most prevalent when trying to replicate data.  Better yet is when the replication multiple peers setup in a Peer-to-Peer topology.

When these Identity problems start to crop up there are a number of things that can be done to try and resolve them.  One can try to manually manage the identity ranges or one can flip the “Not for Replication” attribute on the table as two possible solutions.

The identity crisis in replication gets more fun when there are triggers involved.  The triggers can insert into a table that is not replicated or can insert into a table that is replicated.  Or even better is when the trigger inserts back into the same table it was created on.  I also particularly like the case when the identity range is manually managed but the application decides to reseed the identity values (yeah that is fun).

In one particular peer-to-peer topology I had to resort to a multitude of fixes depending on the article involved.  In one case we flipped the “Not for Replication” flag because the tables acted on via trigger were not involved in replication.  In another we disabled a trigger because we determined the logic it was performing was best handled in the application (it was inserting a record back into the table the trigger was built on).  And there was that case were the table was being reseeded by the application.

In the case of the table being reseeded we threw out a few possible solutions but in the end we felt the best practice for us would be to extend the schema and extend the primary key.  Looking back on it, this is something that I would suggest as a first option in most cases because it makes a lot of sense.

In our case, extending the schema and PK meant adding a new field to the PK and assigning a default value to that field.  We chose for the default value to be @@ServerName.  This gave us a quick location identifier for each location and offered us a quick replication check to ensure records were getting between all of the sites (besides relying on replication monitor).

When SQL Server starts throwing a tantrum about identities, keep in mind you have options.  It’s all about finding a few possible solutions or mix of solutions and proposing those solutions and then testing and implementing them.

One of the possible errors you will see during one of these tantrums is as follows.

Explicit value must be specified for identity column in table ‘blah’ either when IDENTITY_INSERT is set to ON or when a replication user is inserting into a NOT FOR REPLICATION identity column.

Day 4 – Broken Broker

This is the fourth installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers

mini-Broker

Brokers

 

 

 

 

On a recent opportunity to restore a database for a client, I experienced something new.  

I thought it was intriguing and it immediately prompted some questions.  First, let’s take a look at the message that popped up during the restore and then on to what was done to resolve the problem.

 

Query notification delivery could not send message on dialog ‘{someguid}.’. Delivery failed for notification ‘anotherguid;andanotherguid‘ because of the following error in service broker: ‘The conversation handle “someguid″ is not found.’

My initial reaction was “Is Service Broker enabled?”  The task should have been a relatively easy straight forward database restore and then to setup replication after that.  My next question that popped up was “Is SB necessary?”

Well the answers that came back were “Yes” and “YES!!!”  Apparently without SB, the application would break in epic fashion.  That is certainly not something that I want to do.  There are enough broke brokers and broke applications without me adding to the list.

Occasionally when this problem arises it means that the Service Broker needs a “reset.”  And in this case it makes a lot of sense.  I had just restored the database and there would be conversations that were no longer valid.  Those should be ended and the service broker “reset.”

The “reset” is rather simple.  First a word of warning – do not run this on your production instance or any instance without an understanding that you are resetting SB and it could be that conversations get hosed.

ALTER DATABASE  <dbname> SET NEW_BROKER WITH ROLLBACK IMMEDIATE

For me, this worked like a charm.  There was also substantial reason to proceed with it.  If you encounter this message, this is something you may want to research and determine if it is an appropriate thing to do.

My Hack…err…Rube Goldberg Machine

Categories: News, Professional, SSC
Comments: No Comments
Published on: September 10, 2013

TSQL2sDay150x150My deadline is fast approaching.  It seems these days that editors have no patience.  On top of that there is this ever-shrinking deadline problem and the constantly shrinking lead-time to publication issue.

Oh wait, it’s only TSQL Tuesday.  But the end of TSQL Tuesday is fast approaching, and I am cutting it close once again.

This month we have been invited to this party by Rick Krueger (blog twitter), and he wants us to talk about our hacks, kludges and wedgies.  We are calling them Rube Goldberg Machines for this month.  And a Rube Goldberg could be many different things in the SQL Server world.  Heck, it could be many different things in the IT world.

If you are unfamiliar with these types of devices, kludges, widgets, here is a good example from the folks at Mythbusters.

Super villain Monologue

My first inclination was to talk about a solution involving the use of SOAP calls in SSIS to perform batch credit card authorizations and then to do inventory control and process shipment orders.  That was a fun project.  It was also extremely successful compared to the previous solution.  We went from a nightly 12 hour process to finishing in 1-2 hours for twice the orders.

I then thought of a couple of solutions that involved the backup of a database on one server and then to restore the database on another server.  Minimal moving parts there as it only involved the backup, restore, and then to re-assign permissions (e.g. prod backup to QA environment, the permissions should probably be different).

Ka-POW

And then it hit me.  There is a pretty cool solution that involved multiple servers, multiple moving parts, and was somewhat useful.

Imagine having three servers that each perform a different function.  In our case, we will call them Server1, Server2, and Server3 – all of them are SQL Server.  Server1 is the primary production server.  Server2 is a mirror of Server1.  And Server3 is the DR server that is a secondary in logshipping.

Due to policy or previously established procedure, mirroring is stopped (maybe the deployment requires massive data changes and it causes issues with latency to the mirror.  Maybe the network link between servers is broken and you want to prevent excessive log growth – it’s up to you to find the excuse).  You already have a process for re-establishing mirroring and logshipping (yeah that got stopped too), but you want a 1-click approach.

Without going into extensive details, what worked in my scenario was to create linked servers between each of the three servers.  Then I could go and create a job relevant to each server (such as the steps listed below).  Each job would then execute a job step and then start the next job in the sequence via the linked server – if the job step was successful.  If it failed then it would fire off an alert to notify the appropriate people of where the failure occurred.

Here is a sample of the steps that could be executed.

1. stop logshipping on server1
2. backup db on server1
3. restore backup to server2
4. start mirror on server2
5. start mirror on server1
6. restore logs to server3
7. start logshipping on server1

Depending on the requirements, you may want to try something like this or do something completely different.  Your mileage may vary.  The point is, we have a little mouse trap here that triggers another mouse trap and then another until the task is complete.  Hopefully the setup will require minimal intervention, if any (that is a requirement of a Rube Goldberg machine – you start it but don’t touch it after that).

There you have it.  Nothing big and fancy. Just a short and sweet description of a possible avenue for some of those multi-server tasks you may have.

«page 1 of 3




Calendar
April 2014
M T W T F S S
« Mar    
 123456
78910111213
14151617181920
21222324252627
282930  
Content
SQLHelp

SQLHelp


Welcome , today is Thursday, April 24, 2014