File Maintenance – Cleaning Up Old Files

Comments: 1 Comment
Published on: December 21, 2018

Using SSIS to Maintain the File System

We have all run into a need or a desire to clean up old, stale files from the file system, whether it be removing old backup files or flat files created by one process or another.  And based on that need/desire, we have all come up with a method to help achieve the goal.

Some methods might use the cleanup options available in a maintenance plan.  Other methods may use a SQL script employing xp_cmdshell and delete statements.  Yet another may utilize the sp_OA stored procs and DMO.  And still others may have ventured into PowerShell to accomplish the same task.  The point is, there are many methods.

I am adding yet another method to the mix.  Why?  I didn’t much like the option of using the sp_OA method or the xp_cmdshell route.  I am a novice with PowerShell, and it would have taken a lot more tinkering to get that script working properly.  Also, I felt pretty comfortable with SSIS and had approval to try to get this done using that method.  And just because I am a novice with PowerShell does not mean that I will not circle back around to try to accomplish this task by that means.

Note: This article was originally written in 2011 and got stuck in an unpublished state.  Things have changed since then, so I will definitely be circling back around for a PowerShell version.

Requirements

The method employed needs to be able to do the following:

  1. Remove multiple file types
  2. Be configurable
  3. Clean out files from numerous directories
  4. Remove files older than a specified number of days.

Setup

The solution I chose utilizes SSIS.  It also requires that there be a table in a database that helps to drive the package.

The table looks like the following.

The FilePath column holds the file system path for each directory that needs to be cleaned.  Both local paths (e.g. C:\temp) and UNC paths (e.g. \\machine\c$\temp) are supported.  I set this attribute to a length of 256, but if you have a longer path, you will want to adjust the length.

The Process column will hold a value describing what that path relates to, such as MaintainDirectory.  In my example, I am using MaintainDirectory to control which directories hold files that potentially need to be deleted.

Here is an example of the contents of that table I am using currently.
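As a rough, hedged equivalent in script form (the FilePath and Process columns are described above; the exact definition, table name, and sample values are assumptions), the table might be created and populated like this:

[codesyntax lang="tsql"]
-- Sketch of the FilePaths table described above; the real definition may differ
CREATE TABLE dbo.FilePaths
(
      FilePathID INT IDENTITY(1,1) NOT NULL PRIMARY KEY
    , FilePath   VARCHAR(256)      NOT NULL   -- local path (C:\temp) or UNC path (\\machine\c$\temp)
    , Process    VARCHAR(50)       NOT NULL   -- e.g. MaintainDirectory
);

-- Hypothetical sample rows
INSERT INTO dbo.FilePaths (FilePath, Process)
VALUES ('C:\Database\Backups', 'MaintainDirectory')
     , ('\\FileServer\d$\Exports\Archive', 'MaintainDirectory');
[/codesyntax]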

The last piece of the setup before we start working on the SSIS package is the need for a string splitting function.  Pick the string splitter of your liking.  I have one that I like and am sure you have one that you prefer.  The SSIS package relies on the return field from the splitter being named “Item.”  If it is named something else, please make the adjustments in the package as necessary.
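Purely for illustration, here is one minimal splitter (an XML-based sketch of my own; any splitter will do so long as the output column is named Item):

[codesyntax lang="tsql"]
-- Illustrative splitter; the only requirement for the package is that the returned
-- column be named Item (this sketch assumes the list contains no XML special characters)
CREATE FUNCTION dbo.fn_SplitString
(
      @List      VARCHAR(MAX)
    , @Delimiter CHAR(1)
)
RETURNS TABLE
AS
RETURN
(
    SELECT Item = LTRIM(RTRIM(x.i.value('(./text())[1]', 'VARCHAR(255)')))
    FROM ( SELECT CAST('<r>' + REPLACE(@List, @Delimiter, '</r><r>') + '</r>' AS XML) AS ListXml ) AS l
    CROSS APPLY l.ListXml.nodes('/r') AS x(i)
);
[/codesyntax]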

The Package

I created the package in SSIS 2008.  To meet the requirements already set forth, I utilized the following objects: an ADO.NET data source, two Execute SQL Tasks, two ForEach Loop containers, a Script Task, and eight variables.  Let’s take a look at these, starting with the variables.

Variables

  • SQLServerName – The value held here is used in an Expression for the Data Source.  This will overwrite the ServerName value in the Data Source.
  • DatabaseName – Used alongside the SQLServerName variable in an Expression for the Data Source.  This value will overwrite the InitialCatalog value in the Data Source.  This should be the name of the database where the FilePaths table and String Split function exist.
  • DaysToKeep – This value is the cutoff point for which files to keep and which files will be deleted.  This variable is used as a ReadOnly variable in the Script Task.
  • obj_FileExtension – This object variable is used to store the result set from one of the Execute SQL tasks and the results of the string split function from the FileExtensionList variable.
  • FileExtensionList – This is a delimited list of file extensions that need to be evaluated for deletion.  It is important to note that the file extensions that are to be processed are case sensitive.  The extension must appear in this list as it appears in the file system.
  • FileExtension – Used in one of the ForEach loops.  This variable will receive the file extensions from the obj_FileExtension variable one at a time.
  • obj_ListOfDirectories – This variable will receive the result set of an Execute SQL Task to be later consumed by one of the ForEach loops.
  • DirectoryToMaintain – Receives, one at a time, the directory to process for file deletion.  The ForEach loop stores a value from obj_ListOfDirectories in this variable for processing.

Execute SQL Tasks

The two Execute SQL Tasks are simple in function.  One is to get the list of directories to maintain from the FilePaths table.  The other is strictly to split the string for the FileExtensionList variable.

The first is named “Get Directory List” and should receive the Full Result Set from the following query.
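A query along these lines would do the job (a sketch that follows the FilePaths table described above; the exact column list may differ):

[codesyntax lang="tsql"]
-- Sketch: return the directories flagged for maintenance; the FilePath column
-- is what later gets mapped to the DirectoryToMaintain variable
SELECT  FilePathID
      , FilePath
FROM dbo.FilePaths
WHERE Process = 'MaintainDirectory';
[/codesyntax]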

The Result Set tab of this task also needs to be modified.  It should look like this.

From this task, we flow to the next Execute SQL Task named “Split FileList.”  The setup of this task is very much like the previous task.  We want to receive the full result set.  We have a configuration to make on the result set tab.  We also need to map a parameter.  Let’s take a look at those real quick.

Parameter Mapping

Result Set

And this is the query that we will be executing.

Notice that the Parameter we named in the Parameter Mapping tab is being used in the function call.  I chose this method because I could see and understand how it works better.
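A sketch of such a call, assuming the parameter was mapped with the name @FileExtensionList and that a splitter like the one shown earlier is used, might look like this:

[codesyntax lang="tsql"]
-- Sketch: split the delimited extension list into rows; the single returned column is named Item
SELECT Item
FROM dbo.fn_SplitString(@FileExtensionList, ',');
[/codesyntax]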

ForEach Loops

The next stop in the flow is the ForEach Loop – Directory object.  As the name implies, this ForEach Loop is designed to work with the obj_ListOfDirectories variable/array.

With this first Loop container, we have two tabs that need to be configured in the properties.  Both Loop containers are similar in that they need the same tabs to be configured.  First, let’s talk about the Collection tab.

On the Collection tab, we need to set the Enumerator option to “ForEach ADO Enumerator.”  Then we need to select the obj_ListOfDirectories from the drop down labeled “ADO Source Object Variable.”  Your screen should look like the following image.

With this tab configured, we can focus our attention to the quick changes that need to be made on the Variable Mappings tab.  On this tab, we are telling the enumerator how to handle the data from the object variable.  We are mapping columns from the result set to variables for further consumption.  When configured, it should look like the following.

Inside of this ForEach loop container, we have another ForEach loop container.  This second ForEach loop container handles the file extensions that we listed out in delimited fashion in the FileExtensionList variable.  I have called this container “ForEach Loop – FileExtension” (just keeping it simple).

The collection tab follows the same configuration setup.  The difference of course being that this container will use the obj_FileExtension object from the source variable dropdown menu.

The variable mapping tab is also slightly different.  We will be mapping column 0 of the object to the FileExtension variable.  The explanation for the different number between the two loop container variable mappings is simple.  In obj_ListOfDirectories, we have multiple columns being returned.  In obj_FileExtension, we have but one single column being returned.

This inner loop container will loop through each of the extensions for each of the directories that have been returned to the outer loop container.  The inner loop container holds the remainder of the workload via the Script Task.

Script Task

It is via the script task that we actually get to start deleting files.  This was the most difficult piece of the entire package – though the script is not very large.

For the script task, I chose to implement it via the Visual Basic option (instead of C#).  I have three ReadOnlyVariables employed by the script.  Those variables are: User::DaysToKeep, User::DirectoryToMaintain, and User::FileExtension.

Once you have set those on the script tab, the next step is to click the Edit Script… button where we need to place the following script.

An important note of interest is the need for the Try…Catch.  Without this block, you could run into an issue where a file (such as those pesky temp files) may be in use by some process and cause the package to error.  The Try…Catch will move past that nasty error and delete the files that it can.

Inside this script, you will see that I am comparing the LastWriteTime to the PurgeDays and ensuring that the file extension matches one that is in the list.  Then we move into the try…catch and either delete the file that matches those criteria or throw an exception and move on to the next file.

When all is said and done, your package should look something like this.

You should also have a variable list that looks like this.

Each variable that is not an Object has a value assigned to it at this point.  These values will be overwritten where applicable.

Next Steps

Having this package is a good start.  But unless you are prepared to manually run this on a daily basis, it needs to be added to a job and scheduled.  There are two ways to go about scheduling this package.

The first option is to configure the FileExtensionList and DaysToKeep variables and save the package with those values.  Then run this package through SQL Agent with those values every time.  The drawback to this method is that if you need to add or remove a file extension (as an example) then you need to edit the package and re-save it.

The alternative option is to pass the values through the job to overwrite those variables as the job runs.  Should you need to remove or add a file extension, it would just be done at the job definition level.

Let’s take a look at this second option.  I will skip past how to create the job as an SSIS job in SQL Server, and we will look directly at how to modify those variables from the job properties.

To configure these variables directly from the SQL Agent job, open the Job properties and click on the Set Values tab (assuming you have defined this job as an SSIS Job type).  You should get a screen similar to this (void of the set values shown in the pic).  Just add the parameters (variables) we have discussed to this point with appropriate values to fit your needs/environment.

I have chosen to only include the four variables shown above since the remaining variables are either objects or get overwritten in the ForEach loops during processing.  The only thing remaining now is to set the schedule for the job.  Once set, the job (and package) will take care of the rest.

Conclusion

I have now shown you how to maintain some of the directories on your system through the use of SSIS and SQL Server.  There are many methods to accomplish this goal; it is up to each of us to choose the best method for our environment and comfort level (by means of supporting the chosen solution).

If you would like to read more interesting stuff concerning SSIS, you might want to check any of these articles: lost password, expected range errors, and synonyms extending SSIS.

SSIS: Value does not Fall Within the Expected Range

Categories: News, Professional, SSC
Comments: 1 Comment
Published on: July 17, 2014

Every now and again I find myself working with SSIS for one reason or another.  And every now and again I find myself fighting with SSIS because it provides less-than-useful error messages.  This is one of those error messages that can be a pain until you understand what can be causing it.  Albeit, that doesn’t make the message any more useful or any less painful when it does occur.

The error message “Value does not Fall Within the Expected Range” is tied to the Execute SQL Task that can be placed in the control flow.

Let’s assume we have such a task in our control flow as demonstrated by the attached image.  We’ll call the Execute SQL Task “Parse Client Code.”

Inside of this task, we will probably have something like what is seen in the next image.

Here we have attached parameters to the task and even assigned those parameters to “variable” names that we might use within the query.  In this case, we are just trying to parse a code from a filename that can be used in downstream processing.  The code might look like the following.

[codesyntax lang=”tsql”]

[/codesyntax]

If I run that task at this point, I will get an error.  The error could be for any number of reasons based on the setup that I just showed.  The most common is that the Parameter Name is not really a name at all; it should be the ordinal position at which the parameter is used in the script.  The ordinal position is 0-based.

The second issue is the data type, which has been selected in error.  The value being passed is a name, not a GUID, so the data type should be changed to varchar in the drop-down that is available on the parameter screen under data type.

The next issue is the use of the variable name in the script itself.  One should use a ? instead of variable names.  So, this script should be fixed to look like the following.

[codesyntax lang=”tsql”]

[/codesyntax]
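Purely as an illustration (the filename format and the parsing logic here are assumptions of my own), a fixed query built around a single ? placeholder could look like this:

[codesyntax lang="tsql"]
-- Sketch: one ? placeholder, mapped on the parameter screen with Parameter Name 0 and a VARCHAR data type
DECLARE @FileName VARCHAR(255);
SET @FileName = ?;

-- Assumes a filename of the form 'ClientCode_SomethingElse.txt'
SELECT LEFT(@FileName, CHARINDEX('_', @FileName) - 1) AS ClientCode;
[/codesyntax]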

And the parameter screen should be made to look like the following.

These quick fixes can eliminate or even prevent what might possibly be a headache when dealing with SSIS.

Now, what if you need to have more than one parameter for your query?  No problem.  The same principles apply.  Just map your parameters according to proper data type and to the ordinal position that the parameter needs to be used within the query and you should be all set.

Using Synonyms to Extend SSIS

Comments: No Comments
Published on: July 3, 2014

There are a million and one uses for synonyms.  There are at least that many uses for SSIS.  The reality is, not all of those uses are good for you, your data, or your database.

Recently I wrote an article about some good and some bad with synonyms.  You can read that article from my work blog site here.  Today, I just want to expand on that article a little bit.  I glossed over some things pretty quickly in that article that I thought might be fun to explain in a little more detail.

The Bad – a little closer look

First, let’s take a little closer look at the bad.  More specifically, in the example I used, there was a vendor that created a synonym for everything.  I really do mean everything.  This is one case where using the word “literally” would be accurate.  On the client database, I could run a quick query such as the following and see over 7300 synonyms.

[codesyntax lang=”tsql”]

[/codesyntax]
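A query in that spirit, sketched against sys.synonyms with a derived column for the running total, could be as simple as:

[codesyntax lang="tsql"]
-- Sketch: list every synonym in the database along with a total count
SELECT  s.name             AS SynonymName
      , s.base_object_name
      , COUNT(*) OVER ()   AS TotalSynonyms
FROM sys.synonyms AS s
ORDER BY s.name;
[/codesyntax]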

In the case of this client and this vendor, 7300+ synonyms is far too many.  It just led to mass confusion.  If you were to run that query, you might see something like the following image.

I added a “derived” column to show the total count of synonyms and the record name as it relates to that total.  That is a crazy amount of synonyms.  That just makes me think that somebody got into the krazy kool-aid one day, was bored and gave life to a synonym beast.

The Good – With more detail

On the flip side, in the aforementioned article, I talked about synonyms as a means to tweak performance in SSIS.  Normally I would not tout a synonym as a performance tweak.  So I want to caution that the performance gains are specific to SSIS and a handful of those millions of uses for SSIS.

Let’s just begin with a little bit of background.  For that background, some pictures will be really handy.  So here goes.

In the preceding image we see a very simple segment of a data flow.

The data source uses a SQL command to fetch the data required for the data flow.  In the beginning, it is very straightforward.  You probably have some package lying around with something similar to this.

In the following image, we see what the SQL Command was for that data source circled in red in the previous image.

In the next image we see a slight tweak to the query.  This time to include a reference to a table that is defined/obfuscated by a synonym.

At this point I can hear some of you saying, “Ok, I see what he is doing,” while many others are wondering why I just made the query more complex than the previous example.

Well, as luck would have it, this change serves a few purposes.  1) The data has been staged in a separate database.  That database has a different name in every environment (recall the aforementioned article), so the synonym minimizes code changes when deploying the package.  2) The synonym allows us to confirm that there is data in the stage table and that the data matches a ClientCode in the destination table.  3) Lastly, the synonym reduces my dataset, which reduces memory requirements and also gets the data loaded faster (because it is smaller).
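To make that concrete, here is a hedged sketch (all object and database names below are placeholders): the synonym is created once per environment to point at that environment's stage database, and the source query then leans on it both to trim the rows being pulled and to stay identical across deployments.

[codesyntax lang="tsql"]
-- One-time setup per environment; only the synonym definition changes between DEV/QA/PROD
CREATE SYNONYM dbo.syn_ClientStage FOR StageDB_QA.dbo.ClientStage;

-- Sketch of the data source query: only pull staged rows whose ClientCode already exists in the destination
SELECT  stg.ClientCode
      , stg.SomeColumn
FROM dbo.syn_ClientStage AS stg
WHERE EXISTS
(
    SELECT 1
    FROM dbo.DestinationTable AS dst
    WHERE dst.ClientCode = stg.ClientCode
);
[/codesyntax]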

In addition to this minor tweak, I can also do something like the following.

In the preceding image, we see two datasources.  Each datasource is subsequently sorted and then eventually joined.  Much like the previous example, due to naming standards and an effort to try and minimize code changes during deployments, at least one datasource is pulling in too much data.  The data is filtered down due to the Join transformation, but this is not very efficient.

Through the use of a synonym, the datasources can be reduced to a single datasource.  This will eliminate the need for the Sort transformations and Join transformation.  Removing those three transformations reduced memory requirements.  And like the previous example, since we can trim down the number of records, the data flow will run a little bit faster too.

As you can see, the code is simple.  It’s not a super TSQL trick or anything to add a synonym into an existing query.  It just gets referenced like any other table.  Once again, in this case, the synonym is pointing to a table in a staging database.  That table has been loaded as part of an ETL process and now needs to be manipulated a little bit through some other transformations and then inserted eventually into a “target” database.

Conclusion

As with tuning stored procedures or any TSQL, a similar technique was used here: reducing the datasets to contain just the data that is needed for the flow.  To facilitate that reduction down to the essential data, I employed synonyms.

The reasons for using a synonym in this case were to: 1) restrict data to precisely what was needed, 2) ensure data being loaded was “constrained” by data in the destination table (e.g. only load for a specific client that does exist), and 3) minimize code changes during deployments.

When dealing with databases that serve the same purpose but follow some absurd naming standard that changes the name between environments, it can become cumbersome to maintain code during deployments.  This is particularly true when dealing with cross database joins or lookups.

Agent Jobs Using SSIS

Categories: News, Professional, Scripts, SSC, SSSOLV
Comments: No Comments
Published on: March 15, 2012

This is a short script to help the DBA with documentation.  It would come in handy especially in those cases where you are consulting or you have taken on a new job.

Suppose you want/need to find out what SQL Agent jobs are running SSIS packages.  This script will help to quickly identify those jobs.

[codesyntax lang=”tsql”]

[/codesyntax]
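A quick sketch along those lines (the exact script and column list may differ) would be:

[codesyntax lang="tsql"]
-- Sketch: list SQL Agent jobs whose steps run SSIS packages
SELECT  j.name       AS JobName
      , js.step_id
      , js.step_name
      , js.command
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.sysjobsteps AS js
        ON js.job_id = j.job_id
WHERE js.subsystem = 'SSIS'
ORDER BY j.name, js.step_id;
[/codesyntax]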

SSRS Export part 2 (Export Data Source)

Categories: News, Professional, SSC
Comments: No Comments
Published on: November 29, 2011

Back in August, I published a post on exporting SSRS report files en masse.  That article (which can be read here) detailed an SSIS package I created to export the files.  As it is published, it is only good for exporting the actual report files and not the data sources.

I knew of this shortcoming for some time and updated my SSIS package shortly after, with the expectation of writing an update to that article.  Well, time went by a little too quickly.  Now it has been almost four months and I am finally getting to that update.  I am doing this all while also working out a TSQL-only solution to do the same thing.  I hope to have that worked out soon, with a write-up on how to do it published shortly after.

So, in keeping with the same format as the previous article, let’s start by detailing out the variables that have been added.

FileExportExtension – As the variable name should imply, this is the extension of the XML file that is to be created.  RDL would be for the report file, and RDS would be for the data source (as examples).

Then inside the script task we will find the next change to be made.  The new variable we created will need to be added to the readonly variable list as shown.

So far so good.  The changes are simple and straight forward.

The next change is to the script.  Let’s click the edit script button and we can change the Main with the following.

[codesyntax lang=”vbnet”]

[/codesyntax]

Looking at this code, you will see once again that variable that we added popping up.

One key to this working effectively is the use of the ReportSourcePath variable.  An appropriate path must be specified that contains Data Sources in the Catalog table.  An example would be /Data Sources/.  Some environments may have a subfolder after the data sources.  Just make sure that the path specified leads to the data sources you want to export.

I had also considered altering the “Execute SQL Task” that starts the flow to this package.  The script there could be altered such that another variable may be added to designate report part type.

[codesyntax lang=”tsql”]

[/codesyntax]

The change would add another variable into this query in the where clause.  Change the Type predicate from an IN to an equality, add a variable that designates the different types listed in the CASE statement, and it becomes pretty straightforward.  This change would allow more flexibility.  I will update at a later time with the changes I have made to the package to permit that.  But for now, I felt it more of a bonus addition and didn’t need it quite yet.  (Necessity drives functionality, right?)
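A hedged sketch of what that tweak might look like (the parameter names are placeholders; in the ReportServer Catalog table, Type 2 is a report and Type 5 is a shared data source):

[codesyntax lang="tsql"]
-- Sketch: parameterize the item type instead of hard-coding an IN list
SELECT  c.Name
      , CONVERT(XML, CONVERT(VARBINARY(MAX), c.Content)) AS ItemXML
FROM dbo.[Catalog] AS c
WHERE c.[Path] LIKE @ReportSourcePath + '%'
  AND c.[Type] = @ReportType;   -- e.g. 2 = Report (RDL), 5 = Shared Data Source (RDS)
[/codesyntax]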

If you make these suggested changes, you will have more flexibility in being able to export the various files related to reporting.  If you have played with Report Manager, you will know that there is no way to export an RDS file.  Now, you have a means to export the xml into a usable file that can be imported to another server – if you need it.

Check back in the future for that update to do this using TSQL as well as for the update to provide more flexibility to the package.


SSRS Export En Masse

Comments: 5 Comments
Published on: August 8, 2011

Have you ever found yourself in need of exporting all of the reports from a Report Server?  I have found myself in need of extracting all RDLs from a Report Server on more than one occasion.  Reporting Services 2008 would have you do that in a rather monotonous and repetitive fashion.  That doesn’t work very well for me – and especially not if you have a ton of reports to export.

I combed the internet to see if there was a quick-to-implement way to do this – and found some recommendations.  Most of these required the installation of another utility or the purchase of other software.  I found one site, however, that was extremely useful for what I wanted.  Brett Stateham used some queries to display the XML for these reports as it is stored in the Catalog table of the Report Server database.  You can read about that here.

Reading his blog post on the topic, I got the distinct idea that I could use a query to extract the XML and save it to an RDL file.  To verify this, I copied the XML (as returned by the queries on Brett’s page) to a notepad file and saved the file with an RDL extension.  Then I added that file back into a solution in Visual Studio and ran the report – sweet, it worked.

Now that I know it can be done as simply as he displayed, it was time to take it a step further.  I now need to create something that will help me export numerous reports to their own individual files.  Hmmm, what could I use?  Having done something similar in the past for a single xml file in SSIS, I decided I would just use SSIS.  Thinking it through, I figured this should be pretty simple in appearance (a file destination object, an execute sql task, a data source, a file source, a foreach loop – you get the idea).

As I started to work through the solution, I found that I was over thinking it a bit and the solution could possibly be easier.  That made me wonder if a script task would be helpful here.  I decided to research and see if there was a quick way to write the xml to a file via a script task.  Guess what, there is.  Using VB .Net, there are simple methods inside a script task to write to the filesystem.

So, without further ado, let’s take a look at the package.

First, let’s create a few variables for use throughout the package.

  • objReport – This is of type object and will be used to store the results of our TSQL statement from an Execute SQL Task.
  • ReportExportPath – Type is String and will be the destination file path.  The last character for this variable should be a “\”
  • ReportName – Type is String and is for use by the ForEach Loop.  This variable will receive the report name from each iteration through the objReport and then be used by the Script Object to later give a name to the output RDL file.
  • ReportSourcePath – Type is String.  This variable is to help extract only the reports from a specific location.  In many cases, reports are stored in different folder paths and you may not need all folders’ contents.
  • ReportXML – Type is String.  Similar in function to ReportName.
  • DBName – Type is String.  This value will be used to override the Initial Catalog of the Connection String.
  • ServerName – Type is String.  This value will be used to override the Server Name of the Connection String.

Next, we need to create a connection object to create a data source for this package.  I created an ADO.Net connection object and named it ReportSourceDB.  Once created, use expressions from the properties tab to overwrite the Initial Catalog and Server Name values with the already mentioned variables.

Now that we have that stuff out of the way, we can begin work on creating the data flow.

Above is displayed the entire data flow for this package.  I adapted, albeit ever so slightly, the script we discussed at the beginning to the following for use in this package.

This script is in the ExecuteSQL task.  The task is configured to retrieve the full result set and store it in objReport.  A parameter is specified and passed to the above query in the first CTE.  The following images should help clarify the configuration a bit.
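As a hedged sketch only (not the actual script), a query of that shape, with its first CTE restricted by the ReportSourcePath parameter, might look like this:

[codesyntax lang="tsql"]
-- Sketch: a CTE filtered on the ReportSourcePath parameter, returning each report's name and RDL text
WITH ReportItems AS
(
    SELECT  c.Name
          , CONVERT(VARCHAR(MAX), CONVERT(XML, CONVERT(VARBINARY(MAX), c.Content))) AS ReportXML
    FROM dbo.[Catalog] AS c
    WHERE c.[Type] = 2                             -- 2 = Report
      AND c.[Path] LIKE @ReportSourcePath + '%'
)
SELECT  Name
      , ReportXML
FROM ReportItems;
[/codesyntax]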

General Tab:

Parameter Mapping Tab:

Result Set Tab:

So far so good.  This is now getting all of the data that we need from the ReportServer database.  We can now pass this information to the ForEach Loop container and write the XML out to RDL files.  The configuration of the ForEach Loop is not too complicated and looks like this:

Collection Tab:

Variable Mappings Tab:

Inside of this ForEach Loop container we have the script task that was displayed.  This was probably the trickiest (yet extremely simple) part of the whole exercise for me.  I have used script tasks in the past for various tasks and was worried this might be a bit more complicated.  So let’s start with a couple of pictures.

There is no need to use a ReadWrite variable in this task.  This task simply reads the variable and then writes the contents of that variable out to a file.  Note that the script language is set to VB 2008.  The default (for me) was C# – and I changed that.  Once the variables are specified for ReadOnlyVariables, click the Edit Script button at the bottom of the screen.

In the new window that opens, highlight everything and overwrite it all with the following.

The line that is important here is the line containing My.Computer.FileSystem inside Public Sub Main().  Note how I am using the variables we have created to this point to create the file name (first parameter series inside WriteAllText()) and also to write the contents of the file based on the ReportXML variable (that is the second parameter inside WriteAllText()).

From here, it is simply a matter of testing the package.  Run the package and check the directory you specified in the ReportExportPath variable.  This little project will save me an enormous amount of time in just exporting the reports to files from the database.  Yay, no more repetitive download task done through report manager.

Enjoy, and put it to good use.

SSIS Job Ownership

Comments: 2 Comments
Published on: April 11, 2011

I was strolling along one day when I saw somebody asking how to find out who owns a maintenance plan.  That evolved into finding out who owns the job associated with the maintenance plan.  All of this in SQL 2005 and SQL 2008.

Well, we were stumped for a bit trying to figure out the link between the job tables in the msdb database and the SSIS table in the same database.  Linking the two together is not very obvious, and we struggled with it.  After some research and trying this, that, and the other, I was able to come up with the script below.

[codesyntax lang=”tsql”]

[/codesyntax]

This script is set to work out of the gate with SQL 2008.  Should you want it to work with SQL 2005, the change is simple: change the sysssispackages table to sysdtspackages90.  As you can see, the query joins the SSIS table to the jobsteps table with a pretty nasty string extraction.  There are other ways of extracting this information (I’m sure of it).  This works quite well for what it is intended.

Using this script, you can find out the job owner, the package owner, and the package type.  This is pretty good information to have on hand if you have several SSIS packages that are stored in msdb and are run from a job.  One thing this script does not yet handle is the case where the SSIS file is stored on the file system.  Note that I have only coded it so far to work with packages stored in SQL.  When looking in the jobsteps table, you can tell the difference quickly by seeing that those stored in msdb have a /SQL at the beginning of the command string.  Those in the file system have a /FILE in lieu of that /SQL.
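A hedged sketch of that kind of join follows; the simple LIKE match here is my own stand-in for the nastier string extraction in the actual script.

[codesyntax lang="tsql"]
-- Sketch: tie SQL Agent jobs to SSIS packages stored in msdb, along with their owners
SELECT  j.name                     AS JobName
      , SUSER_SNAME(j.owner_sid)   AS JobOwner
      , p.name                     AS PackageName
      , SUSER_SNAME(p.ownersid)    AS PackageOwner
      , p.packagetype
FROM msdb.dbo.sysjobsteps AS js
INNER JOIN msdb.dbo.sysjobs AS j
        ON j.job_id = js.job_id
INNER JOIN msdb.dbo.sysssispackages AS p
        ON js.command LIKE '%' + p.name + '%'   -- crude match on the package name in the step command
WHERE js.subsystem = 'SSIS';
[/codesyntax]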

In a future revision I will work on parsing the package name out of that string that represents those stored in the file system.  And despite that nasty join, this runs quickly on my systems.  I am open to suggestions or other solutions that can provide this kind of insight.

SSIS Multiple File Export

Comments: No Comments
Published on: December 28, 2010

Have you ever needed to export different data sets to different flat files?  Each of these data sets could be customer information for different clients – but they all require the same fields.  I have run into this requirement a few times and have found that it is not as hard as it may sound.  Extending that requirement, I have also needed to export multiple different files for each of the clients.  For the sake of this post though, I will just stick to the first level.  (It is only a little more complex, but not much different, to add the extra requirement.)

SSIS to Save the Day

SSIS has a looping mechanism built in to help achieve this.

If you want to learn more about this particular control flow object, here is the MSDN resource on it.

When trying to create an SSIS package solution that meets our requirements, the first step (in my opinion) is to make sure you have an Object variable created for use by the Foreach Loop Container.

To use this variable in the foreach loop container, we first need to load data into it.  This is accomplished by using an Execute SQL Task.

Note:  There is a red-x in this image simply because I have not defined any properties for this object.

The Execute SQL Task needs to be given a data source and a SQL command to run.  The result set of this task should be set to Full result set on the General page.  On the Result Set page, set the result set to be stored in the object variable that we created already.  On the same page, you should set the result name to a value of “0” (zero, without the quotes).
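As a tiny example, the SQL command might be nothing more than a distinct list of the values you want one file per (the table and column names here are placeholders):

[codesyntax lang="tsql"]
-- Sketch: one row per client; each row drives one pass of the Foreach Loop (and one output file)
SELECT DISTINCT ClientCode
FROM dbo.CustomerInformation;
[/codesyntax]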

Now we can start looking more at the Foreach Loop Container.  When we open the properties of the Foreach Loop Container, we want to go to the Collection page and adjust two settings.  The first is to select the appropriate type from the dropdown.

In this case, we will select ForEach ADO Enumerator.  The next change to be made is in the drop-down labeled “ADO Object source variable:”.  In this box we will select the object variable that we already created.

Now we need to configure the Variable Mappings page.  For this you can either create a new variable from this page, or use variables you have already defined.  Let’s create some variables from here.

Once that is established we can work on configuring the Data Flow task.  For the sake of the example, we can do something simple.  It is a good idea to make sure you have your connection managers created by this point.  If they aren’t there, then let’s create some.  First, let’s create an OLEDB connection.  Second, we need to create a Flat File Connection Manager.  I won’t cover how to create those connection objects.  I will cover how to make a configuration change to the Flat File Connection Manager though.  To make sure each data set gets written to its own file, we should use an expression on this connection manager.

In your flat file connection manager, you will need to look at the properties tab in order to create an expression.  In the Property Expressions Builder, select the ConnectionString Property from the drop down.  In the expression side build something like this:

[codesyntax lang=”sql”]

[/codesyntax]

Now back to that Data Flow task I mentioned already.  In your Data Flow Task, select the appropriate OLEDB Connection manager that you have created.  Then select SQL Command in the data access mode dropdown menu.  In the SQL Command text box, type an appropriate query to retrieve your data.

Example:

[codesyntax lang=”tsql”]

[/codesyntax]

Then click on the Parameters button.  From here, you will assign the appropriate variable to the Parameter that was created in the query.  In our example, we should use the variable [ User::TestSub ].
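For instance, a hedged sketch of such a parameterized source query (the table and columns are placeholders) might be:

[codesyntax lang="tsql"]
-- Sketch: the OLE DB Source query; the ? is mapped to User::TestSub via the Parameters button
SELECT  CustomerID
      , CustomerName
      , Email
FROM dbo.CustomerInformation
WHERE ClientCode = ?;
[/codesyntax]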

That pretty much covers it.  In a follow-up I will go through an actual setup that I have created.  More screenshots, and slightly different setup than this baseline.  Hope you enjoy.

SSIS Job Execution

Comments: No Comments
Published on: December 13, 2010

While working on a process to refresh the QA environment (data purge, reload data, and reapply changes made over time as parts of release cycles), I ran into a self-imposed requirement.  I was attempting to create a process that would perform the entire refresh consistently.  Amongst the items that I needed to include in this process to “refresh” the QA environment was the need to run a couple of SSIS packages.  SSIS was not installed on this particular server where I was creating this process.  I needed to have a way of executing two SSIS packages without installing SSIS on the local server.

With that need, I decided to test a remote execution of a job that had SSIS steps.  The method I decided to use was to employ OSQL.  Here is what I did for that.

[codesyntax lang=”tsql”]

[/codesyntax]

I added that line of code to a SQL Agent Job step.  The job step type is Operating System (CMDExec).
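As an illustration only (the server and job names below are placeholders), a line of that general shape might look like the following, with the T-SQL side being a simple sp_start_job call against the remote server's msdb.

[codesyntax lang="tsql"]
-- Sketch: what the CmdExec job step might wrap, e.g.
--   osql -S RemoteServerName -E -Q "EXEC msdb.dbo.sp_start_job @job_name = N'Run QA Refresh SSIS Packages'"
-- The T-SQL that ultimately runs on the remote server:
EXEC msdb.dbo.sp_start_job @job_name = N'Run QA Refresh SSIS Packages';
[/codesyntax]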

This method works well for me since I had already tested the package on the remote server and knew that it worked.  There are several other methods as well; this one just happened to be the quickest for me to implement at the time.

A little Dance with SSIS and Informix

Comments: No Comments
Published on: July 23, 2010

Not too many moons ago I embarked on porting some servers over from SQL 2000 to SQL 2008.  On some of these SQL 2000 servers, we had numerous DTS packages performing various ETL functions.  One of these jobs interacted with an Informix database.  On the old server, an ODBC driver had been installed, permitting a connection to be created between the two servers.

Thunderstruck

Something that we did not flesh out prior to the migration was this particular setup.  One would think that would not be that big of a deal considering the ODBC connection information was still present on the old server.  Sure we could find most of the pertinent information to recreate this connection.  One vital piece of information was missing – the password for the user account in the Informix system.  FANTASTIC!  This raises a dilemma.  We could easily change the password so we could recreate this connection.  If we change the password, then we run the chance of breaking something else.  We could also create a new account for this one process and ensure we document the userid, process, and password.  However, that does not resolve the problem of not knowing the password for this other account used by some processes.

Decisions, decisions, decisions.  We chose to potentially break some unknown process.  We reset the password.  Woohoo!!  We were finally able to successfully recreate the ODBC connection on the new server.  We were also able to confirm that it worked on both the new and old server.  Prior to changing the password, we could not confirm that the connection was properly created on the old server since the password must be re-entered in the System DSN in order to test.  By changing the password (and putting it safely into the vault) we were able to improve our documentation as well as confirm that the process could work.

Verify

Along the lines of my entry for TSQL Tuesday this month where I talked about the necessity to Observe and Report, I had to go back and learn how the old DTS package worked to ensure I could make it work in the new environment.  I quickly ran into a new series of problems in my efforts to study this package.

64 v. 32

The server was 64-bit, the ODBC client was 64-bit, and DTS packages were 32-bit.  I knew this from previous experience but had become so engrossed with trying to do too much that I overlooked it.  Not too big of a problem; I will convert it to SSIS.  I plan on converting all of the packages to SSIS over time; the schedule just got moved up for this package.  That was really a no-brainer.

Opening BIDS

Creating an ODBC connection to Informix that can be consumed by the SSIS package is a little different.  Again, this was not too difficult of a problem to overcome.  You can create an ODBC connection in BIDS by creating a New Connection from the Connection Managers region.  From the Add SSIS Connection Manager window, there is an option for “ODBC.”   Select this option.

Another option to create this connection is to Create a new Ado.Net connection.  From the new Ado.Net connection Manager, you need to select Odbc Data Provider from the “Provider” drop down menu at the top.

Once you have decided which method to use to create an ODBC connection you would need to specify the appropriate settings such as the user name, password, and dsn.  This worked well for a bit.  This method quickly ran me into an Informix problem.

Doesn’t play well with others

I was able to test my connection successfully initially.  On the first attempt to pull data from the connection though, I started getting an error message in SSIS.

I started wondering if this was an SSIS problem.  I wanted to confirm what the overall impact of this error was so I tried to test it from the DTS package as well.  Guess what – same thing.  What about testing from the ODBC DSN?  I got the same thing there too.  From there I proceeded to the Internet to see what else I could learn about this problem.  Well, as it turns out – this seems to be a common occurrence where Informix is involved.  The resolution for this is to modify the SQLHOSTS file on the Informix database server.  One needs to change the nettype from Shared Memory to a Network connection (TCP/IP).  This was a solution that would have required involving our vendor.  Thus while waiting for a time when they could assist, I decided to try other things.

Missing Link

Since I didn’t like the need to pass a password to a connection manager in SSIS, or for multiple people to need to know the account and password being used to connect to the Informix DB, and since I also needed to find a way around this shared memory problem, I decided to try an alternative method.  I would create a linked server that would use the ODBC DSN I had just created.

For this linked server, I provided the setting for the user and password of the remote server.  The user has read only on the remote Informix server.  By employing this, I can ensure that the password will not need to be known by anybody outside of the DBA group.  The password will also not need to be stored in any files for connecting to this Informix server – I see it as being more secure and protected than the previous setup.

With connecting to Informix via a linked server, I am using the OPENQUERY method.  I saw this as a method used / recommended by several people who have come across a similar need.  To use the OPENQUERY, one needs to use a query similar to the following:

[codesyntax lang=”tsql”]

[/codesyntax]

I recommend changing the ‘Select *’ in the interior select statement to specific columns.  As a sidebar, I ran into problems with this query initially due to using a TOP in the interior select statement.  OPENQUERY does not like the TOP, so I removed it and it works flawlessly.
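Here is a hedged sketch of that pattern (the linked server name, tables, and columns are placeholders), with specific columns in the interior select as recommended:

[codesyntax lang="tsql"]
-- Sketch: pull rows from Informix through the linked server and land them in a staging table
INSERT INTO dbo.InformixOrderStage (OrderID, OrderDate, CustomerCode)
SELECT  oq.orderid
      , oq.orderdate
      , oq.customercode
FROM OPENQUERY(INFORMIX_LNK, 'SELECT orderid, orderdate, customercode FROM informix.orders') AS oq;
[/codesyntax]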

One more for the road…

Once I got the OPENQUERY statement working to satisfaction in SSMS, I tried to use the OPENQUERY directly from an execute SQL task.  This attempt failed miserably.  The task was unable to connect through the linked server and thus bombed.  With that, I tried a different avenue once again.

I decided that the query needed to be put into a stored procedure.  I would test from there since I knew that the query worked from within SSMS.  I had three tables and three tasks in the DTS package that required the same process.  So for all three, I combined them into a single stored procedure.  The basic functionality was to simply extract the data through OPENQUERY and then dump the data into a staging table.  In the new world, that means I will be replacing six steps from the DTS package with one step in the SSIS package.

With all of the code in the proc necessary for these tasks, I proceeded with testing.  Testing was very encouraging and worked rather well.  That was testing from within SSMS.  How would it fare from BIDS?  That was the next step.  I created an Execute SQL task with an ado.net connection to my database and set the IsQueryStoredProcedure property to True.  I then ran this step individually and it worked just as expected.  Now I can finally finish translating the rest of this DTS package into SSIS and get this ETL process back up and running.

In the End

When all was said and done, I took a DTS package and converted it to SSIS 2008.  The package works better now than it did in the old world.  I simplified the package from 18 steps down to 6.  This package in the end probably does not need to be run from SSIS.  All of the steps are Execute SQL tasks and none of the SSIS transformations are employed.  That is something that can be looked into changing at a later date.  For now, I will leave it as is since it also serves as an example for some of the team on how to upgrade a DTS package to SSIS.  They needed something to help familiarize themselves with the product – and this serves that purpose.
