Bit9 PathNames

We live in a day and age when security (data, network, server, etc.) is seemingly at the forefront of the daily news. In our quest to improve security, we keep finding more and more products that promise to help. Sometimes, those products require a data repository, and as it happens, it is not uncommon for that repository to be a database. Today, I am looking into a specific issue with the Bit9 product.

Why?

There are some very good reasons, as a matter of fact. One big reason is that I could find no decent information about this issue. The bigger reason is the seemingly indifferent and sluggish response I got from the Bit9 support channels.

In this specific case, the first response from their support channels took more than two days, and subsequent responses came more than a week later. For the client, this caused real delays in a project they were working on. Was the issue significant? It was not an outage-causing issue, but it did continue to grow and raised concerns over disk space.

The Issue

It may be appropriate to discuss what Bit9 does prior to breaking into the details of the issue. If you have never had any experience with this product, you may be running something from a competitor such as Symantec. Bit9 is an endpoint protection tool from Carbon Black. It helps protect against malware and endpoint attacks. In short, it is a security tool meant to protect your computing enterprise.

Their site has plenty more if you wish to learn how it works; for me, the nitty-gritty specifics of how it works are a bit outside the scope of this article.

In short, Bit9 records every filename and every file path for every client machine, and it stores these in a SQL Server database. This shouldn't be too much of an issue. On one instance serving about 300 client machines, we saw about 1.2 million paths stored. On another instance, with far fewer clients and far fewer actual paths on those clients (physically checked), we had 114 million paths stored in the database. The difference in size was 20GB vs. ~160GB. When the server is installed with all defaults, you can imagine how that looked on this instance: a very bloated OS volume.

Investigating this issue from a database perspective, I looked to find what was consuming so much space in the database. To do that, I ran my tablespace script.
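The gist of a tablespace query is easy to sketch: sum the reserved pages per table from sys.dm_db_partition_stats and sort the offenders to the top. Treat the following as a minimal stand-in rather than the full report.

[codesyntax lang="tsql"]
-- Minimal tablespace sketch: reserved space per table, largest first.
-- Run in the target database; a stand-in for the fuller script.
SELECT  s.name + '.' + t.name AS table_name,
        SUM(ps.reserved_page_count) * 8 / 1024.0 AS reserved_mb,
        SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_cnt
FROM    sys.dm_db_partition_stats ps
        JOIN sys.tables t ON t.object_id = ps.object_id
        JOIN sys.schemas s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;
[/codesyntax]

Against this instance, the results came back like this: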

[Image: tablespace results, with the pathnames table highlighted]

There is one significant standout: the pathnames table. Looking at the definition of the table, I see something far more interesting and disturbing all at once.

[Image: pathnames table definition]

I see a table with three columns, two of which are large string fields, and each of those fields has a non-clustered index on it. This may or may not be a big problem (beyond the fact that the non-clustered indexes on this table are considerably larger than the data itself), except that each string field is an exact duplicate of the other. That's right: within this table, the data is duplicated into a second string field, and each field carries its own index. Not only do I appear to have a ton of duplicated data, I appear to have entirely useless indexes (neither had been touched for a read since the server had last been restarted).
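That last parenthetical is easy to verify. A quick look at sys.dm_db_index_usage_stats (table and schema names assumed here) shows whether those non-clustered indexes have served a single read since the instance last started:

[codesyntax lang="tsql"]
-- Reads vs. writes per index on the pathnames table since the last restart.
-- Zero seeks/scans/lookups on the non-clustered indexes = pure write overhead.
-- Assumes the table lives in the dbo schema; adjust as needed.
SELECT  i.name AS index_name,
        us.user_seeks, us.user_scans, us.user_lookups,
        us.user_updates
FROM    sys.indexes i
        LEFT JOIN sys.dm_db_index_usage_stats us
               ON us.object_id = i.object_id
              AND us.index_id = i.index_id
              AND us.database_id = DB_ID()
WHERE   i.object_id = OBJECT_ID('dbo.pathnames');
[/codesyntax]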

I inquired with the folks at Bit9 about this, both from a design perspective and from an archival perspective. Absolute silence on the design (expected). The inquiry about archival (or purging) of non-essential data did fetch a response, albeit a painfully slow one. The basic question was: "Is it safe to purge or archive old or unnecessary data?" The response we received was "run this and we will tell you what to do next."

Great. They sent a script to help determine the state of the data within the database. I am not posting their script here; suffice it to say it was not very pretty. It queries about 20 tables, unions the results, and then performs a NOT IN operation to see how many of the pathnames are invalid. No problem.
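The script itself stays off this post, but the pattern is simple enough to illustrate. With hypothetical table and column names (this is not Bit9's code), it boils down to:

[codesyntax lang="tsql"]
-- Illustration only: hypothetical names, not the vendor script.
-- Union the path IDs still referenced anywhere, then count what falls outside.
SELECT COUNT(*) AS orphaned_paths
FROM   dbo.pathnames p
WHERE  p.pathname_id NOT IN
       (
           SELECT f.pathname_id FROM dbo.file_instances f
           UNION
           SELECT e.pathname_id FROM dbo.events e
           -- ...and so on for the remaining referencing tables
       );
[/codesyntax]

Executing their script against this instance revealed the following: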

[Image: orphan count results from the vendor script]

And there is the problem: 95.89% of the rows stored in the pathnames table are orphaned records! The software does not appear to manage removal of invalid paths. From here, I knew what the course of action needed to be and acted on it. A big piece of the equation was provided by the results of the script; another piece came from the makeup of the script itself. From those pieces of information, I created a purge script to help manage the orphan problem, then put that script into an agent job scheduled to run on a weekly basis.

If you find yourself in the same boat, here is the script needed to purge the data. If running this, I recommend disabling the two non-clustered indexes first and then performing the delete (especially if you sit at 96% orphaned and over 100 million rows). After deleting the mass of orphans, go ahead and rebuild the indexes to stay in compliance with the software contract, at least until Bit9 responds about the index requirement and the schema of the table.
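What follows is a sketch of that purge rather than a verbatim script: index, table, and column names are hypothetical, the referenced-ID subquery mirrors the shape of the vendor's union, and the delete is batched so the transaction log and those indexed views (more on them in a moment) are not hammered in one giant transaction.

[codesyntax lang="tsql"]
-- Purge sketch: hypothetical object names; size the batch to your environment.
-- NOT IN mirrors the vendor pattern; NOT EXISTS is safer if the columns are nullable.
ALTER INDEX ncix_pathnames_path1 ON dbo.pathnames DISABLE;
ALTER INDEX ncix_pathnames_path2 ON dbo.pathnames DISABLE;

DECLARE @batch int = 50000;
WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM   dbo.pathnames
    WHERE  pathname_id NOT IN
           (
               SELECT f.pathname_id FROM dbo.file_instances f
               UNION
               SELECT e.pathname_id FROM dbo.events e
               -- ...remaining referencing tables from the vendor script
           );

    IF @@ROWCOUNT < @batch BREAK;  -- last partial batch; we are done
END

ALTER INDEX ncix_pathnames_path1 ON dbo.pathnames REBUILD;
ALTER INDEX ncix_pathnames_path2 ON dbo.pathnames REBUILD;
[/codesyntax]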

Yep! It is just like that. The worst part of the whole thing is actually the delete: the operation touches about 37 indexed views that must be maintained as rows are removed. Be prepared for a long-running process if you have a situation like the one I encountered.

For kicks, here is what that plan would look like in one of my favorite tools (SQL Sentry Plan Explorer).

[Image: the delete execution plan in SQL Sentry Plan Explorer]

What is your pain scale and story?

[Image: pain scale]

SQL Agent Job Schedules

Many moons ago, I posted a script to help report on the human-friendly scheduling information for SSRS subscriptions. You can read all about that here.

In a very similar fashion, today I want to share how easy it is to turn the internal schedules for SQL Agent jobs into something more human friendly. SQL Agent job schedules should be easy for humans to read, if for no other reason than that you have to provide this information to the auditors from time to time. We all know they can't properly interpret the numeric job schedules and determine what they mean.

The job schedules work quite well with the current design, which is not human friendly. The use of numbers is a streamlined approach and far more efficient for the engine. It would be more useful to us if some sort of translation table were stored alongside them; that would make it terribly easy to figure out what is happening within the schedules for the SQL Agent jobs. We don't have that, so we often have to do a little trial and error, or rely on people who have done that trial and error, with a little verification on our part. After all, these scripts are posted on the internet.

SQL Agent Job Schedules

I am not going to break down the script. Given the similarity to the aforementioned SSRS script, I recommend looking there for a bit of an explanation. Here is the script to retrieve SQL Agent job schedules from the msdb database on your instance.
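As a condensed sketch of the approach (the fuller version covers the monthly and relative schedule variants and formats the output more carefully), the translation of msdb.dbo.sysschedules goes along these lines:

[codesyntax lang="tsql"]
-- Condensed sketch: translate SQL Agent schedules into readable text.
-- Covers the common freq_type values; monthly variants left out for brevity.
SELECT  j.name AS job_name,
        s.name AS schedule_name,
        CASE s.freq_type
            WHEN 1   THEN 'Once'
            WHEN 4   THEN 'Daily'
            WHEN 8   THEN 'Weekly'
            WHEN 16  THEN 'Monthly'
            WHEN 32  THEN 'Monthly (relative)'
            WHEN 64  THEN 'When SQL Agent starts'
            WHEN 128 THEN 'When CPU is idle'
        END AS frequency,
        CASE s.freq_subday_type
            WHEN 1 THEN 'At the scheduled time'
            WHEN 2 THEN 'Every ' + CAST(s.freq_subday_interval AS varchar(10)) + ' second(s)'
            WHEN 4 THEN 'Every ' + CAST(s.freq_subday_interval AS varchar(10)) + ' minute(s)'
            WHEN 8 THEN 'Every ' + CAST(s.freq_subday_interval AS varchar(10)) + ' hour(s)'
        END AS subday_frequency,
        STUFF(STUFF(RIGHT('000000' + CAST(s.active_start_time AS varchar(6)), 6),
                    5, 0, ':'), 3, 0, ':') AS start_time
FROM    msdb.dbo.sysjobs j
        JOIN msdb.dbo.sysjobschedules js ON js.job_id = j.job_id
        JOIN msdb.dbo.sysschedules s ON s.schedule_id = js.schedule_id
        LEFT JOIN msdb.dbo.syscategories c ON c.category_id = j.category_id
WHERE   ISNULL(c.name, '') <> N'Report Server'  -- skip SSRS subscription jobs
        -- the full script also excludes SSIS maintenance package jobs
ORDER BY j.name;
[/codesyntax]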

This script will only retrieve the SQL Agent job schedules that are not SSRS related and not related to SSIS maintenance packages. The SSRS side has already been covered, and a quick read of those blog posts will show the differences between grabbing SSRS information and grabbing just the SQL Agent jobs.

Stay tuned. I will post a script for the SSIS-related scheduling information.

Now, go forth and audit your agent jobs. Document the schedules in an easy-to-read format. Give that report/document to the auditors and your manager and look like a rock star for the day!

On the Seventh Day…

So far this series has been a bit of fun.  We have discussed a fair number of useful things to date.  And here we are, only on the seventh day.

Today we will be turning our attention back to the msdb database.  This time not as a means of maintaining the undercarriage, but rather to help us try to become better DBAs.

Sometimes, to be a better DBA, we have to be a bit more proactive and less reactive.

We’ll get to that shortly.  As we have done just about every day so far, though, we need a quick recap of what we have to date in the 12 Days of pre-Christmas.

 

  1. Maintenance Plan Gravage – Day 6
  2. Table Compression – Day 5
  3. Exercise for msdb – Day 4
  4. Backup, Job and Mail History Cleanup – 3rd Day
  5. Service Broker Out of Control – 2nd Day
  6. Maint Plan Logs – 1st Day

On the Seventh Day of pre-Christmas…

My DBA gave to me an early nuclear fallout detection and warning system.  Ok, maybe not quite that extensive, but it could sure feel like it if something were to slip through the cracks and business management started breathing down your neck.

Have you ever had a SQL Agent job run longer than it should have?  Has a job run long enough that it passed the next scheduled start time?  These are both symptoms you might want to monitor for and track in your environment.  If you are watching for them, you should be able to strike preemptively with warning flares sent out to the interested parties.  Then you would be more the hero than the scapegoat.  And we all want to be on the winning side of that conversation, without the confrontation, from time to time ;).

Today, I am going to share a quick script that can be used to help monitor for jobs that have waltzed well beyond their next scheduled start time.  First though, there is already a script out there for jobs that have run beyond their normal run times.  Check out this article by Thomas LaRock for a good script to find and log long-running jobs.

Though the criteria are similar, these are two different needs that must be reported on and monitored, which is why we have this second script.  If you have a job that should run every 15 minutes but you find that it has been running non-stop for 45 minutes, that is quite a problem.  And many times we should alert on something like that.

So here is the quick script.
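A sketch of it follows. It leans on msdb.dbo.sysjobactivity for the currently running jobs and that view's next_scheduled_run_date column for the comparison, so take it as a starting point rather than a finished monitor.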

[codesyntax lang="tsql"]
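-- Sketch of the idea, not a verbatim script: currently running jobs
-- that have blown past their next scheduled start time.
SELECT  j.name AS job_name,
        ja.start_execution_date AS started_at,
        ja.next_scheduled_run_date
FROM    msdb.dbo.sysjobs j
        JOIN msdb.dbo.sysjobactivity ja ON ja.job_id = j.job_id
WHERE   ja.session_id = (SELECT MAX(session_id)
                         FROM msdb.dbo.syssessions)   -- current Agent session only
        AND ja.start_execution_date IS NOT NULL
        AND ja.stop_execution_date IS NULL            -- still running
        AND ja.next_scheduled_run_date < GETDATE();   -- past the next start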

[/codesyntax]

From here, it is a matter of creating a SQL Agent job to run this script.  The schedule depends on job frequency and how closely you need to monitor; it can very easily be set to run every five minutes.
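If it helps, a bare-bones job with a five-minute schedule can be stitched together with the msdb procedures; the job name is arbitrary and the step command below is a placeholder for the monitoring script above:

[codesyntax lang="tsql"]
-- Sketch: wrap the monitoring query in an Agent job that runs every 5 minutes.
USE msdb;
EXEC dbo.sp_add_job         @job_name = N'Monitor - Overrun Agent Jobs';
EXEC dbo.sp_add_jobstep     @job_name = N'Monitor - Overrun Agent Jobs',
                            @step_name = N'Check for overrun jobs',
                            @subsystem = N'TSQL',
                            @command = N'/* monitoring script goes here */';
EXEC dbo.sp_add_jobschedule @job_name = N'Monitor - Overrun Agent Jobs',
                            @name = N'Every 5 minutes',
                            @freq_type = 4,            -- daily
                            @freq_interval = 1,
                            @freq_subday_type = 4,     -- units of minutes
                            @freq_subday_interval = 5;
EXEC dbo.sp_add_jobserver   @job_name = N'Monitor - Overrun Agent Jobs';  -- local server
[/codesyntax]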

Once the job is created, it is a matter of setting up a mechanism to alert.  There are any number of ways to do that as well.  I leave it to anyone using the script to determine how best to implement it for their own environment.
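Database Mail is the lowest-friction option. Assuming the monitoring results were captured somewhere checkable (a temp table here, purely for illustration) and a mail profile exists, the alert can be as simple as:

[codesyntax lang="tsql"]
-- Sketch: fire a Database Mail alert when the monitoring query returns rows.
-- Profile name, recipients, and the temp table are all placeholders.
IF EXISTS (SELECT 1 FROM #overrun_jobs)
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = N'DBA Mail',
         @recipients   = N'dba-team@example.com',
         @subject      = N'SQL Agent job overrun detected',
         @body         = N'At least one job is running past its next scheduled start time.';
END
[/codesyntax]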

Conclusion

This script will catch jobs that are currently running and have exceeded their next scheduled start time.  This is accomplished through the filters in the WHERE clause: with a couple of date comparisons as well as a filter on the job's run status, we get the desired results.

 
