BIT9 PathNames

We live in a day and age when security (data, network, server, etc) is seemingly at the forefront of the daily news. In our quest to improve security we seemingly always find more and more products that are supposed to help with that security. Sometimes, those products require a data repository. As it happens, it is not uncommon for that repository to be in a database. Today, I am looking into a specific issue with the Bit9 product.

Why?

There are some very good reasons as a matter of fact. One really big reason is that I could find no decent information about this issue. The bigger reason is the seemingly indifferent level of response and the delays I saw in response from the Bit9 support channels.

In this specific case, the first response from their support channels took more than two days. Subsequent responses were more than a week later. For the client, this actually caused delays in a project they were working on. Was the issue significant? It was not an outage causing issue, but it was one that did seem to continue to grow and cause concerns with disk space.

The Issue

bit9It may be appropriate to discuss what Bit9 does prior to breaking into the details about the issue. If you have never had any experience with this product before, you may be running a product from a competitor such as Symantec. Bit9 is a an endpoint protection tool from Carbon Black. This helps to prevent against malware and endpoint attacks. In short it is a security based tool to protect your computing enterprise.

Go ahead and click the image to link to their site if you wish to learn more about how it works. For me, the nitty gritty specifics on how it works is a bit outside the scope of this article.

In short, Bit9 records a record of every filename and every filepath for every client machine. It stores these in a database in SQL Server. This shouldn’t be too much of an issue. For one instance serving about 300 client machines we saw about 1.2 million paths being stored. One another instance with far fewer clients and far fewer actual paths on those clients (physically checked) we had 114 million paths stored in the database. The difference in size was 20GB vs ~160GB. When the server is installed with all defaults, you can imagine how this looked for this instance – a very bloated OS volume.

Investigating this issue from a database perspective, I looked to find what was consuming so much space in the database. To do that, I ran my tablespace script and discovered the following:

bit9_tablespacehl

A significant standout there with the pathnames table. Looking at the definition of the table I see something far more interesting and disturbing all at once.

pathnames_table

I see a table with three columns and two large string fields. Each of these fields has a non-clustered index on it. This may or may not be such a big problem (other than the fact that the size of the non-clustered indexes on this table are much larger than the data) except that each of the string fields is an exact duplicate of the other. That’s right. Within this table, the data is duplicated into this second string field and each field has its own index. Not only does it appear that I have a ton of duplicated data, it appears I have entirely useless indexes (neither had been touched for a read since the server had been up)

I inquired about this to the folks at Bit9, both from a design perspective and from an archival perspective. Absolute silence on the design (expected). The inquiry about archival (or purge) of non-essential data did fetch a response – albeit a painfully slow response. The basic question is: “Is it safe to purge or archive old or unnecessary data.” The response we received was “run this and we will tell you what to do next.”

Great, they sent a script to help determine the state of data within the database. I am not posting their script here. Suffice it to say that the script they sent was not very pretty. They query about 20 tables, union the results from those tables, then perform a not in operation to see how many of the pathnames are invalid. No problem. Executing the script did reveal the following:

orphan_result

There seems to be the problem. 95.89% of the rows being stored in the pathnames table are orphaned records! This is a bit of a problem. The software does not appear to manage removal of invalid paths. From here, I knew what the course of action needed to be and acted on it. A big piece of the equation was provided from the results of the script. Another piece was provided in the makeup of the script. From these pieces of information, I created a purge script to help manage the orphan problem. I then put that script into an agent job and set it to run on a weekly basis.

If you find yourself in the same boat, here is the script needed to purge the data. If running this, I recommend disabling the two non-clustered indexes and then performing the delete (especially if you sit at 96% orphaned and over 100 million rows). After deleting the mass amount of orphans, go ahead and rebuild the indexes to stay in compliance with the software contract until Bit9 responds about the index requirement and the schema of the table.

Yep! It is just like that. The worst part of the whole thing is actually the delete. The delete operation will touch about 37 indexed views to update and delete. Be prepared for a long running process if you have a situation like I encountered.

For kicks, here is what that plan would look like in one of my favorite tools (SQL Sentry Plan Explorer).

bit9_deleteplan

What is your pain scale and story?

pain_scale

Easily Shred Event Data

shred_xeIt has been nearly a year since I started an extensive series about Extended Events. Previous to that start, I had already written a bunch of articles about Extended Events.

Among the batch of articles preceding the 60 day series one can find an article about shredding extended event data. Then as a part of the series (the first article in the 60 day series) one can find how to shred the action payload data attached to an extended event session.

You can read the article on shredding XEvent payload data here. Then you can follow that up by reading how to shred the Action data here. And once you are done with those and you really want to learn more about Extended Events, please read the whole series which can be found here.

All of that reading to catch up should keep you busy for a tad bit.

Upgrade

A year is a really long time to go without updating a script – according to some. Today, I have an update for both of the XML shredding scripts you just read about in the list of articles mentioned moments earlier in this post.

Before I dive into the script, let’s revisit a sample of the XML from an event session payload. Within an event session, you can see that there are nodes for both the data and the actions (highlighted in green and red). With all of this information found within the same XML for the session, it somewhat makes sense to try and parse all of the data at once.

action_xml

In addition to parsing all of the XML for the data and the actions at the same time, it seems to also make sense to generate the statements that would parse the XML within a single effort. As you would have noted, my previous scripts were just that – scripts. That implies executing a separate script for each the data and the actions. Maybe it would make more sense to execute a single script.

leaving_painIt is that notion of a single script that constitutes this upgraded version of the script.

One Script

In the following script, I have tried to accomplish just that – a single script to create the entire XML parser for me, for you, and for anybody wishing to use it. I don’t want to have to remember the subtle nuances of how to parse each of the events each time I need to parse the session data. I want something that is quick, easy, and repeatable.

With all of that said, here is the script that I now use to parse my session data. You should notice that it has been simplified and is more extensive now.

Not only does this script slice and dice for you…Wait that’s not quite right.

This script doesn’t just generate the XML shredding strings for the payload and action data. This script also will produce the rest of the tsql statements that should precede and conclude the XML parsing statements. Yes it uses dynamic sql. That is to your advantage in this case.

Could there be other enhancements? Absolutely! More will be coming.

Enjoy this script and happy parsing.

Extra Extra – Read All About It!

From the comments, you will see that Brent Ozar (blog | twitter) made an excellent suggestion. To be honest, I considered sharing some examples during the initial write of this article. I had opted against it then, not considering it in the same way that Brent puts it. So, in this addendum I have a quick example using an XEvent Session that I have not yet written about in all of my articles on Extended Events (pseudo spoiler alert).

Let’s use the AuditSelects XEvent Session I have deployed and running (remember a session can be deployed but may be in the “stopped” state). If I want to figure out what data has been captured for this session, I can take the session name and plug it into the script shown in this article.

parsexe_params

Ignore the green tsql comments for now (you can read them direct from the script). Highlighted in gold is the variable for the session name. By entering a valid session name for this variable, I can restrict the results to just the metadata for that specific session. In addition, note that I have highlighted, in a pale chartreuse, two variables that will give me unique column names for each of the events and actions within each session. This is important because some events have the same name for different attributes as other events. If events with the same attribute names are used in the same session, you will get a duplication of data. Maybe you want the duplicate data. Maybe you don’t.

The session name variable is used in each of the next three code segments. Each segment in the script will generate sql statements that will need to be copied into a new query window. I will leave the further breakdown of the script as an exercise for you. What I want to do here is show the results from executing this script.

When I execute the script, I will have four result sets returned to me (in grid mode). It will look something like shown in the following image.

parser_results

Breaking this down into three colors is probably easiest to see how things fit together. Highlighted in red, I have the build out of the pre-processing statements and the build of the “select” statement that will be returning our data to us. Then in blue comes the guts of the query – all of the beautiful XML parsing statements. The segments in blue correlates to the columns in the select list. And at the bottom of the image I have the finishing pieces that includes my predicates, from, and joins. Each of the highlighted segments will then be copied and pasted to a new query window and look something like the following.

parsed_gluedtogether

And there you have it. Sure there are still some manual steps in it, but you no longer need to memorize all of that glorious XML parsing syntax. You only need to copy and paste with this version of the script.

Enjoy your adventures in parsing XML!

SQL Agent Job Schedules

Many moons ago, I posted a script to help report on the human friendly scheduling sched_reportinformation for SSRS subscriptions. You can read all about that here.

In a very similar fashion, today I want to share how easy it is to turn the internal schedules for SQL Agent jobs into something that is more human friendly. SQL Agent job schedules should be easy to read for humans if for nothing else than the fact that you have to provide this information to the auditors from time to time. We all know they can’t properly interpret the numeric job schedules and properly determine what they mean.

The job schedules work quite well with the current design – which is not human friendly. The use of numbers is a streamlined approach and far more efficient. It may be a bit more useful to us if there were some sort of translation table that was stored. A stored translation table would make it terribly easy to figure out what is happening within the schedules for the SQL Agent jobs. We don’t have that, so we often have to do a little trial and error. Or we rely on people that have done that trial and error – with a little verification on our part. Afterall, these scripts are posted on the internet.

SQL Agent Job Schedules

I am not going to break down the script. With the similarity to the aforementioned SSRS script, I would recommend looking there for a little bit of an explanation. Here is the script to retrieve SQL Agent Job Schedules from your MSDB instance.

This script will only retrieve the SQL Agent Job schedules that are not SSRS related and that are not SSIS maintenance package related. The SSRS stuff has already been introduced, and a quick eval of those blog posts will show the differences between grabbing SSRS information vs. grabbing just SQL Agent Jobs.

Stay tuned. I will post a script for the SSIS related scheduling information.

Now, go forth and audit your agent jobs. Document the schedules in an easy to read format. Give that report / document to the auditors and your manager and look like a rock star for the day!

Shredding Extended Event Actions

lovehatedice_v1The other day I wrote about that torrid love/hate relationship DBAs tend to have with working with XML. In that same post, I promised I would have a follow up post about XML in Extended Events.

Well, today I present to you another opportunity for you to renew your love/hate relationship with XML.

In the previous post (which you can read here), I discussed just one facet of shredding the XML related to Extended Events. Today, we have “actions” to discuss. I am not going to get into what an “action” is today. I will save that for a near future post.

For the shredding of the actions, I will use the TreeHuggerCPU Event Session I used in the last article (for the sake of consistency). The session doesn’t necessarily need to be running. I will just pull the actions related to the session from the metadata accessible via the system catalog views.

Should the session be running (and in my case it is for demonstration purposes), I could open the session data and view it as XML and see something like the following:

action_xml

 

I have highlighted two different types of nodes available in the XML data. In the previous article, I discussed the “data” nodes and I have highlighted that in red here. Today, we are talking actions, and those are highlighted in green this time around. It is the “action” nodes that we will be shredding via the following script.

 

With this script, I can either search for all actions tie to an XE Session, for the actions tied to one event within a Single XE Session, for all actions tied to a specific event across multiple sessions, or for all actions that are tied to any event tied to any event session deployed to the server.

Combine this with the previous script and suddenly all of that XML just got several times easier.

Tables, Columns and Defaults Oh My!

Comments: No Comments
Published on: August 19, 2015

Picture this if you could. You inherit a new database either through the change of a job, or finding a black box database server, or maybe due to the acquisition of a new application.

No matter the cause, you are now responsible for maintaining that database and you also really need to become acquainted with it. Some of the acquaintance that is needed is learning how things are related and figuring out what data goes where. Not just what data, but what type of data.

I know that one of the things I am frequently asked is to figure out what data type belongs with which column. Furthermore, does this same column exist in more than one table. It’s nice when things are named consistently and you are able to tie things together by column name.

There are also times, when that column that is so consistently named, might need to be confirmed on data types between each of the implementations. I know I have run into cases when the column might be differently defined in different tables.

So, what to do to tie this all together and get that intimate acquaintance with your database?

Luckily, I have a script that will help you figure out some of these connections.

First, the script.

Now, let’s give it a trial run. See if you can spot the difference by running that against the AdventureWorks database using the value “AccountNumber” for the @ColName parameter.

ColData_Queryresult

 

Maybe there is a legitimate reason for that Column to have a different data type specification than the other two. Maybe it doesn’t. One thing for certain is that you will be able to walk away with some questions to ask and hopefully gain a better understanding!

Table Size Script Update for 2015

Categories: News, Professional, Scripts, SSC
Comments: 2 Comments
Published on: August 3, 2015

As was recently promised, I have an db_engineupdate to an old tried and true script for calculating the size of an object.

That promise was doled out in the previous article when I discussed the internal table types. You can read the article here. Information from that article was used in this script – so it could be of use prior to taking a gander at this new revision.

And of course, previous releases can be found through the link chain here.

The script was in need of an update the last time around due to the internal table types that had been updated in SQL Server. Those internal tables haven’t really changed in the latest release (at least as far as calculating space is concerned). But, I didn’t like needing to update multiple places in the script for one. For the other part of it, I did not like that It was difficult to know what exactly the internal table numbers actually represented. So I added that documentation to this script.

In addition to that little change, I also wanted to figure out if a table might have been spread across multiple filegroups. Due to this, I added in a delimited list of filegroups.

And the last minor update is that the script now has an option to exclude MS Shipped objects, or to include them, or to look just at the MS Shipped objects.

All of this is included in the effort of trying to get a more complete view of the size of the objects within a database and to be able to do so without needing to loop through the objects.

Note: A quick update to move the index creation outside of the table creation instead of inline since that is a feature not supported in older versions of SQL Server.

Internal Tables and Space Used

Categories: News, Professional, Scripts, SSC
Comments: No Comments
Published on: July 29, 2015

engine_indbSome of the beauty of SQL Server is the complexity that it holds. Some of that complexity is right before our eyes. And some of that requires a little bit of diving into the internals to figure out what is happening or even why it is complex.

I enjoy the complexity that is offered to us through this great technology. Because of some of the hidden complexity, I found myself in a rabbit hole recently trying to figure out what exactly was happening with how table size is being calculated by default.

I have written previously on how to find some of the information pertinent to table size. And sadly, thanks to recent rabbit hole excursions, I found that some of the information in the previous articles was wrong (and accordingly there is even a bit of an error in some documentation but that is a different story – connect filed here).

One of the most common means to calculate size in a database is through the use of sp_spaceused. Over the recent editions, there have been a few minor updates to this stored procedure. But a common theme in every update has been to reference the sys.internal_tables system table to get some internal_types excluded from certain calculations.

I found myself wanting to know just exactly what these internal_types were. I searched BOL and Google. I found some references to some of the table types but they always seemed to just be in code. There was a short list in BOL that had some information, but it was far from complete. What to do? What to do?

What did I do?

Since I couldn’t find all of the internal_types for the internal_tables, I was left to do but one last thing. Well, actually, I resorted to asking around a bit first. I asked a group of MCMs and some people (e.g. Paul Randal – b|t ) for a little help. After those inquiries, I found myself still short of a complete list. That is, complete as far as the exclusion list for sp_spaceused.

My last resort for the time being was to begin testing. I tested various different features and configurations. I did this until I was able to come up with a complete list (with regard to sp_spaceused again). In addition to that complete list, I found a handful of additional internal table types.

Now this investigation and rabbit hole was not just for my own enjoyment. I have to admit it was rather frustrating. I ran into test failure after test failure trying to find the exact internal table types that were referenced in that blasted stored procedure.

I was asked by a friend (blog | twitter) why I was submitting myself to this kind of pain and why it was so important. Well, it’s not just for my enjoyment. SPOILER ALERT: I have an update for the table space script that was planned, and it needs to have a little bit better information in lieu of the “because it says so in BOL” explanation that I had made in previous releases of the script.

But I digress. That will all be better discussed in the next installment of the script. As for today, I want to share my findings of this expedition into the nooks and crannies of the database engine.

The script

I have hard-coded a few things in this script that you will possibly need to change. The most important being that I hard-coded a reference to the AdminDB database to the string splitter that I use. There is a note of that string-splitter in the script. And to use the same one (By Jeff Moden with Community contributions) that I have employed, you can download it from here.

You can see that I have included the types for versions from 2005 through 2016 that I have found to date. In addition, SQL Server 2016 has the same exclusions (for now) as 2012 and 2014. That is, at least within sp_spaceused. In 2016, sp_spaceused does make a call to an extended stored proc called sp_spaceused_remote_data_archive, and I do not yet know what is within that proc. You can bet though, that it is related to the new Stretch feature.

The Tease!

Stay tuned! The new release for the table space script will be coming to you on the other side of this short blogging break!

What’s the Size of that Index?

Categories: News, Professional, Scripts, SSC
Comments: 1 Comment
Published on: July 27, 2015

I have previously written about finding what the big space consumers are within the database. I even shared scripts in those articles on how to find those big consumers (read all about it here).

sizematters2That is grand when it comes to quickly finding which objects happen to be consuming the greatest amount of space within the database. It really helps to quickly assess where some extra TLC may need to be spent. But what about more specific details? What if you wished to figure out what the size was for each of the indexes within that object?

Well, the solution is pretty simple. I have another script to help determine which indexes are the big space consumers within the specified table for the given database.

 

First, let’s see what that script is.

As you can see, I am parsing a few different measures on space usage from the dynamic management view sys.dm_db_partition_stats. This will tell me information such as how much space is being reserved for each index as well as how much space is actually being used on each index. Then I use a grouping set to summarize that size information so I can then compare the total index size back to the size I would have seen from the table space script (discussed lightly earlier).

Why might I want to do something like this?

That is a really good question! The first obvious answer to a question such as this is in the off chance that you discover (again, from the use of the previous script), that one of your terabyte tables is actually only 150GB of data and the rest of that space consumption is actually from all of the indexes on that table.

When I see that a table has more space allocated due to indexes than due to data, I might be curious which indexes are contributing to that consumption. In addition, I can look at numbers much faster to see if maybe there are multiple indexes that are the exact same size. If they are, I might want to evaluate and dig a little deeper into the definitions of those indexes. We would only really love to have 850GB of indexes on a table to only find that three or four of those indexes are exact duplicates.

From there, I might even decide to look at the usage stats for the indexes in question. Why go about it in this fashion? Well, on some databases it is much faster to run a few quick queries than to try and run a big query against the entire database to get all of the index usage stats. I am sure you are familiar with those times when trying to query physical and operational stats takes hours to complete.

But But But…

I can hear you thinking right now. Another question has popped into your head. What about sp_spaceused. That old proc does work just fine for this as well. But there is something about being able to get to know what contributes to the space of an index and being able to work through how to get it calculated. Additionally, with a minor tweak, this query can provide that (detailed) information very quickly for all objects in the database.

Tweaked query

 

Database Settings Changes – Red Handed

Comments: 4 Comments
Published on: July 8, 2015

One of my pet-peeves (and consequently frequent topic of discussion) is finding database settings (or any setting that has changed) without knowing about it. Worse yet is finding that the change has occurred and nobody claims to have any knowledge of it or having done the deed.

This happened again recently where a database was set to single_user and suddenly performance in the database tanked. Change the database back to multi_user and the performance issues are magically resolved.

Fortunately there is a means to combat this. Well, sort of. The default trace in SQL Server does capture the event that occurs when the database is set to single_user or read_only. Unfortunately, all that is captured is that an Alter Database occurred. There is no direct means of mapping that event to the statement or setting that changed.

This inadequacy got me to thinking. The default trace is looking at a set of specific “events”, why wouldn’t that set of events be available within Extended Events. It only seems logical! So I decided to query the event catalog and lo and behold, I found just the event I was seeking – object_altered. Combine this with a recently used predicate (object_type = ‘DATABASE’) and we are well on our way to having just the trap to catch the source of these database changes red-handed.

Easy enough to create this particular session. The event does not capture the “whodunnit” without a little extra prodding. So, I added in a couple of actions to get that information – sqlserver.nt_username,sqlserver.server_principal_name,sqlserver.client_hostname. Additionally, the event does not explicitly tell me what setting changed – just that some setting changed. For this, I decided to add the sql_text action so I could correlate event to the actual setting being changed. Then to cap it all off, I made sure the predicate specified that we only care about database settings changes as previously mentioned.

Running the session and then testing some settings changes should prove fruitful to capturing good info. Here are a few of the tests that I ran and the results of those tests (by querying the session data).

Now to try and look at the results.

DB Change Data

 

There you have it! I have just been caught red-handed changing my AdventureWorks2014 database to single_user and multi_user.

Bonus

For more ideas on settings and changes and so forth, Andy Yun (blog | twitter) has invited all to participate in TSQL2SDAY on this very topic. He has invited all to talk about their experiences with “default settings” and what you might change them to! You can read about it here. I have another article coming up that will fit just nicely with that. Let’s just call this a preview and maybe it can help you get those tsql2sday juices flowing.

Database Drops in SQL 2012

In the previous article on this topic (which can be read here), I discussed the problem of having a database get dropped and the need to find out who dropped the database and when they dropped it.  In that article, I demonstrated how the information (at least some of it) could be found through the use of Extended Events.

What I forgot to mention is the method I shared was for SQL Server 2014. While the events do exist for SQL Server 2012, there is a minor modification that needs to be made in order to avoid the dreaded error message that says something like:

Msg 25713, Level 16, State 23, Line 1
The value specified for event attribute or predicate source, “object_type”, event, “object_created”, is invalid.

I am sure many of us would rather not have to deal with such a terrible thing as an error when we want to do something that should just work. Well, here is the fix for that error if you tried to deploy that XE Session to a previous version (such as 2012).

Do you see that change? Instead of using the map_value in 2012, one must use  the map_key. This was a good change in 2014 to allow us to use human friendly terms instead of needing to lookup the map_key for everything like in 2012.

In following the theme from the previous article, here is the rest of the setup for testing this XEvent session to see how it would trap that data and how to parse the data from the session.

Enjoy!

Nice and simple for a test. This should be enough to have trapped some data for our XEvent session.  Let’s parse the data and take a look.

With the XEvent session started, and a database that has been created and dropped, this last query will produce some data to help track who has been adding or removing databases from your SQL Instance.  If you have run the example code, you should see data very similar to what is shown in this next image.

xe_results

In the attached image, I have separated the pair of statements for the DB create from the pair of statements for the DB drop (recall that I mentioned each will fire twice).  I hope this serves you well in your attempts to reign in the control of your environment and discover who may be creating rogue databases or even dropping the databases that are essential to the business.

«page 2 of 7»








Calendar
April 2017
M T W T F S S
« Mar    
 12
3456789
10111213141516
17181920212223
24252627282930
Content
SQLHelp

SQLHelp


Welcome , today is Thursday, April 27, 2017