T-SQL Tuesday #081: Getting Sharper

Comments: No Comments
Published on: August 16, 2016

Sharpen Something

This month I am the host of the TSQL Tuesday blog party. In the invite, which can be read here, I asked people to decide on something to work on, plan it out, and then report on the success or failure.

Not only am I the host, but I am also a participant this month. In my invite (and the reminder) I provided a few examples of what I was really looking for from participants this month. It became apparent that the topic may have been overthought. So, for my contribution, I decided to do something extremely simple.

There is so much to SQL Server that it is not feasible, nor should it be expected, for any single person to know everything about the product. That said, within SQL Server alone, all of us have something to learn and improve upon within our skill set. If we extend out to the professional development realm, we have even more we can explore as a skill-sharpening experiment for this month.

I am going to keep it strictly within the SQL Server realm this month. I have chosen to develop my skills a little more with the topic of JSON. I should be an expert in JSON, but since it is spelled incorrectly – maybe I have something to learn. That said, I really do love being in the database now – haha.

JSON

Let's just get this out there right now – I suck at JSON. I suck at XML. The idea of querying a non-normalized document to get the data is not very endearing to me. It is for that reason that I have written utilities and scripts to help generate my XML shredding scripts – as can be seen here.

Knowing that I have this allergy to features similar to XML, I need to build up some resistance to the allergy through a little learning and a little practice. Based on that, my plan is pretty simple:

  1. Read up on JSON
  2. Find some tutorials on JSON
  3. Practice using the feature
  4. Potentially do something destructive with JSON

With that plan set before me, it is time to sharpen some skills and then slice, dice, and maybe shred some JSON.

Sharpening

Nothing in this entire process was actually too terribly difficult. That is an important notion to understand. My plan was light on detail and consisted of broad strokes. That helps me adapt to changing demands and time constraints. I dare say the combination of broad strokes and a very limited scope also gave me an easier path to success.

Researching JSON was pretty straightforward. This really meant a few Google searches. There was a little bit of time spent reading material from other blogs, a little bit from BOL, and a little bit from MSDN. Nothing extravagant here. I also had the opportunity to review some slides from a Microsoft presentation on the topic. Again, not terribly difficult or demanding in effort or time. This research covers both steps one and two of the plan.

Now comes the more difficult task. It was time to put some of what had been seen and read into practice. A little experimentation was necessary. I was able to construct two simple-looking examples to start experimenting with in my learning endeavors.

Here is the first example. This one is fairly basic in construct. (Updated to use an image since the JSON was messing with the RSS feed and causing malformed XML.)

json_xmpl
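Since that example now lives only in the image, here is a minimal sketch of the same kind of basic construct. The sample document and column names below are my own stand-ins, not the original example.

/* Minimal sketch of a basic OPENJSON query (SQL Server 2016+).
   The JSON document and column names are hypothetical stand-ins. */
DECLARE @json NVARCHAR(MAX) = N'[
    { "FirstName": "Ada",  "LastName": "Lovelace", "City": "London" },
    { "FirstName": "Alan", "LastName": "Turing",   "City": "Wilmslow" }
]';

SELECT j.FirstName, j.LastName, j.City
FROM OPENJSON(@json)
    WITH (
        FirstName NVARCHAR(50) '$.FirstName',
        LastName  NVARCHAR(50) '$.LastName',
        City      NVARCHAR(50) '$.City'
    ) AS j;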

And some basic results:

basic_json

Pretty slick. Better yet, this is many times easier than XML.

How about something a little different, like the following?

Admittedly, this one is a bit more of a hack. In my defense, I am still learning how to work with this type of stuff. At any rate, I had an array of values for one of the attributes. The kludge I used reads up to three values from that array and returns them as individual attributes. I am still learning in this area, so I can live with this for now.

array_json
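That example is also only shown as an image, so here is a rough sketch of the kludge under an assumed document shape; the attribute names and values below are hypothetical.

/* Hedged sketch of reading up to three array values into separate columns.
   The document shape and names are hypothetical. */
DECLARE @json NVARCHAR(MAX) = N'{
    "Name": "Widget",
    "Colors": [ "red", "green", "blue" ]
}';

SELECT
    JSON_VALUE(@json, '$.Name')      AS ItemName,
    JSON_VALUE(@json, '$.Colors[0]') AS Color1,
    JSON_VALUE(@json, '$.Colors[1]') AS Color2,
    JSON_VALUE(@json, '$.Colors[2]') AS Color3;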

The last part of the plan involved doing something destructive. Why? Well, just for the fun of it. I was unable to get to this stage, but it is still in the plans.

Report on The Successes and Failures

 

I have written about some of the successes and failures along the way thus far. Overall, I would rate this a successful endeavor. The big reason it was a success is that I feel I learned more about JSON within SQL Server than I knew prior to the experiment.

Taking a bite-sized chunk of learning and acting on it sure makes it a lot easier to learn a new concept or to learn more about a topic as vast as SQL Server.

*Note: This post did not auto-publish as scheduled, so it is a tad late; I discovered the issue as I was prepping the roundup.

Audit All Select Statements

Legislation and regulation sometimes dictate that certain activities must be tracked within a database. On occasion, it will be required that queries be audited and tracked to reach compliance with the legislation or regulation. To achieve this compliance, this article will demonstrate how to use Extended Events to audit statements being executed within the database.

Over the course of a few recent articles, I have included little hints here and there alluding to this article. In fact, now would be a good time to review one of these articles in preparation for what I will be sharing today. Take a moment to refresh your memory on this article about finding the right event – here.

Backstory

I have to be a little honest here. Prior to somebody asking how they could possibly achieve a statement audit via Extended Events, I had not considered it as a tool for the job. I would have relied on Audit (which is Extended Events related) or some home-grown set of triggers. In this particular request, Audit was not fulfilling the want and custom triggers were not an option. Another option might have been the purchase of third-party software, but there are times when the budget does not allow for nice, expensive, shiny software.

So, with a little prodding, I hopped into the metadata and poked around a bit to see what I could come up with to achieve this low-budget audit solution.

Low-Budget Audit

Using the handy scripts I have shown previously (and that I just linked to), I was able to explore the Extended Events metadata and find just what might work. The requirements in this case were that it needed to be done in XEvents and that it must capture SELECT statements. To find the events that seemed plausible for this task, I plugged the term "SELECT" into my queries as follows:
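(The exact query is not reproduced in this post; a minimal stand-in that searches the event metadata for the term looks like this.)

/* Minimal stand-in for the metadata search (not the original script):
   find events whose name or description mentions "select". */
SELECT o.name AS event_name, o.description
FROM sys.dm_xe_objects AS o
WHERE o.object_type = 'event'
  AND (o.name LIKE '%select%' OR o.description LIKE '%select%');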

From this query, there was really only one event that made any sense for my task – "degree_of_parallelism". There was another event returned in the result set, but it made no sense to me as a possible candidate for auditing select statements (it was ucs_transmitter_reclassify). From the results, I then took the keyword associated with degree_of_parallelism to see what else might be pertinent. I plugged that keyword, "execution", into the following query:
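(Again, as a hedged stand-in for the original query, the keyword lookup can be approximated by joining each event to its KEYWORD metadata column and the corresponding map values.)

/* Hedged approximation of the keyword query (not the original script):
   list events tagged with the "execution" keyword. */
SELECT o.name AS event_name, o.description, mv.map_value AS keyword
FROM sys.dm_xe_objects AS o
JOIN sys.dm_xe_object_columns AS oc
    ON oc.object_name = o.name
   AND oc.object_package_guid = o.package_guid
   AND oc.name = 'KEYWORD'
JOIN sys.dm_xe_map_values AS mv
    ON mv.name = oc.type_name
   AND mv.object_package_guid = oc.type_package_guid
   AND mv.map_key = TRY_CONVERT(int, oc.column_value)
WHERE o.object_type = 'event'
  AND mv.map_value = 'execution';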

The results of the previous query are quite a bit more substantial. Some of the events make sense to include in the audit and some do not. I opted not to include any of those additional events in order to keep things as clean and simple as possible.

More Requirements

I now have the base event covered that I want to use for my audit. That said, my base requirements are just not extensive enough. When I audit something, I really want to know who did it, when it was done, where it originated, and what it was that they did. The degree_of_parallelism event will capture all of the select statements, but it does not meet these additional requirements.

In order to meet the additional requirements, I will attach a handful of actions to the degree_of_parallelism event. The addition of the actions will provide all of the data I want and need. Combining the event and the actions, I came up with the following session definition.
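(The session definition is not reproduced in this post, so the following is a hedged approximation built from the description above; the action list, the code-completion filter value, the target, and the file path are my own choices rather than the exact original definition.)

/* Hedged approximation of the audit session described above.
   Action list, app-name filter, target, and file path are assumptions. */
CREATE EVENT SESSION [AuditSelects] ON SERVER
ADD EVENT sqlserver.degree_of_parallelism (
    ACTION (
        sqlserver.client_app_name,        -- where it originated
        sqlserver.client_hostname,
        sqlserver.database_name,
        sqlserver.server_principal_name,  -- who did it
        sqlserver.session_id,
        sqlserver.sql_text                -- what they did
    )
    WHERE sqlserver.database_name = N'AdventureWorks2014'
      AND statement_type = 1                                  -- 1 = SELECT
      AND sqlserver.client_app_name <> N'CodeCompletionTool'  -- hypothetical tool filter
)
ADD TARGET package0.event_file (SET filename = N'C:\Database\XE\AuditSelects.xel')
WITH (MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS);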

This is a very simple session to pull together. I have added a few things to the predicate to help limit the scope of the session to the AdventureWorks2014 database and to make sure I am not trapping events related to the code-completion tools shown. Notice also that there is an opportunity to optimize this session. Can you find it? Hint: it may be in the predicate. Second hint: re-read this article about predicates.

Now, the trick that makes this audit only the select statements is right there in the predicate. I have instructed the session to ignore any statement_type that is not a value of 1. As it turns out, a statement_type of 1 is a select statement. To see how these values map, here is a quick query and the correlated results.
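(The lookup query itself is not shown here; a reasonable stand-in resolves the map for the event's statement_type field straight from the metadata.)

/* Stand-in for the map lookup: resolve the statement_type map for the
   degree_of_parallelism event and list its key/value pairs. */
SELECT mv.map_key, mv.map_value
FROM sys.dm_xe_object_columns AS oc
JOIN sys.dm_xe_map_values AS mv
    ON mv.name = oc.type_name
   AND mv.object_package_guid = oc.type_package_guid
WHERE oc.object_name = N'degree_of_parallelism'
  AND oc.name = N'statement_type'
ORDER BY mv.map_key;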

dop_statementtype

Based on these values, should you want to audit a different query type, just change the predicate to use the map_key value that corresponds to the desired statement type.

Testing

Testing this session is rather simple. Here is a quick and dirty script to help test it.
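(The test script itself is not reproduced here; a quick sketch along the same lines starts the session, runs one statement the predicate should capture and one it should ignore, and then reviews the session output. The object names come from AdventureWorks2014.)

/* Quick-and-dirty test sketch: one statement the predicate should capture,
   one it should ignore. */
ALTER EVENT SESSION [AuditSelects] ON SERVER STATE = START;
GO
USE AdventureWorks2014;
GO
SELECT TOP (10) p.FirstName, p.LastName
FROM Person.Person AS p;            -- statement_type = 1, should be captured

UPDATE Person.Person
SET ModifiedDate = ModifiedDate
WHERE BusinessEntityID = 1;         -- not a SELECT, should be filtered out
GO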

Conclusion

Building low-cost solutions is a common requirement for the data professional. When given the opportunity, try things out to see what you can build to provide the solution. In this case, I have demonstrated how Extended Events, however unlikely a candidate, can provide a working solution to help audit any select statements occurring within your database.

This has been another article in the 60 Days of XE series. If you have missed any of the articles, or just want a refresher, check out the TOC.

BIT9 PathNames

We live in a day and age when security (data, network, server, etc.) is seemingly at the forefront of the daily news. In our quest to improve security, we find more and more products that are supposed to help with that security. Sometimes, those products require a data repository. As it happens, it is not uncommon for that repository to be in a database. Today, I am looking into a specific issue with the Bit9 product.

Why?

There are some very good reasons, as a matter of fact. One really big reason is that I could find no decent information about this issue. The bigger reason is the seemingly indifferent level of response, and the delays in that response, from the Bit9 support channels.

In this specific case, the first response from their support channels took more than two days. Subsequent responses came more than a week later. For the client, this actually caused delays in a project they were working on. Was the issue significant? It was not an outage-causing issue, but it did seem to continue to grow and cause concerns with disk space.

The Issue

It may be appropriate to discuss what Bit9 does prior to breaking into the details about the issue. If you have never had any experience with this product before, you may be running a product from a competitor such as Symantec. Bit9 is an endpoint protection tool from Carbon Black. It helps protect against malware and endpoint attacks. In short, it is a security tool to protect your computing enterprise.

Go ahead and click the image to link to their site if you wish to learn more about how it works. For me, the nitty-gritty specifics of how it works are a bit outside the scope of this article.

In short, Bit9 keeps a record of every filename and every filepath for every client machine. It stores these in a SQL Server database. This shouldn't be too much of an issue. For one instance serving about 300 client machines, we saw about 1.2 million paths being stored. On another instance, with far fewer clients and far fewer actual paths on those clients (physically checked), we had 114 million paths stored in the database. The difference in size was 20GB vs. ~160GB. When the server is installed with all defaults, you can imagine how this looked for this instance – a very bloated OS volume.

Investigating this issue from a database perspective, I looked to find what was consuming so much space in the database. To do that, I ran my tablespace script and discovered the following:
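(My tablespace script itself is not included in this post; a minimal stand-in that reports reserved space per table, which is enough to make the offending table stand out, might look like the sketch below.)

/* Minimal stand-in for a tablespace report (not the original script):
   reserved space and row counts per table, largest first. */
SELECT
    s.name + N'.' + t.name                                            AS table_name,
    SUM(ps.reserved_page_count) * 8 / 1024.0                          AS reserved_mb,
    SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;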

bit9_tablespacehl

The pathnames table is a significant standout there. Looking at the definition of the table, I see something far more interesting and disturbing all at once.

pathnames_table

I see a table with three columns, two of which are large string fields. Each of these fields has a non-clustered index on it. This may or may not be such a big problem (other than the fact that the non-clustered indexes on this table are much larger than the data) except that each of the string fields is an exact duplicate of the other. That's right. Within this table, the data is duplicated into the second string field, and each field has its own index. Not only does it appear that I have a ton of duplicated data, it appears I have entirely useless indexes (neither had been touched for a read since the server had been up).

I inquired with the folks at Bit9 about this, both from a design perspective and from an archival perspective. Absolute silence on the design (expected). The inquiry about archival (or purging) of non-essential data did fetch a response – albeit a painfully slow one. The basic question was: "Is it safe to purge or archive old or unnecessary data?" The response we received was "run this and we will tell you what to do next."

Great, they sent a script to help determine the state of data within the database. I am not posting their script here. Suffice it to say that the script they sent was not very pretty. They query about 20 tables, union the results from those tables, then perform a NOT IN operation to see how many of the pathnames are invalid. No problem. Executing the script revealed the following:

orphan_result

There is the problem: 95.89% of the rows stored in the pathnames table are orphaned records! This is a bit of a problem. The software does not appear to manage removal of invalid paths. From here, I knew what the course of action needed to be and acted on it. A big piece of the equation was provided by the results of the script. Another piece was provided by the makeup of the script. From these pieces of information, I created a purge script to help manage the orphan problem. I then put that script into an agent job and set it to run on a weekly basis.

If you find yourself in the same boat, here is the script needed to purge the data. If running this, I recommend disabling the two non-clustered indexes and then performing the delete (especially if you sit at 96% orphaned and over 100 million rows). After deleting the mass of orphans, go ahead and rebuild the indexes to stay in compliance with the software contract until Bit9 responds about the index requirement and the schema of the table.
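(The purge script itself is not reproduced in this post, and the Bit9 schema beyond the pathnames table is not documented here, so the following is only a sketch of the approach: disable the two non-clustered indexes, delete orphans in batches, and rebuild. The referencing table, key column, and index names are hypothetical; the real script checks roughly 20 referencing tables.)

/* Sketch only - the actual Bit9 schema and its ~20 referencing tables are
   not shown here; table, column, and index names below are hypothetical. */
ALTER INDEX ix_pathnames_pathname  ON dbo.pathnames DISABLE;
ALTER INDEX ix_pathnames_pathname2 ON dbo.pathnames DISABLE;

DECLARE @batchsize INT = 100000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@batchsize) p
    FROM dbo.pathnames AS p
    WHERE NOT EXISTS ( SELECT 1
                       FROM dbo.file_instances AS fi   -- stand-in for the referencing tables
                       WHERE fi.pathname_id = p.pathname_id );

    IF @@ROWCOUNT < @batchsize
        BREAK;
END;

ALTER INDEX ALL ON dbo.pathnames REBUILD;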

Yep! It is just like that. The worst part of the whole thing is actually the delete. The delete operation touches about 37 indexed views in order to update and delete. Be prepared for a long-running process if you have a situation like the one I encountered.

For kicks, here is what that plan would look like in one of my favorite tools (SQL Sentry Plan Explorer).

bit9_deleteplan

What is your pain scale and story?

pain_scale

T-SQL Tuesday #081: Sharpen Something – Reminder

Comments: No Comments
Published on: August 2, 2016

Sharpen Something

Last week I sent out the invite for the August TSQL Tuesday blog party. In that invite I promised to send out a reminder seven days prior to the event. Well – it is that time.

You are cordially invited to read the invite for TSQL Tuesday 81 and plan your entry for the party.

In the invite, I shared the details for the event including examples of what I am looking for as an entry for the event.

I hope we will be seeing you next Tuesday, August 9th in attendance at this month’s party. I am sure it will prove to be an interesting experience one way or another.

Bonus Example

In the original invite I provided a list of examples of what one could do for this TSQL Tuesday. Today, I am providing one more example in a slightly different format. Recall that the invite requested that participants set out to accomplish something, make a plan and report on that “goal”, the plan, and the outcome.

So, let's say I have discovered that I write too much in the passive voice. Based on that discovery, I would like to overcome the passivity in my writing voice, so my goal would be to learn how to write more assertively (less passively). In order to accomplish that goal, I may need to read up on the topic and learn exactly what it means to write passively. Then I would need to examine articles that I have written. And then I would need to practice writing more assertively. After all of that is done, I may have somebody (or something) analyze a brand new article or two to determine whether I have achieved my desire.

After having executed that plan, I will write about the experience, including what the initial goal and plan were and what worked or didn't work while trying to reach that goal. To summarize, here is an outline of that example:

What I will Accomplish

I will learn how to write more assertively (or just Write more assertively)

How Will I Do That?

Research what it means to write passively

Research what it means to write assertively

Evaluate “assertively” written articles

Take Notes on how to write assertively

Evaluate my articles

Practice writing assertively

Write a new article and have it reviewed to judge whether the voice seems too passive or not

Report on The Successes and Failures

Write whether or not each step succeeded or failed.

Write if a step was unnecessary

Write about the experience and your thoughts about the experience.

Did you achieve or fail overall?

What is T-SQL Tuesday?

TSQL2sDay150x150

T-SQL Tuesday is a monthly blog party hosted by a different blogger each month. This blog party was started by Adam Machanic (blog|twitter). You can take part by posting your own participating post that fits the topic of the month and follows the requirements below. Additionally, if you are interested in hosting a future T-SQL Tuesday, contact Adam Machanic on his blog.

How to Participate

  • Your post must be published between 00:00 GMT Tuesday, August 9th, 2016, and 00:00 GMT Wednesday, August 10th, 2016.
  • Your post must contain the T-SQL Tuesday logo from above and the image should link back to this blog post.
  • Trackbacks should work. But, please do add a link to your post in the comments section below so everyone can see your work.
  • Tweet about your post using the hashtag #TSQL2sDay.

Easily Shred Event Data

It has been nearly a year since I started an extensive series about Extended Events. Prior to starting that series, I had already written a bunch of articles about Extended Events.

Among the batch of articles preceding the 60-day series, one can find an article about shredding extended event data. Then, as a part of the series (the first article in the 60-day series), one can find how to shred the action payload data attached to an extended event session.

You can read the article on shredding XEvent payload data here. Then you can follow that up by reading how to shred the Action data here. And once you are done with those and you really want to learn more about Extended Events, please read the whole series which can be found here.

All of that reading to catch up should keep you busy for a tad bit.

Upgrade

A year is a really long time to go without updating a script – according to some. Today, I have an update for both of the XML shredding scripts you just read about in the articles mentioned moments ago.

Before I dive into the script, let's revisit a sample of the XML from an event session payload. Within an event session, you can see that there are nodes for both the data and the actions (highlighted in green and red). With all of this information found within the same XML for the session, it makes sense to try to parse all of the data at once.

action_xml
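(As a hand-written illustration of what that combined parsing looks like, and not the generated output of the script discussed below, the following shreds one data node and one action node from the same event XML. The session name, file path, and field names are assumptions.)

/* Hand-written illustration of parsing a data node and an action node
   together; session name, file path, and field names are assumptions. */
SELECT
    src.event_xml.value('(event/@name)[1]', 'varchar(128)')                            AS event_name,
    src.event_xml.value('(event/@timestamp)[1]', 'datetimeoffset')                     AS event_time,
    src.event_xml.value('(event/data[@name="statement_type"]/text)[1]', 'varchar(50)') AS statement_type, -- data node
    src.event_xml.value('(event/action[@name="sql_text"]/value)[1]', 'varchar(max)')   AS sql_text        -- action node
FROM (
    SELECT CAST(f.event_data AS XML) AS event_xml
    FROM sys.fn_xe_file_target_read_file(N'C:\Database\XE\AuditSelects*.xel',
                                         NULL, NULL, NULL) AS f
) AS src;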

In addition to parsing all of the XML for the data and the actions at the same time, it also makes sense to generate the parsing statements in a single effort. As you will have noted, my previous scripts were just that – scripts. That implies executing a separate script for the data and another for the actions. Maybe it would make more sense to execute a single script.

It is that notion of a single script that constitutes this upgraded version of the script.

One Script

In the following script, I have tried to accomplish just that – a single script to create the entire XML parser for me, for you, and for anybody wishing to use it. I don’t want to have to remember the subtle nuances of how to parse each of the events each time I need to parse the session data. I want something that is quick, easy, and repeatable.

With all of that said, here is the script that I now use to parse my session data. You should notice that it has been simplified and is more extensive now.

Not only does this script slice and dice for you… Wait, that's not quite right.

This script doesn't just generate the XML shredding strings for the payload and action data. It also produces the rest of the T-SQL statements that should precede and conclude the XML parsing statements. Yes, it uses dynamic SQL. That is to your advantage in this case.

Could there be other enhancements? Absolutely! More will be coming.

Enjoy this script and happy parsing.

Extra Extra – Read All About It!

From the comments, you will see that Brent Ozar (blog | twitter) made an excellent suggestion. To be honest, I considered sharing some examples during the initial write of this article. I had opted against it then, not considering it in the same way that Brent puts it. So, in this addendum I have a quick example using an XEvent Session that I have not yet written about in all of my articles on Extended Events (pseudo spoiler alert).

Let’s use the AuditSelects XEvent Session I have deployed and running (remember a session can be deployed but may be in the “stopped” state). If I want to figure out what data has been captured for this session, I can take the session name and plug it into the script shown in this article.

parsexe_params

Ignore the green T-SQL comments for now (you can read them directly from the script). Highlighted in gold is the variable for the session name. By entering a valid session name for this variable, I can restrict the results to just the metadata for that specific session. In addition, note that I have highlighted, in a pale chartreuse, two variables that will give me unique column names for each of the events and actions within the session. This is important because some events use the same attribute names as other events. If events with the same attribute names are used in the same session, you will get a duplication of data. Maybe you want the duplicate data. Maybe you don't.

The session name variable is used in each of the next three code segments. Each segment in the script will generate SQL statements that will need to be copied into a new query window. I will leave the further breakdown of the script as an exercise for you. What I want to do here is show the results from executing this script.

When I execute the script, I will have four result sets returned to me (in grid mode). It will look something like what is shown in the following image.

parser_results

Breaking this down by the three colors is probably the easiest way to see how things fit together. Highlighted in red, I have the build-out of the pre-processing statements and the build of the "select" statement that will be returning our data to us. Then in blue comes the guts of the query – all of the beautiful XML parsing statements. The segments in blue correlate to the columns in the select list. And at the bottom of the image I have the finishing pieces, which include my predicates, FROM clause, and joins. Each of the highlighted segments will then be copied and pasted into a new query window and look something like the following.

parsed_gluedtogether

And there you have it. Sure, there are still some manual steps, but you no longer need to memorize all of that glorious XML parsing syntax. You only need to copy and paste with this version of the script.

Enjoy your adventures in parsing XML!
