Finding the Right Path

xe_path1I have a fairly large backlog of articles that are in progress or that are planned for my series about Extended Events. Despite the backlog and planned articles, every now and then something else comes up that bumps things around. This article is an example of bumping the schedule around. You can see some of the backlog and the full list of articles in the series by visiting the table of contents – here.

I bring that up for a couple of reasons. First and foremost being that the topic was recently raised as a “need” by some colleagues. The second being that I see the need and how it was lacking in coverage by anything I had already written.

What is this gaping hole in the coverage of Extended Events? To be honest, it is not a very complicated topic or very difficult gap to fill. It’s just something that has been overlooked. The gap boils down to this: how does one consistently find the correct path to the Extended Event Log file (XEL file)?

Filling the Gap

The gap I will be working to fill in this article deals with consistently finding the file path for Extended Event (XE) sessions. This gap rises due a few different things such as the ability to define a target in different manners, being able to move the logs directory, or even the fact that a target may not be added to the session (let alone a file target). These causes can all contribute to a bit of frustration and may pose as different symptoms when trying to get the file path.

One additional complication is tied to the running state of a session. For that complication, I talked briefly about it in my article about better practices. The state of the session could have an impact and could cause frustration when trying to retrieve the file path. All of these things are considerations that must be made when trying to retrieve the file path.

To find the file path, let’s start with some basics. The default path for XEL files is in the log directory of the instance. In addition to this default behavior, each session stores metadata about running sessions as well as deployed sessions that is accessible from DMVs and system catalogs.

Log Path

The first basic to tackle is the log path for the instance. I can query for the log path of the error log for the instance and rely on that as the path of my xel files. Should I choose this method, then I could execute the following query.

Executing that query on my SQL 2014 instance produces the following results:

errorpath

Now the obvious problems with this method come from the fact that relying on this data is relying upon an assumption that you have set all of your XE Sessions to use the default log path. If you have declared your sessions to use a file target and did not specify a path, then the assumption is safe. However, if you are detail oriented, you probably have been somewhat explicit in how you define your event file target. This brings us to the next topic – defining the path.

Defining The Path

While a bit of bird-walk, it is necessary to cover this topic at this juncture. This brief discussion will help to understand some of the other issues with retrieving the path consistently.

When defining the path of the event path, there is a bit of flexibility in how one can define the file to be used. You can either declare the file as just the file name, or you can define the file as the folder path along with the file name. Both methods are completely legitimate. Unfortunately, this flexibility is what causes some of the pain with retrieving the file path consistently.

Let’s take a look at two quick, and acceptable, ways to add an event file to an XE Session. I will be re-using a session from a previous article for these examples.

This is a pretty standard format I use for creating my sessions. The section I want to highlight though is the set of the filename near the end of the script. I will generally define the entire path for my xel files when I create a session – just as I have done in this example. Defining this path helps me to know exactly where I am putting the session for starters. Equally as important is that this path is easier to retrieve from metadata because I have explicitly defined the path.

Take this next example of the same session but with one minor difference.

The minor difference in this example is just in how the filename was defined. It is technically accurate and acceptable to only use the filename instead of the path and filename as I did in the previous example. The problem here comes from the retrieval of the path from metadata.

Getting the Path

In the previous two examples, I showed two methods of defining the filename within a session. In the former example, this means I can more easily find the path. However, if I use the script in the beginning of the article to find my xel file paths, then I have made an inaccurate assumption. This leads to an inconsistency in how the file path is fetched. In the latter example, the assumptions concerning the default log path would be valid and could be applied here.

Since the assumptions do not prove to be consistent or accurate across the board, we need to both evaluate how the data for each would look and we would need to see how to retrieve this path more consistently. First up is looking at how the data for each of these example sessions would be stored in metadata.

In the preceding query, I am looking at the deployed session metadata for two sessions, both of which start with “AuditSelect”. In the sys.server_event_session_fields view, there is an attribute called name that contains the value “filename”. This value will only exist if the session has an event file target defined for the session. Running the query will yield the following results (on my system).

eventfile_meta

In this result set, I can see there is one of each of the two event file definition methods I described in the previous section. In the green highlight you will see that I have the entire filepath. In the red highlight, you will only see the filename. Nothing special has been done to this data in the return as you can verify from the posted script. These are the actual stored values. So this would indeed seem like we have a bit of a problem, right?

Well, this is where we get to be a little bit imaginative and employ a script such as the following:

Wow, that is considerably longer than the first example in the article. That is true! This script accounts for a few of the conditions that cause inconsistencies (but not yet all of them) in retrieving the file path for a deployed event session. The key here is to understand this will fetch the path whether you enter a complete path or not while creating the session. I also through a check in there to confirm that the session has a file target as well as a check to ensure the session is deployed to the server.

Looking at the output, I get results that are somewhat friendly and easy to understand without any of the guesswork.

xelpath_result

What if the session has never been started and did not have the full path declared? Well, that is one of the inconsistencies I am working on still.

There is an alternative to this method as well. For the time being, this method would also be recommended in the event the session being researched happens to be one of the system sessions that is “private”. This next code chunk will show two different methods to parse the file path from running session metadata in the DMVs.

Conclusion

In the article today, I have shown some of the internals to retrieving file paths for Extended Event Sessions. I dove into metadata to pull out the path for the session and discussed some concerns for some of these methods. In the end, you have a few viable options to help retrieve the file path in a more consistent fashion.

T-SQL Tuesday #081: Recap

Comments: 4 Comments
Published on: August 18, 2016

Sharpen Something

sqlskillsharpener_pigIn case you missed it (many did), TSQL Tuesday was a challenging event this month. I invited people to do a put a little more into writing a post than what they may usually do. There were some very good reasons for this. If you are interested, take a look back at the invite and see if you maybe want to give it a go outside the bounds of TSQL Tuesday. You can check out the original post here.

Before I get into the nitty gritty I have a confession. This topic was a reminder for myself as much as it was a challenge to others to help me continue to drive and improve in various areas as I see fit.

There are other dirty little secrets too. Some may become apparent as you read through the recap.

Recap of the Event

One of the tricks to becoming and staying a top tier data talent or professional is a perpetual cycle to learn, adapt, change, and evolve. We must be in a continual cycle of self evaluation and self modification. Let’s call this by something else – we must be agile. There I said the five letter word. Think about it in broad strokes with your career – it is a development process with perpetual evaluation, review and tweaks.

Now think about the invite and see how that fits with what I just said or with the, cough cough, agile flow. You start (albeit very basically) with a need for enhancement, then you plan which pieces of the enhancement you can accomplish, you then do the work (whether successful or not), then after you deliver the work you conclude with a retrospective (what went well and what needs to change). Yes! I do feel rather dirty for sneaking this on everybody like this. That said, when you think about the model and apply it in broad strokes to your career path – it has merit.

Another way of viewing this is to think in terms of the following flowchart to help improve your personal mindset or maybe improve your personal mental power. The process is repetitive and follows a natural course. Once you have acted on some plan, you must review the performance and results and then gauge where your mindset needs to go from there to improve.

mindset_improvement_560

Rob Farley Steve Jones Robert Davis Kenneth Fisher Kennie Nybo Pontoppidan Mala Mahadevan Wayne Sheffield Jason Brimhall

In other Words

Did you just look at the picture or did you explore the picture? If you hover the picture, you will find there are links to this months participants. There were only eight so not a ton of exploration is necessary.

Here are my thoughts on each of the posts submitted this month:

Wayne Sheffield (blog | twitter) – You can find his link in the big arrow that restarts the cycle. I put his link here because he ran into a ton of blockers during his experiment and he is at a spot of practically restarting – again. This is not the first time he has restarted in his quest to learn more about Availability Groups. Wayne fully admits he is deficient in AG and states near the end of the post that he had to humble himself going through this exercise. That is awesome! We could all use a little humility on a more regular basis.

Mala Mahadevan (blog | twitter) – You can find her link in the “Results” circle. The reason for this choice is that Mala discusses her midlife crisis – erm career change. MTSQL2sDay150x150ala held out for quite a while looking for just the right opportunity. When it came, she snatched it up. Along with that career change, she has implemented a plan to become more active in blogging and to learn more and more through various avenues. The increase in blogging and the ability to stick to her guns resulted in a new job/career she seems to be happy with at the moment.

Robert Davis (blog | twitter) – Robert found himself placed in the performance circle thanks to his article involving a third party backup utility that should be heavy on the performance side. Robert needed something interesting to push him to reacquaint himself with this tool. Once he found that project that required just a touch of ingenuity, performance and a way to avoid the GUI, Robert found himself right at home with a great solution for his environment.

Kennie Nybo Pontoppidan (blog | twitter) – Kennie landed in the Actions node mostly because he decided to take the challenge and act on his long time desire to get better at the new temporal features. To do that, he decided to read a book by Snodgrass which seriously sounded like something from Harry Potter to me. Kennie outlines a bunch of information that he learned from the book such as tracking time based data from either a transaction or valid-time perspective.

personal_growth_brain

Kenneth Fisher (blog | twitter) – I placed this one into the behavior node. Maybe it is a bit of a stretch, but it seems to make sense since he discussed some behavioral differences between Azure DB and SQL Server. Things just do not work exactly the same between the two. You will need to understand these differences if you find yourself in a spot where you must work with both.

Steve Jones (blog | twitter) – When looking through the image, you will find that Steve landed solidly in the mindset node. When I read his contribution, I got the full impression that his mind was 100% in the right place. He set out to learn something and try to get better at it. Additionally, he blogged about a topic that is near and dear to me – Extended Events. Have I mentioned before that I have a lot of content about XE? You can read a bunch of it here. Like Wayne, Steve was humble near the end of his article. He notes that he was clumsy as he started working with XE but that he is glad he did it as well. Read his article. He gave me a great idea of another use for XE and I am sure it may sound good to you too!

Rob Farley (blog | twitter) – I planted Rob firmly on the attitude node. It seems clear to me that Rob had loads of attitude throughout his article about Operational Analytics. The attitude I perceived was that of humility and yearning. Rob feels like he has a lot to learn and his attitude is in the right place it seems to keep him going while he tries to learn more in the field of Operational Analytics.

My Contribution link can be found by clicking on any spot in the image that is not already described. I wrote about my experiences with trying to pick up a little on JSON.

That is a wrap of all eight contributions. If you did not contribute this month, I recommend that you still try to do something with the challenge issued with this months TSQL Tuesday.

Edit: Added links to the articles with each persons name in the event this page is being viewed with Firefox. There seems to be an issue with the links in the image map within Firefox.

T-SQL Tuesday #081: Getting Sharper

Comments: No Comments
Published on: August 16, 2016

Sharpen Something

sqlskillsharpener_pigThis month I am the host of the TSQL Tuesday blog party. In the invite, which can be read here, I asked people to decide on something to work on, plan out and then report the success/failure.

Not only am I the host, but I am a member this month. In my invite (and the reminder) I provided a few examples of what I was really looking for from participants this month. It became apparent that the topic may have been over thought. So, for my contribution, I decided to do something extremely simple.

There is so much about SQL Server that it would not be feasible nor should it be expected that one single person should know everything about the product. That said, within SQL Server alone, all of us have something to learn and improve upon within our skill-set. If we extend out to the professional development realm, we have even more we can explore as a skill sharpening experiment for this month.

I am going to keep it strictly within the SQL Server realm this month. I have chosen to develop my skills a little more with the topic of JSON. I should be an expert in JSON, but since it is spelled incorrectly – maybe I have something to learn. That said, I really do love being in the database now – haha.

JSON

katanaLet’s just get this out there right now – I suck at JSON. I suck at XML. The idea of querying a non-normalized document to get the data is not very endearing to me. It is for that reason that I have written utilities or scripts to help generate my XML shredding scripts – as
can be seen here.

Knowing that I have this allergy to features similar to XML, I need to build up some resistance to the allergy through a little learning and a little practice. Based on that, my plan is pretty simple:

  1. Read up on JSON
  2. Find some tutorials on JSON
  3. Practice using the feature
  4. Potentially do something destructive with JSON

With that plan set before me, it is time to sharpen some skills and then slice, dice, and maybe shred some JSON.

Sharpening

Nothing in this entire process was actually too terribly difficult. That is an important notion to understand. My plan was very lacking in detail and really just had broad strokes. This helps me to be adaptable to changing demands and time constraints. I dare say the combination of broad strokes and a very limited scope also allowed me an opportunity for easier success.

Researching JSON was pretty straight forward. This really meant a few google searches. There was a little bit of time spent reading material from other blogs, a little bit from BOL and a little bit from msdn. Nothing extravagant here. I did also have the opportunity to review some slides from a Microsoft presentation on the topic. Again, not terribly difficult or demanding in effort or time requirement. This research covers both steps one and two in the plan.

Now comes the more difficult task. It was time to put some of what had been seen and read to practice. A little experimentation was necessary. I have two easy enough looking examples that I was able to construct to start experimenting with in my learning endeavors.

Here is the first example. This is a bit more basic in construct. (Updated to use an image since the json was messing with the rss feed and causing malformed xml.)

json_xmpl

And some basic results:

basic_json

Pretty slick. Better yet is that this is many times easier than XML.

How about something a little different like the following:

Admittedly, this one is a bit more of a hack. In my defense, I am still learning how to work with this type of stuff. At any rate, I had an array of values for one  of the attributes. The kludge I used reads up to 3 values from that array and returns those values into individual attributes. I am still learning in this area so I can live with this for now.

array_json

The last part of the plan involved doing something destructive. Why? Well just for the fun of it. I was unable to get to this stage but it is still in the plans.

TSQL2sDay150x150Report on The Successes and Failures

 

I have written about some of the successes and failures along the way thus far. Overall, I would rate this a successful endeavor. The big reason for it being a success is because I do feel I learned more about json within SQL Server than I had prior to the experiment.

Taking a bite sized chunk of learning and acting on it sure makes it a lot easier to learn a new concept or to learn more about such a vast topic such as SQL Server.

*Note: This is a late publish because the post didn’t auto post. This is a tad late but I discovered it as I was prepping the roundup.

Audit All Select Statements

audit_selectLegislation and regulation sometimes dictates that certain activities must be tracked within a database. On occasion, it will be required that queries be audited and tracked to reach compliance with the legislation or regulation. To achieve this compliance, this article will demonstrate how to use Extended Events to audit statements being executed within the database.

Over the course of a few recent articles, I have included little hints here and there alluding to this article. In fact, now would be a good time to review one of these articles in preparation for what I will be sharing today. Take a moment to refresh your memory on this article about finding the right event – here.

Backstory

I have to be a little honest here. Prior to somebody asking how they could possibly achieve a statement audit via extended events, I had not considered it as a tool for the job. I would have relied on Audit (which is Extended Event related), or some home grown set of triggers. In this particular request, Audit was not fulfilling the want and custom triggers was not an option. Another option might have included the purchase of third party software but there are times when budget does not allow for nice expensive shiny software.

So, with a little prodding, I hopped into the metadata and poked around a bit to see what I could come up with to achieve this low-budget audit solution.

Low-Budget Audit

Using the handy scripts I have shown previously (and that I just linked to), I was able to explore the Extended Events metadata and find just what may work properly. The requirements in this case were that it needed to be done in XEvents and that it must capture SELECT statements. To find the events that seemed plausible for this task, I plugged the term “SELECT” into my queries as follows:

From this query, there was really only one event that made any sense for my task – “degree_of_parallelism”. There was another event returned in the result set, but it made no sense to me as a possible candidate event for auditing select statements (it was ucs_transmitter_reclassify). From the results, I then took the keyword associated to degree_of_parallelism to see what else might be pertinent. I plugged that keyword “execution” into the following query:

The results of the previous query will be quite a bit more substantial. Some make sense to include in the audit and some do not. I opted to not included any of the events to keep things as clean and simple as possible.

More Requirements

I now have the base events covered that I want to use for my audit. That said, my base requirements are just not extensive enough. When I audit something, I really want to know who did it, when it was done, where it originated and what was it that they did. The degree_of_parallelism event will capture all of the select statements but it does not meet these additional requirements.

In order to meet the additional requirements, I will attach a handful of actions to the degree_of_parallelism event. The addition of the actions will provide all of the data I want and need. Combine the event and actions together, I came up with the following session definition.

This is a very simple session to pull together. I have added a few things in on my predicate to help limit the scope of the session to the AdventureWorks2014 database and to make sure I am not trapping events related to the code completion tools shown. Notice here also that there is a potential chance to optimize this session. Can you find it? Hint: It may be in the predicate. Second hint: re-read this article about predicates.

Now the trick to what makes this work to audit only the select statements is right there in the predicate. I have instructed the session to ignore any statement_type that is not a value of 1. As it turns out, statement_type of 1 is a select statement. To see how these values map, here is a quick query and the correlated results.

dop_statementtype

Based on these values, should you want to audit a different query type, just change the predicate to use the map_key value that corresponds to the desired statement type.

Testing

Testing this session is rather simple. Here is a quick and dirty script to help test it.

Conclusion

Building low cost solutions is a common requirement for the data professional. When given the opportunity, try things out to see what you can build to provide the solution. In this case, I have demonstrated how Extended Events, however unlikely a candidate, can provide a working solution to help audit any select statements occurring within your database.

This has been another article in the 60 Days of XE series. If you have missed any of the articles, or just want a refresher, check out the TOC.

BIT9 PathNames

We live in a day and age when security (data, network, server, etc) is seemingly at the forefront of the daily news. In our quest to improve security we seemingly always find more and more products that are supposed to help with that security. Sometimes, those products require a data repository. As it happens, it is not uncommon for that repository to be in a database. Today, I am looking into a specific issue with the Bit9 product.

Why?

There are some very good reasons as a matter of fact. One really big reason is that I could find no decent information about this issue. The bigger reason is the seemingly indifferent level of response and the delays I saw in response from the Bit9 support channels.

In this specific case, the first response from their support channels took more than two days. Subsequent responses were more than a week later. For the client, this actually caused delays in a project they were working on. Was the issue significant? It was not an outage causing issue, but it was one that did seem to continue to grow and cause concerns with disk space.

The Issue

bit9It may be appropriate to discuss what Bit9 does prior to breaking into the details about the issue. If you have never had any experience with this product before, you may be running a product from a competitor such as Symantec. Bit9 is a an endpoint protection tool from Carbon Black. This helps to prevent against malware and endpoint attacks. In short it is a security based tool to protect your computing enterprise.

Go ahead and click the image to link to their site if you wish to learn more about how it works. For me, the nitty gritty specifics on how it works is a bit outside the scope of this article.

In short, Bit9 records a record of every filename and every filepath for every client machine. It stores these in a database in SQL Server. This shouldn’t be too much of an issue. For one instance serving about 300 client machines we saw about 1.2 million paths being stored. One another instance with far fewer clients and far fewer actual paths on those clients (physically checked) we had 114 million paths stored in the database. The difference in size was 20GB vs ~160GB. When the server is installed with all defaults, you can imagine how this looked for this instance – a very bloated OS volume.

Investigating this issue from a database perspective, I looked to find what was consuming so much space in the database. To do that, I ran my tablespace script and discovered the following:

bit9_tablespacehl

A significant standout there with the pathnames table. Looking at the definition of the table I see something far more interesting and disturbing all at once.

pathnames_table

I see a table with three columns and two large string fields. Each of these fields has a non-clustered index on it. This may or may not be such a big problem (other than the fact that the size of the non-clustered indexes on this table are much larger than the data) except that each of the string fields is an exact duplicate of the other. That’s right. Within this table, the data is duplicated into this second string field and each field has its own index. Not only does it appear that I have a ton of duplicated data, it appears I have entirely useless indexes (neither had been touched for a read since the server had been up)

I inquired about this to the folks at Bit9, both from a design perspective and from an archival perspective. Absolute silence on the design (expected). The inquiry about archival (or purge) of non-essential data did fetch a response – albeit a painfully slow response. The basic question is: “Is it safe to purge or archive old or unnecessary data.” The response we received was “run this and we will tell you what to do next.”

Great, they sent a script to help determine the state of data within the database. I am not posting their script here. Suffice it to say that the script they sent was not very pretty. They query about 20 tables, union the results from those tables, then perform a not in operation to see how many of the pathnames are invalid. No problem. Executing the script did reveal the following:

orphan_result

There seems to be the problem. 95.89% of the rows being stored in the pathnames table are orphaned records! This is a bit of a problem. The software does not appear to manage removal of invalid paths. From here, I knew what the course of action needed to be and acted on it. A big piece of the equation was provided from the results of the script. Another piece was provided in the makeup of the script. From these pieces of information, I created a purge script to help manage the orphan problem. I then put that script into an agent job and set it to run on a weekly basis.

If you find yourself in the same boat, here is the script needed to purge the data. If running this, I recommend disabling the two non-clustered indexes and then performing the delete (especially if you sit at 96% orphaned and over 100 million rows). After deleting the mass amount of orphans, go ahead and rebuild the indexes to stay in compliance with the software contract until Bit9 responds about the index requirement and the schema of the table.

Yep! It is just like that. The worst part of the whole thing is actually the delete. The delete operation will touch about 37 indexed views to update and delete. Be prepared for a long running process if you have a situation like I encountered.

For kicks, here is what that plan would look like in one of my favorite tools (SQL Sentry Plan Explorer).

bit9_deleteplan

What is your pain scale and story?

pain_scale

«page 1 of 106








Calendar
August 2016
M T W T F S S
« Jul    
1234567
891011121314
15161718192021
22232425262728
293031  
Content
SQLHelp

SQLHelp


Welcome , today is Sunday, August 28, 2016