Unicode, Emojis and Databases Oh My!

Categories: News, Professional, Scripts, SSC
Comments: No Comments
Published on: January 5, 2020

Over the past several (or pervious if you have been following along) articles, I have explored the use of special characters in certain database scenarios. Using these special characters brings certain fun and flare to your hum drum naming schemes. In addition to the fun, there comes a little bit of extra peculiarities to ensure everything works as expected.

While figuring out some examples, I found myself searching for good reliable sources constantly. I also found myself trying to find various emojis that would work well to represent one idea or another. Sometimes, the effort paid off, and sometimes it seemed frivolous.

If only there was an easy comprehensive source to search through all available unicode characters for SQL Server. Then a thought occurred to me. Why not create my own database with these characters?

Laying the Groundwork

After determining that a database would work best locally to make it easier to research these characters, I quickly determined that I would need to figure out how to write a powershell script so I could scrape a list of characters from a website. That should be easy, right?

With a sample such as that, I just need to figure out how to pull the image for each character and the unicode value of each. Next task to figure out is where can i find some decent powershell examples on the web that would do a similar task.

In addition to figuring out how to do the powershell end of this, there is the database side of things. What should the table design be for this kind of data? Which values and attributes should be stored? Am I overthinking this? Is this too much work?

Well, this is indeed far too much work. Especially given this awesome gem I found while trying to find the exact reference link I was looking for in one of the pervious articles. You see, as it turns out Solomon Rutzky had already done all (and I really do mean all) of the heavy lifting for this when he wrote a function that will do exactly what I was looking to do – here. The function that Solomon created will return all 188,657 code points that are usable in SQL Server.

Now, instead of building out a script to import the data from somewhere else, I just need to employ the use of this fabulous script and move on with all the oodles of time I was spared.

Put a bow on it

Playing around with emojis in a database is a fun endeavor. Not only is it fun to play with for personal growth, but it does have some business advantages. We live in an era were these emojis are popping up everywhere and are even being stored long term in many databases. I view it as a great opportunity to improve your skill set and better your career for the future.

Interested in learning about some deep technical information instead? Check these out!

Want to learn more about your indexes? Try this index maintenance article or this index size article.

This is the twelfth article in the 2019 “12 Days of Christmas” series. For the full list of articles, please visit this page.

*Previous has been intentionally mis-spelled throughout this article as “pervious” as an ode to a fat finger mistake in a recent article.

Creative XE Sessions – Funky Data

In the previous article, I explored the opportunity of using special characters (or emojis) within Extended Event Sessions. Creating a session with these characters in the name is not terribly complicated. However, it is a bit more of a pain on the data review side (as mentioned in the previous article).

Just because it gets to be a little more complicated is seldom a deterrent for somebody such as myself. In fact, many times it is more of a challenge. These challenges are fun. With this particular challenge, and as I promised, we are going to see how we can solve the problems posed by having emojis within the object names (which causes pseudo funky data to be trapped in the session data).

Get the Funk Out!

As was demonstrated in the last article, when we have special characters in the database names and that data is captured in an XE Session, we end up with some smelly funky cheese type data that is fairly unpleasant to our desires. We have to do something quickly with that funky data in order to make it useful.

Unlike funky cheese that may have turned, we are not going to discard this data as being unwanted or un-useful. We still have a use for it. We just need to transform it a little bit before it becomes a little more palatable.

Let’s go back to a query we used to parse the session data from that last article. We need to modify this query to make it more capable of handling this new data experience.

Knowing that we have certain patterns that need to be recognized and altered, we can take advantage of the replace function. The first two patterns to find are: “&#” and then “;“. The next step is a little less known. What do we replace those known patterns with? Well, as it turns out, and as is shown in the above query results window, we know that we are looking for two surrogate code points (High and Low) that represent the character in question. When we do a little TSQL trickery and use NCHAR with those surrogate values, we can render the character to display in the desired format. Let’s see that in query form (it will help that explanation quite a bit).

The results of this query will look like the following.

Easy peasy right? Well, sort of easy. We can easily render that funky data we are seeing in the session details to something more human friendly. However, in order to display that data in human friendly format, we have to execute it in a sort. This means we have to do something a little more with the query – we have to convert some part of the query to a dynamic SQL statement. Here is how we can do that!

In this new version of the query, I have a few things going on. First, I am pattern matching and doing a replacement for those aforementioned terms. Next, I am dumping the results into a temp table and removing some extra data (unnecessary stuff due to the pattern matching and replacement). From the temp table, I then build a dynamic query to pull it all together and use that NCHAR trick to put the two surrogate values together to visually represent the character we expect. In the end, I have results that look like this.

Now, from a TSQL query, I have a visual representation of the Database Name that matches what I should be seeing within the SSMS object tree. This makes more logical sense to the person reviewing the data and is far easier to correlate to the correct database.

What about the histogram target that was demonstrated in that other article? Let’s look at the changes for that one as well.

The results are now similar to those we fetched for the event_file target (similar in display format). Again, this makes it easier to digest for us humans.

A little more effort on the preparation end with the TSQL queries, makes for an easier time going forward on the data review end of these XEvent Sessions.

Put a bow on it

Playing around with emojis in a database is a fun endeavor. Not only is it fun to play with for personal growth, but it does have some business advantages. We live in an era were these emojis are popping up everywhere and are even being stored long term in many databases. I view it as a great opportunity to improve your skill set and better your career for the future.

Interested in learning about some deep technical information instead? Check these out!

Want to learn more about your indexes? Try this index maintenance article or this index size article.

This is the eleventh article in the 2019 “12 Days of Christmas” series. For the full list of articles, please visit this page.

Creative Extended Event Sessions

In the previous article, I showed how to take advantage of collations to add more unique and interesting names to databases. In that article, I alluded to the better collation to use but failed to be very explicit about it.

As Solomon Rutzky pleads in many articles, the best default collation to use is not any collation that starts with SQL, but rather the collation one should use is Latin1_General_100_CI_AS_SC. I agree. Let’s not use the old default collations anymore and start using more modern collations – even if Microsoft does not change the default, you should change your defaults in your builds to represent a better more modern collation!

Fun with Sessions

Let’s suppose I strongly suspect I am having an issue with the ghost cleanup process for a specific database called ‘👻’. I want to prove whether the suspicions are accurate or not so I resort to an old tried and true XE session to validate. Here is that session.

Now, let’s try to validate my suspicions.

The best part is yet to come. With the event session running, I need to execute the preceding query. The query may take several minutes to complete – don’t worry yet. After the query completes, it is time to check the event session data to try and confirm the suspicions.

This is where things start to get a little interesting! If I try to open the session data to view it from the GUI, I will see something like the following.

Well, that is not very useful! Let’s try it from a TSQL query instead!

Well, that seems to be a little better. At least I get some data back at this point. Sadly, the database name is poorly displayed. That said, the database name that is displayed is accurate – just not very useful. As it turns out, XE is lagging a bit behind in the ability to display the characters that require certain code points (seemingly any that require multiple code points or above a certain range). I have not yet tested the full limitation, just understand that there is a limitation.

What if I tried a different target such as an event_file target?

The event session will start successfully. In addition, I can confirm that the event file is created on disk.

Then I will rerun the experiment to test the ghost_cleanup process against the 💩 database. If I query the event session with TSQL, I will see something like this.

If I look at the event session from within the GUI, I will see something slightly more usable.

Peeking at the file data (instead of displaying it in tabular format) from tsql commands, I will see something like this.

Stay tuned to see how we can resolve this issue where the 💩 and 👻 are not being translated properly in these TSQL queries for these sessions.

Put a bow on it

A picture is worth a thousand words, right? Sometimes, a picture for an event session just may be able to say it better than 50-60 letters used to try and describe the session. Maybe you just want to try it out for fun. Either way, the use of these characters in an event session can be a little difficult when looking to review the data.

Interested in learning about some deep technical information instead? Check these out!

Want to learn more about your indexes? Try this index maintenance article or this index size article.

This is the tenth article in the 2019 “12 Days of Christmas” series. For the full list of articles, please visit this page.

Creative Database Naming

Categories: News, Professional, Scripts, SSC
Comments: 1 Comment
Published on: January 2, 2020

Every now and again, we as DBAs need to get away from the daily grind of the hum drum tasks. We need to find a way to have a little fun (the kind that would be non-harmful and lacking in mal-intent).

Sometimes, this fun can be had through learning opportunities. Sometimes, we can have a little fun through diving deep into SQL Server in some way. At least I know that proves to be enlightening and enjoyable for myself. Sometimes, it is just fun for the sake of a good laugh. What if we could find an opportunity to have a little fun that incorporates learning and a laugh or two? BINGO!

Enter the opportunity to play with emojis, collations, and database (or object) names.

Fun with Characters

Let’s figure we have a requirement to create a database with sensitive data. Due to the sensitivity of the data, it is classified confidential (for your eyes only, don’t talk about it and plug your ears if somebody starts talking about it). This is so sensitive that an apt name for the database could be anything like 🙈 or 🙉 or 🙊. Being smart, you know there are two more databases coming down the line so you only want to pick one of those for the name and not all three (though all three could make sense for a single database name).

Being on top of your game, you prep for all three databases in advance. Here is the script to create the three new databases.

In executing the script, you receive the following results:

Msg 1801, Level 16, State 3, Line 15
Database ‘🙈’ already exists. Choose a different database name.
Msg 1801, Level 16, State 3, Line 17
Database ‘🙉’ already exists. Choose a different database name.
Msg 1801, Level 16, State 3, Line 19
Database ‘🙊’ already exists. Choose a different database name.

You look up and down through your SSMS window and even query sys.databases to ensure those databases are factually not present. You are able to confirm that none or currently present on the server yet the script continues to fail. How could this be? Clearly, the images are each very different.

From the database review, you know there is only one other database with an emoji style name – 👺. What could it be? Well, as it turns out, it is the collation that was chosen for the SQL Server install. The chosen collation doesn’t support a large enough range  of code points so most emojis end up getting mapped to the same thing when stored in SQL Server (topic for much larger article and covered very completely by Solomon Rutzky on his blog –  – here). The short of the issue is the use of supplementary characters and splitting the code point into 2 surrogate codes. For most emojis, this surrogate system results in the “high” surrogate key being the same. Let’s take a look at it.

Notice how each of the monkeys plus the devil mask all translate to the same code point when using the SQL_Latin1_General_CP1_CI_AS (default collation)? This is the root of the problem. If we change to a different server with a better collation (e.g. Latin1_General_100_CI_AS_SC) then we will see vastly different results.

First, let’s validate that the collation on this other instance truly is displaying the correct code points (and it is) instead of just the high surrogate key.

Then let’s go ahead and create each of those three high security databases (seenoevil, speaknoevil, and hearnoevil). Et voila! We now have each of those new databases created and they are indeed unique as we desired (as you can see in the preceding image).

If you want to learn all of the hard core nitty gritty behind this, I recommend reading the blog by Solomon Rutzky – here. Solomon knows his 💩 about the collations and can answer or figure out how to answer just about anything you could throw at him about the topic. His blog is a WEALTH of in-depth knowledge with all sorts of tests and proofs. Check it out!

Put a bow on it

Working as a DBA and creating databases doesn’t have to be the same old hum drum day in and day out. This article showed how to have a little fun while creating databases and database objects. At the same time, I showed how the collation of your server can impact some of your naming schemes.

Interested in learning about some deep technical information instead? Check these out!

Want to learn more about your indexes? Try this index maintenance article or this index size article.

This is the ninth article in the 2019 “12 Days of Christmas” series. For the full list of articles, please visit this page.

 

*Post Mortem from Solomon who is the collation genius

And, that brings us to the next fun fact: being able to use more than 1 emoji for database names has nothing to do with whether or not SQL Server sees a surrogate pair as being a single Supplementary Character or merely two surrogate code points. Using SCA collations only affects the behavior of the built-in functions, even though the official documentation says that they help in sorting and comparison (the doc is wrong and I haven’t had time to submit a correction). Being able to name more than 1 database one or more emoji characters is entirely based on how the characters compare to each other, and that is a function of sort weights, which are managed separately. Sort weights do map to code points, but they can change in value depending on the locale/culture being used, as well as the sensitivities selected (i.e. case, accent, etc). The problem you run into with at least most of the version 80 collations (i.e. all Windows collations without a version number in their names, and all SQL Server collations) is simply that they never defined any sort weights for either supplementary code points, or even the surrogate code points used to create surrogate pairs. And with no sort weights defined, their sort value is always 0, and since they are all “0”, they all equate to not only each other, but to anything else with a sort weight of “0”, even an empty string.
Starting with the version 90 collations (even without the “_SC” flag in the name), sort weights were added to the surrogate code points (not to supplementary characters) so that they could at least be distinguished from each other in sorting and comparison operations. And this is why:
  1. you were not able to create the 3 monkey DBs using SQL_Latin1_General_CP1_CI_AS

  2. you were able to create the 3 monkey DBs using Latin1_General_100_CI_AS_SC

  3. you can create those 3 monkey DBs using SQL_Latin1_General_CP850_BIN2, even with that being a SQL Server collation, because binary collations don’t use sort weights, but instead just go by each byte or code unit (depending on BIN or BIN2, respectively)

Cannot Open Backup Device

Your success as a DBA is directly relational to your ability to ensure proper Database backups are occurring. There is a whole slew of considerations that go into effect when discussing what a proper Database backup actually is. To drill it down to its most basic definition, a proper database backup is one that fulfills business requirements for recovery, retention, and has been routinely tested to ensure the backup can be restored.

Remember, a backup that cannot be restored is factually not a backup. Why? Well, going back to the definition of a backup, it is a fail-safe mechanism to be implemented in the event of a failure. If it cannot be implemented (in this case restored), then it provides no value because it cannot be used.

What about when you run into a problem even getting to the point of getting a successful backup operation? For instance, this error can be a bit troubling and may just cause you to lose plenty of time troubleshooting it.

Msg 3201, Level 16, State 1, Line 3
Cannot open backup device ‘\\SomeServer\C$\MSSQL14\MSSQL\DATA\Full_Backup2.bak’. Operating system error 1326(The user name or password is incorrect.).
Msg 3013, Level 16, State 1, Line 3
BACKUP DATABASE is terminating abnormally.

Sure, the error seems to report plenty of adequate information, but you may be surprised at the pain this particular error can cause.

Backup Device Permissions

If you search long enough you will come across numerous articles or forum posts that all pretty much have the same direction. They all point to the problem being a permissions issue. There are a few concerns with the permissions stigma however. In a recent issue for a client, we could verify that permissions were indeed not at fault. To ensure the permissions were not at fault, we granted local admin in the OS as well as sysadmin within SQL Server for the service account. In addition, we also tried a proxy account with full blown access in Windows and in SQL.

To take it a step further, we also ensured that the special permissions for service accounts were also in effect. Those permissions are:

  1. Permission to bypass traverse checking (SeChangeNotifyPrivilege)
  2. Permission to replace a process-level token (SeAssignPrimaryTokenPrivilege)
  3. Permission to adjust memory quotas for a process (SeIncreaseQuotaPrivilege)
  4. Permission to access this computer from the network (SeNetworkLogonRight)

None of this seemed to have any effect when trying to perform a database backup in SQLServer. In our case, we also ensured that the service account had full blown access to the folders as well.

Following the traditional troubleshooting measures, I created a virtual backup device as well only to get the same error. I tried to use the UNC from a remote server and it worked perfectly fine for all accounts in question. What could be going on? Then, I figured why not try the UNC from the server itself. Finally, a clue! Suddenly I was getting prompted for my credentials but my credentials would not work no matter what. Just for giggles, I verified my account was not locked out, and indeed my account was working just fine.

As it turns out, sometimes the problem is not really a permissions issue but it is a veiled security issue. You see, if I use the actual server name instead of the CNAME in all cases above, suddenly everything worked. The issue turns out to be more along the lines of a Windows issue and is documented here (still valid on Windows 2016).

Here is the basic SQL backup test script that led us down the path to the CNAME being the issue.

Put a bow on it

When running into error 1326, it makes plenty of sense to try to create a backup dump device (only as a testing exercise) as well as test the connectivity to the UNC path from the local server instead of a remote server. In my case, the problem was the local server was blocking access to a CName version of the server.

Interested in learning about some deep technical information instead? Check these out!

Want to learn more about your indexes? Try this index maintenance article or this index size article.

This is the eighth article in the 2019 “12 Days of Christmas” series. For the full list of articles, please visit this page.

«page 2 of 137»

Calendar
February 2020
M T W T F S S
« Jan    
 12
3456789
10111213141516
17181920212223
242526272829  

Welcome , today is Saturday, February 22, 2020