Hiding (mostly) in plain sight: Dynamic Data Masking

Masks on!

One of the more recent additions to SQL Server security features is the Dynamic Data Masking (DDM), included with the 2016 version. Like the Transparent Data Encryption I blogged about recently, DDM is a feature that is relatively easy to implement, and doesn’t require a lot of changes to the application. And just like pretty much everything is easy in a real life, it too has some limitations.

Dynamic Data Masking

DDM works by obfuscating parts of the query results based on the function used and the permissions the user have. The best way to think of the feature is, that rather than encrypting the data or changing it inside the database, it’s just limiting your view on it. Because of this, it’s not really the feature that will provide lot of extra security by itself and should be used together with other security implementations.

DDM will also work out-of-the-box with many applications due to fact that the database contents are not changed, just the query results.

There are however quite a few cases where it can simplify processes a great deal with minimum effort. Let’s say f.ex. that you have a situation where you need to allow an IT staff or a Developer to view tables that has Personally Identifiable Information (PII) data that they shouldn’t see in some of the columns. Then DDM might be the solution for that.

Let’s see how to implement something like that!

Implementing Dynamic Data Masking

Implementing DDM is quite easy even into existing tables, let’s use StackOverflow database and it’s User table as an example. There are few columns that look like good candidates for this purpose, DisplayName and Location

Now I see it!

Adding a data mask is pretty much just executing two commands. First we add masking with default function to DisplayName

ALTER TABLE dbo.Users
ALTER COLUMN DisplayName ADD MASKED WITH (FUNCTION = 'default()');

After which we then add one more mask to Location using the partial function. I’ll have bit more information about the various functions later on.

ALTER TABLE dbo.Users
ALTER COLUMN Location ADD MASKED WITH (FUNCTION = 'partial(5, "YYYY", 0)');

And there you have it! Almost anyway, you’ll still need to do some work with the permissions as the users with UNMASK are the only ones who can view the data as it is. So let’s say that we have a developer we want to grant access to view some of the data, but mask the important bits.

CREATE USER [Developer] WITHOUT LOGIN
GRANT SELECT ON dbo.Users TO Developer;

To simplify this a bit, I am just creating a user but not a login. It’ll be enough for the demo purposes. In real life, you’d be creating the login and then adding it as a user to the database. Let’s see what happens when we run a query using the Developer user.

EXECUTE AS USER = 'Developer';
SELECT TOP 20
       U.Id
     , U.DisplayName
     , U.LastAccessDate
     , U.Location
FROM dbo.Users AS U
WHERE Id > 0;
REVERT;

And there it is, masking the two columns just the way we set it up. Default function we used for DisplayName replaces the text with 4 X’s and partial masking we added for Location keeps first 5 letters, then replacing everything after that with 4 Y’s.

And now I don’t! At least not all of it…

To allow users to see the actual values, we need to give them the UNMASK permissions. Let’s create another database user for this purpose.

CREATE USER [SuperDeveloper] WITHOUT LOGIN;
GRANT SELECT ON dbo.Users TO [SuperDeveloper];
GRANT UNMASK TO [SuperDeveloper];

Less surprisingly, when we run the following query

EXECUTE AS USER = 'SuperDeveloper';
SELECT TOP 20
       U.Id
     , U.DisplayName
     , U.LastAccessDate
     , U.Location
FROM dbo.Users AS U
WHERE Id > 0;
REVERT;

We once again see everything without the masking applied. And that’s pretty much it. Next let’s look at briefly at the different functions we can use for the DDM.

Dynamic Data Masking Functions

We’ve already looked at two of the masking functions in DDM, the default and the partial. There are also two other functions called random and email. The email function is quite easy to guess on what it is used for, emails! It exposes first letter of an email address and replaces rest with XXX@XXX.COM, so if your email would be john.doe@nowhere.com, it’d appear jXXX@XXXX.com after being masked.

Random function is meant for masking numeric values and it does this by, as you probably already guessed it, using random numbers. All these different masking functions are actually quite nicely documented by Microsoft, so for the curious readers: You can find the full details of the masking functions behind this link.

Limitations of Dynamic Data Masking

As already mentioned earlier, DDM has very good things going for it. It’s relatively straightforward and easy to implement and in most cases it’s also transparent to the applications. However, things that sound this great usually have some drawbacks, and this same applies to DDM.

The weak part of the DDM is, that it’s only changing how the data is displayed. This means that you can just brute force your way around it. Let’s say that we want to find out users who are located in New York.

EXECUTE AS USER = 'Developer';
SELECT TOP 20
       U.Id
     , U.DisplayName
     , U.LastAccessDate
     , U.Location
FROM dbo.Users AS U
WHERE Location = 'New York'
REVERT;
I wonder where these users are located in…

DDM isn’t actually blocking you from finding these things out, it’s just making it little more inconvenient. While this was quite a simple example, you could also find out things like salaries hidden behind random masking by using BETWEEN in the WHERE clause.

So while definitely a useful feature in many cases, don’t rely on this feature alone to keep your data safe and protected.

Author: Mika Sutinen

Hi, My name is Mika Sutinen and I'm a Cloud DBA Team Lead at the company called Nordcloud. I've been working in IT-industry for two decades, and I've spend most of my career working with healthcare information systems. I've worked with SQL Server for most of my career, starting with version 6.5 a long, long time ago. My other interests are high availability, everything related to performance (testing, monitoring, etc), Windows operating systems and I'm currently learning more about Azure.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.