I was recently involved in a query and index tuning engagement where we used synthetic, rather than production, data to validate the results of our tuning work. We faced some issues with the generated data that had quite a severe impact on our testing, and that prompted me to write this blog post. Let's start by first defining what synthetic data is. In my view, synthetic data is data that resembles actual production data but is artificial, i.e. generated. I have seen similar (and also more detailed) definitions elsewhere, and I think it is a good one.
I would also like to point out that there are plenty of good reasons for using synthetic data in testing, as production data is often strictly regulated and not easily available for testing purposes. However, you need to be certain that the synthetic data you are using is similar to what you have in production.
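To illustrate the point, here is a minimal sketch of one way to compare the value distribution of a single column between the production data and its synthetic counterpart. The schema, table, and column names here are made up for the example.

```sql
-- Compare the value distribution of a column between production data
-- and the generated synthetic copy (table and column names are hypothetical).
SELECT
    COALESCE(p.OrderStatus, s.OrderStatus) AS OrderStatus,
    p.ProductionRows,
    s.SyntheticRows
FROM (
    SELECT OrderStatus, COUNT(*) AS ProductionRows
    FROM prod.SalesOrder
    GROUP BY OrderStatus
) AS p
FULL OUTER JOIN (
    SELECT OrderStatus, COUNT(*) AS SyntheticRows
    FROM test.SalesOrder
    GROUP BY OrderStatus
) AS s
    ON s.OrderStatus = p.OrderStatus
ORDER BY 1;
```

If the row counts per value differ wildly between the two result sets, the optimizer will likely make very different choices against the synthetic data than it would in production.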
PASS Summit 2017 is now well behind us, and there has been plenty of time to reflect on this year's conference. First of all, I have to say that Seattle as a venue is a good choice, even though for some of us it is a long way to travel. For me, it is about 20 hours from SEA-TAC to my home with the flights and driving, doable but not necessarily pleasant. As for the conference itself, I feel that it keeps getting better every year. It has definitely changed quite a bit since my first Summit back in 2013. But so has the world where data platform professionals live, and the products we work with.
Here are some highlights from the PASS Summit 2017!
As we all know, there are many features in SQL Server that have been deprecated over time by Microsoft for one reason or another. In fact, there is a long list of features that are deprecated in the latest SQL Server 2017 release.
It is far less common for any of these features to make a comeback; however, that can apparently happen, as I witnessed just last week.
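By the way, if you want to see which deprecated features your own workloads are still touching, SQL Server keeps usage counters for them. A quick check, as a rough sketch, could look something like this:

```sql
-- List deprecated features that have been used since the instance last started,
-- based on the "Deprecated Features" performance counters.
SELECT
    instance_name AS deprecated_feature,
    cntr_value    AS usage_count
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Deprecated Features%'
  AND cntr_value > 0
ORDER BY cntr_value DESC;
```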
I was recently looking at some Execution Plans with a co-worker, and we ended up discussing the different types of joins in SQL Server and what implications they might have when it comes to query performance. While many of us are familiar with writing joins, as we usually don't query just a single table, there are quite a few things about the physical joins that may not be quite obvious. In this post our focus will be on the physical joins, but we will also very briefly look at the different types of logical joins.
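To make the distinction concrete, here is a minimal sketch (the table and column names are made up) of the same logical INNER JOIN forced to use each of the three physical join algorithms via query hints, so you can compare the resulting operators in the actual execution plan:

```sql
-- The same logical INNER JOIN executed with each physical join algorithm
-- (table and column names are hypothetical).
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
OPTION (LOOP JOIN);   -- Nested Loops

SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
OPTION (MERGE JOIN);  -- Merge Join

SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
OPTION (HASH JOIN);   -- Hash Match
```

Forcing a physical join like this is mostly useful for learning and troubleshooting; in normal circumstances the optimizer should be left to pick the algorithm itself.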
SQL Server has had the Data Compression feature for a while, ever since version 2008, so it is hardly a new thing. However, as it was an Enterprise Edition only feature until SQL Server 2016 Service Pack 1, it is not something you see employed very often. Technically speaking, you could also compress data before 2008 by using NTFS file-level compression on read-only data. However, with the implementation of SQL Server Data Compression you can now do it inside the database at the page or row level.
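As a minimal sketch of how this is used (the table and index names are hypothetical), you can first estimate the potential savings and then rebuild the index with compression enabled:

```sql
-- Estimate how much space page compression would save for a table
-- (schema, table and index names are hypothetical).
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'SalesOrderDetail',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Apply page compression by rebuilding the clustered index.
ALTER INDEX PK_SalesOrderDetail ON dbo.SalesOrderDetail
REBUILD WITH (DATA_COMPRESSION = PAGE);
```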
For the last blog post of 2016, I chose something that has been bothering me a bit as of late. Over the past couple of months I have come across a number of cases where, after migrating databases to a new server, the end users report increasingly bad performance. What has been common to these situations is that the new server has more and faster CPU cores, more memory and faster disks, and should offer better performance, not worse.
Now, the first thing I usually do when troubleshooting performance issues is to check what kind of hardware we have and how SQL Server has been configured. This is a really simple step to start with: run msinfo32 to get the hardware details and then query sys.configurations to see how SQL Server is configured. In these cases, all of the SQL Server configurations had been left at their default settings. The result: systems running at dangerously low levels of available memory, which leads to extensive paging of memory to disk, and some really wild-looking query parallelization issues.
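For reference, a query along these lines shows the settings that most often explain these symptoms when left at their defaults; the list of setting names is simply the set I tend to start with:

```sql
-- Check the memory and parallelism related settings that commonly cause
-- trouble when left at their default values.
SELECT name, value, value_in_use, description
FROM sys.configurations
WHERE name IN (
    'max server memory (MB)',
    'min server memory (MB)',
    'max degree of parallelism',
    'cost threshold for parallelism'
)
ORDER BY name;
```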
One of the interesting new features in SQL Server 2016 is system-versioning, which goes by the catchy name Temporal Tables. The name of the feature offers a hint to its purpose: to collect and keep historical information about the data changes inside the database over a period of time. Instead of having just the latest data, we now have a way to know what it was at any previous point in time. The most obvious use case for Temporal Tables is auditing, or at least it is to me, but they can be used for other purposes as well. Some examples are going back in time in case you need to perform a data recovery after an error or a data change operation gone bad, or you might want to use it for reporting purposes.
There are a few things about Temporal Tables that are good to know. First of all, the work needed to keep track of the changes is done by the database engine when system-versioning is enabled. Another thing some of you might appreciate is that the feature is also available in Standard Edition. And finally, my favorite: it's actually rather easy to start using system-versioning, as we will discover shortly.
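To give an idea of what that looks like, here is a minimal sketch of creating a system-versioned table and querying it as of a point in time; the table and column names are made up for the example.

```sql
-- A minimal system-versioned (temporal) table; the engine maintains
-- the history table automatically as rows are modified.
CREATE TABLE dbo.Product
(
    ProductID   int            NOT NULL PRIMARY KEY CLUSTERED,
    ProductName nvarchar(100)  NOT NULL,
    ListPrice   decimal(10, 2) NOT NULL,
    ValidFrom   datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo     datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));

-- Query the data as it was at a given point in time.
SELECT ProductID, ProductName, ListPrice
FROM dbo.Product
FOR SYSTEM_TIME AS OF '2016-12-01T00:00:00';
```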