I was recently involved in a query tuning work where we used synthetic, rather than production data, to validate the results of our query and index tuning work. We faced some issues with the generated data that had quite a severe impact on our testing, and that prompted me into writing this blog post. Lets start by first defining what is synthetic data. In my view synthetic data is data that resembles actual production data, but is artificial/generated. I have seen similar (and also more detailed) definitions elsewhere and I think it is a good one.
I also like to point out that there are plenty of good reasons for using synthetic data in testing, as production data is often strictly regulated and not easily available for testing purposes. However, you need to be certain that the synthetic data you are using is similar to what you have in production.
SQL Server has had the Data Compression feature for a while, ever since the version 2008, so it is hardly a new thing. However as it has been Enterprise Edition feature until SQL Server 2016 Service Pack 1, it is not something you see employed very often. Technically speaking, you could also compress data before 2008 by using NTFS file level compression on a read only data. However with the implementation of SQL Server Data Compression you could now do it inside the database on a page or a row level.
I was reading this blog post by Thomas LaRock (@SQLRockstar) about a database design mistakes and I’ll warmly recommend that anyone who is involved with database design should read it as well. It also got me into thinking about one database design issue which, in my opinion, is not taken into consideration often enough; Database Administration.