Perils of synthetic test data

Data in its natural habitat

I was recently involved in some query tuning work where we used synthetic rather than production data to validate the results of our query and index tuning. We ran into issues with the generated data that had quite a severe impact on our testing, and that prompted me to write this blog post. Let’s start by defining what synthetic data is. In my view, synthetic data is data that resembles actual production data but is artificially generated. I have seen similar (and more detailed) definitions elsewhere, and I think it is a good one.

I would also like to point out that there are plenty of good reasons for using synthetic data in testing, as production data is often strictly regulated and not easily available for testing purposes. However, you need to be certain that the synthetic data you are using is similar to what you have in production.


Basics of the blockchain technology


While I normally blog about SQL Server or topics that closely relate to it in some way, I decided to make a small exception this time. Today, I will be writing about blockchain. Granted, it’s not a huge jump outside my usual themes, as we’re still talking about database technology. So why am I writing about blockchain? Is it because it’s a new and cool technology that everyone else is talking about? Admittedly that is part of it, but I also wanted to have some of my thoughts and questions about blockchain in an easy-to-find place.
