Perils of synthetic test data

binary-code-475664_640
Data, data, data

I was recently involved in a query tuning work where we used synthetic, rather than production data, to validate the results of our query and index tuning work. We faced some issues with the generated data that had quite a severe impact on our testing, and that prompted me into writing this blog post. Lets start by first defining what is synthetic data. In my view synthetic data is data that resembles actual production data, but is artificial/generated. I have seen similar (and also more detailed) definitions elsewhere and I think it is a good one.

I also like to point out that there are plenty of good reasons for using synthetic data in testing, as production data is often strictly regulated and not easily available for testing purposes.  However, you need to be certain that the synthetic data you are using is similar to what you have in production.

Continue reading “Perils of synthetic test data”

SQL Server and Joins

puzzle-2-1197936
A join!

I was recently looking at some Execution Plans with a co-worker and we ended up discussing the different types of joins in a SQL Server and what implications they might have when it comes to query performance. While many of us are familiar with writing joins, as we usually don’t query just a single table, there are quite few things about the physical joins that may not quite obvious. In this post our focus will be in the physical joins, but we will also very briefly look at the different types of logical joins also.

 

Continue reading “SQL Server and Joins”