SQL Server and Joins

Published by

on

puzzle-2-1197936
A join!

I was recently looking at some Execution Plans with a co-worker and we ended up discussing the different types of joins in a SQL Server and what implications they might have when it comes to query performance. While many of us are familiar with writing joins, as we usually don’t query just a single table, there are quite few things about the physical joins that may not quite obvious. In this post our focus will be in the physical joins, but we will also very briefly look at the different types of logical joins also.

 

Logical and Physical Joins

As mentioned already, joins can be put into two different categories, logical and physical. The logical joins are the ones we write in our T-SQL queries to connect rows from two or more tables together. The physical joins on the other hand are done by the SQL Server engine in response to our logical joins.

Logical Joins

As we write our queries we commonly use one of the following type of joins to get the results we want:

  • INNER JOIN
  • OUTER JOIN
  • CROSS APPLY
  • CROSS JOIN

There are also two joins that lack explicit syntax and they are called the semi-join and the anti-semi-join. I do have another post draft about the logical joins, so I will not go into deeper details in this post. Instead, we will now look at the…

Physical Joins

As mentioned before, physical joins are created by the SQL Server engine as it goes on to create an execution plan for our T-SQL queries. There are three types of physical joins and they are:

  • Nested Loop
  • Hash
  • Merge

Nested Loop Join

Nested Loop will join the tables by making the one with least rows as an outer table to optimize performance. For each row in this outer table a one-by-one comparison is done to all the rows in the inner table. If there is a match, the row is added to the result set. Nested Loop joins will perform best when the outer table is relatively small, and in case of large inner table, when it has proper indexing.

Hash Join

Hash joins are performed in two phases called Build and Probe. During the Build phase all the rows are read from the input, which is often the table with the least rows in it. This process creates an in-memory hash table. In the Probe phase rows from the second input are read and hashes are created using the same function, after which the results are matched to hash table. If there is a match, the results will be added to output.

While hash joins perform best with very large data sets, as it can parallelize and scale better than the two other joins, it is also the most intensive workload to perform. There are few reasons to this. Sometimes the hashing functions can be complex, out-of-data statistics can cause bad estimates and if your hash table is too large to fit into memory, it might be spooled to TEMPDB increasing I/O load.

Merge Join

Last physical join we look at is the merge join. As the name suggest, in merge join we merge two sorted lists together. Merge joins are most common when two tables are joined by already sorted predicates. They might also occur even when one of the join predicates needs to be sorted. With merge join two input rows are read and compared simultaneously, one row at the time. Equal rows are joined and with rows that are not, the lesser of the inputs is discarded.

Merge joins are very fast, however if there are sort operations involved they can be expensive. With good indexing, merge join is often the fastest algorithm.

And that’s it on the topic of the joins!

One response to “SQL Server and Joins”

  1. […] back I wrote this blog post about SQL Server joins, the focus back then being in the physical ones. So I figured that […]

Leave a Reply

WordPress.com.

Discover more from SQLStarters

Subscribe now to keep reading and get access to the full archive.

Continue reading