Hash join

The hash join is an example of a join algorithm and is used in the implementation of a relational database management system. All variants of hash join algorithms involve building hash tables from the tuples of one or both of the joined relations, and subsequently probing those tables so that only tuples with the same hash code need to be compared for equality in equijoins.
Hash joins are typically more efficient than nested loops joins, except when the probe side of the join is very small. They require an equijoin predicate.

Classic hash join

The classic hash join algorithm for an inner join of two relations proceeds as follows:

First, prepare a hash table using the contents of one relation, ideally whichever one is smaller after applying local predicates. This relation is called the build side of the join. The hash table entries are mappings from the value of the join attribute to the remaining attributes of that row.
Once the hash table is built, scan the other relation. For each row of the probe relation, find the relevant rows from the build relation by looking in the hash table.
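The two steps above can be sketched in Python as follows. This is a minimal illustration, not any particular system's implementation. It assumes each relation is a list of dict rows and that the join attribute is named by a string. The function name `hash_join` and the row format are hypothetical. Python's `dict` handles the hashing and also compares keys for equality, so rows whose keys merely collide in the hash are never paired.

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key):
    """Classic in-memory inner equijoin (hypothetical helper).

    build, probe: lists of dict rows; build_key/probe_key: join attribute names.
    """
    # Build phase: map each join-key value to all build rows carrying it.
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)

    # Probe phase: look up each probe row's key and emit one output row per match.
    out = []
    for row in probe:
        for match in table.get(row[probe_key], ()):
            out.append({**match, **row})  # combine attributes of both rows
    return out
```

For example, `hash_join(depts, emps, "dept_id", "dept_id")` builds on the (presumably smaller) department list and probes it once per employee. Employees with no matching department produce no output, as expected for an inner join.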

The first phase is usually called the "build" phase, while the second is called the "probe" phase. Similarly, the join relation on which the hash table is built is called the "build" input, whereas the other input is called the "probe" input.
This algorithm is simple, but it requires that the smaller join relation fits into memory, which is sometimes not the case. A simple approach to handling this situation proceeds as follows:

1. For each tuple in the build input:
   a. Add it to the in-memory hash table.
   b. If the size of the hash table equals the maximum in-memory size:
      i. Scan the probe input, and add matching join tuples to the output relation.
      ii. Reset the hash table, and continue scanning the build input.
2. Do a final scan of the probe input and add the resulting join tuples to the output relation.

This is essentially the same as the block nested loop join algorithm. Because the probe input is rescanned once for every memory-sized chunk of the build input, this algorithm may scan the probe input more times than necessary.
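The memory-bounded procedure can be sketched as below, under the assumption that "maximum in-memory size" is measured as a row count (`max_entries`). The function name and row format are hypothetical. Each time the table fills, the whole probe input is scanned, which is where the repeated scans come from.

```python
from collections import defaultdict

def chunked_hash_join(build, probe, key, max_entries):
    """Hash join whose table holds at most max_entries build rows (hypothetical helper)."""
    out = []
    table = defaultdict(list)
    count = 0

    def scan_probe():
        # One full pass over the probe input against the current chunk.
        for row in probe:
            for match in table.get(row[key], ()):
                out.append({**match, **row})

    for row in build:
        table[row[key]].append(row)
        count += 1
        if count == max_entries:
            scan_probe()     # table is full: probe it now
            table.clear()    # reset and keep scanning the build input
            count = 0
    # Final probe pass for the last, partially filled chunk (skipped if empty).
    if count:
        scan_probe()
    return out
```

With five build rows and `max_entries=2`, the probe input is scanned three times. A plain hash join that fits in memory would scan it once.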

Grace hash join

A better approach is known as the "grace hash join", after the GRACE database machine for which it was first implemented.
This algorithm avoids rescanning the entire relation by first partitioning both relations via a hash function on the join key, and writing these partitions out to disk. The algorithm then loads pairs of partitions into memory, builds a hash table for the smaller partitioned relation, and probes the other relation for matches with the current hash table. Because the partitions were formed by hashing on the join key, any two tuples that join must have been placed in corresponding partitions, so each pair of partitions can be joined independently.
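A minimal sketch of the partition-then-join structure follows. Here in-memory lists stand in for the partition files that a real system would write to disk; that substitution, the `salt` parameter, and the function names are assumptions of this illustration. Both inputs must be partitioned with the same hash function so that matching keys land in partitions with the same index.

```python
from collections import defaultdict

def partition(rows, key, n_parts, salt=0):
    """Split rows into n_parts buckets by hashing the join key.

    Lists stand in for on-disk partition files (an assumption of this sketch).
    """
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash((salt, row[key])) % n_parts].append(row)
    return parts

def grace_hash_join(r, s, key, n_parts=4):
    out = []
    # Partition both inputs with the same hash function, then join pairwise.
    for r_part, s_part in zip(partition(r, key, n_parts), partition(s, key, n_parts)):
        # Build on the smaller partition of the pair, probe with the other.
        build, probe = (r_part, s_part) if len(r_part) <= len(s_part) else (s_part, r_part)
        table = defaultdict(list)
        for row in build:
            table[row[key]].append(row)
        for row in probe:
            for match in table.get(row[key], ()):
                out.append({**match, **row})
    return out
```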
It is possible that one or more of the partitions still does not fit into the available memory, in which case the algorithm is recursively applied: an additional orthogonal hash function is chosen to hash the large partition into sub-partitions, which are then processed as before. Since this is expensive, the algorithm tries to reduce the chance that it will occur by forming the smallest partitions possible during the initial partitioning phase.
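The recursive case can be sketched as follows, assuming a memory budget expressed as a row count (`mem_limit`). Mixing the recursion level into the hashed value approximates choosing a different hash function at each level. It is not a proven orthogonal family. The `max_level` cap is also an assumption: if many rows share one key value, repartitioning cannot split them, so the sketch eventually joins them in memory anyway. All names are hypothetical.

```python
from collections import defaultdict

def _partition(rows, key, n_parts, level):
    # Hashing (level, key) gives a different hash function at each recursion level.
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash((level, row[key])) % n_parts].append(row)
    return parts

def _in_memory_join(build, probe, key, out):
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    for row in probe:
        for match in table.get(row[key], ()):
            out.append({**match, **row})

def recursive_grace_join(r, s, key, mem_limit, n_parts=4, level=0, max_level=8, out=None):
    if out is None:
        out = []
    # Join directly if the smaller side fits in memory, or stop recursing at max_level.
    if min(len(r), len(s)) <= mem_limit or level >= max_level:
        build, probe = (r, s) if len(r) <= len(s) else (s, r)
        _in_memory_join(build, probe, key, out)
        return out
    # Otherwise split this pair into sub-partitions with the next level's hash function.
    for r_part, s_part in zip(_partition(r, key, n_parts, level),
                              _partition(s, key, n_parts, level)):
        recursive_grace_join(r_part, s_part, key, mem_limit, n_parts,
                             level + 1, max_level, out)
    return out
```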

Hybrid hash join

The hybrid hash join algorithm is a refinement of the grace hash join which takes advantage of more available memory. During the partitioning phase, the hybrid hash join uses the available memory for two purposes:

To hold the current output buffer page for each of the partitions
To hold an entire partition in-memory, known as "partition 0"

Because partition 0 is never written to or read from disk, the hybrid hash join typically performs fewer I/O operations than the grace hash join. Note that this algorithm is memory-sensitive, because there are two competing demands for memory. Choosing too large a hash table for partition 0 leaves room for fewer output buffers, so the remaining data is split into fewer, larger partitions. If one of the non-zero partitions is then too large to fit into memory, the algorithm must recurse.
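The sketch below shows how partition 0 is handled differently from the others. As before, lists stand in for disk files. Output buffer pages and the actual memory budget are not modeled, and the function names are hypothetical. Build rows that hash to partition 0 go straight into an in-memory table. Probe rows that hash to partition 0 are joined immediately, so they are never spilled.

```python
from collections import defaultdict

def _bucket(value, n_parts):
    return hash(value) % n_parts

def hybrid_hash_join(build, probe, key, n_parts=4):
    out = []
    # Partition 0 is kept in memory as a hash table; partitions 1..n_parts-1
    # go to lists that stand in for disk files (an assumption of this sketch).
    table0 = defaultdict(list)
    build_spill = [[] for _ in range(n_parts)]
    for row in build:
        b = _bucket(row[key], n_parts)
        if b == 0:
            table0[row[key]].append(row)
        else:
            build_spill[b].append(row)

    probe_spill = [[] for _ in range(n_parts)]
    for row in probe:
        b = _bucket(row[key], n_parts)
        if b == 0:
            # Probe partition 0 right away: these rows never touch disk.
            for match in table0.get(row[key], ()):
                out.append({**match, **row})
        else:
            probe_spill[b].append(row)

    # The spilled partitions are then joined pairwise, as in the grace hash join.
    for b in range(1, n_parts):
        table = defaultdict(list)
        for row in build_spill[b]:
            table[row[key]].append(row)
        for row in probe_spill[b]:
            for match in table.get(row[key], ()):
                out.append({**match, **row})
    return out
```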

Hash anti-join

Hash joins can also be evaluated for an anti-join predicate. Depending on the sizes of the tables, different algorithms can be applied:

Hash left anti-join

Prepare a hash table for the NOT IN side of the join.
Scan the other table, selecting any rows whose join attribute has no matching entry in the hash table.

This is more efficient when the NOT IN table is smaller than the FROM table.
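A minimal sketch of the left anti-join, assuming dict rows and a shared key name. Only the key values of the NOT IN side are needed, so a Python `set` serves as the hash table. SQL's special treatment of NULLs in NOT IN is not modeled. The function name is hypothetical.

```python
def hash_left_anti_join(from_rows, not_in_rows, key):
    # Build a hash set of join-key values from the NOT IN side.
    not_in_keys = {row[key] for row in not_in_rows}
    # Keep FROM rows whose key has no entry in the hash set.
    return [row for row in from_rows if row[key] not in not_in_keys]
```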

Hash right anti-join

Prepare a hash table for the FROM side of the join.
Scan the NOT IN table, removing the corresponding records from the hash table on each hash hit.
Return everything that is left in the hash table.

This is more efficient when the NOT IN table is larger than the FROM table.
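The right anti-join can be sketched the same way, again assuming dict rows and a shared key name (the function name is hypothetical). The hash table is built on the FROM side. Removing a key on the first hit means that later NOT IN rows with the same key find nothing left to remove.

```python
from collections import defaultdict

def hash_right_anti_join(from_rows, not_in_rows, key):
    # Build the hash table on the FROM side: key -> FROM rows with that key.
    table = defaultdict(list)
    for row in from_rows:
        table[row[key]].append(row)
    # Scan the NOT IN side; a hit removes every FROM row with that key.
    for row in not_in_rows:
        table.pop(row[key], None)
    # Whatever survives had no match on the NOT IN side.
    return [row for rows in table.values() for row in rows]
```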

Hash semi-join

Hash semi-join is used to return the records of one table that have a match in the other table. Unlike a plain join, it returns each matching record from the leading table only once, regardless of how many matches there are in the IN table.
As with the anti-join, the semi-join has left and right variants:

Hash left semi-join

Prepare a hash table for the IN side of the join.
Scan the other table, returning any rows that produce a hash hit.

The records are returned as soon as they produce a hit. The actual records from the hash table are ignored.
This is more efficient when the IN table is smaller than the FROM table.
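A minimal sketch of the left semi-join, assuming dict rows and a shared key name (the function name is hypothetical). Since the IN records themselves are ignored, a set of their key values is enough. Writing the function as a generator mirrors the description: each FROM row is yielded as soon as it produces a hit, and at most once.

```python
def hash_left_semi_join(from_rows, in_rows, key):
    # Build a hash set of join-key values from the IN side.
    in_keys = {row[key] for row in in_rows}
    for row in from_rows:
        if row[key] in in_keys:
            yield row  # returned immediately on a hit, once per FROM row
```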

Hash right semi-join

Prepare a hash table for the FROM side of the join.
Scan the IN table, returning the corresponding records from the hash table and removing them.

With this algorithm, each record from the hash table can only be returned once, since it is removed after being returned.
This is more efficient when the IN table is larger than the FROM table.
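The right semi-join can be sketched as follows, assuming dict rows and a shared key name (the function name is hypothetical). `dict.pop` both returns and removes the matching FROM rows. A second IN row with the same key therefore finds nothing, which is what guarantees each FROM record is returned at most once.

```python
from collections import defaultdict

def hash_right_semi_join(from_rows, in_rows, key):
    # Build the hash table on the FROM side: key -> FROM rows with that key.
    table = defaultdict(list)
    for row in from_rows:
        table[row[key]].append(row)
    for row in in_rows:
        # pop() returns the matching FROM rows and removes them in one step.
        for match in table.pop(row[key], ()):
            yield match
```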
