Table of Contents
- What's so great about hash tables?
- What do hash tables look like IRL?
- How do hash tables work?
- So what's the catch?
- How do I implement a hash table?
- Where can I learn more?
What's so great about hash tables?
Hash tables are an excellent data structure to use when you're working with 2 related types of data. This relationship is established by using key-value pairs, where one data type is assigned as the key, and the other data type is assigned as the value.
Hash tables are ideal when you need to insert
, search
, and delete
objects from your dataset often. In fact, hash tables perform these operations so efficiently that they have an average time complexity of O(1)βthat's right, constant time!
What do hash tables look like IRL?
Hash tables have many real-world applications, but here are some practical examples.
- Mapping a price to an item
- Mapping web addresses to IP addresses (DNS Resolution)
- Mapping an ID number to an employee
- Mapping a hotel room number to a guest
- Mapping a piece of music to a composer (see implementation below)
How do hash tables work?
Hash tables go by many names: dictionary, associative array, map, hash map, hash, etc. They're built into most programming languages but it's important to understand how they work under the hood, even if you never have to implement one from scratch.
Hash tables are powered by 2 components: a hash function and an array table. Here's a breakdown of how these components process and store data.
- A key-value pair is passed to the hash function.
- The hash function hashes the key to find an index location.
- The value is stored in the hash table at the hashed index location.
- Repeat Steps 1-3 until all key/value pairs are stored.
- When the table gets full, the array is doubled in size. All previously stored data is re-hashed and stored in (potentially) new index locations.
Using the hash table structure that's built into a programming language means that you don't have to worry about:
- choosing a hash function
- handling collisions
- resizing your array
Why are hash tables so fast?
When you need to look up a value, the hash table will perform a direct lookup of the key that corresponds to that value. There is no need to sort or iterate through a list to search
for the data!
Tip: If you're interested in building your own hash functions and tables, check out William Fiset's Easy to Advanced Data Structures course. You'll learn how to choose the right hash function for your use case, as well as different techniques for managing collisions.
So what's the catch?
While hash tables provide many benefits, they're not suited for every situation. Here are a few important things to keep in mind.
- Hash tables aren't meant for ordered datasets.
- Hash tables won't work if you want to allow duplicate entries.
- Collisions will occur. Make sure your keys are unique to reduce the chance of collisions.
- Performance will slow as the hash table fills up.
How do I implement a hash table?
This one's for my fellow musicians! Let's set up a music catalogue that stores a piece of music as the key, and a composer as the value.
As part of my 2021 goals, I implemented the example below in both Python and JavaScript.
Example 1: Music catalogue in Python
#############################################
#create a hash table for the music catalogue
music_catalogue = dict()
#add a piece of music to the catalogue
music_catalogue["Symphony No. 5"] = "Beethoven"
#function to search the catalogue for a composer
def composer_search(piece):
#retrieve the name of the composer if s/he is in the catalogue
#this step isn't necessary but it's easier to visualize
composer = music_catalogue.get(piece)
#if the composer is in the catalogue, print his/her name
if (composer):
print(piece + " was written by " + composer)
else:
print("Error: composer not found!")
Now let's try out this catalogue by calling the composer_search
function.
########################################################
#someone wants to find out who composed "Symphony No. 5"
piece = "Symphony No. 5"
#check the registry for "Symphony No. 5"
composer_search(piece)
The console output will read: Symphony No. 5 was written by Beethoven
.
Example 2: Music catalogue in JavaScript
/////////////////////////////////////////////
//create a hash table for the music catalogue
musicCatalogue = {};
//add a piece of music to the catalogue
musicCatalogue["Symphony No. 5"] = "Beethoven";
//function to search the hash table for a composer
function composerSearch(piece) {
//retrieve the name of the composer if s/he is in the catalogue
//this step isn't necessary but it's easier to visualize
const composer = musicCatalogue[piece];
//if the composer is in the catalogue, print his/her name
if (composer) {
console.log(`${piece} was written by ${composer}`);
} else {
console.log(`Error: composer not found!`);
}
}
Now let's try out this catalogue by calling the composerSearch
function.
/////////////////////////////////////////////////////////
//someone wants to find out who composed "Canon in D"
const piece = "Canon in D"
//check the registry for "Canon in D"
composerSearch(piece);
The console output will read: Error: composer not found!
Where can I learn more?
Here are some great resources to dive deeper into hash tables and hash functions.
Books:
- Grokking Algorithms by Aditya Bhargava
- Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein
Courses:
- Easy to Advanced Data Structures by William Fiset
Blogs:
- Taking Hash Tables Off The Shelf by BaseCS
- Hashing Out Hash Functions by BaseCS
Happy Coding! π©π½βπ»