Getting the best performance from the Hash Table






Hash Tables are designed to give constant-time performance, and to do this they need a large number of slots available. So, which factors decide their performance?

Load Factor

Load Factor quantifies how full the Hash Table is: it is simply the number of keys divided by the number of slots in the hash table.

As the load factor increases, the performance of the Hash Table decreases. With more slots occupied, it takes longer to look up a key and to find an empty slot in which to place a new one.
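As a quick illustration, the ratio can be computed as below. This is a minimal sketch; the function name `load_factor` and its parameter names are chosen here for illustration only.

```python
def load_factor(num_keys: int, num_slots: int) -> float:
    """Return the number of keys per slot in the table."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return num_keys / num_slots


# 6 keys stored in a table with 8 slots
print(load_factor(6, 8))  # 0.75
```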

The Best Strategy

Every probing or collision resolution strategy has its merits and demerits; each performs best under certain conditions and worst under others. Let's take a detailed look.

Chained Hashing

Chained Hashing is costly, as it requires us to do a linear traversal of the linked list to find the key we are looking for. As the collisions increase, the lookup time shoots up, degrading the performance.

Chained Hashing is not cache-friendly, as it requires us to do random lookups in the memory while hopping from one linked list node to another.
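The linear traversal described above might look like the following sketch. The class and method names (`Node`, `ChainedHashTable`, `put`, `get`) are illustrative, and `get` returns the number of nodes visited purely to make the traversal cost visible; a real table would return only the value.

```python
class Node:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, num_slots=8):
        # Each slot holds the head of a linked list (or None if empty).
        self.slots = [None] * num_slots

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        i = self._index(key)
        node = self.slots[i]
        while node is not None:          # walk the chain to find an existing key
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Key not present: prepend a new node to the chain.
        self.slots[i] = Node(key, value, self.slots[i])

    def get(self, key):
        node = self.slots[self._index(key)]
        steps = 0
        while node is not None:          # one hop per node in the chain
            steps += 1
            if node.key == key:
                return node.value, steps
            node = node.next
        raise KeyError(key)


# Force every key into the same chain by using a single slot.
table = ChainedHashTable(num_slots=1)
for k in range(5):
    table.put(k, k * 10)

print(table.get(4))  # (40, 1)  most recently inserted, at the head
print(table.get(0))  # (0, 5)   first inserted, at the tail
```

With every key colliding, the lookup cost grows with the chain length, which is the degradation described above.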

Double Hashing

Evaluating two hash functions requires extra CPU cycles, which can get taxing. Double hashing is also not cache-friendly, as it requires us to jump across the Hash Table to hunt for an empty slot.

The optimal strategy is contextual. If the performance of the Hash Table is critical, then we need to experiment, tune, and evaluate to find the strategy that fits us best.
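To see why the probes jump around, here is a small sketch of a double hashing probe sequence next to a linear one. The two hash functions used here (`key % num_slots` and `1 + key % (num_slots - 1)`) are assumptions picked for illustration, and the table size is assumed to be prime so that every slot is eventually visited.

```python
def double_hash_probes(key: int, num_slots: int):
    """Yield slot indices in double hashing order (num_slots assumed prime)."""
    h1 = key % num_slots
    h2 = 1 + (key % (num_slots - 1))   # second hash, never zero
    for i in range(num_slots):
        yield (h1 + i * h2) % num_slots


def linear_probes(key: int, num_slots: int):
    """Yield slot indices in linear probing order."""
    h1 = key % num_slots
    for i in range(num_slots):
        yield (h1 + i) % num_slots


print(list(linear_probes(10, 7)))       # [3, 4, 5, 6, 0, 1, 2]
print(list(double_hash_probes(10, 7)))  # [3, 1, 6, 4, 2, 0, 5]
```

Linear probing touches neighbouring slots, while the double hashing sequence hops across the table, so consecutive probes are less likely to land in the same cache line.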

Lookup Time vs Load Factor

Lookup Time is the most critical metric in evaluating the performance of the Hash Table. When we benchmark Lookup Time against Load Factor, we would see that

  • the performance of Open Addressing degrades steeply as the load factor increases
  • the performance of Chained Hashing degrades gracefully with the load factor
  • Linear Probing would be slower than Double Hashing
  • Double Hashing would need shorter probe sequences
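One way to try this out is a small simulation that counts probes instead of measuring wall-clock time. The sketch below is not a real benchmark: it uses random integer keys, a prime table size of 10007, and the same illustrative hash functions as a simple double hashing scheme, all of which are assumptions made here. It ignores cache effects entirely, so it only captures the probe-count side of the story.

```python
import random


def average_probes(num_slots, load, strategy, seed=0):
    """Average slots inspected per insert while filling the table to `load`.

    strategy is "linear" or "double"; num_slots is assumed prime.
    """
    rng = random.Random(seed)
    occupied = [False] * num_slots
    num_keys = int(load * num_slots)
    total = 0
    for _ in range(num_keys):
        key = rng.getrandbits(32)
        start = key % num_slots
        step = 1 if strategy == "linear" else 1 + (key % (num_slots - 1))
        i = 0
        while occupied[(start + i * step) % num_slots]:
            i += 1
        occupied[(start + i * step) % num_slots] = True
        total += i + 1                      # occupied slots seen + the empty one
    return total / num_keys


def average_chain_steps(num_slots, load, seed=0):
    """Average nodes walked per insert for chained hashing, plus the new node."""
    rng = random.Random(seed)
    chain_length = [0] * num_slots
    num_keys = int(load * num_slots)
    total = 0
    for _ in range(num_keys):
        slot = rng.getrandbits(32) % num_slots
        total += chain_length[slot] + 1     # walk the whole chain, then append
        chain_length[slot] += 1
    return total / num_keys


NUM_SLOTS = 10007  # prime, so every double hashing step size reaches all slots
for load in (0.5, 0.7, 0.9):
    print(
        load,
        round(average_probes(NUM_SLOTS, load, "linear"), 2),
        round(average_probes(NUM_SLOTS, load, "double"), 2),
        round(average_chain_steps(NUM_SLOTS, load), 2),
    )
```

Each printed row holds the load factor followed by the average cost for Linear Probing, Double Hashing, and Chained Hashing. The Linear Probing column should grow fastest as the load factor climbs, with Chained Hashing growing slowest.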

Making Chained Hashing cache efficient

Chained Hashing is known for being cache-inefficient, as it requires us to traverse through linked list nodes that may be present across the heap. Can we somehow make it cache efficient?

To make Chained Hashing cache-friendly, we have to ensure that the nodes of the linked list are allocated contiguously instead of randomly. Hence, instead of allocating one node at a time, we allocate the space for 5 nodes (like an array) at a time and then form the linked list out of them.

This would make the linked list leverage the CPU cache well and keep our iterations efficient, as the next nodes are likely to already be in the CPU cache, sparing us a fetch from the main memory.
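The block allocation idea can be sketched as follows. This is only an illustration of the bookkeeping, under several assumptions: keys and values are 64-bit integers, nodes live in contiguous `array` buffers and link to each other by index rather than by pointer, and each slot reserves nodes in blocks of 5. All names (`BlockChainedTable`, `BLOCK`, `NIL`) are made up for this example. In Python the interpreter overhead dominates, so no speed-up should be expected here; the layout benefit shows up in lower-level languages.

```python
from array import array

BLOCK = 5   # nodes reserved at a time, as in the text above
NIL = -1    # marks the end of a chain


class BlockChainedTable:
    def __init__(self, num_slots=8):
        # Node storage: three parallel, contiguous arrays of 64-bit integers.
        self.keys = array("q")
        self.values = array("q")
        self.next = array("q")
        self.head = [NIL] * num_slots       # first node index of each chain
        self.free = [0] * num_slots         # next unused index in the slot's block
        self.block_end = [0] * num_slots    # one past the end of the slot's block

    def _alloc(self, slot):
        if self.free[slot] == self.block_end[slot]:
            # Current block is exhausted (or none exists): reserve BLOCK nodes.
            start = len(self.keys)
            self.keys.extend([0] * BLOCK)
            self.values.extend([0] * BLOCK)
            self.next.extend([NIL] * BLOCK)
            self.free[slot] = start
            self.block_end[slot] = start + BLOCK
        index = self.free[slot]
        self.free[slot] += 1
        return index

    def put(self, key, value):
        slot = key % len(self.head)
        i = self.head[slot]
        while i != NIL:
            if self.keys[i] == key:
                self.values[i] = value
                return
            i = self.next[i]
        n = self._alloc(slot)
        self.keys[n], self.values[n] = key, value
        self.next[n] = self.head[slot]      # prepend to the chain
        self.head[slot] = n

    def get(self, key):
        i = self.head[key % len(self.head)]
        while i != NIL:
            if self.keys[i] == key:
                return self.values[i]
            i = self.next[i]
        raise KeyError(key)


table = BlockChainedTable(num_slots=2)
for k in (0, 2, 1, 4):
    table.put(k, k * 10)

print(table.get(4))        # 40
print(table.head)          # [2, 5]: slot 0 uses indices 0..4, slot 1 uses 5..9
print(len(table.keys))     # 10: two blocks of 5 nodes reserved
```

Nodes of the same chain sit next to each other inside a block, so walking a chain touches adjacent memory until the block runs out.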


Arpit Bhayani
