In this post, our goal is to build an understanding of how HashMaps work internally. More often than not, we use HashMaps without knowing much about their inner workings. However, in many interview settings, a developer is expected to know them. Also, while writing real production code, it is good practice to be aware of the advantages and disadvantages of the data structure.

Earlier, we looked into the basics of Java HashMaps and their usage in a detailed manner. In case you are not aware of HashMaps or have never used them before, you can refer to that post first and then start with this one.

1 – The Core Concept of a HashMap

The core concept of a HashMap is to fetch an entry from a list based on a key.

How would we implement such a mechanism?

One approach would be to create a list and, whenever a get request comes in, loop through every element and return the value when the key matches. Such an approach has a time complexity of O(n). However, the defining quality of a HashMap is that its get and put operations have constant time complexity.
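The naive approach above can be sketched in a few lines. The class and method names here are mine, purely for illustration:

```java
import java.util.List;

public class LinearLookup {
    // A simple key-value pair.
    record Entry(String key, String value) {}

    // Naive lookup: scan every entry until the key matches -- O(n),
    // because in the worst case we touch every element in the list.
    static String get(List<Entry> entries, String key) {
        for (Entry e : entries) {
            if (e.key().equals(key)) {
                return e.value();
            }
        }
        return null; // key not present
    }
}
```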

[Figure: List Traversal]

Clearly, the first approach does not serve our purpose. Therefore, we need a better approach.

1.1 – The Approach

Iterating over every element in a list leads to linear time complexity. In a HashMap, we avoid this traversal by calculating the position of an element based on its key.

Let’s assume we have only 26 places in our list and assign the lowercase letters as keys. So a points to position 1, b points to position 2, and so on.

So, if we want to find the value for key c, we can simply compute the position. As we can see, c will point to position 3 and we can retrieve the value.
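This toy scheme can be sketched directly. The names (`LetterTable`, `slots`) are mine for illustration; note the code uses 0-based array indices, so c lands at index 2:

```java
// Toy direct-addressing table: 26 slots, one per lowercase letter.
// The position is computed from the key itself -- no traversal needed.
public class LetterTable {
    private final String[] slots = new String[26];

    private int indexOf(char key) {
        return key - 'a'; // 'a' -> 0, 'b' -> 1, ... 'z' -> 25
    }

    public void put(char key, String value) {
        slots[indexOf(key)] = value;
    }

    public String get(char key) {
        return slots[indexOf(key)]; // O(1) lookup
    }
}
```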

[Figure: Resolving Keys in a HashMap]

However, this approach would not be very effective. For starters, with a larger key-space such as integers, there are billions of possible keys (Integer.MAX_VALUE alone is 2,147,483,647). In most cases, the number of elements we actually want to store is far smaller, leading to a huge waste of space. And even if we define a different key-space, we would find it difficult to size it properly.

HashMaps get around this problem by placing the elements in buckets. The number of buckets is known as the capacity of a HashMap.
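Conceptually, the bucket for a key is derived from its hash code and the current capacity. A simplified sketch follows (Java's real HashMap additionally spreads the high bits of the hash before this step, which is omitted here):

```java
public class Buckets {
    // Simplified bucket selection: reduce an arbitrary hash code to one
    // of `capacity` buckets. Since HashMap capacities are always powers
    // of two, (capacity - 1) & hash is equivalent to hash mod capacity,
    // and unlike %, it also handles negative hash codes correctly.
    static int bucketIndex(Object key, int capacity) {
        int hash = key.hashCode();
        return (capacity - 1) & hash;
    }
}
```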

[Figure: How HashMaps Work Internally]

1.2 – HashCode and Equals

In order to determine which bucket an element should be placed inside, HashMaps use something known as hashing. Basically, hashing is the key to how HashMaps work internally.

But what really is hashing?

In its simplest form, hashing is a way to compute a numeric code for any object by applying some formula or algorithm to its properties.

A proper hash function should follow the below rule:

A Hash function should return the same hash code every time when the function is applied on same or equal objects. In other words, two equal objects must produce the same hash code each and every time.

Now, coming to the hashCode() and equals() methods: every Java object inherits them from the Object class. The default hashCode() implementation typically derives an integer from the object's identity (historically, its memory address), so distinct objects generally get different codes.

In case we are defining our own Java class and intend to use its objects as keys in a HashMap, we need to provide our own implementations of the hashCode() and equals() methods.

See below code for reference:

import java.util.Objects;

// Inside our Book class, which has bookName and authorName fields:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Book book = (Book) o;
    return bookName.equals(book.bookName) && authorName.equals(book.authorName);
}

@Override
public int hashCode() {
    return Objects.hash(bookName, authorName);
}
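With equals() and hashCode() in place, two Book objects with equal fields behave as the same key. Here is a self-contained sketch; the Book class below is a minimal stand-in I wrote to match the two methods shown above, and the field values are made up:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class BookKeyDemo {
    // Minimal Book class consistent with the equals()/hashCode() above.
    static class Book {
        final String bookName;
        final String authorName;

        Book(String bookName, String authorName) {
            this.bookName = bookName;
            this.authorName = authorName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Book book = (Book) o;
            return bookName.equals(book.bookName) && authorName.equals(book.authorName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(bookName, authorName);
        }
    }

    public static void main(String[] args) {
        Map<Book, String> shelf = new HashMap<>();
        shelf.put(new Book("Dune", "Frank Herbert"), "Aisle 3");

        // A *different* object with equal fields hashes to the same
        // bucket and compares equal, so it finds the same entry.
        System.out.println(shelf.get(new Book("Dune", "Frank Herbert"))); // prints "Aisle 3"
    }
}
```

Without the two overrides, the second lookup would return null, because the default identity-based hashCode() would send the new object to a bucket based on its identity, not its contents.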

2 – Handling Collisions in a HashMap

In a HashMap, equal keys must have the same hash code. However, it is also possible to have different keys produce the same hash code.

The above scenario will lead to collisions while determining the bucket where the element should be placed.

So, how do we get around this problem?

Well, the simple answer is that we don’t.

If two different keys produce the same hash code, we place them in the same bucket. If one bucket contains more than 1 element, the HashMap stores them in the form of a list. Typically, a linked list is used.
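A concrete example: the strings "Aa" and "BB" happen to produce the same String hash code, so as keys they land in the same bucket, yet they remain two separate entries because equals() distinguishes them within the bucket:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // Well-known String collision: both hash to 2112.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // Same bucket, but still two distinct entries -- equals()
        // tells the keys apart inside the bucket's list.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size()); // 2
    }
}
```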

As of Java 8, once a bucket accumulates more than 8 elements (and the table has at least 64 buckets), the underlying data structure for that bucket changes from a linked list to a balanced tree. This improves worst-case retrieval within the bucket from O(n) to O(log n).

3 – Capacity and Load Factor

As we can see, having buckets with many elements degrades the performance of the HashMap. As a rule, we strive for constant time complexity, but there are still cases where the worst-case time complexity of fetching or inserting an element is O(n) with linked lists or O(log n) with balanced trees.

Therefore, the whole premise of keeping good performance is to ensure that collisions are rare. To achieve this, a HashMap doubles its capacity when the number of entries exceeds capacity × load factor. The default load factor is 0.75 and the default initial capacity is 16, so with the defaults, the map resizes once it holds more than 12 entries. Both attributes can be customised while instantiating a HashMap.
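To make the arithmetic concrete, here is a small sketch with the default settings (the variable names are mine, not JDK internals):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        // Default HashMap settings.
        int capacity = 16;
        float loadFactor = 0.75f;

        // Resize is triggered once the entry count exceeds this threshold,
        // at which point the table doubles to 32 buckets.
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12
    }
}
```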

See below example:

Map<String, Book> booksByName = new HashMap<>(20, 0.60f);

With this, we have covered how HashMaps work internally.

If you have any comments or queries about this, please feel free to mention in the comments section below.


Saurabh Dashora

Saurabh is a Software Architect with over 12 years of experience. He has worked on large-scale distributed systems across various domains and organizations. He is also a passionate Technical Writer and loves sharing knowledge in the community.
