HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional map operations, permits null values and a null key, is unsynchronized, and makes no guarantee about the order of the map. This article walks through how HashMap is implemented.
HashMap internal storage
Internally, HashMap stores all key-value pairs in a transient array field named table (also called the bucket array). The table is an array of Entry objects; it is resized as needed, and its length must always be a power of 2. The following code:
```java
/** An empty entry array; the default value of table. */
static final Entry<?,?>[] EMPTY_TABLE = {};

/** The table, resized as necessary. Length MUST always be a power of two. */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
```

Initial capacity and load factor
A HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the table is created. The load factor is a measure of how full the table is allowed to get before its capacity is automatically increased. When the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (that is, the internal data structure is rebuilt) at twice the current capacity. Both parameters can be set through the constructors. The default initial capacity is 16, the maximum capacity is 2^30, and the default load factor is 0.75.
Think of the table as a bucket that holds water. Its default initial capacity is 16 units of water; once it is filled to 16 * 0.75 = 12 units, the capacity is expanded to 32 units the next time data is added. Here 0.75 is the load factor, and both the initial capacity and the load factor can be set when the bucket is created. The maximum capacity of the bucket is 2^30 units. If the requested initial capacity is greater than the maximum capacity, the maximum capacity is used instead, and once the capacity has reached or exceeded the maximum, a resize simply returns without growing the table.
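The expansion arithmetic above can be sketched with plain numbers (this is just the capacity/threshold math, not HashMap internals):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;          // default initial capacity
        float loadFactor = 0.75f;   // default load factor
        // The table is resized once size exceeds capacity * loadFactor,
        // and each resize doubles the capacity.
        for (int i = 0; i < 3; i++) {
            int threshold = (int) (capacity * loadFactor);
            System.out.println("capacity=" + capacity + " threshold=" + threshold);
            capacity *= 2;
        }
    }
}
```

So with the defaults, the 13th insertion is the one that can trigger the first expansion to 32 buckets.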
The following is part of the HashMap source code, which defines the default initial capacity, the load factor, and several other constants:
```java
/** The default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is passed to a constructor.
 * MUST be a power of two and less than or equal to 1 << 30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/** The default load factor, used when none is specified in a constructor. */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/** An empty table instance, used while the table is not yet initialized. */
static final Entry<?,?>[] EMPTY_TABLE = {};

/**
 * The table, which stores all key-value pair entries; resized as necessary.
 * Length MUST always be a power of two.
 */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

/**
 * The number of key-value mappings in this map; incremented or decremented
 * on every add or remove.
 */
transient int size;

/**
 * The threshold at which to resize: (capacity * load factor).
 * Recomputed from the new capacity after every resize.
 * @serial
 */
// If table == EMPTY_TABLE then this is the initial capacity at which the
// table will be created when inflated.
int threshold;

/**
 * The load factor; if not specified in the constructor, the default is used.
 *
 * @serial
 */
final float loadFactor;

/**
 * The number of times this HashMap has been structurally modified. Structural
 * modifications are those that change the number of mappings or otherwise
 * modify its internal structure (for example rehash, which rebuilds the
 * internal data structure). This field is used to make iterators on the
 * HashMap's collection views fail-fast.
 */
transient int modCount;
```

Tuning performance with initial capacity and load factor
Typically, the default load factor (0.75) offers a good compromise between time and space costs. A higher load factor reduces space overhead but increases lookup cost (this is reflected in most HashMap operations, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting the initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operation will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large initial capacity stores the mappings more efficiently than letting the table grow through automatic rehashing on demand.
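The sizing rule above can be turned into a small helper. The method name below is hypothetical, not part of HashMap; it just finds the smallest power-of-two capacity whose threshold can hold the expected number of entries:

```java
public class CapacityPlanner {
    // Hypothetical helper: smallest power of two such that
    // capacity * loadFactor >= expectedEntries, so no rehash is needed.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        int capacity = 1;
        while (capacity * loadFactor < expectedEntries) {
            capacity <<= 1;   // capacities must stay powers of two
        }
        return capacity;
    }

    public static void main(String[] args) {
        // To hold 1000 entries at load factor 0.75, 1024 buckets are not
        // enough (threshold 768), so 2048 buckets are required.
        System.out.println(initialCapacityFor(1000, 0.75f));
    }
}
```

Passing the resulting value as the constructor's initialCapacity argument avoids all intermediate rehashes while the map fills up.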
The following is the code to rebuild the HashMap data structure:
```java
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // If the capacity has already reached the maximum limit,
    // raise the threshold and return directly.
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    // Create a new table to store the data.
    Entry[] newTable = new Entry[newCapacity];
    // Transfer the entries from the old table to the new one;
    // this step is where most of the time is spent.
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    // Finally, set the threshold for the next resize.
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
```

HashMap constructors
HashMap has four constructors. The fourth creates a new HashMap from an existing map; we will come back to it later. The first two constructors simply delegate to the third, two-argument constructor, filling in default values for whichever parameters were not supplied. The code is as follows:
```java
/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param initialCapacity the initial capacity
 * @param loadFactor the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}
```

As the code shows, if the initial capacity passed to a constructor is greater than the maximum capacity, it is silently replaced by the maximum capacity.
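The three constructors can be exercised as follows; note that a negative initial capacity is rejected with an IllegalArgumentException:

```java
import java.util.HashMap;
import java.util.Map;

public class CtorDemo {
    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>();          // capacity 16, load factor 0.75
        Map<String, Integer> b = new HashMap<>(64);        // capacity 64, default load factor
        Map<String, Integer> c = new HashMap<>(64, 0.5f);  // both specified

        a.put("x", 1);
        System.out.println(a.get("x"));

        try {
            new HashMap<String, Integer>(-1);              // negative capacity
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: negative capacity");
        }
    }
}
```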
put method
Next, let's take a look at one of the more important parts of HashMap: the put method.
```java
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old value
 * is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with the key, or null if there was
 *         no mapping for the key (a null return can also indicate that the
 *         map previously associated null with the key)
 */
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
```

1. put first checks whether the table is still in its default, uninitialized state; if so, it calls the inflateTable method to initialize it. It then checks whether the key is null; if it is, putForNullKey is called to handle null keys specially. putForNullKey actually works just like step 3 below, except that entries with a null key are always stored in the first bucket, i.e. at index 0.
2. If the key is not null, the hash() method is called to obtain the key's hash value. From the hash value and the length of the table, the bucket index where the key belongs can be computed.
3. Each Entry object has a next field, so entries that land in the same bucket form a singly linked list. Since different keys can hash to the same bucket index, put walks the entry at the computed index and its next chain, checking whether any entry's hash and key exactly match the given key. If a match is found, the key already exists: its old value is replaced and returned as the method's return value.
4. Otherwise, the structural modification count is incremented by 1.
5. The addEntry method is called to add the new key-value pair to the HashMap. addEntry first checks whether the current number of entries has reached the threshold (bucket capacity * load factor) and whether the target bucket is non-null. If both conditions hold, the table is resized to twice its current capacity (see the discussion of initial capacity and load factor above), and the hash value and storage index are recomputed. Finally, the createEntry method is called to store the entry in the table:
```java
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

/**
 * Entry constructor: creates a new Entry.
 */
Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}
```

6. In createEntry, the existing entry at the target index is fetched first, and a new entry is created with the old entry stored in its next field (see the Entry constructor). The new entry then replaces the old one as the head of the bucket's list.
Because adding a new entry requires computing the hash value, resizing the table when it becomes too full, and chaining into a linked list when the computed bucket is already occupied, insertion into a HashMap is not an especially cheap operation.
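The put/get logic described above (hash, index by bitmask, scan the chain, prepend on miss) can be sketched with a stripped-down toy map. All names here are illustrative, and resizing and null keys are deliberately omitted; this is not the JDK source:

```java
public class ToyMap<K, V> {
    static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16];  // length stays a power of 2

    private int indexFor(Object key) {
        // A power-of-two length lets a bitmask replace the modulo operation.
        return key.hashCode() & (table.length - 1);
    }

    public V put(K key, V value) {
        int i = indexFor(key);
        // Scan the chain: if the key is already present, replace the old value.
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Otherwise prepend a new node at the head of the bucket,
        // just like createEntry above.
        table[i] = new Node<>(key, value, table[i]);
        return null;
    }

    public V get(K key) {
        for (Node<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyMap<String, Integer> m = new ToyMap<>();
        System.out.println(m.put("a", 1));   // null: no previous mapping
        System.out.println(m.put("a", 2));   // 1: old value is returned
        System.out.println(m.get("a"));      // 2
    }
}
```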
get method
First, let’s look at the source code of the get method:
```java
/**
 * Returns the value to which the specified key is mapped, or null if this
 * map contains no mapping for the key.
 * A return value of null does not necessarily indicate that the map
 * contains no mapping for the key; it is also possible that the map
 * explicitly maps the key to null. The containsKey operation may be used
 * to distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
```

The implementation of get is straightforward:
Looking at the source of get, we can see that it computes the storage index from the key's hash value and the table length, which locates the sought element almost directly. Even when a few keys share the same bucket and the chain must be traversed, the scan is very short. Because hash values are well distributed, lookups in a HashMap are very fast.
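The index computation relies on the table length being a power of two, so the bitmask h & (length - 1) is equivalent to h mod length. A quick demonstration (using raw values for clarity; the real hash() method additionally perturbs the key's hashCode to spread its bits):

```java
public class IndexForDemo {
    public static void main(String[] args) {
        int length = 16; // power of two
        for (int h : new int[]{3, 19, 35}) {
            // h & (length - 1) keeps only the low 4 bits, i.e. h mod 16,
            // so 3, 19 and 35 all land in bucket 3 and would chain together.
            System.out.println(h + " -> " + (h & (length - 1)));
        }
    }
}
```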
Custom object as key to HashMap
```java
import java.util.HashMap;
import java.util.Map;

class User {
    // ID number
    protected int idNumber;

    public User(int id) {
        idNumber = id;
    }
}

public class TestUser {
    public static void main(String[] args) {
        Map<User, String> map = new HashMap<User, String>();
        for (int i = 0; i < 5; i++) {
            map.put(new User(i), "name: " + i);
        }
        System.out.println("User3's name: " + map.get(new User(3)));
    }
}
```

Output: User3's name: null

As shown above, when instances of the custom User class are used as HashMap keys, User3's name cannot be found. The User class implicitly inherits from the base class Object, so Object's hashCode method is used to generate the hash value, and by default it is computed from the object's address. The hash value of the first new User(3) instance is therefore different from that of the second. However, simply overriding hashCode is not enough: you must also override the equals method, which is likewise inherited from Object. HashMap uses equals() to determine whether the current key is the same as a key already present in the table; refer to the get and put methods above.
A correct equals() method must satisfy the following five conditions (see Thinking in Java, page 489): it must be reflexive, symmetric, transitive, and consistent, and x.equals(null) must return false for any non-null x.
To repeat: the default Object.equals() just compares object addresses, so one new User(3) is not equal to another new User(3). If you want to use your own class as a HashMap key, you must therefore override both hashCode() and equals().
The following code works normally:
```java
import java.util.HashMap;
import java.util.Map;

class User {
    // ID number
    protected int idNumber;

    public User(int id) {
        idNumber = id;
    }

    @Override
    public int hashCode() {
        return idNumber;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof User && (idNumber == ((User) obj).idNumber);
    }
}

public class TestUser {
    public static void main(String[] args) {
        Map<User, String> map = new HashMap<User, String>();
        for (int i = 0; i < 5; i++) {
            map.put(new User(i), "Name: " + i);
        }
        System.out.println("User3's name: " + map.get(new User(3)));
    }
}
```

Output: User3's name: Name: 3

Here hashCode simply returns idNumber as the sole distinguishing value; you can implement whatever scheme fits your own business logic. In the equals method, instanceof quietly screens out null: if the left-hand operand of instanceof is null, the expression returns false. If the argument to equals() is non-null and of the correct type, the comparison proceeds on each object's actual idNumber. As the output shows, the implementation is now correct.
References:
Thinking in Java
JDK 1.6 Chinese help documentation