Java Improvement (23) -----Detailed explanation of HashMap

Author：Eve Cole Update Time：2025-06-14 12:48:02

HashMap is also a collection we use a lot. It is an implementation of the Map interface based on hash tables and exists in the form of a key-value. In HashMap, key-value is always processed as a whole. The system will calculate the storage location of the key-value based on the hash algorithm. We can always quickly store and retrieve the value through the key. Let’s analyze the access to HashMap.

1. Definition

HashMap implements the Map interface and inherits AbstractMap. The Map interface defines the rules for key mapping to values, and the AbstractMap class provides the backbone implementation of the Map interface to minimize the work required to implement this interface. In fact, the AbstractMap class has implemented Map. I think it should be clearer to mark Map LZ here!

 public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable

2. Constructor function

HashMap provides three constructors:

HashMap(): Constructs an empty HashMap with the default initial capacity (16) and the default loading factor (0.75).

HashMap(int initialCapacity): Constructs an empty HashMap with specified initial capacity and default loading factor (0.75).

HashMap(int initialCapacity, float loadFactor): Constructs an empty HashMap with specified initial capacity and load factor.

Two parameters are mentioned here: initial capacity, loading factor. These two parameters are important parameters that affect the performance of HashMap. The capacity represents the number of buckets in the hash table. The initial capacity is the capacity when creating a hash table. The loading factor is a scale that the hash table can reach before its capacity automatically increases. It measures the degree of use of a hash table space. The larger the load factor indicates the higher the loading degree of the hash table, and vice versa. For hash tables using linked list method, the average time to find an element is O(1+a). Therefore, if the load factor is larger, the space will be more fully utilized, but the consequence is a decrease in search efficiency; if the load factor is too small, the data of the hash table will be too sparse, causing serious waste to the space. The default load factor of the system is 0.75, and we generally do not need to modify it.

HashMap is a data structure that supports fast access. To understand its performance, you must understand its data structure.

III. Data structure

We know that the two most commonly used structures in Java are arrays and simulated pointers (references). Almost all data structures can be implemented in combination with these two, and the same is true for HashMap. In fact, HashMap is a "linked list hash", as follows its data structure:

From the above figure, we can see whether the underlying implementation of HashMap is or an array, but every item in the array is a chain. The parameter initialCapacity represents the length of the array. The following is the source code of the HashMap constructor:

 public HashMap(int initialCapacity, float loadFactor) { //Initial capacity cannot be <0 if (initialCapacity < 0) throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity); //Initial capacity cannot > Maximum capacity value, the maximum capacity of HashMap is 2^30 if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; //Load factor cannot be < 0 if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); // Calculate the n-power value of the smallest 2 greater than initialCapacity. int capacity = 1; while (capacity < initialCapacity) capacity <<= 1; this.loadFactor = loadFactor; //Set the capacity limit of HashMap. When the capacity of HashMap reaches this limit, the capacity expansion operation will be performed threshold = (int) (capacity * loadFactor); //Initialize the table array table = new Entry[capacity]; init(); }

As can be seen from the source code, a table array will be initialized every time a new HashMap is created. The element of the table array is an Entry node.

 static class Entry<K,V> implements Map.Entry<K,V> { final K key; V value; Entry<K,V> next; final int hash; /** * Creates new entry. */ Entry(int h, K k, V v, Entry<K,V> n) { value = v; next = n; key = k; hash = h; } ..... }

Among them, Entry is the inner class of HashMap, which contains the key key, value value, next node next, and hash value. This is very important. It is precisely because Entry forms the items of the table array as a linked list.

The above briefly analyzes the data structure of HashMap, and below will explore how HashMap implements fast access.

4. Storage implementation: put(key,vlaue)

First, let's look at the source code

 public V put(K key, V value) { //When the key is null, call putForNullKey method to save the first position of null and table. This is why HashMap allows null if (key == null) return putForNullKey(value); //Calculate the hash value of the key int hash = hash(key.hashCode()); -----(1) //Calculate the position of the key hash value in the table array int i = indexFor(hash, table.length); -----(2) //Iterate from i and find the location where the key is saved for (Entry<K, V> e = table[i]; e != null; e = e.next) { Object k; //Judge whether there is the same hash value on the chain (the key is the same) //If the same exists, directly overwrite the value and return the old value if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; //Old value = new value e.value = value; e.recordAccess(this); return oldValue; //Return old value} } //Increase the number of modifications by 1 modCount++; //Add key and value to the i position addEntry(hash, key, value, i); return null; }

Through the source code, we can clearly see that the process of HashMap saving data is: first determine whether the key is null. If it is null, directly call the putForNullKey method. If it is not empty, first calculate the hash value of the key, and then search for the index position in the table array according to the hash value. If there is an element at this position, compare whether the same key exists. If it exists, overwrite the value of the original key, otherwise save the element at the head of the chain (the first element saved is placed at the end of the chain). If the table has no elements there, it will be saved directly. This process seems simple, but it actually has deep inside information. There are several points as follows:

1. Let’s look at the iteration first. The reason for iteration here is to prevent the existence of the same key value. If two hash values (keys) are found to be the same, the processing method of HashMap is to replace the old value with the new value. The key is not processed here, which explains that there are no two identical keys in HashMap.

2. In the view (1) and (2). This is the essence of HashMap. First of all, there is the hash method, which is a pure mathematical calculation, which is to calculate the hash value of h.

 static int hash(int h) { h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }

We know that for HashMap tables, the data distribution needs to be even (it is best to have only one element for each item, so that it can be found directly). It cannot be too tight or too loose. Too tight will lead to slow query speed, and too loose will waste space. After calculating the hash value, how can we ensure that the table elements are distributed equally? We will think of mold acquisition, but because the mold acquisition consumes a lot, HashMap handles it like this: call the indexFor method.

 static int indexFor(int h, int length) { return h & (length-1); }

The length of the underlying array of HashMap is always at the n-power of 2, and it exists in the constructor: capacity <<= 1; doing so can always ensure that the length of the underlying array of HashMap is at the n-power of 2. When length is to n power of 2, h&(length - 1) is equivalent to taking the modulus of length, and the speed is much faster than taking the modulus directly. This is an optimization of HashMap in terms of speed. As for why it is 2 to the nth power, the following explanation is.

Let’s go back to the indexFor method, which has only one statement: h&(length - 1). In addition to the above modulus operation, this sentence also has a very important responsibility: evenly distribute table data and make full use of space.

Here we assume that length is 16(2^n) and 15, and h is 5, 6, and 7.

When n=15, the results of 6 and 7 are the same, which means that their locations stored in the table are the same, that is, a collision occurs, and 6 and 7 will form a linked list at one location, which will lead to a decrease in query speed. It is true that only three numbers are analyzed here, but not many, so let’s look at 0-15.
From the chart above, we see a total of 8 collisions, and at the same time, we find that the wasted space is very large, with no records in 1, 3, 5, 7, 9, 11, 13, and 15 places, that is, no data is stored. This is because when they perform & operation with 14, the last bit of the result they get will always be 0, that is, it is impossible to store data at the locations of 0001, 0011, 0101, 0111, 1001, 1011, 1101, 1111, and 1111. The space is reduced and the chance of collision will be further increased, which will lead to slow query speed. When length = 16, length 1 = 15 is 1111. Then when performing the low-bit & operation, the value is always the same as the original hash value, and when performing the high-bit operation, its value is equal to its low-bit value. Therefore, when length = 2^n, the probability of collision between different hash values is relatively small, which will make the data distributed evenly in the table array and the query speed is faster.

Here we review the put process: When we want to add a pair of key-values to a HashMap, the system will first calculate the hash value of the key, and then confirm the location stored in the table based on the hash value. If there is no element at this position, insert it directly. Otherwise, iterate over the list of elements at that point and compare the hash value of its key accordingly. If the two hash values are equal and the key values are equal (e.hash == hash && ((k = e.key) == key || key.equals(k))), the value of the original node is overwritten with the value of the new Entry. If the two hash values are equal but the key values are not equal, insert the node into the header of the linked list. For the specific implementation process, see the addEntry method, as follows:

 void addEntry(int hash, K key, V value, int bucketIndex) { //Get the Entry Entry<K, V> e = table[bucketIndex]; //Put the newly created Entry into the bucketIndex index and let the new Entry point to the original Entry table[bucketIndex] = new Entry<K, V>(hash, key, value, e); //If the number of elements in the HashMap exceeds the limit, the capacity will be twice as large if (size++ >= threshold) resize(2 * table.length); }

There are two points to note in this method:

One is the generation of chains. This is a very elegant design. The system always adds a new Entry object to the bucketIndex. If there is an object at bucketIndex, the newly added Entry object will point to the original Entry object, forming an Entry chain. However, if there is no Entry object at bucketIndex, that is, e==null, then the newly added Entry object points to null, and there will be no Entry chain generated.

2. Expansion of capacity.

As the number of elements in HashMap increases, the probability of collision becomes greater and greater, and the length of the generated link list will become longer and longer. This will inevitably affect the speed of HashMap. In order to ensure the efficiency of HashMap, the system must expand the capacity at a certain critical point. This critical point is when the number of elements in HashMap is equal to the table array length* load factor. But scaling is a very time-consuming process because it requires recalculating the location of this data in the new table array and copying it. So if we have predicted the number of elements in HashMap, then the number of preset elements can effectively improve the performance of HashMap.

5. Reading implementation: get(key)

Compared with the storage of HashMap, withdrawing is relatively simple. Find the Entry at the index in the table array through the hash value of the key, and then return the value corresponding to the key.

 public V get(Object key) { // If null, call getForNullKey method to return the corresponding value if (key == null) return getForNullKey(); // Calculate its hash code based on the hashCode value of the key int hash = hash(key.hashCode()); // Take out the value at the specified index in the table array for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) { Object k; //If the searched key is the same as the searched key, return the corresponding value if (e.hash == hash && ((k = e.key) == key || key.equals(k))) return e.value; } return null; }

Here, the value that can be quickly retrieved based on the key is not only inseparable from the data structure of HashMap, but also has a lot to do with Entry. As mentioned earlier, HashMap does not store the key and value separately in the stored procedure, but is processed as a whole key-value, which is the Entry object. At the same time, the value is only equivalent to the key's attachment. During the storage process, the system determines the storage location of the Entry in the table array based on the hashcode of the key, and in the process of fetching, the corresponding Entry object is also taken out according to the hashcode of the key.

Original link: http://www.cnblogs.com/chenssy/p/3521565.html

The above is all the content of this article. I hope it will be helpful to everyone's learning and I hope everyone will support Wulin.com more.