HashMap and HashSet are two important members of the Java Collections Framework. HashMap is a common implementation class of the Map interface, and HashSet is a common implementation class of the Set interface. Although HashMap and HashSet implement different interface specifications, their underlying hash storage mechanism is exactly the same; in fact, HashSet is itself implemented on top of HashMap.
In fact, HashSet and HashMap have much in common. For HashSet, the hash algorithm determines the storage location of each element, so elements can be stored and retrieved quickly; for HashMap, each key-value pair is treated as a whole, and its storage location is always computed from the key's hash, so the map's key-value pairs can likewise be stored and retrieved quickly.
Before introducing collection storage, it should be pointed out that although a collection claims to store Java objects, it does not actually put the objects themselves into the collection; it only keeps references to them. That is to say, a Java collection is really a collection of reference variables that point to the actual Java objects.
1. Basic features of HashMap
After reading the comments in the JDK source file HashMap.java, many features of HashMap can be summarized:
HashMap allows both key and value to be null, while Hashtable does not.
HashMap is not thread-safe, while Hashtable is thread-safe.
The order of elements in a HashMap is not stable, and the position of a given element may change over time (in the case of a resize).
The time to traverse a HashMap is proportional to its capacity plus the number of stored elements. To keep traversal efficient, do not set the initial capacity too high or the load factor too low.
As with the List implementations discussed earlier, because HashMap is not thread-safe, the iterator fails fast (throws ConcurrentModificationException) when the container structure is changed during iteration. A synchronized HashMap can be obtained through Collections.synchronizedMap(HashMap).
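The features above can be observed directly. The following sketch (all names invented for this illustration) shows null keys and values being accepted, and a map wrapped by Collections.synchronizedMap for multi-threaded use:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class FeatureDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        // HashMap accepts a null key and null values;
        // Hashtable would throw NullPointerException for either
        map.put(null, "value-for-null-key");
        map.put("key", null);
        System.out.println(map.get(null)); // value-for-null-key

        // For multi-threaded use, wrap the map in a synchronized view
        Map<String, String> syncMap = Collections.synchronizedMap(map);
        System.out.println(syncMap.size()); // 2
    }
}
```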
2. Hash table data structure analysis
A hash table is a data structure that accesses memory storage locations directly based on a key. That is to say, a hash table establishes a direct mapping between keys and storage addresses.
As shown in the figure below, a key is mapped through the hash function to an index position among the buckets.
Obtaining the index through a hash function will inevitably produce the same index for different keys; this is called a collision. Here are a few ways to resolve collisions:
Open addressing: the basic idea of this method is, upon a collision, to probe the following N positions of the table in sequence and fill the first free one. The specific algorithm is not covered further here; below is a schematic diagram:
Separate chaining: the basic idea of this method is, upon a collision, to string the Entries with the same index together in a linked list. The specific algorithm is not covered further here; below is a schematic diagram:
HashMap in the JDK resolves collisions using the separate chaining method.
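To make the idea concrete, here is a toy separate-chaining hash table. All names (ToyHashMap, Node) are invented for this sketch and are not JDK code; the JDK implementation is analyzed in the next section.

```java
// A minimal separate-chaining hash table: each bucket holds the head of
// a singly linked list of nodes whose keys hash to that bucket.
public class ToyHashMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    public ToyHashMap(int capacity) {
        buckets = new Node[capacity];
    }

    private int indexFor(Object key) {
        // Mask off the sign bit so the modulo result is non-negative
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // replace on same key
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // head insertion on collision
    }

    public V get(Object key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}
```

With a small capacity such as 4, several keys will land in the same bucket, and lookups simply walk the chain.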
3. HashMap source code analysis (JDK1.7)
1. HashMap read and write elements
Entry
The elements stored in a HashMap are of type Entry. The source code of Entry is given below:
```java
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    // The getters and setters for key and value are omitted;
    // they are used later by the iterators...

    public final boolean equals(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry e = (Map.Entry)o;
        Object k1 = getKey();
        Object k2 = e.getKey();
        if (k1 == k2 || (k1 != null && k1.equals(k2))) {
            Object v1 = getValue();
            Object v2 = e.getValue();
            if (v1 == v2 || (v1 != null && v1.equals(v2)))
                return true;
        }
        return false;
    }

    // XOR the hashCode of the key with the hashCode of the value
    // to obtain the hashCode of the Entry
    public final int hashCode() {
        return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
    }

    public final String toString() {
        return getKey() + "=" + getValue();
    }

    /**
     * This method is invoked whenever the value in an entry is
     * overwritten by an invocation of put(k,v) for a key k that's already
     * in the HashMap.
     */
    void recordAccess(HashMap<K,V> m) {
    }

    /**
     * This method is invoked whenever the entry is
     * removed from the table.
     */
    void recordRemoval(HashMap<K,V> m) {
    }
}
```

An Entry holds a key, a value, a hash, and a reference to the next Entry, and implements the Map.Entry interface. This is obviously the node of a singly linked list.
recordAccess(HashMap<K,V> m) and recordRemoval(HashMap<K,V> m) are empty in HashMap; LinkedHashMap overrides these two methods to implement its LRU behavior.
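As a concrete example of that LRU behavior, a small cache can be built on LinkedHashMap's access-order mode. The class name LruCache and its capacity are choices made for this sketch; the override of removeEldestEntry is the documented LinkedHashMap extension point.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache: accessOrder = true makes get() move an entry to
// the tail, and removeEldestEntry evicts the head once the cache is full.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }
}
```

For example, with capacity 3, after putting a, b, c, touching a with get, and then putting d, the least recently used entry b is the one evicted.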
get: reading an element retrieves the corresponding Entry from the HashMap. Here is the source code of get:
```java
public V get(Object key) {
    // Handle the case where key is null
    if (key == null)
        return getForNullKey();
    // Find the Entry for the key
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}
```

The source code of getForNullKey:
```java
private V getForNullKey() {
    if (size == 0) {
        return null;
    }
    // Traverse the collision chain
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}
```

An Entry with a null key is stored in table[0], but the collision chain at table[0] does not necessarily contain a null key, so it must be traversed.
Get entry according to key:
```java
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    // Compute the index in the table from the hash, then traverse
    // the collision chain looking for the key
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
```

The above is the process, and the source code, of HashMap reading an Entry. The average time complexity is O(1).
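One practical consequence of get returning entry.getValue(), hinted at by the null-key handling above: get returns null both for a missing key and for a key mapped to null, so containsKey is needed to tell the two cases apart. A small demo (names invented for this sketch):

```java
import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);

        // get() cannot distinguish "key absent" from "key mapped to null"
        System.out.println(map.get("present")); // null
        System.out.println(map.get("absent"));  // null

        // containsKey() makes the distinction
        System.out.println(map.containsKey("present")); // true
        System.out.println(map.containsKey("absent"));  // false
    }
}
```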
put: Write elements
The put operation in HashMap is relatively complicated because it may trigger a resize of the HashMap.
When a new element is written and the HashMap already contains its key, the value is replaced, which is equivalent to an update. Here is the put source code:
```java
public V put(K key, V value) {
    // If the table is empty, allocate it according to the threshold
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // Handle an Entry with a null key
    if (key == null)
        return putForNullKey(value);
    // Compute the hash and map it to an index
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    // Traverse the collision chain at this index to see whether the key exists
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // If the key exists, replace oldValue and return it
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    // The key of the new Entry does not exist in the collision chain
    modCount++;
    // Insert a new Entry
    addEntry(hash, key, value, i);
    return null;
}
```

The source code of addEntry and createEntry:
```java
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Before inserting a new Entry, compare the current size of the
    // HashMap against the threshold to decide whether to resize
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    // Head insertion: the new Entry is inserted in front of the first
    // Entry of the collision chain at the current index
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
```

The above is the process, and the source code, of writing an Entry into a HashMap. The average time complexity is O(1).
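The two return paths of put seen above can be checked from user code: the first put of a key returns null, while a put on an existing key replaces the value and returns the old one. A small demo (names invented for this sketch):

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // First put of a key: no previous mapping, so put() returns null
        System.out.println(map.put("k", 1)); // null
        // Second put of the same key: the value is replaced
        // and the old value is returned
        System.out.println(map.put("k", 2)); // 1
        System.out.println(map.get("k"));    // 2
    }
}
```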
remove: Delete elements:
```java
final Entry<K,V> removeEntryForKey(Object key) {
    if (size == 0) {
        return null;
    }
    // Compute the hash from the key and obtain the index
    int hash = (key == null) ? 0 : hash(key);
    int i = indexFor(hash, table.length);
    // Linked-list deletion: two pointers, prev is the predecessor
    Entry<K,V> prev = table[i];
    Entry<K,V> e = prev;

    // Traverse the collision chain looking for the key
    while (e != null) {
        Entry<K,V> next = e.next;
        Object k;
        // Found the Entry to delete
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++;
            size--;
            // The first node of the chain is the node to delete
            if (prev == e)
                table[i] = next;
            else
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }

    return e;
}
```

The above is the process, and the source code, of HashMap deleting an Entry. The average time complexity is O(1).
2. HashMap hashing principle (hash function)
The hash function in HashMap is implemented by hash(Object k) and indexFor(int h, int length). Let's take a look at the source code:
```java
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    // In other words, it reduces the chance of collisions.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```

The source code for obtaining the index:
```java
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
```

HashMap maps keys to indexes in the interval [0, table.length - 1] through the hash function. There are generally two indexing methods:
hash(key) % table.length, where length should be a prime number. Hashtable in the JDK uses this implementation.
For the specific reasons for using prime numbers, consult the relevant algorithm literature; they are not covered here.
hash(key) & (table.length - 1), where length must be a power of 2. HashMap in the JDK uses this implementation.
Because length is a power of 2, hash(key) & (table.length - 1) always falls within [0, length - 1]. However, masking alone would cause a serious collision problem: hashCode values in Java are 32 bits, and when the capacity of the HashMap is small, say 16, the masking operation always discards the high bits, so keys that differ only in their high bits all collide.
Therefore, to reduce the probability of collisions, hash(Object k) mixes the bits with several shift and XOR operations.
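The masking trick can be verified directly: for a power-of-2 length, h & (length - 1) equals the (non-negative) remainder h % length. The sketch below mirrors the JDK's indexFor; the class name and sample values are invented for this illustration.

```java
public class IndexForDemo {
    // Mirror of JDK 1.7's indexFor; valid only when length is a power of 2
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] { 0, 5, 16, 21, 1_000_003 }) {
            // For power-of-2 lengths, masking equals the modulo operation
            if (indexFor(h, length) != h % length)
                throw new AssertionError("mask != mod for h=" + h);
        }
        System.out.println(indexFor(21, length)); // 5
    }
}
```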
3. HashMap memory allocation strategy
Member variable capacity and loadFactor
The capacity of a HashMap must be a power of 2, and the default capacity is 1 << 4 = 16. There is also a load factor (loadFactor): a higher factor reduces the space overhead, but increases the cost of lookups (which dominate both the put and get methods of HashMap). The default loadFactor is 0.75, a value chosen to balance time cost against space cost.
```java
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
static final int MAXIMUM_CAPACITY = 1 << 30;
static final float DEFAULT_LOAD_FACTOR = 0.75f;
```
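Putting the two constants together: a resize is triggered once the number of entries reaches capacity * loadFactor. A quick check of that arithmetic for the defaults (class name invented for this sketch):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        // A resize is triggered once size reaches capacity * loadFactor
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12
    }
}
```

So with the default settings, the table doubles from 16 to 32 buckets when the 13th entry is inserted into an already-full bucket.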
HashMap constructor
The HashMap constructors set the initial values of capacity and loadFactor:
```java
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}
```

As mentioned before, the capacity of a HashMap must be a power of 2, yet the constructor does not enforce this. So how is it guaranteed that capacity is a power of 2?
During a put operation, the source code checks whether the current hash table is the empty table; if so, it calls inflateTable(int toSize):
```java
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);

    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
```

Here roundUpToPowerOf2 obtains the smallest power of 2 greater than or equal to the given argument:
```java
private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}
```

Integer.highestOneBit(int) keeps the highest one-bit of the argument and clears all other bits to 0; in other words, it returns the largest power of 2 less than or equal to the argument.
If number is a power of 2, decrementing it by 1 moves the highest one-bit down one position, and the subsequent left shift by 1 moves it back up, so highestOneBit returns number itself. If number is not a power of 2, decrementing by 1 leaves the highest one-bit where it was, and the left shift raises it one position, so highestOneBit returns the next power of 2 above number.
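Both cases of that reasoning can be checked by mirroring the JDK method (the class name is invented for this sketch):

```java
public class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Mirror of JDK 1.7's roundUpToPowerOf2
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(1));  // 1
        System.out.println(roundUpToPowerOf2(8));  // 8  (already a power of 2)
        System.out.println(roundUpToPowerOf2(9));  // 16 (rounded up)
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```

For 8: (8 - 1) << 1 = 14, whose highest one-bit is 8. For 9: (9 - 1) << 1 = 16, whose highest one-bit is 16.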
Expand capacity:
HashMap may resize during a put operation. The specific source code is as follows:
```java
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // The hash table has already reached its maximum capacity, 1 << 30
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    // Transfer the Entries from oldTable to newTable; the return value of
    // initHashSeedAsNeeded determines whether to recompute the hash values
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    // Recompute the threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // Traverse the old table
    for (Entry<K,V> e : table) {
        // Traverse each collision chain
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                // Recompute the hash value
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Head insertion: insert the element at the head of the chain
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
```

The above is the whole process of HashMap memory allocation. In summary, when a HashMap puts an Entry, it compares the current size against the threshold to decide whether to resize. Each resize doubles the capacity to 2 * table.length, and during the resize initHashSeedAsNeeded determines whether the hash values need to be recomputed.
4. HashMap iterator
The iterators in HashMap, such as ValueIterator, KeyIterator, and EntryIterator, are all based on HashIterator. Let's take a look at its source code:
```java
private abstract class HashIterator<E> implements Iterator<E> {
    Entry<K,V> next;        // next entry to return
    int expectedModCount;   // for fail-fast
    int index;              // current slot, table index
    Entry<K,V> current;     // current entry

    HashIterator() {
        expectedModCount = modCount;
        // Find the first Entry in the hash table
        if (size > 0) {
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Entry<K,V> nextEntry() {
        // HashMap is not thread-safe; while traversing, still check
        // whether the table structure has been modified
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Entry<K,V> e = next;
        if (e == null)
            throw new NoSuchElementException();

        if ((next = e.next) == null) {
            // Find the next Entry
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
        current = e;
        return e;
    }

    public void remove() {
        if (current == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Object k = current.key;
        current = null;
        HashMap.this.removeEntryForKey(k);
        expectedModCount = modCount;
    }
}
```

The Key, Value, and Entry iterators are wrapped into three collection views: keySet, values, and entrySet. These three views support the remove, removeAll, and clear operations of HashMap, but do not support add and addAll.
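The fail-fast check in nextEntry and the expectedModCount resynchronization in remove can both be observed from user code: modifying the map directly during iteration throws ConcurrentModificationException, while Iterator.remove is safe. A small demo (names invented for this sketch):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IteratorDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // Structural modification during iteration trips the fail-fast check
        try {
            for (String key : map.keySet()) {
                map.remove(key); // modCount changes; the next iteration throws
            }
        } catch (ConcurrentModificationException expected) {
            System.out.println("fail-fast triggered");
        }

        // Iterator.remove is the safe way: it updates expectedModCount
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        System.out.println(map.isEmpty()); // true
    }
}
```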