How HashSet removes duplicate values

Author：Eve Cole Update Time：2025-07-29 09:32:01

A Set in Java is a collection that contains no duplicate elements. More precisely, it contains no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. A Set in general does not guarantee the order of its elements.

When an element is added to a set, the addition succeeds only if the set does not already contain it. That is, e1 is added only if the set contains no element e such that (e1==null ? e==null : e1.equals(e)).
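As a small illustration of the rule above, the following sketch records what add returns for each value. The class name SetAddDemo and the helper addEach are made up for this example; only Set.add and HashSet come from the JDK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetAddDemo {
    /** Adds each value to the given set and records what add() returned (hypothetical helper). */
    static List<Boolean> addEach(Set<String> set, String... values) {
        List<Boolean> results = new ArrayList<>();
        for (String value : values) {
            // add() returns true only if the set did not already contain an equal element
            results.add(set.add(value));
        }
        return results;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        // "A" is new; a second String that equals "A" is rejected;
        // the first null is accepted, the second null is rejected.
        System.out.println(addEach(set, "A", new String("A"), null, null));
        System.out.println("set.size() == " + set.size());
    }
}
```

The two "A" strings are different objects, yet only one is kept, because the check is based on equals rather than on object identity.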

The following uses HashSet, one implementation of Set, as an example to briefly show how a set keeps its elements unique:

    package com.darren.test.overide;

    public class CustomString {
        private String value;

        public CustomString() {
            this("");
        }

        public CustomString(String value) {
            this.value = value;
        }
    }

    package com.darren.test.overide;

    import java.util.HashSet;
    import java.util.Set;

    public class HashSetTest {
        public static void main(String[] args) {
            String a = new String("A");
            String b = new String("A");
            CustomString c = new CustomString("B");
            CustomString d = new CustomString("B");

            System.out.println("a.equals(b) == " + a.equals(b));
            System.out.println("c.equals(d) == " + c.equals(d));

            Set<Object> set = new HashSet<Object>();
            set.add(a);
            set.add(b);
            set.add(c);
            set.add(d);

            System.out.println("set.size() == " + set.size());
            for (Object object : set) {
                System.out.println(object);
            }
        }
    }

The output is as follows:

    a.equals(b) == true
    c.equals(d) == false
    set.size() == 3
    com.darren.test.overide.CustomString@2c39d2
    A
    com.darren.test.overide.CustomString@5795ce

You may already have spotted the key: the equals method. That is not quite the whole story, though. To be precise, it is the equals and hashCode methods together. Why? Let's change the CustomString class and test again:

    package com.darren.test.overide;

    public class CustomString {
        private String value;

        public CustomString() {
            this("");
        }

        public CustomString(String value) {
            this.value = value;
        }

        // equals must return the primitive boolean to override Object.equals
        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            } else if (obj instanceof CustomString) {
                CustomString customString = (CustomString) obj;
                return customString.value.equals(value);
            } else {
                return false;
            }
        }
    }

Test results:

    a.equals(b) == true
    c.equals(d) == true
    set.size() == 3
    com.darren.test.overide.CustomString@12504e0
    A
    com.darren.test.overide.CustomString@1630eb6

This time equals returns true for c and d, but the size of the set is still 3.

Next, override only hashCode:

    package com.darren.test.overide;

    public class CustomString {
        private String value;

        public CustomString() {
            this("");
        }

        public CustomString(String value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            // return super.hashCode();
            return 1;
        }
    }

Look at the results again:

    a.equals(b) == true
    c.equals(d) == false
    set.size() == 3
    com.darren.test.overide.CustomString@1
    com.darren.test.overide.CustomString@1
    A

Overriding only the hashCode method, without overriding equals, does not work either.

Finally, override both:

    package com.darren.test.overide;

    public class CustomString {
        private String value;

        public CustomString() {
            this("");
        }

        public CustomString(String value) {
            this.value = value;
        }

        // equals must return the primitive boolean to override Object.equals
        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            } else if (obj instanceof CustomString) {
                CustomString customString = (CustomString) obj;
                return customString.value.equals(value);
            } else {
                return false;
            }
        }

        @Override
        public int hashCode() {
            // return super.hashCode();
            return 1;
        }
    }

Final results:

    a.equals(b) == true
    c.equals(d) == true
    set.size() == 2
    com.darren.test.overide.CustomString@1
    A

This shows that both the equals method and the hashCode method must be overridden. Now for the underlying principle.

The contract for hashCode in java.lang.Object:

1. During one execution of an application, as long as the information used by an object's equals comparisons is not modified, calling hashCode on that object multiple times must consistently return the same integer.

2. If two objects are equal according to the equals(Object) method, calling hashCode on each of them must produce the same integer result.

3. If two objects are not equal according to the equals(Object) method, calling hashCode on each of them is not required to produce different integer results. However, producing different results for unequal objects may improve the performance of hash tables.
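The constant hashCode used in the tests above satisfies this contract, but it puts every instance into the same bucket. A minimal sketch of a more typical pairing follows, assuming Java 7 or later for java.util.Objects. The class name ValueString and the helper distinctCount are made up for this example and are not part of the original test code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class ValueString {
    private final String value;

    ValueString(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof ValueString)) {
            return false;
        }
        // Objects.equals is null-safe, so a null value does not throw
        return Objects.equals(value, ((ValueString) obj).value);
    }

    @Override
    public int hashCode() {
        // Derived from the same field that equals compares, so equal objects
        // always produce equal hash codes (rule 2 of the contract)
        return Objects.hashCode(value);
    }

    /** Hypothetical helper: number of distinct items according to a HashSet. */
    static int distinctCount(ValueString... items) {
        Set<ValueString> set = new HashSet<>();
        for (ValueString item : items) {
            set.add(item);
        }
        return set.size();
    }
}
```

Because hashCode is computed from the field that equals compares, unequal values usually get different hash codes, which is the performance hint in rule 3.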

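Internally, HashSet delegates its basic operations to HashMap: each element is stored as a key of a backing HashMap. When an element is added, the map first calls the element's hashCode method and, after some bit mixing, derives a bucket index from the result. Because the table length is always a power of two, the index is computed as hash & (table.length - 1). If that bucket is empty, the element is stored there. If it is not empty, the map compares the new element with each entry already in the bucket, first by hash value and then with == or equals. If an equal entry is found, the element is not added; otherwise the new element is appended to the same bucket (a linked list, which Java 8 and later may convert to a tree when it grows long). This explains the test results above: with only equals overridden, c and d have different hash codes, so equals is never consulted; with only hashCode overridden, c and d share a bucket, but the inherited equals reports them as different.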
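The lookup described above can be sketched with a toy hash set. This is an illustration under simplifying assumptions, not the JDK implementation: the class TinyHashSet and its methods are invented here, the table never resizes, each bucket is a plain list, and candidates within a bucket are compared with equals only (the real HashMap also compares the stored hash values first).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy hash set with a fixed, power-of-two number of buckets (illustration only). */
class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();
    private int size;

    TinyHashSet(int bucketCount) {
        if (bucketCount <= 0 || Integer.bitCount(bucketCount) != 1) {
            throw new IllegalArgumentException("bucketCount must be a power of two");
        }
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<E>());
        }
    }

    /** Maps an element to a bucket index with hash & (length - 1). */
    int indexFor(Object e) {
        int h = (e == null) ? 0 : e.hashCode();
        h ^= (h >>> 16); // mix the high bits into the low bits, as Java 8+ HashMap does
        return h & (buckets.size() - 1);
    }

    /** Returns true if the element was added, false if an equal element was already present. */
    boolean add(E e) {
        List<E> bucket = buckets.get(indexFor(e));
        for (E existing : bucket) {
            if (e == null ? existing == null : e.equals(existing)) {
                return false; // duplicate: leave the set unchanged
            }
        }
        bucket.add(e); // a collision with unequal elements just extends the bucket
        size++;
        return true;
    }

    int size() {
        return size;
    }
}
```

For example, the strings "Aa" and "BB" have the same hashCode, so they land in the same bucket, but both are stored because equals returns false for the pair.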

The following is part of the source code of HashSet:

    package java.util;

    public class HashSet<E> extends AbstractSet<E>
            implements Set<E>, Cloneable, java.io.Serializable {

        static final long serialVersionUID = -5024744406713321676L;

        // The backing HashMap that stores all elements of the HashSet.
        private transient HashMap<E,Object> map;

        // A dummy Object used as the value for every key in the HashMap;
        // it is declared static final.
        private static final Object PRESENT = new Object();

        /**
         * The default no-argument constructor; constructs an empty HashSet.
         *
         * The backing HashMap is initialized empty, with the default initial
         * capacity of 16 and the default load factor of 0.75.
         */
        public HashSet() {
            map = new HashMap<E,Object>();
        }

        /**
         * Constructs a new set containing the elements in the specified collection.
         *
         * The backing HashMap is created with the default load factor of 0.75 and
         * an initial capacity sufficient to hold all elements of the specified
         * collection.
         * @param c the collection whose elements are to be placed into this set.
         */
        public HashSet(Collection<? extends E> c) {
            map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
            addAll(c);
        }

        /**
         * Constructs an empty HashSet with the specified initialCapacity and loadFactor.
         *
         * The backing HashMap is constructed empty with the corresponding parameters.
         * @param initialCapacity the initial capacity.
         * @param loadFactor the load factor.
         */
        public HashSet(int initialCapacity, float loadFactor) {
            map = new HashMap<E,Object>(initialCapacity, loadFactor);
        }

        /**
         * Constructs an empty HashSet with the specified initialCapacity.
         *
         * The backing HashMap is constructed empty with the corresponding capacity
         * and a load factor of 0.75.
         * @param initialCapacity the initial capacity.
         */
        public HashSet(int initialCapacity) {
            map = new HashMap<E,Object>(initialCapacity);
        }

        /**
         * Constructs a new, empty linked hash set with the specified
         * initialCapacity and loadFactor.
         * This constructor has package access and is not exposed publicly;
         * it exists only to support LinkedHashSet.
         *
         * The backing map is an empty LinkedHashMap instance constructed with
         * the specified parameters.
         * @param initialCapacity the initial capacity.
         * @param loadFactor the load factor.
         * @param dummy a marker parameter.
         */
        HashSet(int initialCapacity, float loadFactor, boolean dummy) {
            map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
        }

        /**
         * Returns an iterator over the elements in this set. The elements are
         * returned in no particular order.
         *
         * This actually calls keySet() on the backing HashMap to return all keys.
         * As this shows, the elements of a HashSet are stored as the keys of the
         * backing HashMap, and every value is the same static final Object.
         * @return an Iterator over the elements in this set.
         */
        @Override
        public Iterator<E> iterator() {
            return map.keySet().iterator();
        }

        /**
         * Returns the number of elements in this set.
         *
         * This actually calls size() on the backing HashMap, which returns the
         * number of entries and therefore the number of elements in the set.
         * @return the number of elements in this set.
         */
        @Override
        public int size() {
            return map.size();
        }

        /**
         * Returns true if this set contains no elements.
         *
         * This actually calls isEmpty() on the backing HashMap.
         * @return true if this set contains no elements.
         */
        @Override
        public boolean isEmpty() {
            return map.isEmpty();
        }

        /**
         * Returns true if this set contains the specified element.
         * More precisely, returns true if and only if this set contains an
         * element e such that (o==null ? e==null : o.equals(e)).
         *
         * This actually calls containsKey on the backing HashMap to determine
         * whether it contains the specified key.
         * @param o the element whose presence in this set is to be tested.
         * @return true if this set contains the specified element.
         */
        @Override
        public boolean contains(Object o) {
            return map.containsKey(o);
        }

        /**
         * Adds the specified element to this set if it is not already present.
         * More precisely, adds the specified element e to this set if the set
         * contains no element e2 such that (e==null ? e2==null : e.equals(e2)).
         * If this set already contains the element, the call leaves the set
         * unchanged and returns false.
         *
         * The element is actually put into the backing HashMap as a key.
         * HashMap's put() method adds a key-value pair. When the key of the new
         * entry is the same as the key of an existing entry (hashCode() returns
         * equal values and equals also returns true), the value of the new entry
         * overwrites the value of the existing entry, but the key is not changed.
         * Therefore, if an element that already exists is added to the HashSet,
         * the new element is not put into the HashMap and the original element
         * is not changed, which gives Set its no-duplicates property.
         * @param e the element to be added to this set.
         * @return true if this set did not already contain the specified element.
         */
        @Override
        public boolean add(E e) {
            return map.put(e, PRESENT)==null;
        }

        /**
         * Removes the specified element from this set if it is present.
         * More precisely, removes an element e such that
         * (o==null ? e==null : o.equals(e)), if this set contains such an element.
         * Returns true if this set contained the element (or equivalently, if
         * this set changed as a result of the call). Once the call returns,
         * this set no longer contains the element.
         *
         * This actually calls remove on the backing HashMap to delete the
         * specified entry.
         * @param o the object to be removed from this set, if present.
         * @return true if the set contained the specified element.
         */
        @Override
        public boolean remove(Object o) {
            return map.remove(o)==PRESENT;
        }

        /**
         * Removes all elements from this set. The set will be empty after
         * this call returns.
         *
         * This actually calls clear on the backing HashMap to remove all entries.
         */
        @Override
        public void clear() {
            map.clear();
        }
    }
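The add method in the listing relies on HashMap.put returning the previous value for the key, or null if there was none. The following sketch reproduces that one line outside of HashSet. The class PutReturnDemo and the helper addTo are made up for this example; PRESENT simply mirrors the dummy value in the listing.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Shared dummy value, playing the role of PRESENT in the HashSet source
    private static final Object PRESENT = new Object();

    /** Hypothetical helper mirroring HashSet.add: the key is new exactly when put() returns null. */
    static <E> boolean addTo(Map<E, Object> map, E e) {
        return map.put(e, PRESENT) == null;
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        System.out.println(addTo(map, "A"));             // first "A": no previous value
        System.out.println(addTo(map, new String("A"))); // equal key: put returns PRESENT
        System.out.println("map.size() == " + map.size());
    }
}
```

Using a non-null dummy value is what makes the null check reliable: a key that is already present always maps to PRESENT, never to null.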
