The hashCode() and equals() methods are a key part of Java's fully object-oriented design. They make programming easier, but they also bring plenty of pitfalls. In this article, we discuss how to understand and use these two methods correctly.
If you decide to override equals(), you must understand the risks involved and make sure you can write a robust equals() method. One thing to keep in mind: whenever you override equals(), you must also override hashCode(). The reasons are explained later.
Let's first look at how the Java SE 7 API specification describes the equals() method:
・It is reflexive: for any non-null reference value x, x.equals(x) should return true.
・It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
・It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
・It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
・For any non-null reference value x, x.equals(null) should return false.
This passage relies on terminology from discrete mathematics (together, the rules describe an equivalence relation). Here is a brief explanation:
1. Reflexivity: A.equals(A) must return true.
2. Symmetry: If A.equals(B) returns true, then B.equals(A) also needs to return true.
3. Transitivity: If A.equals(B) is true and B.equals(C) is true, then A.equals(C) must also be true. Put simply, if A = B and B = C, then A = C.
4. Consistency: As long as the state of A and B does not change, repeated calls to A.equals(B) must keep returning the same result (always true or always false).
5. Non-nullity: A.equals(null) must return false.
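The five rules can be spot-checked mechanically. The sketch below is only an illustration: EqualsContract and its methods are hypothetical helpers, one per rule, and checking a handful of sample objects can expose a violation but can never prove that the contract holds in general.

```java
// Hypothetical helper that spot-checks the equals() contract on sample objects.
final class EqualsContract {
    private EqualsContract() {}

    // 1. Reflexivity: x.equals(x) is true.
    static boolean reflexive(Object x) {
        return x.equals(x);
    }

    // 2. Symmetry: x.equals(y) and y.equals(x) give the same answer.
    static boolean symmetric(Object x, Object y) {
        return x.equals(y) == y.equals(x);
    }

    // 3. Transitivity: if x equals y and y equals z, then x must equal z.
    static boolean transitive(Object x, Object y, Object z) {
        return !(x.equals(y) && y.equals(z)) || x.equals(z);
    }

    // 4. Consistency: repeated calls on unchanged objects return the same result.
    static boolean consistent(Object x, Object y, int repetitions) {
        boolean first = x.equals(y);
        for (int i = 1; i < repetitions; i++) {
            if (x.equals(y) != first) return false;
        }
        return true;
    }

    // 5. Non-nullity: x.equals(null) is false.
    static boolean nullSafe(Object x) {
        return !x.equals(null);
    }

    public static void main(String[] args) {
        // String already honors the contract, so every check passes.
        String a = "bruce", b = new String("bruce"), c = new String("bruce");
        System.out.println(reflexive(a) && symmetric(a, b)
                && transitive(a, b, c) && consistent(a, b, 10) && nullSafe(a));
    }
}
```

An equals() that returns true for everything would fail both the symmetry check against a String and the null check, which is exactly the kind of bug these rules are meant to catch.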
Unless you work in mathematics, you will rarely use these terms. In practice, we only need to follow a few steps when overriding equals(). To make the explanation concrete, let's first define a programmer class, Coder:
```java
class Coder {
    private String name;
    private int age;

    Coder(String name, int age) { this.name = name; this.age = age; } // lets new Coder("bruce", 10) compile
    // getters and setters
}
```
We want two programmer objects to be treated as the same programmer whenever their names and ages are the same. To get that behavior, we have to override equals(), because the default equals() inherited from Object only checks whether two references point to the same object, which makes it equivalent to ==. When overriding it, follow these three steps:
1. Check whether other is the same object as this.
if(other == this) return true;
2. Use the instanceof operator to determine whether other is an object of type Coder.
if(!(other instanceof Coder)) return false;
3. Cast other to Coder and compare the fields you defined in Coder, name and age; do not leave any of them out.
Coder o = (Coder) other;
return o.name.equals(name) && o.age == age;
At this point someone may ask: step 3 contains a cast, so if someone passes an Integer into this equals(), won't it throw a ClassCastException? The worry is unnecessary. Step 2 already performs the instanceof check: if other is not a Coder, or even if other is null (null instanceof Coder evaluates to false), equals() returns false right there, and the cast is never reached.
These three steps are also the ones recommended in Effective Java, and following them avoids most mistakes.
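Putting the three steps together, here is a runnable sketch of Coder with only equals() overridden. The constructor and the small CoderEqualsDemo class are additions for illustration, and the comparison assumes name is never null (Objects.equals(o.name, name), available since Java 7, would tolerate nulls). The last two calls show the point made above: an Integer or a null argument is rejected by the instanceof check before the cast runs.

```java
class Coder {
    private final String name;
    private final int age;

    Coder(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object other) {
        // Step 1: the same reference is always equal to itself.
        if (other == this) return true;
        // Step 2: rejects null and any non-Coder object, so the cast below is safe.
        if (!(other instanceof Coder)) return false;
        // Step 3: compare every field that defines "the same programmer".
        Coder o = (Coder) other;
        return o.name.equals(name) && o.age == age;
    }
    // hashCode() is deliberately not overridden yet; that is the next topic.
}

class CoderEqualsDemo {
    public static void main(String[] args) {
        Coder c1 = new Coder("bruce", 10);
        System.out.println(c1.equals(new Coder("bruce", 10))); // true
        System.out.println(c1.equals(new Coder("bruce", 11))); // false
        System.out.println(c1.equals(Integer.valueOf(10)));    // false, no ClassCastException
        System.out.println(c1.equals(null));                   // false
    }
}
```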
The Java SE 7 API specification also says:
"Note that it is generally necessary to override the hashCode method whenever this method (equals) is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes."
So if you override equals(), remember to override hashCode() as well. Most of us met hash tables in a university data structures course; the hashCode() method exists to serve hash tables.
When we use collection classes whose names start with Hash, such as HashMap and HashSet, hashCode() is called implicitly to decide where each object is stored. We will explain this in more detail later; here we first focus on how to write hashCode().
Effective Java describes a recipe that keeps hash collisions to a minimum, but I personally think general applications do not need to go to that much trouble. If your application stores tens of thousands or millions of objects, you should follow the book's method strictly. For a small or medium-sized application, the following principle is enough:
Make sure every field of Coder that equals() compares (here, name and age) contributes to hashCode().
For this example, we can write this:
```java
@Override
public int hashCode() {
    int result = 17;
    result = result * 31 + name.hashCode();
    result = result * 31 + age;
    return result;
}
```
The starting value 17 is arbitrary; you could use 20, 50, or another constant instead. Seeing this, I became curious about how the String class implements hashCode(). Its documentation says:
"Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)"
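In other words, each character's value (its UTF-16 code unit, which equals the ASCII code for ASCII characters) is multiplied by 31 raised to the number of characters that follow it, and the products are summed using int arithmetic, overflow included. Sun clearly put care into this implementation: the formula spreads different strings across a wide range of hash values, which keeps collisions between different Strings rare, although it cannot rule them out entirely.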
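The documented formula can be checked directly. This sketch (StringHashDemo and polynomialHash are hypothetical names) evaluates the same polynomial with Horner's rule, h = 31 * h + s[i], which gives the same int result as the sum of powers because int overflow wraps the same way in both forms. The last line shows that collisions still happen: "Aa" and "BB" both hash to 2112.

```java
class StringHashDemo {
    // Recomputes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic.
    static int polynomialHash(String s) {
        int h = 0; // the empty string hashes to 0
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule for the same polynomial
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "bruce", "hashCode"}) {
            System.out.println(s + ": " + polynomialHash(s) + " vs " + s.hashCode());
        }
        // Two different strings with the same hash code: 65*31 + 97 == 66*31 + 66 == 2112.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());
    }
}
```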
Oracle's hash table implementations (such as HashMap) organize their entries into buckets, as shown in the figure below:
As the figure shows, a hash table with buckets is roughly an array of buckets combined with linked lists: each bucket holds a linked list, and each node of that list stores one object. Java uses hashCode() to decide which bucket an object belongs to and then searches only that bucket's list. Ideally, if your hashCode() spreads objects well enough, each bucket holds just one node, and a lookup takes constant time: wherever the object is stored, hashCode() leads straight to its bucket, with no need to scan the whole collection from beginning to end. This is the main point of a hash table.
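To make "locate the bucket through hashCode()" concrete, here is a simplified sketch of turning a hash code into a bucket index. It mirrors what HashMap does in Java 8 and later (XOR the high 16 bits into the low bits, then mask with the table length minus one, where the table length is a power of two), but bucketIndex and BucketIndexDemo are hypothetical names, not a public API.

```java
class BucketIndexDemo {
    // Hypothetical helper: maps a hash code to one of tableLength buckets.
    // tableLength must be a power of two so that (tableLength - 1) is a bit mask.
    static int bucketIndex(int hashCode, int tableLength) {
        int spread = hashCode ^ (hashCode >>> 16); // mix the high bits into the low bits
        return spread & (tableLength - 1);          // always in [0, tableLength)
    }

    public static void main(String[] args) {
        int buckets = 16;
        for (String s : new String[] {"bruce", "alice", "Aa", "BB"}) {
            System.out.println(s + " -> bucket " + bucketIndex(s.hashCode(), buckets));
        }
    }
}
```

Because "Aa" and "BB" share a hash code, they always end up in the same bucket, which is why a bucket must be able to hold more than one node.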
For example, when we call add(Object o) on a HashSet, it first uses the return value of o.hashCode() to locate the corresponding bucket. If the bucket is empty, o is stored there. If the bucket already holds nodes, HashSet compares o against them with equals(): if an equal object is already present, o is not added again; otherwise o is added to that bucket's linked list. Similarly, when contains(Object o) is called, Java locates the bucket through the return value of hashCode() and then calls equals() on each node of that bucket's linked list in turn to decide whether it holds the object you are looking for.
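The add/contains walk-through above can be modeled with a toy set. ToyHashSet is a hypothetical, deliberately simplified class: an array-like list of buckets, each a LinkedList, with no resizing and no null support (java.util.HashSet handles both). It shows the two roles clearly: hashCode() picks the bucket, and equals() decides membership inside it.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical, simplified hash set: a fixed number of buckets, each a linked list.
class ToyHashSet<E> {
    private final List<LinkedList<E>> buckets = new ArrayList<>();

    // bucketCount must be positive.
    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<E>());
        }
    }

    // hashCode() chooses the bucket; floorMod keeps negative hash codes in range.
    private LinkedList<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    boolean add(E e) {
        LinkedList<E> bucket = bucketFor(e);
        for (E existing : bucket) {
            if (existing.equals(e)) return false; // an equal element is already stored
        }
        bucket.addLast(e); // append to the end of this bucket's chain
        return true;
    }

    boolean contains(Object o) {
        // Only the bucket chosen by hashCode() is searched, node by node, with equals().
        for (E existing : bucketFor(o)) {
            if (existing.equals(o)) return true;
        }
        return false;
    }
}

class ToyHashSetDemo {
    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(8);
        System.out.println(set.add("bruce"));      // true
        System.out.println(set.add("bruce"));      // false, already present
        System.out.println(set.contains("bruce")); // true
        System.out.println(set.contains("alice")); // false
    }
}
```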
Let's use an example to see this process in action:
Let's create two new Coder objects first:
```java
Coder c1 = new Coder("bruce", 10);
Coder c2 = new Coder("bruce", 10);
```
Assume that we have overridden Coder's equals() method but not its hashCode() method:
```java
@Override
public boolean equals(Object other) {
    System.out.println("equals method invoked!");
    if (other == this) return true;
    if (!(other instanceof Coder)) return false;
    Coder o = (Coder) other;
    return o.name.equals(name) && o.age == age;
}
```
Then we construct a HashSet and put c1 into it:
```java
Set<Coder> set = new HashSet<Coder>();
set.add(c1);
```
Then execute:
System.out.println(set.contains(c2));
We expect contains(c2) to return true, but in practice it returns false, and "equals method invoked!" is typically never printed.
c1 and c2 have the same name and age, so why does contains(c2) return false after c1 has been put into the HashSet? The culprit is hashCode(). Because we did not override it, c1 and c2 keep the default hash codes inherited from Object, which almost always differ for two distinct objects, so HashSet looks for c2 in a different bucket. For example, if c1 was put into bucket 05, the search for c2 goes to bucket 06, where it obviously cannot be found. The purpose of overriding hashCode() is therefore to guarantee that whenever A.equals(B) returns true, A.hashCode() and B.hashCode() return the same value.
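The whole experiment fits in one sketch. The names EqualsOnlyCoder, FixedCoder and HashSetLookupDemo are hypothetical: the first mirrors the equals()-only Coder above, and the second adds the 17/31 hashCode() from earlier in this article. The first lookup is not strictly guaranteed to fail, since two identity hash codes can coincide by chance, but in practice it prints false; the second lookup prints true.

```java
import java.util.HashSet;
import java.util.Set;

// Overrides equals() only, so equal objects keep their (usually different) identity hash codes.
class EqualsOnlyCoder {
    final String name;
    final int age;
    EqualsOnlyCoder(String name, int age) { this.name = name; this.age = age; }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof EqualsOnlyCoder)) return false;
        EqualsOnlyCoder o = (EqualsOnlyCoder) other;
        return o.name.equals(name) && o.age == age;
    }
}

// Same equals(), plus a hashCode() built from the same fields.
class FixedCoder {
    final String name;
    final int age;
    FixedCoder(String name, int age) { this.name = name; this.age = age; }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof FixedCoder)) return false;
        FixedCoder o = (FixedCoder) other;
        return o.name.equals(name) && o.age == age;
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = result * 31 + name.hashCode();
        result = result * 31 + age;
        return result;
    }
}

class HashSetLookupDemo {
    public static void main(String[] args) {
        Set<EqualsOnlyCoder> broken = new HashSet<>();
        broken.add(new EqualsOnlyCoder("bruce", 10));
        // Usually false: the lookup goes to the bucket of a different hash code.
        System.out.println(broken.contains(new EqualsOnlyCoder("bruce", 10)));

        Set<FixedCoder> fixed = new HashSet<>();
        fixed.add(new FixedCoder("bruce", 10));
        // true: equal objects now produce equal hash codes.
        System.out.println(fixed.contains(new FixedCoder("bruce", 10)));
    }
}
```

If you prefer less typing, java.util.Objects.hash(name, age) (Java 7+) builds a hash code from the same fields. It returns different numbers than the 17/31 version because its internal starting value is 1, but it satisfies the same contract and also tolerates a null name.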
Should hashCode() simply return a fixed number every time?
Someone might override it like this:
```java
@Override
public int hashCode() {
    return 10;
}
```
If you do this, HashMap, HashSet and the other hash-based collections lose the benefit of hashing: in the words of Effective Java, the hash table degenerates into a linked list. If hashCode() returns the same number every time, all objects land in the same bucket, and every lookup has to traverse that bucket's list, which throws away the whole point of hashing. Such a hashCode() is technically legal, since equal objects still get equal hash codes, but it is far better to provide a robust hashCode() that spreads objects across buckets.
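To see that a constant hashCode() is legal but wasteful, the sketch below stores 1000 objects whose hashCode() always returns 10. ConstantHashCoder and ConstantHashDemo are hypothetical names. The set still gives correct answers, because equal objects still share a hash code, but every element sits in the same bucket, so lookups must sort through colliding entries instead of jumping straight to a short chain.

```java
import java.util.HashSet;
import java.util.Set;

// Legal but poor: every instance reports the same hash code.
class ConstantHashCoder {
    final String name;
    final int age;
    ConstantHashCoder(String name, int age) { this.name = name; this.age = age; }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof ConstantHashCoder)) return false;
        ConstantHashCoder o = (ConstantHashCoder) other;
        return o.name.equals(name) && o.age == age;
    }

    @Override
    public int hashCode() {
        return 10; // every object lands in the same bucket
    }
}

class ConstantHashDemo {
    public static void main(String[] args) {
        Set<ConstantHashCoder> set = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            set.add(new ConstantHashCoder("coder" + i, i));
        }
        // Still correct: equals() tells the colliding entries apart...
        System.out.println(set.size());                                           // 1000
        System.out.println(set.contains(new ConstantHashCoder("coder500", 500))); // true
        // ...but every lookup has to search the single shared bucket.
    }
}
```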