Overview
In the Java language, a data collection framework is provided, which defines some abstract data types such as List and Set. Each abstract data type has its own specific implementation, and the underlying layer adopts different implementation methods, such as ArrayList and LinkedList.
In addition, Java also provides several different ways to traverse data sets. Developers must clearly understand the characteristics, applicable occasions, and performance in different underlying implementations. Let’s analyze this content in detail below.
How are data elements stored in memory?
There are two main storage methods for data elements in memory:
1. Sequential storage, Random Access (Direct Access):
In this way, adjacent data elements are stored in adjacent memory addresses, and the entire memory address is continuous. The memory address can be directly calculated based on the location of the element and read it directly. The average time complexity of reading an element at a specific location is O(1). Normally, only sets implemented based on arrays have this feature. ArrayList is represented in Java.
2. Chain storage, Sequential Access:
In this way, each data element is not required to be in an adjacent position in memory, and each data element contains the memory address of its next element. The memory address cannot be calculated directly based on the location of the element, and the elements can only be read in order. The average time complexity of reading an element at a specific location is O(n). It is mainly represented by linked lists.
LinkedList is represented in Java.
What are the traversal methods provided in Java?
1. Traditional for loop traversal, based on counter:
The traverser itself maintains a counter outside the collection, and then reads the elements at each position in sequence, and stops when the last element is read. The main thing is to read the element according to its position. This is also the most primitive collection traversal method.
The writing method is:
for (int i = 0; i < list.size(); i++) {list.get(i);}2. Iterator traversal, Iterator:
Iterator was originally an OO design pattern, with its main purpose being to block the characteristics of different data sets and to traverse the collection's interfaces uniformly. As an OO language, Java naturally supports the Iterator mode in Collections.
The writing method is:
Iterator iterator = list.iterator(); while (iterator.hasNext()) {iterator.next();}3. Foreach loop traversal:
Iterator and counters that are explicitly declared are blocked.
Advantages: The code is concise and not prone to errors.
Disadvantages: Only simple traversals can be done, and data collections cannot be operated (delete, replace) during the traversal process.
The writing method is:
for (ElementType element : list) {}What is the implementation principle of each traversal method?
1. Traditional for loop traversal, based on counter:
The traverser itself maintains a counter outside the collection, and then reads the elements at each position in sequence, and stops when the last element is read. The main thing is to read the element according to its position.
2. Iterator traversal, Iterator:
Each specific implementation data set generally needs to provide a corresponding Iterator. Compared with traditional for loops, Iterator bans explicit traversal counters. Therefore, iterator based on sequential storage of collections can directly access data by location. The Iterator based on chain storage sets, normal implementations require the current traversal location to be saved. Then move the pointer forward or backward according to the current position.
3. Foreach loop traversal:
According to the decompiled bytecode, it can be found that foreach also uses the Iterator method to implement it, but the Java compiler has helped us generate these codes.
How does each traversal method perform for different storage methods?
1. Traditional for loop traversal, based on counter:
Because it is based on the position of the element, read by position. So we can know that for sequential storage, because the average time complexity of reading elements at a particular location is O(1), the average time complexity of traversing the entire set is O(n). For chain storage, because the average time complexity of reading elements at a specific location is O(n), the average time complexity of traversing the entire set is O(n2) (the square of n).
ArrayList code read by position: read directly by element position.
transient Object[] elementData;public E get(int index) {rangeCheck(index);return elementData(index);}E elementData(int index) {return (E) elementData[index];}LinkedList code read by position: each time you need to read backwards from the 0th element. In fact, it has also made small optimizations internally.
transient int size = 0;transient Node<E> first;transient Node<E> last;public E get(int index) {checkElementIndex(index);return node(index).item;}Node<E> node(int index) {if (index < (size >> 1)) { //The query position is in the first half of the linked list, and look for Node<E> x = first;for (int i = 0; i < index; i++)x = x.next;return x;} else { //The query position is in the second half of the linked list, and look for Node<E> x = last;for (int i = size - 1; i > index; i--)x = x.prev;return x;}}2. Iterator traversal, Iterator:
Then, for collections of RandomAccess type, there is not much meaning. Instead, some additional operations will add additional running time. However, it is of great significance for the Sequential Access collection, because iterator maintains the current traversal position, so every time you traversal, you do not need to start looking up from the first element of the collection. Just move the pointer one by one. In this way, the time complexity of traversing the entire set is reduced to O(n);
(This is only used LinkedList as an example) The iterator of LinkedList is implemented internally, which is to maintain the current traversal position, and then operate the pointer to move:
Code:
public E next() {checkForComodification();if (!hasNext())throw new NoSuchElementException();lastReturned = next;next = next.next;nextIndex++;return lastReturned.item;}public E previous() {checkForComodification();if (!hasPrevious())throw new NoSuchElementException();lastReturned = next = (next == null) ? last : next.prev;nextIndex--;return lastReturned.item;}3. Foreach loop traversal:
By analyzing the Java bytecode, we can see that the internal implementation principle of foreach is also implemented through Iterator, but this Iterator is generated by the Java compiler for us, so we don’t need to write it manually. However, because type conversion checks are required every time, it takes a little longer than Iterator. The time complexity is the same as that of Iterator.
Bytecode using Iterator:
Code:new # // class java/util/ArrayListdupinvokespecial # // Method java/util/ArrayList."<init>":()Vastore_aload_invokeinterface #, // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;store_goto aload_invokeinterface #, // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;popaload_invokeinterface #, // InterfaceMethod java/util/Iterator.hasNext:()Zifne return
Use foreach's bytecode:
Code:new # // class java/util/ArrayListdupinvokespecial # // Method java/util/ArrayList."<init>":()Vastore_aload_invokeinterface #, // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;store_goto aload_invokeinterface #, // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;checkcast # // class loop/Modelastore_aload_invokeinterface #, // InterfaceMethod java/util/Iterator.hasNext:()Zifne return
In what occasions do each traversal method apply?
1. Traditional for loop traversal, based on counter:
Sequential storage: high read performance. Suitable for traversal sequential storage collections.
Chain storage: The time complexity is too high and is not suitable for traversing collections of chain storage.
2. Iterator traversal, Iterator:
Sequential storage: If you don’t care too much about time, it is recommended to choose this method. After all, the code is more concise and also prevents the problem of Off-By-One.
Chain storage: It is of great significance. The average time complexity is reduced to O(n), which is quite attractive, so this traversal method is recommended.
3. Foreach loop traversal:
Foreach only makes the code more concise, but it has some disadvantages, that is, it cannot operate data collections (deletion, etc.) during the traversal process, so it is not used in some occasions. Moreover, it is implemented based on Iterator, but due to the problem of type conversion, it will be a little slower than using Iterator directly. Fortunately, the time complexity is the same. So how to choose, refer to the above two methods to make a compromise choice.
What are the best practices for Java?
In the Java data collection framework, a RandomAccess interface is provided, which has no methods, just a tag. It is usually used by the implementation of the List interface to mark whether the implementation of the List supports Random Access.
A data set implements this interface, which means it supports Random Access, and the average time complexity of reading elements by location is O(1). For example, ArrayList.
If this interface is not implemented, it means that Random Access is not supported. For example, LinkedList.
So it seems that JDK developers have also noticed this problem. The recommended way is to first determine whether Random Access is supported, that is, list instanceof RandomAccess.
for example:
if (list instanceof RandomAccess) {//Use traditional for loop traversal. } else {//Use Iterator or foreach. }The above is the analysis of Java traversal collection method (implementation principle, algorithm performance, and applicable occasions) introduced by the editor. I hope it will be helpful to everyone!