Java developers waited a long time for lambdas to bring closures to the language, but much of their value would be lost if they could not be used with collections. The problem of migrating existing interfaces to a lambda-friendly style has been solved by default methods. This article takes a close look at bulk data operations in the Java collections framework, where lambdas show their greatest strength.
1. About JSR 335
JSR stands for Java Specification Request. The main improvement in Java 8 is Project Lambda (JSR 335), which aims to make it easier to write Java code for multi-core processors. JSR 335 = lambda expressions + interface improvements (default methods) + bulk data operations. Together with the previous two articles, this completes our coverage of JSR 335.
2. External vs. internal iteration
In the past, Java collections could not express internal iteration; they offered only external iteration, that is, a for or while loop.
List<Person> persons = asList(new Person("Joe"), new Person("Jim"), new Person("John"));
for (Person p : persons) {
    p.setLastName("Doe");
}
The example above is the traditional approach, known as external iteration. The loop runs in a fixed sequential order. In today's multi-core era, running it in parallel means rewriting the code; how much faster it would become is uncertain, and the change introduces risks such as thread-safety problems.
To express internal iteration, we need support from the class library in addition to lambdas. Let's rewrite the loop above using a lambda and Collection.forEach.
persons.forEach(p -> p.setLastName("Doe"));
Now the JDK library controls the loop. We no longer need to care how the last name is set on each Person object; the library can decide how to do it according to the runtime environment: in parallel, out of order, or lazily. This is internal iteration: the client passes the behavior p.setLastName to the API as data.
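The idea of passing behavior as data can be made concrete with a small sketch. The Person class here is a hypothetical stand-in (the article does not show its definition), and the Consumer variable name is chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class InternalIterationDemo {
    // Hypothetical Person class, assumed here only to make the example compile.
    static class Person {
        private final String firstName;
        private String lastName;

        Person(String firstName) { this.firstName = firstName; }
        void setLastName(String lastName) { this.lastName = lastName; }
        String getFullName() { return firstName + " " + lastName; }
    }

    static List<Person> renameAll() {
        List<Person> persons = Arrays.asList(
                new Person("Joe"), new Person("Jim"), new Person("John"));
        // The behavior is an ordinary object that can be stored and passed around.
        Consumer<Person> setDoe = p -> p.setLastName("Doe");
        // Internal iteration: forEach drives the loop and applies the behavior.
        persons.forEach(setDoe);
        return persons;
    }

    public static void main(String[] args) {
        renameAll().forEach(p -> System.out.println(p.getFullName()));
    }
}
```

Storing the lambda in a Consumer variable is not required; it only makes visible that the argument to forEach is a value like any other.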
Strictly speaking, internal iteration is only loosely related to bulk operations on collections; it mainly lets us get a feel for the change in syntax. What is really interesting for bulk operations is the new Stream API, provided by the java.util.stream package added in JDK 8.
3. Stream API
A Stream represents only a flow of data and is not a data structure that stores it, so once it has been traversed it cannot be traversed again. Keep this in mind when programming: unlike a Collection, which still holds its data no matter how many times it is traversed, a Stream is single-use. The source of a Stream can be a Collection, an array, an I/O channel, and so on.
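The single-use rule can be observed directly. The following is a minimal sketch, assuming the reference JDK behavior of throwing IllegalStateException when a consumed stream is reused; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class SingleUseStreamDemo {
    // Returns true if traversing the same Stream object twice is rejected.
    static boolean secondTraversalFails() {
        List<String> names = Arrays.asList("Joe", "Jim", "John");
        Stream<String> stream = names.stream();
        stream.forEach(n -> { });        // first traversal consumes the stream
        try {
            stream.forEach(n -> { });    // second traversal of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;                 // the stream was already operated upon
        }
    }

    // A collection keeps its data and hands out a fresh stream each time.
    static long traverseCollectionTwice() {
        List<String> names = Arrays.asList("Joe", "Jim", "John");
        return names.stream().count() + names.stream().count();
    }

    public static void main(String[] args) {
        System.out.println("second traversal fails: " + secondTraversalFails());
        System.out.println("elements seen in two traversals: " + traverseCollectionTwice());
    }
}
```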
3.1 Intermediate and terminal methods
Streams provide an interface for operating on large amounts of data, making such operations easier and faster. They offer methods such as filtering and mapping, and they reduce the number of traversals needed. These methods fall into two kinds: intermediate methods and terminal methods. A stream pipeline is meant to be chained: intermediate methods always return a Stream, so to obtain a final result we must use a terminal operation, which collects the results produced by the stream. To tell the two kinds apart, look at the return value: if it is a Stream, the method is intermediate; otherwise it is terminal. See the Stream API documentation for details.
The following sections briefly introduce a few intermediate methods (filter, map) and terminal methods (count, collect).
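One practical consequence of this split is that intermediate methods are lazy: they only describe the pipeline, and nothing runs until a terminal method is called. The sketch below records the order of events in a log; the class name and log messages are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        // filter is an intermediate method: it returns a Stream and does no work yet.
        Stream<Integer> evens = Arrays.asList(1, 2, 3, 4).stream()
                .filter(n -> {
                    log.add("filter " + n);
                    return n % 2 == 0;
                });
        log.add("pipeline built");
        // collect is a terminal method: it runs the pipeline and returns a List.
        List<Integer> result = evens.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The "pipeline built" entry appears before any "filter" entry, which shows that the predicate is evaluated only once collect starts pulling elements.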
3.1.1 Filter
Filtering is the most natural operation to think of for a data stream. The Stream interface exposes a filter method that accepts a Predicate; the filter condition can be written as a lambda expression.
List<Person> persons = …
Stream<Person> personsOver18 = persons.stream().filter(p -> p.getAge() > 18); // keep people over 18 years old
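Because the filter condition is a Predicate object, it can be named, reused, and combined. A minimal sketch follows; the Person class and the sample data are hypothetical, and negate() is a standard default method on Predicate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterDemo {
    // Hypothetical Person class, assumed for illustration only.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) { this.name = name; this.age = age; }
        int getAge() { return age; }
        @Override public String toString() { return name; }
    }

    static final List<Person> PERSONS = Arrays.asList(
            new Person("Joe", 17), new Person("Jim", 18), new Person("John", 42));

    // The same condition as in the article, stored in a variable.
    static final Predicate<Person> OVER_18 = p -> p.getAge() > 18;

    static List<Person> over18() {
        return PERSONS.stream().filter(OVER_18).collect(Collectors.toList());
    }

    static List<Person> notOver18() {
        // negate() builds the opposite condition from the existing predicate.
        return PERSONS.stream().filter(OVER_18.negate()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("over 18: " + over18());
        System.out.println("18 or younger: " + notOver18());
    }
}
```

Note that a person aged exactly 18 does not pass the `> 18` condition.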
3.1.2 Map
Suppose that after filtering some data we want to convert the objects to another type. The map operation lets us apply an implementation of Function<T, R> (the type parameters T and R are the input type and the result type, respectively), which accepts an input argument and returns the converted result. First, let's see how this looks with an anonymous inner class:
Stream<Adult> adult = persons
    .stream()
    .filter(p -> p.getAge() > 18)
    .map(new Function<Person, Adult>() {
        @Override
        public Adult apply(Person person) {
            return new Adult(person); // convert a person over 18 years old to an adult
        }
    });
Now, convert the above example into a lambda expression:
Stream<Adult> map = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person));
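The lambda can be shortened one step further with a constructor reference, since `person -> new Adult(person)` does nothing but call a constructor. The sketch below assumes hypothetical Person and Adult classes; only the constructor Adult(Person) is implied by the article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapDemo {
    // Hypothetical classes, assumed for illustration only.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static class Adult {
        private final Person person;

        Adult(Person person) { this.person = person; }
        String getName() { return person.getName(); }
    }

    static List<String> sampleAdultNames() {
        List<Person> persons = Arrays.asList(
                new Person("Joe", 17), new Person("Jim", 30), new Person("John", 42));
        return persons.stream()
                .filter(p -> p.getAge() > 18)
                .map(Adult::new)        // same as person -> new Adult(person)
                .map(Adult::getName)    // a second map: Adult -> String
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sampleAdultNames());
    }
}
```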
3.1.3 Count
The count method is a terminal method of a stream. It counts the elements that the stream produces and returns a long. For example, let's count the people over 18:
long countOfAdult = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .count();
3.1.4 Collect
The collect method is also a terminal method of a stream; it gathers the final results into a container.
List<Adult> adultList = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .collect(Collectors.toList());
Or if we want to use a specific implementation class to collect the results:
List<Adult> adultList = persons
    .stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .collect(Collectors.toCollection(ArrayList::new));
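The choice of collector determines the shape of the result. The following sketch uses plain strings instead of the article's Person and Adult classes to stay short; the sample names are made up, and the three collectors shown (toList, toCollection, joining) are standard members of java.util.stream.Collectors.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class CollectDemo {
    static final List<String> NAMES = Arrays.asList("John", "Joe", "Jim", "Joe");

    // Keeps encounter order and duplicates.
    static List<String> asList() {
        return NAMES.stream().collect(Collectors.toList());
    }

    // A specific implementation class: TreeSet sorts and removes duplicates.
    static Set<String> asSortedSet() {
        return NAMES.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    // Collecting does not have to produce a collection at all.
    static String joined() {
        return NAMES.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(asList());
        System.out.println(asSortedSet());
        System.out.println(joined());
    }
}
```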
For reasons of space, the other intermediate and terminal methods are not covered one by one. After the examples above, you only need to understand the difference between the two kinds of methods and can then choose among them as needed.
3.2 Sequential and parallel streams
Every Stream can run in one of two modes: sequential or parallel.
Sequential stream:
List<Person> people = list.stream().collect(Collectors.toList());
Parallel stream:
List<Person> people = list.stream().parallel().collect(Collectors.toList());
As the names suggest, a sequential traversal finishes reading each item before reading the next one. In a parallel traversal, the data is divided into several segments, each segment is processed in a different thread, and the partial results are then combined.
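The two modes can be compared side by side. In this sketch (class and method names are illustrative) the same pipeline is run sequentially and in parallel; the collected result is the same, while the set of threads that touched the elements may differ. How many threads a parallel run actually uses depends on the machine, so the example makes no claim about that number.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelModeDemo {
    // Squares 0..9 in the requested mode; toList keeps the encounter order.
    static List<Integer> squares(boolean parallel) {
        IntStream numbers = IntStream.range(0, 10);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.map(n -> n * n).boxed().collect(Collectors.toList());
    }

    // Records the names of the threads that processed the elements.
    static Set<String> threadsUsed(boolean parallel) {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        IntStream numbers = IntStream.range(0, 100_000);
        if (parallel) {
            numbers = numbers.parallel();
        }
        numbers.forEach(n -> threads.add(Thread.currentThread().getName()));
        return threads;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + squares(false));
        System.out.println("parallel:   " + squares(true));
        System.out.println("threads (sequential): " + threadsUsed(false));
        System.out.println("threads (parallel):   " + threadsUsed(true));
    }
}
```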
3.2.1 Principle of parallel streams
The idea can be sketched in pseudocode (conceptual only, not compilable Java):
List originalList = someData;
split1 = originalList(0, mid);       // divide the data into small parts
split2 = originalList(mid, end);
new Runnable(split1.process());      // process each part in its own thread
new Runnable(split2.process());
List revisedList = split1 + split2;  // merge the results
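A runnable version of the same split, process, and merge idea might look as follows. This is only a hand-written sketch with a two-thread executor, not how the JDK implements parallel streams internally; the process step (doubling each element) and all names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

class SplitAndMergeDemo {
    // Hypothetical "process" step: double every element of one part.
    static List<Integer> process(List<Integer> part) {
        return part.stream().map(n -> n * 2).collect(Collectors.toList());
    }

    static List<Integer> processInTwoHalves(List<Integer> originalList) {
        int mid = originalList.size() / 2;
        List<Integer> split1 = originalList.subList(0, mid);                   // divide
        List<Integer> split2 = originalList.subList(mid, originalList.size());

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> part1 = pool.submit(() -> process(split1));  // process
            Future<List<Integer>> part2 = pool.submit(() -> process(split2));
            List<Integer> revisedList = new ArrayList<>(part1.get());          // merge
            revisedList.addAll(part2.get());
            return revisedList;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    static List<Integer> demo() {
        return processInTwoHalves(Arrays.asList(1, 2, 3, 4, 5));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Merging the halves in their original order keeps the result identical to a sequential run, which is the property the stream library also preserves for ordered sources.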
3.2.2 Comparison of sequential and parallel performance tests
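On a multi-core machine, a parallel stream can in theory be faster than a sequential stream by up to roughly the number of cores, although the actual gain depends on the workload and on the overhead of splitting and merging. The following is the test code: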
// requires: import java.util.stream.IntStream;
long t0 = System.nanoTime();
// Create an integer stream over a range of 1 million and keep the numbers divisible by 2; toArray() is the terminal method
int[] a = IntStream.range(0, 1_000_000).filter(p -> p % 2 == 0).toArray();
long t1 = System.nanoTime();
// Same computation as above, this time using a parallel stream
int[] b = IntStream.range(0, 1_000_000).parallel().filter(p -> p % 2 == 0).toArray();
long t2 = System.nanoTime();
// On my local machine the result was serial: 0.06s, parallel 0.02s, so the parallel stream was faster than the sequential one in this run.
System.out.printf("serial: %.2fs, parallel %.2fs%n", (t1 - t0) * 1e-9, (t2 - t1) * 1e-9);
3.3 About the Fork/Join framework
Java 7 already made it possible to exploit hardware parallelism: one of the new features of the java.util.concurrent package is a fork/join-style framework for parallel decomposition. It is powerful and efficient, and interested readers can study it on their own; I won't go into details here. Between using that framework directly and calling Stream.parallel(), I prefer Stream.parallel().
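For readers who want a first impression of the difference, here is a minimal sketch that sums an array once with a hand-written RecursiveTask and once with a parallel stream. The threshold value, class names, and input range are arbitrary choices for the example, not recommendations.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.IntStream;

class ForkJoinSumDemo {
    // Sums data[from, to) by recursively splitting the range in half.
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // arbitrary cutoff, assumed for the example
        private final int[] data;
        private final int from;
        private final int to;

        SumTask(int[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {        // small enough: sum directly
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                          // run the left half asynchronously
            long rightSum = right.compute();      // compute the right half in this thread
            return left.join() + rightSum;        // wait for the left half and merge
        }
    }

    static long forkJoinSum() {
        int[] data = IntStream.range(0, 1_000_000).toArray();
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    static long streamSum() {
        int[] data = IntStream.range(0, 1_000_000).toArray();
        // The same computation expressed as a parallel stream.
        return IntStream.of(data).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        System.out.println("fork/join: " + forkJoinSum());
        System.out.println("stream:    " + streamSum());
    }
}
```

Both methods return the same sum; the stream version leaves the splitting and merging to the library, which is the convenience the preference above refers to.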
4. Summary
Without lambdas, the Stream API would be quite awkward to use, because it would require a large number of anonymous inner classes, as in the map example in section 3.1.2. Without default methods, changing the collections framework would inevitably have forced changes on every existing implementation. Lambdas together with default methods therefore make the JDK library more powerful and flexible, and the improvements to Stream and the collections framework are the best proof.