A Detailed Explanation of the Java 8 collect Method for Collecting Streams

Author: Eve Cole  Updated: 2025-08-13 17:00:03

Collection, Collections, collect, Collector, Collectors

Collection is the root interface of the Java collections hierarchy.
Collections is a utility class in the java.util package that contains various static methods for working with collections.
java.util.stream.Stream#collect(java.util.stream.Collector<? super T,A,R>) is the Stream method responsible for collecting a stream's elements.
java.util.stream.Collector is the interface that declares the functions a collector is made of (supplier, accumulator, combiner, finisher).
java.util.stream.Collectors is a utility class with a series of built-in collector implementations.

What a collector does

You can think of Java 8 streams as fancy, lazy iterators over a data set. They support two kinds of operations: intermediate operations (such as filter and map) and terminal operations (such as count, findFirst, forEach and reduce). Intermediate operations can be chained, each turning one stream into another. They do not consume the stream; their purpose is to build a pipeline. Terminal operations, in contrast, consume the stream and produce a final result. collect is a reduction operation, just like reduce: it accumulates the elements of the stream into a summary result, and how it does so is specified by the Collector passed to it as an argument.
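A minimal sketch of the intermediate/terminal distinction, using plain Strings rather than the Dish model; the class and method names are illustrative only. It shows that filter and map only describe the pipeline, that nothing runs until a terminal operation is called, and that a consumed stream cannot be reused.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PipelineDemo {
    // filter and map are intermediate operations: they only describe the pipeline.
    // collect is the terminal operation that actually runs it.
    static List<String> shortNamesUpper(List<String> names) {
        return names.stream()
                .filter(n -> n.length() <= 4)      // keep names of at most 4 characters
                .map(String::toUpperCase)          // transform each remaining name
                .collect(Collectors.toList());     // terminal: consumes the stream
    }

    // Intermediate operations are lazy: without a terminal operation, map never runs.
    static int mapCallsWithoutTerminal() {
        int[] calls = {0};
        Stream.of(1, 2, 3).map(x -> { calls[0]++; return x; });
        return calls[0];
    }

    // A stream can be consumed only once; a second terminal operation throws.
    static boolean reuseFails() {
        Stream<String> s = Stream.of("a", "b");
        s.count();
        try {
            s.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```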

Predefined collectors

The following briefly demonstrates the basic built-in collectors. The sample data source is:

 final ArrayList<Dish> dishes = Lists.newArrayList(
         new Dish("pork", false, 800, Type.MEAT),
         new Dish("beef", false, 700, Type.MEAT),
         new Dish("chicken", false, 400, Type.MEAT),
         new Dish("french fries", true, 530, Type.OTHER),
         new Dish("rice", true, 350, Type.OTHER),
         new Dish("season fruit", true, 120, Type.OTHER),
         new Dish("pizza", true, 550, Type.OTHER),
         new Dish("prawns", false, 300, Type.FISH),
         new Dish("salmon", false, 450, Type.FISH));
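The article does not show the Dish class or the Type enum. The following is a hypothetical reconstruction, assuming four fields (name, vegetarian, calories, type) with ordinary getters and a toString in the format printed later in the article. Arrays.asList is used instead of Guava's Lists.newArrayList so that the sketch needs only the JDK; Menu is an illustrative name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical reconstruction of the article's model; the original class is not shown.
enum Type { MEAT, FISH, OTHER }

class Dish {
    private final String name;
    private final boolean vegetarian;
    private final int calories;
    private final Type type;

    Dish(String name, boolean vegetarian, int calories, Type type) {
        this.name = name;
        this.vegetarian = vegetarian;
        this.calories = calories;
        this.type = type;
    }

    public String getName() { return name; }
    public boolean isVegetarian() { return vegetarian; }
    public int getCalories() { return calories; }
    public Type getType() { return type; }

    // Same shape as the output printed later in the article.
    @Override
    public String toString() {
        return "Dish(name=" + name + ", vegetarian=" + vegetarian
                + ", calories=" + calories + ", type=" + type + ")";
    }
}

class Menu {
    // The nine sample dishes from above, using only the JDK.
    static List<Dish> dishes() {
        return Arrays.asList(
                new Dish("pork", false, 800, Type.MEAT),
                new Dish("beef", false, 700, Type.MEAT),
                new Dish("chicken", false, 400, Type.MEAT),
                new Dish("french fries", true, 530, Type.OTHER),
                new Dish("rice", true, 350, Type.OTHER),
                new Dish("season fruit", true, 120, Type.OTHER),
                new Dish("pizza", true, 550, Type.OTHER),
                new Dish("prawns", false, 300, Type.FISH),
                new Dish("salmon", false, 450, Type.FISH));
    }
}
```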

Maximum value, minimum value, average value

 // Why return Optional? What if the stream is empty? That is exactly where Optional makes sense.
 Optional<Dish> mostCalorieDish = dishes.stream().max(Comparator.comparingInt(Dish::getCalories));
 Optional<Dish> minCalorieDish = dishes.stream().min(Comparator.comparingInt(Dish::getCalories));
 Double avgCalories = dishes.stream().collect(Collectors.averagingInt(Dish::getCalories));

 IntSummaryStatistics summaryStatistics = dishes.stream().collect(Collectors.summarizingInt(Dish::getCalories));
 double average = summaryStatistics.getAverage();
 long count = summaryStatistics.getCount();
 int max = summaryStatistics.getMax();
 int min = summaryStatistics.getMin();
 long sum = summaryStatistics.getSum();

Collectors has built-in collectors for these simple statistics. The numeric variants (summarizingInt, averagingInt and so on) work on primitive values, which avoids the boxing overhead of operating on wrapper types directly.
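To see the single-pass statistics on concrete numbers, here is a small sketch over the calorie values of the nine sample dishes, held as plain Integers so the block is self-contained (StatsDemo is an illustrative name).

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StatsDemo {
    // Calorie values of the nine sample dishes, as plain Integers.
    static final List<Integer> CALORIES =
            Arrays.asList(800, 700, 400, 530, 350, 120, 550, 300, 450);

    // summarizingInt gathers count, sum, min, max and average in one pass.
    static IntSummaryStatistics stats() {
        return CALORIES.stream().collect(Collectors.summarizingInt(Integer::intValue));
    }

    // averagingInt returns a Double.
    static double average() {
        return CALORIES.stream().collect(Collectors.averagingInt(Integer::intValue));
    }

    // For an empty stream, averagingInt yields 0.0 rather than an Optional.
    static double emptyAverage() {
        return Stream.<Integer>empty().collect(Collectors.averagingInt(Integer::intValue));
    }
}
```

For these values the statistics report a count of 9, a sum of 4200, a minimum of 120 and a maximum of 800.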

Joining collector

Want to concatenate the elements of a Stream into one string?

 // Concatenate directly
 String join1 = dishes.stream().map(Dish::getName).collect(Collectors.joining());
 // Separated by commas
 String join2 = dishes.stream().map(Dish::getName).collect(Collectors.joining(", "));
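A sketch of the three joining overloads on a few dish names (JoiningDemo is an illustrative name). The three-argument form adds a prefix and a suffix, and on an empty stream it yields just the prefix and suffix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class JoiningDemo {
    static final List<String> NAMES = Arrays.asList("pork", "beef", "chicken");

    // No separator: the names are simply concatenated.
    static String plain() {
        return NAMES.stream().collect(Collectors.joining());
    }

    // Separator placed between elements only.
    static String withComma() {
        return NAMES.stream().collect(Collectors.joining(", "));
    }

    // Separator plus a prefix and a suffix around the whole result.
    static String bracketed() {
        return NAMES.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // With no elements, only the prefix and suffix remain.
    static String emptyBracketed() {
        return Stream.<String>empty().collect(Collectors.joining(", ", "[", "]"));
    }
}
```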

toList

 List<String> names = dishes.stream().map(Dish::getName).collect(toList());

Map each Dish to its name, producing a stream of single values, and collect them into a List.

toSet

 Set<Type> types = dishes.stream().map(Dish::getType).collect(Collectors.toSet());

Collect the Types into a Set, which removes duplicates.
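A minimal sketch of the deduplication, assuming the dish types are represented as plain Strings. toCollection(TreeSet::new) is shown as one way to request a specific, sorted Set implementation, since toSet makes no promise about which Set it returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ToSetDemo {
    // Types of the nine sample dishes in order, as Strings; duplicates are expected.
    static final List<String> TYPES = Arrays.asList(
            "MEAT", "MEAT", "MEAT", "OTHER", "OTHER", "OTHER", "OTHER", "FISH", "FISH");

    // toSet drops duplicates; the Set implementation is unspecified.
    static Set<String> distinctTypes() {
        return TYPES.stream().collect(Collectors.toSet());
    }

    // toCollection picks a concrete Set, here a TreeSet with sorted iteration order.
    static TreeSet<String> sortedTypes() {
        return TYPES.stream().collect(Collectors.toCollection(TreeSet::new));
    }
}
```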

toMap

 Map<Type, Dish> byType = dishes.stream().collect(toMap(Dish::getType, d -> d));

Sometimes you need to convert a list into a Map, for example as a cache, so that values can be looked up repeatedly. toMap takes two functions: one that generates the key and one that generates the value. (Note that the demo above is a trap; do not use it like this! Several dishes share the same Type, so it throws an IllegalStateException on the duplicate key. Use toMap(Function, Function, BinaryOperator) instead.)
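The pitfall is easy to reproduce without the Dish model. In this sketch (names are illustrative) the key is the length of each name, so "pork", "beef" and "rice" collide on 4: the two-argument toMap throws IllegalStateException, while the three-argument overload resolves the collision with its merge function, which receives the existing value first and the incoming one second.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDemo {
    // "pork", "beef" and "rice" all have length 4, so the key 4 collides.
    static final List<String> NAMES = Arrays.asList("pork", "beef", "chicken", "rice");

    // Two-argument toMap: a duplicate key causes IllegalStateException.
    static boolean twoArgThrows() {
        try {
            NAMES.stream().collect(Collectors.toMap(String::length, Function.identity()));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Three-argument toMap: the merge function (existing, incoming) resolves collisions.
    static Map<Integer, String> joinedByLength() {
        return NAMES.stream().collect(
                Collectors.toMap(String::length, Function.identity(), (a, b) -> a + "," + b));
    }

    // Keep the first value seen for each key and ignore later ones.
    static Map<Integer, String> firstByLength() {
        return NAMES.stream().collect(
                Collectors.toMap(String::length, Function.identity(), (a, b) -> a));
    }
}
```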

These are the most commonly used collectors, and they cover most needs. For a beginner, though, understanding them takes time: to really see why they can collect, you have to look at the internal implementation. All of these collectors are built on java.util.stream.Collectors.CollectorImpl, a package-private implementation of the Collector interface mentioned at the beginning. Custom collectors are covered later.
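Since CollectorImpl is not public, the public Collector.of factory is the way to build a collector from the same pieces. As a sketch, here is a hand-rolled equivalent of toList; toArrayList is a hypothetical name. The supplier/accumulator/combiner form of Collector.of marks the collector IDENTITY_FINISH, so no finisher is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class CustomCollectorDemo {
    // Hypothetical hand-rolled equivalent of Collectors.toList().
    // This form of Collector.of implies IDENTITY_FINISH: the container is the result.
    static <T> Collector<T, List<T>, List<T>> toArrayList() {
        return Collector.of(
                ArrayList::new,                 // supplier: a fresh mutable container
                List::add,                      // accumulator: fold one element in
                (left, right) -> {              // combiner: merge two partial results
                    left.addAll(right);
                    return left;
                });
    }
}
```

Because each partial result gets its own list from the supplier, the same collector also works on a parallel stream.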

Custom Reduction

The collectors above are all special cases of a general reduction process. In fact, the Collectors.reducing factory method can create such a collector directly. For example, to compute a sum:

 Integer totalCalories = dishes.stream().collect(reducing(0, Dish::getCalories, (i, j) -> i + j));
 // Use a built-in method reference instead of a lambda
 Integer totalCalories2 = dishes.stream().collect(reducing(0, Dish::getCalories, Integer::sum));

Of course, you can also use reduce directly

 Optional<Integer> totalCalories3 = dishes.stream().map(Dish::getCalories).reduce(Integer::sum);

That works too, but if efficiency matters, prefer the following, which works on primitive ints and avoids boxing:

 int sum = dishes.stream().mapToInt(Dish::getCalories).sum();
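The three approaches from this section can be compared side by side on the sample calorie values (held as plain Integers for a self-contained sketch; SumDemo is an illustrative name). All three produce the same total; they differ in boxing and in whether the result is wrapped in an Optional.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class SumDemo {
    static final List<Integer> CALORIES =
            Arrays.asList(800, 700, 400, 530, 350, 120, 550, 300, 450);

    // 1. Collectors.reducing with identity 0; the mapper plays the role of Dish::getCalories.
    static int viaReducing() {
        return CALORIES.stream().collect(Collectors.reducing(0, Integer::intValue, Integer::sum));
    }

    // 2. Stream.reduce without an identity: Optional, because the stream could be empty.
    static Optional<Integer> viaReduce() {
        return CALORIES.stream().reduce(Integer::sum);
    }

    // 3. IntStream.sum: works on primitive ints, so intermediate sums are not boxed.
    static int viaIntStream() {
        return CALORIES.stream().mapToInt(Integer::intValue).sum();
    }
}
```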

Choose the best solution according to the situation

As shown above, functional programming usually offers several ways to perform the same operation. Using collect with a collector is more verbose than calling the Stream API's specialized methods directly; the advantage is that collect offers a higher level of abstraction and generality, and is easier to reuse and customize.

Our advice is to explore as many solutions to the problem at hand as possible and pick the most specialized one; that is generally the best decision for both readability and performance.

Instead of taking an initial value, reducing can also use the first element as the starting point:

 Optional<Dish> mostCalorieDish = dishes.stream() .collect(reducing((d1, d2) -> d1.getCalories() > d2.getCalories() ? d1 : d2));

Reducing

The signature of reducing looks complicated, but its goal is simple: repeatedly merge two values into one.

 public static <T, U> Collector<T, ?, U> reducing(U identity, Function<? super T, ? extends U> mapper, BinaryOperator<U> op)

First, note the three type parameters.

U is the type of the return value. In the calorie-sum demo above, U is Integer.

T is the element type of the Stream. From the Function type we can tell that mapper receives an argument of type T and returns a result of type U. In the demo, T is Dish.

The ? in the middle of the returned Collector's type parameters is the container (accumulation) type. A collector needs a container to store intermediate data, and ? means that this type is not exposed to the caller. In the JDK implementation, the container is actually a one-element array, U[].

About the parameters:

identity is the initial value of the return value type, which can be understood as the starting point of the accumulator.

mapper plays the role of map: it converts each stream element into the type you want to accumulate.

op is the core function: it defines how two values are combined. The first argument is the accumulated value (think of it as the running sum) and the second is the next element to fold in. Applying op element by element produces the accumulation.
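To make the roles of identity, mapper and op concrete, here is a sketch of a plain loop that computes the same result as reducing(identity, mapper, op) on a sequential stream. foldLikeReducing is a hypothetical helper, not a JDK method. In a parallel stream the identity may be used once per chunk, which is why it must be a true identity for op (0 for addition).

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class ReducingSemantics {
    // Hypothetical helper: a plain loop with the same meaning as
    // reducing(identity, mapper, op) on a sequential stream.
    static <T, U> U foldLikeReducing(List<T> items, U identity,
                                     Function<? super T, ? extends U> mapper,
                                     BinaryOperator<U> op) {
        U acc = identity;                              // start from identity
        for (T item : items) {
            acc = op.apply(acc, mapper.apply(item));   // fold each mapped element into acc
        }
        return acc;
    }

    static int totalLengthLoop(List<String> names) {
        return foldLikeReducing(names, 0, String::length, Integer::sum);
    }

    static int totalLengthCollector(List<String> names) {
        return names.stream().collect(Collectors.reducing(0, String::length, Integer::sum));
    }
}
```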

There is also an overload that omits the first parameter; it uses the first element of the Stream as the initial value.

 public static <T> Collector<T, ?, Optional<T>> reducing(BinaryOperator<T> op)

Note the difference in the return type. T is both the input type and the output type, so the element type and the result type are the same. The other difference is Optional: since there is no initial value, there is nothing to return when the Stream is empty, and returning an Optional makes that case explicit.

Only a BinaryOperator is left in the parameter list. BinaryOperator<T> is a specialization of BiFunction<T, T, T>: it takes two arguments of the same type and returns a value of that same type. Think of it as (a, b) -> a > b ? a : b, which returns the larger of two numbers. Finding the maximum is just an easy example; you can write any lambda to decide what to return. Here it receives two elements of the Stream's type T and returns a value of type T. Summing works just as well as a mental model.
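A small sketch of BinaryOperator in use, with illustrative names: one operator written as a lambda, and one built with the JDK's BinaryOperator.maxBy from a Comparator. Passed to the single-argument reducing, it returns an Optional that is empty for an empty stream.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

class BinaryOperatorDemo {
    // BinaryOperator<T> is a BiFunction<T, T, T>: both inputs and the result share one type.
    static final BinaryOperator<Integer> LARGER = (a, b) -> a > b ? a : b;

    // The JDK can also build one from a Comparator; on a tie it keeps the first argument.
    static final BinaryOperator<String> LONGER =
            BinaryOperator.maxBy(Comparator.comparingInt(String::length));

    // Single-argument reducing: the result is empty when the stream has no elements.
    static Optional<String> longest(List<String> names) {
        return names.stream().collect(Collectors.reducing(LONGER));
    }
}
```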

The demos above show that reduce and collect do nearly the same thing: both return a final result. For example, we can use reduce to emulate toList:

 // Manually implementing toList with reduce: an abuse of reduce, which is meant
 // for immutable reduction; this version cannot safely run in parallel
 List<Integer> calories = dishes.stream().map(Dish::getCalories)
         .reduce(new ArrayList<Integer>(),
                 (List<Integer> l, Integer e) -> { l.add(e); return l; },
                 (List<Integer> l1, List<Integer> l2) -> { l1.addAll(l2); return l1; });

Here is how the code above works, using this overload of reduce:

 <U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner);

U is the return type, here List<Integer>.

BiFunction<U, ? super T, U> accumulator is the accumulator: it defines how a single element is folded into the accumulated value. Here it takes the List and an element, adds the element to the list, and returns the List.

BinaryOperator<U> combiner is the combiner: it merges two values of the return type into one. Here it merges two Lists.

This solution has two problems, one semantic and one practical. The semantic problem is that reduce is meant to combine two values into a new value, an immutable reduction, whereas collect is designed to mutate a container to accumulate the result. The snippet above abuses reduce because it mutates the List used as the accumulator in place. Using reduce with the wrong semantics also causes a practical problem: the reduction cannot run in parallel, because concurrent modification of the same data structure by multiple threads may corrupt the List. To make it thread-safe, you would have to allocate a new List each time, and the extra object allocation hurts performance. That is why collect is the right tool for reductions over mutable containers and, more importantly, why it works in parallel.

Summary: reduce is for immutable reductions, collect is for reductions into mutable containers, and collect is the one that parallelizes safely.
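The idiomatic way to do the mutable reduction above is collect, either with a Collector or with the three-argument form shown in this sketch (the method name is illustrative). Each parallel chunk gets its own ArrayList from the supplier, and partial lists are merged by the combiner, so no list is shared between threads.

```java
import java.util.ArrayList;
import java.util.stream.IntStream;

class MutableReductionDemo {
    // Three-argument collect: supplier, accumulator, combiner.
    // Each parallel chunk fills its own ArrayList; partial lists are merged at the end.
    static ArrayList<Integer> collectParallel(int n) {
        ArrayList<Integer> result = IntStream.rangeClosed(1, n).boxed().parallel()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
        return result;
    }
}
```

For an ordered source such as rangeClosed, the merged list also keeps the encounter order.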

Grouping

Databases often need grouped sums and provide a GROUP BY primitive for them. In Java, doing this in imperative style (writing the loops by hand) is cumbersome and error-prone. Java 8 provides a functional solution.

For example, group dishes by type. This is similar to the earlier toMap, except that each group's value is not a single Dish but a List<Dish>.

 Map<Type, List<Dish>> dishesByType = dishes.stream().collect(groupingBy(Dish::getType));

Here is the signature:

 public static <T, K> Collector<T, ?, Map<K, List<T>>> groupingBy(Function<? super T, ? extends K> classifier)

The parameter classifier is a Function, which takes one argument and converts it to another type. The demo above converts each stream element (a Dish) into its Type and then groups the stream by that Type. Internally it delegates to groupingBy(classifier, HashMap::new, downstream), so the groups are stored in a HashMap.
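A sketch of the one-argument groupingBy next to the three-argument overload it delegates to, using String length as the classifier (an illustrative choice). Passing TreeMap::new instead of the default map gives keys in sorted order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingDemo {
    static final List<String> NAMES = Arrays.asList("pork", "beef", "chicken", "rice", "salmon");

    // One-argument groupingBy: key -> List of elements (a HashMap in the JDK implementation).
    static Map<Integer, List<String>> byLength() {
        return NAMES.stream().collect(Collectors.groupingBy(String::length));
    }

    // Three-argument overload: explicit map factory and downstream collector.
    // TreeMap::new keeps the keys sorted.
    static TreeMap<Integer, List<String>> byLengthSorted() {
        return NAMES.stream().collect(
                Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }
}
```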

Besides grouping by a property of the stream element itself, you can define your own grouping criterion, for example grouping by calorie range.

Since the parameter of groupingBy is a Function whose input type is Dish, you can define a custom classifier:

 private CaloricLevel getCaloricLevel(Dish d) {
     if (d.getCalories() <= 400) {
         return CaloricLevel.DIET;
     } else if (d.getCalories() <= 700) {
         return CaloricLevel.NORMAL;
     } else {
         return CaloricLevel.FAT;
     }
 }

Then pass it in as a method reference:

 Map<CaloricLevel, List<Dish>> dishesByLevel = dishes.stream() .collect(groupingBy(this::getCaloricLevel));
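A self-contained sketch of a custom classifier, assuming a CaloricLevel enum with the three constants used above (its definition is not shown in the article) and applying the same thresholds to raw calorie values instead of Dish objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CaloricLevelDemo {
    // Hypothetical enum matching the names used in the text.
    enum CaloricLevel { DIET, NORMAL, FAT }

    // Same thresholds as getCaloricLevel above, applied to a raw calorie value.
    static CaloricLevel level(int calories) {
        if (calories <= 400) {
            return CaloricLevel.DIET;
        } else if (calories <= 700) {
            return CaloricLevel.NORMAL;
        } else {
            return CaloricLevel.FAT;
        }
    }

    // Calorie values of the nine sample dishes.
    static final List<Integer> CALORIES =
            Arrays.asList(800, 700, 400, 530, 350, 120, 550, 300, 450);

    // Any method that maps an element to a key can serve as the classifier.
    static Map<CaloricLevel, List<Integer>> grouped() {
        return CALORIES.stream().collect(Collectors.groupingBy(CaloricLevelDemo::level));
    }
}
```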

Multi-level grouping

groupingBy also overloads several other methods, such as

 public static <T, K, A, D> Collector<T, ?, Map<K, D>> groupingBy(Function<? super T, ? extends K> classifier, Collector<? super T, A, D> downstream)

The generics look intimidating, so let's go through them briefly. classifier is again the classification function: it receives a stream element and returns the key you want to group by. So T is the current element type of the stream and K is the type of the group key. The second parameter, downstream, is itself a Collector: its element type is T or a supertype of T (? super T), its container type is A, and its result type is D. In other words, the classifier provides each group's key K, and the downstream collector reduces the elements of each group into its value. As it happens, the one-argument version used in the previous demo is implemented as:

 public static <T, K> Collector<T, ?, Map<K, List<T>>> groupingBy(Function<? super T, ? extends K> classifier) { return groupingBy(classifier, toList()); }

It uses toList as the downstream collector, which yields a List<Dish>, so each group's value type is List<Dish>. By the same reasoning, the value type is determined by whichever downstream collector you choose, and there are many to choose from. For example, you can group the values again, since grouping is itself a kind of reduction:

 // Multi-level grouping
 Map<Type, Map<CaloricLevel, List<Dish>>> byTypeAndCalory = dishes.stream().collect(
         groupingBy(Dish::getType, groupingBy(this::getCaloricLevel)));
 byTypeAndCalory.forEach((type, byCalory) -> {
     System.out.println("----------------------------------");
     System.out.println(type);
     byCalory.forEach((level, dishList) -> {
         System.out.println("\t" + level);
         System.out.println("\t\t" + dishList);
     });
 });

The output is:

 ----------------------------------
 ...
 [Dish(name=beef, vegetarian=false, calories=700, type=MEAT)]
 ----------------------------------
 ...
 ... type=OTHER)]

Summary: the core parameters of groupingBy are a key generator (the classifier) and a value generator (the downstream collector). The value generator can be any Collector.
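Because the value generator can be any collector, downstream collectors compose. A sketch with the nine dish names as illustrative data: group by first letter, then by whether the name's length is even, then count. It also shows that groupingBy only creates groups that actually receive elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MultiLevelDemo {
    static final List<String> NAMES = Arrays.asList("pork", "beef", "chicken", "french fries",
            "rice", "season fruit", "pizza", "prawns", "salmon");

    // Outer key: first letter; inner key: even length?; value: number of names in that cell.
    static Map<Character, Map<Boolean, Long>> countByInitialThenEvenLength() {
        return NAMES.stream().collect(
                Collectors.groupingBy(n -> n.charAt(0),
                        Collectors.groupingBy(n -> n.length() % 2 == 0, Collectors.counting())));
    }
}
```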

For example, the value generator can count elements, which implements the SQL statement select count(*) from table_a group by type:

 Map<Type, Long> typesCount = dishes.stream().collect(groupingBy(Dish::getType, counting()));
 System.out.println(typesCount);
 // Output: {FISH=2, MEAT=3, OTHER=4}

Finding the maximum of each group, like SQL select max(calories) from table_a group by type:

 Map<Type, Optional<Dish>> mostCaloricByType = dishes.stream() .collect(groupingBy(Dish::getType, maxBy(Comparator.comparingInt(Dish::getCalories))));

The Optional here is meaningless: groupingBy only creates a group for a key once at least one element maps to it, so each group's maximum always exists. So we unwrap it with collectingAndThen:

 Map<Type, Dish> mostCaloricByType = dishes.stream() .collect(groupingBy(Dish::getType, collectingAndThen(maxBy(Comparator.comparingInt(Dish::getCalories)), Optional::get)));

This looks like the final answer, but IntelliJ IDEA disagrees: it shows a yellow warning and suggests changing it to:

 Map<Type, Dish> mostCaloricByType = dishes.stream() .collect(toMap(Dish::getType, Function.identity(), BinaryOperator.maxBy(comparingInt(Dish::getCalories))));

Yes, groupingBy has become toMap: the key is still the Type and the value is still a Dish, but there is one extra parameter! This brings us back to the pitfall mentioned at the beginning. The toMap demo at the start was written for ease of understanding; used for real, it will blow up. Whenever you reorganize a List into a Map, you face the same question: when two elements have the same key, should the later value overwrite the earlier one or be ignored? The two-argument toMap used in that demo does neither; it throws an exception as soon as it meets a key that is already present:

 java.lang.IllegalStateException: Duplicate key Dish(name=pork, vegetable=false, calories=800, type=MEAT) at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)

The correct approach is to supply a function that resolves conflicts. In this demo the rule is to keep the larger of the two, which is exactly what grouping and finding the maximum requires. (Honestly, I almost gave up on learning Java 8's functional style; it feels like there are pitfalls, performance problems included, everywhere.)
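Both approaches from this section, sketched with the dish names instead of Dish objects (first letter as the key, longest name as the value; class and method names are illustrative): groupingBy with collectingAndThen(maxBy(...), Optional::get), and toMap with BinaryOperator.maxBy as the merge function. They should produce equal maps.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class MaxPerGroupDemo {
    static final List<String> NAMES = Arrays.asList("pork", "beef", "chicken", "french fries",
            "rice", "season fruit", "pizza", "prawns", "salmon");

    // groupingBy + maxBy gives Optional values; collectingAndThen unwraps them.
    // Optional::get is safe because groupingBy never creates an empty group.
    static Map<Character, String> viaGrouping() {
        return NAMES.stream().collect(Collectors.groupingBy(n -> n.charAt(0),
                Collectors.collectingAndThen(
                        Collectors.maxBy(Comparator.comparingInt(String::length)),
                        Optional::get)));
    }

    // toMap with a merge function: on a key collision, keep the longer name.
    static Map<Character, String> viaToMap() {
        return NAMES.stream().collect(Collectors.toMap(n -> n.charAt(0), Function.identity(),
                BinaryOperator.maxBy(Comparator.comparingInt(String::length))));
    }
}
```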

Continuing the SQL analogy, select sum(calories) from table_a group by type:

 Map<Type, Integer> totalCaloriesByType = dishes.stream() .collect(groupingBy(Dish::getType, summingInt(Dish::getCalories)));

Another collector often used together with groupingBy is produced by the mapping method. It takes two arguments: a function that transforms the elements of the stream, and a collector that collects the transformed results. Its purpose is to apply a mapping function to each input element before accumulation, so that a collector that accepts elements of one type can be adapted to objects of a different type. Let's look at a practical example: suppose you want to know which CaloricLevels appear in the menu for each type of Dish. You can combine the groupingBy and mapping collectors as follows:

 Map<Type, Set<CaloricLevel>> caloricLevelsByType = dishes.stream() .collect(groupingBy(Dish::getType, mapping(this::getCaloricLevel, toSet())));

toSet gives no guarantee about which Set implementation it returns (in practice it is a HashSet); if you need a specific one, request it with toCollection(HashSet::new).
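A sketch of mapping adapting the element type before collection, with the dish names as illustrative data: for each first letter, the distinct lengths of the names, collected into a TreeSet via toCollection so the iteration order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.stream.Collectors;

class MappingDemo {
    static final List<String> NAMES = Arrays.asList("pork", "beef", "chicken", "french fries",
            "rice", "season fruit", "pizza", "prawns", "salmon");

    // mapping turns each String into its length before the downstream collector sees it;
    // toCollection(TreeSet::new) keeps the distinct lengths in sorted order.
    static Map<Character, TreeSet<Integer>> lengthsByInitial() {
        return NAMES.stream().collect(Collectors.groupingBy(n -> n.charAt(0),
                Collectors.mapping(String::length, Collectors.toCollection(TreeSet::new))));
    }
}
```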

Partition

Partitioning is a special case of grouping: it uses a predicate (a function that returns a boolean) as the classification function, called the partitioning function. Because the partitioning function returns a boolean, the key type of the resulting Map is Boolean, so the elements fall into two groups: true and false. For example, a vegetarian might want to split the menu into vegetarian and non-vegetarian dishes:

 Map<Boolean, List<Dish>> partitionedMenu = dishes.stream().collect(partitioningBy(Dish::isVegetarian));

Of course, using filter can achieve the same effect:

 List<Dish> vegetarianDishes = dishes.stream().filter(Dish::isVegetarian).collect(Collectors.toList());

The advantage of partitioning is that it keeps both halves, the matching and the non-matching elements, in a single pass, whereas filter keeps only one of them. This is useful when you want to classify a list. Like groupingBy, partitioningBy also has an overload that lets you specify how each group's values are collected:
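A sketch contrasting partitioningBy with filter on illustrative integers (class and method names are hypothetical): a single pass yields both partitions, and in the JDK implementation both keys are present even when one side is empty.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionDemo {
    // One pass produces both halves: key true for even numbers, false for odd ones.
    static Map<Boolean, List<Integer>> evenOdd(List<Integer> xs) {
        return xs.stream().collect(Collectors.partitioningBy(x -> x % 2 == 0));
    }

    // The filter equivalent keeps only one half per pass.
    static List<Integer> evensOnly(List<Integer> xs) {
        return xs.stream().filter(x -> x % 2 == 0).collect(Collectors.toList());
    }
}
```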

 Map<Boolean, Map<Type, List<Dish>>> vegetarianDishesByType = dishes.stream()
         .collect(partitioningBy(Dish::isVegetarian, groupingBy(Dish::getType)));
 Map<Boolean, Integer> vegetarianDishesTotalCalories = dishes.stream()
         .collect(partitioningBy(Dish::isVegetarian, summingInt(Dish::getCalories)));
 Map<Boolean, Dish> mostCaloricPartitionedByVegetarian = dishes.stream()
         .collect(partitioningBy(Dish::isVegetarian,
                 collectingAndThen(maxBy(comparingInt(Dish::getCalories)), Optional::get)));

As a last example of the partitioningBy collector, let's set the menu data model aside and look at a more complex and interesting example: dividing numbers into primes and non-primes.

First, define a prime partition function:

 private boolean isPrime(int candidate) {
     int candidateRoot = (int) Math.sqrt((double) candidate);
     return IntStream.rangeClosed(2, candidateRoot).noneMatch(i -> candidate % i == 0);
 }

Then partition the numbers from 2 to 100 into primes and non-primes:

 Map<Boolean, List<Integer>> partitionPrimes = IntStream.rangeClosed(2, 100).boxed() .collect(partitioningBy(this::isPrime));
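A self-contained version of the prime example as static methods (the class name is illustrative), plus a variant that uses counting as the downstream collector. Note that isPrime(1) returns true, because rangeClosed(2, 1) is empty and noneMatch on an empty stream is true; this is one reason the range starts at 2.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimePartition {
    // Trial division up to the square root, as in isPrime above; meaningful for candidate >= 2.
    static boolean isPrime(int candidate) {
        int candidateRoot = (int) Math.sqrt((double) candidate);
        return IntStream.rangeClosed(2, candidateRoot).noneMatch(i -> candidate % i == 0);
    }

    static Map<Boolean, List<Integer>> partitionPrimes(int n) {
        return IntStream.rangeClosed(2, n).boxed()
                .collect(Collectors.partitioningBy(PrimePartition::isPrime));
    }

    // Downstream collector: count each partition instead of listing it.
    static Map<Boolean, Long> countPrimes(int n) {
        return IntStream.rangeClosed(2, n).boxed()
                .collect(Collectors.partitioningBy(PrimePartition::isPrime, Collectors.counting()));
    }
}
```

Between 2 and 100 this yields 25 primes and 74 non-primes.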