The garbage collection mechanism of the Java platform has greatly improved developer productivity, but a poorly performing garbage collector can consume too many application resources. In this third part of the Java virtual machine performance optimization series, Eva Andreasson introduces beginners to the Java platform's memory model and garbage collection mechanism. She explains why fragmentation (not garbage collection itself) is the main problem for Java application performance, and why generational garbage collection and compaction are currently the main (though not the most innovative) ways of dealing with fragmentation in Java applications.
The purpose of garbage collection (GC) is to release memory occupied by Java objects that are no longer referenced by any live object. It is the core of the Java virtual machine's dynamic memory management mechanism. During a typical garbage collection cycle, all objects that are still referenced (and therefore reachable) are retained, while those no longer referenced are released and the space they occupied is reclaimed for allocating new objects.
In order to understand the garbage collection mechanism and various garbage collection algorithms, you first need to know something about the Java platform memory model.
Garbage collection and the Java platform memory model
When you start a Java program from the command line with the startup parameter -Xmx (for example: java -Xmx2g MyApp), memory of the specified size is allocated to the Java process; this is the so-called Java heap. This dedicated memory address space stores objects created by the Java program (and sometimes by the JVM itself). As the application runs and keeps allocating memory for new objects, the Java heap slowly fills up.
Eventually the Java heap fills up, which means the allocating thread can no longer find a contiguous space large enough for a new object. At that point the JVM notifies the garbage collector and garbage collection begins. Garbage collection can also be triggered by calling System.gc() in the program, but calling System.gc() does not guarantee that a collection will actually run. Before any garbage collection, the mechanism first determines whether it is safe to collect: a collection can start only when all of the application's active threads are at a safe point. For example, garbage collection cannot run while memory is being allocated for an object, or while CPU instructions are being optimized, because the context could be lost and the results would be incorrect.
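A minimal sketch of the two points above (not an example from the original article): the heap size set by -Xmx fills as the application allocates, and System.gc() is only a hint to the JVM, not a command.

```java
// A minimal sketch: the heap fills as the application allocates, and
// System.gc() is only a hint -- the JVM may collect now, later, or not at all.
public class GcHintDemo {
    // Bytes currently used on the Java heap.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("max heap (-Xmx): " + Runtime.getRuntime().maxMemory());
        byte[][] garbage = new byte[1_000][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[10_000]; // allocate ~10 MB of soon-dead objects
        }
        long before = usedHeap();
        garbage = null;  // drop the only reference: the arrays are now unreachable
        System.gc();     // a hint only; no collection is guaranteed
        System.out.println("used before hint: " + before + ", after: " + usedHeap());
    }
}
```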
The garbage collector must never reclaim an object that still has a live reference; doing so would violate the Java virtual machine specification. It also does not have to reclaim dead objects immediately, because dead objects will eventually be collected by some subsequent garbage collection. Although there are many ways to implement garbage collection, these two points hold for all of them. The real challenge is identifying whether an object is alive and reclaiming memory while affecting the application as little as possible. The garbage collector therefore has two goals:
1. Quickly free unreferenced memory to satisfy the application's allocation needs and avoid out-of-memory errors.
2. Minimize the impact on the running application's performance (latency and throughput) while reclaiming memory.
Two types of garbage collection
In the first article of this series I introduced two approaches to garbage collection: reference counting and tracing collection. Here we explore the two approaches further and introduce some tracing collection algorithms used in production environments.
Reference counting collector
A reference counting collector records the number of references pointing to each Java object. Once an object's reference count reaches 0, the object can be reclaimed immediately. This immediacy is the main advantage of reference counting, and maintaining memory that no reference points to costs almost nothing; but keeping every object's reference count up to date is expensive.
The main difficulty with reference counting collectors is keeping the counts accurate. Another well-known difficulty is handling circular references: if two objects reference each other and are not referenced by any other live object, their memory is never reclaimed, because neither object's reference count ever drops to 0. Reclaiming circular structures requires deeper analysis (a global analysis of the Java heap), which increases the algorithm's complexity and thus adds overhead for the application.
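The circular-reference problem can be shown with a toy reference-counting model (this is purely illustrative; commercial Java JVMs use tracing collectors, not reference counting). Two mutually referencing objects keep each other's count above zero even after every outside reference is gone:

```java
import java.util.HashMap;
import java.util.Map;

// A toy reference-counting model (not how any real JVM works) showing why a
// cycle's counts never reach zero once external references are dropped.
public class RefCountCycle {
    static Map<String, Integer> counts = new HashMap<>();

    static void addRef(String obj)  { counts.merge(obj, 1, Integer::sum); }
    static void release(String obj) { counts.merge(obj, -1, Integer::sum); }

    public static void main(String[] args) {
        addRef("A"); addRef("B");   // one root variable points at A, another at B
        addRef("A"); addRef("B");   // A.field = B and B.field = A: the cycle
        release("A"); release("B"); // both root variables go out of scope
        // Both objects are now unreachable, yet each count is still 1 because
        // of the mutual references -- a pure reference counter never frees them.
        System.out.println(counts); // {A=1, B=1}
    }
}
```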
Tracing collector
The tracing collector is based on the assumption that all live objects can be found by iteratively following references (and references of references) from a known initial set of live objects. This initial set of live objects (also called root objects) is determined by analyzing registers, global objects, and stack frames. Starting from the roots, the tracing collector follows each object's references, marking each referenced object as live in turn, so that the set of known live objects keeps growing. The process continues until every referenced object has been marked live; the memory of the objects left unmarked is then reclaimed.
Tracing collectors differ from reference counting collectors mainly in that they can handle circular reference structures. Most tracing collectors discover the unreferenced objects of circular structures during the marking phase.
Tracing collectors are the most commonly used memory management approach in dynamic languages, by far the most common in Java, and have been validated in production environments for many years. I will introduce tracing collectors by starting with some of the algorithms used to implement them.
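The marking process described above can be sketched as a graph traversal (a simplified model, not a real JVM's implementation; object names and the reference graph are made up for illustration):

```java
import java.util.*;

// A minimal sketch of the marking phase: start from root objects, follow
// references, and mark everything reachable. Unmarked objects are garbage.
public class MarkSketch {
    static Map<String, List<String>> refs = new HashMap<>(); // object -> outgoing refs

    static Set<String> mark(List<String> roots) {
        Set<String> live = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (live.add(obj)) { // newly marked: trace its references too
                work.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return live;
    }

    public static void main(String[] args) {
        refs.put("root", List.of("a"));
        refs.put("a", List.of("b"));
        refs.put("c", List.of("d")); // c and d form an unreachable island,
        refs.put("d", List.of("c")); // even though they reference each other
        Set<String> live = mark(List.of("root"));
        System.out.println(live.containsAll(Set.of("root", "a", "b"))); // true
        System.out.println(live.contains("c")); // false: c is garbage despite the cycle
    }
}
```

Note how the cycle between "c" and "d" is never marked: this is exactly why tracing handles circular references that reference counting cannot.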
Tracing collection algorithms
Copying collectors and mark-sweep collectors are nothing new, but they are still the two most common algorithms for implementing tracing collection today.
Copying collectors
The traditional copying collector uses two address spaces in the heap: a from space and a to space. During garbage collection, live objects in the from space are copied to the to space. Once all live objects have been moved out of the from space (copied to the to space or promoted to the old generation), the entire from space can be reclaimed. When allocation resumes, the to space is used first (that is, the previous round's to space becomes the next round's from space).
In early implementations of this algorithm, the from space and the to space repeatedly swapped places: when the to space filled up and triggered a garbage collection, it became the from space, as shown in Figure 1.
Figure 1 Traditional copy garbage collection sequence
More recent copying algorithms allow any address space in the heap to serve as either to space or from space, so the two spaces never need to physically swap positions; they merely change roles logically.
The advantage of the copying collector is that the copied objects in the to space are arranged compactly and there is no fragmentation at all. Fragmentation is a common problem faced by other garbage collectors, and it is also the main issue I will discuss later.
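The compacting effect of evacuation can be sketched as follows (a toy model of semispace copying, with made-up object names; a real collector copies raw memory and forwards pointers):

```java
import java.util.*;

// A sketch of semispace copying: live objects are copied compactly into the
// to space; whatever stays behind in the from space is reclaimed wholesale.
public class CopyingSketch {
    record Obj(String name, boolean live) {}

    static List<Obj> evacuate(List<Obj> fromSpace) {
        List<Obj> toSpace = new ArrayList<>();
        for (Obj o : fromSpace) {
            if (o.live()) toSpace.add(o); // copy survivors, packed with no gaps
        }
        // fromSpace can now be reused in its entirety -- no per-object freeing
        return toSpace;
    }

    public static void main(String[] args) {
        List<Obj> from = List.of(
            new Obj("a", true), new Obj("b", false),
            new Obj("c", true), new Obj("d", false));
        List<Obj> to = evacuate(from);
        System.out.println(to.size()); // 2: only "a" and "c" were copied
    }
}
```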
Disadvantages of copying collectors
Generally speaking, copying collectors are stop-the-world: as long as a garbage collection is in progress, the application cannot execute. With this kind of implementation, the more data you have to copy, the greater the impact on application performance, which is a drawback for response-time-sensitive applications. When using a copying collector you must also plan for the worst case, in which every object in the from space is alive: you need enough room to move all of those live objects, so the to space must be large enough to hold everything in the from space. Because of this constraint, the copying algorithm's memory utilization is somewhat poor (in the worst case the to space must be the same size as the from space, for only 50% utilization).
Mark-sweep collectors
Most commercial JVMs deployed in enterprise production environments use a mark-sweep (or mark-and-sweep) collector, because it does not have the copying collector's impact on application performance. The best-known mark-sweep collectors include CMS, G1, GenPar, and DeterministicGC.
A mark-sweep collector traces object references and marks each object it finds as live using a flag. The flag usually corresponds to an address or a set of addresses on the heap; for example, the live bit may be a bit in the object header, or a bit in a separate bit vector or bitmap.
After marking completes, the sweep phase begins. The sweep phase usually traverses the heap again (not just the objects marked live, but the entire heap) to locate unmarked contiguous memory address ranges (unmarked memory is free and reclaimable), which the collector then links into free lists. A garbage collector can maintain multiple free lists, usually divided by memory block size; some JVM collectors (for example, JRockit Real Time) even partition the free lists dynamically, based on application profiling and object-size statistics.
After the sweep phase, the application can allocate memory again. When allocating memory for a new object from the free list, the newly allocated block needs to fit the new object's size, the thread's average object size, or the application's TLAB size. Finding an appropriately sized block for each new object helps optimize memory use and reduce fragmentation.
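The size-segregated free lists described above can be sketched like this (an illustrative model only; the size classes, addresses, and best-fit policy are assumptions, not any specific JVM's design):

```java
import java.util.*;

// A sketch of size-segregated free lists: after the sweep phase, free blocks
// are grouped by size so allocation can quickly find a fitting block.
public class FreeListSketch {
    // size class (block size in bytes) -> addresses of free blocks of that size
    static TreeMap<Integer, Deque<Integer>> freeLists = new TreeMap<>();

    static void free(int address, int size) {
        freeLists.computeIfAbsent(size, s -> new ArrayDeque<>()).add(address);
    }

    // Find the smallest free block that fits the request (a best-fit flavor).
    static int allocate(int size) {
        Map.Entry<Integer, Deque<Integer>> e = freeLists.ceilingEntry(size);
        if (e == null) return -1;              // nothing fits: would trigger a GC
        int address = e.getValue().pop();
        if (e.getValue().isEmpty()) freeLists.remove(e.getKey());
        return address;
    }

    public static void main(String[] args) {
        free(0x1000, 32);                  // sweep found a 32-byte hole
        free(0x2000, 128);                 // ... and a 128-byte hole
        System.out.println(allocate(100)); // 8192: the 128-byte block at 0x2000
        System.out.println(allocate(512)); // -1: no block large enough remains
    }
}
```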
Drawbacks of the mark-sweep collector
The execution time of the mark phase depends on the number of live objects in the heap, while the execution time of the sweep phase depends on the heap's size. Therefore, with a large heap containing many live objects, the mark-sweep algorithm causes noticeable pause times.
For memory-intensive applications you can tune garbage collection parameters to suit various scenarios and needs. In many cases this tuning at least postpones the risk the mark and sweep phases pose to the application or to its service-level agreement (SLA, here meaning the response times the application must deliver). But tuning works only for a specific load and memory allocation rate; changes in load, or modifications to the application itself, require re-tuning.
Implementations of the mark-sweep collector
There are at least two commercially proven methods for implementing mark-sweep garbage collection. One is parallel garbage collection and the other is concurrent (or most of the time concurrent) garbage collection.
Parallel collector
Parallel collection means that garbage collection threads use resources in parallel. Most commercial implementations of parallel collection are stop-the-world collectors, in which all application threads are paused until the garbage collection completes. Because the garbage collector can use resources so efficiently, parallel collectors usually score well on throughput benchmarks such as SPECjbb. If throughput is critical for your application, a parallel garbage collector is a good choice.
The main cost of parallel collection, especially in production environments, is that application threads cannot run at all during garbage collection, just as with the copying collector. Using a parallel collector therefore has a significant impact on response-time-sensitive applications, especially when the heap contains many complex live-object structures and thus many object references to trace. (Remember that the time a mark-sweep collector needs to reclaim memory is the time to trace the live object set plus the time to traverse the entire heap.) With the parallel approach, the application is paused for that entire time.
Concurrent collectors
Concurrent garbage collectors suit response-time-sensitive applications better. Concurrency means that garbage collection threads run concurrently with application threads. Because the collection threads do not own all resources, the collector must decide when to start a garbage collection, leaving itself enough time to trace the live object set and reclaim memory before the application runs out. If a collection does not finish in time, the application throws an out-of-memory error. On the other hand, you do not want garbage collection to run for too long, because that consumes the application's resources and hurts throughput. Keeping this balance takes skill, so heuristics are used to decide when to start garbage collection and when to apply various garbage collection optimizations.
Another difficulty is determining when it is safe to perform operations that require a complete, accurate heap snapshot, for example knowing when marking is finished so the sweep phase can begin. In a stop-the-world parallel collector this is no problem, because the world is already paused (application threads are stopped and the collection threads have the resources to themselves). For a concurrent collector, however, it may not be safe to switch from marking to sweeping immediately: if an application thread modifies a piece of memory that the collector has already traced and marked, new unmarked references can appear. In some concurrent implementations this can trap the application in long loops of repeated re-marking, unable to obtain free memory at the moment it needs it.
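One common way concurrent collectors cope with the re-marking problem is a write barrier. This is a simplified sketch of the idea (the data structures and names are invented for illustration; real JVMs implement barriers in compiled code, not like this):

```java
import java.util.*;

// A sketch of the re-marking problem: while the collector marks concurrently,
// the application may store a reference into an already-scanned object. A write
// barrier records such stores so the collector can re-scan them before sweeping.
public class WriteBarrierSketch {
    static Set<String> marked = new HashSet<>();
    static Deque<String> remembered = new ArrayDeque<>(); // queued for re-scan

    // Called on every reference store while concurrent marking is running.
    static void writeBarrier(String holder, String target) {
        if (marked.contains(holder) && !marked.contains(target)) {
            remembered.add(target); // holder was scanned already: don't lose target
        }
    }

    public static void main(String[] args) {
        marked.add("a");            // the collector has already scanned "a"
        writeBarrier("a", "fresh"); // the application stores a.field = fresh
        // Before entering the sweep phase, the collector drains the queue:
        while (!remembered.isEmpty()) marked.add(remembered.pop());
        System.out.println(marked.contains("fresh")); // true: not swept by mistake
    }
}
```

If the application keeps installing new references faster than the collector drains the queue, the collector loops on re-marking, which is the pathology described above.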
From the discussion so far we know that there are many garbage collectors and garbage collection algorithms, each suited to particular application types and loads, and not just different algorithms but different implementations of them. So it is best to understand your application's needs and characteristics before choosing a garbage collector. Next we will look at some pitfalls of the Java platform memory model: assumptions Java programmers easily make that degrade application performance in a dynamically changing production environment.
Why Tuning Can’t Replace Garbage Collection
Most Java programmers know there are many ways to optimize a Java program. The choices among JVMs, garbage collectors, and performance tuning parameters can let developers sink endless time into tuning. This has led some to conclude that garbage collection is bad, and that tuning so collections happen rarely or finish quickly is a good workaround; but doing so is risky.
Consider tuning for a specific application. Most tuning parameters are adjusted to match the application's current behavior, such as its memory allocation rate, object sizes, and response times, all measured against the current test data volume. This can ultimately lead to either of the following results:
1. A use case that passes in testing fails in production.
2. Changes in data volume or changes in applications require re-tuning.
Tuning is iterative, and concurrent garbage collectors in particular can require a lot of it (especially in production), with heuristics tweaked to match the application's needs. To cover the worst case, tuning may end in a very rigid configuration that wastes a great deal of resources. This style of tuning is a quixotic quest: the more you optimize the garbage collector to match a specific load, the further you move from the dynamic nature of the Java runtime. After all, how many applications have a truly stable load, and how far can you trust your load predictions?
So if tuning isn't the focus, what can you do to prevent out-of-memory errors and improve response times? The first step is to find the main factor limiting Java application performance.
Fragmentation
The factor limiting Java application performance is not the garbage collector itself but fragmentation, and how the garbage collector deals with it. Fragmentation is a state in which free space exists in the heap, but no contiguous region is large enough to allocate memory for a new object. As mentioned in the first article, a fragment is either leftover space at the end of a TLAB, or the space freed by small objects that were released in between long-lived objects.
As the application runs, fragments spread throughout the heap. In some cases, statically tuned parameters make things worse, because they cannot satisfy the application's dynamic needs. The application cannot use this fragmented space efficiently. If nothing is done, the garbage collector runs back-to-back collections, trying to free memory for new objects; in the worst case even repeated collections free too little memory (too much fragmentation), and the JVM is forced to throw an out-of-memory error. You can eliminate fragmentation by restarting the application, giving the Java heap contiguous space for new objects, but restarting means downtime, and after a while the Java heap fills with fragments again, forcing yet another restart.
Out-of-memory errors that hang the process, and logs showing a garbage collector working overtime, indicate that collections are scrambling to free memory and that the heap is badly fragmented. Some programmers respond by re-tuning the garbage collector, but I think we should look for more innovative ways to solve this problem. The following sections focus on two solutions to fragmentation: generational garbage collection and compaction.
Generational garbage collection
You may have heard the theory that in production environments most objects die young. Generational garbage collection is a strategy derived from this theory. In generational garbage collection the heap is divided into spaces (generations), each holding objects of a different age, where an object's age is the number of garbage collection cycles it has survived (that is, how many collection cycles it has stayed referenced through).
When the young generation has no space left for allocation, live objects in the young generation are moved to the old generation (there are usually only two generations, and typically only objects that reach a certain age are promoted). The young generation commonly uses a copying collector, and some more modern JVMs use a parallel collector for it. Of course, different garbage collection algorithms can be used for the young and old generations. Whether you use a parallel or a copying collector, the young-generation collector is a stop-the-world collector (see the earlier explanation).
The old generation holds objects promoted out of the young generation: objects that either have been referenced for a long time or are referenced by some long-lived collection of objects. Occasionally a large object is allocated directly in the old generation, because moving large objects is relatively expensive.
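Age-based promotion can be sketched as follows (a toy model; the promotion threshold of 2 and the object names are arbitrary illustrative choices, not any JVM's defaults):

```java
import java.util.*;

// A sketch of age-based promotion: each object that survives a young-generation
// collection gets older; objects reaching the promotion age move to the old
// generation. The threshold of 2 is an arbitrary illustrative value.
public class GenerationalSketch {
    static final int PROMOTION_AGE = 2;
    static Map<String, Integer> young = new HashMap<>(); // object name -> age
    static Set<String> old = new HashSet<>();

    static void collectYoung(Set<String> survivors) {
        Map<String, Integer> next = new HashMap<>();
        for (String name : survivors) {
            int age = young.getOrDefault(name, 0) + 1;
            if (age >= PROMOTION_AGE) old.add(name); // promote long-lived object
            else next.put(name, age);                // keep in the young generation
        }
        young = next; // anything not in survivors was garbage and is dropped
    }

    public static void main(String[] args) {
        young.put("req1", 0); young.put("cache", 0);
        collectYoung(Set.of("cache"));             // req1 dies young and is dropped
        collectYoung(Set.of("cache"));             // cache survives again -> promoted
        System.out.println(old.contains("cache")); // true
        System.out.println(young.isEmpty());       // true
    }
}
```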
Generational garbage collection in practice
With generational garbage collection, collections run less frequently in the old generation and more frequently in the young generation, where we also want the collection cycles to be short. In rare cases, young-generation collections may become too frequent. This can happen if you make the young generation too large while most of the objects in your application live a long time; if the old generation is then too small to hold all the long-lived objects, old-generation collections will also struggle to make room for the objects being promoted in. Generally speaking, though, generational garbage collection gives applications better performance.
Another benefit of carving out a young generation is that it addresses fragmentation to some extent, or rather postpones the worst case. The small, short-lived objects that would otherwise cause fragmentation are cleaned up within the young generation, and because long-lived objects are packed more tightly when moved to the old generation, the old generation stays more compact too. Over time (if the application runs long enough) the old generation still becomes fragmented, requiring one or more full garbage collections and possibly ending in an out-of-memory error. But carving out a young generation postpones that worst case, which is enough for many applications: for most, it does reduce the frequency of stop-the-world collections and the chance of out-of-memory errors.
Tuning generational garbage collection
As mentioned before, generational garbage collection brings extra, recurring tuning work: adjusting the young generation's size, the promotion rate, and so on. I cannot overstate the trade-off here: picking fixed sizes optimizes for a specific application, but it reduces the garbage collector's ability to cope with dynamic change, and change is inevitable.
The first rule of thumb for the young generation is to make it as large as possible while keeping stop-the-world pause times acceptable, and while still reserving enough heap space for long-lived objects. Here are some additional factors to consider when tuning a generational garbage collector:
1. Most young-generation collectors are stop-the-world, and the larger the young generation, the longer the corresponding pause. So for applications heavily affected by garbage collection pauses, think carefully about how large the young generation should be.
2. Different garbage collection algorithms can be used on different generations. For example, parallel garbage collection is used in the young generation and concurrent garbage collection is used in the old generation.
3. Frequent promotion failures (promotion is the move from the young generation to the old generation) indicate that the old generation is too fragmented to hold the objects being moved out of the young generation. You can then adjust the promotion rate (that is, the promotion age), or make sure the old-generation algorithm compacts (discussed in the next section) and tune the compaction to fit the application's load. You can also enlarge the heap and the generations, but that further extends old-generation pause times. Remember, fragmentation is inevitable.
4. Generational garbage collection suits applications best when they have many small, short-lived objects, so that many objects die in their very first collection cycle. For such applications, generational collection effectively reduces fragmentation and postpones its impact.
Compaction
Although generational garbage collection postpones fragmentation and out-of-memory errors, compaction is the only real solution to the fragmentation problem. Compaction is a garbage collection strategy that frees contiguous blocks of memory by moving objects together, thereby creating enough room to allocate new objects.
Moving objects and updating the references to them are stop-the-world operations with a real cost (with one exception, discussed in the next article in this series). The more objects survive, the longer the compaction pause. When little space remains and fragmentation is severe (usually because the program has run for a long time), compacting a region dense with live objects can pause the application for several seconds, and near an out-of-memory condition, compacting the whole heap can take tens of seconds.
Compaction pause time depends on how much memory must be moved and how many references must be updated, and statistically, the larger the heap, the more live objects must be moved and references updated. The pause is roughly one second per 1 GB to 2 GB of live data moved; a 4 GB heap is likely to be about 25% live, so occasional pauses of around one second can be expected.
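Sliding compaction can be sketched with a forwarding table (a simplified model with made-up addresses; real collectors move raw memory and rewrite references in place):

```java
import java.util.*;

// A sketch of sliding compaction: live blocks are slid toward the start of the
// heap and every reference must be updated to the block's new address, leaving
// one contiguous free region at the end.
public class CompactionSketch {
    record Block(int address, int size, boolean live) {}

    // Returns a map from old address -> new address for every live block.
    static Map<Integer, Integer> compact(List<Block> heap) {
        Map<Integer, Integer> forwarding = new HashMap<>();
        int next = 0; // next free address at the start of the heap
        for (Block b : heap) {
            if (b.live()) {
                forwarding.put(b.address(), next); // record where the block moves
                next += b.size();                  // slide it flush against the last one
            }
        }
        return forwarding; // every reference is then rewritten via this table
    }

    public static void main(String[] args) {
        List<Block> heap = List.of(
            new Block(0, 16, true),
            new Block(16, 32, false),  // garbage: its space is squeezed out
            new Block(48, 8, true));
        Map<Integer, Integer> fwd = compact(heap);
        System.out.println(fwd.get(48)); // 16: the block at 48 slid down to 16
    }
}
```

The cost model in the text falls out of this sketch: the work is proportional to the live data moved plus the number of references rewritten through the forwarding table.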
Compaction and the application memory wall
The application memory wall is the maximum heap size you can configure before garbage collection pauses (for example, compaction pauses) become unacceptable. Depending on the system and application, most Java application memory walls fall between 4 GB and 20 GB. This is a key reason most enterprise applications are deployed across several smaller JVMs rather than a few larger ones. Consider this: how much modern enterprise Java application design and deployment is defined by the JVM's compaction limitations? In this scenario we accept multi-instance deployments, which cost more to manage, just to work around the pause times of defragmenting the heap. That is odd given the storage capacity of today's hardware and the growing memory demands of enterprise Java applications: why should each instance get only a few gigabytes of memory? Concurrent compaction will break down the memory wall, and it is the topic of my next article.
Summary
This article is an introduction to garbage collection, intended to help you understand its concepts and mechanisms and, I hope, to motivate further reading. Much of what is discussed here has been around for a long time; the next article introduces newer concepts, such as concurrent compaction, currently implemented by Azul's Zing JVM. It is an emerging garbage collection technology that even attempts to redefine the Java memory model, especially as memory sizes and processing power keep growing.
Here are some key points about garbage collection that I have summarized:
1. Different garbage collection algorithms and implementations suit different application needs. The tracing garbage collector is the most commonly used collector in commercial Java virtual machines.
2. Parallel garbage collectors use all available resources in parallel while collecting. They are usually stop-the-world collectors and therefore deliver higher throughput, but the application's worker threads must wait until the collection threads finish, which affects the application's response times.
3. Concurrent garbage collectors let application worker threads keep running while the collection proceeds. A concurrent collector needs to finish collecting before the application actually needs the memory.
4. Generational garbage collection helps postpone fragmentation but cannot eliminate it. It divides the heap into two spaces, one for young objects and one for old objects, and suits applications with many small, short-lived objects.
5. Compaction is the only way to resolve fragmentation. Most garbage collectors compact in a stop-the-world manner. The longer a program runs, the more complex its reference structures and the more uneven its object sizes become, leading to longer compaction times. Heap size also affects compaction time, since a larger heap can mean more live objects to move and references to update.
6. Tuning helps postpone out-of-memory errors, but over-tuning produces a rigid configuration. Before tuning by trial and error, make sure you understand the production load, your application's object types, and its reference characteristics. An overly rigid configuration may not handle a dynamic load, so understand the consequences before fixing any non-dynamic value.
The next article in this series takes an in-depth look at the C4 (Continuously Concurrent Compacting Collector) garbage collection algorithm, so stay tuned!