Preface
The previous article covered the CAS principle and mentioned the Atomic* classes, whose atomic operations rely on the memory-visibility guarantees of volatile. If you are not yet familiar with CAS and the Atomic* classes, it is recommended to first read the article on the CAS spin locks we talk about.
Three characteristics of concurrency
First of all, volatile only matters in a multi-threaded environment. The concurrent scenarios we often talk about have three important characteristics: atomicity, visibility, and orderliness. Only when all three are satisfied can a concurrent program execute correctly; otherwise various problems will arise.
Atomicity: the CAS and Atomic* classes mentioned in the previous article can ensure the atomicity of simple operations. For more complex operations, synchronized or various locks can be used.
Visibility: when multiple threads access the same variable and one thread modifies its value, the other threads can immediately see the modified value.
Orderliness: the program executes in the order the code is written, and instruction reordering is prohibited. This seems natural, but it is not always the case. Instruction reordering is how the JVM optimizes instructions to improve efficiency: it raises parallelism as much as possible without changing the result of a single-threaded program. In a multi-threaded environment, however, the reordered code may make the program logically incorrect.
volatile provides two of these characteristics: visibility and orderliness. Therefore, when a multi-threaded program needs these two guarantees, the volatile keyword can be used.
How volatile guarantees visibility
To understand visibility, you first need to understand the relationship between the processor and main memory. No matter how many threads there are, they are ultimately executed on the computer's processors. Today's computers are basically multi-core, and some machines even have multiple processors. Let's look at the structure diagram of a multi-processor machine:
This machine has two processors, each with four cores. A processor corresponds to a physical socket, and multiple processors are connected through the QPI bus. A processor consists of multiple cores, and the cores within a processor share the L3 Cache. Each core contains its own registers, L1 Cache, and L2 Cache.
Program execution inevitably involves reading and writing data. Although main-memory access is already very fast, it is still far slower than the speed at which the CPU executes instructions. Therefore, three levels of cache, L1, L2, and L3, sit between the core and main memory. When a program runs, the data it needs is first copied from main memory into the core's cache; after the operation completes, the result is written back to main memory. The following figure shows the CPU's data-access hierarchy: from registers to caches to main memory and even the hard disk, access gets slower and slower.
After understanding the CPU structure, let's look at how a program actually executes, taking a simple increment operation as an example:
```java
i = i + 1;
```
When this statement executes, the thread running on a core copies the value of i into that core's cache, and after the computation finishes, writes it back to main memory. In a multi-threaded environment, each thread has a corresponding working memory in the cache of the core it runs on; that is, each thread has its own private working cache holding a copy of the data it operates on. Now consider the problem with i + 1. Assume the initial value of i is 0 and two threads execute this statement at the same time; each thread executes three steps:
1. Read the value of i from main memory into the thread's working memory, i.e. the corresponding core's cache;
2. Calculate the value of i+1;
3. Write the result back to main memory.
After the two threads each execute the statement 10,000 times, the expected value is 20,000. Unfortunately, the final value of i is usually less than 20,000. One cause of this problem is cache consistency: once one thread modifies its cached copy, the cached copies held by the other threads should be invalidated immediately.
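The lost-update scenario above can be reproduced with a minimal sketch (the class and method names here are my own, chosen for illustration):

```java
// Two threads each increment a shared counter 10,000 times with no
// synchronization. Because i = i + 1 is a three-step read-modify-write,
// one thread's update can overwrite the other's, so the final value
// is usually less than 20,000.
public class LostUpdateDemo {
    private static int count = 0;   // neither volatile nor atomic

    public static int run() throws InterruptedException {
        Runnable task = () -> {
            for (int j = 0; j < 10_000; j++) {
                count = count + 1;  // read, add 1, write back: not atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return count;  // often below 20,000 because of lost updates
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("final count = " + run());
    }
}
```

Run it a few times: the printed count will vary between runs, which is exactly the symptom of unsynchronized read-modify-write.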
After the volatile keyword is applied, it has the following effects:
1. Every time the variable is modified, the new value is immediately written from the processor cache (working memory) back to main memory;
2. That write back to main memory invalidates the copies of the variable held in the processor caches (working memory) of other threads.
volatile achieves memory visibility by relying on the CPU's cache-coherence protocol, MESI. The MESI protocol has a lot of detail that won't be explained here; please look it up yourself. In short, with the volatile keyword, a thread's modification of the volatile variable is written back to main memory immediately, which invalidates the corresponding cache lines of other threads; when those threads next use the variable, they are forced to read it from main memory again.
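The write-back and invalidation behaviour just described is what makes a volatile flag visible across threads. A minimal sketch (class name is my own): a reader thread spins on a volatile flag until a writer thread sets it.

```java
// Visibility via volatile: a reader thread spins until the writer sets
// the flag. The volatile write is flushed to main memory and invalidates
// the reader's cached copy, so the reader observes it and exits the loop.
public class VisibilityDemo {
    private static volatile boolean stop = false;

    public static boolean run() throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait until the writer publishes stop = true
            }
        });
        reader.start();
        Thread.sleep(100);   // let the reader enter its loop
        stop = true;         // volatile write: visible to the reader
        reader.join(1000);   // the reader should exit promptly
        return !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run() ? "reader saw the write" : "reader still spinning");
    }
}
```

Without volatile on `stop`, the JIT compiler may hoist the read out of the loop and the reader can spin forever even though main memory was updated.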
Now let's modify the i variable above with volatile and run the test again, each thread executing 10,000 increments. Unfortunately, the result is still less than 20,000. Why?
volatile uses the CPU's MESI protocol to guarantee visibility, but note that it does not guarantee the atomicity of the operation, because the increment is split into three steps. Suppose thread 1 reads the value of i from main memory, say 10, and is then blocked before modifying it. Meanwhile thread 2 also reads i from main memory, so both threads hold the same value, 10. Thread 2 then adds 1 and immediately writes 11 back to main memory. At this point, per the MESI protocol, the corresponding cache line in thread 1's working memory is marked invalid. However, thread 1 has already read i; all that remains for it is to add 1 and write the result back. Both threads add 1 to the base value of 10 and write back, so the final value in main memory is only 11, not the expected 12.
Therefore, volatile guarantees memory visibility but not atomicity. If atomicity is also needed, refer to the previous article.
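For completeness, here is a sketch of the counter fixed with the CAS-based AtomicInteger from the previous article (the class and method names are my own). The whole read-modify-write becomes atomic, so two threads of 10,000 increments always reach exactly 20,000:

```java
import java.util.concurrent.atomic.AtomicInteger;

// AtomicInteger makes the increment a single atomic operation
// (a CAS retry loop under the hood), so no updates are lost.
public class AtomicCounterDemo {
    private static final AtomicInteger count = new AtomicInteger(0);

    public static int run() throws InterruptedException {
        Runnable task = () -> {
            for (int j = 0; j < 10_000; j++) {
                count.incrementAndGet();  // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("final count = " + run());  // prints 20000
    }
}
```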
How volatile ensures order
The Java memory model has some innate "orderliness", i.e. ordering that is guaranteed without any extra means; this is usually called the happens-before principle. If the execution order of two operations cannot be derived from the happens-before principle, their ordering is not guaranteed and the virtual machine may reorder them at will.
The following are the 8 happens-before rules, excerpted from "In-depth Understanding of the Java Virtual Machine".
Here we focus on the rule for the volatile keyword, using the famous double-checked locking singleton as an example:
```java
class Singleton {
    private volatile static Singleton instance = null;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                    // step 1
            synchronized (Singleton.class) {
                if (instance == null)              // step 2
                    instance = new Singleton();    // step 3
            }
        }
        return instance;
    }
}
```
If instance were not modified with volatile, what could go wrong? Suppose two threads call the getInstance() method. Thread 1 executes step 1, finds instance is null, acquires the lock on the Singleton class, checks again at step 2, finds it still null, and then executes step 3 to instantiate Singleton. During that instantiation, thread 2 reaches step 1 and may find that instance is not null, yet at that moment the instance may not be fully initialized.
What does that mean? Object creation actually takes three steps, represented by the following pseudo-code:
```java
memory = allocate();    // 1. allocate memory space for the object
ctorInstance(memory);   // 2. initialize the object
instance = memory;      // 3. point instance at the allocated memory
```
Step 2 and step 3 both depend on step 1, but there is no dependency between step 2 and step 3, so those two statements may be reordered; that is, step 3 may execute before step 2. In that case step 3 has executed but step 2 has not, meaning instance already points to memory that has not been initialized. As described above, thread 2 then sees that instance is not null and returns it directly. However, instance is actually an incompletely constructed object, and using it will cause problems.
Using the volatile keyword brings in the rule that "a write to a volatile variable happens-before every subsequent read of that variable". Applied to the initialization above, the volatile write to instance cannot be reordered with the initialization that precedes it, so step 2 must complete before step 3 takes effect for other threads; there is no longer any possibility of returning an instance that is not fully initialized.
The JVM implements this underneath with something called a "memory barrier". A memory barrier, also known as a memory fence, is a set of processor instructions used to enforce ordering restrictions on memory operations.
Finally
Through the volatile keyword we have gained an understanding of visibility and orderliness in concurrent programming. Of course, this is only a simple introduction; a deeper understanding will depend on your own further study.
Related Articles
What are the CAS spin locks we are talking about
Summary
The above is the entire content of this article. I hope it has some reference value for everyone's study or work. If you have any questions, leave a message to discuss. Thank you for your support of Wulin.com.