1. Heavyweight lock
In the previous article, we covered the usage of synchronized and its implementation principles. We now know that synchronized is implemented through a monitor inside the object, and that the monitor is essentially built on the underlying operating system's Mutex Lock. Using a mutex requires the operating system to switch the thread from user mode to kernel mode, and this transition is very costly and takes a relatively long time. This is why synchronized is considered inefficient. A lock that relies on an operating-system mutex in this way is therefore called a "heavyweight lock". The core of the various optimizations made to synchronized in the JDK is to reduce the use of this heavyweight lock. Starting with JDK 1.6, "lightweight locks" and "biased locks" were introduced in order to reduce the performance cost of acquiring and releasing locks.
2. Lightweight lock
A lock can be in one of four states: unlocked, biased, lightweight, and heavyweight. As lock competition increases, a lock can be upgraded from biased to lightweight and then to heavyweight. (Lock upgrading is one-way: a lock can only move from lower to higher states and is never downgraded.) Biased and lightweight locks are enabled by default in JDK 1.6; biased locking can be disabled with -XX:-UseBiasedLocking. The lock state is stored in the object header (the Mark Word). Taking the 32-bit JVM as an example:
| Lock state       | 25 bits                                      |        | 4 bits     | 1 bit        | 2 bits        |
|------------------|----------------------------------------------|--------|------------|--------------|---------------|
|                  | 23 bits                                      | 2 bits |            | Biased lock? | Lock flag bit |
| Lightweight lock | Pointer to the lock record in the thread's stack |    |            |              | 00            |
| Heavyweight lock | Pointer to the mutex (heavyweight lock)      |        |            |              | 10            |
| GC mark          | (empty)                                      |        |            |              | 11            |
| Biased lock      | Thread ID                                    | Epoch  | Object age | 1            | 01            |
| Unlocked         | hashCode of the object                       |        | Object age | 0            | 01            |
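To make the table concrete, here is a small illustrative helper that decodes the low bits of a 32-bit mark-word value into a lock-state name following the layout above. The class and method names are hypothetical, written only for this article; this is not HotSpot code, just a sketch of the bit layout the table describes.

```java
// Illustrative decoder for the 32-bit mark word layout in the table above.
// Only the low 3 bits matter for distinguishing the lock states.
public class MarkWordDecoder {
    static String lockState(int markWord) {
        int flag = markWord & 0b11;          // lowest 2 bits: lock flag
        if (flag == 0b00) return "lightweight";
        if (flag == 0b10) return "heavyweight";
        if (flag == 0b11) return "GC marked";
        // flag == 01: unlocked vs. biased is decided by the bias bit
        int biased = (markWord >> 2) & 0b1;  // third-lowest bit
        return biased == 1 ? "biased" : "unlocked";
    }

    public static void main(String[] args) {
        System.out.println(lockState(0b001)); // unlocked
        System.out.println(lockState(0b101)); // biased
        System.out.println(lockState(0b000)); // lightweight
        System.out.println(lockState(0b010)); // heavyweight
    }
}
```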
"Lightweight" is relative to traditional locks that use operating system mutexes. However, it is important to emphasize first that lightweight locks are not used to replace heavyweight locks. Its original intention is to reduce the performance consumption generated by the use of traditional heavyweight locks without multi-thread competition. Before explaining the execution process of lightweight locks, we first understand that the scenarios adapted to lightweight locks are the case where threads alternately execute synchronous blocks. If the same lock is accessed at the same time, the lightweight lock will expand into a heavyweight lock.
1. The locking process of lightweight lock
(1) When the code enters the synchronized block, if the synchronization object is in the unlocked state (the lock flag is "01" and the biased-lock bit is "0"), the virtual machine first creates a space called a Lock Record in the current thread's stack frame to store a copy of the lock object's current Mark Word, officially called the Displaced Mark Word. At this point the state of the thread stack and object header is shown in Figure 2.1.
(2) The virtual machine copies the Mark Word from the object header into the lock record.
(3) After the copy succeeds, the virtual machine uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record, and points the owner pointer in the lock record to the object's Mark Word. If the update succeeds, go to step (4); otherwise, go to step (5).
(4) If the update succeeds, the thread now owns the object's lock, and the lock flag in the object's Mark Word is set to "00", meaning the object is in the lightweight-locked state. At this point the state of the thread stack and object header is shown in Figure 2.2.
(5) If the update fails, the virtual machine first checks whether the object's Mark Word already points into the current thread's stack frame. If it does, the current thread already owns the lock and can enter the synchronized block directly. Otherwise, multiple threads are competing for the lock. The current thread first tries to acquire the lock by spinning, i.e. retrying the CAS in a loop rather than blocking immediately. If spinning also fails, the lightweight lock inflates into a heavyweight lock: the lock flag changes to "10", the Mark Word stores a pointer to the heavyweight lock (mutex), and the threads waiting for the lock enter the blocked state.
Figure 2.1 The state of stack and object before the lightweight lock CAS operation
Figure 2.2 The state of stack and object after lightweight lock CAS operation
2. Unlocking process of lightweight lock:
(1) Use a CAS operation to try to replace the object's current Mark Word with the Displaced Mark Word copied into the thread's lock record.
(2) If the replacement is successful, the entire synchronization process will be completed.
(3) If the replacement fails, another thread has tried to acquire the lock in the meantime (the lock has already been inflated), so while releasing the lock the suspended threads must also be woken up.
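The locking and unlocking steps above can be sketched as a toy model in plain Java. Here the object's "mark word" is an AtomicReference that holds null when unlocked and otherwise points at the owning thread's lock record. All names are illustrative inventions for this article; the real mechanism lives inside the JVM, not in Java code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of lightweight locking: CAS the "mark word" to point at our
// lock record to lock, CAS it back to unlock. Not HotSpot code.
public class LightweightLockSketch {
    static class LockRecord {
        Object displacedMarkWord;  // copy of the object's original mark word
    }

    // null = unlocked; otherwise points at the owner's lock record
    static final AtomicReference<LockRecord> markWord = new AtomicReference<>();

    static boolean tryLock(LockRecord record) {
        record.displacedMarkWord = "original-mark-word"; // step (2): copy
        return markWord.compareAndSet(null, record);     // step (3): CAS in pointer
    }

    static boolean tryUnlock(LockRecord record) {
        // Unlock step (1): CAS the mark word back. A failure here would mean
        // the lock was inflated and a blocked thread must be woken.
        return markWord.compareAndSet(record, null);
    }

    public static void main(String[] args) {
        LockRecord r = new LockRecord();
        System.out.println(tryLock(r));                  // true: lock acquired
        System.out.println(tryLock(new LockRecord()));   // false: contention
        System.out.println(tryUnlock(r));                // true: lock released
    }
}
```

A failed tryLock in this sketch corresponds to the point where the real JVM would spin and, if that fails, inflate to a heavyweight lock.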
3. Biased lock
Biased locking was introduced to minimize the unnecessary lightweight-lock execution path when there is no multi-thread competition: acquiring and releasing a lightweight lock requires multiple CAS atomic instructions, whereas a biased lock needs only a single CAS when installing the owning thread's ID. (Because a biased lock must be revoked as soon as multi-thread competition occurs, biasing only pays off when the cost of revocation is less than the CAS instructions it saves.) As mentioned above, lightweight locks improve performance when threads execute synchronized blocks alternately, while biased locks further improve performance when only one thread ever executes the synchronized block.
1. The biased lock acquisition process:
(1) Check whether the biased-lock bit in the Mark Word is set to 1 and the lock flag is 01, confirming that the object is in the biasable state.
(2) If it is biasable, test whether the thread ID stored in the Mark Word points to the current thread. If it does, go to step (5); otherwise go to step (3).
(3) If the thread ID does not point to the current thread, compete for the lock with a CAS operation. If the CAS succeeds, the thread ID in the Mark Word is set to the current thread's ID; go to (5). If it fails, go to (4).
(4) If the CAS fails, there is competition. When the global safepoint is reached, the thread that holds the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread that was blocked at the safepoint then continues executing the synchronized code.
(5) Execute the synchronization code.
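The acquisition steps above can be modeled with a single atomic thread-ID field. In this hypothetical sketch, 0 means "biasable but not yet biased"; the class and field names are inventions for illustration, not HotSpot internals.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of biased-lock acquisition: re-entry by the biased owner needs
// no atomic instruction at all; only the first bias installation uses CAS.
public class BiasedLockSketch {
    static final AtomicLong biasedThreadId = new AtomicLong(0); // 0 = unbiased

    static boolean enter(long threadId) {
        if (biasedThreadId.get() == threadId) {
            return true;  // step (2): already biased to this thread, no CAS
        }
        // step (3): a single CAS to install our thread ID
        return biasedThreadId.compareAndSet(0, threadId);
        // a false return models step (4): bias revocation and upgrade
    }

    public static void main(String[] args) {
        System.out.println(enter(1)); // true: bias installed via one CAS
        System.out.println(enter(1)); // true: re-entry costs no CAS at all
        System.out.println(enter(2)); // false: competition, bias would be revoked
    }
}
```

The re-entry path is the whole point of biasing: the single thread that keeps entering the block pays essentially nothing after the first acquisition.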
2. Revocation of the biased lock:
Biased-lock revocation was mentioned in step (4) above. A biased lock is only released when another thread tries to compete for it; the owning thread never releases it proactively. Revocation must wait for the global safepoint (a point in time at which no bytecode is executing). The JVM first pauses the thread holding the biased lock and checks whether the lock object is still locked; after the bias is revoked, the object reverts either to the unlocked state (flag bit "01") or to the lightweight-locked state (flag bit "00").
3. Conversion between heavyweight, lightweight, and biased locks
Figure 2.3 The conversion diagram of the three
This picture is mainly a summary of the above content. If you have a good understanding of the above content, the picture should be easy to understand.
4. Other optimizations
1. Adaptive Spinning: From the lightweight-lock acquisition process we know that when a thread's CAS operation fails, it spins in an attempt to acquire the lock before falling back to blocking. The problem is that spinning consumes CPU: if the lock is never obtained, the spinning thread wastes CPU cycles for nothing. The simplest fix is to cap the number of spins, for example loop 10 times and enter the blocked state if the lock still has not been obtained. But the JDK adopts a smarter approach, adaptive spinning: roughly, if spinning on a given lock succeeds, more spins are allowed next time; if it fails, the spin count is reduced.
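A minimal sketch of this policy, assuming a simple doubling/halving heuristic (the real HotSpot heuristic is internal and more sophisticated; the class name and budget numbers here are purely illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of adaptive spinning: the spin budget grows after a
// successful spin-acquire and shrinks after a failed one.
public class AdaptiveSpinSketch {
    static int spinBudget = 10;

    static boolean spinAcquire(ReentrantLock lock) {
        for (int i = 0; i < spinBudget; i++) {
            if (lock.tryLock()) {
                spinBudget = Math.min(spinBudget * 2, 1000); // spinning paid off
                return true;
            }
            Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
        }
        spinBudget = Math.max(spinBudget / 2, 1);            // spinning was wasted
        return false; // the caller would now block on the heavyweight path
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        System.out.println(spinAcquire(lock)); // true: uncontended, budget grows
        System.out.println(spinBudget);        // 20
        lock.unlock();
    }
}
```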
2. Lock Coarsening: Lock coarsening is easy to understand: it merges multiple back-to-back lock and unlock operations on the same object into a single pair, expanding several consecutive critical sections into one larger one. For example:
```java
package com.paddx.test.string;

public class StringBufferTest {
    StringBuffer stringBuffer = new StringBuffer();

    public void append() {
        stringBuffer.append("a");
        stringBuffer.append("b");
        stringBuffer.append("c");
    }
}
```

Here, each call to stringBuffer.append() locks and unlocks the StringBuffer. If the virtual machine detects such a series of lock and unlock operations on the same object, it merges them into one larger-scoped pair: locking before the first append and unlocking after the last append completes.
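Conceptually, the coarsened result looks like the source below. Note this is only an illustration of what the JIT does in compiled code; the transformation never appears in your source, and synchronized is reentrant, so the inner append calls still work inside the outer block.

```java
// Illustrative source-level equivalent of lock coarsening: one lock
// around all three calls instead of three separate lock/unlock pairs.
public class CoarsenedAppend {
    StringBuffer stringBuffer = new StringBuffer();

    public void append() {
        synchronized (stringBuffer) {   // single coarsened critical section
            stringBuffer.append("a");   // reentrant: inner monitorenter is cheap
            stringBuffer.append("b");
            stringBuffer.append("c");
        }
    }

    public static void main(String[] args) {
        CoarsenedAppend c = new CoarsenedAppend();
        c.append();
        System.out.println(c.stringBuffer); // abc
    }
}
```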
3. Lock Elimination: Lock elimination removes locking operations that are provably unnecessary. Using escape analysis, if the JVM determines that a piece of code and the heap data it touches cannot escape the current thread, that code is effectively thread-safe and its locks can be eliminated. Look at the following program:
```java
package com.paddx.test.concurrent;

public class SynchronizedTest02 {
    public static void main(String[] args) {
        SynchronizedTest02 test02 = new SynchronizedTest02();
        // Warm up
        for (int i = 0; i < 10000; i++) {
            i++;
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100000000; i++) {
            test02.append("abc", "def");
        }
        System.out.println("Time=" + (System.currentTimeMillis() - start));
    }

    public void append(String str1, String str2) {
        StringBuffer sb = new StringBuffer();
        sb.append(str1).append(str2);
    }
}
```

Although StringBuffer's append is a synchronized method, the StringBuffer in this program is a local variable and never escapes the method. The whole process is therefore actually thread-safe, and the lock can be eliminated. Here is the result of my local execution:
To minimize the influence of other factors, biased locking is disabled here (-XX:-UseBiasedLocking). The program above shows that performance improves greatly once the lock is eliminated.
Note: The execution results may be different between different versions of JDK. The JDK version I use here is 1.6.
5. Summary
This article focused on the optimizations made to synchronized in the JDK, namely lightweight locks and biased locks. These two locks are not without drawbacks: under fierce competition they not only fail to improve efficiency but actually reduce it, because of the extra lock-upgrade process. In such cases it is necessary to disable biased locking with -XX:-UseBiasedLocking. Here is a comparison of the three locks:
| Lock             | Advantages | Disadvantages | Applicable scenarios |
|------------------|------------|---------------|----------------------|
| Biased lock      | Locking and unlocking need no extra cost; the gap versus running a non-synchronized method is on the nanosecond scale. | If threads compete for the lock, revoking the bias adds extra cost. | Only one thread ever accesses the synchronized block. |
| Lightweight lock | Competing threads do not block, which improves the program's response time. | A thread that never obtains the lock keeps spinning and consumes CPU. | Response time matters; synchronized blocks execute very quickly. |
| Heavyweight lock | Threads do not spin, so contention does not consume CPU. | Threads block, and response time is slow. | Throughput matters; synchronized blocks execute for a relatively long time. |