Summary
This series is based on the 炼数成金 (Dataguru) course; these notes were made while studying it to help me learn better. This article mainly covers: 1. Ideas and methods of lock optimization; 2. Lock optimization in the virtual machine; 3. A case of incorrect lock usage; 4. ThreadLocal and its source code analysis.
1. Ideas and methods of lock optimization
Levels of concurrency were introduced in [High Concurrency Java 1].
Once a lock is used, execution can block, so concurrency is generally somewhat lower than in the lock-free case.
The lock optimization discussed here is about keeping performance from degrading too much when blocking is involved. But no matter how well it is optimized, performance will generally still be a little worse than a lock-free approach.
Note that tryLock in ReentrantLock, mentioned in [High Concurrency Java V] JDK Concurrency Package 1, leans toward the lock-free style, because a thread does not suspend itself when tryLock fails.
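As a quick illustration of that non-blocking style, here is a minimal tryLock sketch (the class and method names are made up for this example):

```java
import java.util.concurrent.locks.ReentrantLock;

public class TryLockDemo {
    private static final ReentrantLock lock = new ReentrantLock();

    public static void doWork() {
        // tryLock returns immediately instead of suspending the thread
        if (lock.tryLock()) {
            try {
                // critical section
            } finally {
                lock.unlock();
            }
        } else {
            // lock not available: do something else instead of blocking
        }
    }
}
```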
To summarize the ideas and methods of lock optimization, there are the following types.
1.1 Reduce lock holding time
```java
public synchronized void syncMethod(){
    othercode1();
    mutextMethod();
    othercode2();
}
```

In the code above, the lock must be acquired before entering the method, while other threads wait outside for its entire duration.
The optimization is to shorten how long other threads wait, so the lock should cover only the code that actually has thread-safety requirements:

```java
public void syncMethod(){
    othercode1();
    synchronized(this) {
        mutextMethod();
    }
    othercode2();
}
```

1.2 Reduce lock granularity
Split a large object (one that may be accessed by many threads) into small objects; this greatly increases parallelism and reduces lock contention. Only when lock contention is reduced will biased locks and lightweight locks succeed more often.
The most typical case of reduced lock granularity is ConcurrentHashMap, which is covered in [High Concurrency Java V] JDK Concurrency Package 1.
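To illustrate the idea only (this is not how ConcurrentHashMap is actually written), here is a minimal lock-striping sketch; StripedCounter and its stripe count are invented for this example:

```java
// A hypothetical striped counter: contention is spread over 16 stripes
// instead of funneling every thread through one global lock.
public class StripedCounter {
    private static final int STRIPES = 16;
    private final long[] counts = new long[STRIPES];
    private final Object[] locks = new Object[STRIPES];

    public StripedCounter() {
        for (int i = 0; i < STRIPES; i++) locks[i] = new Object();
    }

    public void increment(int key) {
        int stripe = (key & 0x7fffffff) % STRIPES; // pick a stripe by key
        synchronized (locks[stripe]) {             // lock only that stripe
            counts[stripe]++;
        }
    }

    public long total() {
        long sum = 0;
        for (int i = 0; i < STRIPES; i++) {
            synchronized (locks[i]) { sum += counts[i]; }
        }
        return sum;
    }
}
```

As with ConcurrentHashMap's segments in JDK 1.7, the price is that a global operation such as total() has to visit every stripe.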
1.3 Lock separation
The most common form of lock separation is the read-write lock ReadWriteLock, which splits one lock into a read lock and a write lock according to function. Reads do not block other reads, while reads and writes are mutually exclusive; this guarantees thread safety while improving performance. For details, see [High Concurrency Java V] JDK Concurrency Package 1.
The read-write separation idea can be extended: as long as two operations do not affect each other, their locks can be separated.
For example, LinkedBlockingQueue takes elements from the head and puts elements at the tail, so the two operations use separate locks. This is also similar in spirit to the work stealing in ForkJoinPool mentioned in [High Concurrency Java VI] JDK Concurrency Package 2.
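Here is a structural sketch of that two-lock idea, loosely modeled on LinkedBlockingQueue (heavily simplified: the real class also uses Conditions for blocking put/take; the atomic count below doubles as the visibility handshake between the two ends):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Puts lock only the tail, takes lock only the head,
// so a put and a take can proceed in parallel.
public class TwoLockQueue<E> {
    private static class Node<E> {
        E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    private Node<E> head = new Node<>(null);   // dummy node
    private Node<E> tail = head;
    private final AtomicInteger count = new AtomicInteger(0);
    private final ReentrantLock putLock = new ReentrantLock();
    private final ReentrantLock takeLock = new ReentrantLock();

    public void put(E e) {
        putLock.lock();
        try {
            tail.next = new Node<>(e);
            tail = tail.next;
            count.getAndIncrement();           // also publishes the new node
        } finally {
            putLock.unlock();
        }
    }

    public E poll() {
        takeLock.lock();
        try {
            if (count.get() == 0) return null; // empty
            Node<E> first = head.next;
            head = first;                      // first becomes the new dummy
            count.getAndDecrement();
            E item = first.item;
            first.item = null;
            return item;
        } finally {
            takeLock.unlock();
        }
    }
}
```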
1.4 Lock coarsening
Generally speaking, to keep multiple threads running concurrently and effectively, each thread is expected to hold a lock for as short a time as possible, that is, to release the lock immediately after using the shared resource. Only then can other threads waiting on the lock obtain the resource and proceed as soon as possible. But everything has its limits: if the same lock is continually requested, synchronized on, and released, that in itself consumes valuable system resources and hurts performance.
For example:
```java
public void demoMethod(){
    synchronized(lock){
        //do sth.
    }
    // do other work that needs no synchronization, but executes quickly
    synchronized(lock){
        //do sth.
    }
}
```

In this case, following the idea of lock coarsening, the two critical sections should be merged:
```java
public void demoMethod(){
    // merged into a single lock request
    synchronized(lock){
        //do sth.
        // other work that needs no synchronization, but executes quickly
    }
}
```

There is, of course, a precondition: the unsynchronized work in the middle must execute quickly.
Here is a more extreme example:
```java
for(int i=0;i<CIRCLE;i++){
    synchronized(lock){
    }
}
```

Here the lock is requested on every iteration of the loop. Although the JDK will optimize code like this internally, it is better to write it directly as:

```java
synchronized(lock){
    for(int i=0;i<CIRCLE;i++){
    }
}
```

Of course, if the loop is very long and other threads must not be kept waiting too long for the lock, only the first form will do. Without such a requirement, it is better to write the coarsened second form directly.
1.5 Lock elimination
Lock elimination happens at the compiler level.
In the just-in-time (JIT) compiler, if an object is found to be impossible to share between threads, the lock operations on that object can be eliminated.
This may seem strange: if an object cannot be accessed by multiple threads, why lock it at all? Wouldn't it be better simply not to write the lock in the first place?
But sometimes these locks are not written by the programmer. Some come with the JDK itself: classes such as Vector and StringBuffer lock in many of their methods. When we use these classes in a context with no thread-safety requirement, and certain conditions are met, the compiler removes the locks to improve performance.
For example:
```java
public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    for (int i = 0; i < 2000000; i++) {
        createStringBuffer("JVM", "Diagnosis");
    }
    long bufferCost = System.currentTimeMillis() - start;
    System.out.println("createStringBuffer: " + bufferCost + " ms");
}

public static String createStringBuffer(String s1, String s2) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    return sb.toString();
}
```

StringBuffer.append in the code above is a synchronized operation, but the StringBuffer is a local variable and the method does not return it, so it cannot possibly be accessed by multiple threads.
The synchronization inside StringBuffer is therefore meaningless here.
Lock elimination is controlled by JVM parameters and requires server mode:
-server -XX:+DoEscapeAnalysis -XX:+EliminateLocks
Escape analysis must also be turned on. Its job is to check whether a variable can escape its scope.
In the StringBuffer example above, createStringBuffer returns a String, so the local StringBuffer is never used anywhere else. If createStringBuffer is changed to:

```java
public static StringBuffer createStringBuffer(String s1, String s2) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    return sb;
}
```

then once the StringBuffer is returned, it may be used anywhere else (for example, the caller could put the result into a map), and the JVM's escape analysis will conclude that this local StringBuffer escapes its scope.
Therefore, based on escape analysis, the JVM can determine that when a local StringBuffer does not escape its scope, it will never be accessed by multiple threads, and the redundant locks can be removed to improve performance.
With the JVM parameters:
-server -XX:+DoEscapeAnalysis -XX:+EliminateLocks
the output is:
createStringBuffer: 302 ms
With the JVM parameters:
-server -XX:+DoEscapeAnalysis -XX:-EliminateLocks
the output is:
createStringBuffer: 660 ms
Clearly, the effect of lock elimination is quite noticeable.
2. Lock optimization in virtual machine
First, we need to introduce the object header: in the JVM, every object has one.
Mark Word: the mark field of the object header, 32 bits wide on a 32-bit system.
It describes the object's hash code, lock information, garbage collection marks, and age.
It can also hold a pointer to a lock record, a pointer to a monitor, a biased-lock thread ID, and so on.
Simply put, the object header stores some bookkeeping information about the object.
2.1 Biased lock
The word "biased" means partial: the lock favors the thread that currently owns it.
In most cases there is no contention (most of the time, a given synchronization block is not contended by multiple threads at once), so performance can be improved through bias. When there is no contention and the thread that previously obtained the lock obtains it again, it only checks whether the lock is biased toward itself; if so, it can enter the synchronization block directly, without acquiring the lock again.
A biased lock is implemented by setting the bias flag in the object header's Mark Word and writing the thread ID into the Mark Word.
When another thread requests the same lock, bias mode ends.
The JVM enables biased locking by default: -XX:+UseBiasedLocking
Under fierce contention, biased locking increases the system's burden (the bias check is added to every lock acquisition).
Example of a biased lock:

```java
package test;

import java.util.List;
import java.util.Vector;

public class Test {
    public static List<Integer> numberList = new Vector<Integer>();

    public static void main(String[] args) throws InterruptedException {
        long begin = System.currentTimeMillis();
        int count = 0;
        int startnum = 0;
        while (count < 10000000) {
            numberList.add(startnum);
            startnum += 2;
            count++;
        }
        long end = System.currentTimeMillis();
        System.out.println(end - begin);
    }
}
```

Vector is a thread-safe class that locks internally, so every add issues a lock request. The code above has only the main thread, which repeatedly requests the same lock.
Set up the biased lock with the following JVM parameters:
-XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0
BiasedLockingStartupDelay is the number of seconds after system startup before biased locking kicks in. The default is 4 seconds, because contention tends to be fierce while a system is starting up, and enabling biased locks at that point would hurt performance.
Here, to test biased-lock performance, the delay is set to 0.
The output is then 9209 (ms).
Now turn biased locking off:
-XX:-UseBiasedLocking
The output is 9627.
In general, when there is no contention, enabling biased locking improves performance by around 5%.
2.2 Lightweight lock
Java's multi-threaded safety is built on its lock mechanism, and lock performance is often unsatisfactory.
The reason is that monitorenter and monitorexit, the two bytecode instructions that control multithreaded synchronization, are implemented by the JVM on top of operating-system mutexes.
A mutex is a relatively expensive operation: it suspends the thread, which then has to be rescheduled back, often within a short time.
To optimize Java's lock mechanism, the concept of the lightweight lock was introduced in Java 6.
Lightweight locking is intended to reduce how often threads reach the mutex, not to replace it.
It uses the CPU primitive Compare-And-Swap (CAS, the CMPXCHG assembly instruction) to attempt a remedy before entering the mutex.
If the biased lock fails, the system performs a lightweight lock operation. Its purpose is to avoid the operating-system-level mutex as far as possible, because its performance is relatively poor; the JVM is itself an application, and it prefers to solve thread synchronization at the application level.
To sum up, a lightweight lock is a fast locking method: before entering the mutex, it uses a CAS operation to try to take the lock, avoiding the OS-level mutex where possible and thereby improving performance.
When the biased lock fails, the lightweight lock proceeds in these steps:
1. Save the Mark Word of the object header into the lock record (the object here is the one being locked; for example, in synchronized (this){}, the object is this):

```
lock->set_displaced_header(mark);
```
2. Set the object header to a pointer to the lock record (which lives in the thread's stack space):

```
if (mark == (markOop) Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)) {
    TEVENT (slow_enter: release stacklock) ;
    return ;
}
```

Since the lock record is located in the thread's stack, to determine whether a thread holds the lock, you only need to check whether the address stored in the object header falls within that thread's stack address space.
If the lightweight lock fails, it means there is contention, and the lock is upgraded to a heavyweight lock (a regular lock), that is, operating-system-level synchronization. When there is no lock contention, lightweight locks avoid the performance loss of traditional OS-mutex locks; when contention is fierce (the lightweight lock keeps failing), all the extra work the lightweight lock does only degrades performance.
2.3 Spin lock
When contention exists and the lightweight lock attempt fails, the lock may be upgraded directly to a heavyweight lock using OS-level mutual exclusion, or a spin lock may be tried first.
The idea is this: if threads can usually obtain the lock quickly, there is no need to suspend them at the OS level. Instead, let the thread perform a few empty operations (spin) and keep trying to get the lock (similar to tryLock). The number of loops is bounded, of course; once the limit is reached, the lock is still upgraded to a heavyweight lock. So when every thread holds the lock only briefly, spin locks can avoid threads being suspended at the OS level.
In JDK 1.6, spinning is enabled with -XX:+UseSpinning.
In JDK 1.7 this parameter was removed and spinning became a built-in behavior.
If synchronization blocks are long, spins keep failing and system performance degrades; if synchronization blocks are short, spins succeed and the time spent suspending and switching threads is saved, improving system performance.
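To make the CAS-then-spin-then-block progression concrete, here is a toy bounded spin lock (purely illustrative: the JVM does this internally on the object header's Mark Word, not with an AtomicBoolean, and BoundedSpinLock and MAX_SPINS are invented for this sketch):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Toy lock: try one CAS, spin a bounded number of times,
// then fall back to really blocking the thread.
public class BoundedSpinLock {
    private static final int MAX_SPINS = 64;
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // fast path: a single CAS, like a lightweight-lock attempt
        if (locked.compareAndSet(false, true)) return;
        // spin path: a few empty retries, like a spin lock
        for (int i = 0; i < MAX_SPINS; i++) {
            if (locked.compareAndSet(false, true)) return;
        }
        // slow path: stop spinning and park, like a heavyweight lock
        while (!locked.compareAndSet(false, true)) {
            LockSupport.parkNanos(1_000);
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```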
2.4 Summary of biased locks, lightweight locks, and spin locks
The locks above are not Java-language-level optimizations; they are built into the JVM.
First, the biased lock avoids the performance cost of one thread repeatedly acquiring and releasing the same lock. If the same thread acquires the lock again, the bias check lets it enter the synchronization block directly; there is no need to acquire the lock again.
Lightweight locks and spin locks both aim to avoid direct calls to OS-level mutex operations, because suspending a thread is a very expensive operation.
To avoid the heavyweight lock (the OS-level mutex), a lightweight lock is tried first. The lightweight lock attempts to take the lock with a CAS operation. If that fails, there is contention, but the lock may well be released soon, so a spin lock is tried: the thread runs a few empty loops, attempting to take the lock on each pass. If spinning fails as well, the only option left is to upgrade to a heavyweight lock.
As you can see, biased locks, lightweight locks, and spin locks are all optimistic locks.
3. A case of incorrect lock usage
```java
public class IntegerLock {
    static Integer i = 0;

    public static class AddThread extends Thread {
        public void run() {
            for (int k = 0; k < 100000; k++) {
                synchronized (i) {
                    i++;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AddThread t1 = new AddThread();
        AddThread t2 = new AddThread();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(i);
    }
}
```

This is a very basic mistake. As mentioned in [High Concurrency Java VII] Concurrency Design Patterns, Integer is immutable: every i++ creates a new Integer and assigns it to i, so the two threads end up competing for different lock objects, and the code is not thread-safe.
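One minimal fix, staying with synchronized, is to lock on a single object that never changes (an AtomicInteger would be an even simpler alternative); the class name here is just for illustration:

```java
public class IntegerLockFixed {
    static int i = 0;
    static final Object lock = new Object(); // one fixed lock object

    public static class AddThread extends Thread {
        public void run() {
            for (int k = 0; k < 100000; k++) {
                synchronized (lock) { // both threads now compete for the same lock
                    i++;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AddThread t1 = new AddThread();
        AddThread t2 = new AddThread();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(i); // always 200000
    }
}
```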
4. ThreadLocal and its source code analysis
It may seem a bit out of place to bring up ThreadLocal here, but ThreadLocal is a way of replacing locks, so it is still worth covering.
The basic idea: under multithreading, conflicting access to shared data requires locking. With ThreadLocal, each thread is given its own object instance; different threads access only their own objects and never anyone else's, so no lock is needed at all.
```java
package test;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Test {
    private static final SimpleDateFormat sdf = new SimpleDateFormat(
            "yyyy-MM-dd HH:mm:ss");

    public static class ParseDate implements Runnable {
        int i = 0;

        public ParseDate(int i) {
            this.i = i;
        }

        public void run() {
            try {
                Date t = sdf.parse("2016-02-16 17:00:" + i % 60);
                System.out.println(i + ":" + t);
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 1000; i++) {
            es.execute(new ParseDate(i));
        }
    }
}
```

Since SimpleDateFormat is not thread-safe, the code above uses it incorrectly. The simplest fix is to wrap it in a class of your own with synchronized (similar to Collections.synchronizedMap), but that causes problems under high concurrency: contention on synchronized means only one thread can enter at a time, so throughput is very low.
This problem can be solved by wrapping the SimpleDateFormat in a ThreadLocal.
```java
package test;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Test {
    static ThreadLocal<SimpleDateFormat> tl = new ThreadLocal<SimpleDateFormat>();

    public static class ParseDate implements Runnable {
        int i = 0;

        public ParseDate(int i) {
            this.i = i;
        }

        public void run() {
            try {
                if (tl.get() == null) {
                    tl.set(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));
                }
                Date t = tl.get().parse("2016-02-16 17:00:" + i % 60);
                System.out.println(i + ":" + t);
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 1000; i++) {
            es.execute(new ParseDate(i));
        }
    }
}
```

When each task runs, it first checks whether the current thread already has its own SimpleDateFormat object:
if (tl.get() == null)
If not, a new SimpleDateFormat is created and bound to the current thread:
tl.set(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));
Then the current thread's own SimpleDateFormat is used for parsing:
tl.get().parse("2016-02-16 17:00:" + i % 60);
In the original code there was a single SimpleDateFormat; with ThreadLocal, a new SimpleDateFormat is created for each thread.
Note that you must not set the same shared SimpleDateFormat into the ThreadLocal for every thread; that would be useless. Each thread has to be given its own, newly created SimpleDateFormat.
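Incidentally, since Java 8 this initialize-if-absent pattern can be written more compactly with ThreadLocal.withInitial, which runs the supplier once per thread on first access (a minimal sketch; the class name is made up):

```java
import java.text.SimpleDateFormat;

public class ThreadLocalFormat {
    // The supplier runs once per thread, so every thread
    // automatically gets its own SimpleDateFormat instance.
    private static final ThreadLocal<SimpleDateFormat> TL =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    public static SimpleDateFormat formatter() {
        return TL.get(); // no null check needed
    }
}
```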
Hibernate contains typical applications of ThreadLocal.
Now let's look at the source-code implementation of ThreadLocal.
First of all, there is a member variable in the Thread class:
ThreadLocal.ThreadLocalMap threadLocals = null;
This map is the key to the ThreadLocal implementation.
```java
public void set(T value) {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null)
        map.set(this, value);
    else
        createMap(t, value);
}
```

Using the ThreadLocal instance itself as the key, the corresponding value can be set and retrieved in the current thread's map.
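The matching get(), essentially as it appears in the JDK 7/8 source, reads the map of the current thread and falls back to the initial value if nothing has been set:

```java
public T get() {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null) {
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            @SuppressWarnings("unchecked")
            T result = (T) e.value;   // the value bound to this thread
            return result;
        }
    }
    return setInitialValue();         // falls back to initialValue()
}
```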
The ThreadLocalMap implementation is similar to HashMap, but it handles hash collisions differently.
When a hash collision occurs in ThreadLocalMap, it does not chain entries into a linked list the way HashMap does; instead it increments the index and places the entry in the next free slot, a scheme known as linear probing.
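A minimal sketch of linear probing (not the real ThreadLocalMap code, which also deals with weak references and stale entries; the helper below is hypothetical and assumes the table always has a free slot):

```java
// Open-addressing insert: on a collision, step to the next slot
// (wrapping around) until a free slot or the same key is found.
static int probeInsert(Object[] keys, Object[] values, Object key, Object value) {
    int len = keys.length;                  // assumed to be a power of two
    int i = key.hashCode() & (len - 1);     // initial bucket
    while (keys[i] != null && !keys[i].equals(key)) {
        i = (i + 1) % len;                  // linear probing: try the next index
    }
    keys[i] = key;
    values[i] = value;
    return i;                               // slot where the entry landed
}
```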