I remember when I first started learning Java: I ran into synchronized as soon as I ran into multi-threading. To us back then, synchronized seemed magical and powerful; we called it "synchronization", and it became our go-to remedy for multi-threaded problems. But as we learned more, we came to see synchronized as a heavyweight lock that looked bulky next to Lock, concluded it was inefficient, and slowly abandoned it.
Admittedly, with the various optimizations Java SE 1.6 made to synchronized, it no longer looks so heavy. Below, follow along with me as we explore the implementation mechanism of synchronized: how Java optimizes it, the lock optimization mechanisms, the lock storage structure, and the upgrade process.
Implementation principle
Synchronized ensures that while a method or code block is running, only one thread can enter the critical section at a time; it also guarantees memory visibility of shared variables.
Every object in Java can be used as a lock, which is the basis for synchronized to implement synchronization:
For an ordinary synchronized method, the lock is the current instance object; for a static synchronized method, the lock is the Class object of the current class; for a synchronized block, the lock is the object inside the parentheses. When a thread accesses a synchronized code block, it must first obtain the lock before it can execute the synchronized code, and the lock must be released when the block exits normally or an exception is thrown. So how is this mechanism implemented? Let's first look at a simple piece of code:
public class SynchronizedTest {
    public synchronized void test1() {
    }

    public void test2() {
        synchronized (this) {
        }
    }
}

Use the javap tool to view the generated class file and analyze how synchronized is implemented:
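The original post showed the javap output as an image; an abridged, representative listing from javap -c -v looks roughly like the following (exact offsets and formatting vary with the JDK version, and the annotations are added for readability):

public synchronized void test1();
    flags: ACC_PUBLIC, ACC_SYNCHRONIZED
    Code:
       0: return

public void test2();
    flags: ACC_PUBLIC
    Code:
       0: aload_0
       1: dup
       2: astore_1
       3: monitorenter      // enter the monitor of "this"
       4: aload_1
       5: monitorexit       // normal exit path
       6: goto          14
       9: astore_2
      10: aload_1
      11: monitorexit       // exceptional exit path: the lock is still released
      12: aload_2
      13: athrow
      14: return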
As can be seen from the above, the synchronized code block is implemented using the monitorenter and monitorexit instructions, while the synchronized method carries no special instruction sequence; it relies instead on the ACC_SYNCHRONIZED flag among the method's access flags (the locking itself is performed inside the JVM).
Synchronized code block: the monitorenter instruction is inserted at the start of the synchronized block and the monitorexit instruction at the end; the JVM must ensure that every monitorenter has a matching monitorexit. Any object has a monitor associated with it, and when that monitor is held, the object is locked. When a thread executes the monitorenter instruction, it tries to obtain ownership of the monitor corresponding to the object, that is, it tries to acquire the object's lock;
Synchronized method: a synchronized method is translated into ordinary method call and return instructions such as invokevirtual and areturn; there is no special instruction at the VM bytecode level to implement a synchronized method. Instead, the synchronized bit of the method's access_flags field in the Class file's method table is set to 1, indicating that the method is synchronized, and the object on which the method is invoked (or, for a static method, the Klass representing the method's declaring class) is used as the lock object (refer to: http://www.VeVB.COM/article/129245.htm)
Let's continue to analyze it, but before we go deeper, we need to understand two important concepts: Java object header and Monitor.
Java object header and Monitor
Java object headers and monitors are the basis for implementing synchronized! The following is a detailed introduction to these two concepts.
Java object header
The lock used by synchronized is stored in the Java object header, so what is the Java object header? The object header of the HotSpot virtual machine mainly contains two parts of data: the Mark Word (mark field) and the Klass Pointer (type pointer). The Klass Pointer is a pointer to the object's class metadata; the virtual machine uses this pointer to determine which class the object is an instance of. The Mark Word is used to store the object's own runtime data and is the key to implementing lightweight locks and biased locks, so the following focuses on it.
Mark Word
Mark Word is used to store the runtime data of the object itself, such as the hash code (HashCode), GC generational age, lock status flag, locks held by threads, biased thread ID, biased timestamp, and so on. A Java object header generally occupies two machine words (in a 32-bit virtual machine, one machine word is 4 bytes, i.e. 32 bits), but if the object is an array, three machine words are needed: the JVM can determine the size of an ordinary Java object from its metadata, but it cannot determine an array's size from the array metadata, so an extra word is used to record the array length. The storage structure of the Java object header (32-bit virtual machine) is as follows:
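In text form (a reconstruction from commonly cited 32-bit HotSpot references; treat the exact widths as a sketch):

Ordinary object: | Mark Word (32 bits) | Klass Pointer (32 bits) |
Array object:    | Mark Word (32 bits) | Klass Pointer (32 bits) | Array Length (32 bits) |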
Object header information is an additional storage cost beyond the data defined by the object itself. For the sake of space efficiency, however, the virtual machine designs the Mark Word as a non-fixed data structure so as to store as much data as possible in a very small space, reusing its own storage bits according to the object's state. That is, the Mark Word changes as the program runs; its possible states (32-bit virtual machine) are as follows:
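The commonly cited 32-bit state table looks roughly like this (a reconstruction; bit widths follow the usual HotSpot description):

State            | Mark Word contents                                   | Biased bit | Lock flag
Unlocked         | object hash code (25) | GC age (4)                   | 0          | 01
Biased lock      | thread ID (23) | epoch (2) | GC age (4)              | 1          | 01
Lightweight lock | pointer to lock record in the thread's stack (30)    |            | 00
Heavyweight lock | pointer to heavyweight lock / monitor (30)           |            | 10
GC mark          | empty                                                |            | 11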
That was a brief introduction to the Java object header; now let's take a look at the Monitor.
Monitor
What is Monitor? We can understand it as a synchronization tool or a synchronization mechanism, which is usually described as an object.
Just as everything is an object, all Java objects are natural monitors; every Java object has the potential to become a monitor, because in Java's design every object carries an invisible lock from birth, called the intrinsic lock or monitor lock.
A monitor record is a thread-private data structure: each thread has a list of available monitor records, and there is also a global available list. Every locked object is associated with a monitor (the LockWord in the object header's Mark Word points to the monitor's start address), and the monitor's Owner field stores the unique identifier of the thread that owns the lock, indicating that the lock is occupied by that thread. Its structure is as follows:
Owner: initially NULL, meaning no thread currently owns the monitor record; when a thread successfully acquires the lock, it stores that thread's unique identifier, and it is reset to NULL when the lock is released;
EntryQ: associates a system mutex (semaphore) used to block all threads that fail to lock the monitor record.
RcThis: the number of all threads blocked on or waiting on the monitor record.
Nest: used to implement the count for reentrant locking.
HashCode: saves the hash code copied from the object header (possibly also including the GC age).
Candidate: used to avoid unnecessary blocking or waking of threads, because only one thread can successfully acquire the lock at a time. If a thread releasing the lock woke all blocked or waiting threads every time, it would cause unnecessary context switches (from blocked to ready and then blocked again after losing the race for the lock), leading to severe performance degradation. Candidate has only two possible values: 0 means no thread needs to be woken; 1 means a successor thread should be woken to compete for the lock.
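For illustration only, here is a toy Java sketch of the monitor-record fields just described (the real structure lives inside the JVM, e.g. HotSpot's ObjectMonitor in C++; the field names below merely mirror the list above):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy sketch of a monitor record; illustrative only, not the JVM's actual type.
class MonitorRecord {
    volatile Thread owner;                  // Owner: null until a thread holds the lock
    final Queue<Thread> entryQ =            // EntryQ: threads blocked trying to acquire
            new ConcurrentLinkedQueue<>();
    volatile int rcThis;                    // RcThis: count of blocked/waiting threads
    volatile int nest;                      // Nest: reentrancy count
    volatile int hashCode;                  // HashCode: copied from the object header
    volatile int candidate;                 // Candidate: 0 = wake no one, 1 = wake one successor
}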
Reference: Talk about Synchronized in Java concurrency
We know that synchronized is a heavyweight lock whose efficiency is not great, and this notion has stuck in our minds. However, the implementation of synchronized was optimized in JDK 1.6, making it not so heavy after all. So what optimization methods does the JVM use?
Lock optimization
JDK 1.6 introduced many optimizations to the lock implementation, such as spin locks, adaptive spin locks, lock elimination, lock coarsening, biased locks, and lightweight locks, to reduce the overhead of lock operations.
Locks mainly have four states: unlocked, biased lock, lightweight lock, and heavyweight lock. They gradually upgrade as contention intensifies. Note that locks can be upgraded but not downgraded; this strategy improves the efficiency of acquiring and releasing locks.
Spin lock
Blocking and waking a thread requires the CPU to switch from user mode to kernel mode; frequent blocking and waking is heavy work for the CPU and inevitably puts great pressure on the system's concurrency performance. At the same time, we observe that in many applications an object lock is only held for a very short period of time, and it is not worth frequently blocking and waking threads for such short intervals. Hence spin locks were introduced.
What is a spin lock?
A so-called spin lock lets the thread wait for a while instead of being suspended immediately, to see whether the thread holding the lock releases it soon. How does it wait? By executing a meaningless loop (spinning).
Spin waiting cannot replace blocking (and that is before mentioning the requirement on the number of processors: it needs multiple cores, though single-core machines are rare nowadays). Although spinning avoids the overhead of thread switching, it occupies processor time. If the thread holding the lock releases it quickly, spinning works very well; otherwise the spinning thread burns processor resources in vain, doing no meaningful work, like holding a seat without using it, which wastes performance instead. Therefore the spin-wait time (number of spins) must be bounded: if the spin exceeds the defined limit without obtaining the lock, the thread should be suspended.
Spin locks were introduced in JDK 1.4.2 and were off by default, but they could be enabled with -XX:+UseSpinning; in JDK 1.6 they are on by default. The default number of spins is 10, adjustable via the parameter -XX:PreBlockSpin.
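To make the idea concrete, here is a toy bounded spin lock in Java (illustrative only; this is not how HotSpot spins internally, and the SPIN_LIMIT of 10 merely mirrors the default mentioned above):

import java.util.concurrent.atomic.AtomicBoolean;

// Toy bounded spin lock: spin a limited number of times, then back off.
public class BoundedSpinLock {
    private static final int SPIN_LIMIT = 10; // mirrors the -XX:PreBlockSpin default
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        int spins = 0;
        while (!locked.compareAndSet(false, true)) { // CAS: try to grab the lock
            if (++spins >= SPIN_LIMIT) {
                Thread.yield(); // crude stand-in for suspending the thread
                spins = 0;
            }
        }
    }

    public void unlock() {
        locked.set(false);
    }
}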
Adjusting the spin count via the parameter -XX:PreBlockSpin causes a lot of inconvenience. If I set the parameter to 10, but many threads in the system release their lock just as you exit the spin (if you had spun once or twice more, you would have gotten the lock), wouldn't that be embarrassing? Therefore JDK 1.6 introduced adaptive spin locks, making the virtual machine smarter and smarter.
Adaptive spin lock
JDK 1.6 introduced a smarter spin lock, namely the adaptive spin lock. "Adaptive" means the number of spins is no longer fixed; it is determined by the previous spin time on the same lock and the state of the lock's owner. How? If a thread spins and succeeds, the next spin count will be higher, because the virtual machine reasons that since the spin succeeded last time, it is likely to succeed again, so it allows more spin iterations. Conversely, if spinning rarely succeeds for a given lock, the spin count for acquiring that lock will be reduced, or spinning skipped entirely, so as not to waste processor resources.
With adaptive spin locks, as program runtime and performance-monitoring information keep improving, the virtual machine's prediction of a lock's behavior becomes more and more accurate, and the virtual machine becomes smarter.
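A toy sketch of the adaptive idea (purely illustrative; the real heuristic lives inside HotSpot, and the growth/shrink factors here are assumptions): track a per-lock spin budget, grow it after a successful spin acquisition, shrink it after a failed one.

import java.util.concurrent.atomic.AtomicBoolean;

// Toy adaptive spin: the spin budget grows on success and shrinks on failure.
public class ToyAdaptiveSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private volatile int spinBudget = 10; // starting budget; an assumption

    /** Returns true if acquired by spinning; the caller should block otherwise. */
    public boolean trySpinAcquire() {
        for (int i = 0; i < spinBudget; i++) {
            if (locked.compareAndSet(false, true)) {
                spinBudget = Math.min(spinBudget * 2, 1_000); // success: spin longer next time
                return true;
            }
            Thread.onSpinWait(); // hint that we are busy-waiting (JDK 9+)
        }
        spinBudget = Math.max(spinBudget / 2, 1);             // failure: spin less next time
        return false;
    }

    public void unlock() {
        locked.set(false);
    }
}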
Lock elimination
To ensure data integrity, we need to synchronize certain operations; but in some cases the JVM detects that there is no possibility of contention on shared data, and in those cases the JVM will eliminate the synchronization locks. Lock elimination is based on the data from escape analysis.
If there is no contention, why lock at all? Lock elimination thus saves the time spent on meaningless lock requests. Whether a variable escapes has to be determined by the virtual machine using data-flow analysis, but isn't it obvious to us programmers? Would we add synchronization around a block of code we clearly know has no data contention? Yet sometimes a program is not what we think it is. Although we don't lock explicitly, when we use certain built-in JDK APIs such as StringBuffer, Vector, and HashTable, invisible locking operations take place, for example in StringBuffer's append() method and Vector's add() method:
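Take the classic string-concatenation example (a sketch: sb never escapes concatString(), so the JVM can elide the locks that StringBuffer.append() takes internally):

public String concatString(String s1, String s2, String s3) {
    // StringBuffer's append() is synchronized, but sb is a local object that
    // never escapes this method, so escape analysis lets the JIT elide the locks.
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    sb.append(s3);
    return sb.toString();
}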
public void vectorTest() {
    Vector<String> vector = new Vector<String>();
    for (int i = 0; i < 10; i++) {
        vector.add(i + "");
    }
    System.out.println(vector);
}

When running this code, the JVM can clearly detect that the variable vector does not escape the method vectorTest(), so the JVM can boldly eliminate the locking operations inside vector.
Lock coarsening
We know that when using a synchronized lock, the scope of the synchronized block should be as small as possible: synchronize only over the actual extent of the shared data. The purpose is to minimize the number of operations that need to be synchronized, so that if there is lock contention, a thread waiting for the lock can obtain it as soon as possible.
In most cases the above view is correct, and I have always adhered to it. However, a series of consecutive lock and unlock operations may cause unnecessary performance loss, so the concept of lock coarsening is introduced.
Lock coarsening is easy to understand: it merges multiple consecutive lock and unlock operations, expanding them into a single lock with a larger scope. Take the example above: every add() on the vector requires a locking operation. When the JVM detects that the same object (vector) is locked and unlocked in rapid succession, it merges these into one larger-scoped lock/unlock operation, moving it outside the for loop.
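Conceptually the transformation looks like this (a source-level sketch; the real rewrite happens in JIT-compiled code, and the explicit synchronized block below stands in for Vector's internal locking):

// Before coarsening: lock and unlock on every iteration.
for (int i = 0; i < 10; i++) {
    synchronized (vector) {
        vector.add(i + "");
    }
}

// After coarsening: a single lock/unlock hoisted outside the loop.
synchronized (vector) {
    for (int i = 0; i < 10; i++) {
        vector.add(i + "");
    }
}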
Lightweight lock
The main purpose of introducing lightweight locks is to reduce the performance cost of using operating-system mutexes when there is no multi-thread contention. When the biased-lock feature is turned off, or multiple threads compete for a biased lock and it is upgraded, the lightweight lock is attempted, with the following steps:
Get the lock
(1) Determine whether the current object is in the unlocked state (hash code, 0, 01). If so, the JVM first creates a space called a Lock Record in the current thread's stack frame to store a copy of the lock object's current Mark Word (officially this copy gets a Displaced prefix, i.e. the Displaced Mark Word); otherwise, go to step (3);
(2) The JVM uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record. If it succeeds, the current thread has won the lock; the lock flag is changed to 00 (indicating the object is in the lightweight-lock state) and the synchronized operation proceeds. If it fails, go to step (3);
(3) Determine whether the object's Mark Word points into the current thread's stack frame. If so, the current thread already holds this object's lock and directly executes the synchronized code block; otherwise the lock object has been preempted by another thread, and the lightweight lock must be inflated into a heavyweight lock: the lock flag becomes 10, and threads that arrive later will block;
Release the lock
Lightweight locks are also released via CAS operations; the main steps are as follows (a toy model of both paths appears after the list):
(1) Take out the data saved in the Displaced Mark Word when the lightweight lock was acquired;
(2) Use CAS to replace the object's Mark Word with the retrieved data. If it succeeds, the lock has been released successfully; otherwise, go to step (3);
(3) If the CAS replacement fails, another thread has tried to acquire the lock, and the suspended threads must be woken up while the lock is released.
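As a very rough illustration of the acquire/release CAS dance (a toy Java model; the real Mark Word lives in the object header and the real code is HotSpot C++, so the names and types here are invented):

import java.util.concurrent.atomic.AtomicReference;

// Toy model: the AtomicReference stands in for the object's Mark Word.
// null = unlocked; otherwise it "points" to the owning thread's lock record.
class ToyLightweightLock {
    static final class LockRecord {            // lives in the owner's stack frame in reality
        final Thread owner = Thread.currentThread();
        Object displacedMarkWord;              // copy of the old Mark Word
    }

    private final AtomicReference<LockRecord> markWord = new AtomicReference<>(null);

    /** Returns the lock record on success; null means the lock must be inflated. */
    LockRecord tryAcquire() {
        LockRecord record = new LockRecord();          // step (1): create the lock record
        record.displacedMarkWord = "unlocked-header";  // copy of the current Mark Word
        if (markWord.compareAndSet(null, record)) {    // step (2): CAS in a pointer to it
            return record;                             // lock flag would now be 00
        }
        LockRecord current = markWord.get();
        if (current != null && current.owner == Thread.currentThread()) {
            return current;                            // step (3): reentrant, already ours
        }
        return null;                                   // contended: inflate to heavyweight
    }

    /** Release: CAS the displaced header back; failure means contention appeared. */
    boolean tryRelease(LockRecord record) {
        return markWord.compareAndSet(record, null);   // on failure, wake blocked threads
    }
}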
For lightweight locks, the premise behind the performance improvement is that "for most locks, there is no contention during their entire lifetime." If that premise is broken, then on top of the mutex overhead there are the extra CAS operations, so under multi-thread contention lightweight locks are actually slower than heavyweight locks;
The following figure shows the acquisition and release process of lightweight locks
Biased lock
The main purpose of introducing biased locks is to minimize unnecessary lightweight-lock execution paths when there is no multi-thread contention. As mentioned above, the lock and unlock operations of lightweight locks depend on multiple CAS atomic instructions. So how does the biased lock reduce unnecessary CAS operations? Just look at the structure of the Mark Word: we only need to check whether it is in the biased state, whether the lock flag is 01, and whether the ThreadID matches. The processing flow is as follows:
Get the lock
(1) Check whether the Mark Word is in the biasable state, i.e. the biased flag is 1 and the lock flag is 01;
(2) If it is biasable, check whether the thread ID is the current thread's ID; if so, go to step (5), otherwise go to step (3);
(3) If the thread ID is not the current thread's ID, compete for the lock via a CAS operation. If the competition succeeds, the thread ID in the Mark Word is replaced with the current thread's ID; otherwise, go to step (4);
(4) A failed CAS competition proves that there is currently multi-thread contention. When the global safepoint is reached, the thread that obtained the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread that was blocked at the safepoint then continues to execute the synchronized code block;
(5) Execute the synchronized code block.
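A toy sketch of the biased-lock fast path (illustrative only; the names are invented, and the real check operates on Mark Word bits inside the JVM):

import java.util.concurrent.atomic.AtomicLong;

// Toy model of the biased-lock fast path: the AtomicLong stands in for the
// thread-ID bits of the Mark Word. 0 means "biasable but not yet biased".
class ToyBiasedLock {
    private final AtomicLong biasedThreadId = new AtomicLong(0);

    /** Returns true if the current thread may enter without any further CAS. */
    boolean tryBiasedEnter() {
        long me = Thread.currentThread().getId();
        long owner = biasedThreadId.get();
        if (owner == me) {
            return true;                               // step (2): already biased to us, no CAS
        }
        if (owner == 0 && biasedThreadId.compareAndSet(0, me)) {
            return true;                               // step (3): won the bias via CAS
        }
        return false;                                  // contention: would trigger revocation
                                                       // and upgrade to a lightweight lock
    }
}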
Release the lock. The biased lock uses a release mechanism in which only contention triggers release: a thread never actively releases its biased lock but waits for another thread to compete for it. Revoking a biased lock requires waiting for the global safepoint (a point in time at which no bytecode is executing). The steps are as follows:
(1) Suspend the thread holding the biased lock and determine whether the lock object is still in the locked state;
(2) Revoke the biased lock, restoring the object to the unlocked (01) state or the lightweight-lock state;
The following figure shows the acquisition and release process of biased locks
Heavyweight lock
The heavyweight lock is also called the object monitor (Monitor) inside the JVM. It is very similar to a Mutex in C: in addition to the mutual-exclusion function of a Mutex(0|1), it is also responsible for implementing the functionality of a semaphore; that is, it contains at least a queue for lock contention and a signal-blocking queue (wait queue). The former handles mutual exclusion, the latter thread synchronization.
Heavyweight locks are implemented through the monitor inside the object, and the monitor in essence relies on the underlying operating system's Mutex Lock. Switching between threads in the operating system requires switching from user mode to kernel mode, and the cost of this switch is very high.
Summary
The above is all the content of this article's detailed explanation of the implementation principle of synchronized in Java. I hope it is helpful to everyone; if there are any shortcomings, please leave a message to point them out. Thank you all for supporting this site!