First, let’s introduce optimistic and pessimistic locking:
Pessimistic locking: always assume the worst case. Every time you read the data, you assume someone else will modify it, so you take a lock before reading, and anyone else who wants the data blocks until the lock is released. Traditional relational databases make heavy use of such locking mechanisms: row locks, table locks, read locks, write locks, all acquired before the operation is performed. The synchronized keyword in Java is also an implementation of pessimistic locking.
Optimistic locking: as the name suggests, it is very optimistic. Every time you read the data, you assume no one else will modify it, so you do not lock. Only when updating do you check whether someone else has changed the data in the meantime, typically using a version number or similar mechanism. Optimistic locking suits read-heavy workloads and can improve throughput. Databases, for example, provide mechanisms similar to write_condition that are essentially optimistic locks. In Java, the atomic variable classes in the java.util.concurrent.atomic package are implemented with CAS, an optimistic-locking technique.
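The version-number scheme mentioned above can be sketched as follows. This is an illustrative toy class (all names are made up for this example, not a real library API): a writer reads the current version, computes its new value without holding any lock, and commits only if the version has not moved in the meantime.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch of version-number optimistic locking. The commit itself is
// made atomic with synchronized, but callers never hold a lock while
// computing their update -- conflicts are detected at commit time.
public class VersionedValue {
    private volatile int value;
    private final AtomicInteger version = new AtomicInteger(0);

    public synchronized boolean tryUpdate(int expectedVersion, int newValue) {
        // Conflict detection: someone else committed first, so we fail
        // and let the caller decide whether to re-read and retry.
        if (version.get() != expectedVersion) {
            return false;
        }
        value = newValue;
        version.incrementAndGet();
        return true;
    }

    public int getVersion() { return version.get(); }
    public int getValue()   { return value; }

    public static void main(String[] args) {
        VersionedValue v = new VersionedValue();
        int ver = v.getVersion();                 // read version before computing
        System.out.println(v.tryUpdate(ver, 42)); // prints true
        System.out.println(v.tryUpdate(ver, 99)); // prints false: stale version
    }
}
```

A second update using the same stale version number fails, which is exactly the "detect conflict at commit, let the caller decide" behavior described above.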
An implementation of optimistic locking: CAS (Compare and Swap).
Problems with locking:
Before JDK 1.5, Java relied on the synchronized keyword to ensure synchronization. By coordinating access to shared state through a consistent locking protocol, it guarantees that whichever thread holds the lock on a shared variable accesses it exclusively. This is an exclusive lock, and an exclusive lock is a form of pessimistic lock, so synchronized can be described as a pessimistic lock.
The pessimistic lock mechanism has the following problems:
1. Under multi-threaded contention, acquiring and releasing locks causes frequent context switches and scheduling delays, which hurts performance.
2. While one thread holds the lock, every other thread that needs it is suspended.
3. If a high-priority thread has to wait for a low-priority thread to release the lock, priority inversion occurs, which is a performance risk.
Compared with these problems of pessimistic locking, a more effective alternative is optimistic locking. The idea is: do not lock at all; instead, complete the operation assuming there is no concurrent conflict, and if a conflict does occur, retry until the operation succeeds.
Optimistic lock:
Optimistic locking was described above; it is really an idea rather than a single mechanism. In contrast to pessimistic locking, it assumes that data will generally not suffer concurrent conflicts, so it only checks for conflicts when the update is actually committed. If a conflict is detected, an error is returned to the caller, who decides what to do next.
The description above already hints at the implementation details: optimistic locking consists of two steps, conflict detection and data update. One typical implementation is Compare and Swap (CAS).
CAS:
CAS is an optimistic-locking technique. When multiple threads try to update the same variable with CAS at the same time, only one of them succeeds in updating the variable's value; the others fail. A failed thread is not suspended, however; it is simply told that it lost the race and may try again.
A CAS operation takes three operands: the memory location to read and write (V), the expected original value (A), and the new value to write (B). If the value at location V matches the expected value A, the processor atomically updates the location to B; otherwise it does nothing. In either case it returns the value that was at the location before the CAS instruction executed (some CAS variants only report success or failure without returning the current value). In effect CAS says: "I think location V contains value A; if it does, put B there; otherwise leave it alone and just tell me its current value." This is exactly the conflict-detection-plus-update principle of optimistic locking.
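The win/lose behavior just described can be seen directly with AtomicInteger.compareAndSet (a minimal sketch, run sequentially here for determinism):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two CAS attempts against the same variable: only the one whose
// expected value still matches wins; the loser fails immediately
// instead of blocking, and could re-read and retry.
public class CasBasics {
    public static String demo() {
        AtomicInteger v = new AtomicInteger(5);
        boolean first  = v.compareAndSet(5, 6); // expects 5, finds 5 -> wins
        boolean second = v.compareAndSet(5, 7); // expects 5, finds 6 -> fails
        return first + "," + second + "," + v.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true,false,6"
    }
}
```

The second call fails precisely because the first one already changed V from A to B, which is the conflict-detection step in action.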
To emphasize once more: optimistic locking is an idea, and CAS is one way of realizing that idea.
Java support for CAS:
The java.util.concurrent (JUC) package introduced in JDK 1.5 is built on CAS. Compared with blocking schemes such as synchronized, CAS is a common building block for non-blocking algorithms, which is a large part of why JUC performs so well.
Take AtomicInteger from java.util.concurrent as an example of how to ensure thread safety without locks. We focus on the getAndIncrement method, which is equivalent to the i++ operation.
```java
public class AtomicInteger extends Number implements java.io.Serializable {

    private volatile int value;

    public final int get() {
        return value;
    }

    public final int getAndIncrement() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return current;
        }
    }

    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }
}
```

In this lock-free design, the field value must be volatile to guarantee visibility between threads, so getting the variable's value is just a direct read. Now let's look at how the increment is done.
getAndIncrement uses a CAS loop: each iteration reads the current value from memory, then attempts a CAS with that value and the value plus one. On success it returns the old value; on failure it retries until it succeeds.
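That spin-CAS loop is what makes concurrent increments exact without any lock. A small sketch (thread and iteration counts are arbitrary choices for the demo):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Several threads hammer getAndIncrement concurrently; the CAS retry
// loop inside it guarantees no increment is lost, so the final count
// is exactly threads * perThread.
public class CounterDemo {
    public static int count(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.getAndIncrement(); // lock-free atomic i++
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(count(4, 10000)); // prints 40000
    }
}
```

With a plain `int counter++` instead, lost updates would make the result nondeterministically smaller.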
compareAndSet uses JNI (Java Native Interface) to invoke the underlying CPU instruction:

```java
public final boolean compareAndSet(int expect, int update) {
    return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}
```

Here unsafe.compareAndSwapInt(this, valueOffset, expect, update) behaves like the following logic:
```java
if (value == expect) {
    value = update;
    return true;
} else {
    return false;
}
```

So how does compareAndSwapInt make the comparison (value == expect) and the assignment (value = update) atomic as a pair? See the CAS principle below.
CAS Principle:
CAS is implemented via JNI: compareAndSwapInt is implemented in C++, which in turn issues the underlying CPU instruction.
The following explains the implementation of CAS on a commonly used CPU family (Intel x86).
Here is the source code of the compareAndSwapInt() method of the sun.misc.Unsafe class:
public final native boolean compareAndSwapInt(Object o, long offset, int expected, int x);
You can see that this is a local method call. The C++ code that this local method calls in the JDK is:
```cpp
#define LOCK_IF_MP(mp) __asm cmp mp, 0  \
                       __asm je L0      \
                       __asm _emit 0xF0 \
                       __asm L0:

inline jint Atomic::cmpxchg(jint exchange_value, volatile jint* dest, jint compare_value) {
  // alternative for InterlockedCompareExchange
  int mp = os::is_MP();
  __asm {
    mov edx, dest
    mov ecx, exchange_value
    mov eax, compare_value
    LOCK_IF_MP(mp)
    cmpxchg dword ptr [edx], ecx
  }
}
```

As the source above shows, the program decides whether to prefix the cmpxchg instruction with lock based on the current processor type. On a multiprocessor, the lock prefix is added; on a uniprocessor it is omitted (a single processor maintains sequential consistency within itself and does not need the memory-barrier effect that the lock prefix provides).
CAS Disadvantages:
1. The ABA problem:
For example, suppose thread one reads value A from memory location V. Meanwhile thread two also reads A from V, changes it to B through some operations, and then changes it back to A. When thread one later performs its CAS, it finds that V still contains A, so its CAS succeeds. But although thread one's CAS succeeded, there may be hidden problems, as the following example shows:
Consider a stack implemented as a singly linked list, with A on top. Thread T1 knows that A.next is B and intends to replace the top of the stack with B using CAS:
head.compareAndSet(A,B);
Before T1 executes this instruction, thread T2 intervenes: it pops A and B, then pushes D, C, and A. The stack is now A → C → D (top to bottom), and object B is dangling, no longer referenced by the stack:
Now it is thread T1's turn to perform its CAS. It finds that the top of the stack is still A, so the CAS succeeds and the top becomes B. But B.next is actually null, so the situation becomes:
The stack contains only the single element B; the linked list formed by C and D is no longer reachable from the stack, so C and D are silently lost.
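The ABA blindness of a plain CAS can be reproduced with AtomicReference (a minimal sketch; the "threads" are simulated sequentially so the outcome is deterministic):

```java
import java.util.concurrent.atomic.AtomicReference;

// A plain CAS cannot see that the value went A -> B -> A behind our
// back: the final compareAndSet still succeeds.
public class AbaDemo {
    public static boolean run() {
        AtomicReference<String> top = new AtomicReference<>("A");

        // "Thread two": changes A to B, then back to A.
        top.compareAndSet("A", "B");
        top.compareAndSet("B", "A");

        // "Thread one": its CAS succeeds even though the value was
        // modified twice in between -- this is the ABA problem.
        return top.compareAndSet("A", "C");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true
    }
}
```

(compareAndSet compares references with ==; the interned string literal "A" makes that comparison behave as intended here.)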
Since Java 1.5, the JDK's atomic package has provided the AtomicStampedReference class to address the ABA problem. Its compareAndSet method first checks whether the current reference equals the expected reference and whether the current stamp equals the expected stamp; only if both match does it atomically set the reference and the stamp to the given new values.
```java
public boolean compareAndSet(
        V   expectedReference, // expected reference
        V   newReference,      // new reference
        int expectedStamp,     // expected stamp
        int newStamp)          // new stamp
```
Actual application code:
```java
private static AtomicStampedReference<Integer> atomicStampedRef =
        new AtomicStampedReference<Integer>(100, 0);
// ...
atomicStampedRef.compareAndSet(100, 101, stamp, stamp + 1);
```
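A complete sketch showing how the stamp defeats ABA (the "other thread" is simulated sequentially for determinism):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// With a stamp, an A -> B -> A round trip bumps the version twice,
// so a CAS that still carries the old stamp fails even though the
// value itself looks unchanged.
public class StampedDemo {
    public static boolean staleCasSucceeds() {
        AtomicStampedReference<Integer> ref =
                new AtomicStampedReference<Integer>(100, 0);
        int stamp = ref.getStamp();

        // "Another thread": 100 -> 101 -> 100, incrementing the stamp each time.
        ref.compareAndSet(100, 101, stamp, stamp + 1);
        ref.compareAndSet(101, 100, stamp + 1, stamp + 2);

        // The value is 100 again, but our stamp is stale, so this CAS fails.
        return ref.compareAndSet(100, 102, stamp, stamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(staleCasSucceeds()); // prints false
    }
}
```

Compare this with the AtomicReference behavior in the ABA example: the plain CAS would have succeeded, while the stamped CAS correctly rejects the stale update.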
2. Long spin times mean high overhead:
A spinning CAS (one that retries in a loop until it succeeds) that keeps failing for a long time imposes a large execution cost on the CPU. If the JVM can use the pause instruction provided by the processor, efficiency improves somewhat. The pause instruction does two things. First, it delays the pipelined execution of instructions (de-pipelining) so the CPU does not consume excessive execution resources; the delay depends on the implementation, and on some processors it is zero. Second, it avoids the CPU pipeline flush caused by a memory-order violation when exiting the loop, improving CPU execution efficiency.
3. Atomicity is guaranteed for only one shared variable:
A CAS loop guarantees atomic operations on a single shared variable, but it cannot make an operation spanning multiple shared variables atomic. In that case you can fall back to locks, or use a trick: merge the multiple variables into one. For example, with two shared variables i = 2 and j = a, merge them into ij = 2a and apply CAS to ij. Since Java 1.5 the JDK has provided the AtomicReference class to guarantee atomicity of reference updates, so you can put several variables into one object and perform CAS on that object.
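The merge-into-one-object trick can be sketched like this (the Pair class and field names are made up for the example): wrap both variables in an immutable object and CAS the single reference, so both fields change in one atomic step.

```java
import java.util.concurrent.atomic.AtomicReference;

// Updating two logical variables (i and j) atomically by CAS-ing one
// reference to an immutable pair, in a standard CAS retry loop.
public class PairUpdate {
    static final class Pair {
        final int i;
        final String j;
        Pair(int i, String j) { this.i = i; this.j = j; }
    }

    public static Pair incrementBoth(AtomicReference<Pair> ref) {
        while (true) {
            Pair current = ref.get();
            Pair next = new Pair(current.i + 1, current.j + "!");
            if (ref.compareAndSet(current, next)) {
                return next; // both fields changed in one atomic swap
            }
            // another thread won the race; re-read and retry
        }
    }

    public static void main(String[] args) {
        AtomicReference<Pair> ref = new AtomicReference<>(new Pair(2, "a"));
        Pair p = incrementBoth(ref);
        System.out.println(p.i + " " + p.j); // prints "3 a!"
    }
}
```

Immutability of Pair is essential: readers either see the whole old pair or the whole new pair, never a half-updated one.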
Usage scenarios for CAS versus synchronized:
1. When resource contention is light (few thread conflicts), synchronized wastes CPU on blocking and waking threads and on switching between user mode and kernel mode. CAS, implemented in hardware, needs no kernel entry and no thread switching, and since spins rarely have to repeat, it achieves higher performance.
2. When resource contention is heavy (severe thread conflicts), a CAS spin fails with high probability, wasting more CPU, and its efficiency falls below that of synchronized.
Supplement: synchronized has been improved and optimized since JDK 1.6. Its underlying implementation mainly relies on a lock-free queue; the basic idea is to spin briefly and then block, continuing to compete for the lock after a context switch, slightly sacrificing fairness in exchange for higher throughput. With few thread conflicts it achieves performance similar to CAS; with severe conflicts it performs much better than CAS.
Implementation of the concurrent package:
Since Java's CAS has the memory semantics of both a volatile read and a volatile write, there are four ways for Java threads to communicate:
1. Thread A writes the volatile variable, and then thread B reads the volatile variable.
2. Thread A writes the volatile variable, and then thread B uses CAS to update the volatile variable.
3. Thread A uses CAS to update a volatile variable, and then thread B uses CAS to update this volatile variable.
4. Thread A uses CAS to update a volatile variable, and then thread B reads this volatile variable.
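Pattern 1, for example, can be sketched as follows (class and field names are illustrative). The volatile write by thread A and the volatile read by thread B establish a happens-before edge, so B is guaranteed to see A's earlier plain write as well.

```java
// Thread A writes a volatile flag; thread B spins on a volatile read.
// The volatile write/read pair publishes the plain write to `data`.
public class VolatileHandoff {
    static int data = 0;                   // plain (non-volatile) field
    static volatile boolean ready = false; // volatile flag

    public static int handoff() {
        data = 0;
        ready = false;
        new Thread(() -> {
            data = 42;      // plain write by thread A...
            ready = true;   // ...published by this volatile write
        }).start();
        while (!ready) { }  // volatile read by thread B (busy-wait for the demo)
        return data;        // happens-before guarantees this sees 42
    }

    public static void main(String[] args) {
        System.out.println(handoff()); // prints 42
    }
}
```

Without volatile on the flag, B might spin forever or observe a stale value of data; the busy-wait here is only for demonstration, not a recommended waiting strategy.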
Java's CAS uses the efficient machine-level atomic instructions available on modern processors, which atomically perform read-modify-write operations on memory. This is the key to achieving synchronization on multiprocessors (essentially, a machine that supports atomic read-modify-write instructions is an asynchronous machine computationally equivalent to a sequential Turing machine, so every modern multiprocessor supports some such instruction). At the same time, volatile reads/writes and CAS on volatile variables enable communication between threads. Together, these features form the cornerstone of the entire concurrent package. If we analyze the source of the concurrent package carefully, we find a general implementation pattern:
1. First, declare the shared variable volatile;
2. Then, use CAS's atomic conditional update to synchronize between threads;
3. Meanwhile, achieve inter-thread communication through volatile reads/writes together with the volatile memory semantics of CAS.
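The pattern above (a volatile state variable updated conditionally with CAS) can be sketched as a minimal spin lock. This is an illustration of the pattern only, not how java.util.concurrent locks are actually built (those use AQS with queued blocking rather than pure spinning):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal spin lock: the lock state lives in a volatile boolean
// (inside AtomicBoolean); lock() is a CAS conditional update, and
// unlock()'s volatile write publishes the critical section's effects.
public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // CAS the state from free (false) to held (true); spin on failure.
        while (!locked.compareAndSet(false, true)) { }
    }

    public void unlock() {
        locked.set(false); // volatile write
    }

    // Demo: several threads increment a plain counter under the lock.
    public static int demo(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        int[] count = {0};
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000)); // prints 4000
    }
}
```

The happens-before chain from unlock()'s volatile write to the next successful CAS in lock() is what makes the plain `count[0]++` safe here.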
AQS, non-blocking data structures, and the atomic variable classes (the classes in the java.util.concurrent.atomic package) are the basic building blocks of the concurrent package, and all are implemented with this pattern; the high-level classes in the package are in turn built on these basic classes. From a high level, the implementation structure of the concurrent package is illustrated in the diagram below:
CAS in practice: allocating objects on the heap:
Java creates an object with new Object(), which is allocated on the JVM heap. So how is that object stored in the heap?
First, by the time new Object() executes, the amount of space the object needs is already determined, because the sizes of Java's data types are fixed (if this is unfamiliar, please look it up yourself). The remaining job is to find a block of space in the heap to store the object.
In the single-threaded case, there are two general allocation strategies:
1. Pointer collision (bump-the-pointer): applicable when memory is perfectly regular (whether it is regular depends on the memory-reclamation strategy). Allocation is just moving the pointer toward the free side by a distance equal to the object's size.
2. Free list: suitable for irregular memory. The JVM maintains a list recording which memory regions are free and how large they are; to allocate space for an object, it searches the free list for a suitable region and assigns it.
However, the JVM cannot always run single-threaded; that would be far too inefficient. And allocating memory to an object is not an atomic operation: it takes at least the steps of searching the free list, allocating the memory, and updating the free list, which is unsafe under concurrency. There are two strategies for making concurrent allocation safe:
1. CAS: the virtual machine uses CAS with failure-and-retry to make the update of the allocation pointer atomic; the principle is the same as described above.
2. TLAB: since CAS still has a performance cost, the JVM adds a further optimization: each thread pre-allocates a small chunk of heap memory called a thread-local allocation buffer (TLAB). A thread allocates directly inside its own TLAB, avoiding conflicts entirely; only when the buffer is exhausted and a new chunk of memory must be carved out is a CAS operation performed.
Whether the virtual machine uses TLABs can be configured with the -XX:+/-UseTLAB parameter (enabled by default since JDK 5).
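The CAS-based pointer-collision allocation described in strategy 1 can be sketched as a toy model (all names here are illustrative Java, not JVM code): "top" is the boundary between used and free memory, and each allocation tries to bump it by the object's size, retrying on contention.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of CAS-based bump-pointer allocation. Real JVM allocation
// additionally uses TLABs so most allocations avoid this CAS entirely.
public class BumpAllocator {
    private final AtomicLong top;  // boundary between used and free memory
    private final long limit;      // end of the managed region

    public BumpAllocator(long base, long limit) {
        this.top = new AtomicLong(base);
        this.limit = limit;
    }

    /** Returns the start address of the new block, or -1 if out of memory. */
    public long allocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > limit) return -1;        // region exhausted
            if (top.compareAndSet(oldTop, newTop)) {
                return oldTop;                    // we own [oldTop, newTop)
            }
            // another thread moved the pointer first; retry
        }
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(0, 100);
        System.out.println(heap.allocate(40)); // prints 0
        System.out.println(heap.allocate(40)); // prints 40
        System.out.println(heap.allocate(40)); // prints -1 (out of space)
    }
}
```

The failure-and-retry loop is exactly the spin-CAS pattern from earlier in the article, applied to the allocation pointer instead of a counter.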
That is all for this article; I hope it is helpful for your study.