This article walks through the design of a high-throughput, thread-safe LRU cache.
A few years ago, I implemented an LRU cache to look up ids for keywords. The data structure was interesting because the required throughput was high enough to rule out the performance cost of heavy locking and the synchronized keyword. The application is implemented in Java.
My idea was that a sequence of atomic reference assignments could maintain the LRU order, with the entries kept in a ConcurrentHashMap. To start with, I wrapped each value in an entry. Each entry has a node in a doubly linked LRU list. The tail of the list holds the most recently used entries, and the head holds the entries that may be evicted once the cache reaches a certain size. Each node points back to the entry it was created for.
When a value is looked up by key, the cache first checks the map to see whether an entry exists. If it does not, the cache relies on a loader to read the value from the data source in a read-through fashion and adds it to the map with add-if-absent semantics. The challenge in sustaining high throughput is maintaining the LRU list efficiently. The ConcurrentHashMap is segmented, so up to a certain number of threads (you can specify the concurrency level when constructing the map) there is little contention. The LRU list, however, cannot be partitioned the same way. To address this, I introduced an auxiliary queue that is used for cleanup operations.
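To make the structure concrete, here is a minimal Java sketch of how the pieces might fit together. The class and field names are my own for illustration, and the LRU list is approximated with a ConcurrentLinkedQueue rather than the hand-rolled doubly linked list described above, so this is a simplification of the design, not the original implementation.

import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// One entry per key; it remembers which LRU node currently represents it.
final class Entry<K, V> {
    final K key;
    final V value;
    // The latest node offered for this entry; updated with compare-and-set.
    final AtomicReference<Node<K, V>> node = new AtomicReference<>();
    Entry(K key, V value) { this.key = key; this.value = value; }
}

// A node in the LRU list; when its entry reference is nulled out it becomes a hole.
final class Node<K, V> {
    final AtomicReference<Entry<K, V>> entry;
    Node(Entry<K, V> e) { this.entry = new AtomicReference<>(e); }
}

// Skeleton of the cache: a ConcurrentHashMap for lookups, an LRU list
// (most recently used at the tail), and an auxiliary queue that records holes.
final class LruCache<K, V> {
    final ConcurrentHashMap<K, Entry<K, V>> map = new ConcurrentHashMap<>();
    final ConcurrentLinkedQueue<Node<K, V>> lru = new ConcurrentLinkedQueue<>();
    final Queue<Node<K, V>> cleanup = new ConcurrentLinkedQueue<>();
    final int capacity;
    final int evictThreshold;

    LruCache(int capacity) {
        this.capacity = capacity;
        // Assumption: trigger batch eviction somewhat below full capacity.
        this.evictThreshold = (int) (capacity * 0.9);
    }
    // get, put, offer, evict, and purge are sketched further down.
}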
The cache has six basic operations. A cache hit involves two of them, get and offer; a cache miss involves four: get, load, put, and offer. Inside put we may also need to trigger eviction, and on a cache hit we passively do a little cleanup of the LRU list, which we call purging.
get : lookup entry in the map by key
load : load value from a data source
put : create entry and map it to key
offer: append a node at the tail of the LRU list that refers to a recently accessed entry
evict: remove nodes at the head of the list and associated entries from the map (after the cache reaches a certain size)
purge: delete unused nodes in the LRU list -- we refer to these nodes as holes, and the cleanup queue keeps track of these
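From the caller's point of view only get is visible; load, put, offer, evict, and purge all happen inside it. As a hypothetical usage example (the LruCache name, the get signature taking a loader function, and lookupIdInDatabase are my own illustration, matching the keyword-to-id use case mentioned earlier):

// Create a cache and look up an id; the lambda is the read-through loader
// that is only invoked on a cache miss.
LruCache<String, Long> cache = new LruCache<>(100_000);
Long id = cache.get("some keyword", keyword -> lookupIdInDatabase(keyword));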
Both evict and purge process entries in batches. Let's look at each operation in detail.
The get operation works as follows:
get(K) -> V
  lookup entry by key k
  if cache hit, we have an entry e
    offer entry e
    try purge some holes
  else
    load value v for key k
    create entry e <- (k,v)
    try put entry e
  end
  return value e.v
If the key exists, we offer a new node at the tail of the LRU list to record that this value was just used. Executing get and offer is not atomic (there is no lock here), so we cannot claim that the offered node points to the most recently used entry, but it certainly points to one of the most recently used entries under concurrent execution. We deliberately do not enforce an ordering of get and offer across threads, because that would limit throughput. After offering a node, we try to purge a few holes and then return the value. The offer and purge operations are described in detail below.
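Continuing the sketch from above, get might look like this in Java, with a java.util.function.Function standing in for the loader; put, offer, and purge are sketched in the sections that follow.

// requires: import java.util.function.Function;
// Look up by key; on a hit, offer a fresh tail node and purge a few holes;
// on a miss, read through the loader and try to put a new entry.
V get(K key, Function<K, V> loader) {
    Entry<K, V> e = map.get(key);
    if (e != null) {                                 // cache hit
        offer(e);                                    // record the recent access
        purge();                                     // opportunistically drop some holes
        return e.value;
    }
    V v = loader.apply(key);                         // cache miss: load from the data source
    Entry<K, V> winner = put(new Entry<>(key, v));   // may lose the race to another thread
    return winner.value;
}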
If a cache miss occurs, we call the loader to load the value for the key, create a new entry, and put it into the map. The put operation works as follows:
put(E) -> E
  existing entry ex <- map.putIfAbsent(e.k, e)
  if absent
    offer entry e
    if size reaches evict-threshold
      evict some entries
    end
    return entry e
  else, we have an existing entry ex
    return entry ex
  end
As you can see, two or more threads may race to put an entry for the same key into the map, but only one of them succeeds, and only the winner calls offer. After offering a node at the tail of the LRU list, we check whether the cache has reached the eviction threshold, which is the watermark for starting a batch eviction. In this particular application the threshold is set somewhat below the capacity, so eviction happens in small batches rather than on every insertion, and multiple threads may participate in eviction before the cache grows to its full capacity. This would be easy to do with a lock, but it can be kept thread-safe without one: removing nodes from the head of the LRU list and the associated entries from the map requires careful atomic operations so that multiple threads do not remove the same entry from the map.
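A sketch of put and evict under the same assumptions. Only the thread whose entry lands in the map offers a node; the eviction loop skips holes, relying on the fact that a node whose entry reference is still set is that entry's current node. The original design evicts in batches; for simplicity this sketch just evicts until the size is back under the threshold.

// putIfAbsent decides the winner; only the winner offers a node and may evict.
Entry<K, V> put(Entry<K, V> e) {
    Entry<K, V> existing = map.putIfAbsent(e.key, e);
    if (existing == null) {                    // we won the race
        offer(e);
        if (map.size() >= evictThreshold) {    // batch eviction, not per insertion
            evict();
        }
        return e;
    }
    return existing;                           // another thread inserted first
}

// Remove the oldest live entries until the size is back under the threshold.
void evict() {
    while (map.size() >= evictThreshold) {
        Node<K, V> head = lru.poll();          // atomically claim the head node
        if (head == null) {
            return;                            // list drained
        }
        Entry<K, V> victim = head.entry.get();
        if (victim != null) {                  // skip holes left behind by offer()
            map.remove(victim.key, victim);    // atomic remove; tolerates racing evictors
        }
    }
}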
The offer operation is the most interesting part. It always tries to create a new node, but it does not immediately remove nodes in the LRU list that are no longer in use.
offer(E)
  if tail node doesn't refer to entry e
    assign current node c <- e.node
    create a new node n(e), new node refers to entry e
    if atomic compare-and-set node e.node, expect c, assign n
      add node n to tail of LRU list
      if node c not null
        set c.entry to null, c now has a hole
        add node c to cleanup queue
      end
    end
  end
First, offer checks whether the node at the tail of the list already refers to the entry that was just accessed; it usually does not, unless all threads keep hitting the same key-value pair. If the tail refers to a different entry, offer creates a new node for the accessed entry and, before appending it, performs a compare-and-set on the entry's node reference, which prevents multiple threads from appending nodes for the same entry at the same time.
The thread that wins the compare-and-set appends the new node to the tail of the LRU list. This append is the same as the one used in ConcurrentLinkedQueue, whose underlying algorithm is described in "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms". The thread then checks whether the entry was previously referred to by another node. If so, the old node is not removed immediately; instead it becomes a hole (its entry reference is set to null) and is added to the cleanup queue.
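A sketch of offer and purge under the same assumptions. The compare-and-set on the entry's node reference decides which thread appends a new tail node; the displaced node becomes a hole and is queued for purging. The fast path from the pseudocode (checking whether the tail node already refers to the entry) is omitted here, because the queue-based sketch has no cheap way to inspect its tail.

// Append a new node for a just-accessed entry; the old node becomes a hole.
void offer(Entry<K, V> e) {
    Node<K, V> current = e.node.get();          // node that currently represents e (may be null)
    Node<K, V> fresh = new Node<>(e);
    // Only one of the racing threads wins the CAS, so concurrent hits on the
    // same key append at most one new node.
    if (e.node.compareAndSet(current, fresh)) {
        lru.offer(fresh);                       // append at the tail, as in ConcurrentLinkedQueue
        if (current != null) {
            current.entry.set(null);            // the old node is now a hole
            cleanup.offer(current);             // remember it for a later purge
        }
    }
}

// Drop a few holes recorded by offer(). With the doubly linked list of the
// original design a hole is unlinked in O(1); with this queue-based sketch
// the removal is O(n), so the sketch only bounds the amount of garbage kept.
void purge() {
    for (int i = 0; i < 8; i++) {               // assumption: purge a small fixed batch
        Node<K, V> hole = cleanup.poll();
        if (hole == null) {
            return;
        }
        lru.remove(hole);                       // identity-based removal of the hole node
    }
}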
That covers the design of this high-throughput, thread-safe LRU cache; I hope it is helpful.