This week I spent some time reading the source code of memcached on GitHub.
Here are some of the lessons I learned.
The memcached server allocates memory with a “slab allocator” rather than calling malloc/free for every item. This avoids memory fragmentation and avoids spending cycles searching for contiguous blocks of free memory; overall, those tasks tend to consume more resources than the memcached process itself. With the slab allocator, memory is handed out in chunks that are constantly reused. Because memory is divided into slabs with different chunk sizes, some memory is wasted whenever a cached item does not exactly fit its chunk. Looking at the slab definitions, the chunk sizes grow in powers of a growth factor N: each slab class is N times larger than the previous one. In the current implementation, the largest slab size is 1 MB, which means a single cached item cannot exceed this size.
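To make the size-class idea concrete, here is a minimal Python sketch, not memcached's actual C code. The parameters (a 96-byte smallest chunk, a growth factor of 1.25, 8-byte alignment, 1 MB pages) and the names `slab_class_sizes`, `pick_class` and `SlabClass` are my own illustrative assumptions. It computes geometrically growing chunk sizes, picks the smallest class an item fits into, and reuses freed chunks from a per-class free list.

```python
import math

# Illustrative parameters only (assumed); real memcached defaults may differ by version.
MIN_CHUNK = 96            # smallest chunk size in bytes
GROWTH_FACTOR = 1.25      # each class is this factor larger than the previous one
PAGE_SIZE = 1024 * 1024   # 1 MB page, which is also the largest chunk
ALIGN = 8                 # chunk sizes are rounded up to a multiple of this


def slab_class_sizes(min_chunk=MIN_CHUNK, factor=GROWTH_FACTOR,
                     page_size=PAGE_SIZE, align=ALIGN):
    """Return the chunk size of every slab class, smallest first."""
    sizes = []
    size = min_chunk
    while True:
        aligned = math.ceil(size / align) * align
        if aligned >= page_size:
            break
        sizes.append(aligned)
        size = aligned * factor  # geometric growth: min_chunk * factor**i
    sizes.append(page_size)      # last class: one item per page
    return sizes


def pick_class(item_size, sizes):
    """Return (class index, wasted bytes) for an item, or None if it is too large."""
    for i, chunk in enumerate(sizes):
        if item_size <= chunk:
            return i, chunk - item_size
    return None  # bigger than the largest chunk: cannot be cached


class SlabClass:
    """Toy slab class: splits pages into equal chunks and reuses freed chunks."""

    def __init__(self, chunk_size, page_size=PAGE_SIZE):
        self.chunk_size = chunk_size
        self.per_page = page_size // chunk_size
        self.pages = 0
        self.free = []  # ids of free chunks, reused before a new page is carved

    def alloc(self):
        if not self.free:
            # No free chunk left: take a new page and split it into chunk ids.
            start = self.pages * self.per_page
            self.free.extend(range(start + self.per_page - 1, start - 1, -1))
            self.pages += 1
        return self.free.pop()

    def release(self, chunk_id):
        self.free.append(chunk_id)  # the chunk goes back on the free list
```

With these assumed parameters, a 100-byte item lands in the 120-byte class and wastes 20 bytes, which is the kind of waste described above, while anything larger than 1 MB has no class at all.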
When the memory available to memcached is full, new inserts force older cached data out in least-recently-used (LRU) order, taking each item's expiration time into account. Memcached uses lazy expiration, which means it does not spend additional CPU cycles actively expiring items. Instead, when data is requested with a “get”, memcached checks the item's expiration time to confirm that it is still valid before returning it to the client. When new data is added with a “set” and memcached cannot allocate an additional slab, expired data is evicted before any data that merely qualifies under the LRU criteria.
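The eviction and lazy-expiration rules can be sketched as a toy Python cache. This is a simplification, not memcached's implementation: capacity is counted in items rather than slab memory, the whole LRU list is scanned for an expired item (memcached keeps a separate LRU per slab class), and `LazyLRUCache` and its injectable `clock` are hypothetical names.

```python
import time
from collections import OrderedDict


class LazyLRUCache:
    """Toy cache: lazy expiration on get; expired-first, then LRU eviction on set.

    Simplification (assumed): capacity counts items, not bytes of slab memory.
    """

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.items = OrderedDict()  # key -> (value, expires_at); LRU item first

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and expires_at <= self.clock():
            # Lazy expiration: staleness is only discovered when the item is read.
            del self.items[key]
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value, ttl=None):
        if key in self.items:
            del self.items[key]
        elif len(self.items) >= self.capacity:
            self._evict_one()
        expires_at = None if ttl is None else self.clock() + ttl  # None: never expires
        self.items[key] = (value, expires_at)

    def _evict_one(self):
        now = self.clock()
        # Prefer an item that has already expired, searching from the LRU end.
        for key, (_, expires_at) in self.items.items():
            if expires_at is not None and expires_at <= now:
                del self.items[key]
                return  # stop iterating right after mutating the dict
        # Nothing has expired: evict the least recently used item.
        self.items.popitem(last=False)
```

An expired item keeps occupying space until a `get` touches it or an insert needs room; that is the trade-off lazy expiration makes in exchange for spending no background CPU time on expiry.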
Data Redundancy and Failover
By design, memcached has no built-in data redundancy features. Memcached is meant to be a scalable, high-performance caching layer, and adding data redundancy would only add complexity and overhead to the system.
If a memcached server loses data, the application should still be able to retrieve that data from the original source database.
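Since memcached is only a cache, the application is responsible for falling back to the database. A common way to do this is the cache-aside pattern, sketched below in Python. `DictCache` is a hypothetical in-memory stand-in for a real memcached client, and `get_with_fallback` is an illustrative helper, not part of any library.

```python
class DictCache:
    """Hypothetical stand-in for a memcached client (get/set only, TTL ignored)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)  # None means a cache miss

    def set(self, key, value, ttl=None):
        self.data[key] = value     # ttl is accepted but ignored in this stand-in

    def flush(self):
        self.data.clear()          # simulate a node losing all of its data


def get_with_fallback(cache, db, key, ttl=300):
    """Cache-aside read: serve from the cache, else load from the database and re-cache."""
    value = cache.get(key)
    if value is not None:
        return value
    value = db[key]                # the database remains the source of truth
    cache.set(key, value, ttl)     # ttl=300 s is an arbitrary example value
    return value
```

After `cache.flush()`, the next `get_with_fallback` call simply misses, reads the database and repopulates the cache, so losing cached data costs latency rather than correctness.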
Likewise, memcached has no built-in failover mechanism that kicks in when the first attempt fails. Instead, one can simply run an abundance of nodes. Because memcached is designed to scale out of the box, this is a key characteristic to exploit: having plenty of memcached nodes minimizes the overall impact that an outage of one or more nodes has on the system as a whole.
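memcached servers do not talk to each other; the client decides which node holds a key. One common way for a client to spread keys over many nodes is consistent hashing, which many memcached client libraries offer. The Python sketch below is a minimal hash ring under my own assumptions (MD5 for placing points, 100 virtual points per node, the hypothetical class name `HashRing`). It shows the property that matters here: when one node disappears, only the keys it owned are remapped, and those keys simply miss and get reloaded from the database.

```python
import bisect
import hashlib


def _hash(s):
    # Stable 32-bit hash from MD5 (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")


class HashRing:
    """Minimal consistent-hash ring mapping keys to node names (illustrative)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the load
        self.ring = []            # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        # Dropping a node removes only its own points; all other points stay put.
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def node_for(self, key):
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect(self.ring, (_hash(key),))
        return self.ring[idx % len(self.ring)][1]
```

With N evenly loaded nodes, losing one node therefore invalidates roughly 1/N of the cached keys, which is why running plenty of nodes softens the impact of an outage.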
In general, “warming up” memcached from a database dump is not the best course of action. Several issues arise with this method. First, changes to the data that occur between the dump and the load will not be accounted for. Second, there must be some strategy for dealing with data that expired before the dump was loaded. Still, in situations where there are large amounts of fairly static or permanent data to cache, a dump/load can be useful for warming up the cache quickly.
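For the second issue, a loader can compare each record's absolute expiration time with the load time. The Python sketch below assumes a dump of `(key, value, expires_at)` tuples, which is a hypothetical format, and a hypothetical helper name `records_to_load`; it drops records that have already expired and converts the rest to remaining TTLs.

```python
import time


def records_to_load(dump, now=None):
    """Filter a dump of (key, value, expires_at) records for cache warm-up.

    Assumed format: expires_at is an absolute Unix timestamp, or None for
    "never expires". Returns (key, value, ttl) tuples for a client's set call.
    """
    now = time.time() if now is None else now
    result = []
    for key, value, expires_at in dump:
        if expires_at is None:
            result.append((key, value, None))
        elif expires_at > now:
            # Use the remaining lifetime, not the TTL the item was first stored with.
            result.append((key, value, expires_at - now))
        # Otherwise the record expired before the load, so it is skipped.
    return result
```

This does nothing for the first issue: a value that changed in the database after the dump was taken is still loaded in its stale form.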