Lock Free Data Structure I

Introduction

In my current work, there are many heavy work needs CP and IO operation. In order to acheive a high throughput, we adapt Producer and Consumer pattern. But to maintain the state is really harmful for system performance. So I did some research and try to avoid that. This post will introduce some lock free data structure and lock free queue can be used for Producer/Consumer pattern. And in my project, that actually improve QPS 10x!!! Awesome!!!

Compare_And_Swap

Wiki:
Compare_And_Swap (CAS) is an atomic instruction used in multithreading to achieve synchronization.It compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple Boolean response (this variant is often called compare-and-set), or by returning the value read from the memory location (not the value written to it).

CAS is supported directly from CPU level, which means it ensure the atomicity. We can describe the behavior of CAS by following function:

int compare_and_swap(int *reg, int oldval, int newval)
{
	ATOMIC();
	int old_reg_val = *reg;
	if(old_reg_val == oldval)
		*reg = newval;
	END_ATOMIC();
	return old_reg_val;
}

CAS is not only used for implementing synchronization primitives like semaphores and mutexes but also used for more sophisticated lock-free and wait-free algorithms.

By using CAS syntax in C++ 11 (compare_exchange_weak), we can define a non blocking counter:

class nonblocking_counter{
    std::atomic<unsigned> count = 0;
    // increment the count
    unsigned increment(){
      do{
        // get old value and calculate new value for counter
        unsigned old_count = count.load();
        unsigned new_count = old_count+1;
        // Atomically increment the counter.
        boolean success = count.compare_exchange_weak(old_count, new_count);
        // someone else changed the counter first -- start over.
      }while(!success);
      return old_count;
    }
    // get the current count
    unsigned get(){
        return count.load();
    }
};

ABA problem

Before we start implement this, first I want to introduce a famous problem ABA in CAS :It’s possible that between the time the old value is read and the time CAS is attempted, some other processors or threads change the memory location two or more times such that it acquires a bit pattern which matches the old value. The problem arises if this new bit pattern, which looks exactly like the old value, has a different meaning: for instance, it could be a recycled address, or a wrapped version counter.

A general solution to this is to use a double-length CAS (e.g on a 32 bit system, a 64 bit CAS). The second half is used to hold a counter. The compare part of the operation compares the previously read value of the pointer and the counter, to the current pointer and counter. If they match, the swap occurs(the new value is written) but the new value has an incremented counter. This means that if ABA has occurred, although the pointer value will be the same, the counter is exceedingly unlikedly to be the same.

Reference

Maurice Herlihy (1991) Wikipedia