Wednesday, April 23, 2014

Double Checked Locking Problem

The problem is often liked to Singleton pattern. Following Singleton implementation as traditionally done works fine in a single threaded program; but not in a multi threaded one.
1 // from the header file
2 class Singleton {
3 public:
4 static Singleton* instance();
5    ...
6 private:
7    static Singleton* pInstance;
8 };
9 // from the implementation file
10 Singleton* Singleton::pInstance = 0;
 
11 Singleton* Singleton::instance() {
12  if (pInstance == 0) {
13      pInstance = new Singleton;
14   }
15   return pInstance;
16 }


 In the above example, if thread 1 context switched after testing pInstance for null at line 12, thread 2 still see pInstance as null and create an instance of Singleton. thread 1 continue from line 13 and create another instance of Singleton and assign it to pInstance. There are 2 instances of Singleton has now been created.

The solution for this is simple, by protecting the code that modify the shared resource with Locks. However the cost is that each access to the singleton (calling instance()) requires acquisition of a lock, but we need the lock only for the first call.
1 Singleton* Singleton::instance() {
2    Lock lock; // acquire lock (params omitted for simplicity)
3    if (pInstance == 0) {
4      pInstance = new Singleton;
5    }
6    return pInstance;
7 } // release lock (via Lock destructor)
To avoid the expensive locking on each call to instance(), we could check pInstance for null before acquiring the lock, and check again before creating the new instance(hence double checking)
Singleton* Singleton::instance() {
   if (pInstance == 0) { // 1st test
      Lock lock;
      if (pInstance == 0) { // 2nd test
         pInstance = new Singleton;
      }
   }
   return pInstance;
}
The 2nd test is needed as it is possible a second thread could create the pInstance after 1st test but before acquiring the lock. With this version, after the singleton is initialized, access to singleton would return the pInstance straight away without acquiring the lock or creating an instance.

Still there could be issues. C++ standard defines the sequence points are at end of statements. Compiler still have the freedom to re order within the code emitted for one single statement. This is because C++ standard defines the behavior of an abstract machine with no notion of multiple threads (C++ as a language doesn't support multiple threads). 
In the above example, statement pInstance = new Singleton; will emit code for 3 operations Allocate momory, create Singleton object by calling constructor, assign the address to pInstance. If second and third steps are re-ordered, then a 2nd thread calling instance() may see pInstance as not null and return the object, which is not yet constructed (but just the memory has been allocated).
To tell the compiler that the data being dealt with could be changed outside and any read/write operation to the data should not be reordered.
Here you should note that, both pInstance = new Singleton; and *pInstance should be made volatile. if *pInstance is not volatile, any member initialization in the constructor could be re-ordered and may be initialised after pInstance has been set; which means, the object is yet not fully constructed and pInstance is made not null.

The solution would look like:
class Singleton {
public:
static volatile Singleton* volatile instance();
   ...
private:
   // one more volatile added
   static volatile Singleton* volatile pInstance;
};
// from the implementation file
volatile Singleton* volatile Singleton::pInstance = 0;
volatile Singleton* volatile Singleton::instance() {
   if (pInstance == 0) {
      Lock lock;
      if (pInstance == 0) {
        // one more volatile added
        volatile Singleton* volatile temp =
           new volatile Singleton;
        pInstance = temp;
      }
   }
   return pInstance;
}
 However, it could still fail for 2 reasons:
1) the Standard's constraints on observable behavior are only for an abstract machine defined by the Standard, and that abstract machine has no notion of multiple threads of execution. As a result, though the Standard prevents compilers from reordering reads and writes to volatile data within a thread, it imposes no constraints at all on such reorderings across threads.

2) Second, just as const-qualified objects don't become const until their constructors have run to completion, volatile-qualified objects become volatile only upon exit from their constructors. In the statement:

There could be further problems on a multi processor system where each processor has it's own cash and the object construction by one processor may not be flushed to the main memory, but the initialization of pInstance might have been flushed and be seen by other processors.

Conclusion:
Avoid implement singleton with DCLP. Before go for complicated solution, check the performance impact of lock acquisition for each access to the singleton. if not significant, the simple implementation without DCLP would be fine.
Other options are, rather lazy initialization of the singleton, do an eager initialization at the time of program startup in the single threaded code (before it become multi-threaded).