Tuesday, 15 October 2013

Java Memory Model and Happens-Before Internals

Recently I jumped into some concurrency code that takes advantage of the happens-before guarantee to ensure a specific configuration is visible to an executing thread. I was also aware that a good number of Java concurrency classes, especially AbstractQueuedSynchronizer, make use of the same synchronization piggybacking technique.

Although the Java Memory Model is well documented, I was still wondering which assembler instructions are used to provide the "happens-before" guarantees at the multi-core CPU level.
I managed to find some free time to play with Java at a low level and tested three synchronization techniques. I also found good documentation explaining Synchronized Memory Access details.
According to Microsoft's x86 Architecture documentation, the following actions happen for lock-prefixed instructions:
  1. Before issuing the instruction, the CPU will flush all pending memory operations to ensure coherency. All data pre-fetches are abandoned.
  2. While issuing the instruction, the CPU will have exclusive access to the bus. This ensures the atomicity of the load/modify/store operation.
Below, I will detail how Java concurrency utilities are translated into low-level assembler instructions. Details of assembly printing can be found on the PrintAssembly wiki page.

Synchronized


This technique uses the synchronized keyword to guard the counter properly. Since it synchronizes the whole method body, there are two lock cmpxchg instructions in the generated assembly code: one for acquiring the monitor and one for releasing it.
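The original snippet was not preserved, so here is a minimal reconstructed counter of the kind the post describes; the class and field names are assumptions, not the author's original code.

```java
// Hypothetical reconstruction of the synchronized counter example.
public class SyncCounter {
    private int count;

    // Synchronizing the whole method brackets the body with monitor
    // enter/exit, each of which the JIT typically compiles down to a
    // lock cmpxchg instruction on x86 -- hence the two lock-prefixed
    // instructions seen in the assembly dump.
    public synchronized void inc() {
        count++;
    }

    public synchronized int get() {
        return count;
    }
}
```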

Volatile

The code below uses the volatile keyword to set the counter value.
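Again the original snippet is missing; this is a minimal sketch of what a volatile counter likely looked like, with assumed names.

```java
// Hypothetical reconstruction of the volatile counter example.
public class VolatileCounter {
    // A volatile store is followed by a lock-prefixed instruction
    // (on x86, typically lock addl $0, (%rsp)) that acts as a full
    // barrier, publishing the write to other threads.
    private volatile int count;

    public void set(int value) {
        count = value; // volatile write
    }

    public int get() {
        return count;  // volatile read
    }
}
```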

The volatile keyword causes a lock-prefixed instruction to be inserted to provide the happens-before guarantee; it flushes pending changes so that they become visible to other threads.

Atomic

The code below shows a simple AtomicInteger being used as a counter.
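The snippet itself did not survive, so here is a plausible reconstruction; the class name Counter and method inc match the -XX:CompileCommand example that follows, but the rest is assumed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reconstruction of the AtomicInteger counter example.
public class Counter {
    private final AtomicInteger count = new AtomicInteger();

    // incrementAndGet compiles down to a single lock-prefixed
    // instruction on x86 (lock xadd, or lock cmpxchg in a CAS loop
    // on older JVMs) rather than a full monitor acquire/release.
    public void inc() {
        count.incrementAndGet();
    }

    public int get() {
        return count.get();
    }
}
```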

When the above code is executed with -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,Counter.inc, similar assembly instructions appear in the assembly dump log.
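For reference, a full invocation might look like the following; note that, as the PrintAssembly wiki page explains, the hsdis disassembler plugin must be on the JVM's library path for the dump to be human-readable (the main-class name here is an assumption).

```shell
# Print JIT-compiled assembly for Counter.inc (requires hsdis).
java -XX:+UnlockDiagnosticVMOptions \
     -XX:CompileCommand=print,Counter.inc \
     Counter
```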

Tuesday, 1 October 2013

Time Series Map

Recently, I needed to store time-series data where ticker information had to be recorded every millisecond. My initial thought was to create a map with the millisecond as the key and the tick data as the value.

Map<Long, Tick> tickMap;

I soon realized I also needed to remove old entries from the map as time goes on to keep memory consumption under control. Again, I thought I could evict expiring entries from the map after each put operation, or use Guava's CacheBuilder. Although that would work, it would create and remove a lot of objects, and hence risk frequent garbage collector runs.
I remembered using the Disruptor from my Betfair days: the magic ring buffer solution that eliminates GC runs by reusing objects allocated up front. A ring buffer is basically an array with a predefined capacity. Instead of adding and removing entries, existing entries are updated in place to reflect the last added object. An entry's position can be determined by a bitwise AND of the index mask and the entry's sequence value.
Disruptor/Ring Buffer

The sequence number is basically an ever-increasing pointer yielding the next long value, and the index mask is simply the size of the buffer minus one. With that information in hand, I can build a ring buffer to store 24 hours of volatile data without driving the GC crazy. All I need is to define a ring buffer with a power-of-2 size above 60 x 60 x 24 x 1000 milliseconds. Power of 2 is required for performance reasons: computers like binary systems, and locating an index with a bitwise operation is cheaper than a modulus operation. Java's HashMap internally uses the same logic for determining the size of its internal buckets.
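The idea above can be sketched as a tiny ring buffer. This is a simplified illustration, not Disruptor's actual API: the class and field names are made up, the capacity here is small for readability (24 hours of milliseconds would need the next power of two above 86,400,000, i.e. 2^27), and Disruptor's real sequence handling is considerably more involved.

```java
// Minimal ring-buffer sketch (assumed names; not the Disruptor API).
public class TickRing {

    static class Tick {
        long price; // mutated in place; the Tick object itself is never replaced
    }

    private final Tick[] buffer;
    private final int indexMask; // capacity - 1, valid only for power-of-2 sizes
    private long sequence = -1;  // ever-increasing pointer

    public TickRing(int capacity) {
        // A power of 2 lets (sequence & indexMask) replace the
        // more expensive (sequence % capacity).
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of 2");
        }
        buffer = new Tick[capacity];
        indexMask = capacity - 1;
        // Pre-allocate every entry once so puts never allocate,
        // keeping the GC quiet.
        for (int i = 0; i < capacity; i++) {
            buffer[i] = new Tick();
        }
    }

    // Claim the next slot and overwrite the oldest entry in place.
    public void put(long price) {
        sequence++;
        buffer[(int) (sequence & indexMask)].price = price;
    }

    public Tick get(long seq) {
        return buffer[(int) (seq & indexMask)];
    }
}
```

Note that once the buffer wraps, a sequence value and the one a full capacity behind it map to the same slot, which is exactly the eviction behavior wanted here: old ticks are silently overwritten by new ones.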