Data decompositionData decomposition is a top-down technique for breaking work into small tasks that can be parallelized. That strategy allows creating data chunks which can be processed in parallel. This strategy is generally applied in database systems, for example both Cassandra and MongoDB use that strategy to scale up. Both databases have a partition strategy which calculates a hash value and then uses that value to locate a node in the cluster.
There are also real life examples showing that strategy works very well and not applying it causes lot of trouble.
For example, I use Tottenham Court Rd tube station for my daily commuting routine and observe how life can be painful when not following decomposition strategies. This station has a notorious access to northern line users, requiring them to go to Central line platform then pass to the opposite to access northern line, moreover passenger traffic on the platform is two ways which causes hundreds of people bumping to each other during peak hours, although they are interested in different routes.
|Blocking platform entries|
Single WriterThat strategy states using a single writer to change a state of a resource frequently used in an application, helping to eliminate memory contention which can be seen in today’s multi-core processors. Synchronizing data among different cores has big performance penalty because of bus traffic, cache coherency, locking cost and cache misses. With single writer, a cpu can utilize modern cpu advancements, knowing that there is no other writers, it doesn't need to force a sync with RAM and can optimize L1 cache usage. Moreover with single writer there is not a lock acquire/release stage that happen with multi writers.
False SharingFalse sharing is notoriously known as the silent performance killer. It arise in parallel applications when two threads operate on an independent data in the same memory address region, stored on the same cache line. The cache coherency protocol may invalidate the whole cache line on each write, leading to memory stalls and occupying the bus or the interconnect.
Detecting this issue is quite a daunting task, and involves inspecting the memory layout of the classes and analysing the CPU cache lines. After identifying that the application indeed suffers from false sharing.
Let’s assume we have a message-id resource which is incremented every time when it is accessed by a single writer thread. With single writer principle, one can expect to have some performance improvements for message-id number generation as there is not any lock release/acquire steps, which would have happen between multiple threads.
However modern processors have multiple cores that include cache lines, these cache lines generally store 64 bytes, therefore a CPU can align a number of variables into the same cache line regardless of their frequency access. Each time a cache line is updated, it has to be invalided, flushed to the RAM and then synced with other core’s cache lines.
Unfortunately there isn't any native way to avoid sharing a cache line with other variables up to JDK7, this issue has been fixed in JDK8 with the @Contented annotation.
However there are still ways to ensure that a variable is not shared with other variables in the same cache line, one of them is to make the shared variable big enough to occupy whole cache line so that another variable cannot fit into it. That’s generally achieved by creating an array of 64byte size, which is a 8-size long array in Java, then the shared variable is stored in one of the array position.
|Cache line alignment|
This is like variables sharing the same cache line. The more we have control how data is decomposed, the easier to improve application performance.