One of the important components of a concurrent multithreaded system, distributed computing or otherwise, is the inter-thread messaging component. In the earlier days we had to make do with the basic synchronized construct, then came java.util.concurrent and then the Disruptor followed by java 7 enhancements to java.util.concurrent. I compared the performance of each in a simple producer consumer scenario. The following were compared
All the tests were performed on my laptop running Core i5 3320M 2.6 GHz with 4 GB RAM and Windows 7 OS. The results can be found below. The code for the tests is checked into github
Let me walk through the test scenario first. The scenario is a simple multicast with a single producer continously generating messages, upto 10 million in a couple seconds in one go after which it stops. I repeated the test with different number of consumers, 1 to 5. All the consumer does is to read messages from the buffer ie there is no processing of the message. That is a bit unrealistic as a typical consumer would do some processing, even if no I/O is involved, there will be atleast a few milliseconds used up in some processing. And a consumer is typically slower than the producer. But like I said, I am going with a simple test scenario. I will have another test scenario later with a few random ms thrown in to simulate some processing in the consumer. But for now, its this way.
The scenario is also a bit unrealistic in that we almost never have to handle 10 million messages in 2 seconds. But this can show how capable these implementations are, when under heavy load.
Anyway, onto the results now
As you can see from the results the TransferQueue is pretty much there with the Disruptor and even bettering the Disruptor’s performance as the number of consumers goes higher. So if you are at java 7 you have the choice of going with a standard TransferQueue instead of Disruptor. On the other hand, if you are stuck at java 1.6/1.5/1.4(god forbid), then Disruptor is probably the only performant choice you have.
There are a few things to note though. The LinkedTransferQueue is unbounded and there are chances of out of memory failures when the producer outpaces the consumers. But a self monitoring system that throttles production or increases the rate of consumption for example by upping the number of consumers should be able to handle that. Also the LinkedTransferQueue is basically a linked list of nodes and the possible randomness of nodes being scattered all over memory is quite likely thereby leading to inefficient cpu cache prefetches. So there will be some inefficiencies there even though it doesn’t translate to worser numbers in the tests. For the ArrayBlockingQueue, I am using the non-blocking methods to add, there will be a bit of spin, but I believe these work better than the blocking versions especially when the consumers have some non-trivial processing involved.
On a side note, I find the TransferQueue easier to program with, basically it’s swappable with any existing implementation like ArrayBlockingQueue for ex. Anyhow helps to have an abstraction around the messaging piece in your design so that such you can swap one implementation of the pipe with another. My test scenario uses such a MessagePipe abstraction and you can see how easy it was to swap one messaging implementation with another. But Disruptor was not that easy to squeeze within the abstraction, which is one reason I dislike it a bit, it’s programming API doesn’t have the shape of existing standard java implementations. It’s no doubt beautiful though.
I then tried a scenario that simulates some random processing in the consumer. The Producer for this scenario is restricted to produce only 100 units. For 5 consumers, with some simulated processing (sleep for a random of upto 300 ms), the timings were as follows
Again we see that the TransferQueue is performing better than the Disruptor. But it has an inherent advantage in that the TransferQueue is unbounded whereas the Disruptor was bounded to 16 slots.
I repeated the same test, this time with a burst of 1000 producer units in one shot, the ring buffer being restricted to 16 slots (and the transfer queue ofcourse is unbounded). Here are the timings
This was with a queue size of 16 slots. I then upped the number of disruptor slots to 512 and the Disruptor equalled the performance of the TransferQueue
The Disruptor also hangs sometimes, especially when number of consumers + producers is more than the number of cores and BusySpinWaitStrategy is used. Makes sense. Switching to the YieldingWaitStrategy solved that hangup issue. This is the first time I am using Disruptor, so if anyone has any suggestions to make in the code, I would be more than happy to lap them up.