Common Message Queuing Misconception

by Mike Richardson

Since this seems to be a hot topic right now, I want to clear up a common misconception regarding message queuing and persistence. Because I cannot cover every vendor implementation, I will frame this post in the context of .NET (MSMQ), with a brief look at WebSphere MQ. I am also assuming a synchronous messaging model; although it varies by vendor, there is some “trickery” associated with the async methods when it comes to persistence and transactions.

Most developers are familiar with the concept of store-and-forward messaging. If a client sends a message to a server using MSMQ as the communication mechanism and the server experiences a sudden outage, a basic failover mechanism “kicks in”: most of the time, while disconnected, the client will not attempt to send messages to the remote queue. Instead, it will “buffer” them and eventually write them to disk. This mechanism lets architects design systems with increased fault tolerance.
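To make this concrete, here is a minimal C# sketch of a store-and-forward send using System.Messaging. The machine and queue names are hypothetical; substitute your own. Note that, by default, the buffered message is an “express” message that lives only in memory while it waits, which is exactly the gap discussed below.

```csharp
using System.Messaging;

class StoreAndForwardDemo
{
    static void Main()
    {
        // "remoteserver" and "orders" are hypothetical names; a direct format
        // name is the usual way to address a remote private queue.
        using (MessageQueue queue = new MessageQueue(
            "FormatName:DIRECT=OS:remoteserver\\private$\\orders"))
        {
            // If remoteserver is unreachable, MSMQ holds this message in a
            // local outgoing queue and forwards it when connectivity returns.
            // An express (non-recoverable) message waits in memory only, so a
            // crash on the sending machine can still lose it.
            queue.Send("new order payload");
        }
    }
}
```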

Depending on your SLAs, you may have to support any number of “9’s”. If you need five 9’s (99.999% availability, which allows roughly five minutes of downtime per year) and the utmost in availability, you essentially cannot afford to lose ANY messages. The common misconception (and a source of much lost data) is that MSMQ and other queuing vendors support this level of durability “out of the box”. This is simply not true.

In the case of MSMQ, the “buffer” I briefly mentioned above is implemented as a memory-mapped file. This file is flushed to disk at configurable intervals. Therefore, if there is an outage after messages are mapped into memory but before they are flushed to disk, you may lose data. You might think that decreasing the flush interval to its minimum value would avoid losing data, but that is also not the case: depending on the type of disk you use (even a fast 15,000 RPM drive), the disk head may not be able to write to the partition fast enough to keep up with the flushes (assuming the disk does not cache the data).
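If you want MSMQ to write an individual message to disk rather than keep it in memory, you can set its Recoverable property. Here is a minimal sketch, assuming a hypothetical local queue named orders; note that even recoverable messages still pass through the memory-mapped storage files described above.

```csharp
using System.Messaging;

class RecoverableSendDemo
{
    static void Main()
    {
        // ".\private$\orders" is a hypothetical local queue path.
        using (MessageQueue queue = new MessageQueue(".\\private$\\orders"))
        {
            Message message = new Message("new order payload");

            // Recoverable = true tells MSMQ to store the message on disk
            // instead of only in memory. This narrows the window for loss
            // but does not eliminate it on its own.
            message.Recoverable = true;

            queue.Send(message);
        }
    }
}
```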

The best way to implement persistence with MSMQ is to mark the queue as transactional. This can be configured manually or programmatically by passing true to the queue creation method. Within a transaction boundary, MSMQ treats the unit of work much like SQL Server does. Be aware, however, that this significantly degrades performance. Also, the meaning of a transaction failure is commonly misunderstood in the MSMQ world: when a transaction aborts, a message sent inside it is never actually delivered to the destination queue, and a message received inside it is placed back on the queue. It is as if nothing ever happened.
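Here is a sketch of the transactional approach using System.Messaging; the queue name is hypothetical. Passing true as the second argument to MessageQueue.Create marks the queue as transactional, and the send only becomes durable when the transaction commits.

```csharp
using System.Messaging;

class TransactionalSendDemo
{
    static void Main()
    {
        string path = ".\\private$\\ordersTx"; // hypothetical queue name

        // Passing true creates the queue as transactional.
        if (!MessageQueue.Exists(path))
            MessageQueue.Create(path, true);

        MessageQueue queue = new MessageQueue(path);
        MessageQueueTransaction transaction = new MessageQueueTransaction();
        try
        {
            transaction.Begin();
            queue.Send("new order payload", transaction);
            transaction.Commit(); // the message is durably enqueued only on commit
        }
        catch
        {
            // On failure the send is rolled back; from MSMQ's point of view
            // it is as if the message was never sent.
            transaction.Abort();
            throw;
        }
    }
}
```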

Implementing persistence in WebSphere MQ is just as simple, and there are two methods at your disposal. First, you can implement a durable channel for queue-to-queue message transmission; once durability is configured at the channel level, you can mark individual messages as persistent (or not). One of WebSphere MQ’s great strengths is its implementation of persistent messaging: much like DB2, it uses transaction logs for persistence, and aggregating those logs with monitoring software such as MOM (Microsoft Operations Manager) is straightforward. The second, less common option is to implement the unit of work pattern.
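For comparison, here is a sketch of marking an individual message as persistent using the WebSphere MQ classes for .NET (the IBM.WMQ namespace); the queue manager and queue names are hypothetical, and the exact API surface varies by MQ version.

```csharp
using IBM.WMQ; // WebSphere MQ classes for .NET (amqmdnet.dll)

class PersistentPutDemo
{
    static void Main()
    {
        // "QM1" and "ORDERS.QUEUE" are hypothetical names; substitute your own.
        MQQueueManager queueManager = new MQQueueManager("QM1");
        MQQueue queue = queueManager.AccessQueue(
            "ORDERS.QUEUE", MQC.MQOO_OUTPUT | MQC.MQOO_FAIL_IF_QUIESCING);

        MQMessage message = new MQMessage();
        message.Persistence = MQC.MQPER_PERSISTENT; // survive via the transaction log
        message.WriteString("new order payload");

        queue.Put(message);

        queue.Close();
        queueManager.Disconnect();
    }
}
```

For the unit-of-work option, you would put and get messages under syncpoint (for example, MQPutMessageOptions with MQC.MQPMO_SYNCPOINT) and then call Commit() or Backout() on the queue manager.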

Hopefully this post clears up the misconception that most queuing vendors save every single message by default. A little extra work and good design are still needed to avoid losing critical business data. And remember, increased persistence means increased price. If you need to store lots of data, you will need to be concerned with the size of your messages (which can grow 20% or more if you serialize as XML), the size and type (paged or non-paged) of the journaling file, and of course disk redundancy. Even if you write messages to disk immediately or use transactions, the disk itself can always fail. You may even end up needing a full-blown SAN to ensure you never lose a message. As you can see, fixing this misconception is not cheap.