The existing solutions for Tomcat session clustering are viable for moderate clusters and straightforward failover in traditional deployments. Several are well developed with a mature codebase, such as Apache's own multicast clustering. But there's no getting around the fact that today's cloud architectures place new and different expectations on Tomcat that didn't exist several years ago when those solutions were being written. In cloud architectures, where horizontal scalability is king, running a dozen or more Tomcat instances is not unusual.
In my hybrid, private cloud, first described in my earlier post on keeping track of the availability and states of services across the cloud, I wanted to fully leverage all my running Tomcat instances on every user request. I didn't want to use sticky sessions as I wanted each request (including AJAX requests on a page) to be routed to a different application server. This is how virtual machines work at a fundamental level. Increased throughput is obtained by parallelizing as much of the work as possible and utilizing more of the available resources. I wanted the same thing for my dynamic pages. Not finding a solution in the wild, I resolved to roll my own session manager. The result is the "vcloud" (or virtual cloud) session manager. I've Apache-licensed the source code and put it on GitHub--mostly to try and elicit free help to make it better. You can find the source code here: http://github.com/jbrisbin/vcloud/tree/master/session-manager/
I dug into the code of the available session managers and found they simply would not do what I was asking them to do. My expectations were (and are) quite high, and I was looking for something that met all of them simultaneously.
I don't have to give the standard session managers a second thought when deploying a web application onto Tomcat. I wanted the same out of any other session manager. I also didn't want user sessions "copied" throughout the network all the time. I use lots of little Tomcat instances that have small heap sizes. If there was any way around it, I didn't want to keep entire copies of every active session in every server. But if I wanted the session loaded on demand, I'd need something screamingly fast. I use a cluster of RabbitMQ servers internally, so it only made sense to leverage that great asynchronous message broker to solve this problem.
Ideally, a user session would always be available in-memory. This is the fastest possible way to serve the user request. But it's inherently limiting. What if a new server comes online? Or, why keep precious resources locked up caching user sessions if that application server never sees that user?
Ain't that just life? Always having to trade one good thing for something else that you hope will really be better. Making user sessions available on demand in any server that needs them definitely trades the speed of having a user session in-memory for the flexibility of not knowing (or caring) where that session object comes from. But realistically, the only cost is the time it takes to move the bytes that make up the object from one server to the next. All my servers are connected either by a VMware virtual switch (because they're running on the same host) or directly to our core Cisco gigabit switch, and the only object stored in those user sessions is a Spring Security user object. So I felt okay taking that hit on page load time if it meant I could load-balance all my servers, all the time.
As a Java programmer, I found myself doing what nearly every Java programmer alive does: I tried to complicate things by adding to the code all kinds of stuff I didn't need. It took me a while to filter that out but I eventually settled on only the components that are actually required: a fast, thread-safe list of session IDs that are currently active in the cloud, and a convention of subscribing and publishing standard messages to queues with names that include the session ID.
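To make those two components concrete, here's a minimal sketch of what they could look like. It is not the actual vcloud code: the class name, the method names, and the "sessions." queue prefix are all hypothetical, and it assumes Java 8 or later for ConcurrentHashMap.newKeySet().

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a thread-safe registry of active session IDs plus a
// queue-naming convention that embeds the session ID.
final class SessionRegistrySketch {
    // Assumed prefix; the real naming convention isn't given in the article.
    private static final String QUEUE_PREFIX = "sessions.";

    // A concurrent set lets many request threads check and update IDs without locking.
    private final Set<String> activeIds = ConcurrentHashMap.newKeySet();

    /** Returns true if the ID was not already registered. */
    boolean register(String sessionId) {
        return activeIds.add(sessionId);
    }

    /** Returns true if the ID was registered and has now been removed. */
    boolean unregister(String sessionId) {
        return activeIds.remove(sessionId);
    }

    boolean isActive(String sessionId) {
        return activeIds.contains(sessionId);
    }

    /** Builds the name of the queue that carries messages for one session. */
    static String queueNameFor(String sessionId) {
        return QUEUE_PREFIX + sessionId;
    }
}
```

With a convention like this, any server that knows a session ID can work out which queue to publish to without asking anyone else.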
When a Tomcat server is asked to load a user session, it passes the session ID into the CloudStore, which checks the internal list of session IDs. If it finds that ID listed there, it knows the session is active somewhere in the cloud and checks the internal Map to see if that session is actually local to the application server. If it's not, it publishes a "load" message to the appropriate queue. The thread then blocks for a configurable amount of time until it either receives the loaded session or times out and assumes the session is no longer valid (maybe because the server that had it in-memory crashed).
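The load path described above can be sketched roughly as follows. This is an illustration under assumptions, not the CloudStore source: the class and method names are made up, a plain Consumer<String> stands in for publishing the "load" message to RabbitMQ, and a CompletableFuture is just one way to block a thread with a timeout.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical sketch of the load path: check the ID list, check the local Map,
// otherwise publish a "load" request and block until a reply or a timeout.
final class LoadingStoreSketch {
    private final Set<String> activeIds = ConcurrentHashMap.newKeySet();
    private final Map<String, Object> localSessions = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<Object>> waitingLoaders = new ConcurrentHashMap<>();
    private final Consumer<String> loadPublisher; // stand-in for the broker publish
    private final long timeoutMillis;             // the configurable wait

    LoadingStoreSketch(Consumer<String> loadPublisher, long timeoutMillis) {
        this.loadPublisher = loadPublisher;
        this.timeoutMillis = timeoutMillis;
    }

    void markActive(String id) { activeIds.add(id); }

    void putLocal(String id, Object session) {
        activeIds.add(id);
        localSessions.put(id, session);
    }

    /** Returns the session, or null if it is unknown or the load timed out. */
    Object load(String id) {
        if (!activeIds.contains(id)) return null;  // not active anywhere we know of
        Object local = localSessions.get(id);
        if (local != null) return local;           // already in this server's memory
        CompletableFuture<Object> reply =
                waitingLoaders.computeIfAbsent(id, k -> new CompletableFuture<>());
        loadPublisher.accept(id);                  // ask whoever holds the session
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return null;                           // treat the session as no longer valid
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            waitingLoaders.remove(id, reply);
        }
    }

    /** Called by the listener side when a reply carrying the session arrives. */
    void onUpdate(String id, Object session) {
        CompletableFuture<Object> reply = waitingLoaders.get(id);
        if (reply != null) reply.complete(session);
    }
}
```

The waiting loader is registered before the request is published, so a reply that arrives very quickly still finds someone to hand the session to.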
As a somewhat-related aside: the Tomcat server doesn't know what's on the other end of that queue. It could be another Tomcat server that has that session in its internal Map. It could also be a dedicated session replicator written in Ruby and backed by a Redis (or other NoSQL) store. Going the other direction: code running in standalone mode, like a system utility, could load the session object that was created when a user logged into your initial web application, exposing a single user object to any Java code running anywhere in your cloud. Pretty cool, huh? ;)
When the server responsible for that session gets this load message, it serializes the session it has in-memory and sends those bytes back to the sender. The sender's listener gets this "update" message and checks to see if any loaders are waiting on this session. If it finds one, it gives the deserialized session to the loader, which in turn passes the session back to Tomcat, which continues to serve the page.
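The serialize-and-send-back step boils down to turning an object into bytes on one server and back into an object on another. Here's a minimal sketch using plain Java serialization. It assumes the session contents are Serializable and uses hypothetical names; how the real session manager encodes a Tomcat session may well differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch: the byte-level round trip behind an "update" message.
final class SessionCodecSketch {
    /** What the server holding the session would put in the message body. */
    static byte[] toBytes(Serializable session) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(session);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // stream is closed (and flushed) by now
    }

    /** What the requesting server would do with the message body it receives. */
    static Object fromBytes(byte[] body) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(body))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("session class not available on this server", e);
        }
    }
}
```

Note that the receiving side needs the session's classes on its classpath, which matters if the thing on the other end of the queue isn't another copy of the same web application.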
We won't have time in just one article to cover all the important parts of the CloudStore. In my next article, I'll discuss the "load" and "update" event handlers in detail as well as issue the obligatory caveats and acknowledge the known shortcomings. But to start with, let's take a quick peek at the event dispatcher, which is the backbone of the CloudStore session manager.
There are several different programming models for writing asynchronous applications. There's no one way to tackle the problem, so it often simply comes down to personal taste. I can get my mind around the dispatching model more easily than some of the alternatives, so that's the route I chose. Fundamentally, the dispatcher has a custom EventListener class that runs in a separate thread and is responsible for pulling messages off the queue, interrogating them, and dispatching them to the appropriate handler based on their type. There is a message type for every action the CloudStore performs.
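Stripped of threads and brokers, dispatching by message type is just a lookup table from type to handler. The sketch below shows only that idea. The Event shape, the class name, and the on/dispatch methods are hypothetical and are not taken from the vcloud source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch: route each incoming event to a handler chosen by its type.
final class DispatcherSketch {
    /** Assumed minimal message shape: a type plus the session it concerns. */
    static final class Event {
        final String type;
        final String sessionId;

        Event(String type, String sessionId) {
            this.type = type;
            this.sessionId = sessionId;
        }
    }

    private final Map<String, Consumer<Event>> handlers = new ConcurrentHashMap<>();

    /** Registers the handler for one message type, e.g. "load" or "update". */
    void on(String type, Consumer<Event> handler) {
        handlers.put(type, handler);
    }

    /** Returns false when no handler is registered for the event's type. */
    boolean dispatch(Event event) {
        Consumer<Event> handler = handlers.get(event.type);
        if (handler == null) return false;
        handler.accept(event);
        return true;
    }
}
```

Adding a new action then means adding one message type and one handler, with no changes to the loop that reads the queue.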
Almost without exception, I dispatch EventHandlers into separate threads to do the actual work requested by the incoming message. I do this because I have only one QueueingConsumer per queue. I want as much throughput as possible, so the thread that pulls messages off the queue needs to get back, very quickly, to the one thing it's supposed to concentrate on: pulling messages off the queue as fast as they arrive. There's a second reason. Because I'm using the official RabbitMQ Java client, the handlers have to run in a separate thread so they can publish messages, bind and unbind queues, and perform other operations that require communicating back to a RabbitMQ server; those operations would cause a deadlock if I tried to do them in the EventListener's thread.
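The threading arrangement can be sketched like this: one listener thread does nothing but take messages and hand them off, while a pool of worker threads runs the handlers. Everything here is an assumption made for illustration. A BlockingQueue stands in for the RabbitMQ consumer, a cached thread pool stands in for however the real code creates handler threads, and the names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: a single listener thread hands every message to a worker
// thread, so it can immediately go back to pulling the next message.
final class ListenerSketch {
    static final class Message {
        final String type;
        final String sessionId;

        Message(String type, String sessionId) {
            this.type = type;
            this.sessionId = sessionId;
        }
    }

    private static final Message STOP = new Message("stop", ""); // poison pill

    private final BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final Map<String, Consumer<Message>> handlers;
    private final Thread listener = new Thread(this::pump, "event-listener");

    ListenerSketch(Map<String, Consumer<Message>> handlers) {
        this.handlers = handlers;
    }

    void start() { listener.start(); }

    /** Stand-in for a message arriving from the broker. */
    void deliver(Message message) { inbox.add(message); }

    private void pump() {
        try {
            while (true) {
                Message message = inbox.take();
                if (message == STOP) return;
                Consumer<Message> handler = handlers.get(message.type);
                // The handler runs on a worker thread, never on the listener thread.
                if (handler != null) workers.execute(() -> handler.accept(message));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Drains queued messages, then waits for the handlers; true if all finished in time. */
    boolean stop(long timeoutMillis) {
        inbox.add(STOP);
        try {
            listener.join(timeoutMillis);
            workers.shutdown();
            return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the listener thread only ever does a map lookup and a hand-off, a slow handler delays its own message and nothing else.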
The other event message types ("load" and "update") get emitted whenever the session manager attempts to load a session or when the server that has the actual object in memory responds to that load message, respectively. I'll cover those in my next article.