Crawler Session Manager Valve

posted by mthomas on May 18, 2011 07:25 AM

For organizations with large, publicly searchable websites, such as ecommerce companies with large product catalogues or companies with active online communities, web crawlers or bots can trigger the creation of many thousands of sessions as they crawl the site. Because these bots normally crawl without relying on cookies or session IDs, they can create a new session for each page crawled which, depending on the size of the site, may result in significant memory consumption. New in Apache Tomcat 7, the Crawler Session Manager Valve ensures that each crawler is associated with a single session, just like a normal user, regardless of whether it provides a session token with its requests.

A Relevant Example

One of the roles I play in the Apache Tomcat project is managing the servers that run the Apache issue trackers: two instances of Bugzilla and one instance of JIRA. Not surprisingly, JIRA runs on Tomcat. A few months ago, while looking at the JIRA management interface, I noticed that we were seeing around 100,000 concurrent sessions. Given that there are only 60,000 registered users and fewer than 5,000 active users in any month, this number appeared extremely inflated.

After a bit of investigation, the access logs revealed that many web crawlers (e.g., googlebot and bingbot) were creating a new session for every request as they crawled the JIRA site. For our JIRA instance, this meant that about 95% of the open sessions were left over from a bot making a single request. For instance, a bot requesting 100 pages would open 100 sessions. Each of these sessions would hang around in memory for about 4 hours, consuming a large amount of memory on the server.

The Fix

The goal of the Crawler Session Manager Valve is to ensure that when that same crawler requests those 100 pages, only a single session results. To do this, Tomcat matches the User-Agent HTTP request header of each incoming request against a regular expression describing known crawlers (by default, .*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*), and it keeps a note of the IP address each matching request came from along with the ID of the last session used by that address.
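As a rough illustration of the user agent check, the sketch below tests a few User-Agent values against the default pattern quoted above. CrawlerAgentCheck is a hypothetical helper written for this example, not a Tomcat class, and it assumes the whole header value must match the pattern, which would explain why each alternative is wrapped in .* on both sides.

```java
import java.util.regex.Pattern;

// Illustrative only: CrawlerAgentCheck is a hypothetical helper, not part of Tomcat.
class CrawlerAgentCheck {
    // Default crawler pattern as quoted in the post.
    static final Pattern DEFAULT_CRAWLER_AGENTS =
            Pattern.compile(".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*");

    // Returns true when the entire User-Agent value matches the pattern.
    // A missing header (null) is treated as "not a crawler".
    static boolean isCrawler(String userAgent) {
        return userAgent != null
                && DEFAULT_CRAWLER_AGENTS.matcher(userAgent).matches();
    }
}
```

With this pattern, any user agent containing "bot" or "Bot" is treated as a crawler, so a typical browser user agent is not matched while googlebot and bingbot are.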

When a crawler first accesses the site, a new session is created as part of that first request. When it requests a second page, the Crawler Session Manager Valve recognizes the crawler from its user agent header, looks up its IP address, and inserts the previous session ID into the request. Thus, the crawler only ever opens a single session.
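The bookkeeping described here can be modelled in a few lines. The sketch below is a simplified stand-in and not Tomcat's actual source: CrawlerSessionTracker, sessionIdFor and remember are hypothetical names, and expiry of old entries is left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Simplified model of the behaviour described in the post; not Tomcat's code.
class CrawlerSessionTracker {
    private final Pattern crawlerAgents;
    // Client IP address -> ID of the last session used by a crawler at that address.
    private final Map<String, String> sessionIdByClientIp = new ConcurrentHashMap<>();

    CrawlerSessionTracker(String crawlerUserAgentsRegex) {
        this.crawlerAgents = Pattern.compile(crawlerUserAgentsRegex);
    }

    boolean isCrawler(String userAgent) {
        return userAgent != null && crawlerAgents.matcher(userAgent).matches();
    }

    // Called before the request is processed. Returns the session ID the
    // request should use, or null if a new session has to be created.
    String sessionIdFor(String userAgent, String clientIp, String requestedSessionId) {
        if (requestedSessionId != null) {
            return requestedSessionId;           // client already sent a session token
        }
        if (!isCrawler(userAgent)) {
            return null;                         // normal users are left alone
        }
        return sessionIdByClientIp.get(clientIp); // reuse the crawler's earlier session
    }

    // Called after the request is processed, once the session ID is known.
    void remember(String userAgent, String clientIp, String sessionId) {
        if (isCrawler(userAgent) && sessionId != null) {
            sessionIdByClientIp.put(clientIp, sessionId);
        }
    }
}
```

In this model the first request from a crawler finds nothing in the map and gets a fresh session; every later request from the same IP address is handed that session's ID, so 100 page requests map to one session instead of 100.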

Configuring the Crawler Session Manager Valve

Although it ships with Tomcat 7, the Crawler Session Manager Valve is not enabled by default. To turn on the valve, see the valve documentation.

There are two main options for configuring this valve. The first is the crawlerUserAgents property, a regular expression that specifies which bots to look for by their user agent header. The second is sessionInactiveInterval, which specifies how long (in seconds) Tomcat should hold on to the assigned session ID. Holding on to the session ID for more than a couple of hours is not recommended, as these bots tend to change their IP addresses regularly.
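To make the two options concrete, here is a small sketch that checks a candidate crawlerUserAgents pattern and prints a Valve element of the kind that would go into server.xml. CrawlerValveConfig is a hypothetical helper, not a Tomcat API. It assumes the valve class is org.apache.catalina.valves.CrawlerSessionManagerValve and that sessionInactiveInterval is expressed in seconds; check both against the valve documentation for your Tomcat version.

```java
import java.time.Duration;
import java.util.regex.Pattern;

// Hypothetical helper: validates the settings and renders a <Valve> element.
class CrawlerValveConfig {
    static String valveElement(String crawlerUserAgents, Duration keepMapping) {
        // Fails fast with PatternSyntaxException if the regex is malformed.
        Pattern.compile(crawlerUserAgents);
        // Assumption: the interval attribute is given in seconds.
        long seconds = keepMapping.getSeconds();
        return "<Valve className=\"org.apache.catalina.valves.CrawlerSessionManagerValve\""
                + " crawlerUserAgents=\"" + escapeXml(crawlerUserAgents) + "\""
                + " sessionInactiveInterval=\"" + seconds + "\" />";
    }

    // Minimal escaping so the regex is safe inside an XML attribute value.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

For example, calling valveElement with the default pattern and Duration.ofHours(1) yields an element with sessionInactiveInterval="3600", which stays within the "no more than a couple of hours" guidance.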

The Result

Implementing this valve on the JIRA site alone took the average number of concurrent sessions down from 100,000 to about 5,000. There was also a significant drop in resource usage on the server, and it is now relatively simple to see from the Current Sessions page which web crawlers are active on the site and how many hits they are generating.

Special note: Although JIRA is only certified to run on Tomcat 5 and Tomcat 6, we actually run it on the latest Tomcat 7 release. Running JIRA on Tomcat 7 has not caused any issues which, as an aside, is a testament to how well Tomcat 7 and the Servlet 3.0 specification have been engineered for backwards compatibility.

Mark Thomas is a Senior Software Engineer for the SpringSource Division of VMware, Inc. (NYSE: VMW). Mark has been using and developing Tomcat for over six years. He first got involved in the development of Tomcat when he needed better control over the SSL configuration than was available at the time. After fixing that first bug, he started working his way through the remaining Tomcat bugs and is still going. Along the way Mark has become a Tomcat committer and PMC member, volunteered to be the Tomcat 4 & 7 release manager, created the Tomcat security pages, become a member of the ASF and joined the Apache Security Committee. He also helps maintain the ASF's Bugzilla instances. Mark has a MEng in Electronic and Electrical Engineering from the University of Birmingham, United Kingdom.


How do I run tomcat without any sessions?

We have a webapp that is exclusively for web service calls. There will never need to be any data retained beyond a single request (e.g., each request is completely stateless). So we really have no need to create any sessions at all for this webapp.

Is there a way to run tomcat (7.0.16) so that it doesn't waste any resources at all creating sessions?


Robin D. Wilson

Do we have anything similar

Do we have anything similar for apache server??

