I'm coping with a problem for more than two years now. We even hired a specialist who were unable to pinpoint what the problem was.
After a random period of time (may be 2 hours or 2 weeks), Tomcat simply stops responding. The memory utilization seems pretty normal, CPU is high (100%). We get full of "Socket write error" in the logfiles but this seems to be due to the fact that endusers keep hitting reload or stop button on their browser because of the slow /no response of the server.
We use mod_jk within Apache to connect to Tomcat. When this happens, we get a "Service temporarily unavailable" on the webpage which is fired by Apache when it can't connect to Tomcat.
The only way we found out to make this work again is to reboot the whole server! Even when restarting Tomcat or Apache HTTP server, Tomcat keep using 100% CPU and no requests are serviced. We noticed, when this happens, that when opening a webpage directly on the failing server (a webpage of www.google.com for instance), the page doesn't show up correctly, some elements (images) are missing. When hitting "reload", the images are loaded, if hitting "reload" again "images" are missing again and so on.
I performed a Thread Dump when the server was in this kind of situation and didn't find anything of interest. The only thing that differs from a Thread Dump in a regular behaviour is that all TP-Processor threads are in "Object.wait()", locking a ThreadPool$ControlRunnable and waiting a ThreadPool$ControlRunnable. Is that a deadlock? I didn't find anything useful about ThreadPool$ControlRunnable.
We have splitted our web applications on other servers, used other configurations (mod_proxy), virtualized the servers, unvirtualized the servers, used Linux, used Windows... I am lost..
Here is our actual configuration:
The good news is that problems like this are not that hard to figure out what is going on.
The first symptom you see is the 100% CPU usage. Your first step would be to narrow that down.
Threads in Object.wait() are not spinning the CPU. So it is not that. However, you don't have to guess what thread is using the CPU. There are tools out there that can tell you. On Linux, for example, you can simply look this up in "top" (as just one example)
That's where I would look, but before you spend another year trying to guess, I would upgrade your Tomcat server. 6.0.10 is extremely old, it is also early in the 6.0.x series of Tomcat, so yes, it will have several bugs in it, and you could be hitting one of them.
After that, track down the thread that is causing the jump in CPU and you can move on from there.