When you're tuning an application, a good place to start is database connection pooling. It's not glamorous, but it often gives you a quick performance win.
What is a database connection pool, I hear you ask?
Now, if your application is Java-based, you're spoiled for choice with connection pools - c3p0, BoneCP, HikariCP, DBCP, Proxool. Which one should you choose?
Our client had started out using c3p0, for no particular reason. However, they were seeing some issues with the performance of their systems that they couldn't explain. The symptoms pointed towards the connection pool. This gave us an opportunity to look at the performance of 3 of the most popular Java connection pools.
The simplest thing to change about your connection pool is its size. If there aren't enough connections to go around, then users have to queue until a connection becomes available (many connection pools don't actually use queues - they support "queue barging" for efficiency reasons - but that's another story). However, that didn't seem to be the problem, at least most of the time. The connection pool was configured with 200 connections, and normally there were only 5-10 in use at a time. However, occasionally we'd see the number of connections shoot up to 200, with a corresponding drop in performance.
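To make the "not enough connections to go around" case concrete, here is a minimal sketch of a fixed-size pool. `ToyPool` and its methods are hypothetical names invented for this illustration, and it hands out integer ids rather than real JDBC connections. It is not how c3p0, BoneCP or HikariCP are implemented; it only shows a caller waiting, and eventually giving up, when every connection is checked out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical toy pool: integer ids stand in for real database connections.
class ToyPool {
    private final BlockingQueue<Integer> idle;

    ToyPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int id = 0; id < size; id++) {
            idle.add(id);
        }
    }

    // Waits up to timeoutMillis for a free connection.
    // Returns null if none becomes available in time (or if interrupted).
    Integer borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Returns a connection to the pool so that a waiting caller can take it.
    void giveBack(Integer id) {
        idle.add(id);
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(2);
        Integer first = pool.borrow(10);
        Integer second = pool.borrow(10);
        // The pool is now exhausted, so this caller waits 50 ms and then gives up.
        System.out.println("third borrow while exhausted: " + pool.borrow(50));
        pool.giveBack(first);
        // A connection is free again, so this borrow succeeds.
        System.out.println("borrow after a return: " + pool.borrow(50));
    }
}
```

In a real pool the wait shows up as response time, which is why an exhausted pool and a latency spike tend to appear together.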
Normally, the system would respond in a fraction of a second, but we were seeing these spikes, where it could take 10 seconds or more. What was going on?
My first thought was that it might be a bug in c3p0. Monitoring data showed that c3p0 was mostly waiting around at that time, when surely it should have been doing something? I decided to try a different connection pool, to check. Fortunately, the application used Spring, which meant it was possible to swap in a different connection pool with a few changes to some XML.
So I picked BoneCP. I'd used it on a different project, and it had proved reliable, so I thought I'd give it a try here. I changed the connection pool and re-ran the test:
That looks better. There are still some spikes, but they're a lot smaller - a second and a half, at most.
But that still doesn't tell us what was wrong with the c3p0 results, so I dug a little deeper. In fact, it wasn't a bug in c3p0, but a "feature". You see, c3p0 tries to avoid making the user wait, whenever possible. To this end, it has some "helper" threads, that do routine maintenance on the connections (check that they're still working when they're checked in, recycle old ones, open new connections if needed). However, by default it only has 3 helper threads. And if the helper threads get behind on their work, it starts queueing up, which puts them further behind... and so on.
Fortunately, these helper threads are configurable (the c3p0 setting is numHelperThreads). So, I increased the number of helper threads from 3 to 10, and re-ran the test:
Even better still! We don't have any spikes now.
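The backlog effect is easy to reproduce outside c3p0. The sketch below is an analogy, not c3p0's actual internals: it uses a plain fixed-size thread pool from java.util.concurrent, and the class name `HelperBacklog`, the 30 jobs and the 50 ms job duration are all made up for illustration. It submits the same burst of simulated maintenance jobs to 3 and then 10 worker threads and reports how long the last job had to wait to finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class HelperBacklog {
    // Runs `tasks` simulated maintenance jobs of `taskMillis` each on `helpers` threads.
    // Returns the wall-clock time in milliseconds until the last job has finished.
    static long drainMillis(int helpers, int tasks, long taskMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(helpers);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                // Each job just sleeps; jobs beyond the thread count wait in the executor's queue.
                pending.add(pool.submit(() -> {
                    try {
                        Thread.sleep(taskMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            for (Future<?> job : pending) {
                job.get(); // block until this job is done
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("3 helpers:  " + drainMillis(3, 30, 50) + " ms");
        System.out.println("10 helpers: " + drainMillis(10, 30, 50) + " ms");
    }
}
```

With only 3 workers the burst has to be processed in ten rounds rather than three, so the jobs at the back of the queue wait several times longer. Anything that depends on those jobs finishing waits with them.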
Out of curiosity, we also tried one other connection pool: HikariCP. HikariCP's relatively new, but its author has been making some bold claims about its performance, so we thought we'd see how it did:
You could be forgiven for thinking I'd accidentally uploaded the c3p0 results twice. They certainly look very similar. We see this even more clearly if we plot all the results on a percentile graph:
As you can see, it's pretty much a dead heat between c3p0 and Hikari.
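For anyone who wants to build a similar percentile view from their own test results, here is a rough sketch of the calculation. It uses the nearest-rank definition of a percentile, which is one of several in common use and not necessarily the one our graphing tool applied. The class name `Percentiles` and the sample response times are invented for illustration and are not data from these tests.

```java
import java.util.Arrays;

class Percentiles {
    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up response times in milliseconds, including one large spike.
        double[] responseMillis = {120, 95, 110, 105, 9800, 100, 130, 115, 90, 125};
        for (double p : new double[] {50, 90, 99, 100}) {
            System.out.println("p" + (int) p + " = " + percentile(responseMillis, p) + " ms");
        }
    }
}
```

Percentiles are useful here because a single spike barely moves the median but dominates the top percentiles, so two pools with similar typical performance can still be told apart by their tails.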
But how does any of this apply to me?
These tests were undoubtedly useful to the client, but they may also be of more general interest. There's a lot of debate amongst the various connection pools about whose is best, and they all have benchmarks that prove theirs is fastest. But these are microbenchmarks against unrealistic applications. Our tests, on the other hand, were run against an existing multitier enterprise app, with all the usual baggage (Hibernate, Spring, XML, downstream systems) that comes with a real system (albeit with stub downstream systems, and a local HSQLDB database).
So which of them is actually best?
We saw some response time spikes in the BoneCP results, but I suspect they're just a fluke. I haven't been able to get to the bottom of what causes them, but I've occasionally seen similar patterns in tests using the other pools, and I suspect they're related to garbage collection (the tests used the G1 garbage collector, which is sometimes a bit quirky). Outside of the spikes, performance was very similar to the others.
c3p0 was joint fastest once we'd got helper threads configured correctly. Still, it's not encouraging that an obscure config setting could have such a dramatic impact on application stability. It's telling that Hikari also uses helper threads, but tunes them itself on the fly.
And then there's Hikari. It certainly seems to be as fast as they say, with minimal configuration. But bear in mind that it's still a relatively young codebase. It's missing a couple of features that the others support, such as statement caching (although this was not enabled during our tests for any of the pools), but it also has some useful features like (mostly) fair queueing.
Overall, I think BoneCP and HikariCP come out of this looking good. BoneCP seems to be slightly more intricate than HikariCP, for better or for worse. However, c3p0 is just too easy to misconfigure, so I don't trust it.
We didn't look at other pools, like Tomcat pooling, DBCP or Proxool. Perhaps this was a missed opportunity (Tomcat supports fully fair queueing, and I've heard DBCP has upped their game recently). But since the client is now happily using their new c3p0 settings, our work is done.