Why do we need a connection pool?
I was working on a task where cron jobs, which were long-running processes, were taking hours to complete. One way to reduce the execution time was to have worker threads run the jobs. I used ThreadPoolTaskExecutor and assigned the jobs to it, but found that after a certain time the application ran out of database connections. When I looked at the database configuration, I found that we were using DriverManagerDataSource, which is not a pooled data source. To resolve this problem, I decided to use a connection pool in the application.
The flow of a database operation:
A database operation requires a connection. The flow looks like this:
1] The application asks the data source for a database connection.
2] The data source uses the database driver to open a database connection.
3] A connection is created and a TCP socket is opened.
4] The application performs the database operation.
5] The connection is closed and the socket is closed.
This involves opening and closing connections, which are expensive operations and put extra strain on the database management system.
So the best option is to reuse database connections, which reduces the OS I/O overhead of opening and closing TCP connections.
Why is pooling useful?
Whenever a connection is requested, the data source asks the connection pool for one. The pool creates a new connection only when no idle connection is available and the pool has not reached its maximum size. When the close() method is called on the connection, it’s not closed but returned to the connection pool. By reusing existing connections and avoiding the overhead of opening new ones, connection pools are much faster.
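The borrow-and-return behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class SimplePoolSketch, its methods and the fake connection used in demo() are hypothetical names invented for this example, not the API of any of the libraries discussed in this post, and real pools add validation, timeouts and eviction on top of this.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical teaching pool; not the API of any real pooling library.
class SimplePoolSketch {
    private final BlockingQueue<Connection> idle;
    private final Supplier<Connection> opener;   // stands in for the expensive driver call
    private final int maxSize;
    private final AtomicInteger opened = new AtomicInteger();

    SimplePoolSketch(Supplier<Connection> opener, int maxSize) {
        this.opener = opener;
        this.maxSize = maxSize;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    Connection getConnection(long timeoutMs) throws InterruptedException {
        Connection physical = idle.poll();           // reuse an idle connection if there is one
        if (physical == null && reserveSlot()) {
            physical = opener.get();                 // none idle and pool below max: open a new one
        }
        if (physical == null) {                      // pool is at max size: wait for a return
            physical = idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        }
        if (physical == null) {
            throw new IllegalStateException("no connection available within " + timeoutMs + " ms");
        }
        return wrap(physical);
    }

    // Atomically claims one of the maxSize slots; returns false when the pool is full.
    private boolean reserveSlot() {
        int n;
        do {
            n = opened.get();
            if (n >= maxSize) return false;
        } while (!opened.compareAndSet(n, n + 1));
        return true;
    }

    // Returns a proxy whose close() hands the physical connection back to the pool.
    private Connection wrap(Connection physical) {
        AtomicBoolean returned = new AtomicBoolean();
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("close")) {
                if (returned.compareAndSet(false, true)) idle.offer(physical);
                return null;
            }
            return method.invoke(physical, args);    // everything else goes to the real connection
        };
        return (Connection) Proxy.newProxyInstance(
                SimplePoolSketch.class.getClassLoader(), new Class<?>[] {Connection.class}, handler);
    }

    int openedCount() { return opened.get(); }

    // Borrows and closes a connection `borrows` times; returns how many physical connections were opened.
    static int demo(int borrows) {
        Supplier<Connection> fakeOpener = () -> (Connection) Proxy.newProxyInstance(
                SimplePoolSketch.class.getClassLoader(), new Class<?>[] {Connection.class},
                (proxy, method, args) -> null);      // fake connection, so no database is needed
        SimplePoolSketch pool = new SimplePoolSketch(fakeOpener, 2);
        try {
            for (int i = 0; i < borrows; i++) {
                Connection c = pool.getConnection(100);
                c.close();                           // goes back to the pool instead of closing
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return pool.openedCount();
    }

    public static void main(String[] args) {
        System.out.println("physical connections opened: " + demo(1000));
    }
}
```

In this sketch, a thousand sequential borrow/close cycles reuse the same physical connection, which is the saving that pooling is meant to provide.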
Our criteria for deciding which connection pool to go with:
1] Reliability
Performance is good, but it should not come at the cost of reliability. We think the reliability of a library is more important than its performance. We don’t want to end up with a library that is faster than the others but not reliable.
To judge the reliability of a library, we can consider the following:
- How widely it is used
- How well the code is maintained
- How active the library development is
- Number of outstanding bugs in the library
- Community of developers/users
2] Performance
The performance of the library is, again, a really important criterion. It can vary depending on how the library is configured and on the environment in which the tests are carried out.
It is important to run the tests in the environment for which you are selecting the library.
3] Features
It is important to look at the features the libraries provide: which parameters they expose, and how sensible the default values are if we don’t set them.
It’s also important to look at the connection strategies:
- Do they reset connection properties (e.g. auto-commit, isolation level) when a connection returns to the pool?
- Do they support statement caching?
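To make the first question concrete, here is a rough sketch of what "resetting connection properties" on return could involve, using only standard java.sql.Connection methods. The class ConnectionResetSketch, the method resetForReuse and the recording fake connection in demo() are hypothetical names made up for this example; each library implements (or skips) this step in its own way.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; shows the idea, not any library's actual implementation.
class ConnectionResetSketch {

    // Restores pool-wide defaults before a connection goes back to the idle set.
    static void resetForReuse(Connection c, int defaultIsolation) throws SQLException {
        if (!c.getAutoCommit()) {
            c.rollback();                 // discard uncommitted work left by the previous borrower
            c.setAutoCommit(true);
        }
        c.setReadOnly(false);
        c.setTransactionIsolation(defaultIsolation);
        c.clearWarnings();
    }

    // Runs resetForReuse against a fake connection and returns the method names it called.
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        boolean[] autoCommit = {false};   // pretend the borrower switched auto-commit off
        Connection fake = (Connection) Proxy.newProxyInstance(
                ConnectionResetSketch.class.getClassLoader(), new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    if (method.getName().equals("getAutoCommit")) return autoCommit[0];
                    if (method.getName().equals("setAutoCommit")) autoCommit[0] = (Boolean) args[0];
                    return null;          // all other calls used here return void
                });
        try {
            resetForReuse(fake, Connection.TRANSACTION_READ_COMMITTED);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If a pool skips this step, the next borrower may inherit a connection that is still read-only or has auto-commit turned off, which is why the question matters when comparing libraries.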
4] Ease of use
It’s also important to know how easy it is to configure (or misconfigure) the connection pool using the library. The library should be well documented, and the documentation should be up to date.
Connection Pool Libraries considered for analysis:
1] Apache Commons DBCP2:
- Easy to configure. Very detailed configuration page.
- Supports prepared statement caching
- Does not validate connections on borrow
- Does not reset transaction isolation when returned to the pool
- Does not reset read only status of connection
- Does not reset auto-commit status of connection
- Does not clear SQL warnings
- DBCP2 compiles and runs under Java 7 only (JDBC 4.1); I am not sure whether it runs on Java 8
- More than 60 classes in the library
- The code is hosted on GitHub: https://github.com/apache/commons-dbcp. The repository has 21 contributors and only 2 contributions in 2017. The home page has a lot of broken links, including the Javadoc links.
- I logged into the Apache bug tracker (JIRA) to check the number of bugs: there are 11 open and reopened bugs.
- Licensed under Apache 2.0
2] C3P0:
- One of the oldest and best-known libraries. It is very well documented. However, according to some reports, it’s easy to misconfigure the pool, which can result in poor performance and deadlocks.
- Comes bundled with Hibernate.
- Works well in heavily concurrent environments.
- Supports prepared statement pooling
- Resets auto-commit status when returned to the pool
- Resets read-only status when returned to the pool
- Resets isolation level when returned to the pool
- Does not test connections at getConnection()
- Around 200 classes in the library. Huge code base compared to others.
- The code is hosted on GitHub. The library has 5 contributors with 5 contributions to the code this year; the last check-in was on 4th May.
- Issues are tracked on the GitHub issue tracker, which has 41 issues listed.
- Library is licensed under LGPL v.2.1 and EPL v.1.0
3] Tomcat JDBC:
- Supports highly concurrent environments and multi core/CPU systems.
- It is very well documented.
- Async connection retrieval: it can queue connection requests and return a Future.
- Ability to configure custom interceptors to enhance functionality.
- Provides multiple ways to configure the pool: inside the Apache Tomcat container, through JMX, or standalone using a DataSource bean.
- Does not by default reset auto-commit and transaction isolation levels for connections in the pool; users must configure custom interceptors to do this.
- Does not by default test connections on getConnection()
- Does not close abandoned open statements.
- Pool prepared statement properties are not used.
- The code is hosted on GitHub. It has 13 active contributors.
- This is a very active community, as one can see from the commits and the bug list.
- Licensed under Apache 2.0
- Very light library, around 130 KB.
- Tests connections at getConnection()
- Tracks and closes abandoned connections.
- Clears connections before returning connection to client.
- Resets auto-commit, transaction isolation, read-only status
- The library has 58 active contributors and a very detailed configuration page with performance benchmarks.
- The library has 33 issues listed on GitHub
- Licensed under Apache 2.0
Tests carried out with different pools:
The test results may vary with the configuration and the environment in which the tests are run.
I ran these tests on a local developer environment with an initial pool size of 10 and a maximum size of 50.
1] I tried the different connection pools and took readings by calling a database method 1000 times. Below is the table of results.
| Pool | Readings (in ms) |
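Readings of this kind can be taken with a small timing loop. The sketch below is an assumption about how such a harness might look, not the code actually used for these tests: PoolTimingSketch and timeMillis are hypothetical names, and the workload in main is a placeholder where a real run would borrow a connection from the pool under test, call the database method, and close the connection.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing harness; the measured operation is supplied by the caller.
class PoolTimingSketch {

    // Runs `operation` `iterations` times and returns the total elapsed time in milliseconds.
    static long timeMillis(int iterations, Runnable operation) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            operation.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Placeholder workload standing in for "get connection, run query, close".
        long elapsed = timeMillis(1000, () -> calls[0]++);
        System.out.println(calls[0] + " calls took " + elapsed + " ms");
    }
}
```

For fairer comparisons it usually helps to run a few untimed warm-up iterations first and to repeat each measurement several times, since JIT compilation and pool initialisation can distort a single run.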
2] I carried out tests by opening and closing a database connection 1000 times. Below is the table of results.