a tech journey

Connection Pool Analysis

Advertisements

Why do we need connection pool?

I was working on a task where cron jobs were taking hours to complete, those were long running processes. To reduce the time of execution one way was to have worker threads to accomplish the jobs. I used ThreadPoolTaskExecutor and assigned the jobs, but came to know that after certain time application runs out of database connections. When I looked at the database configurations, found out we had DriverManagerDataSource data source which is not pooled data source. So to resolve this problem, decided to use a connection pool in the application.

The flow of database operation:

Database operation requires a connection. And the flow is like this:

1] Application asks the data source for a database connection.

2] Datasource uses the database driver to open a database connection.

3] Connection is created and TCP socket is opened.

4] Application performs the database operation.

5] Connection is closed and the socket is closed.

This involves opening and closing of connections which are expensive operations. This is extra strain on database management system to open / close connections.

So the best option is to reuse the database connections. This can reduce OS I/O overhead of opening/closing TCP connections.

Why pooling is useful?

Whenever a connection is requested, the data source will use the available connection pool to acquire a new connection. The pool will only create new connections when there are no connections available and a pool is not at maximum size. When the close() method is called on the connection, it’s not closed but returned to the connection pool. So reducing the overhead of creating/opening the new connections and making use of existing connections, connection pools are much faster.

Selection criteria:

Our criteria to decide which connection pool to go with.

1] Reliability

Performance is good, but it should not come at the cost of reliability. And we certainly think that reliability of library is important than performance. We don’t want to end up with the library which is faster than others but not reliable.

To decide the reliability of library, the following things we can consider:

2] Performance

The performance of the library again is really important criteria. The performance of libraries may vary depending upon how they are configured, the environment in which tests are carried out.

It is important to carry out the tests in the environment which you are selecting the library for.

3] Features

It is important to look at the features libraries provide. What are parameters they provide, how sensible are the default values of the parameters if we don’t provide them.

It’s important to look the connection strategies:

4] Ease of use

It’s also important to know how easy it is to configure (or to mis-configure) the connection pool using the library. It is important to have well documented/updated library.


Connection Pool Libraries considered for analysis:

1] Apache Commons DBCP2:

2] C3P0:

3] Tomcat JDBC:

4] HikariCP:

Tests carried out with different pools:

The test results may vary with the configurations and environment they are conducted on.

I ran these tests on local developer environment with initial pool size of 10 and max size of 50.

1] I tried different connection pools and took some readings by calling database method 1000 times. Below is the table of results.

Pool & Readings (in ms) Reading 1 Reading 2 Reading 3
DBCP2 54242 48204 51760
C3P0 59204 60008 56360
HikariCP 49797 39998 46658
Tomcat CP 51156 49734 49734

 

2] Carried out tests by opening and closing the database connection thousand times. Below is the table of results.

Pool & Readings (in ms) Reading 1 Reading 2 Reading 3
DBCP2 455 359 209
C3P0 139 156 130
HikariCP 173 13 144
Tomcat CP 54 71 32

 

3] Took some readings on developer tool to monitor the response time of API (To fetch all organizations and their users).

  • Readings with Tomcat JDBC CP

  • Readings with DBCP2

  • Readings with C3P0

  • Readings with HikariCP

Configurations used for performance testing:

1] Tomcat JDBC

2] DBCP2

3] C3P0

4] HikariCP

Final thoughts:

After going through all the resources, tomcat JDBC and HikariCP seem to be reliable and faster than other two connection pools. Both the communities are very active. Their codebase is smaller than other two. DBCP2 once was not active at one point of time but came out of dormancy later, not sure how reliable it is. Stars and forks on HikariCP and Tomcat repositories are more than that of other two connection pools.

Online resources:

http://www.trustiv.co.uk/2014/06/battle-connection-pools

https://blog.johnhagler.com/2016/08/13/pooling-mysql-jdbc-connections/

https://techblog.topdesk.com/coding/choosing-a-database-connection-pool/

https://vladmihalcea.com/2014/04/17/the-anatomy-of-connection-pooling/

https://www.scribd.com/document/111649850/Connection-Pool-Compare

http://brettwooldridge.github.io/HikariCP/ludicrous.html

https://github.com/brettwooldridge/HikariCP/wiki/%22My-benchmark-doesn%27t-show-a-difference.%22

https://github.com/brettwooldridge/HikariCP/wiki/Pool-Analysis

https://www.wix.engineering/single-post/how-does-hikaricp-compare-to-other-connection-pools

https://www.scribd.com/document/111649850/Connection-Pool-Compare

https://stackoverflow.com/questions/520585/connection-pooling-options-with-jdbc-dbcp-vs-c3p0

Advertisements

Advertisements