H2 performance

The flavor-of-the-month in application architecture seems to be NoSQL databases, in which developers abandon  some of the ACID properties in return for better performance, scalability, and availability.  There’s a lot of diversity in this category, including distributed key-value stores, document databases, graph databases, and data grids.  The primary advantages of these new data stores  (besides the coolness factor) include higher performance, built-in replication, and a schema-less approach that better supports agile development.  The other commonality is that most of the new products are prepared (but not required) to throw out conventional RDBMS standards.  Table joins are out the door and developers are encouraged to de-normalize their data to provide better performance.

I’ve actually been boring colleagues for years now about how we’re too dependent on relational databases.  Now that alternatives are coming out of the woodwork, the contrarian in me wants to make sure we don’t jump to a new solution just because it’s fashionable.   My thought is that maybe performance, easy data replication, and agility can be added to conventional relational databases.

One worthy approach for increasing performance is to use an in-memory database (IMDB).  The idea is to keep an entire database in memory and as close as possible to the application code.  The result should be faster than networked database server.   For read-intensive problems requiring only eventual consistency (which are the domain for NoSQL solutions) an IMDB could serve especially well.

The real advantage in using an IMDB is that we get to keep our conventional database APIs, such as SQL, JDBC, and JPA.  We then have architectural agility, since we can move the database around without changing the code.  In memory databases can usually switch to disk-based or can move into a server configuration. As your requirements evolve, you can upgrade or move the database as needed.  Factor out separate data sources for different data requirements, and you can deploy multiple databases, with IMDBs for the fast GUI data and server-based stores for transactional data.

A recent report by Pawel Plaszczak considers two commercial IMDB products: TimesTen from Oracle and solidDB from IBM.  He found that queries to the IMDBs averaged five times faster than to conventional databases, and that inserts and deletes ran two to four times faster.

I’m sure that TimesTen and solidDB are first rate products, but they’re not cheap.  This makes me wonder what we can accomplish with free software.  There are a number of free IMDBs available, including: H2, HSQLDB, and  Derby.     My immediate question is whether these databases can provide a similar speed-up if they run in memory.

Consider H2, an all-Java database.  The H2 site provides performance statistics for H2 and other free databases.  The two configurations they consider are embedded-in-memory and file-based client-server.  I’ll run some quick-and-dirty tests of my own for these two setups, but also throw in tests on embedded but file-based servers.

Query by PK General Queries Inserts Updates
Embedded, in memory 0.282 ms 45.170 ms 0.286 ms 0.243 ms
Embedded, file-based 0.234 ms 63.946 ms 0.291 ms 0.470 ms
TCP Server, file-based 0.692 ms 121.642 ms 0.509 ms 0.807 ms

Staying in-memory offers a respectable speed-up.   Moving your H2 database from a server configuration to in-memory exhibits a speedup of 2.6 for general queries. Keeping H2 embedded within your application but backing data to a local file is also faster than accessing a remote H2 server. It would be fun to compare H2 to the IBM or Oracle servers, but the licensing agreements on those products specifically preclude publishing benchmarks.

These tests were run against a sample employee table containing 300,000 records.  It’s no surprise that primary key retrievals executed quickly.  Normally you would expect writes to be slower than reads, but in this case insertion just appends data and the updates find records by primary key.  For this reason, inserts and updates are also pretty quick.

The general queries retrieved sets of records based on random combinations of gender, birth date, and hire date.  I created indexes to aid these queries and generated explain-plans to verify that they were kicking in.

Your mileage will vary.   Real performance statistics require real implementations, but it seems fair to say that you’ll see a significant performance advantage to keeping your data close to your application.   An in-memory solution provides a full-strength database, rather than just a distributed key-value store.

You get a performance boost, but get to keep your SQL.


No comments yet

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: