
Oracle’s JDBC driver + prefetch == garbage [collection]

The Problem

Recently, we were experiencing major performance problems with loading documents from the database. Profiling did not isolate a single cause; everything (including unrelated, background operations) seemed slow. So, we started logging garbage collection, and found that we were collecting garbage at a rate of 20GB/min!

Profiling revealed that the worst offender, by far, was OracleStatement.prepareAccessors(). Interestingly, it only caused a problem when our result set included a LOB. For such queries, it allocates a 1MB object, regardless of whether the query returned any results at all.

Google searches revealed others who saw similar problems when accessing LOBs, but no solutions other than upgrading or changing drivers. We were already using the latest Oracle JDBC driver, and reverting to earlier drivers did not help. Switching drivers did solve the problem; however, pushing that change to production would require extensive testing to ensure that we were not trading one problem for another (or several).

I was about to start conducting these tests when John discovered that we were setting the OracleConnection parameter “defaultRowPrefetch” to 1000. This parameter determines how many rows are pulled back from the database on each round-trip, and increasing this value from its default of 10 will normally yield a performance gain. As an experiment, I set the value to 1, and re-profiled memory allocation. The amount of memory allocated by OracleStatement.prepareAccessors() decreased by about three orders of magnitude. Thus, it appears that when a query can return a LOB, Oracle’s JDBC driver allocates approximately “rowPrefetch” KB of memory, even when zero rows are returned.
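To make the experiment concrete, here is a minimal sketch of how the prefetch size can be set through JDBC connection properties. The connection URL, credentials, and the helper method are placeholders; the `defaultRowPrefetch` property name is the one discussed above, which Oracle's driver reads at connection time.

```java
import java.util.Properties;

public class PrefetchConfig {

    /** Builds connection properties with the given per-round-trip row count.
     *  Credentials here are placeholders for illustration only. */
    static Properties prefetchProperties(int rowsPerRoundTrip) {
        Properties props = new Properties();
        props.setProperty("user", "appuser");       // placeholder
        props.setProperty("password", "secret");    // placeholder
        // Applies to every statement on the connection; the driver
        // allocates LOB accessor buffers proportional to this value.
        props.setProperty("defaultRowPrefetch", String.valueOf(rowsPerRoundTrip));
        return props;
    }

    public static void main(String[] args) {
        Properties props = prefetchProperties(10);
        // Connection conn = DriverManager.getConnection(
        //         "jdbc:oracle:thin:@//dbhost:1521/SERVICE", props);
        System.out.println(props.getProperty("defaultRowPrefetch"));
    }
}
```

The standard JDBC alternative is `Statement.setFetchSize(n)`, which Oracle's driver maps onto the same row-prefetch mechanism on a per-statement basis.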

The Solution

Returning the “defaultRowPrefetch” parameter to 10 rid us of our garbage collection problems. However, because this is a global setting, reverting it reduced the performance of many other queries that returned large result sets with no LOBs. The prospect of setting “rowPrefetch” on a per-query basis was unappetizing, to say the least, but the performance loss was significant. In the end, we altered how we retrieve rows from the database so that the fetch size increases geometrically as we pull results back.

Specifically, the first batch we retrieve contains at most 10 rows, after which we increase the batch size to 20. Once we’ve retrieved 20 more rows, we increase the fetch size to 40, and so on. In this way, we never allocate large amounts of memory for queries which return few (or no) results, but we still quickly ramp up to a large fetch size.
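The ramp-up described above can be sketched with the standard JDBC `ResultSet.setFetchSize` call, which Oracle's driver maps onto its row-prefetch mechanism. This is an illustrative sketch, not our production code; the `GeometricFetch` class, `RowMapper` interface, and the doubling policy are assumptions for the example.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class GeometricFetch {
    static final int INITIAL_FETCH = 10;

    /** Doubles the fetch size: 10, 20, 40, 80, ... */
    static int nextFetchSize(int current) {
        return current * 2;
    }

    /** Hypothetical per-row mapping callback. */
    interface RowMapper<T> {
        T map(ResultSet rs) throws SQLException;
    }

    /** Drains a result set, doubling the fetch size after each full batch,
     *  so small result sets never trigger a large prefetch allocation. */
    static <T> List<T> drain(ResultSet rs, RowMapper<T> mapper) throws SQLException {
        List<T> rows = new ArrayList<>();
        int fetchSize = INITIAL_FETCH;
        rs.setFetchSize(fetchSize);
        int rowsUntilIncrease = fetchSize;
        while (rs.next()) {
            rows.add(mapper.map(rs));
            if (--rowsUntilIncrease == 0) {
                fetchSize = nextFetchSize(fetchSize);
                rs.setFetchSize(fetchSize);   // takes effect on the next round-trip
                rowsUntilIncrease = fetchSize;
            }
        }
        return rows;
    }
}
```

Because the first batch is capped at 10 rows, a query that returns zero or few rows pays only the small prefetch allocation, while a query returning thousands of rows reaches a large fetch size within a handful of round-trips.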

For large queries that return no LOBs, this approach is still slower than running with “defaultRowPrefetch” set to 1000. However, the slowdown on those queries was minor, overall system performance improved substantially, and, importantly, the change required no per-query tuning.
