I asked for a count, not a life story!

One of the main reasons I switched the state file to Berkeley DB was to allow Stone Steps Webalizer to generate reports without loading the entire monthly data set into memory, which for high-traffic web sites and proxies may be hundreds of megabytes in size. Once this was implemented, report generators only had to open the database and traverse a few records at the top of each table, such as hosts or URLs, to produce the relevant report, which took almost no time. Soon, however, I noticed that generating top-x reports from a 600+ MB database took minutes and consumed so much memory that it looked as if the entire database was being read.
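
For illustration, this is roughly what such a cursor traversal looks like with the Berkeley DB C++ API; it is only a minimal sketch with a made-up database file and key layout, not the actual Stone Steps Webalizer code:

    #include <db_cxx.h>
    #include <cstdio>

    // Print the first top_n records of a btree table by walking a cursor,
    // so only a handful of pages are read instead of the whole database.
    int print_top_records(const char *dbpath, int top_n)
    {
        Db db(NULL, 0);
        db.open(NULL, dbpath, NULL, DB_BTREE, DB_RDONLY, 0);

        Dbc *cursor = NULL;
        db.cursor(NULL, &cursor, 0);

        Dbt key, data;
        int count = 0;

        // DB_NEXT on a freshly-opened cursor starts at the first record
        while (count < top_n && cursor->get(&key, &data, DB_NEXT) == 0) {
            printf("%d: %.*s\n", count + 1,
                   (int) key.get_size(), (const char *) key.get_data());
            count++;
        }

        cursor->close();
        db.close(0);
        return count;
    }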

Upon closer investigation, it turned out that calling the Db::count() Berkeley DB method performs a full scan of the relevant table inside the state database, consuming a lot of memory and making the hard drive go nuts, churning through hundreds of megabytes of data in the process. Berkeley DB does offer a convenient fast count, but it turned out to be just a cached value that cannot be used for any purpose other than as an estimate of how many records used to be in the database at some unspecified point in time.
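
For reference, the same distinction shows up in the raw Berkeley DB C++ API, where counts come back from Db::stat() and the DB_FAST_STAT flag selects the cached totals. The sketch below assumes a btree database and a 4.3-or-later interface:

    #include <db_cxx.h>
    #include <cstdlib>

    // Return the number of data items in a btree database. Without
    // DB_FAST_STAT the call walks every page of the database; with it,
    // Berkeley DB returns totals cached in the metadata, which may be stale.
    u_int32_t record_count(Db &db, bool fast)
    {
        DB_BTREE_STAT *stats = NULL;

        db.stat(NULL, &stats, fast ? DB_FAST_STAT : 0);

        u_int32_t ndata = stats->bt_ndata;   // number of data items
        std::free(stats);                    // stat buffer is malloc'ed by the library
        return ndata;
    }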

I made a post on the Oracle forums describing the problem:

http://forums.oracle.com/forums/thread.jspa?threadID=709777&tstart=0

Oracle confirmed the problem and promised to consider the approaches I suggested. Since there was no immediate fix, I replaced most calls to Db::count() with application-specific counters, such as the total URL or host counts, but had to leave some calls alone because the corresponding application values, such as the count of hosts indexed by transfer amount, are not tracked by Stone Steps Webalizer.
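
The counter-based replacement is conceptually simple; the hypothetical sketch below keeps a running total next to the table, with made-up names and with database opening and error handling left out:

    #include <db_cxx.h>

    // Keep an application-level record count alongside the Berkeley DB table,
    // so the total is available in O(1) instead of via a full-scan count.
    class host_table {
        Db        hostdb;           // (opening the database is omitted)
        u_int32_t total_hosts;      // bumped on every successful insert

    public:
        host_table() : hostdb(NULL, 0), total_hosts(0) {}

        void put_host(Dbt &key, Dbt &data)
        {
            // DB_NOOVERWRITE returns DB_KEYEXIST for hosts we already track
            if (hostdb.put(NULL, &key, &data, DB_NOOVERWRITE) == 0)
                total_hosts++;
        }

        u_int32_t count(void) const {return total_hosts;}
    };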

However, removing even a few Db::count() calls improved report generation so much that I decided to include this fix in the upcoming October release. Hopefully, Oracle will address this deficiency soon.
