Looking for thoughts on HTML reports

HTML reports generated by Stone Steps Webalizer haven't changed much structurally since I forked the original project back in 2004. Current HTML reports use CSS styles wherever possible and have some JavaScript niceties, such as charts rendered in JavaScript and, less well known, jumping between reports with Ctrl-Alt-Up/Down. Otherwise, they are still the same monthly one-page reports with pre-formatted all-items sub-reports and a single collective index report.

Bitbucket fishing

A few months ago Bitbucket dropped the Stone Steps Webalizer project repository because Mercurial was no longer supported. Along with the source repository, Bitbucket also dropped the Wiki, issues, downloads and some configuration, such as components and milestones, even though none of these had anything to do with Mercurial.

I moved the project to another hosting service, but about a week ago I needed to follow up on one of the old Bitbucket issues and noticed that some images from the supposedly deleted repository's Wiki were still accessible in Bitbucket Cloud. This made me think that my repository had just been disabled, not deleted.

Kicking the Bitbucket

A while back, Bitbucket informed me that Mercurial repositories would no longer be supported and would be disabled in the summer of 2020. To me, as a software developer, a repository is just where the source code is kept, so, given that Stone Steps Webalizer is now mostly in maintenance mode, I decided to let Bitbucket disable the Mercurial repository and keep using Bitbucket just for issue tracking, the Wiki and downloads. I soon learned that Bitbucket and I have different definitions of what a repository is.

SSW, subway edition

It has been over a year since I released the last version of Stone Steps Webalizer, mainly because of a total lack of time: the other projects I'm involved in keep me busy day-to-day. However, I was not about to give up on SSW, so about a month ago I started looking for ways to continue development. After some thinking, it dawned on me that I was wasting about 40 minutes on the subway every day, and I began looking for a netbook.

Variable argument lists on x64

People have been reporting x64 builds of Stone Steps Webalizer crashing on Linux for about a year. Even though I could see from the stack trace that the problem was related to the variable argument list passed into vsnprintf, I couldn't figure out what exactly was going on, because I don't have 64-bit hardware on which to reproduce the problem in a debugger.

The call stack always ended up in strlen called for a bad string with an invalid address, usually 0x3:
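This excerpt stops before the diagnosis, so the following is not necessarily what the bug turned out to be. A well-known way to get this kind of crash on x86-64 Linux is to pass the same va_list to vsnprintf twice, for example once to measure the output and once to format it. Under the System V x86-64 ABI, va_list is an array type, so the called function advances the caller's own argument state, and the second call starts reading from the wrong argument, possibly treating a small integer as a string pointer. On 32-bit x86, va_list is a pointer copied by value, which is why such code often appears to work there, although it is undefined behavior on both. The sketch below uses a hypothetical format_string helper and gives each vsnprintf call its own va_list via va_copy.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: printf-style formatting into a std::string.
std::string format_string(const char* fmt, ...)
{
    assert(fmt != nullptr);

    va_list args;
    va_start(args, fmt);

    // vsnprintf consumes the va_list it is given. On x86-64 Linux va_list is
    // an array type, so the callee advances the caller's own state; measuring
    // with a copy leaves `args` intact for the second call.
    va_list measure;
    va_copy(measure, args);
    int len = std::vsnprintf(nullptr, 0, fmt, measure);
    va_end(measure);

    std::string result;
    if (len >= 0) {
        // One extra byte for the terminating null character.
        std::vector<char> buffer(static_cast<std::size_t>(len) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), fmt, args);
        result.assign(buffer.data(), static_cast<std::size_t>(len));
    }

    va_end(args);
    return result;
}
```

For example, format_string("%s=%d", "hits", 42) returns "hits=42". Without the va_copy, the second vsnprintf call would be handed an already-consumed va_list.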

A look back and plans for 2009

I made nine releases of Stone Steps Webalizer in 2008. The most notable feature added that year was XML/XSL reporting, which gives website administrators full control over the generated HTML. About six thousand people downloaded one or more copies in 2008.

One of the challenges of 2008 was a lack of funding: not a single donation was made to the project all year. Hardware, some commercial software and co-location are not cheap, and I hope to see more support in 2009.

Time to think about new features. Here is what I have in mind, ordered by priority. If you think something is missing, leave a comment or start a discussion thread in the forums.

Viewing all items in XML reports

Those who have tried XML reports may have noticed that there are no links at the bottom of a report when the number of items, such as hosts or referrers, is greater than the configured top number of items. The reason is that, unlike with HTML reports, it does not make much sense to generate the same XML data twice (i.e. once in the top items report and again in the report listing all items). I have been experimenting with various approaches to this problem and have finally found a solution I like.

Moving Linux installation to new hardware

I finally decided to abandon the old 700 MHz box I had been using to host the CVS repository and to build Stone Steps Webalizer for Fedora. The replacement machine was not new, but I just cannot complain about a 2.8 GHz CPU and extra storage! Before this weekend, I had never restored a Linux backup onto new hardware, and I learned a thing or two about Linux in the past couple of days.

Flash charting - not too flashy

The original Webalizer PNG graphs look quite small on a high-resolution screen, which is pretty much any screen nowadays, and they are not easy to work with due to the lack of a layout engine. Poor antialiasing in the underlying GD library does not help quality either. Being able to produce better graphs was one of the reasons I added XML reports to Stone Steps Webalizer. For the last couple of months I have mostly been working on making sure that the XSL templates that will be included in the Stone Steps Webalizer package are easy to use with various Flash charting packages.

I asked for a count, not a life story!

One of the main reasons I switched the state file to Berkeley DB was to allow Stone Steps Webalizer to generate reports without loading the entire monthly data set into memory, which may be hundreds of megabytes in size for high-traffic websites and proxies. Once this was implemented, report generators just had to open the database and traverse a few records at the top of each table, such as hosts or URLs, to generate a relevant report, which took almost no time. Soon, however, I noticed that generating top-x reports from a 600+ MB database took minutes and as much memory as if the entire database were being read.
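Reading only the top of a table can be illustrated with an ordered in-memory container standing in for a database index sorted by hit count. This is a sketch under that assumption, not the Berkeley DB code Stone Steps Webalizer uses; UrlIndex and top_n are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a secondary index that keeps URLs ordered by
// hit count, highest first (std::greater reverses the default order).
using UrlIndex = std::multimap<unsigned long, std::string, std::greater<unsigned long>>;

// Collects the first n entries. Only n entries are visited, no matter how
// large the index is, which is what keeps a top-x report cheap.
std::vector<std::string> top_n(const UrlIndex& index, std::size_t n)
{
    std::vector<std::string> result;
    for (auto it = index.begin(); it != index.end() && result.size() < n; ++it)
        result.push_back(it->second);
    return result;
}
```

For an index holding 300 hits for "/", 120 for "/index.html" and 45 for "/about.html", top_n(index, 2) returns "/" and "/index.html" after visiting two entries. The cost depends on n rather than on the size of the index, which is the property a top-x report relies on.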

A blast from the past

A couple of weeks ago I was looking at the Stone Steps Webalizer website stats and noticed a sharp drop in visits. Usually that would mean that there was something wrong with the infrastructure, but after taking a closer look at the server and the network I couldn't find anything that would point to the problem. The next day was similar, which made me wonder what kind of world event had drawn traffic away from my site.

Mangling user agents is a good thing!

User agent strings come in all shapes and sizes, and showing full user agent strings in reports results in too much fragmentation, as every little detail, such as a service pack or a minor version change, produces a new user agent entry in the report.

MangleAgents is a configuration parameter that has been around for a while and is designed, despite its name, to tidy up user agent strings and keep only the parts that are interesting from an analysis point of view.
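To make the idea concrete, here is a much simplified sketch that reduces a user agent to a product name and the leading components of its version. It is not the MangleAgents algorithm; mangle_agent, its version_parts parameter and the short list of product tokens are all assumptions for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not SSW's MangleAgents implementation. Keeps the
// first known "Product/version" token and the first version_parts
// components of its version; unknown agents are returned unchanged.
std::string mangle_agent(const std::string& ua, int version_parts)
{
    assert(version_parts >= 1);

    // Order matters: Edge and Opera agents also contain "Chrome/".
    static const char* const products[] = {"Firefox", "Edg", "OPR", "Chrome"};

    for (const char* name : products) {
        const std::string token = std::string(name) + "/";
        const std::string::size_type pos = ua.find(token);
        if (pos == std::string::npos)
            continue;

        std::string result = name;
        result += ' ';

        // Copy digits and dots, stopping before the dot that would start
        // one component more than requested.
        int parts = 1;
        for (std::string::size_type i = pos + token.size(); i < ua.size(); ++i) {
            const char ch = ua[i];
            if (ch == '.') {
                if (parts == version_parts)
                    break;
                ++parts;
            } else if (!std::isdigit(static_cast<unsigned char>(ch))) {
                break;
            }
            result += ch;
        }
        return result;
    }
    return ua;
}
```

With version_parts set to 1, a Firefox 115.0 agent becomes "Firefox 115", and with 2, a Chrome 120.0.6099.109 agent becomes "Chrome 120.0", so all patch releases collapse into one report entry.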

XML Reports in Stone Steps Webalizer

Generating reports in XML has been on my list of things to do for a while, and I finally got around to working on it. One might ask what is so significant about XML and why an average webmaster would be interested in XML reports. Good question.

XML and related technologies provide a neat and powerful way to separate what reports contain, such as hit and visit counts or lists of hosts and URLs, from how reports are presented. As simple as this sounds (and perhaps cryptic to some readers), this separation is the basis for better-looking and much more customizable reports.
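As a rough illustration of that separation, the sketch below writes a list of hosts as plain data and nothing else. The HostStat structure and the hosts/host element and attribute names are made up for this example and are not the schema Stone Steps Webalizer produces; all formatting would be left to an XSL template.

```cpp
#include <string>
#include <vector>

// Hypothetical record; the element and attribute names below are
// illustrative only, not the schema SSW actually writes.
struct HostStat {
    std::string name;
    unsigned long hits;
    unsigned long visits;
};

// Replaces the characters that cannot appear literally in attribute values.
std::string xml_escape(const std::string& text)
{
    std::string out;
    for (char ch : text) {
        switch (ch) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += ch;       break;
        }
    }
    return out;
}

// Writes only the data; fonts, colors, tables and charts are left to
// whatever XSL template transforms this document.
std::string hosts_to_xml(const std::vector<HostStat>& hosts)
{
    std::string xml = "<hosts>\n";
    for (const HostStat& host : hosts) {
        xml += "  <host name=\"" + xml_escape(host.name) +
               "\" hits=\"" + std::to_string(host.hits) +
               "\" visits=\"" + std::to_string(host.visits) + "\"/>\n";
    }
    xml += "</hosts>\n";
    return xml;
}
```

Because the output carries no presentation at all, the same file could be turned into an HTML table by one template and into chart data by another.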

SSE2 - not all gold is good for you

A few users notified me that SSW won't run on some AMD and Intel processors. After looking at a crash dump submitted by one user, I figured out that the culprit was one of the SSE2 instructions, such as this one:

movsd   xmm0,mmword ptr [webalizer!_real (0045e290)]

I decided to make a special build so that people can run SSW on older architectures, and spent some time last weekend creating new build configurations. Once I was done, I wanted to check how much slower SSW would run with SSE2 disabled, so I ran a small test.
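The post does not say how users pick the right build. Purely as an illustration, assuming one wanted to detect SSE2 at run time, for example in a small launcher compiled without SSE2 that chooses between the two builds, a check could look like this. cpu_has_sse2 is a hypothetical name; the MSVC branch reads CPUID leaf 1 directly, where bit 26 of EDX reports SSE2, and the GCC/Clang branch uses the __builtin_cpu_supports builtin.

```cpp
#if defined(_MSC_VER) && defined(_M_IX86)
#include <intrin.h>
#endif

// Hypothetical helper: returns true if the CPU running this code supports SSE2.
bool cpu_has_sse2()
{
#if defined(__x86_64__) || defined(_M_X64)
    // SSE2 is part of the baseline x86-64 instruction set.
    return true;
#elif defined(_MSC_VER) && defined(_M_IX86)
    // CPUID leaf 1 reports SSE2 support in bit 26 of EDX (regs[3]).
    int regs[4];
    __cpuid(regs, 1);
    return (regs[3] & (1 << 26)) != 0;
#elif defined(__GNUC__) && defined(__i386__)
    // GCC and Clang builtin that queries CPUID at run time.
    return __builtin_cpu_supports("sse2") != 0;
#else
    // Not an x86 processor, so there is no SSE2 at all.
    return false;
#endif
}
```

A check like this only helps if the code calling it is itself compiled without SSE2, since a compiler targeting SSE2 may emit those instructions anywhere in the program, including before such a check runs.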

Hard drives die in spring

Well, it happened again - the hard drive in my good old RedHat machine got corrupted. Strangely enough, earlier failures also occurred around this time of year in each of the past few years. One of them was so bad that I had to buy hard drive recovery software to salvage my CVS repository. This time there appear to be 220+ bad blocks (about 1 MB at 4 KB per block). Most of the content is still accessible, although I don't know the full extent of the damage yet.

I have to say that I miss Windows' chkdsk, which not only reports bad blocks, but also the names of files or directories affected by the damaged blocks. e2fsck, on the other hand, comes up with pretty cryptic messages, such as "Attempt to read block from filesystem resulted in short read" or "...Force rewrite(y)?". I'm also more careful this time and so far haven't allowed e2fsck to auto-fix the partition - the last time I did this, it cost me the entire hard drive.