Andre's Blog
Perfection is when there is nothing left to take away
Variable argument lists on x64

People have been reporting for about a year that x64 builds of Stone Steps Webalizer crash on Linux. Even though the stack trace showed that the problem was related to the variable argument list passed into vsnprintf, I couldn't figure out what exactly was going on, because I don't have 64-bit hardware to reproduce the problem in a debugger.

The call stack always ended up in strlen called for a bad string with an invalid address, usually 0x3:

#0 strlen () from /lib/libc.so.6
#1 vfprintf () from /lib/libc.so.6
#2 vsnprintf () from /lib/libc.so.6
#3 vsnprintf_ (buffer, count=29, fmt="%s %s %d", valist)
#4 string_base<char>::format_va (this, fmt="%s %s %d", valist)

The code calling vsnprintf wasn't doing anything special and had worked for me for years in 32-bit Windows and Linux environments:

template <typename char_t> string_base<char_t>& 
string_base<char_t>::format_va(const char_t *fmt, va_list valist)
{
   ...
   realloc_buffer(strlen(fmt));
   ...
   while((slen = vsnprintf(string, bufsize, fmt, valist)) >= bufsize) {
      if(!realloc_buffer(bufsize << 1)) {
         make_bad_string();
         return *this;
      }
   }
}

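This code allocates an initial buffer sized to fit the format string and calls vsnprintf. If the resulting string doesn't fit, vsnprintf returns either a value of at least bufsize (C99 implementations return the length the complete string would need) or -1 (some older implementations), which, converted to an unsigned type, is also greater than bufsize. In either case the code allocates a buffer twice as large and tries to format the string again.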
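The return-value check in the loop can be isolated in a small sketch. The helper name format_truncated is hypothetical, and the sketch assumes a C++11 library whose std::vsnprintf follows C99 (returning the full length); the unsigned cast also covers libraries that return -1:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats into buf and reports whether the output was
// cut short, using the same unsigned comparison as the loop in format_va.
// *needed receives the full length reported by vsnprintf, or 0 when the
// library only signals truncation with -1.
static bool format_truncated(char *buf, std::size_t bufsize, std::size_t *needed,
                             const char *fmt, ...)
{
    assert(buf != nullptr && bufsize > 0 && fmt != nullptr);
    va_list args;
    va_start(args, fmt);
    int ret = std::vsnprintf(buf, bufsize, fmt, args);
    va_end(args);

    if (needed != nullptr)
        *needed = ret < 0 ? 0 : static_cast<std::size_t>(ret);

    // C99/C++11: ret is the length of the complete string without the
    // terminating NUL, so ret >= bufsize means truncation. A -1 from an
    // older library converts to SIZE_MAX and also compares >= bufsize.
    return static_cast<std::size_t>(ret) >= bufsize;
}
```

With a 9-byte buffer, the arguments quoted later in this post ("%s %s %d", "Daily usage for", "June", 2009) report truncation and a needed length of 25.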

Some people worked around this problem using Linux-specific calls (see this forum post for an example), but I wanted to find out what exactly was wrong with the code before adopting a fix.

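Last night it dawned on me that if the 64-bit vsnprintf consumes arguments with the va_arg macro, which modifies the internal state of the argument list (i.e. valist), and the caller's list isn't reset after the first call, valist will be invalid on the next call to vsnprintf. That is what happens on x86-64 Linux: there, va_list is an array type (a one-element array of a structure), so passing it to vsnprintf effectively passes a pointer to the caller's state, and whatever vsnprintf consumes stays consumed. On 32-bit x86, va_list is a plain pointer passed by value, which is why the same code worked there for years. The C standard allows for both by declaring the caller's va_list indeterminate once it has been passed to a function that calls va_arg on it.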
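A minimal sketch of the portable way to walk the same argument list twice: each pass gets its own copy made with va_copy, so the second pass starts at the first argument whether va_list is passed by value or effectively by reference. The functions sum_ints and sum_twice are hypothetical and stand in for the repeated vsnprintf calls:

```cpp
#include <cassert>
#include <cstdarg>

// Consumes `count` int arguments from `ap` with va_arg, the same way
// vsnprintf consumes one argument per conversion.
static int sum_ints(int count, va_list ap)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    return total;
}

// Walks the same variable argument list twice. Reusing `args` directly
// for the second pass would be undefined once sum_ints has consumed it.
static int sum_twice(int count, ...)
{
    assert(count >= 0);
    va_list args;
    va_start(args, count);

    va_list pass;
    va_copy(pass, args);          // copy for the first pass
    int first = sum_ints(count, pass);
    va_end(pass);                 // release before re-initializing

    va_copy(pass, args);          // fresh copy for the second pass
    int second = sum_ints(count, pass);
    va_end(pass);

    va_end(args);
    return first + second;
}
```

For example, sum_twice(3, 1, 2, 3) adds 1 + 2 + 3 on each pass and returns 12.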

Sure enough, in every case where Stone Steps Webalizer crashed, the combined length of the arguments exceeded the length of the format string (e.g. the format string was "%s %s %d" and the English arguments were "Daily usage for", "June" and 2009), meaning that vsnprintf was called more than once.
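The arithmetic is easy to check: the format string is 8 characters long, while "Daily usage for June 2009" needs 15 + 1 + 4 + 1 + 4 = 25. The sketch below measures the required length using the C99/C++11 rule that vsnprintf with a null buffer and a size of zero writes nothing and returns the full length; the helper name required_length is hypothetical:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Returns the number of characters (excluding the terminating NUL) that
// fmt would produce with the given arguments, or a negative value on an
// encoding error. Nothing is written because the buffer is null.
static int required_length(const char *fmt, ...)
{
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);
    int len = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    return len;
}
```

Since 25 is larger than a buffer sized from the 8-character format string, the first vsnprintf call in format_va is always truncated for this message, and the loop reuses valist.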

Based on this, I asked somebody to run a quick test for me using this fix:

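va_list temp;
va_copy(temp, valist);              // fresh copy for the first attempt
while((slen = vsnprintf(string, bufsize, fmt, temp)) >= bufsize) {
   va_end(temp);                    // this copy has been consumed; release it
   if(!realloc_buffer(bufsize << 1)) {
      make_bad_string();
      return *this;
   }
   va_copy(temp, valist);           // new copy for the next attempt
}
va_end(temp);                       // release the copy used by the last call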

It worked just fine. All I have to do now is compile this code conditionally so that other compilers don't trip over va_copy, which seems to be an oddball macro: on one hand, it's defined in the ISO C standard (C99) along with va_start and va_end; on the other hand, the ISO C++ standard does not list it among the C macros included in C++ (ISO-14882, 2003, table 95, page 704).
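One way to do the conditional compilation is sketched below, under stated assumptions: GCC has long provided __va_copy, and on platforms where va_list is a plain pointer (32-bit x86, MSVC) simple assignment is a workable stand-in. The assignment fallback does not work where va_list is an array type, such as x86-64 Linux, so it is only a last resort. (C++11 later added va_copy to <cstdarg>, so modern compilers never reach the fallback.) The functions format_string_va and format_string are hypothetical; unlike the loop above, this sketch also uses the exact length returned by a C99 vsnprintf instead of blindly doubling:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Assumed fallback for toolchains without va_copy (pre-C99 C, pre-C++11 C++).
// Assignment only works where va_list is a pointer or other scalar.
#ifndef va_copy
#  ifdef __va_copy
#    define va_copy(dest, src) __va_copy(dest, src)
#  else
#    define va_copy(dest, src) ((dest) = (src))
#  endif
#endif

// Every vsnprintf call works on its own copy of the caller's list, so
// valist is never reused after being consumed.
static std::string format_string_va(const char *fmt, va_list valist)
{
    assert(fmt != nullptr);
    std::size_t bufsize = std::strlen(fmt) + 1;  // initial guess: format length
    std::string out;

    for (;;) {
        out.resize(bufsize);

        va_list temp;
        va_copy(temp, valist);
        int slen = std::vsnprintf(&out[0], bufsize, fmt, temp);
        va_end(temp);

        if (slen >= 0 && static_cast<std::size_t>(slen) < bufsize) {
            out.resize(static_cast<std::size_t>(slen));  // drop unused space
            return out;
        }

        if (slen >= 0)
            bufsize = static_cast<std::size_t>(slen) + 1;  // C99: exact size needed
        else if (bufsize < (std::size_t(1) << 20))
            bufsize <<= 1;                                  // -1: try a bigger buffer
        else
            return std::string();                           // give up, like make_bad_string
    }
}

// Variadic front end that owns va_start/va_end for the original list.
static std::string format_string(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    std::string result = format_string_va(fmt, args);
    va_end(args);
    return result;
}
```

The upper bound on doubling keeps a library that returns -1 for a genuine encoding error from growing the buffer forever.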

I would like to thank everybody who pointed out this problem, provided various bits and pieces of information, suggestions and ran tests for me!
