Andre's Blog
Perfection is when there is nothing left to take away
64-bit optimization gone wrong

I was working on a 64-bit VC8 project and the executable built in release configuration kept crashing at run time with evident traces of stack corruption. A debug build or even a release build with all optimizations disabled worked fine and so did a fully-optimized 32-bit build.

The problem with debugging optimized release builds is that many variables are passed through registers and stack frames are omitted, making it more difficult to trace parameters and local variables. Debugging x64 builds is even more difficult because the optimizer makes heavy use of the additional 64-bit registers and puts hardly any parameters on the stack.

Eventually, I traced the problem to one local variable that seemed to contain bizarre STL strings of enormous length, which would indicate a buffer overflow of some kind. Once I found the address of the STL string that was consistently being overwritten, I thought I was almost done - all it takes in such a case is to set up a data breakpoint that fires when a good string goes bad, which would give me the location of the code trashing the STL string. Not so fast - the x64 debugger crashed the entire IDE every single time this data breakpoint was hit.

At that point I had to resort to the good old divide and conquer method, stepping over method calls and restarting the debug session once the location I was monitoring was trashed. This was a fairly long process, but eventually I zeroed in on the std::vector::resize(size_t) call, which apparently was triggering the problem.

The resize method is not very efficient in STL and creates a temporary to initialize new elements while resizing the vector (ISO 14882:2003, p.492):

void resize(size_type sz, T c = T());

A more efficient approach would be to pass in an optional pointer, so that the source object could be reused.

Microsoft implemented the resize method as two separate methods, probably to avoid using the default argument:

void resize(size_type _Newsize)
{   // determine new length, padding with _Ty() elements as needed
   resize(_Newsize, _Ty());
}

void resize(size_type _Newsize, _Ty _Val)
{   // determine new length, padding with _Val elements as needed
   if (size() < _Newsize)
      _Insert_n(end(), _Newsize - size(), _Val);
   else if (_Newsize < size())
      erase(begin() + _Newsize, end());
}

The x64 optimizer eliminates an extra call and simply jumps from the first resize method to the next one. This is where the bug creeps in. The _Insert_n method in VC8 creates another temporary (Microsoft must be thinking that creating a temporary is a no-op). These two temporaries and the call-eliminating optimization are the key to this crash.
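To make the two temporaries concrete, here is a minimal sketch that mirrors only the call structure described above. The names mini_vector, insert_n and Tracked are hypothetical and exist just for illustration; this is not Microsoft's code, and it does not reproduce the crash. It only counts how many times the element type gets constructed on the way through.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how it gets constructed.
struct Tracked {
    static int default_ctor_calls;
    static int copy_ctor_calls;
    Tracked() { ++default_ctor_calls; }
    Tracked(const Tracked&) { ++copy_ctor_calls; }
    Tracked& operator=(const Tracked&) { return *this; }
};
int Tracked::default_ctor_calls = 0;
int Tracked::copy_ctor_calls = 0;

// Hypothetical container that copies the shape of the two resize overloads.
template <typename T>
class mini_vector {
public:
    // First overload: builds temporary #1 (T()) and forwards it.
    // Forwarding is the last thing this overload does, which is the kind of
    // call the post says the x64 optimizer turned into a jump.
    void resize(std::size_t new_size) { resize(new_size, T()); }

    // Second overload: pads with copies of val or erases the tail.
    void resize(std::size_t new_size, T val) {
        if (items_.size() < new_size)
            insert_n(new_size - items_.size(), val);
        else if (new_size < items_.size())
            items_.erase(items_.begin() + static_cast<std::ptrdiff_t>(new_size),
                         items_.end());
        assert(items_.size() == new_size);  // debug-only sanity check
    }

    std::size_t size() const { return items_.size(); }

private:
    // Stand-in for the insert helper: temporary #2 is a local copy of val.
    void insert_n(std::size_t count, const T& val) {
        T tmp = val;
        items_.insert(items_.end(), count, tmp);
    }

    std::vector<T> items_;  // storage borrowed from std::vector for brevity
};
```

Calling resize(3) on an empty mini_vector<Tracked> should default-construct exactly one object (the first temporary) and copy it at least four times: once into the second temporary and once per new element. Whether the by-value parameter adds one more copy depends on the compiler and language version.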

The optimizer tries to predict how much stack all subsequent calls require after the first resize and lowers the stack pointer once to accommodate all following stack allocations. The first temporary is allocated at the top of the stack (remember that the stack grows towards smaller addresses), which creates this picture right after the first temporary is created, but before the second resize is called:

┌──────────┐   <- top of the stack
│//////////│   <- first temporary
│//////////│
│//////////│
├──────────┤
│          │   <- uninitialized
│          │ 
│          │ 
│          │ 
├──────────┤
│          │   <- 1st resize return address
└──────────┘

After the first temporary is created, optimized code jumps to the second resize call. That call creates the second temporary in the uninitialized space of the stack. However, the optimizer miscalculates offsets and the temporaries overlap, so the values at the end of the first temporary are overwritten with arbitrary values from the beginning of the second temporary:

┌──────────┐   <- first temporary
│//////////│ 
│//////////│ 
├──────────┤   <- second temporary
│XXXXXXXXXX│   <- overlapped temporaries
├──────────┤   <- end of the first temporary
│\\\\\\\\\\│ 
│\\\\\\\\\\│
├──────────┤   <- end of the second temporary
│          │ 
├──────────┤
│          │   <- 1st resize return address
└──────────┘
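One way to catch this kind of overlap without relying on data breakpoints is an element type that guards itself. The sketch below is an assumption about how such a check could look, not something used in the original project: Canary, kGuard and resize_keeps_canaries_intact are hypothetical names. Each object remembers the address it was constructed at and carries a guard word at each end, so an object that has been partially overwritten by a neighbor fails its own check. Violations are counted rather than asserted, because assert is normally compiled out in exactly the release builds where the problem shows up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Arbitrary guard pattern; any unlikely value would do.
const std::uint64_t kGuard = 0x0DDC0FFEEBADF00DULL;

// Hypothetical self-checking element type.
struct Canary {
    static int violations;        // number of damaged objects seen so far

    const Canary* self;           // address this object was constructed at
    volatile std::uint64_t head;  // guard word before the payload
    char payload[48];             // filler so the object spans several stack slots
    volatile std::uint64_t tail;  // guard word after the payload

    Canary() : self(this), head(kGuard), payload(), tail(kGuard) {}

    // Copying checks the source, which is where a trashed temporary would show up.
    Canary(const Canary& other)
        : self(this), head(kGuard), payload(), tail(kGuard) {
        if (!other.intact()) ++violations;
    }

    Canary& operator=(const Canary& other) {
        if (!other.intact()) ++violations;
        return *this;
    }

    ~Canary() { if (!intact()) ++violations; }

    bool intact() const {
        return self == this && head == kGuard && tail == kGuard;
    }
};
int Canary::violations = 0;

// Exercises the single-argument resize and reports whether every object
// involved, temporaries included, passed its own check.
bool resize_keeps_canaries_intact(std::size_t n) {
    std::vector<Canary> v;
    v.resize(n);
    assert(v.size() == n);  // debug-only sanity check
    for (std::size_t i = 0; i < v.size(); ++i)
        if (!v[i].intact()) return false;
    return Canary::violations == 0;
}
```

On a correct toolchain the function returns true and the counter stays at zero. The guard words are volatile to discourage the optimizer from folding the checks away, though that is a hedge rather than a guarantee.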

After finding the cause of the bug, it was fairly easy to work around it - eliminating the first resize call by providing an explicit initialization value prevents this botched optimization:

std::vector<object_t> v;
v.resize(10, object_t());

Of course, this work-around is not guaranteed to work in all cases because the optimizer may still consider this sequence for optimization, but it worked in the particular project I was working on.
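For completeness, a self-contained version of the work-around might look like the sketch below. The object_t definition here is a hypothetical stand-in (the real type from the project is not shown in this post), and make_objects is just an illustrative wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the project's element type.
struct object_t {
    std::string name;
    object_t() : name("default") {}
};

// Builds a vector of count default-initialized objects by calling the
// two-argument resize directly, so the single-argument forwarding overload
// is never involved.
std::vector<object_t> make_objects(std::size_t count) {
    std::vector<object_t> v;
    v.resize(count, object_t());
    assert(v.size() == count);  // debug-only sanity check
    return v;
}
```

The explicit object_t() argument still creates a temporary, so this does not save any work. It only changes which overload the caller enters.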

I reported this issue to Microsoft as bug #346455. Hopefully they will fix it some time soon. If you have come across the same problem, make sure to add your vote to the bug.

August 1st, 2008

Microsoft's answer at connect.microsoft.com said:

Sorry, we've been swamped for quite some time. Yes, we can reproduce the bug, and it's on my queue to fix. I haven't had any time to spend on it yet, however

Given that answer, I called Microsoft earlier in July to make clear the severity of the bug. After a couple of conversations and some paperwork, Microsoft informed me that they had started working on a fix.

I have to say that out of all of my conversations with various tech support departments (especially Intel's, which proved to be technically incompetent), Microsoft comes out on top after you manage to get their attention.

Comments:
Posted Fri Jun 18 06:16:31 EDT 2010 by Wayne

Do you know if there was ever a hotfix or any other fix for this? We seem to have run into the same issue with VS2008 SP1. There seems to have been no follow up on the bug that's visible.

Posted Fri Jun 18 08:00:23 EDT 2010 by Andre

Microsoft issued a hotfix for me, which effectively disabled the compiler optimization that caused this bug. I'm surprised that it's still reproducible. I will check my test case when I have a moment and update this thread within a couple of days on whether I can still reproduce the bug. You may want to call Microsoft in the meantime and refer to the bug number I provided.
