Andre's Blog
Perfection is when there is nothing left to take away
Mixing up 32-bit and 64-bit code

Back in the days of Windows 3.11, Microsoft provided a special layer that made it possible for 16-bit and 32-bit code to interact with each other. The technique used for such an interface is called thunking, and it allowed both sides to be blissfully unaware that they were not quite compatible.

You thunk what?

Computer folklore describes thunking as if the compiler thought, or thunk in the lingo of the 60s, of something before it even happened. There are many variations of thunk types, but in a typical 16-to-32-bit thunking scenario, the caller ends up calling a special function that is aware of the actual function and its parameters and converts the actual arguments into a form suitable for the called function. For example, if 16-bit code called LoadLibrary and passed in a far pointer (i.e. a segment and an offset) to the name of the dynamic library to load, the thunk code converted the far pointer into a flat 32-bit address and called the 32-bit LoadLibrary to do the actual work.
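To make the pointer conversion a bit more concrete, here is a minimal sketch of what such a thunk has to do with a far pointer. Everything in it is made up for illustration: the FarPtr16 type, the kSegmentBase table and the to_flat function are hypothetical names, and the table merely stands in for the descriptor table the real thunk layer would consult to find the base address of a segment.

```cpp
#include <cstddef>
#include <cstdint>

// A 16-bit far pointer: a segment selector plus an offset within that segment.
struct FarPtr16 {
    std::uint16_t selector;
    std::uint16_t offset;
};

// Hypothetical segment base addresses, standing in for the descriptor
// table that a real thunk layer would look up.
constexpr std::uint32_t kSegmentBase[] = {0x00000000u, 0x00120000u, 0x00345000u};
constexpr std::size_t kSegmentCount = sizeof(kSegmentBase) / sizeof(kSegmentBase[0]);

// Bits 3..15 of a protected-mode selector hold the descriptor index.
constexpr std::size_t descriptor_index(std::uint16_t selector) {
    return static_cast<std::size_t>(selector >> 3);
}

// Converts a far pointer into a flat 32-bit address. Returns 0 for a
// selector that is not present in the hypothetical table.
constexpr std::uint32_t to_flat(FarPtr16 p) {
    return descriptor_index(p.selector) < kSegmentCount
               ? kSegmentBase[descriptor_index(p.selector)] + p.offset
               : 0u;
}
```

With this table, the far pointer 0008:0010 maps to the flat address 0x00120010, which is what the thunk would then hand over to the 32-bit function.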

While this technique did require extra operations on every call that crossed the 16/32-bit boundary, it allowed otherwise incompatible types of code to work together.


With the release of 64-bit Windows, Microsoft decided not to continue this practice, so 32-bit and 64-bit libraries may only reside in their own processes and communicate with each other through some form of inter-process communication, such as shared memory, pipes, RPC, etc. Note that this does not mean that the x64 processor cannot run 32-bit code alongside 64-bit code within the same process, just that the OS cannot mix 32-bit and 64-bit libraries.


AMD certainly showed Intel when, in the year 2000, they introduced 64-bit technology for everyone, as opposed to Intel's strategy of moving everybody to IA-64, which lacked backward compatibility with existing x86 applications, even though it was, arguably, technically superior to AMD's approach.

AMD introduced the so-called Long Mode, which is enabled by setting the Long Mode Enable (LME) bit in the Extended Feature Enable Register (EFER). In long mode, the processor evaluates the L and D bits in the code segment descriptor to select a sub-mode for the code segment in question. If the L bit is cleared, the processor runs in compatibility mode and all addresses and operands are the same as they are on a standard 32-bit x86 processor. If the L bit is set, the processor runs in 64-bit mode: addresses are treated as 64-bit values and operands are 32 bits in size by default.
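The sub-mode selection can be written down as a small decision function. This is only a sketch: the CpuMode enum and the decode_mode function are names invented here, and the function goes slightly beyond the paragraph above in two places that follow the AMD64 manuals as I understand them. With L cleared, the D bit picks a 16-bit or 32-bit default, and the combination L=1, D=1 is reserved.

```cpp
// Names below are invented for this sketch, not taken from any SDK.
enum class CpuMode {
    Legacy,           // long mode is not active
    Compatibility16,  // long mode, L=0, D=0: 16-bit defaults
    Compatibility32,  // long mode, L=0, D=1: 32-bit defaults
    Long64,           // long mode, L=1, D=0: 64-bit mode
    Reserved          // long mode, L=1, D=1: reserved combination
};

// Selects the sub-mode from the long-mode-active flag and the L and D
// bits of the current code segment descriptor.
constexpr CpuMode decode_mode(bool long_mode_active, bool l_bit, bool d_bit) {
    return !long_mode_active
               ? CpuMode::Legacy
               : l_bit ? (d_bit ? CpuMode::Reserved : CpuMode::Long64)
                       : (d_bit ? CpuMode::Compatibility32
                                : CpuMode::Compatibility16);
}
```

A 32-bit application running on a 64-bit OS would see decode_mode(true, false, true), that is, compatibility mode with 32-bit defaults.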

The default operand size may be overridden in 64-bit mode by adding a REX prefix with the W bit set, which makes it possible to mix 32-bit and 64-bit instructions within the same code segment (the address size is overridden separately, with the 0x67 prefix). For example, the following code performs the same operation on 32-bit and 64-bit operands. The leading 48 byte of each instruction in the second group is the REX prefix (0x40) with the W bit (0x08) set:

unsigned int i1 = 0ul;
unsigned int i2;

unsigned long long i3 = 0ull;
unsigned long long i4;

i2 = ~i1;
000000014003CA94 8B 44 24 30 mov eax,dword ptr [i1]
000000014003CA98 F7 D0 not eax
000000014003CA9A 89 44 24 50 mov dword ptr [i2],eax

i4 = ~i3;
000000014003CA9E 48 8B 44 24 38 mov rax,qword ptr [i3]
000000014003CAA3 48 F7 D0 not rax
000000014003CAA6 48 89 44 24 48 mov qword ptr [i4],rax
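If you want to pick the REX prefix apart yourself, the following sketch decodes its four flag bits. The RexPrefix struct and both function names are invented for this example. Keep in mind that bytes 0x40 through 0x4F act as a REX prefix only in 64-bit mode.

```cpp
#include <cstdint>

// Flag bits carried in the low nibble of a REX prefix.
struct RexPrefix {
    bool w;  // bit 3: 64-bit operand size
    bool r;  // bit 2: extends the ModRM reg field
    bool x;  // bit 1: extends the SIB index field
    bool b;  // bit 0: extends the ModRM r/m, SIB base or opcode reg field
};

// A REX prefix has the high nibble 0100b (valid in 64-bit mode only).
constexpr bool is_rex(std::uint8_t byte) {
    return (byte & 0xF0) == 0x40;
}

// Extracts the W, R, X and B flags from a REX prefix byte.
constexpr RexPrefix decode_rex(std::uint8_t byte) {
    return RexPrefix{(byte & 0x08) != 0, (byte & 0x04) != 0,
                     (byte & 0x02) != 0, (byte & 0x01) != 0};
}
```

For the 48 byte from the listing above, is_rex returns true and only the w flag is set, while the 8B and F7 opcode bytes of the 32-bit instructions are not REX prefixes at all.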

The runner-up giant

Intel eventually followed AMD and introduced the same architecture, which is currently known as Intel 64. Intel used their own terminology and managed to avoid the phrase Long Mode in their CPU manuals, even though they kept various abbreviations (e.g. LMA for Long Mode Active or LME for Long Mode Enable) or slightly renamed them to look more Intel-like (e.g. IA32_EFER). In Intel's terminology, long mode is called IA-32e mode, with the same compatibility and 64-bit sub-modes.

Missing a few petabytes?

Even though the AMD64 architecture interprets addresses as 64-bit values, the physical address space is much smaller. The first implementations of AMD and Intel CPUs supported only 40 bits of addressable physical memory, which amounts to 1 terabyte.

You can use the CPUID instruction to learn the maximum physical and linear address for your CPU. If you are using VC++, compile this code in 32-bit mode (VC9 does not support __asm for x64).

__asm {
   mov eax, 80000008h
   cpuid
}

The result of the CPUID instruction will be stored in the EAX register. The least significant byte will indicate the number of bits in a physical address and the second byte will indicate the number of bits in a linear address. You can examine the AL and AH parts of the EAX register for these values.
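Once you have the EAX value, extracting the two sizes is plain bit masking. Here is a small sketch; the function names are made up, and the value 0x00003028 used below is just a sample I picked to illustrate the layout, not a reading from any particular CPU.

```cpp
#include <cstdint>

// Bits 7..0 of EAX (the AL register): number of physical address bits.
constexpr unsigned physical_address_bits(std::uint32_t eax) {
    return eax & 0xFFu;
}

// Bits 15..8 of EAX (the AH register): number of linear address bits.
constexpr unsigned linear_address_bits(std::uint32_t eax) {
    return (eax >> 8) & 0xFFu;
}

// Size in bytes of an address space with the given number of address
// bits. The bit count must be less than 64.
constexpr std::uint64_t addressable_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}
```

For the sample value 0x00003028, AL is 0x28 (40 physical address bits) and AH is 0x30 (48 linear address bits), and addressable_bytes(40) comes out at 1,099,511,627,776 bytes, which is the 1 terabyte mentioned earlier.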

In order to prevent applications from using unused address bits for anything other than addressing, AMD64 introduces canonical addresses, which require all unused upper bits to be copies of the most significant implemented bit of a linear address (bit 47 when linear addresses are 48 bits wide), so they are either all ones or all zeros. An attempt to use a non-canonical address will result in a general protection fault.
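The canonical form is easy to check in code. The sketch below uses a function name of my own, is_canonical, and takes the number of implemented linear address bits as a parameter, such as the value reported by CPUID.

```cpp
#include <cstdint>

// Returns true if the most significant implemented bit and all bits
// above it are either all zeros or all ones. linear_bits must be in the
// range 1..64 (48 on CPUs with 48-bit linear addresses).
constexpr bool is_canonical(std::uint64_t address, unsigned linear_bits) {
    return (address >> (linear_bits - 1)) == 0 ||
           (address >> (linear_bits - 1)) ==
               (~std::uint64_t{0} >> (linear_bits - 1));
}
```

With 48-bit linear addresses, 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are the last address of the lower canonical range and the first address of the upper one, while 0x0000800000000000 falls in the hole between them and is not canonical.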