Consider the following C code:
int int_to_bool( int i )
{
    return i == 0 ? 0 : 1;
}
If you run this through your favorite x86 compiler, there’s a good chance you’ll see either a setcc instruction or a cmovcc instruction. The latter was added in the Pentium Pro, and thus cannot always be generated. The former has the unfortunate requirement that it write a byte register destination, which invokes the dreaded “insert semantics” of x86. †
Here’s a sequence I dreamed up that avoids these problems (input in ecx):
33 C0 xor eax,eax
3B C1 cmp eax,ecx
13 C0 adc eax,eax
In this case I’m using cmp in such a way that operand order is important. I need to subtract my input from zero. (The compare instruction is just a subtract which only sets flags.) As it turns out, this sets the carry flag to exactly the answer I want to return. All that’s left is to extract the carry flag, and the quickest way to do that is to perform add-with-carry into a zero.
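To make the flag arithmetic concrete, here is a minimal C sketch that emulates the three instructions one step at a time. The names int_to_bool_carry and check_int_to_bool are made up for illustration, and the carry flag is modeled as an ordinary variable, so this shows the reasoning rather than what a compiler would actually emit.

```c
#include <assert.h>
#include <stdint.h>

/* The original function, kept as the reference behavior. */
int int_to_bool(int i)
{
    return i == 0 ? 0 : 1;
}

/* Emulates the three-instruction sequence, with the input in "ecx". */
uint32_t int_to_bool_carry(uint32_t ecx)
{
    uint32_t eax = 0;               /* xor eax,eax */
    uint32_t carry = (eax < ecx);   /* cmp eax,ecx: CF is the borrow out of eax - ecx */
    eax = eax + eax + carry;        /* adc eax,eax: 0 + 0 + CF */
    return eax;
}

/* Hypothetical self-check: the emulation should agree with the reference. */
void check_int_to_bool(int i)
{
    assert(int_to_bool_carry((uint32_t)i) == (uint32_t)int_to_bool(i));
}
```

The borrow out of 0 - ecx is set exactly when ecx is nonzero as an unsigned value, which is why the comparison has to be "zero minus input" and not the other way around.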
In summary: 6 bytes of code, 3 simple ubiquitous ALU instructions and — best of all — no merge or partial register stall issues.
Microsoft’s C compiler also does this kind of superoptimizer-inspired bit-twiddly magic for integer absolute value.
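For comparison, here is a sketch of one well-known branchless absolute value, written in C. I’m assuming this is the general flavor of trick meant here; it is not necessarily the exact sequence any particular compiler emits, and the name branchless_abs is made up for illustration.

```c
#include <stdint.h>

/* Branchless absolute value: build a mask that is all ones for negative
   inputs and zero otherwise, then compute (x ^ mask) - mask.
   The result is unsigned so that INT32_MIN maps to 2147483648 without
   overflowing a signed type. */
uint32_t branchless_abs(int32_t x)
{
    uint32_t mask = 0u - (uint32_t)(x < 0);   /* 0xFFFFFFFF if x < 0, else 0 */
    return ((uint32_t)x ^ mask) - mask;       /* negative: ~x + 1; otherwise: x */
}
```

For a negative input the xor flips every bit and subtracting the all-ones mask adds one, which is two’s-complement negation; for a non-negative input both steps do nothing.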
† The x86 general-purpose registers are divided into sub-registers of varying widths. The ax register, for example, refers to the low 16 bits of the eax register. Similarly, al refers to the lowest byte. When writing to these sub-registers, the upper bits of the full register are preserved, so the new value depends on the old one. This can hurt performance on a modern dynamically-scheduled processor. For more detail:
- x86 architecture — Wikipedia
- AMD’s Software Optimization Guide (Section 4.8: Partial-Register Writes)
- Intel’s Optimization Reference Manual (Section 3.5.2.3: Partial Register Stalls)
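The “insert semantics” described in this footnote can be modeled in a few lines of C. This is only an illustration of the architectural behavior, using made-up helper names write_al and write_ax; it says nothing about how a given processor implements it internally.

```c
#include <stdint.h>

/* Models a write to al: bits 0-7 are replaced, bits 8-31 of eax are preserved. */
uint32_t write_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}

/* Models a write to ax: bits 0-15 are replaced, bits 16-31 of eax are preserved. */
uint32_t write_ax(uint32_t eax, uint16_t value)
{
    return (eax & 0xFFFF0000u) | value;
}
```

Note that both helpers need the old value of eax as an input. That extra dependency on the previous register contents is the merge that setcc drags in when it writes a byte register.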

What kinds of improvements are we talking about?
I’ll let you know when I implement it :)