More Stupid x86 Assembly Tricks

A couple of weeks ago I went hunting for a better way to compute x!=0 on x86. Eventually, I came up with a cute carry-flag trick and blogged about it.

(Note: I’m not branching on this comparison — that would be easy. Instead I want the value of the comparison in a general-purpose register. I should have made this explicitly clear in my original post. Alas, I did not. Doh.)

My goal was to avoid using setcc, because partial-register writes are the devil.

Try as I might, I couldn’t imagine a way generalize my solution so that it would also work for x==0. Someone suggested that I try the GNU superoptimizer (PDF, code), so I did.

At first I was a bit disappointed that the superoptimizer didn’t discover my sequence for x!=0. I think, maybe, the cost heuristics are outdated. (It should model xor reg,reg as being really cheap†.)

Turns out that the superoptimizer is still really clever anyway. It was a source of some great ideas. I’m delighted with what “we” came up with for x==0:

Old method; naive and literal:

85 c9           test     ecx, ecx
0f 94 c0        sete     al
0f b6 c0        movzx    eax, al

New method:

31 c0           xor      eax, eax
83 f9 01        cmp      ecx, 1
11 c0           adc      eax, eax

Once again the new method avoids the setcc and thus avoids insert semantics. As a nice bonus, we save a byte of code.

†This doesn’t actually depend on the input register at all. It’s essentially a “load-zero” instruction. Modern processors understand this and schedule accordingly.

0 Responses to “More Stupid x86 Assembly Tricks”


  1. No Comments

Leave a Reply




Creative Commons Attribution-NonCommercial 3.0 United States
Creative Commons Attribution-NonCommercial 3.0 United States