380 Picoseconds

Please excuse me while I toot my own horn. Take a look at this:


   C:\latency >run
   latency
   imul    : 57 - 53 = 4
   lea shl : 56 - 53 = 3
   just lea: 55 - 53 = 2
   just shl: 54 - 53 = 1

That’s right, bitches, I am dynamically measuring the latency of a single x86 instruction — accurate down to one cycle! That’s ~380 picoseconds on my hardware.
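
For anyone who wants to check the arithmetic, here is a minimal sketch in C. It assumes the 53 in the output above is the fixed cost of an empty measurement, and it uses a hypothetical clock of 2630 MHz, which is simply what 380 ps per cycle works out to (1 / 380 ps ≈ 2.63 GHz). The function names are made up for illustration; they are not from the actual test program.

```c
#include <assert.h>

/* Overhead-corrected latency: cycles measured around the instruction
   minus cycles measured around nothing (assumed to be the 53 above). */
static unsigned latency_cycles(unsigned measured, unsigned baseline)
{
    assert(measured >= baseline);
    return measured - baseline;
}

/* Convert a cycle count to picoseconds for a clock given in MHz.
   One cycle lasts 1e6 / MHz picoseconds. The clock value passed in
   is an assumption, not a figure taken from the post. */
static double cycles_to_ps(unsigned cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles * 1.0e6 / clock_mhz;
}
```

With those assumptions, `cycles_to_ps(1, 2630.0)` comes out at roughly 380 ps, and `latency_cycles(57, 53)` gives the 4 cycles reported for imul.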

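This is really hard (impossible?) to do without a serializing read time-stamp counter instruction: a plain RDTSC can execute out of order relative to the code around it, so the count it returns need not line up with the instruction being timed.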
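
The general shape of such a measurement can be sketched as follows. This is not the program that produced the output above; it is a rough illustration that assumes the instruction in question is RDTSCP, GCC or Clang on x86-64, and the `__rdtscp` intrinsic from `<x86intrin.h>`. The function names and the trial count are hypothetical. On many newer CPUs the time-stamp counter ticks at a fixed reference rate rather than the core clock, so the numbers may not be exact core cycles there.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp (GCC/Clang, x86 only) */

#define TRIALS 1000      /* arbitrary; the minimum filters out interrupts */

/* Smallest count seen for an empty timed region, i.e. the fixed
   overhead of the two RDTSCP reads themselves. */
static uint64_t min_empty_cycles(void)
{
    unsigned aux;
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < TRIALS; i++) {
        uint64_t t0 = __rdtscp(&aux);
        uint64_t t1 = __rdtscp(&aux);
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Same loop with a single IMUL placed between the two reads. */
static uint64_t min_imul_cycles(void)
{
    unsigned aux;
    uint64_t best = UINT64_MAX;
    uint64_t x = 3;
    for (int i = 0; i < TRIALS; i++) {
        uint64_t t0 = __rdtscp(&aux);
        __asm__ volatile("imul %0, %0" : "+r"(x));
        uint64_t t1 = __rdtscp(&aux);
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Subtracting `min_empty_cycles()` from `min_imul_cycles()` gives an estimate in the same "measured - baseline" form as the output above. How close it gets to the true latency depends on the CPU and on how strictly the counter read is ordered against the surrounding instructions.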

3 Responses to “380 Picoseconds”


  1. AMDFan

    That’s cool - which CPU has this instruction? Is this on real hardware that you are running at Microsoft?

  2. Mark

    Everything since K8 rev F has had it. Check out RDTSCP in the AMD64 Architecture Programmer’s Manual, Volume 3:
    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24594.pdf

  3. AMDFan

    Cool!

Creative Commons Attribution-NonCommercial 3.0 United States