Cache effects

A quick shout-out to an illustrative blog entry on various cache effects:

http://igoro.com/archive/gallery-of-processor-cache-effects/

When someone bugs me with “X is too slow, because it has to make a virtual call” and I get annoyed, it’s because a hot virtual call costs a dozen cycles or so. Missing the cache? In the thousands. Don’t get me wrong, virtual calls can matter for many reasons, but that all flies out the window the moment you’re working on a data set of any non-trivial size. If your objects are hundreds of bytes large, you don’t worry about the virtual calls; you worry about shuffling their member slots around to squeeze more out of your caches.
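To make "shuffling member slots" concrete, here is a minimal sketch. The two structs and the `objects_per_line` helper are hypothetical, and the 64-byte line size is an assumption (common, but not universal). The same members declared largest-first need less padding, so more objects fit in each cache line. Real objects in the hundreds of bytes would more likely be rearranged so the fields touched together sit together, but the principle is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical object whose members sit in "whatever order they were added".
// Each bool is followed by padding so the next, wider member stays aligned.
struct Scattered {
    bool          alive;
    double        x;
    bool          visible;
    std::uint32_t id;
    bool          dirty;
    double        y;
};

// The same members, declared largest-first so the small ones pack together.
struct Reordered {
    double        x;
    double        y;
    std::uint32_t id;
    bool          alive;
    bool          visible;
    bool          dirty;
};

static_assert(sizeof(Reordered) <= sizeof(Scattered),
              "reordering should never make the object bigger");

// Whole objects that fit in one cache line.
// The 64-byte default is an assumption; check your target CPU.
std::size_t objects_per_line(std::size_t object_size,
                             std::size_t line_size = 64) {
    assert(object_size > 0);
    return line_size / object_size;
}
```

On a typical 64-bit ABI the two structs come out at 40 and 24 bytes, so `objects_per_line` goes from one object per line to two. Exact sizes depend on the compiler and platform, so treat those figures as illustrative.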

“It better not allocate”

My other favorite perf quote from this month: “It better not allocate — this call needs to take 100 microseconds or less”.  On my dev box, on the default Win7 heap, an uncontended small allocation (and the pairing free) is 120 ns — or 0.12 microseconds.  My personal favorite small object allocator can hit down to 0.020 microseconds sustained.

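At 0.12 microseconds apiece, a 100 microsecond budget covers roughly 800 allocations on the default heap; at 0.020 microseconds it covers 5,000. We could allocate hundreds of objects per call, or thousands with a good small object allocator, and still come in under budget.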
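The budget arithmetic is easy to check, and the per-allocation cost is easy to measure on your own machine. The sketch below is illustrative only: `allocations_within_budget` and `measure_alloc_free_ns` are hypothetical helper names, the constants are the figures quoted above, and the measurement is a rough single-threaded average rather than a careful benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// How many allocate/free pairs fit in a time budget at a given cost per pair.
constexpr std::uint64_t allocations_within_budget(std::uint64_t budget_ns,
                                                  std::uint64_t cost_ns) {
    return cost_ns == 0 ? 0 : budget_ns / cost_ns;
}

// Figures quoted in the text, converted to nanoseconds.
constexpr std::uint64_t kBudgetNs      = 100000;  // 100 microseconds
constexpr std::uint64_t kDefaultHeapNs = 120;     // default heap, alloc + free
constexpr std::uint64_t kSmallAllocNs  = 20;      // small object allocator

static_assert(allocations_within_budget(kBudgetNs, kDefaultHeapNs) == 833, "");
static_assert(allocations_within_budget(kBudgetNs, kSmallAllocNs) == 5000, "");

// Rough average cost, in nanoseconds, of one allocation plus its free.
// Allocates `count` blocks of `bytes` bytes, then frees them all, so it
// measures a batch pattern rather than tightly interleaved alloc/free.
double measure_alloc_free_ns(std::size_t count, std::size_t bytes) {
    if (count == 0) return 0.0;
    std::vector<void*> live(count, nullptr);

    const auto start = std::chrono::steady_clock::now();
    for (void*& p : live) p = ::operator new(bytes);
    for (void* p : live) ::operator delete(p);
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(count);
}
```

Calling something like `measure_alloc_free_ns(100000, 32)` and feeding the result into `allocations_within_budget` gives a number for your own heap; expect it to vary with allocator, block size, and thread contention.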

On the value of consistent API design

Raymond Chen walks through a leak in “We’re using a smart pointer, so we can’t possibly be the source of the leak”. The most immediate cause is a subtle misuse of CComPtr: assigning with operator= performs an AddRef on a return value that has already been AddRef’d, leaving one AddRef too many.
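The double AddRef is easy to reproduce without any COM at all. The sketch below is not ATL: `Widget`, `CreateWidget`, and `MiniComPtr` are hypothetical stand-ins that imitate only the reference-counting behavior described above, with assignment from a raw pointer taking its own reference and `Attach` adopting the caller's.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object: only the reference counting.
struct Widget {
    static int live;   // Widgets created and not yet destroyed
    int refs = 1;      // like COM, a new object starts with one reference

    Widget() { ++live; }
    ~Widget() { --live; }

    void AddRef() { ++refs; }
    void Release() {
        assert(refs > 0);
        if (--refs == 0) delete this;
    }
};
int Widget::live = 0;

// Returns a pointer that has ALREADY been AddRef'd on the caller's behalf.
Widget* CreateWidget() { return new Widget; }

// Minimal imitation of the two smart pointer operations that matter here.
template <typename T>
class MiniComPtr {
public:
    MiniComPtr() = default;
    MiniComPtr(const MiniComPtr&) = delete;
    MiniComPtr& operator=(const MiniComPtr&) = delete;
    ~MiniComPtr() { if (p_) p_->Release(); }

    // Assignment from a raw pointer takes its own reference.
    MiniComPtr& operator=(T* p) {
        if (p) p->AddRef();
        if (p_) p_->Release();
        p_ = p;
        return *this;
    }

    // Attach adopts the caller's reference without another AddRef.
    void Attach(T* p) {
        if (p_) p_->Release();
        p_ = p;
    }

private:
    T* p_ = nullptr;
};

// Returns how many Widgets outlive the smart pointer when operator= is used.
int leaked_with_assignment() {
    const int before = Widget::live;
    Widget* raw = CreateWidget();   // refs == 1
    {
        MiniComPtr<Widget> w;
        w = raw;                    // refs 1 -> 2: one AddRef too many
    }                               // destructor: refs 2 -> 1, never hits 0
    const int leaked = Widget::live - before;
    raw->Release();                 // tidy up so the demo itself does not leak
    return leaked;
}

// Same scenario using Attach, which leaves the count at 1.
int leaked_with_attach() {
    const int before = Widget::live;
    {
        MiniComPtr<Widget> w;
        w.Attach(CreateWidget());   // refs stays 1
    }                               // destructor: refs 1 -> 0, object deleted
    return Widget::live - before;
}
```

With these stand-ins, `leaked_with_assignment()` reports one surviving object and `leaked_with_attach()` reports none. The smart pointer does exactly what it promises in both cases; the leak comes from the caller and the pointer each assuming the other owns the original reference.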

The less immediate failure was a poorly designed API.

Well, not poorly designed.  Unfortunately designed.
