A quick callout to an illustrative blog post on various processor cache effects:
http://igoro.com/archive/gallery-of-processor-cache-effects/
When someone complains that “X is too slow because it has to make a virtual call” and I get annoyed, it’s because a hot virtual call costs a dozen or so cycles. Missing the cache and going out to main memory? Hundreds of cycles or more. Don’t get me wrong, virtual calls can matter for many reasons, but all of that flies out the window the moment you’re working on a data set of any non-trivial size. If your objects are hundreds of bytes large, you don’t worry about the virtual calls; you worry about reordering their fields so the hot ones share cache lines and you squeeze more out of your caches.
“It better not allocate”
My other favorite perf quote from this month: “It better not allocate; this call needs to take 100 microseconds or less.” On my dev box, using the default Win7 heap, an uncontended small allocation plus its matching free takes 120 ns, or 0.12 microseconds. My favorite small-object allocator sustains as little as 0.020 microseconds (20 ns) per pair.
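That budget covers roughly 800 allocation/free pairs per call on the default heap (100 / 0.12 ≈ 833), and about 5,000 with the faster allocator (100 / 0.02). We could allocate hundreds of objects per call on the default heap, or thousands with a good small-object allocator, before allocation alone used up the budget.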