Concepts: Typeclasses for C++?

I’ve had a hypothesis for a while that C++ templates (paired at times with ADL) are an ad-hoc, unsound version of typeclasses. I’ve seen this hold for parser combinators, range base algorithms, and more. I’m also not the first to draw this comparison[1].

Concepts are supposed to bring soundness in through constrained templates. Concepts look awfully a lot like type classes; they export functions and types, and are parameterized, and act as constraints on generic functions and other concepts. I checked the draft specification [2], and it even seems to permit parameterizing concepts on type constructors, just like Haskell! (er, C++ calls them class templates, not type constructors. to-may-to, to-mah-to)

But I worried I may be mistaken about concepts, as I’ve searched through google and literature and have yet to find a single example in literature demonstrating the use of template template concepts.

In case you’re curious what this might look like, here’s an educated guess:

concept Monad<template <> class m> {
  template<typename T, typename U>
  m<U> mbind(m<T>, function<m<U>(T)>);
 
  template<class T>
  m<T> mreturn(T);
};
 
template <template <typename> class m,
          class T,
          class U,
          class Iter>
requires M<m>
requires InputIterator<Iter, U>
m<T> foldM(Iter begin, Iter end, T i, function<m<T>(T, U)> f)
{
    if(begin == end)
        return mreturn(i);
    else
        return mbind(f(i, *begin), [=](T result){
            Iter next = begin;
            ++next;
            return foldM(next, end, result, f);
        });
}

Have template template concepts been covered thoroughly somewhere, and I’ve just missed it?

Return-code vs. Exception handling

(Originally authored Feb 2008.  This is a revision of an older post from before I began blogging on the internet at large.  It’s been edited for style, not content)

This is one of those “religous wars” in programming language theory; return code handling (RCH) vs exception handling (EH).  Firstly, I am biased.  Secondly, I will try to remain objective as well, partitioning observations of code using these mechanism from how I feel this reflects on the utility of either mechanism.  I tend to _prefer_ EH in general, but not universally, and generally follow the viewpoint that “exceptions are for exceptional things”.  With that aside…

Raymond Chen wrote an article a while back on RCH vs EH[2]. There, he points out accurately some facts about code review of RCH and EH, concluding that it’s generally harder to conclude the failure-safety of some example the RCH code than some comparable EH code.

He’s right.  And he’s wrong.  His article is incomplete.  Lets start with defining the playing field.

Continue reading

CLOS circa 1987, Dynamic Dispatch in C++/C#

As reported by programming musings,

http://www.archive.org/details/DanielGB1987

Common Lisp Object Standard presentation, by Daniel G Bobrow.

Some reflections:

Fast Multiple Dispatch

"[for dynamic method resolution] A cache of 1000 entries is hit 98% of the time".  Assuming this is accurate, it explains how you could get by with the cached virtual dispatch lookup that the desktop Microsoft CLR runtime does – while not exactly what Bobrow means, 1000s of cache locations spread through the program can get a hit rate fast enough to compensate for the slow resolution path that is taken.

Fastpath Optimization

"What we learned from smalltalk is … something like 60%-70% of generic functions have exactly one method implemented".  I’ve been seeing discussions lately about fastpath optimization in dynamic languages.  The technique is to take advantage of the fact that not all dynamic call sites are very dynamic.  You can sometimes statically dispatch calls (assume various parameters are fixed-width-integers, assume others strings, etc) as long as you have recovery code that can detect the difference before any results are visible.

Tech talks haven’t really changed that much in 20 years; apparently we’ve already long since found the local optimum.  Code samples that fit on slides are always stacked; don’t solve real problems.  Diagrams are always awkward but we keep trying anyway.  Oh, and we didn’t compile the code before putting it in!  But don’t get me wrong, I think this talk was well done, and to be fair, he knew of the error (I assume back then, editing slides was a bit harder than PowerPoint).

Some CLR tie-ins

I think both of these features play a significant role in performance of CLR performance characteristics.

For those not familiar, if you step through a virtual call in MSIL on x86 or x64, you’ll see that each call site caches the address of the function that was last hit from that call site.  Consider the pseudo code:

void some_object::do_something(int x)
{
    //...
}
int main()
{
   some_interface* p = get_an_object();
   p->do_something(42);
}

In C++, typically some_object has a vtable for each interface, and each member function has an entry in this table.  Actual execution looks something like:

// ----  C++ style dispatch ----
struct some_interface
{
    struct vtable_layout
    {
        void (*do_something)(some_interface*, int);
    };
    vtable_layout* vtbl;
};
some_object::some_object()
{
    static vtable_layout vtbl = {
        &some_object::do_something;
    };
    this->vtbl = &vtbl;
}
void some_object::do_something(int x)
{
    //...
}
int main()
{
   some_interface* p = get_an_object();
   p->vtbl->do_something(p, 42);
}

Note that because the dispatch mechanism is tied to the vtbl, then only the first argument (or, rather, only one argument, according to some convention) can affect dispatch.

The CLR dispatches calls differently, more like:

 

// ---- CLR style dispatch ----
struct object
{
    void* type_identifier;
}
void default_do_something(object* p, int x)
{
    void (**memfun_do_something)(object*, int);

    // find address of "last_function_called" variable
    memfun_do_something = find_callsite();

    // fill in value using dynamic type of 'p'
    *memfun_do_something = 
        global_function_table
        .find_type(p->type_identifier)
        .find_member_function("do_something");

    // finally, call the actual member function
    return (*memfun_do_something)(p, x);
}
some_object::some_object
{
    this->type_identifier = some_object_id;
}
void some_object::do_something(int x)
{
    // thunking code -- in case wrong function called
    if (this->type_identifier != some_object_id)
        __tailcall(default_do_something(this, x));
    
    // ...
}
int main()
{
   some_interface* p = get_an_object();

   static void (*last_function_called)(object*, int) 
    = default_do_something;
   last_function_called(p, 42);
}

This is much more complicated – though keep in mind that the function ‘default_do_something’ can be shared for all defaults, using the call site as a hint.  Here, if the default is called or if the ‘last_function_called’ cache is wrong, we pay the hit of searching the type system for the method to call.  However, if the call is on the same object as the last call, then the cache will allow us to make a single indirection initially.  And because it is not using dispatch tables but rather a single type id, we can in general get away with having only a single type id per object instead of one vtbl pointer per interface implemented, saving space as well. 

There are still two indirections going on, but only one indirection that will force a stall the instruction pipeline – the type check can take advantage of out-of-order execution and branch prediction.

Related to CLOS, this scheme’s performance is no longer linked to single dispatch – any lookup scheme can be implemented in default_do_something, provided that a good cache hit rate can be achieved.

As well, when only one implementation exists for a virtual function, the cache hit rate will be 100% after the initial call.  For fast path, this means that you could potentially recompile main() to always call some_object::do_something, and rely on the type check to dispatch correctly if the situation changes.