Memoizer 3 – Lua

Continuing on the dynamic/scripting languages front, I’m squeezing some time on Lua in parallel with experimenting with Python.  From background reading and other research, it looks like Lua is strong but might not have as many libraries as Python does.  It seems to have a lot of popularity in the gaming world, and is also a smaller language, which may be good for my desires to tinker with languages and runtimes.

If you’d like to learn more, you can start with a visit Lua.org

Memoizer

Bear with me as this experiment was conducted entirely from the interactive shell, and formatted with codepad.org:

function memoize(f)
  -- create a new associative table to record results
  store = {}
  function anon(x)
    if store[x] then
      return store[x]
    else
      y = f(x)
      store[x] = y
      return y
    end
  end
  return anon
end

If you read the last entry on the topic, this should looks very familiar.  Lua also has associative containers, also supports nested functions, also supports function closures.  The syntax is a bit different, and Lua uses nil (which evaluates to false) to represent empty slots in an associative container — so ‘if A[B]’ roughly means ‘if the key B appears in table A’.

Otherwise, it behaves identically to the Python version I wrote previously (with the caveat that floating point or fixed sized integes are used by Lua instead of Python’s unbounded precision integers)

function fib(x)
  if x < 2 then
    return 1
  else
    return fib(x-1) + fib(x-2)
  end
end

fib = memoize(fib)

This code, like the Python example, also takes advantage of functions being first class objects — variables they are assigned to can be later re-assigned.  Also like the Python example, the call to ‘fib(x-1)’ is late bound, or dynamically bound.  It’s not until the function is invoked that the ‘fib’ function call is looked up — which allows us to easily inject memoization after the fact.

Functions as expressions

We can do one better though.  In Lua, like some functional languages such as Scheme and JavaScript, you can define anonymous functions inline, as part of an expression.  As a result, we can define and memoize a function in one go without a special-purpose syntax:

-- this:
function fib(x) ... end
fib = memoize(fib)

-- is the same as this:
fib = memoize(function(x) ... end)

The full demonstration with instrumentation is up on code pad — I encourage you take a look and try it yourself, or download the (very fast and small) Lua runtime from Lua.org

Lua ends up using tables for everything — which is not entirely unlike Python where attributes are mostly a special dictionary named __dict__, but the syntax and structure is simpler.  This can be good thing (smaller, more generic language) and bad thing (when everything is the same, it’s hard to see the classes, namespaces, and models through the tables).

I apologize if this code looks very rough — it is.  I started Lua today (no, really!), and mostly translated the Python version.  I’ll try a bit harder on the next Lua post.

Memoizer revisited – Python

A conversation with a coworker entertained the idea of giving a talk on dynamic typed programming language patterns to a group of C++ developers — not that we’re experts, but as part of my team’s initiative for continued education for our engineers, individuals go out and research different ideas, techniques, technologies and present the results to the group so we can, in theory, learn more than each of us could learn in isolation.

Of course, if I want to have any chance of saying anything intelligent, I would need to have more working experience than ‘having tinkered with Perl 5 at one point’. 

So, I’m learning Python.

Starting at the beginning

The first thing I did is go out and download IronPython and CPython, and install both.  Start up the runtime and:

>>>

Ooo, interactive console.  I remember this from Matlab and Maple in college.  Okay, okay, my experience is out of date, but I’m learning.  I wasted like a week on one C++ project last fighting with the limitations of a hand-rolled interactive shell.  Having one out of the box is already a major win, and I’m forgetting how I lived without one for so long. 

After half an hour with O’Reilly’s Python pocket reference, I had our friendly naive Fibonacci sequence generator:

>>> def fib(x):
...   if x < 2:
...     return 1
...   else:
...     return fib(x-1) + fib(x-2)
...
>>>

And we can invoke it with a range of integers 0 thru 9 with the built in map() function and range() functions

>>> map( fib, range(10) )
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
>>>

To the code

So, moving to a source file "memoize.py", memoize becomes some special magic.  This is short and sweet, and I like it even better than when I tinkered with JavaScript closures:

 

def memoize(f):
  "single argument memoization for results of 'f'"
  store = {}
  def anon(x):
    if store.has_key(x):
      return store[x]
    else:
      y = f(x)
      store[x] = y
      return y
  return anon

This is hardly complete, but works to illustrate.  How this works: when memoize is invoked with a function, ‘f’, it sets up a new hash table named ‘store’.  It then on the fly defines a new function ‘anon’ that uses both the original function ‘f’ and the new hash table.  This not-anonymous function is then returned from memoize.

You can now replace fib with the statement

fib = memoize(fib)

This new function object, returned from the call, is assigned to the ‘fib’ variable.  This function object will in turn sometimes invokes the original function, which calls ‘fib’, which now points to the memoizer function object, and not the original function.  Function names are just variables, and can be assigned.

As far as I know, there is no syntax for a function to call itself — it can only call through a variable, and that variable can be reassigned if you choose to.

Decorators

I would be done, but before finishing up this article, I luckily stumbled across decorator syntax — which is exactly this, a shorthand for saying ‘f = g(f)’.  So after defining memoize, we can skip a set and define ‘fib’ pre memoized:

@memoize
def fib(x):
  "computes fibonacci numbers"
  if x < 2:
    return 1
  else:
    return fib(x-1) + fib(x-2)

Paste both of these into "memoize.py", and at the interactive console:

>>> from memoize import fib
>>> fib(100)
573147844013817084101L
>>>

And that’s it.  In all, less than half the size of the C++ example. 

 

And don’t forget to try out the interative console at http://play1.pypy.org/console/.  The code samples above work, and on my machine (Asus eee 901, Win 7, Opera 9.6), it takes less  than a second to compute fib(800).

EDIT ADD: After this posting, I came across documentation for a decorator module in python.  This module apparently provides a set of helper functions for defining decorators.  Among the examples listed is a function named ‘memoize’ that is almost line-for-line identical with my definition above, except that it supports multiple arguments, and uses diifferent dictionary lookup.

In practice, you will want to scope the lifetime of the dictionary — one way to do this is to only use the memoize decorator on nested functions which have definite or bounded scope.

Static typing where possible, dynamic typing when needed

 

From the discussion paper by authors Erik Meijer and Peter Drayton, as reported on Lambda the Ultimate:

Unfortunately there is a discontinuity between contemporary statically typed and dynamically typed languages as well as a huge technical and cultural gap between the respective language communities.

The paper goes on to list 8 different static or dynamic language behaviors that programmers rely on, and in most cases this critical feature of the language has little or nothing to do directly with the static/dynamic choice.

Favorites are "I want contracts" from the static typing camp, and "I want higher order functions, serialization, and code literals" from the dynamic camp.  I think ‘GetHashCode’ from Java and C#/CLR show how static typing can fail at contracts, while C++ TR1 function and CLR delegates show how most higher order functions can work in static languages (although recursively defined function types can still be a problem).

In (seemly) unrelated research I’m doing for my parser writing hobby, these two authors came up again in another blog, Monadic Parser Combinators using C# 3.0, and also looks to be of some interest, but not  directly on programming language front.

The paper is also available on Meijer’s Microsoft research page, at http://research.microsoft.com/~emeijer/Papers/RDL04Meijer.pdf

PDC 2008 picks

Some of my favorite picks from PDC 2008:

  • IronRuby: the Right Language for the Right job – I didn’t realize how well developed the DLR and dynamic languages on the CLR really were.  And on that…
  • Deep Dive: Dynamic Languages in Microsoft .Net – it’s a dynamic language infrastructure that, out of the box, looks like it’s going to have C#, Python, Ruby, COM IDispatch, and looks to be easy to extend in existing .Net languages.  I’ve been critical of many portions of the CLR, but I have to admit that it has proved successful at bringing languages together.
  • "Oslo: building textual DSLs" – The rest of Oslo, not really in my interest areas, but the parser generator is certainly fun.  It still confounds me that modern languages and runtimes come with so many features, but most are missing good general purpose parser libraries.  Boost.Spirit for C++ is a notable exception.
  • Parallel Programming for C++ Developers… – C++ is needing a facelift in the world of concurrent software.  C++0x is providing basic cross-platform concurrency primitives, and Microsoft VC++ looks to be providing easier to work with high-level libraries, as well as the PDC announced support for interoperating with Intel Threading Blocks, with extensions support available for your existing home-grown concurrency libraries.  This should be interesting.

There’s a whole treasure trove of PDC talks available up on channel 9, check it out.

Memoizer, in C++

Part of what I’m trying to do here is record experiments, little bits of code demonstrating software techniques.  This is a revisit of a recent one.

As a refresher, memoization is a dynamic programming technique used to optimize the time to solve problems where partial problems overlap frequently.  For example, in the recursive definition of the Fibonacci sequence, F(x) = { 1 : x < 2,  F(x-1)+F(x-2) : x>=2 }, at each stage, you compute the solution by taking the results from two separate computations.

So, walking briefly through, get the C++ bootstrapping out of the way.  To start, we include the standard module’s we’ll be using:

#include <iostream>
#include <iomanip>
#include <map>

Next up, the memoizer — this is a helper class that will record the results for us. 

template <
    class Result, 
    class Arg,
    class ResultStore = std::map<Arg, Result>
>
class memoizer1{
public:
    // accelerate 'f' by remembering partial results
    template <class F>
    const Result& operator()(F f, const Arg& a){
        typename ResultStore::const_iterator it = memo_.find(a);
        if(it == memo_.end()) {
            it = memo_.insert(make_pair(a, f(a))).first;
        }
        return it->second;
    }
private:
    ResultStore memo_;
};

We’ll separate the definition of fibonacci function from the public surface so that the memoizer can be applied later

int fib(int);

namespace {
    // instrumentation, track number of uses
    int total_ops = 0;

    // implementation
    int fib_(int n){
        ++total_ops;
        if(n == 0 || n == 1) 
            return 1;
        else
            return fib(n-1) + fib(n-2);
    }
}

Lastly the driver function will decorate the implementation with a memoizer, and a prompt loop.  You should be able to copy-paste the code into a cpp file, comple, then run.

// driver function
int fib(int n) {
    static memoizer1<int,int> memo;
    return memo(fib_, n);
}

using namespace std;

int main(){
    cout << "fibonacci numbers, memoized, ctrl+z to quitn";
    for(;;) {
        cout << "enter integer: ";
        int x;
        if(!(cin >> x))
            break;

        total_ops = 0;
        int y = fib(x);

        cout << "fib(" << x << ") == " << y << endl;
        cout << "cost was " << total_ops << endl;
    }
}

And that’s the complete program.  Trying it with a few inputs, you’ll see few number of operations.  To see the non-memoized results, you need only comment out the code for ‘fib()’ and have it invoke ‘fib_’ directly — the results will slow to a halt around x=30 to 35.

By using a look up table, each time ‘fib’ is invoked with a given value of ‘x’, the result is stored so it doesn’t need to recurse again.  A hash table, or even a dynamically sizing array (std::vector) could have also been used in this case, but may not be appropriate for other problem spaces. 

‘Hello World’

Hello Blog

I’ve been experimenting for a while with blogging-as-note-taking for a while now, at work, internally.  And it’s great.  On multiple occasions, I found myself citing the blog to co-workers rather than re-explaining a topic or re-writing mail.  It’s like a really long clipboard history, except that I put some more effort in sorting out my thoughts.  Oddly, more effort than in email.  Perhaps I should proofread my email better.  Except I write more email messages in a day than I blog in months.

But about that internal thing, it was just easier — we have sharepoint, blogging built in and ready to go, etc.  However, I don’t spend any time blogging on confidential topics, and discoverability is higher on the web, so taking it external makes sense.

Topics

As I imagine most software engineers do, I write little code samples, self contained apps, to try out various ideas.  Pasting the result into a blog, with notes on the how, why, and whatnot, is a useful reference, or diary of ideas.  Maybe I’ll get useful feedback from peers, maybe not, but it puts the idea out there.

As well, it’s a bin for storing interesting and relevant links to other blogs, presentations, and research I find.  I can put a searchable name on the bucket of links to find it later, and some thoughts or reactions.  “Hmm, what did I really think the first time I saw that”.

Disclaimer?

This is not a work blog; I am not here to represent in an official capacity either my present employer, nor any past or future employer.  I’m a software engineer professionally, but this is also my hobby.  I’m here to experiment and to learn, and plan to use “educational value” as I see fit as the primary metric by which to moderate.

Other standard disclaimers apply, expressed and not.  See Raymond’s Comment Policy for some good examples.

Last but not least, thanks for visiting!