<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dysfunctional Programming</title>
	<atom:link href="http://ra3s.com/wordpress/dysfunctional-programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://ra3s.com/wordpress/dysfunctional-programming</link>
	<description>(λ (a b) a) vs (λ (a b) b)</description>
	<lastBuildDate>Fri, 02 Mar 2012 08:56:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Some parser combinators for Python</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/some-parser-combinators-for-python/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/some-parser-combinators-for-python/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 08:56:54 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=396</guid>
		<description><![CDATA[I&#8217;ve got two parser combinators today for you to play with, both whipped up this evening from pieces of earlier experiments. Parser 5: PEG grammar without memoization This is loosely based on Daan Leijen and Erik Meijer&#8217;s 2001 paper [1]. &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/some-parser-combinators-for-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve got two parser combinators today for you to play with, both whipped up this evening from pieces of earlier experiments.</p>
<p><span id="more-396"></span></p>
<p><strong>Parser 5: PEG grammar without memoization</strong></p>
<p>This is loosely based on Daan Leijen and Erik Meijer&#8217;s 2001 paper [1]. I say loosely because it lacks all the important elements demonstrated in the paper (efficiency, useful error messages, etc.), but it is a monadic parser.</p>
<p>(Why do you want a monadic parser? <a href="http://www.valuedlessons.com/2008/04/you-could-have-invented-monadic-parsing.html">This author explains better than I</a>) </p>
<p>Here, the type of a parser is &#8216;string -&gt; maybe (a, string)&#8217;, where &#8216;a&#8217; is your parse tree, and the result string is the remaining input. If the parse fails, a falsy value (False in the code below) is returned instead of a tuple.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># simplified monadic (PEG) parser. no memoization, some backtracking.</span>
<span style="color: #808080; font-style: italic;">#  parser :: str -&gt; maybe (value, str)</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> ret <span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff7700;font-weight:bold;">lambda</span> s: <span style="color: black;">&#40;</span>value, s<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> bind <span style="color: black;">&#40;</span>p, <span style="color: #66cc66;">*</span>fs<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        res = p<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> f <span style="color: #ff7700;font-weight:bold;">in</span> fs:
            res = res <span style="color: #ff7700;font-weight:bold;">and</span> f<span style="color: black;">&#40;</span>res<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#40;</span>res<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> res
    <span style="color: #ff7700;font-weight:bold;">return</span> parse</pre></td></tr></table></div>

<p>These are the monad operators return and bind. &#8216;Ret(v)&#8217; produces a parser that consumes no input and produces the parse tree &#8216;v&#8217;, but the juice is in &#8216;bind&#8217;.</p>
<p>&#8216;Bind(p, f)&#8217; glues together a parser and a function. A parser &#8216;p&#8217; consumes some input and produces a parse tree &#8216;v&#8217;. This value &#8216;v&#8217; is then passed to a function &#8216;f&#8217;, returning a <em>new parser</em> to consume the remaining input. That is, &#8216;f&#8217; chooses, based on the parse tree so far, which language to use to interpret the rest of the input. </p>
<p>This is extremely powerful: it lets monadic parsers recognize classes of context-sensitive grammars, such as parsing XML, loading new languages on the fly as you parse (Perl6, anyone?), or perhaps backtracking to recover if an evaluation of the parse tree fails (<a href="http://www.perlmonks.org/?node_id=663393">Perl5, anyone?</a>).</p>
<p>It can also inhibit various parser optimizations if you&#8217;re not careful.</p>
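<p>To make the idea concrete, here is a rough sketch of &#8216;bind&#8217; in use. It repeats &#8216;ret&#8217; and &#8216;bind&#8217; from above and borrows &#8216;pred&#8217; from later in this post; &#8216;take&#8217;, &#8216;pair&#8217; and &#8216;counted&#8217; are names invented for the example and are not part of the library. The second parser is a small context-sensitive case: the digit parsed first decides how much input the next parser consumes.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"># Sketch only. ret, bind and pred follow the listings in this post;
# take, pair and counted are made-up names for illustration.

def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            res = res and f(res[0])(res[1])
        return res
    return parse

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return (s[0], s[1:])
        return False
    return parse

def take(n):
    # Hypothetical primitive: consume exactly n characters, or fail.
    def parse(s):
        if len(s[:n]) == n:
            return (s[:n], s[n:])
        return False
    return parse

digit = pred(str.isdigit)
letter = pred(str.isalpha)

# Plain sequencing: a digit followed by a letter, returned as a tuple.
pair = bind(digit, lambda d: bind(letter, lambda c: ret((d, c))))

# Context sensitivity: the digit parsed first decides how much input
# the next parser consumes.
counted = bind(digit, lambda d: take(int(d)))

print(pair("7x!"))         # (('7', 'x'), '!')
print(counted("3abcde"))   # ('abc', 'de')
print(counted("5ab"))      # False
</pre></td></tr></table></div>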

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;">never = <span style="color: #ff7700;font-weight:bold;">lambda</span> s: <span style="color: #008000;">False</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> alt <span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> p <span style="color: #ff7700;font-weight:bold;">in</span> ps:
            res = p<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> res: <span style="color: #ff7700;font-weight:bold;">return</span> res
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">False</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> parse</pre></td></tr></table></div>

<p>&#8216;never&#8217; is the MonadZero value, a sort of additive identity for the &#8216;alt&#8217; <a href="http://www.haskell.org/haskellwiki/MonadPlus">MonadPlus</a> operator. &#8216;alt&#8217; produces a parser that will recognize any of the languages passed to it. &#8216;never&#8217; recognizes nothing, so e.g. &#8216;alt(p, never)&#8217; is equivalent to &#8216;p&#8217; in the same way as &#8216;bind(p, ret)&#8217; is equivalent to &#8216;p&#8217;.</p>
<p>You can ignore these equivalences for now. They become useful when it comes time to optimize grammars, but that&#8217;s not on today&#8217;s agenda.</p>
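<p>If you would like to see the two identities hold, the sketch below spot-checks them on a few inputs. It is a sanity check rather than a proof. &#8216;pred&#8217; is borrowed from the next listing, and the sample parser &#8216;p&#8217; is a made-up example.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"># Sketch only: spot-checks the two identities on a few sample inputs.
# The combinators follow the listings in this post; p is a made-up example.

def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            res = res and f(res[0])(res[1])
        return res
    return parse

never = lambda s: False

def alt(*ps):
    def parse(s):
        for p in ps:
            res = p(s)
            if res:
                return res
        return False
    return parse

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return (s[0], s[1:])
        return False
    return parse

p = pred(str.isalpha)

for s in ["abc", "1bc", ""]:
    assert alt(p, never)(s) == p(s)   # never is an identity for alt
    assert bind(p, ret)(s) == p(s)    # ret is an identity for bind

print("identities hold on the samples")
</pre></td></tr></table></div>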
<p>From here on out, we have a handful of various other operators:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;">empty = ret<span style="color: black;">&#40;</span><span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> then <span style="color: black;">&#40;</span>p, <span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>p, <span style="color: #ff7700;font-weight:bold;">lambda</span> x: bind<span style="color: black;">&#40;</span>then<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>, <span style="color: #ff7700;font-weight:bold;">lambda</span> rest: ret<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span>+rest<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> ps <span style="color: #ff7700;font-weight:bold;">else</span> ret<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> repeat <span style="color: black;">&#40;</span>p, res=<span style="color: black;">&#91;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> alt<span style="color: black;">&#40;</span> bind<span style="color: black;">&#40;</span>p, <span style="color: #ff7700;font-weight:bold;">lambda</span> v: repeat<span style="color: black;">&#40;</span>p, res+<span style="color: black;">&#91;</span>v<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>,
                ret<span style="color: black;">&#40;</span>res<span style="color: black;">&#41;</span> <span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">def</span> repeat1 <span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>p, <span style="color: #ff7700;font-weight:bold;">lambda</span> v: repeat<span style="color: black;">&#40;</span>p, <span style="color: black;">&#91;</span>v<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> pred <span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> s <span style="color: #ff7700;font-weight:bold;">and</span> f<span style="color: black;">&#40;</span>s<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span>s<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, s<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>:<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">else</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">False</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> parse
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> option <span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> alt<span style="color: black;">&#40;</span>p, empty<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> repn <span style="color: black;">&#40;</span>p, n<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> then<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>p<span style="color: black;">&#93;</span><span style="color: #66cc66;">*</span>n<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> char <span style="color: black;">&#40;</span>ch<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> pred<span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> c: c==ch<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> literal <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> then<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span><span style="color: black;">&#40;</span>char<span style="color: black;">&#40;</span>c<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> c <span style="color: #ff7700;font-weight:bold;">in</span> s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> delay <span style="color: black;">&#40;</span>fn<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>empty, <span style="color: #ff7700;font-weight:bold;">lambda</span> _:fn<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
end = <span style="color: #ff7700;font-weight:bold;">lambda</span> s: <span style="color: #008000;">False</span> <span style="color: #ff7700;font-weight:bold;">if</span> s <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, s<span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>&#8216;Empty&#8217; recognizes the empty string, &#8216;then&#8217; chains parsers in sequence, &#8216;repeat&#8217; is Kleene star, &#8216;repeat1&#8217; likewise Kleene plus. &#8216;option&#8217; recognizes one or zero of a language, while &#8216;repn(p,n)&#8217; recognizes exactly n occurrences of p.</p>
<p>&#8216;Pred(fn)&#8217; consumes one character &#8216;c&#8217; for which &#8216;fn(c)&#8217; is true (e.g. using str.isalpha). &#8216;char(c)&#8217; recognizes the character &#8216;c&#8217;, and &#8216;literal(s)&#8217; the sequence of characters in the string &#8216;s&#8217;. &#8216;delay(lambda:p)&#8217; recognizes &#8216;p&#8217; &#8212; and is merely an artifact of strict evaluation, as &#8216;p&#8217; might not yet be defined. Finally, &#8216;end&#8217; recognizes only the end of the input string; not very useful in today&#8217;s example, but it can be handy in parsers that perform lookahead.</p>
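<p>Here is a rough sketch of a few of these operators working together. The combinators are transcribed from the listings above so that the block runs on its own; &#8216;word&#8217;, &#8216;greeting&#8217; and &#8216;parens&#8217; are names made up for the example. The second half shows what &#8216;delay&#8217; buys you in a recursive grammar.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"># Sketch only. The combinators are transcribed from the listings above;
# word, greeting and parens are made-up names for illustration.

def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            res = res and f(res[0])(res[1])
        return res
    return parse

def alt(*ps):
    def parse(s):
        for p in ps:
            res = p(s)
            if res:
                return res
        return False
    return parse

empty = ret(None)

def then(p, *ps):
    return bind(p, lambda x: bind(then(*ps), lambda rest: ret([x] + rest)) if ps else ret([x]))

def repeat(p, res=[]):
    return alt(bind(p, lambda v: repeat(p, res + [v])), ret(res))

def repeat1(p):
    return bind(p, lambda v: repeat(p, [v]))

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return (s[0], s[1:])
        return False
    return parse

def char(ch):
    return pred(lambda c: c == ch)

def literal(s):
    return then(*(char(c) for c in s))

def delay(fn):
    return bind(empty, lambda _: fn())

# A word is one or more letters; a greeting is "hi", a space, then a word.
word = repeat1(pred(str.isalpha))
greeting = then(literal("hi"), char(" "), word)
print(greeting("hi bob!"))   # ([['h', 'i'], ' ', ['b', 'o', 'b']], '!')

# A recursive grammar: balanced parentheses. delay postpones the recursive
# call until parse time, so building the parser terminates.
def parens():
    return alt(then(char("("), delay(parens), char(")")), empty)

print(parens()("(())x"))     # (['(', ['(', None, ')'], ')'], 'x')
</pre></td></tr></table></div>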
<p>You may not have noticed as it whizzed by, but this parser doesn&#8217;t perform full backtracking. The &#8216;alt&#8217; operator attempts to parse using the first language, eagerly, and if it succeeds, returns immediately. If a subsequent parse operation fails, no backtracking is performed to consider the alternate path. As such, this parser combinator library might be classified as a <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammar</a>. Contrast this with <a href="http://en.wikipedia.org/wiki/Context-free_grammar">context free grammars</a>, where both sides of the alternation are considered equal for backtracking / lookahead purposes.</p>
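<p>A small sketch of that ordered-choice behaviour, using the combinators from the listings above. The names &#8216;short_first&#8217; and &#8216;long_first&#8217; are invented for the example. Because &#8216;alt&#8217; commits to the first alternative that succeeds, the order of the alternatives changes the result.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"># Sketch only. The combinators follow the listings above;
# short_first and long_first are made-up names for illustration.

def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            res = res and f(res[0])(res[1])
        return res
    return parse

def alt(*ps):
    def parse(s):
        for p in ps:
            res = p(s)
            if res:
                return res
        return False
    return parse

def then(p, *ps):
    return bind(p, lambda x: bind(then(*ps), lambda rest: ret([x] + rest)) if ps else ret([x]))

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return (s[0], s[1:])
        return False
    return parse

def char(ch):
    return pred(lambda c: c == ch)

def literal(s):
    return then(*(char(c) for c in s))

end = lambda s: False if s else (None, s)

# alt commits to "a", leaving "b" unconsumed, so end fails and the
# "ab" alternative is never tried.
short_first = then(alt(literal("a"), literal("ab")), end)

# Putting the longer alternative first makes the same input parse.
long_first = then(alt(literal("ab"), literal("a")), end)

print(short_first("ab"))   # False
print(long_first("ab"))    # ([['a', 'b'], None], '')
</pre></td></tr></table></div>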
<p>Size: ~60 lines</p>
<p><strong>Parser 7: CSG with partial memoization</strong></p>
<p>This next parser adds a couple of tweaks. First, the type of the parser has changed to &#8216;string -&gt; lcons of (a, string)&#8217;, where lcons is a kind of lazy list.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> lcons <span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span> <span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">iter</span><span style="color: black;">&#41;</span>: <span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span> = <span style="color: #008000;">iter</span><span style="color: #66cc66;">;</span> <span style="color: #008000;">self</span>.<span style="color: black;">value</span> = <span style="color: #008000;">None</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> force <span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span>:
            <span style="color: #ff7700;font-weight:bold;">try</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">value</span> = <span style="color: black;">&#40;</span>next<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span><span style="color: black;">&#41;</span>, lcons<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">StopIteration</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">value</span> = <span style="color: #008000;">None</span>
            <span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span> = <span style="color: #008000;">None</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">value</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> empty<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">force</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> == <span style="color: #008000;">None</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> head <span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">force</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> tail <span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">force</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__iter__</span> <span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">self</span>.<span style="color: black;">empty</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">yield</span> <span style="color: #008000;">self</span>.<span style="color: black;">head</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span> = <span style="color: #008000;">self</span>.<span style="color: black;">tail</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> memoize<span style="color: black;">&#40;</span>fn, _memo=<span style="color: #008000;">dict</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> replc <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        k = <span style="color: black;">&#40;</span>fn, s<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> k <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #ff7700;font-weight:bold;">in</span> _memo:
            _memo<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = lcons<span style="color: black;">&#40;</span><span style="color: #008000;">iter</span><span style="color: black;">&#40;</span>fn<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> _memo<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> replc</pre></td></tr></table></div>

<p>The only role of &#8216;lcons&#8217; is to ensure that an iteration is computed only once; otherwise, we&#8217;ll be using Python iterators. Speaking of memoization, we&#8217;ll also memoize our monadic parsers so that we don&#8217;t recompute them on backtrack. This &#8230; will consume quite a bit of memory, but will reduce the backtracking costs in some key places. You can disable this with otherwise no change in behavior.</p>
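<p>A quick sketch of the &#8220;computed only once&#8221; property. The class is lightly adapted from the listing above so that it runs on Python 3: it calls the built-in next() and walks a local variable instead of rebinding self. &#8216;numbers&#8217; and &#8216;calls&#8217; are made up for the demonstration.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"># Sketch only: lcons adapted from the listing above; numbers and calls
# are made-up names used to observe when the generator actually runs.

class lcons(object):
    def __init__(self, it):
        self.iter = it
        self.value = None

    def force(self):
        # Pull at most one item from the iterator, then remember the result.
        if self.iter is not None:
            try:
                self.value = (next(self.iter), lcons(self.iter))
            except StopIteration:
                self.value = None
            self.iter = None
        return self.value

    def empty(self):
        return self.force() is None

    def head(self):
        return self.force()[0]

    def tail(self):
        return self.force()[1]

    def __iter__(self):
        node = self
        while not node.empty():
            yield node.head()
            node = node.tail()

calls = []

def numbers():
    for n in (1, 2, 3):
        calls.append(n)   # record each time the generator does real work
        yield n

cell = lcons(numbers())
first = list(cell)    # drives the generator
second = list(cell)   # replays the cached cells; the generator is not re-run

print(first, second, calls)   # [1, 2, 3] [1, 2, 3] [1, 2, 3]
</pre></td></tr></table></div>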

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> ret <span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff7700;font-weight:bold;">lambda</span> s: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span>value, s<span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> bind <span style="color: black;">&#40;</span>p, f, <span style="color: #66cc66;">*</span>fs<span style="color: black;">&#41;</span>:
    @memoize
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> res1 <span style="color: #ff7700;font-weight:bold;">in</span> p<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">for</span> res2 <span style="color: #ff7700;font-weight:bold;">in</span> f<span style="color: black;">&#40;</span>res1<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#40;</span>res1<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">yield</span> res2
    <span style="color: #ff7700;font-weight:bold;">if</span> fs: <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>parse, <span style="color: #66cc66;">*</span>fs<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> parse
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> alt <span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>:
    @memoize
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> p <span style="color: #ff7700;font-weight:bold;">in</span> ps:
            <span style="color: #ff7700;font-weight:bold;">for</span> res <span style="color: #ff7700;font-weight:bold;">in</span> p<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">yield</span> res
    <span style="color: #ff7700;font-weight:bold;">return</span> parse
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> then <span style="color: black;">&#40;</span>p, <span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>:
    more = then<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> ps <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: #008000;">None</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>p, <span style="color: #ff7700;font-weight:bold;">lambda</span> x: bind<span style="color: black;">&#40;</span>more, <span style="color: #ff7700;font-weight:bold;">lambda</span> rest: ret<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span>+rest<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> ps <span style="color: #ff7700;font-weight:bold;">else</span> ret<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>The monad parser combinators have received a facelift. &#8216;bind(p,f)&#8217; now evaluates each possible parse of &#8216;p&#8217;, passing the result to &#8216;f&#8217; and then enumerating the results from that parser, lazily yielding them to be memoized by &#8216;lcons&#8217;.</p>
<p>&#8216;Alt&#8217; has similarly been updated &#8212; each possible parse result from each parser is yielded in turn.</p>
<p>&#8216;Then&#8217; gets a minor performance improvement. As we&#8217;re memoizing parse results, we get a boost by reusing the same tail parser each time.</p>
<p>&#8216;Ret&#8217;, &#8216;never&#8217;, &#8216;end&#8217;, &#8216;char&#8217;, and &#8216;pred&#8217; are each updated to return lists of results, but are otherwise unchanged. All the other parser combinators remain as before.</p>
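<p>The updated primitives are not listed in this post, so the following is only a guess at their shape, assuming a failed parse becomes an empty list and a successful one becomes a one-element list. The full source linked at the end of the post is the authority.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"># Sketch only: one plausible list-returning version of the primitives.
# Assumption: failure is an empty list, success is a one-element list.

def ret(value):
    return lambda s: [(value, s)]

never = lambda s: []

end = lambda s: [] if s else [(None, s)]

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return [(s[0], s[1:])]
        return []
    return parse

def char(ch):
    # Built on pred, so it returns a list of results as well.
    return pred(lambda c: c == ch)

print(pred(str.isdigit)("42"))   # [('4', '2')]
print(char("a")("b"))            # []
print(never("x"))                # []
print(end(""))                   # [(None, '')]
</pre></td></tr></table></div>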
<p>As a result of these changes, parser7 now supports nearly the class of <a href="http://en.wikipedia.org/wiki/Context-free_grammar">context free grammars</a> (CFG), save its inability to handle left recursion (perhaps the techniques in [2] could fix that). It can even handle ambiguous grammars, returning a parse forest instead of a parse tree. Of course, nothing is free &#8212; parser7 is slower than parser5, and most useful CFGs can be rewritten as PEGs with some effort.</p>
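<p>To illustrate the parse forest, here is a stripped-down sketch of the list-of-results idea. Memoization is left out for brevity, &#8216;bind&#8217; takes a single function, and &#8216;literal&#8217; is a simplified stand-in that matches a whole string at once, so this is an approximation of parser7 rather than the library itself. &#8216;piece&#8217; and &#8216;two&#8217; are made-up names.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"># Sketch only: list-of-results combinators without memoization.
# literal is a simplified stand-in; piece and two are made-up names.

def ret(value):
    return lambda s: [(value, s)]

def bind(p, f):
    def parse(s):
        # Try every parse of p, and every parse of the follow-up parser.
        for value, rest in p(s):
            for res in f(value)(rest):
                yield res
    return parse

def alt(*ps):
    def parse(s):
        # Yield the results of every alternative, not just the first.
        for p in ps:
            for res in p(s):
                yield res
    return parse

def literal(text):
    return lambda s: [(text, s[len(text):])] if s.startswith(text) else []

end = lambda s: [] if s else [(None, s)]

# An ambiguous grammar: two pieces, each either "a" or "aa", then end of input.
piece = alt(literal("a"), literal("aa"))
two = bind(piece, lambda x:
      bind(piece, lambda y:
      bind(end, lambda _: ret([x, y]))))

# "aaa" splits two ways, so two parses come back.
print(list(two("aaa")))   # [(['a', 'aa'], ''), (['aa', 'a'], '')]
</pre></td></tr></table></div>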
<p>Above and beyond CFGs, this parser continues to provide monadic bind, so it can continue to parse a number of useful languages from the class of <a href="http://en.wikipedia.org/wiki/Context-sensitive_language">context sensitive languages</a>. For example, the mini-xml parser in the samples below runs great under both libraries.</p>
<p>It&#8217;s a memory hog, but it&#8217;s also ~90 lines of vanilla Python.</p>
<p><strong>Full source, samples, references</strong></p>
<p>I&#8217;ve put the parser combinators and some samples up on codepad.org so you can get a feel for what the output looks like. It&#8217;s not pretty, but it&#8217;s pretty well structured:</p>
<ul>
<li>parser5: <a href="http://codepad.org/P5l2l6dm">http://codepad.org/P5l2l6dm</a></li>
<li>parser7: <a href="http://codepad.org/pmpqp1wI">http://codepad.org/pmpqp1wI</a></li>
</ul>
<p>[1] <a href="http://research.microsoft.com/en-us/um/people/daan/download/papers/parsec-paper.pdf">Parsec: Direct Style Monadic Parser Combinators For The Real World</a>. Daan Leijen, Erik Meijer (2001).</p>
<p>[2] <a href="http://www.vpri.org/pdf/tr2007002_packrat.pdf">Packrat Parsers Can Support Left Recursion</a>. Alessandro Warth, James R. Douglass, Todd Millstein (2007).</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/some-parser-combinators-for-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Schrodinger&#8217;s Yacc</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/schrodingers-yacc/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/schrodingers-yacc/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 17:05:51 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[functional]]></category>
		<category><![CDATA[parsers]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=373</guid>
		<description><![CDATA[There was a small controversy last year about parser combinators, a convenient way of rapidly developing parsers in a functional style. Yacc is presumably chosen as the archetypal non-combinator parser generator, requiring a separate external parser compiler, known for being a &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/schrodingers-yacc/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There was a small controversy last year about <a href="http://en.wikipedia.org/wiki/Parser_combinator">parser combinators</a>, a convenient way of rapidly developing parsers in a functional style. Yacc is presumably chosen as the archetypal non-combinator parser generator, requiring a separate external parser compiler, known for being a pain to use.</p>
<ul>
<li>&#8220;<a href="http://arxiv.org/abs/1010.5023">Yacc is dead</a>&#8221; (<a href="http://lambda-the-ultimate.org/node/4148">ltu discussion</a>)</li>
<li>&#8220;<a href="http://matt.might.net/articles/parsing-with-derivatives/">Yacc is not dead</a>&#8221;</li>
<li>&#8220;<a href="http://matt.might.net/articles/parsing-with-derivatives/">Yacc is dead: an update</a>&#8221;</li>
</ul>
<p>Like Schrodinger&#8217;s cat, Yacc seems to be indeterminately alive or dead (though the last article conclusively opened the box for me).</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/schrodingers-yacc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Concepts: Typeclasses for C++?</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/concepts-typeclasses-for-cpp/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/concepts-typeclasses-for-cpp/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 05:01:00 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[concepts]]></category>
		<category><![CDATA[generic programming]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=377</guid>
		<description><![CDATA[I&#8217;ve had a hypothesis for a while that C++ templates (paired at times with ADL) are an ad-hoc, unsound version of typeclasses. I&#8217;ve seen this hold for parser combinators, range base algorithms, and more. I&#8217;m also not the first to draw &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/concepts-typeclasses-for-cpp/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had a hypothesis for a while that C++ templates (paired at times with <a href="http://en.wikipedia.org/wiki/Argument-dependent_name_lookup">ADL</a>) are an ad-hoc, unsound version of typeclasses. I&#8217;ve seen this hold for <a href="http://parsnip-parser.sourceforge.net/">parser combinators</a>, <a href="http://www.boost.org/doc/libs/1_48_0/libs/range/doc/html/index.html">range-based algorithms</a>, and more. I&#8217;m also not the first to draw this comparison[<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.2151">1</a>].</p>
<p><a href="http://en.wikipedia.org/wiki/Concepts_(C%2B%2B)">Concepts</a> are supposed to bring soundness in through constrained templates. Concepts look an awful lot like typeclasses: they export functions and types, are parameterized, and act as constraints on generic functions and other concepts. I checked the draft specification [<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2617.pdf">2</a>], and it even seems to permit parameterizing concepts on type constructors, just like Haskell! (er, C++ calls them class templates, not type constructors. <a href="http://en.wiktionary.org/wiki/tomato_tomato#Phrase">to-<em>may-</em>to, to-<em>mah-</em>to</a>)</p>
<p>But I worry I may be mistaken about concepts, as I&#8217;ve searched through Google and the literature and have yet to find a single example demonstrating the use of template template concepts.</p>
<p>In case you&#8217;re curious what this might look like, here&#8217;s an educated guess:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">concept Monad<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">typename</span><span style="color: #000080;">&gt;</span> <span style="color: #0000ff;">class</span> m<span style="color: #000080;">&gt;</span> <span style="color: #008000;">&#123;</span>
  <span style="color: #0000ff;">template</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">typename</span> T, <span style="color: #0000ff;">typename</span> U<span style="color: #000080;">&gt;</span>
  m<span style="color: #000080;">&lt;</span>U<span style="color: #000080;">&gt;</span> mbind<span style="color: #008000;">&#40;</span>m<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span>, function<span style="color: #000080;">&lt;</span>m<span style="color: #000080;">&lt;</span>U<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>T<span style="color: #008000;">&#41;</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #0000ff;">template</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">class</span> T<span style="color: #000080;">&gt;</span>
  m<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span> mreturn<span style="color: #008000;">&#40;</span>T<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">typename</span><span style="color: #000080;">&gt;</span> <span style="color: #0000ff;">class</span> m,
          <span style="color: #0000ff;">class</span> T,
          <span style="color: #0000ff;">class</span> U,
          <span style="color: #0000ff;">class</span> Iter<span style="color: #000080;">&gt;</span>
requires Monad<span style="color: #000080;">&lt;</span>m<span style="color: #000080;">&gt;</span>
requires InputIterator<span style="color: #000080;">&lt;</span>Iter, U<span style="color: #000080;">&gt;</span>
m<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span> foldM<span style="color: #008000;">&#40;</span>Iter begin, Iter end, T i, function<span style="color: #000080;">&lt;</span>m<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>T, U<span style="color: #008000;">&#41;</span><span style="color: #000080;">&gt;</span> f<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">if</span><span style="color: #008000;">&#40;</span>begin <span style="color: #000080;">==</span> end<span style="color: #008000;">&#41;</span>
        <span style="color: #0000ff;">return</span> mreturn<span style="color: #008000;">&#40;</span>i<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">else</span>
        <span style="color: #0000ff;">return</span> mbind<span style="color: #008000;">&#40;</span>f<span style="color: #008000;">&#40;</span>i, <span style="color: #000040;">*</span>begin<span style="color: #008000;">&#41;</span>, <span style="color: #008000;">&#91;</span><span style="color: #000080;">=</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">&#40;</span>T result<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#123;</span>
            Iter next <span style="color: #000080;">=</span> begin<span style="color: #008080;">;</span>
            <span style="color: #000040;">++</span>next<span style="color: #008080;">;</span>
            <span style="color: #0000ff;">return</span> foldM<span style="color: #008000;">&#40;</span>next, end, result, f<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #008000;">&#125;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>
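<p>For readers who want to see what <em>foldM</em> does at runtime, here is a rough Python sketch. It is not the concept-based C++ above: the monad is hard-wired to a Maybe-like type (<em>None</em> for failure, a 1-tuple for success), the recursion is replaced by a loop, and the helper <em>safe_div</em> is a name I made up for illustration.</p>

```python
# Hypothetical stand-ins for the post's mreturn/mbind, specialised to a
# Maybe-like monad: None means failure, a 1-tuple (x,) wraps a value.
def mreturn(x):
    return (x,)


def mbind(m, f):
    # Propagate failure, otherwise feed the wrapped value to f.
    return None if m is None else f(m[0])


def foldM(items, init, f):
    """Left fold where each step f(acc, item) returns a monadic value."""
    result = mreturn(init)
    for item in items:
        # item=item binds the current element now, not when the lambda runs.
        result = mbind(result, lambda acc, item=item: f(acc, item))
    return result


def safe_div(acc, d):
    # Illustrative step function: division that fails on a zero divisor.
    return None if d == 0 else mreturn(acc / d)


print(foldM([2, 5], 100, safe_div))     # (10.0,)
print(foldM([2, 0, 5], 100, safe_div))  # None
```

<p>The second call stops producing values as soon as one step fails, which is the behaviour the <em>mbind</em> chain in the C++ version is meant to express.</p>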

<p>Have template template concepts been covered thoroughly somewhere, and I&#8217;ve just missed it?</p>
<ul style="list-style-type: none;">
<li>[1] &#8220;<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.2151">C++ templates/traits versus Haskell typeclasses</a>&#8221; (2005), by Sunil Kothari, Martin Sulzmann.</li>
<li>[2] &#8220;<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2617.pdf">Proposed Wording for Concepts (Revision 5)</a>&#8221; (2008).</li>
<li>[3] &#8220;<a href="http://herbsutter.com/2009/07/21/trip-report/">Trip Report: Exit Concepts, Final ISO C++ Draft in ~18 Months</a>&#8221; (2009), Herb Sutter.</li>
<li>[4] &#8220;ConceptClang: An Implementation of C++ Concepts in Clang&#8221; [<a href="http://www.generic-programming.org/software/ConceptClang/papers/wgp06v-voufo.pdf">pdf</a>]</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/concepts-typeclasses-for-cpp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Repls, repls, everywhere</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/repls-repls-everywhere/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/repls-repls-everywhere/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 16:46:41 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[languages]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=345</guid>
		<description><![CDATA[Have a new-years resolution to try out a new programming language, but in too much of a hurry to pick only one, or install anything? Online REPs and REPLs Today there&#8217;s 61 different languages on that list. That many, there&#8217;s &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/repls-repls-everywhere/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Have a New Year&#8217;s resolution to <a href="http://matt.might.net/articles/programmers-resolutions/">try out a new programming language</a>, but in too much of a hurry to pick only one, or install anything?</p>
<p><a href="http://joel.franusic.com/w/page/26128430/Online-REPs-and-REPLs">Online REPs and REPLs</a></p>
<p>Today there are 61 different languages on that list. With that many, there&#8217;s gotta be at least <em>one</em> that strikes your fancy.</p>
<p>Best batch execution site: <a href="http://ideone.com">ideone.com</a> with 50 unique languages.</p>
<p>Best interactive REPL site: <a href="http://repl.it/">repl.it</a> with 16 unique languages.</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/repls-repls-everywhere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Only a Mathematician would say</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/only-a-mathematician-would-say/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/only-a-mathematician-would-say/#comments</comments>
		<pubDate>Wed, 05 Oct 2011 03:35:35 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[tech talk]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=342</guid>
		<description><![CDATA[I came across a version of Dijkstra&#8217;s &#8220;Goto Considered Harmful&#8221; annotated by David Tribble. A lot has changed since then; the annotated version is perfect for acquiring the right context to read the paper. There was a particular line by &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/only-a-mathematician-would-say/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I came across a version of Dijkstra&#8217;s &#8220;Goto Considered Harmful&#8221; <a href="http://david.tribble.com/text/goto.html">annotated by David Tribble</a>. A lot has changed since then; the annotated version is perfect for acquiring the right context to read the paper.</p>
<p>There was a particular line by the annotator that continues to amuse me:</p>
<blockquote><p>Dijkstra seems to imply that iterative looping (inductive) statements are intellectually harder to grasp than recursion, which is the kind of thing only a mathematician would say.</p></blockquote>
<p>One of the ways you can bisect computer science is by whether this statement reads as an insult or a compliment.</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/only-a-mathematician-would-say/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Quickrefs update: Common Lisp</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/quickrefs-update-common-lisp/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/quickrefs-update-common-lisp/#comments</comments>
		<pubDate>Thu, 25 Aug 2011 16:03:16 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[quickref]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=329</guid>
		<description><![CDATA[I&#8217;ve got a fairly terrible draft of a quickref sheet for Common Lisp. This should be enough to recall the most common/needed commands, but it&#8217;s fairly terrible at this point. That said, it might be useful for someone out there: &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/quickrefs-update-common-lisp/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve got a fairly terrible draft of a quickref sheet for Common Lisp. It should be enough to recall the most commonly needed commands, but it&#8217;s still rough at this point. That said, it might be useful for someone out there:</p>
<p><a href="http://ra3s.com/wordpress/dysfunctional-programming/wp-content/uploads/2011/08/CL-PocketMod-draft-1.pdf">CL-PocketMod draft 1</a></p>
<p>For reference, the prior quickrefs are here: [<a title="Quickrefs for Python and Vim" href="http://ra3s.com/wordpress/dysfunctional-programming/2011/01/17/quickrefs-for-python-and-vim/">link</a>]</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/quickrefs-update-common-lisp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bondage and Discipline Python</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/bondage-and-discipline-python/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/bondage-and-discipline-python/#comments</comments>
		<pubDate>Fri, 18 Mar 2011 05:00:23 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=320</guid>
		<description><![CDATA[Python has an identity crisis sometimes. It starts with the premise, from Guido&#8217;s prior work on ABC, to make a simple but easy to understand language. But then turns around and cries out &#8220;one way to do it&#8220;, leaving the &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/bondage-and-discipline-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Python has an identity crisis sometimes. It starts with the premise, from Guido&#8217;s prior work on ABC, of making a simple, easy to understand language.</p>
<p>But then it turns around and cries out &#8220;<a href="http://www.python.org/dev/peps/pep-0020/">one way to do it</a>&#8221;, leaving the programmer perplexed as to how Guido van Rossum thought we should do things. For example, Guido <a href="http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html">hates tail calls</a>, so recursion isn&#8217;t the one way to do it that he picked (note that his blog post and followups contain a large number of <a href="http://ra3s.com/wordpress/dysfunctional-programming/2009/05/18/the-python-debate-on-tail-calls/">factual errors</a>; read it as an opinion piece only).</p>
<p>During these fits, Python suffers itself to be a <a href="http://www.jargon.net/jargonfile/b/bondage-and-disciplinelanguage.html">bondage and discipline</a> language.</p>
<p>Apparently there is hope. One bit that causes me particular pain is that nested functions cannot rebind variables in the outer scope; only read from them. A case in point:</p>
<pre><code>def f():
  x = 1
  def doubleIt():
    x *= 2  # UnboundLocalError: local variable 'x' referenced before assignment
  doubleIt()
  doubleIt()
  return x</code></pre>
<p>However, I was reading a <a href="http://jedahu.blogspot.com/2010/08/why-i-like-factor.html">piece on Factor</a>, and it mentions that this restriction is lifted in Python 3.0. I&#8217;m still on 2.6 (the only differences I had been aware of were the somewhat arbitrary swapping of the &#8216;/&#8217; and &#8216;//&#8217; operators, and the insulting <em>removal</em> of <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=98196">map, filter, and reduce</a> from Python 3.0). Fixing this scoping rule, even if an extra keyword is needed, sure would be convenient for me.</p>
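<p>For reference, the extra keyword in Python 3 is <em>nonlocal</em> (PEP 3104). A minimal sketch of the same function, assuming a Python 3 interpreter; the snake_case name <em>double_it</em> is my own renaming:</p>

```python
def f():
    x = 1

    def double_it():
        nonlocal x  # rebind x in the enclosing function's scope
        x *= 2

    double_it()
    double_it()
    return x


print(f())  # 4
```

<p>Without the <em>nonlocal</em> line, the assignment makes <em>x</em> local to the inner function and the call fails as in the 2.6 example.</p>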
<p>Actually, there are <a href="http://docs.python.org/release/3.0.1/whatsnew/3.0.html">a lot of improvements</a>:</p>
<ul>
<li>Various APIs return views instead of lists (copies).</li>
<li><em>sorted</em> is now built in. (I&#8217;ve had to write that myself so many times&#8230;)</li>
<li>Apparently <em>map</em> and <em>filter</em> are still here (though he still went and put <em>reduce</em> all the way off in <em>functools</em>. I guess you can&#8217;t have it all -_-)</li>
<li>Set literals, yay!</li>
</ul>
<p>Now, if Guido would be so kind as to add proper statement support to lambdas, I&#8217;ll switch ^_^. Oh, but Guido hates lambdas. Oh well, next time maybe.</p>
<p><em>&#8211; signed an ambivalent Python user</em></p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/bondage-and-discipline-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The case of the different shifts</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/the-case-of-the-different-shifts/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/the-case-of-the-different-shifts/#comments</comments>
		<pubDate>Sat, 12 Feb 2011 18:05:53 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[trivia]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=309</guid>
		<description><![CDATA[Larry Osterman has commented on an interesting edge case in the C/C++ standards, involving the underflow of the right shift operator. They reported that if they compiled code which calculated: 1130149156 &#62;&#62; -05701653 it generated different results on 32bit and &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/the-case-of-the-different-shifts/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Larry Osterman has commented on an interesting edge case in the C/C++ standards, involving the underflow of the right shift operator.</p>
<blockquote><p>They reported that if they compiled code which calculated: 1130149156 &gt;&gt; -05701653 it generated different results on 32bit and 64bit operating systems.  On 32bit machines it reported 0 but on 64bit machines, it reported 0x21a.</p></blockquote>
<p>This is one of various areas of &#8220;undefined behavior&#8221; for which you can ask 2 engineers what it should do and get 3 answers, at least one being &#8220;but I know there must be one!&#8221; I think I know where at least 2 of the answers are coming from &#8230;<span id="more-309"></span></p>
<p><strong>The Hardware shift</strong></p>
<p>On x64, Larry mentions that the <a href="http://en.wikibooks.org/wiki/X86_Assembly/Shift_and_Rotate">sar instruction</a> is used to carry out the operations. This instruction is not new to the x86 architecture, and in that architecture line&#8217;s early days you did <em>not</em> waste die space. So what&#8217;s cheap?</p>
<p>Enter the <a href="http://en.wikipedia.org/wiki/Barrel_shifter">barrel shifter</a>. It&#8217;s cheap, it&#8217;s small, you pick the maximum power-of-2 shift you want, and that&#8217;s what you pay for. For a 16-bit shift, you can support shifts of 0 to 15 and pay for 4 stages. If you want to shift out all 16, you might think &#8220;add another stage&#8221;, but it gets worse! It only takes the rotation amount as input. If you need to identify shifts greater than 31, then you&#8217;re toast.</p>
<p>No, what you consider doing is detecting shifts &gt; 15 and adding an extra stage to mask away the entire output. But wait, <a href="http://en.wikipedia.org/wiki/Intel_8086">it&#8217;s 1978</a>, and you have neither the die space nor the extra cycles to waste on a conditional masking stage (masking to 4 bits of shift input is free: you just don&#8217;t connect the other 12 input bits to anything). You just tell the programmer to never shift too much, use the bottommost bits, and ignore the rest. (1 &gt;&gt; 128) == 1.</p>
<p>In time the processor is expanded to support 32-bit quantities and 64-bit quantities, but each time you need to remain compatible, so the sar instruction continues to mask away the upper bits.</p>
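<p>To make the &#8220;free masking&#8221; point concrete, here is a toy Python model of a staged shifter. It is a sketch, not a description of any real x86 circuit: the function name and the <em>stages</em> parameter are mine, it only handles non-negative values, and I am reading the quoted shift count as a C octal literal.</p>

```python
def barrel_shift_right(value, count, stages=4):
    """Shift a non-negative value right using a fixed number of stages.

    Stage i shifts by 2**i when bit i of count is set. Bits of count
    beyond the last stage are never examined, which is the masking.
    """
    for stage in range(stages):
        if (count >> stage) % 2 == 1:
            value >>= 2 ** stage
    return value


print(barrel_shift_right(0x8000, 15))  # 1
print(barrel_shift_right(1, 128))      # 1
# Six stages, as for a 64-bit operand, applied to the quoted example.
print(hex(barrel_shift_right(1130149156, -0o5701653, stages=6)))  # 0x21a
```

<p>The last line reproduces the 0x21a from the quoted report: only the low bits of the negative count ever reach a stage, and those happen to encode a shift of 21.</p>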
<p><strong>The Programming Language</strong></p>
<p>I&#8217;m gonna gloss over the whole &#8220;<a href="http://en.wikipedia.org/wiki/Undefined_behavior">undefined behavior</a>&#8221; thing. Enough <a href="http://blog.regehr.org/archives/213">others</a> have discussed it. Even languages or implementations which aim to minimize it will occasionally miss edge cases or <a href="http://web.mit.edu/~axch/www/scheme/choices/r5rs-letrec.html">fall short</a>.</p>
<p>And C takes a very brutal route of making a great many things &#8220;undefined behavior&#8221; in the name of permitting performance shortcuts. For instance, signed arithmetic overflow yields undefined behavior. This might seem strange now, but at the time 2&#8217;s complement arithmetic was not common. Choosing 2&#8217;s complement wrap-around would have introduced perf penalties on 1&#8217;s complement architectures as they did not generally include native 2&#8217;s complement instructions, and vice-versa. A modern spin on this is special-purpose DSP chips that default to saturating arithmetic.</p>
<p>In any case, I imagine a similar case stands for shifting. One architecture may prefer precise shifting, but x86 prefers to take shift counts modulo the operand width. Rather than specify one behavior and penalize compilation on all other hardware, the C language says instead &#8220;this area is off limits; if you want a particular slow-but-precise behavior, implement it yourself.&#8221;</p>
<p><strong>The Software Shift</strong></p>
<p>The 32-bit x86 CRT routine mentioned was _allshr (the extra leading underscore comes from the <a href="http://blogs.msdn.com/b/oldnewthing/archive/2004/01/08/48616.aspx">cdecl calling convention</a> on x86), and you can find its source in your Visual C++ product installation, under %programfiles%/Microsoft Visual Studio 10.0/vc/crt/src/intel/llshr.asm. It leads off:</p>
<div>
<pre>;
; Handle shifts of 64 bits or more (if shifting 64 bits or more, the result
; depends only on the high order bit of edx).
;
        cmp     cl,64
        jae     short RETSIGN</pre>
</div>
<p>Interesting: for large shifts, it bails early. Following it is another branch, switching between different implementations for shifts of 0-31 and 32-63 bits. In any case, the resulting 3 code paths are short, with the cl &gt;= 64 path being the shortest of all.</p>
<p>Now, the designer could have considered masking instead of comparing &#8212; but for random shift amounts, that would have been a <em>longer</em> code sequence to execute*. And by most reasonable interpretations, returning the sign bit <em>is</em> the right answer.</p>
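<p>A rough Python model of that early-out, for comparison with the hardware behaviour. This is my paraphrase of the comment and the two instructions quoted above, not a translation of the CRT source; the name <em>allshr_like</em> is invented, and the shift count is again read as a C octal literal.</p>

```python
def allshr_like(value, count):
    """Model of a compare-then-shift routine (not the actual CRT code)."""
    cl = count % 256  # only the low byte of the count (cl) is compared
    if cl >= 64:
        # Everything is shifted out: the result depends only on the sign.
        return 0 if value >= 0 else -1
    return value >> cl


print(allshr_like(1130149156, -0o5701653))  # 0
print(allshr_like(-8, 200))                 # -1
print(allshr_like(1024, 3))                 # 128
```

<p>The first line gives the 0 reported on 32-bit machines: the low byte of the negative count is 85, which is past the cutoff of 64.</p>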
<p><strong>So there you have it</strong></p>
<p>Each implementation is most efficient and reasonable, when and where it was designed, and both are permitted by a language standard that valued efficiency above most else. And in that respect, both implementations were perfect &#8230; at least, they were when originally written.</p>
<p>* yes, the conditional branch is probably a bigger cost than the savings&#8230; today. While I don&#8217;t know when this code was written, it mentions PROC NEAR, leading me to think that perhaps it comes from a time before <a href="http://en.wikipedia.org/wiki/Superscalar">superscalar architectures</a>, where branches were cheap(er) and arithmetic was costly. You might want to change it now, but that decision opens up <a href="http://en.wikipedia.org/wiki/Backwards_compatibility">back-compat</a> questions.</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/the-case-of-the-different-shifts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eval in Python</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/eval-in-python/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/eval-in-python/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 03:16:48 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=305</guid>
		<description><![CDATA[I&#8217;ll just leave this here for you: Wait what. Python compiles? That is correct. CPython and PyPy (the implementations worth caring about currently) are in fact creating a code object from the string you pass to exec or eval before executing it. &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/eval-in-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll just leave <a href="http://lucumr.pocoo.org/2011/2/1/exec-in-python/">this</a> here for you:</p>
<blockquote><p>Wait what. Python compiles? That is correct. CPython and PyPy (the implementations worth caring about currently) are in fact creating a code object from the string you pass to <cite>exec</cite> or <cite>eval</cite> before executing it. And that&#8217;s just one of the things many people don&#8217;t know about the exec statement.</p></blockquote>
<p>It doesn&#8217;t always have to be about theory, nor should it be. I might recommend this article for programmers already versed in Python, to read side by side or leading up to SICP&#8217;s &#8220;<a href="http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-25.html#%_chap_4">Metalinguistic Abstractions</a>&#8221; chapter.</p>
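<p>If you want to poke at the compile step from the quote yourself, a small sketch follows. It assumes Python 3, where <em>exec</em> is a function rather than the statement the quote refers to, and &#8220;snippet&#8221; is just an arbitrary filename label I picked.</p>

```python
source = "6 * 7"

# compile() turns the string into a code object without running it.
code = compile(source, "snippet", "eval")
print(type(code).__name__)  # code
print(eval(code))           # 42

# The same two steps for statements, run inside an explicit namespace.
namespace = {}
exec(compile("answer = 6 * 7", "snippet", "exec"), namespace)
print(namespace["answer"])  # 42
```

<p>Passing a string straight to <em>eval</em> or <em>exec</em> does the compile step for you; splitting it out just makes the intermediate code object visible.</p>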
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/eval-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kindle programming (part 1)</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/kindle-programming-part-1/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/kindle-programming-part-1/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 04:26:04 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=299</guid>
		<description><![CDATA[I bought an Amazon Kindle 3G back in October. So far I&#8217;ve mostly been reading research papers on it (that 3rd gen eInk really is amazing), occasionally proggit. I applied for SDK access, but heard nothing back. So instead, I&#8217;m &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/kindle-programming-part-1/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I bought an <a href="http://www.amazon.com/Kindle-Wireless-Reader-3G-Wifi-White/dp/B002LVUX1W/">Amazon Kindle 3G</a> back in October. So far I&#8217;ve mostly been reading research papers on it (that 3rd gen <a href="http://en.wikipedia.org/wiki/E_Ink">eInk</a> really is amazing), occasionally <a href="http://www.reddit.com/r/programming">proggit</a>.</p>
<p>I applied for SDK access, but heard nothing back. So instead, I&#8217;m using the built-in &#8220;experimental&#8221; web browser. It has canvas support, so I&#8217;m golden.</p>
<p><a href="http://ra3s.com/wordpress/dysfunctional-programming/wp-content/uploads/2011/01/IMAG0018sm.jpg"><img class="aligncenter size-full wp-image-301" title="KindleProgrammingPart1" src="http://ra3s.com/wordpress/dysfunctional-programming/wp-content/uploads/2011/01/IMAG0018sm.jpg" alt="Image of Kindle web browser displaying vector fonts." width="640" height="360" /></a></p>
<p>As you might have noticed, I couldn&#8217;t resist the urge to design a vector font for the task. Please be gentle: I know the font leaves a lot to be desired, and it&#8217;s a work in progress. The Kindle browser didn&#8217;t seem to have font support anyway, and while they are public domain, I didn&#8217;t particularly enjoy the <a href="http://idlastro.gsfc.nasa.gov/idl_html_help/About_Hershey_Vector_Fonts.html">Hershey fonts</a>. They&#8217;re nice, but they&#8217;re <a href="http://www.lowing.org/fonts/">unsuitable for programming</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/kindle-programming-part-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

