<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dysfunctional Programming &#187; aaron</title>
	<atom:link href="http://ra3s.com/wordpress/dysfunctional-programming/author/admin/feed/" rel="self" type="application/rss+xml" />
	<link>http://ra3s.com/wordpress/dysfunctional-programming</link>
	<description>(λ (a b) a) vs (λ (a b) b)</description>
	<lastBuildDate>Fri, 02 Mar 2012 08:56:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Some parser combinators for Python</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/some-parser-combinators-for-python/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/some-parser-combinators-for-python/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 08:56:54 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=396</guid>
		<description><![CDATA[I&#8217;ve got two parser combinators today for you to play with, both whipped up this evening from pieces of earlier experiments. Parser 5: PEG grammar without memoization This is loosely based on Daan Leijen and Erik Meijer&#8217;s 2001 paper [1]. &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/some-parser-combinators-for-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve got two parser combinators today for you to play with, both whipped up this evening from pieces of earlier experiments.</p>
<p><span id="more-396"></span></p>
<p><strong>Parser 5: PEG grammar without memoization</strong></p>
<p>This is loosely based on Daan Leijen and Erik Meijer&#8217;s 2001 paper [1]. I say loosely because it lacks all the important elements demonstrated in the paper (efficiency, useful error messages, etc.), but it is a monadic parser.</p>
<p>(Why do you want a monadic parser? <a href="http://www.valuedlessons.com/2008/04/you-could-have-invented-monadic-parsing.html">This author explains better than I</a>) </p>
<p>Here, the type of a parser is &#8216;string -&gt; maybe (a, string)&#8217;, where &#8216;a&#8217; is your parse tree, and the result string is the remaining input. If the parse fails, a falsy value (False in the code below) is returned instead of a tuple.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># simplified monadic (PEG) parser. no memoization, some backtracking.</span>
<span style="color: #808080; font-style: italic;">#  parser :: str -&gt; maybe (value, str)</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> ret <span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff7700;font-weight:bold;">lambda</span> s: <span style="color: black;">&#40;</span>value, s<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> bind <span style="color: black;">&#40;</span>p, <span style="color: #66cc66;">*</span>fs<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        res = p<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> f <span style="color: #ff7700;font-weight:bold;">in</span> fs:
            res = res <span style="color: #ff7700;font-weight:bold;">and</span> f<span style="color: black;">&#40;</span>res<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#40;</span>res<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> res
    <span style="color: #ff7700;font-weight:bold;">return</span> parse</pre></td></tr></table></div>

<p>These are the monad operators return and bind. &#8216;ret(v)&#8217; produces a parser that consumes no input (the empty string) and produces the parse tree &#8216;v&#8217;, but the juice is in &#8216;bind&#8217;.</p>
<p>&#8216;bind(p, f)&#8217; glues together a parser and a function. A parser &#8216;p&#8217; consumes some input and produces a parse tree &#8216;v&#8217;. This value &#8216;v&#8217; is then passed to a function &#8216;f&#8217;, returning a <em>new parser</em> to consume the remaining input. That is, &#8216;f&#8217; chooses, based on the parse tree so far, which language to use to interpret the rest of the input.</p>
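<p>To make that concrete, here is a minimal sketch that chains two single-character parsers with &#8216;bind&#8217; and packs both values into a pair. &#8216;ret&#8217; and &#8216;bind&#8217; are copied from the listing above; &#8216;item&#8217; is a hypothetical helper, not part of the original library, that accepts any one character.</p>

```python
# Parser 5 convention: a parser maps str to (value, rest), or a falsy value on failure.

def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            # A failed (falsy) result short-circuits the rest of the chain.
            res = res and f(res[0])(res[1])
        return res
    return parse

def item(s):
    # Hypothetical helper: consume any single character, fail on empty input.
    return (s[0], s[1:]) if s else False

# Parse two characters and return them together as one parse tree.
pair = bind(item, lambda a: bind(item, lambda b: ret((a, b))))

print(pair("xyz"))  # (('x', 'y'), 'z')
print(pair("x"))    # False
```

<p>Each lambda receives the value parsed so far and returns the parser to run on whatever input is left.</p>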
<p>This is extremely powerful: it lets monadic parsers recognize some context-sensitive languages, such as XML, load new languages on the fly as they parse (Perl6, anyone?), or perhaps backtrack to recover if an evaluation of the parse tree fails (<a href="http://www.perlmonks.org/?node_id=663393">Perl5, anyone?</a>).</p>
<p>It can also inhibit various parser optimizations if you&#8217;re not careful.</p>
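<p>As a small illustration of that context sensitivity, the sketch below parses a length-prefixed field: a single digit says how many characters follow. The format and the &#8216;take&#8217; helper are assumptions made up for this example; &#8216;ret&#8217;, &#8216;bind&#8217; and &#8216;pred&#8217; follow the parser5 definitions.</p>

```python
def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            res = res and f(res[0])(res[1])
        return res
    return parse

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return (s[0], s[1:])
        return False
    return parse

def take(n):
    # Hypothetical helper: consume exactly n characters, or fail.
    return lambda s: (s[:n], s[n:]) if len(s[:n]) == n else False

digit = pred(str.isdigit)

# The digit parsed first decides which parser handles the rest of the input.
counted = bind(digit, lambda d: take(int(d)))

print(counted("3abcdef"))  # ('abc', 'def')
print(counted("5ab"))      # False
```

<p>No fixed grammar rule describes &#8216;counted&#8217;; the second parser only exists once the first value is known, which is also why this style is hard to optimize ahead of time.</p>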

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;">never = <span style="color: #ff7700;font-weight:bold;">lambda</span> s: <span style="color: #008000;">False</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> alt <span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> p <span style="color: #ff7700;font-weight:bold;">in</span> ps:
            res = p<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> res: <span style="color: #ff7700;font-weight:bold;">return</span> res
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">False</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> parse</pre></td></tr></table></div>

<p>&#8216;never&#8217; is the MonadZero value, a sort of additive identity for the &#8216;alt&#8217; <a href="http://www.haskell.org/haskellwiki/MonadPlus">MonadPlus</a> operator. &#8216;alt&#8217; produces a parser that will recognize any of the languages passed to it. &#8216;never&#8217; recognizes nothing, so e.g. &#8216;alt(p, never)&#8217; is equivalent to &#8216;p&#8217; in the same way as &#8216;bind(p, ret)&#8217; is equivalent to &#8216;p&#8217;.</p>
<p>You can ignore these equivalences for now. They become useful when it comes time to optimize grammars, but that&#8217;s not on today&#8217;s agenda.</p>
<p>From here on out, we have a handful of other operators:</p>
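<p>If you would rather see the equivalences than take them on faith, this sketch spot-checks them on a few sample inputs. It is a check on examples, not a proof; the definitions follow parser5, and the sample strings are arbitrary.</p>

```python
def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            res = res and f(res[0])(res[1])
        return res
    return parse

never = lambda s: False

def alt(*ps):
    def parse(s):
        for p in ps:
            res = p(s)
            if res:
                return res
        return False
    return parse

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return (s[0], s[1:])
        return False
    return parse

digit = pred(str.isdigit)

for s in ["7up", "up", ""]:
    assert alt(digit, never)(s) == digit(s)   # never is a right identity for alt
    assert alt(never, digit)(s) == digit(s)   # ... and a left identity
    assert bind(digit, ret)(s) == digit(s)    # ret is a right identity for bind

print("identities hold on the samples")
```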

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;">empty = ret<span style="color: black;">&#40;</span><span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> then <span style="color: black;">&#40;</span>p, <span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>p, <span style="color: #ff7700;font-weight:bold;">lambda</span> x: bind<span style="color: black;">&#40;</span>then<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>, <span style="color: #ff7700;font-weight:bold;">lambda</span> rest: ret<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span>+rest<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> ps <span style="color: #ff7700;font-weight:bold;">else</span> ret<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> repeat <span style="color: black;">&#40;</span>p, res=<span style="color: black;">&#91;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> alt<span style="color: black;">&#40;</span> bind<span style="color: black;">&#40;</span>p, <span style="color: #ff7700;font-weight:bold;">lambda</span> v: repeat<span style="color: black;">&#40;</span>p, res+<span style="color: black;">&#91;</span>v<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>,
                ret<span style="color: black;">&#40;</span>res<span style="color: black;">&#41;</span> <span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">def</span> repeat1 <span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>p, <span style="color: #ff7700;font-weight:bold;">lambda</span> v: repeat<span style="color: black;">&#40;</span>p, <span style="color: black;">&#91;</span>v<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> pred <span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> s <span style="color: #ff7700;font-weight:bold;">and</span> f<span style="color: black;">&#40;</span>s<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span>s<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, s<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>:<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">else</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">False</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> parse
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> option <span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> alt<span style="color: black;">&#40;</span>p, empty<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> repn <span style="color: black;">&#40;</span>p, n<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> then<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>p<span style="color: black;">&#93;</span><span style="color: #66cc66;">*</span>n<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> char <span style="color: black;">&#40;</span>ch<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> pred<span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> c: c==ch<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> literal <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> then<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span><span style="color: black;">&#40;</span>char<span style="color: black;">&#40;</span>c<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> c <span style="color: #ff7700;font-weight:bold;">in</span> s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> delay <span style="color: black;">&#40;</span>fn<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>empty, <span style="color: #ff7700;font-weight:bold;">lambda</span> _:fn<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
end = <span style="color: #ff7700;font-weight:bold;">lambda</span> s: <span style="color: #008000;">False</span> <span style="color: #ff7700;font-weight:bold;">if</span> s <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, s<span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>&#8216;empty&#8217; recognizes the empty string, &#8216;then&#8217; chains parsers in sequence, &#8216;repeat&#8217; is the Kleene star, and &#8216;repeat1&#8217; is the Kleene plus. &#8216;option&#8217; recognizes one or zero of a language, while &#8216;repn(p,n)&#8217; recognizes exactly n occurrences of p.</p>
<p>&#8216;pred(fn)&#8217; consumes one character &#8216;c&#8217; for which &#8216;fn(c)&#8217; is true (e.g. using str.isalpha). &#8216;char(c)&#8217; recognizes the character &#8216;c&#8217;, and &#8216;literal(s)&#8217; the sequence of characters in the string &#8216;s&#8217;. &#8216;delay(lambda:p)&#8217; recognizes &#8216;p&#8217;; it is merely an artifact of strict evaluation, needed because &#8216;p&#8217; might not yet be defined. Finally, &#8216;end&#8217; recognizes only the end of the input string; not very useful in today&#8217;s example, but it can be handy in parsers that perform lookahead.</p>
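<p>Here is a short sketch of these operators working together. The two grammars, &#8216;integer&#8217; and &#8216;depth&#8217;, are made up for illustration and are not from the linked samples; the combinators are restated from the listing above so the snippet runs on its own.</p>

```python
def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            res = res and f(res[0])(res[1])
        return res
    return parse

def alt(*ps):
    def parse(s):
        for p in ps:
            res = p(s)
            if res:
                return res
        return False
    return parse

empty = ret(None)

def then(p, *ps):
    return bind(p, lambda x: bind(then(*ps), lambda rest: ret([x] + rest)) if ps else ret([x]))

def repeat(p, res=[]):
    # The default list is never mutated; res + [v] always builds a new list.
    return alt(bind(p, lambda v: repeat(p, res + [v])), ret(res))

def repeat1(p):
    return bind(p, lambda v: repeat(p, [v]))

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return (s[0], s[1:])
        return False
    return parse

def char(ch):
    return pred(lambda c: c == ch)

def delay(fn):
    return bind(empty, lambda _: fn())

# One or more digits, converted to an int.
integer = bind(repeat1(pred(str.isdigit)), lambda ds: ret(int("".join(ds))))

# Nesting depth of balanced parentheses. 'depth' refers to itself, so the
# recursive use is wrapped in delay(); a bare 'depth' here would be a NameError.
depth = alt(
    bind(then(char("("), delay(lambda: depth), char(")")),
         lambda parts: ret(parts[1] + 1)),
    ret(0))

print(integer("42abc"))  # (42, 'abc')
print(depth("(())x"))    # (2, 'x')
```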
<p>You may not have noticed as it whizzed by, but this parser doesn&#8217;t perform full backtracking. The &#8216;alt&#8217; operator attempts to parse using the first language, eagerly, and if it succeeds, returns immediately. If a subsequent parse operation fails, no backtracking is performed to consider the alternate path. As such, this parser combinator library might be classified as a <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammar</a>. Contrast this with <a href="http://en.wikipedia.org/wiki/Context-free_grammar">context free grammars</a>, where both sides of the alternation are considered equal for backtracking / lookahead purposes.</p>
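<p>The ordered-choice behavior is easy to trip over, so here is a minimal sketch of it. The grammar (&#8216;a&#8217; or &#8216;ab&#8217;, followed by end of input) is a toy chosen for this example; the combinators follow the parser5 definitions.</p>

```python
def ret(value):
    return lambda s: (value, s)

def bind(p, *fs):
    def parse(s):
        res = p(s)
        for f in fs:
            res = res and f(res[0])(res[1])
        return res
    return parse

def alt(*ps):
    def parse(s):
        for p in ps:
            res = p(s)
            if res:
                return res   # first success wins; later alternatives are never tried
        return False
    return parse

def then(p, *ps):
    return bind(p, lambda x: bind(then(*ps), lambda rest: ret([x] + rest)) if ps else ret([x]))

def pred(f):
    def parse(s):
        if s and f(s[0]):
            return (s[0], s[1:])
        return False
    return parse

def char(ch):
    return pred(lambda c: c == ch)

def literal(text):
    return then(*(char(c) for c in text))

end = lambda s: False if s else (None, s)

short_first = then(alt(literal("a"), literal("ab")), end)
long_first = then(alt(literal("ab"), literal("a")), end)

# alt commits to "a", then 'end' fails on the leftover "b"; "ab" is never tried.
print(short_first("ab"))  # False
# Listing the longer alternative first sidesteps the problem.
print(long_first("ab"))   # ([['a', 'b'], None], '')
```

<p>With a PEG, the order of alternatives is part of the grammar, so the usual fix is to list the longer or more specific alternative first.</p>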
<p>Size: ~60 lines</p>
<p><strong>Parser 7: CSG with partial memoization</strong></p>
<p>This next parser adds a couple of tweaks. First, the type of the parser has changed to &#8216;string -&gt; lcons of (a, string)&#8217;, where lcons is a kind of lazy list.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> lcons <span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span> <span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">iter</span><span style="color: black;">&#41;</span>: <span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span> = <span style="color: #008000;">iter</span><span style="color: #66cc66;">;</span> <span style="color: #008000;">self</span>.<span style="color: black;">value</span> = <span style="color: #008000;">None</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> force <span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span>:
            <span style="color: #ff7700;font-weight:bold;">try</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">value</span> = <span style="color: black;">&#40;</span><span style="color: #008000;">next</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span><span style="color: black;">&#41;</span>, lcons<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">StopIteration</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">value</span> = <span style="color: #008000;">None</span>
            <span style="color: #008000;">self</span>.<span style="color: #008000;">iter</span> = <span style="color: #008000;">None</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">value</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> empty<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">force</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> == <span style="color: #008000;">None</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> head <span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">force</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> tail <span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">force</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__iter__</span> <span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">self</span>.<span style="color: black;">empty</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">yield</span> <span style="color: #008000;">self</span>.<span style="color: black;">head</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span> = <span style="color: #008000;">self</span>.<span style="color: black;">tail</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> memoize<span style="color: black;">&#40;</span>fn, _memo=<span style="color: #008000;">dict</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> replc <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        k = <span style="color: black;">&#40;</span>fn, s<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> k <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #ff7700;font-weight:bold;">in</span> _memo:
            _memo<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = lcons<span style="color: black;">&#40;</span><span style="color: #008000;">iter</span><span style="color: black;">&#40;</span>fn<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> _memo<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> replc</pre></td></tr></table></div>

<p>&#8216;lcons&#8217; has a single role: to ensure that an iteration is computed only once; otherwise, we&#8217;ll be using plain Python iterators. Speaking of memoization, we&#8217;ll also memoize our monadic parsers so that we don&#8217;t recompute them on backtrack. This &#8230; will consume quite a bit of memory, but will reduce the backtracking costs in some key places. You can disable it with otherwise no change in behavior.</p>
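<p>A quick sketch of what &#8216;lcons&#8217; buys you: a generator can normally be consumed only once, but wrapped in &#8216;lcons&#8217; it can be walked repeatedly while the underlying work happens a single time. The class below is the one above, lightly adapted on the assumption of Python 3 (the &#8216;next&#8217; builtin, and a local variable instead of rebinding &#8216;self&#8217;); &#8216;squares&#8217; and &#8216;calls&#8217; are made up for the demonstration.</p>

```python
class lcons(object):
    """Lazy cons list: each cell pulls from the iterator at most once and caches the result."""
    def __init__(self, it):
        self.iter = it
        self.value = None

    def force(self):
        if self.iter:
            try:
                self.value = (next(self.iter), lcons(self.iter))
            except StopIteration:
                self.value = None
            self.iter = None   # this cell is now computed; never touch the iterator again
        return self.value

    def empty(self):
        return self.force() is None

    def head(self):
        return self.force()[0]

    def tail(self):
        return self.force()[1]

    def __iter__(self):
        cell = self
        while not cell.empty():
            yield cell.head()
            cell = cell.tail()

calls = []

def squares():
    for n in range(3):
        calls.append(n)   # record every time the generator does real work
        yield n * n

cached = lcons(squares())
first = list(cached)
second = list(cached)

print(first)   # [0, 1, 4]
print(second)  # [0, 1, 4]
print(calls)   # [0, 1, 2]
```

<p>The second walk is served entirely from the cached cells, which is what lets a memoized parser hand the same result list to several callers.</p>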

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> ret <span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff7700;font-weight:bold;">lambda</span> s: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span>value, s<span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> bind <span style="color: black;">&#40;</span>p, f, <span style="color: #66cc66;">*</span>fs<span style="color: black;">&#41;</span>:
    @memoize
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> res1 <span style="color: #ff7700;font-weight:bold;">in</span> p<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">for</span> res2 <span style="color: #ff7700;font-weight:bold;">in</span> f<span style="color: black;">&#40;</span>res1<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#40;</span>res1<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">yield</span> res2
    <span style="color: #ff7700;font-weight:bold;">if</span> fs: <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>parse, <span style="color: #66cc66;">*</span>fs<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> parse
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> alt <span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>:
    @memoize
    <span style="color: #ff7700;font-weight:bold;">def</span> parse <span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> p <span style="color: #ff7700;font-weight:bold;">in</span> ps:
            <span style="color: #ff7700;font-weight:bold;">for</span> res <span style="color: #ff7700;font-weight:bold;">in</span> p<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">yield</span> res
    <span style="color: #ff7700;font-weight:bold;">return</span> parse
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> then <span style="color: black;">&#40;</span>p, <span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span>:
    more = then<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>ps<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> ps <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: #008000;">None</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> bind<span style="color: black;">&#40;</span>p, <span style="color: #ff7700;font-weight:bold;">lambda</span> x: bind<span style="color: black;">&#40;</span>more, <span style="color: #ff7700;font-weight:bold;">lambda</span> rest: ret<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span>+rest<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> ps <span style="color: #ff7700;font-weight:bold;">else</span> ret<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>The monad parser combinators have received a facelift. &#8216;bind(p,f)&#8217; now evaluates each possible parse of &#8216;p&#8217;, passing the result to &#8216;f&#8217; and then enumerating the results from that parser, lazily yielding them to be memoized by &#8216;lcons&#8217;.</p>
<p>&#8216;alt&#8217; has similarly been updated: each possible parse result from each parser is yielded in turn.</p>
<p>&#8216;then&#8217; gets a minor performance improvement. As we&#8217;re memoizing parse results, we get a boost by reusing the same tail parser each time.</p>
<p>&#8216;ret&#8217;, &#8216;never&#8217;, &#8216;end&#8217;, &#8216;char&#8217;, and &#8216;pred&#8217; are each updated to return lists of results, but are otherwise unchanged. All the other parser combinators remain as before.</p>
<p>As a result of these changes, parser7 now supports nearly the class of <a href="http://en.wikipedia.org/wiki/Context-free_grammar">context free grammars</a> (CFG), save its inability to handle left recursion (perhaps the techniques in [2] could fix that). It can even handle ambiguous grammars, returning a parse forest instead of a parse tree. Of course, nothing is free &#8212; parser7 is slower than parser5, and most useful CFGs can be rewritten as PEGs with some effort.</p>
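<p>To see both effects, here is a stripped-down sketch in the parser7 style. It assumes the list-of-results convention described above but leaves out &#8216;memoize&#8217; and &#8216;lcons&#8217; to stay short, so it is not the full library; the toy grammars are invented for this example.</p>

```python
# Parser 7 convention: a parser maps str to an iterable of (value, rest) results.

def ret(value):
    return lambda s: [(value, s)]

def bind(p, f):
    def parse(s):
        for value, rest in p(s):
            for res in f(value)(rest):
                yield res
    return parse

def alt(*ps):
    def parse(s):
        for p in ps:
            for res in p(s):
                yield res
    return parse

def then(p, *ps):
    more = then(*ps) if ps else None
    return bind(p, lambda x: bind(more, lambda rest: ret([x] + rest)) if ps else ret([x]))

def char(ch):
    return lambda s: [(s[0], s[1:])] if s and s[0] == ch else []

def literal(text):
    return then(*(char(c) for c in text))

end = lambda s: [] if s else [(None, s)]

# An ordered choice that commits too early under a PEG now recovers by backtracking.
short_first = then(alt(literal("a"), literal("ab")), end)
print(list(short_first("ab")))  # [([['a', 'b'], None], '')]

# An ambiguous grammar yields every parse: "aaa" splits as a+aa and as aa+a.
a_or_aa = alt(literal("a"), literal("aa"))
forest = list(then(a_or_aa, a_or_aa, end)("aaa"))
for tree, rest in forest:
    print(tree)
```

<p>The second grammar prints two trees, one per way of splitting the input, which is the parse forest mentioned above.</p>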
<p>Above and beyond CFGs, this parser continues to provide monadic bind, so it can continue to parse a number of useful languages from the class of <a href="http://en.wikipedia.org/wiki/Context-sensitive_language">context sensitive languages</a>. For example, the mini-xml parser in the samples below runs great under both libraries.</p>
<p>It&#8217;s a memory hog, but it&#8217;s also ~90 lines of vanilla Python.</p>
<p><strong>Full source, samples, references</strong></p>
<p>I&#8217;ve put the parser combinators and some samples up on codepad.org so you can get a feel for what the output looks like. It&#8217;s not pretty, but it&#8217;s pretty well structured:</p>
<ul>
<li>parser5: <a href="http://codepad.org/P5l2l6dm">http://codepad.org/P5l2l6dm</a></li>
<li>parser7: <a href="http://codepad.org/pmpqp1wI">http://codepad.org/pmpqp1wI</a></li>
</ul>
<p>[1] <a href="http://research.microsoft.com/en-us/um/people/daan/download/papers/parsec-paper.pdf">Parsec: Direct Style Monadic Parser Combinators For The Real World</a>. Daan Leijen, Erik Meijer (2001).</p>
<p>[2] <a href="http://www.vpri.org/pdf/tr2007002_packrat.pdf">Packrat Parsers Can Support Left Recursion</a>. Alessandro Warth, James R. Douglass, Todd Millstein (2007).</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/some-parser-combinators-for-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Schrodinger&#8217;s Yacc</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/schrodingers-yacc/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/schrodingers-yacc/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 17:05:51 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[functional]]></category>
		<category><![CDATA[parsers]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=373</guid>
		<description><![CDATA[There was a small controversy last year about parser combinators, a convenient way of rapidly developing parsers in a functional style. Yacc is presumably chosen as the archetypal non-combinator parser generator, requiring a separate external parser compiler, known for being a &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/schrodingers-yacc/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There was a small controversy last year about <a href="http://en.wikipedia.org/wiki/Parser_combinator">parser combinators</a>, a convenient way of rapidly developing parsers in a functional style. Yacc is presumably chosen as the archetypal non-combinator parser generator, requiring a separate external parser compiler, known for being a pain to use.</p>
<ul>
<li>&#8220;<a href="http://arxiv.org/abs/1010.5023">Yacc is dead</a>&#8221; (<a href="http://lambda-the-ultimate.org/node/4148">ltu discussion</a>)</li>
<li>&#8220;<a href="http://matt.might.net/articles/parsing-with-derivatives/">Yacc is not dead</a>&#8221;</li>
<li>&#8220;<a href="http://matt.might.net/articles/parsing-with-derivatives/">Yacc is dead: An update</a>&#8221;</li>
</ul>
<p>Like Schrodinger&#8217;s cat, Yacc seems to be indeterminately alive or dead (though the last article conclusively opened the box for me).</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/schrodingers-yacc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Concepts: Typeclasses for C++?</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/concepts-typeclasses-for-cpp/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/concepts-typeclasses-for-cpp/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 05:01:00 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[concepts]]></category>
		<category><![CDATA[generic programming]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=377</guid>
		<description><![CDATA[I&#8217;ve had a hypothesis for a while that C++ templates (paired at times with ADL) are an ad-hoc, unsound version of typeclasses. I&#8217;ve seen this hold for parser combinators, range base algorithms, and more. I&#8217;m also not the first to draw &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/concepts-typeclasses-for-cpp/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had a hypothesis for a while that C++ templates (paired at times with <a href="http://en.wikipedia.org/wiki/Argument-dependent_name_lookup">ADL</a>) are an ad-hoc, unsound version of typeclasses. I&#8217;ve seen this hold for <a href="http://parsnip-parser.sourceforge.net/">parser combinators</a>, <a href="http://www.boost.org/doc/libs/1_48_0/libs/range/doc/html/index.html">range-based algorithms</a>, and more. I&#8217;m also not the first to draw this comparison[<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.2151">1</a>].</p>
<p><a href="http://en.wikipedia.org/wiki/Concepts_(C%2B%2B)">Concepts</a> are supposed to bring soundness in through constrained templates. Concepts look an awful lot like type classes: they export functions and types, are parameterized, and act as constraints on generic functions and other concepts. I checked the draft specification [<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2617.pdf">2</a>], and it even seems to permit parameterizing concepts on type constructors, just like Haskell! (er, C++ calls them class templates, not type constructors. <a href="http://en.wiktionary.org/wiki/tomato_tomato#Phrase">to-<em>may-</em>to, to-<em>mah-</em>to</a>)</p>
<p>But I worry I may be mistaken about concepts: I&#8217;ve searched Google and the literature and have yet to find a single example demonstrating the use of template template concepts.</p>
<p>In case you&#8217;re curious what this might look like, here&#8217;s an educated guess:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">concept Monad<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">typename</span><span style="color: #000080;">&gt;</span> <span style="color: #0000ff;">class</span> m<span style="color: #000080;">&gt;</span> <span style="color: #008000;">&#123;</span>
  <span style="color: #0000ff;">template</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">typename</span> T, <span style="color: #0000ff;">typename</span> U<span style="color: #000080;">&gt;</span>
  m<span style="color: #000080;">&lt;</span>U<span style="color: #000080;">&gt;</span> mbind<span style="color: #008000;">&#40;</span>m<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span>, function<span style="color: #000080;">&lt;</span>m<span style="color: #000080;">&lt;</span>U<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>T<span style="color: #008000;">&#41;</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #0000ff;">template</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">class</span> T<span style="color: #000080;">&gt;</span>
  m<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span> mreturn<span style="color: #008000;">&#40;</span>T<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">typename</span><span style="color: #000080;">&gt;</span> <span style="color: #0000ff;">class</span> m,
          <span style="color: #0000ff;">class</span> T,
          <span style="color: #0000ff;">class</span> U,
          <span style="color: #0000ff;">class</span> Iter<span style="color: #000080;">&gt;</span>
requires Monad<span style="color: #000080;">&lt;</span>m<span style="color: #000080;">&gt;</span>
requires InputIterator<span style="color: #000080;">&lt;</span>Iter, U<span style="color: #000080;">&gt;</span>
m<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span> foldM<span style="color: #008000;">&#40;</span>Iter begin, Iter end, T i, function<span style="color: #000080;">&lt;</span>m<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>T, U<span style="color: #008000;">&#41;</span><span style="color: #000080;">&gt;</span> f<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">if</span><span style="color: #008000;">&#40;</span>begin <span style="color: #000080;">==</span> end<span style="color: #008000;">&#41;</span>
        <span style="color: #0000ff;">return</span> mreturn<span style="color: #008000;">&#40;</span>i<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">else</span>
        <span style="color: #0000ff;">return</span> mbind<span style="color: #008000;">&#40;</span>f<span style="color: #008000;">&#40;</span>i, <span style="color: #000040;">*</span>begin<span style="color: #008000;">&#41;</span>, <span style="color: #008000;">&#91;</span><span style="color: #000080;">=</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">&#40;</span>T result<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#123;</span>
            Iter next <span style="color: #000080;">=</span> begin<span style="color: #008080;">;</span>
            <span style="color: #000040;">++</span>next<span style="color: #008080;">;</span>
            <span style="color: #0000ff;">return</span> foldM<span style="color: #008000;">&#40;</span>next, end, result, f<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #008000;">&#125;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>
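<p>To make the intent of <code>foldM</code> concrete without waiting on a concepts-enabled compiler, here is a rough Python sketch of the same fold. It is not the C++ above: the Monad constraint is replaced by passing <code>mreturn</code> and <code>mbind</code> in explicitly, and the tuple-or-<code>None</code> Maybe encoding and <code>safe_div</code> are made up purely for illustration.</p>

```python
def fold_m(mreturn, mbind, items, init, f):
    """Left fold where every step f(acc, item) returns a monadic value."""
    items = list(items)
    if not items:
        return mreturn(init)
    # Run one step, then bind the rest of the fold onto its result.
    return mbind(f(init, items[0]),
                 lambda result: fold_m(mreturn, mbind, items[1:], result, f))


# Hypothetical Maybe monad: None is failure, a 1-tuple (x,) is success.
def maybe_return(x):
    return (x,)


def maybe_bind(m, k):
    return None if m is None else k(m[0])


def safe_div(acc, d):
    # Integer division that fails instead of raising on a zero divisor.
    return None if d == 0 else (acc // d,)


print(fold_m(maybe_return, maybe_bind, [2, 5], 100, safe_div))     # (10,)
print(fold_m(maybe_return, maybe_bind, [2, 0, 5], 100, safe_div))  # None
```

<p>The second call stops at the zero divisor: once a step yields <code>None</code>, <code>maybe_bind</code> never invokes the continuation, which is the short-circuiting a constrained <code>foldM</code> would get from the monad it is instantiated with.</p>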

<p>Have template template concepts been covered thoroughly somewhere, and I&#8217;ve just missed it?</p>
<ul style="list-style-type: none;">
<li>[1] &#8220;<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.2151">C++ templates/traits versus Haskell typeclasses</a>&#8221; (2005), by Sunil Kothari, Martin Sulzmann.</li>
<li>[2] &#8220;<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2617.pdf">Proposed Wording for Concepts (Revision 5)</a>&#8221; (2008).</li>
<li>[3] &#8220;<a href="http://herbsutter.com/2009/07/21/trip-report/">Trip Report: Exit Concepts, Final ISO C++ Draft in ~18 Months</a>&#8221; (2009), Herb Sutter.</li>
<li>[4] &#8220;ConceptClang: An Implementation of C++ Concepts in Clang&#8221; [<a href="http://www.generic-programming.org/software/ConceptClang/papers/wgp06v-voufo.pdf">pdf</a>]</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/concepts-typeclasses-for-cpp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Repls, repls, everywhere</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/repls-repls-everywhere/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/repls-repls-everywhere/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 16:46:41 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[languages]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=345</guid>
		<description><![CDATA[Have a new-years resolution to try out a new programming language, but in too much of a hurry to pick only one, or install anything? Online REPs and REPLs Today there&#8217;s 61 different languages on that list. That many, there&#8217;s &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/repls-repls-everywhere/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Have a New Year&#8217;s resolution to <a href="http://matt.might.net/articles/programmers-resolutions/">try out a new programming language</a>, but in too much of a hurry to pick only one, or install anything?</p>
<p><a href="http://joel.franusic.com/w/page/26128430/Online-REPs-and-REPLs">Online REPs and REPLs</a></p>
<p>Today there are 61 different languages on that list. With that many, there&#8217;s gotta be at least <em>one</em> that strikes your fancy.</p>
<p>Best batch execution site: <a href="http://ideone.com">ideone.com</a> with 50 unique languages.</p>
<p>Best interactive REPL site: <a href="http://repl.it/">repl.it</a> with 16 unique languages.</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/repls-repls-everywhere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bondage and Discipline Python</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/bondage-and-discipline-python/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/bondage-and-discipline-python/#comments</comments>
		<pubDate>Fri, 18 Mar 2011 05:00:23 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=320</guid>
		<description><![CDATA[Python has an identity crisis sometimes. It starts with the premise, from Guido&#8217;s prior work on ABC, to make a simple but easy to understand language. But then turns around and cries out &#8220;one way to do it&#8220;, leaving the &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/bondage-and-discipline-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Python has an identity crisis sometimes. It starts with the premise, from Guido&#8217;s prior work on ABC, of making a simple, easy-to-understand language.</p>
<p>But then it turns around and cries out &#8220;<a href="http://www.python.org/dev/peps/pep-0020/">one way to do it</a>&#8221;, leaving the programmer perplexed as to how Guido van Rossum thought we should do things. For example, Guido <a href="http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html">hates tail calls</a>, so recursion isn&#8217;t the one way to do it that he picked (note that his blog post and followups contain a large number of <a href="http://ra3s.com/wordpress/dysfunctional-programming/2009/05/18/the-python-debate-on-tail-calls/">factual errors</a>; read it as an opinion piece only).</p>
<p>During these fits, Python suffers itself to be a <a href="http://www.jargon.net/jargonfile/b/bondage-and-disciplinelanguage.html">bondage and discipline</a> language.</p>
<p>Apparently there is hope. One bit that causes me particular pain is that nested functions cannot rebind variables in the outer scope; only read from them. A case in point:</p>
<pre><code>def f():
  x = 1
  def doubleIt():
    x *= 2 # UnboundLocalError: local variable 'x' referenced before assignment
  doubleIt()
  doubleIt()
  return x</code></pre>
<p>However, I was reading a <a href="http://jedahu.blogspot.com/2010/08/why-i-like-factor.html">piece on Factor</a>, and it mentions that this restriction is lifted in Python 3.0. I&#8217;m still on 2.6 (the only differences I had been aware of were the somewhat arbitrary swapping of the &#8216;/&#8217; and &#8216;//&#8217; operators, and the insulting <em>removal</em> of <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=98196">map, filter, and reduce</a> from Python 3.0). However, fixing this scoping rule, even if an extra keyword is needed, sure would be convenient for me.</p>
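<p>For reference, the extra keyword in Python 3 is <code>nonlocal</code>. A minimal sketch of the earlier example rewritten for Python 3 (it will not parse on 2.6; the names are just carried over from the snippet):</p>

```python
def f():
    x = 1

    def double_it():
        nonlocal x  # Python 3 only: rebind x in the enclosing function's scope
        x *= 2

    double_it()
    double_it()
    return x


print(f())  # 4
```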
<p>Actually, there are <a href="http://docs.python.org/release/3.0.1/whatsnew/3.0.html">a lot of improvements</a>:</p>
<ul>
<li>Various APIs return views instead of lists (copies).</li>
<li><em>sorted</em> is now built in. (I&#8217;ve had to write that myself so many times&#8230;)</li>
<li>Apparently <em>map</em> and <em>filter</em> are still here (though he still went and put <em>reduce</em> all the way off in <em>functools</em>. I guess you can&#8217;t have it all -_-)</li>
<li>Set literals, yay!</li>
</ul>
<p>Now, if Guido would be so kind as to add proper statement support to lambdas, I&#8217;ll switch ^_^. Oh, but Guido hates lambdas. Oh well, next time maybe.</p>
<p><em>&#8211; signed an ambivalent Python user</em></p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/bondage-and-discipline-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The case of the different shifts</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/the-case-of-the-different-shifts/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/the-case-of-the-different-shifts/#comments</comments>
		<pubDate>Sat, 12 Feb 2011 18:05:53 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[trivia]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=309</guid>
		<description><![CDATA[Larry Osterman has commented on an interesting edge case in the C/C++ standards, involving the underflow of the right shift operator. They reported that if they compiled code which calculated: 1130149156 &#62;&#62; -05701653 it generated different results on 32bit and &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/the-case-of-the-different-shifts/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Larry Osterman has commented on an interesting edge case in the C/C++ standards, involving the underflow of the right shift operator.</p>
<blockquote><p>They reported that if they compiled code which calculated: 1130149156 &gt;&gt; -05701653 it generated different results on 32bit and 64bit operating systems.  On 32bit machines it reported 0 but on 64bit machines, it reported 0x21a.</p></blockquote>
<p>This is one of the various areas of &#8220;undefined behavior&#8221; where you can ask 2 engineers what it should do and get 3 answers, at least one of them being &#8220;but I know there must be one!&#8221; I think I know where at least 2 of the answers are coming from &#8230;<span id="more-309"></span></p>
<p><strong>The Hardware shift</strong></p>
<p>On x64, Larry mentions that the <a href="http://en.wikibooks.org/wiki/X86_Assembly/Shift_and_Rotate">sar instruction</a> is used to carry out the operation. This instruction is not new to the x86 architecture, and in that architecture line&#8217;s early days you did <em>not</em> waste die space. So what&#8217;s cheap?</p>
<p>Enter the <a href="http://en.wikipedia.org/wiki/Barrel_shifter">barrel shifter</a>. It&#8217;s cheap, it&#8217;s small, you pick the maximum power-of-2 shift you want, and that&#8217;s what you pay for. For a 16-bit shift, you can support shifts of 0 to 15 and pay for 4 stages. If you want to shift out all 16, you might think &#8220;add another stage&#8221; &#8212; but it gets worse! It only takes the shift amount as input. If you need to identify shifts greater than 31, then you&#8217;re toast.</p>
<p>No, what you consider doing is detecting shifts &gt; 15 and adding an extra stage to mask away the entire output. But wait, <a href="http://en.wikipedia.org/wiki/Intel_8086">it&#8217;s 1978</a>, and you have neither the die space nor the extra cycles to waste on a conditional masking stage (masking to 4 bits of shift input is free &#8212; you just don&#8217;t connect the other 12 input bits to anything). You just tell the programmer to never shift too much, use the bottom-most bits, and ignore the rest. (1 &gt;&gt; 128) == 1.</p>
<p>In time the processor is expanded to support 32-bit quantities and 64-bit quantities, but each time you need to remain compatible, so the sar instruction continues to mask away the upper bits.</p>
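<p>Here&#8217;s a small Python sketch of that masking behavior. It&#8217;s a model, not what the silicon does: <code>sar_masked</code> is a made-up name, Python&#8217;s <code>%</code> stands in for &#8220;don&#8217;t wire up the upper bits&#8221;, and I&#8217;m assuming the <code>-05701653</code> in Larry&#8217;s example is read as an octal literal, as C would read it.</p>

```python
def sar_masked(value, count, width):
    """Arithmetic right shift that only looks at the low bits of the count.

    width must be a power of two, so count % width keeps exactly the
    bottom log2(width) bits of the count and ignores everything above.
    """
    return value >> (count % width)


print(sar_masked(1, 128, 16))                       # 1, as in (1 >> 128) == 1
print(hex(sar_masked(1130149156, -0o5701653, 64)))  # 0x21a
```

<p>Under those assumptions the masked count works out to 21, which lands on the 0x21a that the 64-bit build reported.</p>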
<p><strong>The Programming Language</strong></p>
<p>I&#8217;m gonna gloss over the whole &#8220;<a href="http://en.wikipedia.org/wiki/Undefined_behavior">undefined behavior</a>&#8221; thing. Enough <a href="http://blog.regehr.org/archives/213">others</a> have discussed it. Even languages or implementations which aim to minimize it will occasionally miss edge cases or <a href="http://web.mit.edu/~axch/www/scheme/choices/r5rs-letrec.html">fall short</a>.</p>
<p>And C takes a very brutal route of making a great many things &#8220;undefined behavior&#8221; in the name of permitting performance shortcuts. For instance, signed arithmetic overflow yields undefined behavior. This might seem strange now, but at the time 2&#8217;s complement arithmetic was not common. Choosing 2&#8217;s complement wrap-around would have introduced perf penalties on 1&#8217;s complement architectures, as they did not generally include native 2&#8217;s complement instructions, and vice versa. A modern spin on this is special-purpose DSP chips that default to saturating arithmetic.</p>
<p>In any case, I imagine a similar case stands for shifting. One architecture may prefer precise shifting, but x86 prefers to take shift counts modulo the operand width. Rather than specify one behavior and penalize compilation on all other hardware, the C language says instead &#8220;this area off limits &#8212; if you want a particular slow-but-precise behavior, implement it yourself.&#8221;</p>
<p><strong>The Software Shift</strong></p>
<p>The 32-bit x86 CRT routine mentioned was _allshr (the extra leading underscore comes from the <a href="http://blogs.msdn.com/b/oldnewthing/archive/2004/01/08/48616.aspx">cdecl calling convention</a> on x86), and you can find its source in your Visual C++ product installation, under %programfiles%/Microsoft Visual Studio 10.0/vc/crt/src/intel/llshr.asm. It leads off:</p>
<div>
<pre>;
; Handle shifts of 64 bits or more (if shifting 64 bits or more, the result
; depends only on the high order bit of edx).
;
        cmp     cl,64
        jae     short RETSIGN</pre>
</div>
<p>Interesting &#8212; for large shifts, it bails early. Following it is another branch, switching between different implementations for 0-31 and 32-63 bit. In any case, the resulting 3 code paths are short, with the cl &gt;= 64 path being the shortest of all.</p>
<p>Now, the designer could have considered masking instead of comparing &#8212; but for random shift amounts, that would have been a <em>longer</em> code sequence to execute*. And by most reasonable interpretations, returning the sign bit <em>is</em> the right answer.</p>
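<p>A rough Python model of that early-out, to show why the 32-bit build answers 0 for the same expression. The assumptions are mine: <code>allshr_sketch</code> is a made-up name, the value is a signed 64-bit quantity, the count reaches the routine as its low byte (<code>cl</code>), and the literal is octal as C would read it. The real routine splits the 0-63 case into two paths over a register pair; this collapses them into one shift.</p>

```python
def allshr_sketch(value, count):
    """Model of a 64-bit signed right shift with an early-out for big counts."""
    cl = count % 256        # only the low byte of the count is examined
    if cl >= 64:
        return value >> 63  # nothing but sign bits: 0 or -1 for a 64-bit value
    return value >> cl


print(allshr_sketch(1130149156, -0o5701653))  # 0
print(allshr_sketch(-8, 200))                 # -1
print(allshr_sketch(-8, 1))                   # -4
```

<p>Here the low byte of the count comes out as 85, so the compare-and-bail path fires and only the sign survives.</p>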
<p><strong>So there you have it</strong></p>
<p>Each implementation is most efficient and reasonable, when and where it was designed, and both are permitted by a language standard that valued efficiency above most else. And in that respect, both implementations were perfect &#8230; at least, they were when originally written.</p>
<p>* yes, the conditional branch is probably a bigger cost than the savings&#8230; today. While I don&#8217;t know when this code was written, it mentions PROC NEAR, leading me to think that perhaps it comes from a time before <a href="http://en.wikipedia.org/wiki/Superscalar">superscalar architectures</a>, where branches were cheap(er) and arithmetic was costly. You might want to change it now, but that decision opens up <a href="http://en.wikipedia.org/wiki/Backwards_compatibility">back-compat</a> questions.</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/the-case-of-the-different-shifts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Typography for Lawyers</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/typography-for-lawyers/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/typography-for-lawyers/#comments</comments>
		<pubDate>Thu, 22 Apr 2010 15:00:05 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[non-tech]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=132</guid>
		<description><![CDATA[&#8230; And for everyone else. Typography for Lawyers; an easy read, very accurate, and despite the title, the typography advice is really for everybody. Well, maybe not for typographers. I get the impression this is all entry level stuff. For &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/typography-for-lawyers/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>&#8230; And for everyone else. <a href="http://www.typographyforlawyers.com/">Typography for Lawyers</a>; an easy read, very accurate, and despite the title, the typography advice is really for <em>everybody</em>.</p>
<p>Well, maybe not for typographers. I get the impression this is all entry level stuff.</p>
<p><span id="more-132"></span>For instance,</p>
<blockquote><p>I understand that many people were taught early in life to double-space their sentences. I was too. But double-spacing is a habit held over from the typewriter age. It has never been part of standard typography. Because typewriter fonts were unusually proportioned, a double space helped set off sentences better. Today, since we don’t use typewriter fonts, double spaces aren’t necessary or desirable.</p>
<p>Let’s see that paragraph again, but with double spaces:</p>
<p>I understand that many people were taught early in life to double-space their sentences.   I was too.   But double-spacing is a habit held over from the typewriter age.   It has never been part of standard typography.   Because typewriter fonts were unusually proportioned, a double space helped set off sentences better.   Today, since we don’t use typewriter fonts, double spaces aren’t necessary or desirable.</p></blockquote>
<p>Wow. Not only do they teach you, but they then demonstrate, <em>right there</em> in the middle of the lesson. It can&#8217;t be easier.</p>
<p>(Yes, I had to rewrite this post to remove all those double-space-after-periods that I too had been trained to type)</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/typography-for-lawyers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cache effects</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/cache-effects/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/cache-effects/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 05:00:35 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=126</guid>
		<description><![CDATA[Quick call out to an illustrative blog entry on various cache effects. http://igoro.com/archive/gallery-of-processor-cache-effects/ When someone bugs me about &#8220;X is too slow, because it has to make a virtual call&#8221;, and I get annoyed, it&#8217;s because a hot virtual call &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/cache-effects/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Quick call out to an illustrative blog entry on various cache effects.</p>
<p><a href="http://igoro.com/archive/gallery-of-processor-cache-effects/">http://igoro.com/archive/gallery-of-processor-cache-effects/</a></p>
<p>When someone bugs me about &#8220;X is too slow, because it has to make a virtual call&#8221;, and I get annoyed, it&#8217;s because a hot virtual call is an overhead of some dozen cycles or so.  Missing the cache?  In the thousands.  Don&#8217;t get me wrong, virtual calls can matter for many reasons, but that all flies out the window the moment you&#8217;re working on any non-trivial sized data set.  If your objects are hundreds of bytes large, you don&#8217;t worry about the virtual calls, you worry about shuffling their member slots around to squeeze more out of your caches.</p>
<p><strong><em><span style="font-style: normal; font-weight: normal;">&#8220;It better not allocate&#8221;</span></em></strong></p>
<p>My other favorite perf quote from this month: &#8220;It better not allocate &#8212; this call needs to take 100 microseconds or less&#8221;.  On my dev box, on the default Win7 heap, an uncontended small allocation (and the pairing free) is 120 ns &#8212; or 0.12 microseconds.  My personal favorite small object allocator can hit down to 0.020 microseconds sustained.</p>
<p>We could allocate hundreds of objects per call on the default heap, or thousands with the faster allocator, and still come in under budget.</p>
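<p>The back-of-the-envelope arithmetic, spelled out in Python using only the figures quoted above (the variable names are mine, and this ignores everything else the call has to do):</p>

```python
budget_ns = 100 * 1000   # the 100 microsecond budget, in nanoseconds
default_heap_ns = 120    # one small allocation plus its free, default Win7 heap
fast_allocator_ns = 20   # 0.020 microseconds per allocation, sustained

print(budget_ns // default_heap_ns)    # 833
print(budget_ns // fast_allocator_ns)  # 5000
```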
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/cache-effects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>[ot] 500 mile email bug</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/ot-500-mile-email-bug/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/ot-500-mile-email-bug/#comments</comments>
		<pubDate>Sat, 20 Feb 2010 03:23:03 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[humor]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=121</guid>
		<description><![CDATA[Just a humorous software bug story, the case of the 500 mile email: "We're having a problem sending email out of the department." "What's the problem?" I asked. "We can't send mail more than 500 miles," the chairman explained. I &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/ot-500-mile-email-bug/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Just a humorous software bug story, <a href="http://www.ibiblio.org/harris/500milemail.html">the case of the 500 mile email</a>:</p>
<blockquote>
<pre>"We're having a problem sending email out of the department."

"What's the problem?" I asked.

"We can't send mail more than 500 miles," the chairman explained.

I choked on my latte.  "Come again?"
</pre>
</blockquote>
<p>A must read for CS/IT folks.</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/ot-500-mile-email-bug/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My programming languages story</title>
		<link>http://ra3s.com/wordpress/dysfunctional-programming/my-programming-languages-story/</link>
		<comments>http://ra3s.com/wordpress/dysfunctional-programming/my-programming-languages-story/#comments</comments>
		<pubDate>Sat, 13 Feb 2010 18:14:47 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[languages]]></category>

		<guid isPermaLink="false">http://ra3s.com/wordpress/dysfunctional-programming/?p=109</guid>
		<description><![CDATA[It&#8217;s bad style, but I must start with an aside:  on reddit/scheme, there was a link to a blog series on developing a Scheme interpreter over January 2010.  It might not implement any particular Scheme standard or particularly many libraries, &#8230; <a href="http://ra3s.com/wordpress/dysfunctional-programming/my-programming-languages-story/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s bad style, but I must start with an aside:  on <a href="http://www.reddit.com/r/scheme/">reddit/scheme</a>, there was a link to a blog series on <a href="http://peter.michaux.ca/index#Scheme%20from%20Scratch">developing a Scheme interpreter</a> over January 2010.  It might not implement any particular Scheme standard or particularly many libraries, but it&#8217;s got all the functional elements.  <a href="http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29">Bootstrapping</a> a programming language is fun and easy.</p>
<p>Anyway, he also posted <a href="http://peter.michaux.ca/articles/my-road-to-lisp">his personal history</a> of programming language study, and it got me thinking about my own personal programming languages history.</p>
<p>It all started with Logo&#8230;</p>
<p><span id="more-109"></span></p>
<p>1992: <a href="http://en.wikipedia.org/wiki/Logo_%28programming_language%29">Logo</a>/LogoWriter.  After a basic intro class (4th grade) of moving a turtle around, I would occasionally try out various programming exercise cards, where I learned about variables, loops, and functions.  Unfortunately, the advanced exercises required a version of the software the school didn&#8217;t have.</p>
<p>1995: <a href="http://www.amazon.com/dp/1568840764">VB for dummies</a>, and a cloned copy of a <a href="http://en.wikipedia.org/wiki/Visual_basic">Visual Basic</a> 3 floppy diskette.  Remember floppy disks?  Yeah, me neither.  First real introduction to arrays, graphics and UI controls.  Wanted to make a clock app, so I learned trig early from my dad with sketches of triangles on a post-it note.  Some time later got to enjoy VB 6, GDI, and BitBlt for bitmapped graphics.  I remember hating BitBlt, it seemed to have a really annoying set of params; I wished it was written in a real programming language, like VB for instance.</p>
<p>1999: <a href="http://en.wikipedia.org/wiki/C_programming">C</a>, <a href="http://en.wikipedia.org/wiki/Java_programming">Java</a>.  I don&#8217;t remember the exact order, but I made the jump from VB to C (with thanks to <a href="http://www.amazon.com/dp/0131103628/">K&amp;R</a>), and mode 13h graphics.  That was the simple mode, right?  First introduction to algorithms; wrote a 2d &#8216;Doom&#8217;-style raytracer after reading about it in one of those so-so <a href="http://www.amazon.com/dp/0672305623">game programming in 21 days</a> books.  I say &#8216;so-so&#8217; because to even have a hope of teaching anything about a programming topic in <a href="http://norvig.com/21-days.html">only 21 days</a>, a book either has to have a very narrow focus or, in this case, absolutely no depth. Also spent a little time in C++, but found it very complicated and confusing; I only got as far as I did because the AP test for Computer Science was in C++ the year I was taking it. Finally finished up this period hacking around in Java, made a bunch of browser Applets, as was the style at the time.</p>
<p>2000: <a href="http://en.wikipedia.org/wiki/VHDL">VHDL</a>, <a href="http://en.wikipedia.org/wiki/Perl">Perl</a>, <a href="http://en.wikipedia.org/wiki/Java_(programming_language)">Java</a> &#8211; Skipped intro CS courses.  Learning VHDL for digital logic course, and Java for algorithms courses.  Picked up a book on Perl for the fun of it, used it to parse and process Counter-Strike logs and generate webpages of stats.</p>
<p>2001: back to <a href="http://en.wikipedia.org/wiki/BASIC">Basic</a>s &#8211; Spent the summer working at a computer camp for kids, instructing in robotics kits programmed in Parallax Basic.  Later that year, switched back to Visual Basic, this time the <a href="http://en.wikipedia.org/wiki/Visual_basic_.net">.Net</a> version, which now rivaled Java in functionality.  This ended up being my language of choice for some time, marrying true OO semantics like Java&#8217;s with the wonderful Basic syntax.</p>
<p>2003: honed my programming and design skills finishing my CS degree.  Touched <a href="http://en.wikipedia.org/wiki/Prolog">Prolog</a> and <a href="http://en.wikipedia.org/wiki/Lisp_%28programming_language%29">Lisp</a> briefly in one of those programming languages survey courses, but didn&#8217;t have a particularly good instructor.  Senior year I interned at Microsoft where I got to dig back into C for a while.</p>
<p>2004: <a href="http://en.wikipedia.org/wiki/C_Sharp_%28programming_language%29">C#</a> &#8211; Started full-time at Microsoft, where I was to be working in C#, so I learned C#.  &#8220;Oh, this is basically VB.Net with a Java-like syntax.&#8221;  Heh, yeah, sounds funny now maybe, but only because (statistically) you don&#8217;t know anyone who uses VB.Net.  I still say, for CLR 1.0 languages, it had the better syntax.  Learned C# 2.0 <a href="http://en.wikipedia.org/wiki/Generic_programming">generics</a> while they were fresh off the line, and learned a bunch of stuff about the difference between the CLR&#8217;s programming model and that of C.</p>
<p>2006: <a href="http://en.wikipedia.org/wiki/C%2B%2B">C++</a> &#8211; Was told I was to be working in C++, so I learned C++.  <a href="http://www.amazon.com/dp/0596004966">C++ Pocket Reference</a> for syntax on the go, Stroustrup&#8217;s <a href="http://www.amazon.com/dp/B000MRSQUM">TC++PL</a> 2nd ed for semantics (make sure you get the special edition though), Meyers&#8217; <a href="http://www.amazon.com/dp/0321334876">Effective C++ 3rd ed</a> for learning a huge selection of gotchas and design issues, and Alexandrescu&#8217;s <a href="http://www.amazon.com/dp/0201704315">Modern C++ Design</a> for teaching me an exciting selection of generic and <a href="http://en.wikipedia.org/wiki/Generic_programming">generative</a> programming techniques using templates.</p>
<p>2008: <a href="http://en.wikipedia.org/wiki/Python_%28programming_language%29">Python</a>, <a href="http://en.wikipedia.org/wiki/Javascript">JavaScript</a> &#8211; Sparked by a conversation with a co-worker, started learning Python.  Was really surprised by how much easier many programming problems became.  I realized I didn&#8217;t really have a good idea of what was out there.  Tinkered with JavaScript.  Started looking into programming language <a href="http://en.wikipedia.org/wiki/History_of_programming_languages">origins</a>.</p>
<p>2009: <a href="http://en.wikipedia.org/wiki/Lisp_%28programming_language%29">Lisp</a>, <a href="http://en.wikipedia.org/wiki/Scheme_%28programming_language%29">Scheme</a>, <a href="http://en.wikipedia.org/wiki/Haskell_%28programming_language%29">Haskell</a>, <a href="http://en.wikipedia.org/wiki/Ml_programming_language">ML</a> &#8211; Inevitably a pass through Lisp.  It seems a lot of programmers go through this phase at one point, and it makes sense; <a href="http://www-formal.stanford.edu/jmc/history/lisp/node2.html">McCarthy created Lisp</a> precisely to describe computing, and that clarity is refreshing.  Scheme, and the book <a href="http://www.amazon.com/dp/0262011530">SICP</a>, introduced me to different computing models including first class continuations, <a href="http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-24.html#%_sec_3.5">streams</a> (aka lazy lists), and the <a href="http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-28.html#%_sec_4.3">amb operator</a>.  Haskell led to more exploration of lazy evaluation, a better understanding of generics, and an appreciation for its supremely practical <a href="http://www.haskell.org/tutorial/classes.html">typeclasses</a>.</p>
<p>Currently tinkering with Standard ML, as a simplified alternative to Haskell; hacking together small tasks in Python; and whatever my new day job brings in.</p>
<p>~~~</p>
<p>Oh, and somewhere in school I briefly tried &#8220;eMbedded Visual Basic&#8221;.  That&#8217;s where I first learned it&#8217;s possible to start with a good programming language and cut it down to something similar that *<strong>really</strong>* sucks.  For those who never heard of it: they removed a variety of language features, including user-defined types; the compiler error messages were inscrutable; and it was slow.</p>
<p>If you were a developer on that project, I&#8217;m <em>really</em> sorry, but better languages have been implemented <a href="http://groups.csail.mit.edu/mac/projects/s48/">over a weekend</a>.  I don&#8217;t blame you.  Maybe you didn&#8217;t have a whole weekend.  Or were held hostage.  That would explain it, I think.</p>
]]></content:encoded>
			<wfw:commentRss>http://ra3s.com/wordpress/dysfunctional-programming/my-programming-languages-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

