Jan 8, 2015

Expressive Functional Programming with Continuations in Python

In Python, statements and expressions are separate and unequal citizens. For example, statements can contain expressions but expressions can't contain statements. This prevents things like full-powered anonymous functions which can do everything that named functions can, or expression calculations which contain statements like 'import' etc.

I wanted to fix this, to give Python an expressiveness which I thought it was missing. I first thought that we could introduce a new syntax token which could 'wrap' up any number of statements and return the value of the final expression in the block. This turned out to not be feasible, for a couple of reasons, as discussed in the mailing list.

Then I had the idea that we can turn all statements into expressions. In Python, function calls are expressions. So statements can be wrapped up inside functions to turn them into expressions. Since there aren't that many statements in Python, it's feasible to just do this for all of them. Each of these functions can take a continuation to represent 'the rest of the computation'. If all statements become expressions, and so everything is an expression, this lets us define full anonymous functions because now everything inside the anonymous function can be a single expression!

Implementation

To manage the continuations, we first define a data structure: a pair of (continuation, expression) which is passed around among the functions we'll define later, named whatever_. expression in the pair is meant to represent the input to the continuation, if it takes one.

A 'dead' continuation doesn't take an input value.

    def __dead(k): return (k, None)

A 'live' continuation takes an input value.

    def __live(k, x): return (k, x)

Continuations are run using trampolining to prevent stack overflow, allowing us to overcome the restriction of Python not optimising tail calls. The trampoline takes care of running each continuation with input or not depending on whether it's live or dead.

    def run_k_(prog):
      (k, x) = prog

      while k is not None:
        if x is None: (k, x) = k()
        else: (k, x) = k(x)

In short, a trampoline function like the above converts recursion into iteration. If you want to learn more about trampolines, here's a beautifully simple description of the idea and some examples in Python.

Some 'Syntax' Definitions

Below I define some 'syntax' in the form of functions which use continuation-passing style (CPS). So as discussed above, these functions are wrappers for the real syntax, and turn statements into expressions. Note that some of them are fairly simple versions of the real functionality.

A print function which just prints something and then doesn't return a value, it just sets up the rest of the computation to go ahead when it returns. This pattern can be used for all the Python statements.

    def print_(x, k):
      print x
      return __dead(k)

A primitive version of a 'let' binding. This 'returns' a value, i.e. the expr, bound to the parameter name in the continuation lambda. This pattern can be used for all expression-oriented syntax.

    def let_(expr, k): return __live(k, expr)

An assertion 'statement'.

    def assert_(expr, k):
      assert expr
      return __dead(k)

An if_ 'expression' can 'return' one of two values--so in other words it can pass on either of its input expressions to its continuation.

    def if_(test_expr, then_expr, else_expr, k):
      if test_expr: return __live(k, then_expr())
      else: return __live(k, else_expr())

A cond_ is basically a switch statement, but in the form of an expression that again 'returns' a value. Can be used as a replacement for Python's if ... elif ... else ... syntax.

    def cond_(test_pairs, default_expr, k):
      for (test_expr, then_expr) in test_pairs:
        if test_expr: return __live(k, then_expr())
      else: return __live(k, default_expr())

We can import a module and pass it along to a continuation, which will then use it and pass it along implicitly to its children continuations through the magic of closures. This is almost like a 'let' binding but it binds the name to the imported module instead of to an expression.

    def import_(mod_name, k):
      m = __import__(mod_name)
      return __live(k, m)

The try_ function is different from the others because it actually runs the 'try' continuation and (if needed) the 'except' continuation. This is because it can't just set up two different continuations to be run. It has to run one first to actually find out if there's an exception or not. It returns the 'finally' continuation because the 'finally' block should always be run whether or not an exception occurred, so it's a natural fit for being a continuation.

    def try_(expr_k, except_k, finally_k):
      try: run_k_(__dead(expr_k))
      except: run_k_(__dead(except_k))
      finally: return __dead(finally_k)

The for_ function also needs to run its act continuation on all the items in its seq (sequence), because there may be a lot of items and just queuing them up for running might build up a huge data structure. It's more efficient to flatten the structure at this point and then just set up the ending continuation after all that.

    def for_(seq, act, k):
      for x in seq: run_k_(__live(act, x))
      return __dead(k)

Usage

The end result is that we build and run a large expression that looks kind of like normal code on the left, but ends with a bunch of lambdas on the right of each line. Because Python doesn't have macros or autoclosures, we can't hide the fact that we're passing around lots of functions.

Note that we stop the 'program' at any point by passing in None as the next continuation. You can see this in the functions stored inside the dict below. Also note how we're storing code that looks pretty imperative inside the functions. If you squint a little bit you can kind of ignore the extra clutter and think of each line as a separate 'imperative' statement. Indentation becomes arguably more important here than in normal Python for readability.

Finally, remember that run_k_ ('run continuation') runs the whole thing. As long as we compose our program using only functions specially designed to work with the trampolining mechanism, like the ones above, it should all run fine.

    if __name__ == "__main__":
      run_k_(
        let_(
          { "foo":
              λ:
                print_("I pity the foo!", λ:
                for_(range(5), λ x:
                  print_(x, None), None)),
            "bar":
              λ:
                import_("math", λ m:
                let_(5, λ r:
    
                print_("The area of the circle is:", λ:
                print_(m.pi * r * r, None)))),
            "baz":
              λ:
                print_("None of your bazness!", None) },
          λ dispatch_dict:

        # Call to the function inside the dispatch dict is also handled by
        # the trampolining mechanism.
        dispatch_dict["bar"]()))

For the sake of readability, I've replaced all occurrences of the keyword 'lambda' above with the Greek symbol 'λ'. Of course in real Python code we'll use the keyword (search and replace should do it).

Obviously, you won't want to program like this in normal Python. It would drive people up the wall with crazy. But for specialised use cases like storing a lot of functions inside other data structures, callback-based event handling when you want to do something more complex than just call a function, or building DSLs in Python (hey, worse things have happened), this expressive method could come in very handy.

Update: this article describes a simpler, more primitive version of what I currently have. You can follow the latest developments at the GitHub repository.

No comments: