Tuesday, November 25, 2008

Trampolining for recursion...

In essence, CPS (continuation passing style) is a method of calling functions where, instead of returning a value, you pass the result to another function (the continuation) provided as an argument. I won't go too deep into the reasoning behind this, as it's usually not used directly but rather as an intermediate representation of programming languages inside some compiler/interpreter (read more at the wikipedia link).
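
As a tiny made-up illustration (add/add_cps are just throwaway names, not from any particular compiler), compare a direct-style function with its CPS counterpart:
def add(a, b):
    return a + b             # direct style: the result is returned

def add_cps(a, b, k):
    return k(a + b)          # CPS: the result is handed to the continuation k

print add(1, 2)
print add_cps(1, 2, lambda result: result)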

The big downside is that you're bombarding the stack with function frames for every call, going deeper and deeper, but never returning (you just keep passing what would be the return value to another function).
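
To make that concrete, here's a contrived sketch (countdown_cps is just a throwaway example): even a trivial CPS-style loop piles up one stack frame per step and trips Python's recursion limit long before it finishes:
def countdown_cps(n, k):
    if n == 0:
        return k(n)
    return countdown_cps(n - 1, k)     # every "tail call" still adds a stack frame

print countdown_cps(100000, lambda x: x)   # blows past Python's default recursion limit (~1000)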

One nifty method of avoiding this is to use a trampoline. A trampoline resembles a stream/lazy-list to anyone who's read SICP. Essentially, instead of calling functions from within another function, you return a bounce "call" to the trampoline, which makes the call for you, doing the same with the return value of that call until it gets a land (a non-bounce return). This avoids using the stack for function calls, while still letting you use function closures as a means of saving state.

That being said, here is a simple trampoline implementation in Python showing a recursive fibonacci function implemented in CPS:
class Call:
    # a deferred call (a "bounce") that the trampoline will make for us
    def __init__(self, fn, *args, **kwargs):
        self.call = lambda: fn(*args, **kwargs)

def trampoline(obj):
    # keep bouncing until something other than a Call (a "land") comes back
    while isinstance(obj, Call):
        obj = obj.call()
    return obj

def fibonacci(k, n):
    if n < 2:
        return Call(k, n)
    def with_a(a):
        def with_b(b):
            return Call(k, a + b)
        return Call(fibonacci, with_b, n - 1)
    return Call(fibonacci, with_a, n - 2)

call = fibonacci(lambda x: x, 20)

print trampoline(call)
Here is a normal fibonacci function, memoized, that fails by exceeding Python's recursion limit. The problem, of course, is that you have to recurse all the way down to the leaf calls before their values can accumulate back up to the trunk:
def memoized(func):
    cache = {}
    def decorator(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return decorator

@memoized
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)

print fibonacci(1000)
Here, however, is essentially the same function (memoized) implemented using a trampoline. The trampoline allows it to "recurse" all the way to the leaf calls because the calls are actually being made external to the function and state is being preserved in the with_a/with_b closures:
def kmemoized(func):
    cache = {}
    def decorator(k, *args):
        if args in cache:
            return Call(k, cache[args])
        def with_value(value):
            cache[args] = value
            return Call(k, value)
        return Call(func, with_value, *args)
    return decorator

@kmemoized
def fibonacci(k, n):
    if n < 2:
        return Call(k, n)
    def with_a(a):
        def with_b(b):
            return Call(k, a + b)
        return Call(fibonacci, with_b, n - 1)
    return Call(fibonacci, with_a, n - 2)
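For comparison with the plain recursive version that dies above, a quick usage sketch (reusing Call and trampoline from the first snippet) pushing the same input through the trampoline:
print trampoline(fibonacci(lambda x: x, 1000))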
Practical for typical application development? Probably not...

A neat concept to play around with (or an interesting strategy for generating Python code)? Definitely!

(Somewhat) related wikipedia articles: continuation, functional-programming, tail-call-optimization, thunk, closure

Wednesday, August 6, 2008

Python: know when to be lazy

Lazy is good. One Python trend I've noticed quite a bit is the severe under-use of generators. I often see list comprehensions being used in situations where building a temporary list is not only wasteful, but more work.

Example:

print sum([x * 2 for x in something])

In this case, the list comprehension is being used to build a temporary list whose only purpose is to be iterated over by "sum". Instead of constructing this list, why not pass an iterable object to "sum" and have it accumulate the sum from that? Well, not only can you, it's even less to type (a whopping two characters less):

print sum(x * 2 for x in something)

Simply dropping the square brackets takes it from a list comprehension to a generator expression, which is lazily evaluated. To demonstrate, here's proof that generators don't actually compute anything until values are pulled from them:

def f(x):
    print x

a = [f("LC") for x in xrange(10)]   # prints "LC" ten times immediately
b = (f("GE") for x in xrange(10))   # prints nothing yet

print "outside"
b.next()                            # prints "GE"

print "outside"
b.next()                            # prints "GE" again

So naturally, these can be a really useful and compact way to transform a sequence. But the generator-expression syntax doesn't really allow for very complex stuff... Luckily, Python also supports generator functions using the "yield" keyword, which let you lazily preserve state between calls to "next":

def powersof2():
    x = 1
    while True:
        yield x
        x *= 2

g = powersof2()
print g.next()
print g.next()
print g.next()

Another thing I see is similar use of the map, filter, and zip functions, which here in Python2.5 land return an entire list (this won't be an issue in Python3, however). For the time being, if you need a lazy map/filter, either use generator expressions or reach for the itertools module:

from itertools import imap, ifilter

def even(x):
    return not (x % 2)

print sum(ifilter(even, imap(int, "1 2 3 4 5 6 7 8 9".split())))

In this case, imap/ifilter work better than a generator expression (IMO) because the "even" function needs the strings as integers. With a GE I'm forced to either call the int function twice, or break it up across lines:

print sum(int(x) for x in "1 2 3 4 5 6 7 8 9".split() if even(int(x)))
# OR
ints = (int(x) for x in "1 2 3 4 5 6 7 8 9".split())
print sum(x for x in ints if even(x))

But since this is lazy, the filtering is done as the mapping takes place (neither step is buffered into any temporary lists or anything).
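
To see that nothing is buffered along the way, here's a small throwaway sketch (noisy_int is just a helper for demonstration) showing that values flow through the imap/ifilter chain one at a time, only when "sum" pulls them:

from itertools import imap, ifilter

def even(x):
    return not (x % 2)

def noisy_int(x):
    print "converting", x
    return int(x)

lazy = ifilter(even, imap(noisy_int, "1 2 3 4".split()))
print "pipeline built, nothing converted yet"
print sum(lazy)   # the "converting" lines only appear now, one element at a time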

Likewise, using "range" instead of "xrange" for integer iteration is also a wasteful practice (this will be fixed in Python3 as well).
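
A quick throwaway sketch of the difference: range materializes the whole list up front, while xrange hands out the numbers on demand:

print sum(range(10 ** 6))    # builds a list of a million ints, then sums it
print sum(xrange(10 ** 6))   # sums the numbers as they're generated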

To wrap it up, just for fun (no practical value to a Python coder at all): how would you make a lazy sequence in Python if it didn't support generators or iterators? The way the Lisp gods did before us... (lazy cons streams):

def fibonacci(a=0, b=1):
    # a lazy "cons cell": (head, thunk that produces the rest of the stream)
    return (a, lambda: fibonacci(b, a + b))

def take(n, seq):
    if n:
        return (seq[0], lambda: take(n - 1, apply(seq[1])))

def lmap(f, seq):
    while seq:
        yield f(seq[0])
        seq = apply(seq[1])

print sum(lmap(int, take(1000, fibonacci())))

(I know, I cheated with the lmap function, but Python's recursion limit makes it a bit impractical otherwise)

Wednesday, July 23, 2008

GAE: Batch operations

Given the limit on individual requests with GAE, doing things like live batch updates to your database can be an interesting challenge. I recently had to update all of the votes for Challenge-You! to support a new rating system. The best approach I found was an AJAX-style solution: have one request return a list of references to the entities you need to change (if you're working with more than 1000 entities, you might want to partition the list into chunks by date or some other order and work with one chunk at a time). Keep those references in a stack in Javascript, pop them off one by one, and send each to the controller that applies the change, repeating until the stack is empty. If the operations aren't too expensive, you could even send a handful of them per request to speed things up. It's also trivial to implement some kind of progress bar or other graphical update to keep you informed.

An example of the Javascript code (using jQuery) that handles the client side:
var data = null;

function doupdate() {
    if(data.length == 0) return;
    $.ajax({
        type: "POST",
        url: "doupdate",
        data: "id=" + data.pop(),
        success: function(msg) {
            setTimeout(function() {
                doupdate();
            }, 1000);
        }
    });
}

$(document).ready(function() {
    $.ajax({
        type: "POST",
        url: "getlist",
        data: "",
        success: function(msg) {
            data = msg.split(" ");
            doupdate();
        }
    });
});

In this example, my site would have two URLs for handling this: "getlist", which returns a space-delimited list of references/ids (however you're handling the entities), and "doupdate", which takes in one reference and applies the needed change.
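
The server side isn't shown here, but a minimal sketch of the two handlers (assuming the webapp framework, with a hypothetical Vote model standing in for whatever your entities actually are) might look something like:
from google.appengine.ext import webapp, db

class Vote(db.Model):                    # hypothetical model; yours will differ
    rating = db.IntegerProperty()

class GetList(webapp.RequestHandler):
    def post(self):
        # return a space-delimited list of entity keys (one chunk's worth)
        votes = Vote.all().fetch(1000)
        self.response.out.write(" ".join(str(v.key()) for v in votes))

class DoUpdate(webapp.RequestHandler):
    def post(self):
        # look up the single entity whose key the client just popped off its stack
        vote = db.get(db.Key(self.request.get("id")))
        vote.rating = vote.rating + 1    # stand-in for whatever change you need
        vote.put()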

Sunday, July 20, 2008

Speed kiddies

Speed kiddie:

Someone who learns or uses a low-level (often compiled) language for the constant-coefficient performance increase. These people often have no regard for computational complexity and are usually confused by asymptotic notation. They frequently suffer from severe cases of premature optimization and believe that manageability comes second to runtime speed (in all applications). You can often find them trolling around web forums bragging about their pointer or bit-manipulation skills (the code is generally characterized by non-reusable design and a significant lack of readability).

Examples:
  • "What do you mean my algorithm runs at O(n!)? It's twice as fast as yours with this small test set"
  • "How do you write efficient code in Python... It doesn't even have pointers!"
  • "I don't need to write readable code, I'm not going to forget my own code..."
  • "High level languages are for the weak, real men enjoy pain... PAIN!!!"
  • "Lisp doesn't offer anything new to me, C has macros too!"
  • "RAWRRR POOINTERRRZZZ!!!!!!!!!!!!!!!!!!"

Sunday, January 13, 2008

Javascript == bad

Along the way while writing Epy, I've noticed some things about Javascript (a language I once considered perfectly reasonable) that are in serious need of a revamp (at least at the time of writing this). Here are just a few of the (many) issues I have with Javascript:

  1. Namespaces. In the world of most Javascript, everything is global or function-scope, there seems to be no use of namespace or module-style development at all. It's possible to simulate namespaces by wrapping entire modules in an object definition, but this is usually too much of a hassle for practicality.
  2. Objects. Javascript's object system is, well, crap. Sure, it's ad-hoc like Python, but it has no formal way of creating simple constructors. You can use the "this" object in a function, or create a new instance of "Object" to append methods to, but there are no standards for constructing an object.
  3. Objects do not support destructors. Many might argue that a memory-safe language like Js doesn't need them, but destructors have more use than freeing memory. They serve a very useful purpose in a lot of OO designs, memory-safe or not.
  4. Javascript does not support associative arrays! You can use the "Array" object, or even the "Object" object to assign associative values to using the index operators, but this is deceiving... Any time you use "obj['index'] = something" you could replace it with "obj.index = something" which causes major problems with naming collisions. This style of object/hashtable is bad, there should be a clear distinction between the two.
  5. Javascript is not cross-browser compatible... Don't get me started...
Yet Javascript remains at the heart of our internet! Projects like Epy could help to solve some of these problems by abstracting away the issues, allowing you to develop your client-side scripting in a more pleasant syntax (such as Python). But maybe someday we'll just use another language altogether :)