Wednesday, August 6, 2008

Python: know when to be lazy

Lazy is good. One Python trend I've noticed quite a bit is the severe under-usage of generators. I often see list comprehension being used in situations where building a temporary list is not only wasteful, but more work.

Example:

print sum([x * 2 for x in something])

In this case, LC is being used to build a temporary list whose only purpose is to be iterated by "sum". Instead of constructing this list, why not pass an iterable object to "sum" and have it accumulate the sum from that? Well, not only can you, but it's even less to type (a whopping two characters less):

print sum(x * 2 for x in something)

Simply excluding the square brackets takes it from a list-comp to a generator expression, which is lazily evaluated. To demonstrate, proof that generators don't actually compute anything until they are pulled:

def f(x):
print x

a = [f("LC") for x in xrange(10)]
b = (f("GE") for x in xrange(10))

print "outside"
b.next()

print "outside"
b.next()

So naturally, these can be a really useful and compact way to transform a sequence. But the generator-expression syntax doesn't really allow for very complex stuff... Lucky Python also supports generator functions using the "yield" keyword that allows you to lazily preserve state between calls to "next"

def powersof2():
x = 1
while True:
yield x
x *= 2

g = powersof2()
print g.next()
print g.next()
print g.next()

And then, another thing I see is similar use of the map, filter, zip functions, which here in Python2.5 land return an entire list (this won't be an issue in Python3, however). For the time being, if you need a lazy map/filter, either use generator expressions or use the itertools module

from itertools import imap, ifilter

def even(x):
return not(x % 2)

print sum(ifilter(even, imap(int, "1 2 3 4 5 6 7 8 9".split())))

In this case, the imap/ifilter works better than generator expressions (IMO) because of the need for the strings as integers in the "even" function. With GE I'm forced to either call the int function twice, or to break it up across lines:

print sum(int(x) for x in "1 2 3 4 5 6 7 8 9".split() if even(int(x)))
# OR
ints = (int(x) for x in "1 2 3 4 5 6 7 8 9".split())
print sum(x for x in ints if even(x))

But, since this is lazy, the filtering is done as the mapping takes place (neither are bundled into any temporary lists or anything)

Likewise, using "range" instead of "xrange" for integer iteration is also a wasteful practice (this will be fixed in Python3 as well).

To wrap it up, just for fun (no practical value to a Python coder at all), how would you make a lazy sequence in Python if it didn't support generators or iterators? The way the lisp gods did before us... (lazy cons streams)

def fibonacci(a=0, b=1):
return (a, lambda: fibonacci(b, a + b))

def take(n, seq):
if n:
return (seq[0], lambda: take(n - 1, apply(seq[1])))

def lmap(f, seq):
while seq:
yield f(seq[0])
seq = apply(seq[1])

print sum(lmap(int, take(1000, fibonacci())))

(I know, I cheated with the lmap function, but Pythons recursive limit makes it a bit impractical)