Sunday, February 22, 2009

The names behind the languages

If we're going to continue to spill blood in these language wars, it's important that we all know who our generals are (and who our enemy generals are). Feel free to offer corrections or extensions to the list.

C - Dennis Ritchie







Perl - Larry Wall
 








Monday, January 19, 2009

Venn Diagrams with google charts

I recently had the need to view some data sets as venn diagrams, so I found google's chart api and hacked out a little AJAX application to do so, you can try it out here. And, for the fun of it, a VERY simple Python implementation:



import urllib
from PIL import Image
import StringIO

def venn_diagram_url(a, b, c, width=400, height=200):
values = [width, height, len(a), len(b), len(c)]
values.append(len(a & b))
values.append(len(a & c))
values.append(len(c & b))
values.append(len(a & b & c))
base_str = 'http://chart.apis.google.com/chart?'
args_str = 'cht=v&chs=%sx%s&chd=t:%s,%s,%s,%s,%s,%s,%s'
return base_str + (args_str % tuple(values))

a = set((1, 2, 3))
b = set((2, 4))
c = set((3, 4, 5))

data = StringIO.StringIO(urllib.urlopen(venn_diagram_url(a, b, c)).read())
Image.open(data).show()


Not much to it... But maybe someone will find it useful.

Tuesday, November 25, 2008

Trampolining for recursion...

In essence, CPS (continuation passing style) is a method of calling functions where instead of returning a value, you pass the results to another function (the continuation) provided as an argument. I wont go too deep into the reasoning behind this, as it's usually not used directly but rather as a middle representation of programming languages by some compiler/interpreter (read more on the wikipedia link).

The big downside is that you're bombarding the stack with function frames for every call, going deeper and deeper, but never returning (you just keep passing what would be the return value to another function).

One nifty method of avoiding this is to use a trampoline. A trampoline resembles a stream/lazy-list to anyone who's read SICP. Essentially, instead of calling functions from within another function, you return a bounce "call" to the trampoline which makes the call for you, doing the same with the return value of that call until it gets a land (a non-bounce return). This avoids the use of the stack for function calls, but allows you to use function closure as a means of saving state.

That being said, here is a simple trampoline implementation in Python showing a recursive fibonacci function implemented in CPS.

Here is a normal fibonacci function, memoized, that fails due to exceeding Pythons recursion limit. The problem being, of course, that you have to recurse down deep enough to use the leaf calls to accumulate to the trunk...

Here, however, is essentially the same function (memoized) implemented using a trampoline. The trampoline allows it to "recurse" all the way to the leaf calls because the calls are actually being made external to the function and state is being preserved in the with_a/with_b closures.

Practical for typical application development? Probably not...

A neat concept to play around with? Definitely!

(Somewhat) related wikipedia articles: continuation, functional-programming, tail-call-optimization, thunk, closure

Wednesday, August 6, 2008

Python: know when to be lazy

Hanging around on Challenge-You! has exposed me to some good code, and some bad code... But mostly bad code :) It's an opportunity to take in the different approaches people use when solving the same problem, and the trends that exist between languages and apparent skill-level. One Python trend I've noticed quite a bit is the severe under-usage of generators. I often see list comprehension being used in situations where building a temporary list is not only wasteful, but more work.

Example:

print sum([x * 2 for x in something])

In this case, LC is being used to build a temporary list whose only purpose is to be iterated by "sum". Instead of constructing this list, why not pass an iterable object to "sum" and have it accumulate the sum from that? Well, not only can you, but it's even less to type (a whopping two characters less):

print sum(x * 2 for x in something)

Simply excluding the square brackets takes it from a list-comp to a generator expression, which is lazily evaluated. To demonstrate, proof that generators don't actually compute anything until they are pulled:

def f(x):
print x

a = [f("LC") for x in xrange(10)]
b = (f("GE") for x in xrange(10))

print "outside"
b.next()

print "outside"
b.next()

So naturally, these can be a really useful and compact way to transform a sequence. But the generator-expression syntax doesn't really allow for very complex stuff... Lucky Python also supports generator functions using the "yield" keyword that allows you to lazily preserve state between calls to "next"

def powersof2():
x = 1
while True:
yield x
x *= 2

g = powersof2()
print g.next()
print g.next()
print g.next()

And then, another thing I see is similar use of the map, filter, zip functions, which here in Python2.5 land return an entire list (this won't be an issue in Python3, however). For the time being, if you need a lazy map/filter, either use generator expressions or use the itertools module

from itertools import imap, ifilter

def even(x):
return not(x % 2)

print sum(ifilter(even, imap(int, "1 2 3 4 5 6 7 8 9".split())))

In this case, the imap/ifilter works better than generator expressions (IMO) because of the need for the strings as integers in the "even" function. With GE I'm forced to either call the int function twice, or to break it up across lines:

print sum(int(x) for x in "1 2 3 4 5 6 7 8 9".split() if even(int(x)))
# OR
ints = (int(x) for x in "1 2 3 4 5 6 7 8 9".split())
print sum(x for x in ints if even(x))

But, since this is lazy, the filtering is done as the mapping takes place (neither are bundled into any temporary lists or anything)

Likewise, using "range" instead of "xrange" for integer iteration is also a wasteful practice (this will be fixed in Python3 as well).

To wrap it up, just for fun (no practical value to a Python coder at all), how would you make a lazy sequence in Python if it didn't support generators or iterators? The way the lisp gods did before us... (lazy cons streams)

def fibonacci(a=0, b=1):
return (a, lambda: fibonacci(b, a + b))

def take(n, seq):
if n:
return (seq[0], lambda: take(n - 1, apply(seq[1])))

def lmap(f, seq):
while seq:
yield f(seq[0])
seq = apply(seq[1])

print sum(lmap(int, take(1000, fibonacci())))

(I know, I cheated with the lmap function, but Pythons recursive limit makes it a bit impractical)

Friday, July 25, 2008

GAE is insane!

A new google app engine update now allows for 10, yes TEN free apps for everybody (as well as downloadable logs)! Now I have no excuse not to give all of my project ideas a try, I guess the question is "what next?"

Some quick updates on my first GAE project, Challenge-You! (programming challenges and pastebin, soon to be generally code-centric site):
  • Forfeiting (if a challenge is too hard, or if you don't have time to solve it and just want to get straight to the answer and solutions, you can forfeit, but then you will be unable to submit a solution yourself and will be added to the our hall-of-shame for that challenge)
  • Downloading solution and pastebin source code in various formats (HTML, RTF, LaTex, SVG, BBCode, colored terminal...)
  • Show/hide button and...
  • Enhance button that adds information such as token types (when mouse-overed) as well as direct inline links to google code search for function/variable/class names.
  • Opened challenges without required answers up for anyone to view solutions. The only solutions that aren't directly viewable now are ones that require an answer which you haven't given yet (you have to solve them to see the solutions)
I removed the graph from the front page because generating the global language counts was starting to be a performance issue (I'm going to eventually reimplement it by keeping the usage for each language in its own table, but I have a lot of other stuff to do first) but I can assure you, Python is still very far in the lead for most used language on the site :)

Wednesday, July 23, 2008

GAE: Batch operations

Given the limit on individual requests with GAE, doing things like batch updates to your database (live) can be an interesting challenge. I recently had to update all of the votes for Challenge-You! to support a new rating system. The best approach I found was an AJAX-style solution. Have one request that returns a list of references to the entities you need to change (or if you're working with more than 1000 entities, you might want to partition the list into chunks by date or some other order, and work with one chunk at a time). Keeping those references in a stack in Javascript, pop them off one-by-one and send each to the controller that applies the change, repeating until the stack is empty. If the operations aren't too expensive, you could even send a handful of them per-request to speed things up. It's also trivial to implement some kind of progress bar or other graphical update to keep you informed.

And example of the Javascript code (using jQuery) that handles the client side:
var data = null;

function doupdate() {
if(data.length == 0) return;
$.ajax({
type:"POST",
url:"doupdate",
data:"id=" + data.pop(),
success:function(msg) {
setTimeout(function() {
doupdate();
}, 1000);
}
});
}

$(document).ready(function() {
$.ajax({
type:"POST",
url:"getlist",
data:"",
success:function(msg) {
data = msg.split(" ");
doupdate();
}
});
});

In this example, my site would have two urls for handling this, "getlist" which returns a space-delimited list of references/ids (however you're handling the entities) and "doupdate" which takes in one reference and applies the needed change.

Sunday, July 20, 2008

Speed kiddies

Speed kiddie:

Someone who learns or uses a low-level (often compiled) language for the constant coefficient performance increase. These people often have no regards for computational complexity and are usually confused by asymptotic notation. They frequently suffer from severe cases of premature optimization and believe that manageability comes second to runtime speed (in all applications). You can often find them trolling around web-forums bragging about their pointer or bit manipulation skills (the code is generally characterized by non-reusable design and a significant lack of readability).

Examples:
  • "What do you mean my algorithm runs at O(n!)? It's twice as fast as yours with this small test set"
  • "How do you write efficient code in Python... It doesn't even have pointers!"
  • "I don't need to write readable code, I'm not going to forget my own code..."
  • "High level languages are for the weak, real men enjoy pain... PAIN!!!"
  • "Lisp doesn't offer anything new to me, C has macros too!"
  • "RAWRRR POOINTERRRZZZ!!!!!!!!!!!!!!!!!!"