Showing posts from April, 2011

Naive Bayes (and author detection)

I've been playing around with various classification algorithms lately, so I wrote a very simplified discrete Naive Bayes classifier in Python. There's no emphasis on sample correction (just a small constant to avoid multiplying by zero), since simplicity was the goal here, but it still works quite well.
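For anyone new to the technique, here is a minimal standalone sketch of the idea behind a discrete Naive Bayes score: multiply the prior of a class by the estimated likelihood of each attribute given that class, then pick the class with the highest product. Everything in it (the toy data and the names `observations`, `score`, `classify`, and `eps`) is made up for illustration, and the exact formula is an assumption, not necessarily the one used in the classifier from this post.

```python
from collections import defaultdict

# Hypothetical toy data: (label, attributes) pairs, invented for this sketch.
observations = [
    ("spam", ["free", "money"]),
    ("spam", ["free", "offer"]),
    ("ham", ["meeting", "monday"]),
]

class_count = defaultdict(int)  # label -> number of observations
attr_count = defaultdict(int)   # (label, attr) -> number of occurrences
for label, attrs in observations:
    class_count[label] += 1
    for attr in attrs:
        attr_count[(label, attr)] += 1


def score(label, attrs, eps=0.0001):
    """Unnormalized P(label) * product of P(attr | label).

    eps is an assumed floor standing in for unseen (label, attr) pairs,
    so a single zero count does not wipe out the whole product.
    """
    n = class_count.get(label, 0)
    if n == 0:
        return 0.0  # label never observed
    total = sum(class_count.values())
    result = n / total  # prior P(label)
    for attr in attrs:
        seen = attr_count.get((label, attr), 0)
        likelihood = seen / n  # rough estimate of P(attr | label)
        result *= max(likelihood, eps)
    return result


def classify(attrs):
    """Return the observed label with the highest score."""
    return max(class_count, key=lambda label: score(label, attrs))
```

With this toy data, `classify(["free", "money"])` picks `"spam"` and `classify(["meeting"])` picks `"ham"`. The scores are only meaningful relative to each other, since they are never normalized into real probabilities.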

from operator import itemgetter
from collections import defaultdict

class BayesClassifier:

    def __init__(self):
        self.total_count = 0                  # Observations of individual attributes
        self.class_count = defaultdict(int)   # Observations of cls
        self.attrs_count = defaultdict(int)   # Observations of (cls, attrs)
        self.correction = 0.0001              # Prevent multiplication by 0.0

    def train(self, cls, attrs):
        ''' Add observation of 'attrs' as being an instance of 'cls' '''
        self.class_count[cls] += 1
        for attr in attrs:
            self.attrs_count[(cls, attr)] += 1
            self.total_count += 1

    def rate(self, cls, attrs):
        ''' Return probability rating of 'attrs' being an instance o…