Saturday, November 5, 2011

Planetary states API

Update: This API is deprecated. Use the new JSON API instead.

I needed a way to deal with planetary positions and velocities and found NASA's HORIZONS system and the ephemerides. But I wanted a simpler interface than telnet, and I didn't want to lug the massive ephemeris files around with my applications. So instead, I wrote a simple JSON API for dealing with ephemeris files.

Suppose one wanted to get the Chebyshev coefficients for computing Mercury's state for today's date (November 5th). The URL query would look like this:
http://www.astro-phys.com/api/coeffs?date=2011-11-5&bodies=mercury

Which would return a JSON object whose structure looks like this:

{
  "date": 2455870.5,
  "results": {
    "mercury": {
      "coeffs": ...,
      "start": 2455856.5,
      "end": 2455872.5
    }
  }
}


Where "coeffs" contains the chebyshev coefficients for evaluating the state of mercury between the julian dates 2455856.5 and 2455872.5

To simplify it even further, you can grab the state of Mercury at 9:30am on November 5th, 2011 with this URL:

http://www.astro-phys.com/api/states?date=2011-11-5+9:30am&bodies=mercury


Which would return:

{
  "date": 2455870.89583,
  "results": {
    "mercury": [
      [30007449.557, -50119248.882, -29922524.4351],
      [2879610.10503, 2030853.04543, 786401.74378]
    ]
  }
}


Where the first array in "mercury" is the position vector (x, y, z) and the second is the velocity vector (vx, vy, vz).
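The same query is just as easy outside the browser. Here's a quick Python sketch using nothing but urllib and json; the response layout is exactly the one shown above:

import urllib, json

url = 'http://www.astro-phys.com/api/states?date=2011-11-5+9:30am&bodies=mercury'
data = json.load(urllib.urlopen(url))

position, velocity = data['results']['mercury']
print 'Julian date:', data['date']
print 'Position (x, y, z): ', position
print 'Velocity (vx, vy, vz):', velocity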

Multiple planets can be entered, comma-separated.

Applications requiring entire ephemeris records can use this:

http://www.astro-phys.com/api/records?date=2011-11-5+9:30am


This will give you...

{
  "date": 2455870.89583,
  "start": 2455824.5,
  "end": 2455888.5,
  "results": {
    "mercury": [
      [...]
    ],
    ...
  }
}


Where "date" is the date asked for, "start" is the beginning of the record, and "end" is the end of the record. "results" contains every ephemeris body mapped to a list of its coefficient chunks for that record.
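As a quick sketch of what's in there, this lists each body in the record and how many coefficient chunks it carries (again just urllib and json; the layout is the one shown above):

import urllib, json

url = 'http://www.astro-phys.com/api/records?date=2011-11-5+9:30am'
data = json.load(urllib.urlopen(url))

print 'Record covers julian dates %s to %s' % (data['start'], data['end'])
for body, chunks in sorted(data['results'].items()):
    print '%-12s %d chunks' % (body, len(chunks))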

The ephemeris being used is DE406 for the time being, though I may add others later (along with a backwards-compatible way of specifying which one).

It doesn't include the full date range of DE406 yet; it only covers dates between 2000 and 2200 (I'll be adding more as needed, and if you request a range increase it will probably be granted).

To see a web application using it in action, visit http://www.astro-phys.com and click "start" (it's streaming the records to evaluate positions for the planets, and the interface can be dragged and zoomed with the mouse).

Lastly, the constants section of the ephemeris is also available from the URL

http://www.astro-phys.com/api/constants


When querying for coefficients, you can't ask for earth or moon directly. You have to use "earthmoon" (the Earth-Moon barycenter) and "geomoon" (the geocentric Moon) and compute the Earth and Moon states from those. However, when querying for states, astro-phys does this for you.
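For reference, the arithmetic involved is the standard one: with the Earth-Moon mass ratio EMRAT (mass of Earth over mass of Moon, part of the ephemeris constants; whether the constants endpoint exposes it under exactly that key is an assumption on my part), Earth and Moon follow from earthmoon and geomoon like this sketch, working on plain [x, y, z] lists:

import urllib, json

# EMRAT = mass(earth) / mass(moon).  The key name and the shape of the
# constants response are assumptions here -- check /api/constants yourself.
constants = json.load(urllib.urlopen('http://www.astro-phys.com/api/constants'))
emrat = float(constants['results']['EMRAT'])

def earth_and_moon(earthmoon, geomoon):
    ''' Given the Earth-Moon barycenter position and the geocentric Moon
        position (each an [x, y, z] list), return (earth, moon). '''
    earth = [b - g / (1.0 + emrat) for b, g in zip(earthmoon, geomoon)]
    moon = [e + g for e, g in zip(earth, geomoon)]
    return earth, moon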

PS: All API queries can also take a '&callback=somefunction' parameter to be treated as JSONP. This works great with jQuery's getJSON.

Here's an example using jQuery.getJSON (note: when the date is missing, the current time is assumed):
var url = 'http://www.astro-phys.com/api/states?callback=?';
$.getJSON(url, {bodies: 'mercury'}, function(data) {
    var p = data.results.mercury[0];
    var v = data.results.mercury[1];
    alert('Position:\nx='+p[0]+'\ny='+p[1]+'\nz='+p[2]);
    alert('Velocity:\nx='+v[0]+'\ny='+v[1]+'\nz='+v[2]);
});

Saturday, May 14, 2011

Davy's law

Davy's Law: Computers can't compute/predict themselves.

No physical computer is capable of losslessly determining its effect on every state in its state space by means of internal simulation. That is, the only way it can accurately achieve the result of a state is by actually being put into that state.

The limiting factor here is the necessity of storing the entirety of its state AND its rule set within its allotted state (with room to spare for performing computation). This would run afoul of the pigeonhole principle's limit on lossless compression. Not to mention that if it were possible, it could simulate itself simulating itself simulating itself... And unless it can solve the halting problem, that's probably just not a good feature for a system to have.

Davy's law does not prohibit computers from computing with isolated portions of their state. Nor does it say that no global-state computations are possible; some states can be losslessly compressed and manipulated in that form. Rather, it postulates that a computer cannot compute the results of every state across its entire state space.

Thursday, April 28, 2011

Naive Bayes (and author detection)

I've been playing around with various classification algorithms lately, so I wrote a really simplified discrete naive Bayes classifier in Python. There's no emphasis on sample correction; simplicity was key here, but it still works quite well.

from operator import itemgetter
from collections import defaultdict

class BayesClassifier:

    def __init__(self):
        self.total_count = 0                 # Observations of individual attributes
        self.class_count = defaultdict(int)  # Observations of cls
        self.attrs_count = defaultdict(int)  # Observations of (cls, attr)
        self.correction = 0.0001             # Prevent multiplication by 0.0

    def train(self, cls, attrs):
        ''' Add observation of 'attrs' as being an instance of 'cls' '''
        self.class_count[cls] += 1
        for attr in attrs:
            self.attrs_count[(cls, attr)] += 1
            self.total_count += 1

    def rate(self, cls, attrs):
        ''' Return probability rating of 'attrs' being an instance of 'cls' '''
        result = float(self.class_count[cls]) / self.total_count
        for attr in attrs:
            result *= self.attrs_count.get((cls, attr), self.correction)
        return result / pow(self.total_count, len(attrs))

    def classify(self, attrs):
        ''' Return most likely class that 'attrs' belongs to '''
        rated_classes = [(self.rate(cls, attrs), cls) for cls in self.class_count]
        rated_classes.sort(key=itemgetter(0), reverse=True)
        return rated_classes[0][1]


Playing around with it, I used various spam/not-spam training sets and various categorical training sets. Attributes can be labeled by the user instead of being plain "bag of words" lists by tagging the values in the attrs list, such as ['weekday:wed', 'weather:sunny', 'humidity:high']. Likewise, positional attributes can easily be tagged with their index: ['0:this', '1:works', '2:well']. It's trivial to write a function that turns lists, objects, dicts, and data models into such tagged attribute lists.
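For instance, a couple of helpers along those lines might look like this (the names are just for illustration), and they drop straight into the classifier above:

def tag_dict(d):
    ''' Turn a dict into 'key:value' attribute tags '''
    return ['%s:%s' % (key, value) for key, value in d.items()]

def tag_list(values):
    ''' Tag positional attributes with their index '''
    return ['%d:%s' % (i, value) for i, value in enumerate(values)]

bayes = BayesClassifier()
bayes.train('play', tag_dict({'weekday': 'wed', 'weather': 'sunny', 'humidity': 'high'}))
bayes.train('stay_in', tag_dict({'weekday': 'wed', 'weather': 'rainy', 'humidity': 'high'}))
print bayes.classify(tag_dict({'weather': 'sunny', 'humidity': 'high'}))  # -> 'play'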

But playing with the algorithm in its "bag of words" form, I thought it would be neat to see how it does with authorship detection. Using an approach similar to spam/not-spam I trained it to classify quotes by author based on word and punctuation probabilities. In this example it parses brainyquote.com to train from the first pages of quotes by a given author, then tests the classifier with known quotes that weren't included on the first page. In a real world scenario, you'd want to train it with a much larger corpus, but in this case it works fairly well.

Here it is learning to classify between Richard Dawkins, George W. Bush, and Charles Dickens :) (yes, I chose them for word contrast):
from urllib import urlopen
from BeautifulSoup import BeautifulSoup

def getwords(text):
    ''' Split text into words and useful punctuation tokens '''
    import string
    text = text.replace("'", '')
    # Remove useless punctuation
    for c in string.punctuation:
        if c not in '.,;?!':
            text = text.replace(c, ' ')
    # Keep useful punctuation
    for c in '.,;?!':
        text = text.replace(c, ' punctuation:%s ' % c)
    text = text.lower()
    return [str(word) for word in text.split() if len(word) > 3]

def getquotes(author):
    ''' Return list of quotes by author from brainyquote.com '''
    base_url = 'http://www.brainyquote.com/quotes/authors/%s/%s.html'
    soup = BeautifulSoup(urlopen(base_url % (author[0], author)).read())
    td = soup.find('td', {'align': 'left', 'valign': 'top', 'width': 440})
    quotes = []
    for quote in td.findAll('span', {'class': 'body'}):
        quotes.append(quote.string)
    return quotes

bayes = BayesClassifier()

# Train bayes with quotes by author
for author in ['richard_dawkins', 'george_w_bush', 'charles_dickens']:
    for quote in getquotes(author):
        bayes.train(author, getwords(quote))

test_data = [
    ['Bush',
     "Government does not create wealth. The major role for the government is to create an environment where people take risks to expand the job rate in the United States."],
    ['Dawkins',
     "There may be fairies at the bottom of the garden. There is no evidence for it, but you can't prove that there aren't any, so shouldn't we be agnostic with respect to fairies?"],
    ['Dickens',
     "I have known a vast quantity of nonsense talked about bad men not looking you in the face. Don't trust that conventional idea. Dishonesty will stare honesty out of countenance any day in the week, if there is anything to be got by it."],
]

# Test bayes with untrained quotes
for author, quote in test_data:
    guess = bayes.classify(getwords(quote))
    print 'Classified as %s, should be %s' % (guess, author)