Need some color on this blog. Here’s some detritus from data-sketching yesterday.

→ 2012-01-13         

CLUSTERING DATA IN PYTHON
[code] Python (2.6), GPL

Previously, I demonstrated clustering approximate geographic points from OpenPaths to identify places of interest. Here’s the code I used to accomplish that. It can be applied to any dataset and isn’t limited to two dimensions. The algorithm is known as agglomerative hierarchical clustering because it works by repeatedly grouping the two closest nodes together, starting with each data point as its own node and ending with the entire set in a single monster node. (“Closest” is defined by any distance metric; for many applications it will be Euclidean distance, but for geographic data I’m using the haversine distance formula.) Along the way, you construct a binary tree that represents the hierarchical relationships between the vectors in the set. The result is a kind of ad-hoc taxonomy, and it is frequently used to hypothesize relatedness between images, documents, proteins, users, etc. (Here’s a nice diagram courtesy of Razvan Musaloiu-E.)
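
As a rough illustration of the procedure described above (a sketch, not the code linked from this post), the merge loop fits in a few lines. The names `haversine`, `Node`, and `cluster` are made up for this example, and so are the choices to merge by weighted centroid and to find the closest pair by brute force; averaging latitude and longitude is only a reasonable approximation for points that are close together.

```python
import math

def haversine(a, b, radius=3958.8):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = [math.radians(v) for v in (a[0], a[1], b[0], b[1])]
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))

class Node(object):
    """Hypothetical tree node: a leaf holds one point, a branch holds two children."""
    def __init__(self, vector, left=None, right=None, distance=0.0, count=1):
        self.vector = vector      # the point itself, or the centroid of its leaves
        self.left = left
        self.right = right
        self.distance = distance  # gap between the two children when they were merged
        self.count = count        # number of original points under this node

def cluster(points, metric=haversine):
    """Merge the two closest nodes until a single root remains."""
    if not points:
        raise ValueError("need at least one point")
    nodes = [Node(tuple(p)) for p in points]
    while len(nodes) > 1:
        # Brute-force search for the closest pair (assumption: small datasets).
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = metric(nodes[i].vector, nodes[j].vector)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = nodes[i], nodes[j]
        total = a.count + b.count
        # Weighted centroid of the two children (assumption: centroid linkage).
        centroid = tuple((x * a.count + y * b.count) / float(total)
                         for x, y in zip(a.vector, b.vector))
        del nodes[j]  # j > i, so remove j first to keep index i valid
        del nodes[i]
        nodes.append(Node(centroid, a, b, d, total))
    return nodes[0]

# Two points a few yards apart in NYC, one in LA.
points = [(40.7128, -74.0060), (40.7130, -74.0062), (34.0522, -118.2437)]
root = cluster(points)
```

Here the two NYC points are merged first, and the LA point only joins at the root, so the tree mirrors the geography. Passing a different `metric` (plain Euclidean distance, say) is all it takes to use the same loop on non-geographic vectors.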

To use agglomerative clustering as a classifier, however, what we really want is just a flat set of clusters. It’s akin to choosing the branches of the tree that best represent the natural divisions in the data. Cut too close to the trunk and the clusters will be too general; cut by the leaves and you’ve got too much noise. With geographic information from OpenPaths, at least, we have a heuristic to decide what is appropriate: the platform is only going to be accurate to a quarter-mile or so, so we can look for the largest branches that do not exceed that limit. The code provides a method, get_pruned, that takes such a parameter and returns the resulting set of clusters.
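
To make the pruning idea concrete, here is a minimal sketch of one way to cut such a tree at a distance limit. It is not the get_pruned method from the linked code, whose signature isn't shown here; the `Node` tuple, the `prune` and `leaves` helpers, and the hand-built tree with its distances (in miles) are all invented for illustration.

```python
from collections import namedtuple

# Hypothetical node: `distance` is the gap between its two children at merge time.
Node = namedtuple("Node", "label left right distance")

def leaf(label):
    return Node(label, None, None, 0.0)

def prune(node, limit):
    """Walk down from the root and keep the largest branches within `limit`."""
    if node.left is None or node.distance <= limit:
        return [node]
    return prune(node.left, limit) + prune(node.right, limit)

def leaves(node):
    """List the labels of the original points under a node."""
    if node.left is None:
        return [node.label]
    return leaves(node.left) + leaves(node.right)

# A made-up tree: two tight pairs in one city, plus one far-away point.
home = Node("home", leaf("a"), leaf("b"), 0.1)
office = Node("office", leaf("c"), leaf("d"), 0.2)
city = Node("city", home, office, 3.0)
root = Node("root", city, leaf("e"), 2400.0)

clusters = [leaves(c) for c in prune(root, 0.25)]
# → [['a', 'b'], ['c', 'd'], ['e']]
```

With a quarter-mile limit, the two tight pairs survive as clusters and the distant point stays on its own. Raising the limit to 5 miles would collapse the first four points into a single "city" cluster, which is the "too close to the trunk" case.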

Clustering packages for Python are out there, but I prefer touching the code, and this is a dirt-simple implementation that nonetheless should prove effective for most creative coding purposes.

→ 2012-01-10         


STREET VIEW VIDEO VIA OPENPATHS API

[code] Python (2.6), GPL

Brainstorming with @blprnt this morning about what people might do with the new OpenPaths API, we thought it would be pretty awesome to see every place you’ve ever been via Google Street View.

Loading all of that up through the Google Maps interface seemed overly burdensome, so we figured there must be a way to pull the static tiles. Turns out there is (though it’s unofficial). @jaimethompson breaks it down for you.

From there, it was pretty straightforward to pull the points, scrape the images, and assemble the video. It includes points from September ’10 to the present and a dozen or so cities, beginning in LA, I think, but NYC clearly dominates. Non-urban spots aren’t captured well, and in Googleland it’s never winter. You might also notice that the granularity of the video increases at the end. That’s because at a certain point I started using the forthcoming OpenPaths app, which samples periodically, rather than the data from iTunes backups, which only records novel locations. The API pulls from both.

Want your own? I did this with Python as usual; you can grab the code here if you’re interested (you’ll need PIL and the latest OpenCV bindings installed to export the video). This is a bit of a soft launch for the OP API as we gradually work in new features. Let me know if anyone gives it a try (especially if you’re using a different language).

Noncoders fear not — we’ll hopefully be integrating something like this (but cooler and more blprnty) directly into the OpenPaths interface in the near future.

→ 2011-07-07         

A SIMPLE PROCESSING-INSPIRED DRAWING INTERFACE FOR PYTHON

[code] Python (2.6), GPL

In my recent work at the lab, I’ve been doing some simple datavis to make sense of the various biometrics I’ve been gathering. I’m committed to Python for this sort of thing, but I’ve found it wanting for a dirt-simple drawing library, absurd as it may seem given the number of available graphics packages. The power of Processing / processing.py, NodeBox, Field, matplotlib, etc., means that they end up imposing on, if not dictating, the flow of a program. I wanted a single import that gives me basic, Pythonic drawing commands and that can show me the result in as lightweight a way as possible.

I ended up putting together my own, which is a wrapper for PIL and aggdraw.

The idea certainly wasn’t to match the drawing capabilities of Processing et al, and the interface has some serious limitations (there’s no animation, no context stack, performance is questionable, etc). But it does succeed in letting me take a chunk of data from numpy, normalize it, and feed it directly to drawing commands with only a single line of setup.

For instance, to just graph a time series and check it out:

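    import drawing

    # data: a normalized numpy array at a given sampling rate (you supply this)
    ctx = drawing.Context(1024, 768, relative=True, flip=True)
    for x, y in enumerate(data):
        # one segment per sample, from the baseline to the sample's value,
        # with x expressed as a fraction of the series length
        ctx.line(float(x) / len(data), 0, float(x+1) / len(data), y)
    ctx.show()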
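
The snippet above takes the normalization step for granted. A minimal sketch of what that might look like, assuming plain min-max scaling into the 0..1 range (the `normalize` helper is hypothetical and not part of the drawing module):

```python
def normalize(values):
    """Rescale a sequence of numbers to the 0..1 range (min-max scaling)."""
    values = [float(v) for v in values]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat signal has no range to scale by; map everything to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

data = normalize([2, 4, 6])
# → [0.0, 0.5, 1.0]
```

This works on any sequence of numbers, including a one-dimensional numpy array, and returns a plain list that can be enumerated the same way.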


This is a work in progress, and I may develop it further as requirements arise, but it will remain simple! Minimal documentation and a few easy examples are present in the source code.

Edit: example of the module in action, visualizing fitbit data.


Update (11-07-13): animation

Thanks to built-in event capabilities of OpenCV, I’ve grafted on the ability to work with animation. ctx.frame() outputs the current canvas to a window. ctx.clear() clears it. Put those in a loop and off you go.

Random accumulating squares:

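    from random import random
    from drawing import Context

    ctx = Context(640, 480)
    while True:
        # pick a random position, size, and RGBA fill for each square
        x, y = random() * ctx.width, random() * ctx.height
        width, height = random() * 200, random() * 200
        fill = random(), random(), random(), random()
        ctx.rect(x, y, width, height, fill=fill)
        # show the canvas without clearing it, so the squares accumulate
        ctx.frame()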

→ 2011-06-23

EUCLIDEAN RHYTHMS: BJÖRKLUND’S ALGORITHM IN PYTHON

[code] Python (2.6), GPL

After encountering some buzz about it online, I read and was inspired by Godfried Toussaint’s paper, “The Euclidean Algorithm Generates Traditional Musical Rhythms”. In short, he demonstrates how many classic rhythms, particularly of African origin, can be described by a ubiquitous mathematical principle first documented by Euclid and even used for timing patterns in neutron accelerators.

However, while I found many implementations of the algorithm in various languages, all of the ones I tried (in Ruby, Python, Java, and JavaScript) return inaccurate results! Trying 13 steps with 5 pulses was an easy way to break most of them. Luckily, Toussaint’s source, Björklund, provides C code in his paper “The Theory of Rep-Rate Pattern Generation in the SNS Timing System”. I translated this into Python (2.6) and found the result to be elegant, efficient, and accurate.
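
For a feel of how the rhythm is generated, here is a minimal sketch of the pairing procedure as Toussaint describes it: start with the pulses and the rests as separate one-element sequences, then repeatedly append the leftovers onto the front sequences until at most one leftover remains. This is not the translation of Björklund’s C code mentioned above; the function name `euclidean_rhythm` and its `(pulses, steps)` argument order are assumptions for the example.

```python
def euclidean_rhythm(pulses, steps):
    """Spread `pulses` onsets over `steps` slots; 1 is an onset, 0 is a rest."""
    if steps <= 0 or not 0 <= pulses <= steps:
        raise ValueError("need 0 <= pulses <= steps and steps > 0")
    if pulses == 0:
        return [0] * steps
    groups = [[1] for _ in range(pulses)]             # sequences being built up
    remainder = [[0] for _ in range(steps - pulses)]  # sequences still to hand out
    while len(remainder) > 1:
        n = min(len(groups), len(remainder))
        # Append one leftover sequence to each of the first n groups.
        paired = [groups[i] + remainder[i] for i in range(n)]
        # Whatever was not paired (from either list) becomes the new remainder.
        leftover = groups[n:] + remainder[n:]
        groups, remainder = paired, leftover
    return [bit for seq in groups + remainder for bit in seq]

pattern = euclidean_rhythm(5, 13)
# → [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
```

The 13-step, 5-pulse case comes out as x . . x . x . . x . x . ., which agrees with the pattern listed for E(5,13) in Toussaint’s paper, and `euclidean_rhythm(3, 8)` gives the familiar x . . x . . x . tresillo.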

→ 2011-03-20