CLUSTERING DATA IN PYTHON
[code] Python (2.6), GPL
Previously, I demonstrated clustering approximate geographic points from OpenPaths to identify places of interest. Here’s the code I used to accomplish that. It can be applied to any dataset and isn’t limited to two dimensions. The algorithm is known as agglomerative hierarchical clustering because it works by repeatedly grouping the two closest nodes together, starting with each data point as a node and ending with the entire set in a single monster node. (“Closest” is defined by any distance metric; for many applications it will be Euclidean distance, but for geographic data I’m using the haversine distance formula.) Along the way, you’ve constructed a binary tree which represents the hierarchical relationships between the vectors in the set. The result is a kind of ad-hoc taxonomy, and is frequently used to hypothesize relatedness between images, documents, proteins, users, etc. (Here’s a nice diagram courtesy of Razvan Musaloiu-E.)
To use agglomerative clustering as a classifier, however, what we really want is just a flat set of clusters. It’s akin to choosing the branches of the tree that best represent the natural divisions in the data. Cut too close to the trunk and the clusters will be too general; cut by the leaves and you’ve got too much noise. With geographic information from OpenPaths, at least, we have a heuristic to decide what is appropriate: the platform is only going to be accurate to a quarter-mile or so, so we can look for the largest branches that do not exceed that limit. The code provides a method, get_pruned, that takes such a parameter and returns a resulting set of clusters.
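To make the build-then-prune idea concrete, here is a minimal sketch of it. It is not the code linked above: the names Node, build_tree and prune are hypothetical, and complete linkage (the farthest pair of points across two nodes) is an assumed choice that keeps every point in a pruned cluster within the limit of every other.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node(object):
    """A tree node: a leaf holds one vector, a branch joins two nodes."""

    def __init__(self, vectors, left=None, right=None, distance=0.0):
        self.vectors = vectors      # every data point under this node
        self.left = left
        self.right = right
        self.distance = distance    # gap bridged when the children merged


def node_distance(a, b, metric):
    """Complete linkage (assumed): the farthest pair of points across two nodes."""
    return max(metric(p, q) for p in a.vectors for q in b.vectors)


def build_tree(vectors, metric=euclidean):
    """Repeatedly merge the two closest nodes until a single root remains."""
    nodes = [Node([v]) for v in vectors]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = node_distance(nodes[i], nodes[j], metric)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = Node(nodes[i].vectors + nodes[j].vectors, nodes[i], nodes[j], d)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)]
        nodes.append(merged)
    return nodes[0]


def prune(node, max_distance):
    """Return the largest branches whose merge distance stays within the limit."""
    if node.left is None or node.distance <= max_distance:
        return [node.vectors]
    return prune(node.left, max_distance) + prune(node.right, max_distance)


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
clusters = prune(build_tree(points), max_distance=1.0)
print(clusters)   # two clusters: the three points near the origin, and the pair near (5, 5)
```

The pairwise search makes this slow for large sets, which is fine for a sketch. For geographic data, any function taking two points could be passed as the metric in place of euclidean.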
Clustering packages for Python are out there, but I prefer touching the code, and this is a dirt-simple implementation that nonetheless should prove effective for most creative coding purposes.
A month ago I was hiking in the Fossil Ridge Wilderness Area. I wore several sensors over the four days. Unfortunately, I lost the Fitbit, and the Q-Sensor had a clock error which corrupted its data. Disappointing, but I did have two Garmin devices that held up, recording position and heart rate.
I finally managed to create a map from the data. I suppose I could have done this more efficiently with Google Earth, but I decided to parse the GPX files and draw my own paths with Python, overlaying the result on a Google terrain map, which looks a lot nicer. The brighter red the path, the higher my heart rate. However, I’m not sure that this ended up revealing anything particularly compelling. It is just, à la Certeau, a curious relic that substitutes for a rich experience.
Regardless, making it was a good exercise. Two complementary pieces of code were essential when doing this kind of thing outside of a mapping platform. First, when finding the geographic distance between two latitude/longitude pairs, Euclidean distance doesn’t work, as we’re operating on the curved surface of the Earth rather than a flat plane. For that, there is the haversine formula, which treats the Earth as a sphere:
import math

# NOTE: the def line was lost from this listing. The function name and the
# default for `miles` are reconstructed guesses; the arguments come from the body.
def geo_distance(pt0, pt1, miles=True):
    """ Convert the distance between two points, specified (lon, lat),
        to miles (or kilometers)
    """
    LON, LAT = 0, 1
    pt0 = math.radians(pt0[LON]), math.radians(pt0[LAT])
    pt1 = math.radians(pt1[LON]), math.radians(pt1[LAT])
    lon_delta = pt1[LON] - pt0[LON]
    lat_delta = pt1[LAT] - pt0[LAT]
    a = math.sin(lat_delta / 2)**2 + math.cos(pt0[LAT]) * math.cos(pt1[LAT]) * math.sin(lon_delta / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    d = 6371 * c            # radius of Earth in km
    if miles:
        d *= 0.621371192    # km to miles
    return d
Secondly, in order to plot data on a two-dimensional plane and have it line up with an image from Google, I needed to use the formula for the Mercator projection. This one comes courtesy of OpenStreetMap. It gives you coordinates relative to the globe, which have to be further scaled.
import math

# NOTE: the def line was lost from this listing. The function name is a
# reconstructed guess; the `pt` argument comes from the final return.
def mercator(pt):
    """ Project a (lon, lat) point to x,y space using
        the Mercator projection
        http://wiki.openstreetmap.org/wiki/Mercator#Python_Implementation
    """
    def merc_x(lon):
        r_major = 6378137.000
        return r_major * math.radians(lon)
    def merc_y(lat):
        # clamp near the poles, where the projection diverges
        if lat > 89.5:
            lat = 89.5
        if lat < -89.5:
            lat = -89.5
        r_major = 6378137.000
        r_minor = 6356752.3142
        temp = r_minor / r_major
        eccent = math.sqrt(1 - temp**2)
        phi = math.radians(lat)
        sinphi = math.sin(phi)
        con = eccent * sinphi
        com = eccent / 2
        con = ((1.0 - con) / (1.0 + con))**com
        ts = math.tan((math.pi / 2 - phi) / 2) / con
        y = 0 - r_major * math.log(ts)
        return y
    return merc_x(pt[0]), merc_y(pt[1])
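The "further scaled" step might look something like the following. This is a sketch under assumptions, not the code used for the map: to_pixels is a hypothetical helper, and it presumes you already know the projected bounds of the background image.

```python
def to_pixels(pt, bounds, size):
    """Map a projected (x, y) point to (column, row) pixel coordinates.

    bounds is (min_x, min_y, max_x, max_y) of the background image in
    projected units; size is the image's (width, height) in pixels.
    """
    min_x, min_y, max_x, max_y = bounds
    width, height = size
    col = (pt[0] - min_x) / float(max_x - min_x) * width
    # Image rows grow downward while projected y grows northward, so flip
    row = (max_y - pt[1]) / float(max_y - min_y) * height
    return col, row


# Made-up bounds and image size, purely for illustration
bounds = (0.0, 0.0, 2000.0, 1000.0)
print(to_pixels((500.0, 750.0), bounds, (640, 320)))   # (160.0, 80.0)
```

In practice the bounds would come from projecting the corner coordinates of the map image with the same Mercator function as the path points.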
NO MORE PASSWORD REUSE
[code] Python (2.6), GPL
I’m far from being a security expert, but I recognize the real danger in our poor password practices (particularly the reuse of passwords between sites). Elaborate routines for constructing passwords are well-intentioned, but ridiculous, so say both my mother and XKCD. Though what the comic suggests is an improvement, I think it’s still too much effort to remember different passphrases for different sites.
I think LastPass and the like have the right idea: keep a vault of passwords and one master password to unlock it. But, kind of hilariously, this has some meta-security problems of its own. Really, the problem with LastPass is that it stores data at all.
What I want is a piece of code that takes the url of a site where I want to have an account and then spits out a password. As long as I have access to the code, or can remember the algorithm behind it, I can generate my password for any site on demand when I need to log in. Nothing stored on the system, nothing to remember, a unique, ridiculous looking and ridiculously strong password for every site.
Such an algorithm would have some constraints. Namely:
- knowledge of the algorithm should not allow hackers to generate my passwords (so we’re probably going to have to use a memorized, non-random salt)
- the algorithm should work for all password forms (probably a maximum / minimum length, and some characters are likely unacceptable)
- I should be able to memorize the algorithm in case I lose my code (not perform it manually, but it should be easy to re-program if necessary)
- it should be accessible on all my systems (ie, offline and via a web form for iPhone or when using unfamiliar devices)
- it should, in fact, generate cryptographically difficult passwords
This short Python script is what I came up with. It runs both as a command-line tool and as a CGI script on the web, and generates an alphanumeric + punctuation password. It requires a master salt to generate the passwords, but this can be a simple short dictionary word that I won’t have to write down.
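For readers who want the general shape of such a generator, here is a minimal sketch. It is not the script itself: site_password is a hypothetical name, and the use of HMAC-SHA256, the character set and the length of 16 are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import string

# Assumed character set and length; some sites will reject some punctuation
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*-_=+"
LENGTH = 16


def site_password(url, master_salt, length=LENGTH, alphabet=ALPHABET):
    """Derive a repeatable password from a site's url and a memorized salt."""
    digest = hmac.new(master_salt.encode("utf-8"), url.encode("utf-8"),
                      hashlib.sha256).digest()
    # bytearray yields integers on both Python 2 and 3. The modulo introduces
    # a slight bias toward early characters, which is acceptable for a sketch.
    # A SHA-256 digest is 32 bytes, so length can be at most 32.
    return "".join(alphabet[b % len(alphabet)] for b in bytearray(digest)[:length])


# Made-up url and salt, purely for illustration
print(site_password("example.com", "turnip"))
```

The same url and salt always give the same password, and nothing is stored. One caveat: HMAC is fast to compute, so a short dictionary-word salt could be guessed offline by someone who knows the algorithm and has one generated password. A deliberately slow key-derivation function would raise that cost.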
There are some drawbacks, naturally. Foremost is that the use of a web query is a weak link, because presumably I’m submitting my master salt in plain text over the network. Not that anyone would know what I was doing, but security through obscurity is not a great feature to have included. Secondly, there’s no provision for changing a password if it’s somehow compromised, i.e., it generates one password per URL. Also, this isn’t hugely practical for people like my mother, who aren’t going to be running a command-line tool or hitting weird URLs. And finally, of course, I have to run this script any time I want to log in anywhere. However, that’s probably going to be faster than consulting my passwords.txt file, which I would have to do if I actually kept unique passwords for every site.
Regardless, I think this is actually going to work, making my life more securish. But I’m sure a similar approach has been developed before. What do you think? Is this code flawed or am I on to something? Publishing the code here is my own mini-security experiment (it does expose some additional info, like password length and the set of possible characters, but I think we’ll still be ok).
STREET VIEW VIDEO VIA OPENPATHS API
[code] Python (2.6), GPL
Brainstorming with @blprnt this morning about what people might do with the new OpenPaths API, we thought it would be pretty awesome to see every place you’ve ever been via Google Street View.
Loading all of that up through the Google Maps interface seemed overly burdensome, so we figured there must be a way to pull the static tiles. Turns out there is (though it’s unofficial). @jaimethompson breaks it down for you.
From there, it was pretty straightforward to pull the points, scrape the images, and assemble the video. It includes points from September ’10 to the present and a dozen or so cities, beginning in LA I think, but NYC clearly dominates. Non-urban spots aren’t captured well, and in Googleland it’s never winter. You might also notice that the granularity of the video increases at the end. That’s because at a certain point I start using the forthcoming OpenPaths app, which samples periodically, rather than the data from iTunes backups, which only looks at novel locations. The API pulls from both.
Want your own? I did this with Python as usual. You can grab the code here if you’re interested (you’ll need PIL and the latest OpenCV bindings installed to export the video). This is a bit of a soft launch for the OP API as we gradually work in new features. Let me know if anyone gives it a try (especially if you’re using a different language).
Noncoders fear not — we’ll hopefully be integrating something like this (but cooler and more blprnty) directly into the OpenPaths interface in the near future.
A SIMPLE PROCESSING-INSPIRED DRAWING INTERFACE FOR PYTHON
[code] Python (2.6), GPL
In my recent work at the lab, I’ve been doing some simple datavis to make sense of the various biometrics I’ve been gathering. I’m committed to Python for this sort of thing, but I’ve found it wanting for a dirt-simple drawing library, absurd as it may seem given the number of available graphics packages. The power of Processing / processing.py, Nodebox, Field, matplotlib, etc., means that they end up imposing on, if not dictating, the flow of a program. I wanted a single import that gives me basic, pythonic drawing commands and which can show me the result in as lightweight a way as possible.
I ended up putting together my own, which is a wrapper for PIL and aggdraw.
The idea certainly wasn’t to match the drawing capabilities of Processing et al, and the interface has some serious limitations (there’s no animation, no context stack, performance is questionable, etc). But it does succeed in letting me take a chunk of data from numpy, normalize it, and feed it directly to drawing commands with only a single line of setup.
For instance, to just graph a time series and check it out:
ctx = drawing.Context(1024, 768, relative=True, flip=True)
for x, y in enumerate(data):
    ctx.line(float(x) / len(data), 0, float(x+1) / len(data), y)
ctx.show()
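The snippet above appears to expect data already scaled to the 0..1 range, given relative=True (that reading is an assumption). A normalizing step could be as small as this. The normalize helper here is hypothetical and not part of the module.

```python
def normalize(values):
    """Rescale a sequence of numbers linearly into the range 0..1."""
    low, high = min(values), max(values)
    span = float(high - low)
    if span == 0:
        # A flat series has no range to stretch; pin it to the baseline
        return [0.0 for v in values]
    return [(v - low) / span for v in values]


# Made-up readings, purely for illustration
data = normalize([60, 90, 120, 75])
print(data)   # [0.0, 0.5, 1.0, 0.25]
```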
This is a work in progress, and I may develop it further as requirements arise, but it will remain simple! Minimal documentation and a few easy examples are present in the source code.
Edit: example of the module in action, visualizing fitbit data.
Update (11-07-13): animation
Thanks to built-in event capabilities of OpenCV, I’ve grafted on the ability to work with animation. ctx.frame() outputs the current canvas to a window. ctx.clear() clears it. Put those in a loop and off you go.
Random accumulating squares:
from random import random

while True:
    x, y = random() * ctx.width, random() * ctx.height
    width, height = random() * 200, random() * 200
    fill = random(), random(), random(), random()
    ctx.rect(x, y, width, height, fill=fill)
    ctx.frame()
EUCLIDEAN RHYTHMS: BJÖRKLUND’S ALGORITHM IN PYTHON
[code] Python (2.6), GPL
After encountering some buzz about it online, I read and was inspired by Godfried Toussaint’s paper, “The Euclidean Algorithm Generates Traditional Musical Rhythms”. In short, he demonstrates how many classic rhythms, particularly of African origin, can be described by a ubiquitous mathematical principle first documented by Euclid and even used for timing patterns in neutron accelerators.
However, while I found many implementations of the algorithm in various languages, all of the ones I tried (in Ruby, Python, Java, and JavaScript) return inaccurate results! Trying 13 steps with 5 pulses was an easy way to break most of them. Luckily, Toussaint’s source, Björklund, provides C code in his paper, “The Theory of Rep-Rate Pattern Generation in the SNS Timing System”. I translated this into Python (2.6), and found the result to be elegant, efficient, and accurate.
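For a sense of how the algorithm works, here is a sketch of the recursive construction: run Euclid’s algorithm on the counts of rests and pulses, then rebuild the pattern from the recorded quotients and remainders. Treat it as an illustration rather than the linked translation; the (steps, pulses) argument order and the final rotation onto the first pulse are assumptions.

```python
def bjorklund(steps, pulses):
    """Spread `pulses` onsets as evenly as possible over `steps` slots."""
    steps, pulses = int(steps), int(pulses)
    if pulses < 0 or pulses > steps:
        raise ValueError("need 0 <= pulses <= steps")
    if pulses == 0:
        return [0] * steps

    # Euclid's algorithm on (rests, pulses), recording each quotient and remainder
    counts, remainders = [], [pulses]
    divisor = steps - pulses
    level = 0
    while True:
        counts.append(divisor // remainders[level])
        remainders.append(divisor % remainders[level])
        divisor = remainders[level]
        level += 1
        if remainders[level] <= 1:
            break
    counts.append(divisor)

    pattern = []

    def build(level):
        # Levels -1 and -2 are the base symbols: a rest and a pulse
        if level == -1:
            pattern.append(0)
        elif level == -2:
            pattern.append(1)
        else:
            for _ in range(counts[level]):
                build(level - 1)
            if remainders[level] != 0:
                build(level - 2)

    build(level)
    # Rotate so the pattern starts on its first pulse
    first = pattern.index(1)
    return pattern[first:] + pattern[:first]


print(bjorklund(8, 3))    # [1, 0, 0, 1, 0, 0, 1, 0]
print(bjorklund(13, 5))   # [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
```

The first result is the familiar 3-over-8 pattern, and the second is the 13-step, 5-pulse case mentioned above.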
MESHCAL: GEOMETRY CORRECTION WITH JITTER
[code] Max/MSP/Jitter (5), GPL
Creating the Contemporary Issues Forum at the National Museum of American Jewish History was a great challenge for my Jitter skills — the installation features no fewer than twelve cameras and a host of interactive modes.
As you can imagine, there was a lot of geometry correction going on. Because video was being re-projected back onto the recording surface, not only did the projected image have to match the geometry of the wall, but it also had to internally align to the post-it notes on the wall. Geometry correction solutions that I found online and in the forums would address the plane alignment, but not the internal deformation of the image, which is necessary to account for the camera and projector lenses.
jit.gl.mesh to the rescue. Using opengl planes and a transformation matrix, I built an object, dubbed meshcal, that allows adjustment of both the corners of a plane and horizontal and vertical axes across a grid.
The code let us pull off the magic trick of having projected people seemingly interact with physical objects on the wall in an endless record and project cycle. Hopefully this can be of use to others needing fine-tuned geometry correction for a similar situation, or seeking to get deeper into jit.gl.mesh, which can get pretty hairy.
MINAIR: RUNNING PURE AS3 WITH ADL
[code] Adobe Flex SDK 3 / AS3, GPL
With the enthusiasm these days over awesome developments with HTML5, Processing, and the like, it may be an odd time to be working in Flash. The thing is, for interfaces in an installation setting, I would argue that Flash/AIR remains the simplest and best-looking platform for animating video and text along with interactive elements.
While I appreciate the intentions behind AIR, I’m not interested in distributing applications. Instead, I’m typically looking for a front end for an installation largely running in, likely, Python. The AIR Debug Launcher (ADL) is a great tool for testing AIR apps, and increasingly I’ve been (mis)using it in this kind of production environment as well. My workflow is 100% AS3, no FLAs here, and I use the Flex compiler, keeping things open source.
For projects that mix Python code with AS3, I wanted a single directory that I could include in a project and invoke a Main class from a simple shell script. Additionally, I wanted a host of AS3 classes that I’ve assembled over the years to be at the ready.
minair accomplishes this. As of yet, it’s a bit nascent, and some settings have to be altered manually between projects. But point this at your Flex SDK, and you have a minimal AIR framework running on ADL that’s ideal for an installation frontend.
Typically, I use a socket connection with a simple protocol to communicate between AIR and Python, where I keep the bulk of the work (state, interfacing with hardware, databases, etc). That is implemented in a Bridge class which is included in the framework. Additionally, we’ve got loaders, asset handlers, utils, display objects… I’d love to document these, though it won’t happen, but I include them here nonetheless.
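On the Python side, a "simple protocol" over a socket can be as little as one message per line. The sketch below shows that kind of framing. The wire format (newline-delimited JSON) and the names encode and decode are purely assumptions for illustration; the actual Bridge class may work quite differently.

```python
import json


def encode(name, **params):
    """Serialize one message as a single newline-terminated line of JSON."""
    return json.dumps({"name": name, "params": params}) + "\n"


def decode(buffer):
    """Split a receive buffer into complete messages and a leftover tail."""
    lines = buffer.split("\n")
    tail = lines.pop()    # an unfinished message stays buffered for the next read
    messages = [json.loads(line) for line in lines if line.strip()]
    return messages, tail


# Two complete messages followed by the start of a third, as a socket
# read might deliver them
wire = encode("show", clip="intro") + encode("fade", seconds=2)
messages, tail = decode(wire + '{"name": "pa')
print([m["name"] for m in messages])
print(repr(tail))
```

Sockets deliver bytes in arbitrary chunks, so the receiver keeps the tail and prepends it to whatever arrives next before decoding again.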
TXTML: SUBVERSIVE MOBILE STORYTELLING WITH TEXT-MESSAGING
PHP5, GPL
[code] updated 2010 November
Together with its interpreter and messaging engine, TXTML (TeXT-message Markup Language) comprises a system for creating interactive text-messaging applications.
TXTML encourages natural and open-ended exchanges that emphasize context over commands, allowing the author to dynamically tailor applications to the current location, time, and history of the user. The language is an elegant, domain-specific XML-variant which calls on an extensible library of functional modules. These include methods for natural language processing, user administration, content management, dynamically generated content via Atom/RSS feeds, and location tracking. The language’s nonlinear structure enables complex applications to be simply composed, whether narrative artworks, games, surveys, or interpretive content.
TXTML was not designed to create standard text-message applications such as mailing lists or lookup services. Rather, it is an experimental platform for investigating text-messaging as a narrative medium. It’s inspired by INFORM, AIML, and VXML, but with the particular interactive concerns of text-messaging in mind. TXTML powers artwork by Knifeandfork, including a piece called The Wrench (TXTML sourcecode available). Knifeandfork coined the term Subversive (Mobile) Storytelling to describe their recent work: the use of mobile phones to transform our experience of narrative by intertwining it with daily life. Check out this paper for a more in-depth discussion.
TXTML is free software and available for use and modification under the GPL license. Please contact us with ideas, concerns, technical questions, and inspirations.
For more info see txtml.org











