The Road to Geothon

Posted Sat Sep 22 @ 10:33:38 AM PDT 2012

I've been experimenting with scaled vector graphics, and cartography. I downloaded a bunch of data from Natural Earth and plotted it with SVG. That alone was pretty fun, and I learned a lot, but I wanted to do something a little more interactive.

I downloaded the entire CIA World Factbook, and built a state machine in Python to parse the messy HTML (I tried to use Beautiful Soup, but I couldn't figure out how to extract the information I needed). I threw all the facts into a database, and then finely extracted more information from each field (latitude and longitude for the geographic center of the country, the common name of the country, the capital name and location, etc).

I then tried to merge the fact database with the Natural Earth data and ran into some problems. I used the "fips10" code to do the merge because that was the only field the CIA Factbook, and the Natural Earth data had in common. Unfortunately, things didn't quite match up. Natural Earth data lacked a fips10 code for two countries (I can't remember which), and it also contained two countries that didn't exist (Somaliland and Northern Cyprus).

I needed to make changes to the Natural Earth data so the merge would work seamlessly. I downloaded a copy of Quantum GIS, and played with that. I figured out how to merge "shapes", so I combined Northern Cyprus with Cyprus, and Somaliland with Somalia. I also added the fips10 code for the two countries that lacked it.

After that minor surgery, I successfully merged it with the CIA Factbook. And it resulted in half a megabyte of world map data.

I wrote a few hundred lines of JavaScript (in addition to the hundreds that plot the map) to turn the map into a country guessing game.

The result? A little program that tests your geography knowledge, and helps you improve.

Give Geothon a try. Not to brag, but I've already learned all of the Americas (including the Caribbean) and Africa, most of Europe, and the Middle East.

Overheard on HN

Posted Tue Sep 18 @ 05:04:57 PM PDT 2012

...HN is just full of college kids who think making a social web app with geolocation and a REST API is going to turn them into gazillionaires when they pivot to something useful (after all, the idea doesn't matter, it's the people who count) and then get acquired by Facebook for the GDP of a small country.

- Silhouette

I completely took that out of context, but I thought it was pretty funny.

UDP Punching

Posted Sun Sep 09 @ 11:40:20 AM PDT 2012

I wanted to experiment with building a peer-to-peer application. The first step was to figure out how to connect hosts behind a NAT. Because the hosts don't each have their own public IP address, you can't connect them directly. One potential solution is to get each host to configure their respective routers to forward port x to their local IP address. But there is a clever way to trick the router into forwarding packets to the right host behind a NAT, without requiring a user to do anything. It's called UDP Punching.

Here's how it works (the 4 step process described on the Wikipedia entry is slightly inaccurate, and leaves out important details):

  1. A server listens for UDP messages on some known public IP and port.
  2. HostA (behind a NAT) sends a UDP packet to the server.
  3. The server records the source IP address and source port of the packet (I'll refer to them as HostA_IP, HostA_Port). There are two important things to keep in mind:
    • The source IP address points back to HostA's router (not HostA, because it is behind a NAT)
    • The source port points to the entry in HostA's router's NAT table that is used to forward packets back to HostA.
  4. HostB (behind a NAT) sends a UDP packet to the server.
  5. The server records the source IP address and source port (like step 3).
  6. The server sends a packet to HostA_IP:HostA_Port that contains the IP address of HostB (HostB_IP) and the port of HostB (HostB_Port)
  7. The server sends a packet to HostB_IP:HostB_Port that contains the IP address of HostA (HostA_IP) and the port of HostA (HostA_Port)

In other words, during step 6 and 7, the server informs each host of the other host's IP and Port. Continuing on:

  1. The server has done its job. It is not used again.
  2. Using the address information HostA got from the server in step 6, HostA sends a packet to HostB_IP:HostB_Port. This tells HostA's NAT to forward packets from HostB_IP:HostB_Port back to HostA.
  3. Using the address information HostB got from the server in step 7, HostB sends a packet to HostA_IP:HostA_Port. This tells HostB's NAT to forward packets from HostA_IP:HostA_Port back to HostB.

An interesting thing about these last two steps is that the packets only have to reach each host's own router. So you can set the TTL of these packets to something like 2.

At this point, each host can freely send UDP packets to the other because each host's router has the correct entry in the NAT table.

I wrote a simple client and server you can use to try it out.

Taking a Consistent Snapshot of MySQL Slave

Posted Mon Aug 27 @ 05:38:04 PM PDT 2012

I have a MySQL master-slave setup, and every night I backup the slave. That consists of stopping the slave (with mysql> STOP SLAVE), reading and recording the values of Relay_Master_Log_File and Exec_Master_Log_Pos (with mysql> SHOW SLAVE STATUS), and doing mysqldump $dbname | gzip > $dbname.sql.gz on each database.

After that is finished, I run mysql> START SLAVE, and the slave catches back up with the master.

There is a problem with that method though. It takes a long time (in my case, 20 minutes). If the master fails during that period, you've lost all the write queries from the time you stopped the slave, to the time the master crashed.

Fortunately, fixing the problem is very simple. Instead of issuing mysql> STOP SLAVE, you stop just the SQL_THREAD with mysql> STOP SLAVE SQL_THREAD. The slave still saves the writes from the master to the relay log, but it doesn't execute them until you issue mysql> START SLAVE.

That gives you a consistent snapshot of your databases, and the ability to apply the latest write queries if the master crashes.

Lisp

Posted Mon Aug 20 @ 11:54:18 AM PDT 2012

Lisp has all the visual appeal of oatmeal with fingernail clippings mixed in. (Other than that, it's quite a nice language.)

- Larry Wall

Robust Instant Payment Notification for Django

Posted Wed Aug 15 @ 09:53:30 AM PDT 2012

Yesterday, I had two problems with PayPal's instant payment notifications system:

  • PayPal's postback service wasn't always responding (https://www.paypal.com/cgi-bin/webscr)
  • The postback service was giving me the empty string instead of "INVALID" or "VERIFIED"

Because of the shoddy IPN script I wrote--that didn't take into account those two problems--I had to manually process the orders. I don't like doing that because my customers expect instant access to their account.

In between processing orders by hand, I beefed up my IPN script to handle these edge cases. The trick is to respond with a HTTP 500 status code when things go wrong. If PayPal sees the 500, it will resend the IPN again in a few minutes. That's still not perfect, since my customers have to wait a few minutes, but that is shorter than the time it takes me to notice the problem, and process it by hand.

This is the script I came up with:

Fun with SVG

Posted Sun Aug 05 @ 10:06:58 AM PDT 2012

I've been doing some fancy stuff (at least I think so) with SVG, and I'm about ready to release my new project that relies on it heavily. But before I do, I thought I would write about the experience I had working with SVG. I tested my app in IE 9, Firefox 14.0.1, Chrome 21.0.1180.60, Safari 5.1.7, and Opera 12.01. Here are some observations:

  • In Opera, if you call getScreenCTM on an SVG element right after you add it to the page, you get a matrix with all zeros. If you try to invert it, naturally, it fails and throws a SVG_MATRIX_NOT_INVERTABLE exception. To fix that, delay the call by a few milliseconds (call it after you pull some data from an AJAX request, or just use a setTimeout).
  • In IE 9, when the SVG is drawn, it overflows the bounds of the viewport. To fix that, add this to your CSS: svg { overflow: hidden; }
  • In Firefox, getBoundingClientRect() returns completely bogus values on SVG elements.
  • In IE 9, making an element's height 100% does not make it take up all the vertical space of it's parent element. To fix, I had to put html, body, div {height: 100%;} in my CSS.
  • In Firefox, to bind to a scroll event, you use element.addEventListener('DOMMouseScroll'..., and in everything else, you use element.addEventListener('mousewheel'...

Those are all the quirks I remember off the top of my head. I feel like there were quite a few more.

Mercator for Dummies

Posted Tue Jul 24 @ 12:27:40 PM PDT 2012

I'm trying out SVG, and one of the things I wanted to do was build a map. I found a GeoJSON data dump of the world, and plotted it on a Cartesian coordinate system. This is what I ended up with:

World map

Not the prettiest thing in the world. Everything looks smashed vertically.

One thing you could try is scaling up the y axis by 2. It kind of makes sense, since longitude (x axis) goes from 0 to 360, and latitude (y axis) goes from -90 to 90 (a range equal to half of the longitude). If you do that, you end up with this:

World map

Still not right. After some Googling around, I discovered you can't simply plot latitude and longitude on a Cartesian plane, and expect it to look ok. You need to use some kind of projection, since latitude and longitude is meant for a sphere.

The most common projection (I think) is a Mercator projection, which preserves the longitude, but stretches the latitude the further it is away from the equator.

Here is a Mercator projection and inverse, implemented in Python:

def latLonToMercator(lat, lon): """Project lat lon (in degrees) using a Mercator projection""" # mercator doesn't work on the poles assert(abs(lat) < 90.0) # convert to radians lat *= math.pi/180 # mercator projection lat = math.log((1+math.sin(lat))/math.cos(lat)) # put back into degrees lat *= 180/math.pi # If you want, clamp down on antarctica, which gets huge #if lat < -200: # lat = 200 return lat, lon def mercatorToLatLon(mercator_lat, mercator_lon): """Invert coords projected with Mercator back into lat and lon""" # convert to rads mercator_lat *= math.pi/180 # inverse of mercator mercator_lat = math.atan(math.sinh(mercator_lat)) # convert back to degrees mercator_lat *= 180/math.pi return mercator_lat, mercator_lon

Before you plot the latitude and longitude points on a plane, you run each through the projection. What you get out is the new coordinate for the point. When you plot those, you get this:

Now that looks right...except for Antarctica which is humongous. Not sure what to do about that.

One Thing I Hate About Python

Posted Sat Jul 07 @ 09:04:53 AM PDT 2012

Python is my favorite language to program in. I think the Zen of Python really fits my development style. In particular, I love this item:

There should be one-- and preferably only one --obvious way to do it.

But Python, like every language has its flaws. One thing that really drives me crazy is the fact that Python uses and, or, not instead of &&, ||, !, respectively.

I suppose you could justify the verbosity by claiming (1) it is more readable, and (2) it doesn't require you to use the shift key.

I find (2) to be weak because many of the other operators in Python require the shift key (for example: +,**,*, %, <<, >>, >, >, !=, etc). So why isn't + referred to as "plus", or "add" in Python?

(1) is more subjective. But I think it is clear as day that &&, || and ! are more readable. See for yourself:

if not (foo == 'abc' and bar == 'bac' or zoo == '123'):

vs

if !(foo == 'abc' && bar == 'bac' || zoo == '123'):

The !, && and || symbols work as visual separators between the boolean conditions, which I think improves the readability of more complex conditionals. But maybe that's just me.

Linear Animation Looks Bad

Posted Sat Jun 30 @ 12:50:39 PM PDT 2012

Whenever I need to do animation in JavaScript, I completely rely on jQuery's animate call. For kicks-and-giggles I tried to do the animation without jQuery.

By using setInterval() to do the tweening at a reasonable frame rate, you get an animation. The tricky part is determining the appropriate way to interpolate the points from the starting value to the final value. I thought simply doing a linear interpolation would produce the desired effect. For example, if I want to change the width of an image from 100px to 200px in 500 milliseconds, and my tween callback fires every 50 milliseconds, then the image would be resized like so:

Time Delta Image Size
0 N/A 100
50 10 110
100 10 120
150 10 130
200 10 140
250 10 150
300 10 160
350 10 170
400 10 180
450 10 190
500 10 200

After implementing that, I discovered it doesn't quite look right. To the eye, it looks rough (I don't know why -- it just does).

So why was jQuery's animation so smooth? I looked through their code and discovered this:

easing: { linear: function( p ) { return p; }, swing: function( p ) { return ( -Math.cos( p*Math.PI ) / 2 ) + 0.5; } },

jQuery is using some fancy swing function to do the easing. Using the previous example, a swing would produce this kind of transition:

Time Delta Image Size
0 N/A 100
502102
1007109
15011120
20014134
25016150
30015165
35014179
40011190
4507197
5003200

For your viewing pleasure, here is a linear and swing animation. See if you can guess which is which...

Newer Posts Older Posts


It takes considerable knowledge just to realize the extent of your own ignorance. - Thomas Sowell