A Map of the Geographical Structure of Wikipedia Links

[caption id="attachment_766" align="alignleft" width="150"]Wikipedia Click to enlarge![/caption]

There are a lot of Wikipedia visualizations. Some concentrate on article contents, others on the links between articles, and some use the geocoded content (as in my previous blog post).

This visualization is novel because it combines the geographical content of Wikipedia with the links between articles. In other words, if a geocoded article (that is, an article associated with a location, like a city) links to another geocoded article, a line is drawn between these two points. The result is the map on the left.

Read on for zoomed views, slideshows, browsable maps, etc.



[caption id="attachment_808" align="alignright" width="150"]image1 Click to enlarge![/caption]

The first thing I had to do was extract the geographical data included in the articles and the links between the articles. Instead of parsing the very complicated Wikipedia markup, I chose to use the good work done by the folks at GeoNames. In their download section, there is an SQL file with the name of every geocoded Wikipedia article. Then, I downloaded all English articles in Wikipedia (9GB compressed, about 40GB uncompressed) and used a bit of regex magic to extract reentrant links (that is, hyperlinks that point to geocoded articles). After these steps, I was left with two datasets: a list of all geocoded articles and a list of all links between articles.
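For illustration, here is a minimal Python sketch of that link-extraction step. It is not the original script: the pattern, the title normalization and the names `WIKILINK`, `normalize` and `geocoded_links` are all assumptions, and real Wikipedia markup has more cases (templates, redirects, namespaces) than this handles.

```python
import re

# Assumed pattern: matches [[Target]], [[Target|label]] and
# [[Target#Section|label]], capturing only the target title.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize(title):
    """Rough title normalization (an approximation of what MediaWiki does):
    underscores become spaces, whitespace is trimmed, first letter is uppercased."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def geocoded_links(source_title, wikitext, geocoded_titles):
    """Return (source, target) pairs for links from a geocoded article
    to other geocoded articles, without duplicates or self-links."""
    source = normalize(source_title)
    if source not in geocoded_titles:
        return []
    seen = set()
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = normalize(match.group(1))
        if target in geocoded_titles and target != source and target not in seen:
            seen.add(target)
            links.append((source, target))
    return links
```

Here `geocoded_titles` stands in for the set of article names taken from the GeoNames file; running the function over every article in the dump would yield the second dataset, the list of links.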

To draw the map, I used the same technology I developed for my map of scientific collaborations. I had to adjust the tool to add features like other geographical projections (the Mercator projection, while simple, makes Greenland seem as large as Africa), linear transformations, etc. The datasets computed in the previous steps were then parsed and drawn by my mapping tool. I then played with the colors in Photoshop to convert the grayscale output to color. To build the browsable and overlay maps, I used the fantastic MapTiler tool. By the way, the input projection for this tool is Equidistant Cylindrical - knowing this would have saved me a lot of time!
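To make the projection remarks concrete, here is a small Python sketch. It is not taken from the mapping tool; the function names and the image dimensions are assumptions. The first function is the Equidistant Cylindrical mapping from longitude/latitude to pixel coordinates, and the second gives the factor by which Mercator inflates areas at a given latitude (the square of the secant of the latitude), which is why high-latitude regions like Greenland look so large.

```python
import math


def equirectangular(lon, lat, width, height):
    """Equidistant Cylindrical: longitude and latitude (in degrees) map
    linearly to x and y, with (0, 0) at the top-left corner of the image."""
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y


def mercator_area_inflation(lat):
    """Area scale of the Mercator projection at a latitude (in degrees),
    relative to the equator: 1 / cos^2(lat)."""
    return 1.0 / math.cos(math.radians(lat)) ** 2


# Example: a point on the equator at the prime meridian lands in the
# centre of a hypothetical 3600x1800 image.
center = equirectangular(0.0, 0.0, 3600, 1800)
```

Because the mapping is linear in both axes, an image in this projection is easy for a tiling tool to cut up and reproject, at the cost of stretching shapes near the poles.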


[huge_it_gallery id="4"]

This slideshow contains zoomed views of the map covering different countries, continents and regions. Click on a picture to enlarge it. Browse to the bottom of this blog post to download the full size map (200 megapixels, 18MB JPEG file).

Browsable Map

This map uses the Robinson projection. It is a "compromise" projection, meaning that while it doesn't resolve all the problems found in many projections, it keeps most distortions low.

Click here to open this map in a new window

Google Map Overlay

As the title suggests, this map is overlaid onto a Google Map so that cities, countries and other landmarks can be easily located. Unsurprisingly, populated areas contain a lot of Wikipedia articles.

Click here to open this map in a new window

Data & High Resolution Files

There's also a high resolution file, but Amazon was charging me a pretty penny to host and serve it, so I removed it. Let me know if you want it; I'll send you the file. I also have a 1.7-gigapixel file, but it is too large to host here, so let me know if you need it. It uses the Equidistant Cylindrical projection, not the Robinson one used by the other high resolution file.

The input dataset (30MB compressed, around 95MB uncompressed) can be downloaded here; the fields should be self-explanatory.

The drawing tool will eventually be open sourced, but I need time to clean it up.