A Map of the Geographical Structure of Wikipedia Links

There are a lot of Wikipedia visualizations. Some concentrate on article contents, others on the links between articles and some use the geocoded content (like in my previous blog post).

This new visualization is novel because it uses the geographical content of Wikipedia in conjunction with the links between articles. In other words, if a geocoded article (that is, an article associated with a location like a city) links to another geocoded article, a line will be drawn between these two points. The result can be found on the map on the left.

Read on for zoomed views, slideshows, browsable maps, etc.

Methodology

Scroll down to see the slideshows, pretty pictures and interactive maps.

The first thing I had to do was extract the geographical data included in the articles and the links between the articles. Instead of parsing the very complicated Wikipedia markup, I chose to use the good work done by the folks at GeoNames. In the download section, there is a SQL file with the name of every geocoded Wikipedia article. Then, I downloaded all English articles in Wikipedia (9GB compressed, about 40GB uncompressed) and used a bit of regex magic to extract reentrant links (that is, hyperlinks that point to geocoded articles). After these steps, I was left with two datasets: a list of all geocoded articles and a list of all links between them.
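For readers who want to try something similar, here is a minimal sketch of what such a regex pass could look like. It is not the exact expression used for this map: the pattern, the `normalize` helper and the `geocoded_links` function are illustrative names, and the sketch assumes the geocoded titles have already been loaded into a Python set.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# only the target title is captured. (Illustrative pattern, an assumption.)
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize(title):
    """Turn underscores into spaces, trim, and uppercase the first letter."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def geocoded_links(source_title, wikitext, geocoded):
    """Return (source, target) pairs for links that point at geocoded articles."""
    source = normalize(source_title)
    pairs = []
    for match in WIKILINK.finditer(wikitext):
        target = normalize(match.group(1))
        # Keep only links to geocoded articles, and skip self-links.
        if target in geocoded and target != source:
            pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    text = "Near [[Ottawa]] and [[Quebec City|Québec]]; see also [[Poutine]]."
    print(geocoded_links("Montreal", text, {"Montreal", "Ottawa", "Quebec City"}))
```

Links nested inside image captions are still picked up, because the pattern refuses to match across an inner `[[`. Templates, redirects and HTML comments are not handled in this sketch.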

To draw the map, I used the same technology I developed for my map of scientific collaborations. I had to adjust the tool to add features like other geographical projections (the Mercator projection, while simple, makes Greenland seem as large as Africa), linear transformations, etc. The datasets computed in the previous steps were then parsed and drawn by my mapping tool. I then played with the colors in Photoshop to convert the grayscale output to color. To build the browsable and overlay maps, I used the fantastic MapTiler tool. By the way, the input projection for this tool is Equidistant Cylindrical; knowing this would have saved me a lot of time!
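To make the projection remarks concrete, here is a small sketch of the two simplest projections mentioned above. It is not taken from my drawing tool; the function names and the pixel convention (origin at the top-left corner, y growing downward) are assumptions made for the example.

```python
import math


def equirectangular(lat, lon, width, height):
    """Equidistant Cylindrical: degrees map linearly to pixels.

    Assumes (0, 0) is the top-left pixel, i.e. latitude 90, longitude -180.
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y


def mercator_y(lat):
    """Unitless Mercator northing for a latitude in degrees (|lat| < 90)."""
    return math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))


if __name__ == "__main__":
    # The same 10 degrees of latitude cover far more map height near the pole.
    print(mercator_y(10) - mercator_y(0))
    print(mercator_y(80) - mercator_y(70))
```

In the Equidistant Cylindrical case every degree of latitude takes the same number of pixels, which is why it is a convenient input format for tiling tools. In the Mercator case the spacing grows with latitude, which is what inflates Greenland.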

Slideshow

Wikipedia
Mediterranean Sea
United Kingdom
India
Western Europe
United States of America
Caribbean Sea
Some parts of Asia
Australia and New Zealand

This slideshow contains zoomed-in parts of the map showing different countries, continents and regions. Click on a picture to enlarge it. Scroll to the bottom of this blog post to download the full-size map (200M pixels, 18MB JPEG file).

Browsable Map

This map uses the Robinson projection. It is a “compromise” projection, meaning that it doesn’t eliminate any single kind of distortion but keeps all of them reasonably small.

Click here to open this map in a new window

Google Map Overlay

As the title suggests, this map is overlaid onto a Google Map so that cities, countries and other landmarks can be easily situated. Obviously, populated areas contain a lot of Wikipedia articles.

Click here to open this map in a new window

Data & High Resolution Files

You can download a high-resolution file of the map here. It’s quite big at 200M pixels: an 18MB JPEG file that could crash your browser, even more so if you are using a smartphone. I also have a 1.7G pixels file, but it is too large to host here, so let me know if you need it. That larger file uses the Equidistant Cylindrical projection, not the Robinson projection used by the other high-resolution file.

The input dataset (30MB compressed, around 95MB uncompressed) can be downloaded here; the fields should be self-explanatory.

The drawing tool will eventually be open sourced, but I need time to clean it up.

14 Thoughts on “A Map of the Geographical Structure of Wikipedia Links”

  1. Pingback: Los enlaces de Wikipedia representados en un mapa del mundo

  2. Pingback: Los enlaces de Wikipedia representados en un mapa del mundo | Riberatwitter.es

  3. Pingback: wikipedia : Cartographie des liens et des contripbutions

  4. Pingback: Los enlaces de Wikipedia representados en un mapa del mundo | WebZar

  5. Chris Helenius on January 30, 2013 at 8:01 am said:

    Is it possible to extract the article names from the nodes?

    There’s a dot in northern Finland, N 65° 51′ 0” E 29° 54′ 0”, that has numerous connections radiating toward south-west, but… there’s nothing there.

    Only one article nearby, https://en.wikipedia.org/wiki/Joukamojärvi, but the article has only two outward links to Finnish domains, and no internal links apart from the infobox. The area is just empty taiga forest.
    I’m really intrigued about what can be so important up there.

  6. Amazing work, kudos!

    With regards to what Chris says, I was intrigued enough to try to find the culprit by following the links from nearby Bothnian Bay and studying the linked pages. I would bet that the point corresponds to the article for the Baltic Sea, which has loads of links and is not where one would expect:

    http://en.wikipedia.org/wiki/Baltic_Sea

    I have been perusing the article history, but can’t find any recent location-related vandalism. So it’s a mystery ;)

    • Thanks.

      Yeah, the first render I did had a bit of weird stuff that I had to correct. For example, the FCC was located in Russia and all radio-related pages would link to the FCC. I think the geodata in Wikipedia is less clean than we think.

  7. Pingback: Os links de Wikipedia representados em um mapa do mundo

  8. Pingback: Wikimedia Research Newsletter, January 2013 — Wikimedia blog

  9. This is beautiful. Bravo!

  10. Pingback: ベスト・インフォグラフィック(週間)2013年2月1週

  11. Tinpot Gamer on February 15, 2013 at 6:31 am said:

    I’m intrigued by the big line across the south Atlantic. The only thing I can find there with a brief search is Bouvet Island but that’s uninhabited and glacial, so likely not a Tuvalu .tv domain situation.
    Perhaps that’s links that relate to the ocean itself?

  12. “A Map of the Geographical Structure of Wikipedia Links” -> “A Map of the Geographical Structure of English Wikipedia Links”
