Where in the world is Wikipedia?

Explore how traffic to Wikimedia projects is distributed around the globe.


About this data

This dataset contains the proportion of traffic to each public Wikimedia project, from each known country, with some caveats.

Details

Wikimedia properties receive 125,000 requests every second, for myriad projects and from myriad countries. Too little of this traffic data is made available to third-party researchers, due to an understandable and laudable desire to avoid compromising the privacy of our users. So instead, we analyse it ourselves.

Part of the analysis we perform is high-level geolocation: investigating the idea that where our traffic comes from has implications for systemic bias and reach. This is also work that third parties do really well. We've decided to release a high-level dataset of geodata to assist these researchers in their work. This tool represents a simple attempt to visualise that dataset and make it explorable.

Data preparation

This dataset represents an aggregate of 1:1000 sampled pageviews from the entirety of 2014. The pageviews definition applied was the Foundation's new pageviews definition; additionally, spiders and similar automata were filtered out with Tobie's ua-parser. Geolocation was then performed using MaxMind's geolocation products.

There are no privacy implications that we could identify: the data comes from 1:1000 sampled logs, is expressed as proportions rather than raw counts, and aggregates any nations with <1% of a project's pageviews under 'Other'.
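
As an illustration of the aggregation described above, here is a minimal Python sketch. It is not the code used to build the dataset. The function name, the input layout (a mapping from project and country to a sampled pageview count), and the example figures are all assumptions made for this example.

```python
from collections import defaultdict


def country_shares(counts, threshold=1.0):
    """Turn sampled pageview counts into per-project percentages.

    counts: dict mapping (project, country) -> sampled pageview count
            (hypothetical layout; assumes every project total is > 0).
    threshold: countries below this percentage of a project's
               pageviews are folded into 'Other'.
    """
    # Total sampled pageviews per project.
    totals = defaultdict(int)
    for (project, _country), n in counts.items():
        totals[project] += n

    # Convert each count to a percentage of its project's total,
    # folding small countries into a single 'Other' bucket.
    shares = defaultdict(dict)
    for (project, country), n in counts.items():
        pct = 100.0 * n / totals[project]
        label = country if pct >= threshold else "Other"
        shares[project][label] = shares[project].get(label, 0.0) + pct
    return {project: dict(by_country) for project, by_country in shares.items()}


# Invented counts, for illustration only.
sample = {
    ("en.wikipedia", "United States"): 600,
    ("en.wikipedia", "India"): 395,
    ("en.wikipedia", "Iceland"): 3,
    ("en.wikipedia", "Malta"): 2,
}
result = country_shares(sample)
```

With these invented counts, the two countries that each fall below 1% of the project's pageviews no longer appear by name; their combined share is reported under 'Other', and no raw count survives in the output.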

Reusing this data

The data is released into the public domain under the CC-0 public domain dedication, and can be freely reused by all and sundry. If you decide you want to credit it to people, though, the appropriate citation is:

Keyes, Oliver (2015) Geographic Distribution of Wikimedia Traffic