This dataset contains the proportion of traffic to each public Wikimedia project, from each known country, with some caveats.
Wikimedia properties receive 125,000 requests every second,
for myriad projects and from myriad countries. Too little of this traffic data is
made available to third-party researchers, due to an understandable and
laudable desire to avoid compromising the privacy of our users.
So instead, we analyse it ourselves.
Part of the analysis we perform is high-level geolocation:
investigating the idea that where our traffic comes from has
implications for systemic bias and reach. This is /also/ work that third parties
do really well. We've decided to release a high-level dataset of
geodata, to assist these researchers in their work. This tool
represents a simple attempt to visualise it and make it explorable.
This dataset represents an aggregate of 1:1000 sampled pageviews from the entirety of 2014. The pageviews definition applied was the Foundation's new pageviews definition; additionally, spiders and similar automata were filtered out with Tobie's ua-parser.
Geolocation was then performed using MaxMind's geolocation products.
There are no privacy implications that we could identify: the data comes from 1:1000 sampled logs, is proportionate rather than raw, and aggregates any nations with <1% of a project's pageviews.
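To illustrate the final aggregation step, here is a minimal Python sketch of how per-project country proportions could be computed from already-sampled, already-geolocated pageviews. It is not the code used to build this dataset: the function name country_proportions, the (project, country) input shape, and the "Other" label for the combined bucket are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def country_proportions(pageviews, threshold=1.0, other_label="Other"):
    """Return {project: {country: percentage}} for sampled pageviews.

    pageviews   -- iterable of (project, country) pairs, one per pageview
                   (hypothetical input shape, assumed for this sketch)
    threshold   -- countries below this percentage of a project's
                   pageviews are combined into a single bucket
    other_label -- name of the combined bucket (an assumption)
    """
    # Count pageviews per country within each project.
    counts = defaultdict(Counter)
    for project, country in pageviews:
        counts[project][country] += 1

    result = {}
    for project, by_country in counts.items():
        total = sum(by_country.values())
        shares = {}
        other = 0.0
        for country, n in by_country.items():
            pct = 100.0 * n / total
            if pct < threshold:
                # Too small to report on its own: fold into the bucket.
                other += pct
            else:
                shares[country] = pct
        if other > 0:
            shares[other_label] = other
        result[project] = shares
    return result


# Example with made-up counts for a single project.
sample = (
    [("en.wikipedia", "US")] * 600
    + [("en.wikipedia", "GB")] * 395
    + [("en.wikipedia", "FR")] * 3
    + [("en.wikipedia", "DE")] * 2
)
shares = country_proportions(sample)
print(shares["en.wikipedia"])
```

Because only percentages are kept and small countries are folded into one bucket, the output carries no raw counts, which mirrors the privacy reasoning above.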
Keyes, Oliver (2015) Geographic Distribution of Wikimedia Traffic