MaxMind Evaluation

Many of the analytics services at WMF rely on geocoding the IP addresses associated with web requests. To do this, we use the MaxMind service, which provides updated mappings between IP blocks and city-/country-/region-level information. These mappings live in simple binary database files which can be downloaded from the MaxMind website with the appropriate credentials. Until recently, the details of this process were relatively unexamined, often leading to the use of a binary database of unknown origin which could have been several years out of date. As a result of some unexpected changes in page view counts by country, we decided to examine the possible effects of using an out-of-date binary database.

Overall Mismatches As a Function of Months Out of Date
For this analysis, we took a sampled squid log file from February 1, 2013 (/a/squid/archive/sampled/sampled-1000.log-20130101.gz) and geocoded it repeatedly, each time with the GeoIP database that was current at a different point in time: one database per month for the most recent 6 months, and then one every 6 months going back 2 years. On the suspicion that certain countries might be more prone to this sort of error, we also broke the results down by the country from which the most recent database considers each request to have originated.
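The comparison described above can be sketched roughly as follows. This is a minimal illustration and not the code used for the analysis: each database version is represented by a hypothetical lookup function that maps an IP string to a country code (or None), standing in for a real GeoIP reader, and the function names and version labels are assumptions made for the example.

```python
from collections import defaultdict


def mismatch_rates(ips, lookups, latest_key):
    """Fraction of requests whose country differs from the latest database.

    ips        -- list of IP address strings taken from the sampled log
    lookups    -- dict: version label -> callable(ip) -> country code or None
                  (hypothetical stand-ins for real GeoIP database readers)
    latest_key -- label of the most recent database, used as the reference
    """
    if not ips:
        return {version: 0.0 for version in lookups}
    reference = [lookups[latest_key](ip) for ip in ips]
    rates = {}
    for version, lookup in lookups.items():
        mismatches = sum(1 for ip, ref in zip(ips, reference) if lookup(ip) != ref)
        rates[version] = mismatches / len(ips)
    return rates


def mismatch_rates_by_country(ips, old_lookup, latest_lookup):
    """Mismatch rate per country, grouped by the latest database's answer."""
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for ip in ips:
        country = latest_lookup(ip)
        totals[country] += 1
        if old_lookup(ip) != country:
            mismatches[country] += 1
    return {country: mismatches[country] / totals[country] for country in totals}


# Toy data: two "database versions" expressed as plain dicts (made-up values).
latest_db = {"1.1.1.1": "US", "2.2.2.2": "CA", "3.3.3.3": "CA", "4.4.4.4": "FR"}
old_db = {"1.1.1.1": "US", "2.2.2.2": "US", "3.3.3.3": "CA", "4.4.4.4": None}
sample_ips = list(latest_db)

overall = mismatch_rates(
    sample_ips, {"latest": latest_db.get, "old": old_db.get}, "latest"
)
per_country = mismatch_rates_by_country(sample_ips, old_db.get, latest_db.get)
```

Grouping by the latest database's country, rather than the older one's, mirrors the filtering described above: the most recent database is treated as the best available answer, and every older version is scored against it.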

Month over Month Change in Geocoded Requests
We also wanted to get a sense of how dramatic the changes in geocoded requests from a particular country might be. To do this in a way which was agnostic to the total number of requests coming from a country, we computed the month over month percentage change and looked at the top 10 countries ranked by their maximum month over month percent change. The graph below is rather concerning because it suggests that for many countries, the reported traffic will vary wildly depending on which database version is used. Even a relatively large country like Canada shows up with a 100% month over month change.
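The ranking described here can be sketched as below. This is an illustrative sketch under stated assumptions, not the original analysis code: the input is assumed to be, for each country, a list of geocoded request counts ordered from the oldest database version to the newest, the "maximum" is assumed to mean the largest change in absolute value, and steps starting from a count of zero are skipped because the percentage change is undefined there. The function names and sample numbers are made up for the example.

```python
def pct_changes(counts):
    """Percent change between consecutive counts, ordered oldest to newest."""
    changes = []
    for prev, cur in zip(counts, counts[1:]):
        if prev == 0:
            continue  # percent change is undefined when the earlier count is zero
        changes.append(100.0 * (cur - prev) / prev)
    return changes


def top_countries_by_max_change(counts_by_country, n=10):
    """Rank countries by their largest absolute month over month percent change."""
    scores = {}
    for country, counts in counts_by_country.items():
        changes = pct_changes(counts)
        if changes:
            scores[country] = max(abs(change) for change in changes)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up counts for three countries across three database versions.
example_counts = {
    "CA": [100, 200, 190],
    "US": [1000, 1010, 1005],
    "XX": [0, 5, 5],
}
top_two = top_countries_by_max_change(example_counts, n=2)
```

Using a percentage rather than a raw difference keeps large and small countries comparable, which is the point of making the measure agnostic to total request volume.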