Wikimedia Maps/Tile generation report

Introduction
The map infrastructure stores vector tiles in a Cassandra database and can hold data for zoom levels 0 to 15, where each zoom level has 2^zoom × 2^zoom (that is, 4^zoom) tiles. Tilerator generates 22 tiles per second on average for each available CPU. Generating the whole planet can take more than a month with 24 CPUs.
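The "more than a month" figure can be checked with a rough back-of-the-envelope calculation. The sketch below is only an estimate: it assumes a square tile pyramid with 4^zoom tiles per zoom level, that every tile is generated (no empty tiles are skipped), and that throughput stays constant at the average rate quoted above. The function and constant names are illustrative and are not part of Tilerator.

```python
# Figures taken from the text: zoom levels 0..15, 22 tiles/s per CPU, 24 CPUs.
MIN_ZOOM = 0
MAX_ZOOM = 15
TILES_PER_SECOND_PER_CPU = 22
CPUS = 24
SECONDS_PER_DAY = 86_400


def tiles_at_zoom(zoom: int) -> int:
    """Tiles in one zoom level of a square pyramid: 2**zoom per axis."""
    return 4 ** zoom


def total_tiles(min_zoom: int, max_zoom: int) -> int:
    """Total tile count over an inclusive range of zoom levels."""
    return sum(tiles_at_zoom(z) for z in range(min_zoom, max_zoom + 1))


def generation_days(tile_count: int, rate_per_cpu: float, cpus: int) -> float:
    """Days needed to generate tile_count tiles at a constant rate."""
    seconds = tile_count / (rate_per_cpu * cpus)
    return seconds / SECONDS_PER_DAY


planet_tiles = total_tiles(MIN_ZOOM, MAX_ZOOM)
planet_days = generation_days(planet_tiles, TILES_PER_SECOND_PER_CPU, CPUS)
print(f"{planet_tiles:,} tiles, about {planet_days:.1f} days")
```

Under these assumptions the full pyramid holds roughly 1.43 billion tiles and takes a little over 31 days, which is consistent with the estimate above. Real runs will differ, since failed and rerun jobs add time and empty tiles may be cheaper to process.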

Debian upgrade to stretch
Because of the Debian upgrade to stretch, OSM data had to be reloaded and all tiles had to be generated from scratch. Tile generation started on 10/22/2018 and is still ongoing (last update: 12/19/2018). After some delays in tile generation, we started keeping a manual log of tile generation health over time, using a node tool that collects the metrics generated by Tilerator. The log can be seen in the following table:

The data shows that Tilerator processed more tiles than expected and that, after some jobs failed with the message "TTL exceeded", the number of tiles processed dropped and Tilerator reran the jobs. It is not known whether the work was committed and the tiles were stored in Cassandra, even though the size of generated tiles on maps1004 increased compared with maps1001 (the old master). The remaining jobs were stopped at 2019-09-01 15:23, freeing CPU to enable the OSM sync replication script.

Tile processing reached near 100% (01-08-2019)
After a long run of tile generation jobs, and some difficulty in bringing them to an end, the team decided that it would be reasonable to continue with the migration plan without waiting for tile processing to finish. Analyzing the collected data, we found that only a small fraction of the tiles is actually stored, so we might already have enough tiles stored.

We also found a flaw in the strategy used to regenerate the full planet tiles. A different process has been documented that is faster when a set of previously generated tiles is available as a basis, so that only the specific indexes need to be generated. The next time a full planet tile regeneration is needed, we should consider this approach, which would generate tiles according to the following benchmark:

Useful Commands
The following Tilerator command was used to schedule and queue tile generation jobs:

Problems during tile generation

 * Tile generation for zoom levels 14 and 15 failed. Tilerator tried to run jobs with too many tiles and got the error.
 * Solution: restart the jobs using the Tilerator UI

What we have learned
A few things learned during this process are worth mentioning:


 * Generating an entire zoom level is painful and slow, and we should avoid it at all costs for zoom levels greater than 13
 * This work is an incentive to refactor the available documentation
 * There is not yet a reliable tool for copying Cassandra data; such a tool would also save us a lot of time
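
To illustrate why whole-level generation above zoom 13 is so costly, the following sketch breaks the tile pyramid down per zoom level. It is an approximation only: it assumes 4^zoom tiles per level, the average throughput quoted in the introduction (22 tiles per second per CPU on 24 CPUs), and that no tiles are skipped. The helper name `zoom_breakdown` is hypothetical.

```python
# From the introduction: 22 tiles/s per CPU on 24 CPUs, zoom levels 0..15.
TILES_PER_SECOND = 22 * 24
MAX_ZOOM = 15
SECONDS_PER_DAY = 86_400


def zoom_breakdown(max_zoom: int, tiles_per_second: float) -> dict:
    """Map each zoom level to (share of all tiles, days to generate it)."""
    counts = {z: 4 ** z for z in range(max_zoom + 1)}
    total = sum(counts.values())
    return {
        z: (n / total, n / tiles_per_second / SECONDS_PER_DAY)
        for z, n in counts.items()
    }


breakdown = zoom_breakdown(MAX_ZOOM, TILES_PER_SECOND)
for zoom in (13, 14, 15):
    share, days = breakdown[zoom]
    print(f"zoom {zoom}: {share:.1%} of all tiles, about {days:.1f} days")
```

Because each level has four times as many tiles as the one before it, zoom 15 alone accounts for about three quarters of the pyramid, and zoom levels 14 and 15 together for more than 90% of it.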