Reading/Search Engine Optimization/Sitemaps test

This page describes Wikimedia Audiences and Wikimedia Technology's work to improve Wikipedia's presence in search results by creating XML sitemaps for search crawlers (tracked in T198965 on Phabricator). This extended study is a follow-up to the inconclusive analysis of an earlier effort on Italian Wikipedia (cf. ).

Introduction
The following languages have had sitemaps created and submitted to the Google Search Console:


 * Indonesian (idwiki)
 * Italian (itwiki)
 * Korean (kowiki)
 * Dutch (nlwiki, nds_nlwiki)
 * Punjabi (pawiki, pnbwiki)
 * Portuguese (ptwiki)

The following languages have been kept from the sameAs A/B test to be used as "controls" in this test:


 * Bhojpuri (bhwiki)
 * Cherokee (chrwiki)
 * Kazakh (kkwiki)
 * Catalan (cawiki)
 * French (frwiki)
 * Yoruba (yowiki)
 * Kalmyk (xalwiki)

Methods
We performed the analysis using the methodology introduced by Brodersen et al. (2015), wherein a Bayesian structural time series (BSTS) model is trained on the pre-intervention period using a set of control time series unaffected by the intervention. The model is then used to predict the counterfactual time series ("what if sitemaps were not deployed?" in our case), and the predicted time series is compared with the actual time series to infer the impact. This is the same approach employed by Xie et al. (2019) to assess the impact of the Hindi Wikipedia awareness campaign.

The model of search engine-referred traffic among treated wikis included various seasonality components and three "control" time series which we assume to be unaffected by the intervention:


 * direct (non-referred) traffic to treated wikis
 * search engine-referred traffic to "control" wikis
 * direct (non-referred) traffic to "control" wikis
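The counterfactual comparison described above can be sketched in Python. This is not the BSTS model used in the analysis: as a deliberately simplified stand-in, it fits an ordinary least-squares regression of the treated series on three control series over the pre-intervention period, predicts the post-intervention period, and takes the difference from the observed values. The data are synthetic, and all names (`fit_counterfactual`, `estimated_effect`, the injected lift of 40) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_counterfactual(y_pre, X_pre, X_post):
    """Fit OLS (with intercept) on pre-intervention data, predict the post period.

    Simplified stand-in for a BSTS model: no seasonality, no priors,
    no posterior uncertainty.
    """
    A_pre = np.column_stack([np.ones(len(X_pre)), X_pre])
    coef, *_ = np.linalg.lstsq(A_pre, y_pre, rcond=None)
    A_post = np.column_stack([np.ones(len(X_post)), X_post])
    return A_post @ coef


def estimated_effect(y_post, y_pred):
    """Return pointwise, cumulative, and relative differences (actual - predicted)."""
    pointwise = y_post - y_pred
    cumulative = pointwise.sum()
    relative = cumulative / y_pred.sum()
    return pointwise, cumulative, relative


# Synthetic example (assumed values, not real traffic data).
rng = np.random.default_rng(0)
n_pre, n_post = 300, 60

# Three control series, standing in for: direct traffic to treated wikis,
# search-referred traffic to control wikis, direct traffic to control wikis.
controls = rng.normal(1000, 50, size=(n_pre + n_post, 3))
treated = controls @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 5, size=n_pre + n_post)
treated[n_pre:] += 40.0  # hypothetical lift injected after the "intervention"

y_pred = fit_counterfactual(treated[:n_pre], controls[:n_pre], controls[n_pre:])
pointwise, cumulative, relative = estimated_effect(treated[n_pre:], y_pred)
print(f"mean daily effect: {pointwise.mean():.1f}, relative effect: {relative:.2%}")
```

The design relies on the same assumption stated in the text: the control series must themselves be unaffected by the intervention, otherwise the predicted counterfactual absorbs part of the effect.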

We used 10-fold forward-chaining cross-validation (CV) to estimate the mean absolute percentage error (MAPE) of the model and thereby assess its accuracy in predicting the counterfactual. Since we were analyzing 60 days of post-intervention traffic, we evaluated the model on 10 blocks of 60 days leading up to the intervention, training each time on all the data available before the evaluation block ("fold").
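As an illustration of this evaluation scheme, here is a minimal Python sketch of forward-chaining folds and the MAPE computation. It assumes the 10 evaluation blocks are consecutive, non-overlapping 60-day windows ending at the intervention, which the text does not state explicitly. The function names, the synthetic series, and the naive forecaster (mean of the last 28 training days) are hypothetical placeholders, not the model used in the analysis.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


def forward_chaining_folds(n_obs, n_folds=10, horizon=60):
    """Yield (train_idx, test_idx) pairs over a pre-intervention series.

    Each test block covers `horizon` consecutive days; the matching
    training set is every observation before that block, so training
    data grows from fold to fold and never includes future days.
    """
    first_test_start = n_obs - n_folds * horizon
    if first_test_start <= 0:
        raise ValueError("series too short for the requested folds")
    for k in range(n_folds):
        start = first_test_start + k * horizon
        yield np.arange(0, start), np.arange(start, start + horizon)


# Synthetic pre-intervention series (assumed values) with weekly seasonality.
rng = np.random.default_rng(1)
days = np.arange(700)
series = 1000 + 100 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 20, size=days.size)

scores = []
for train_idx, test_idx in forward_chaining_folds(len(series)):
    # Placeholder forecaster: mean of the last 28 training days.
    forecast = np.full(len(test_idx), series[train_idx][-28:].mean())
    scores.append(mape(series[test_idx], forecast))

print(f"MAPE averaged over {len(scores)} folds: {np.mean(scores):.2f}%")
```

Matching the fold length to the 60-day analysis window means the cross-validated error reflects the same forecast horizon as the counterfactual prediction itself.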

Discussion
…