Reading/Search Engine Optimization/Sitemaps test

This page describes Wikimedia Audiences and Wikimedia Technology's work to improve Wikipedia's presence in search results by creating XML sitemaps for search crawlers (tracked in T198965 on Phabricator). This extended study is a follow-up to the inconclusive analysis of an earlier effort on Italian Wikipedia.

Introduction
The following languages have had sitemaps created and submitted to the Google Search Console:


 * Indonesian (idwiki)
 * Italian (itwiki)
 * Korean (kowiki)
 * Dutch (nlwiki, nds_nlwiki)
 * Punjabi (pawiki, pnbwiki)
 * Portuguese (ptwiki)

The following languages have been kept from the sameAs A/B test to be used as "controls" in this test:


 * Bhojpuri (bhwiki)
 * Cherokee (chrwiki)
 * Kazakh (kkwiki)
 * Catalan (cawiki)
 * French (frwiki)
 * Yoruba (yowiki)
 * Kalmyk (xalwiki)

Methods
We performed the analysis using the methodology introduced by Brodersen et al. (2015), wherein a Bayesian structural time series (BSTS) model is trained on the pre-intervention period, using a set of control time series unaffected by the intervention. The model is then used to predict the counterfactual time series ("what if sitemaps were not deployed?" in our case), and the predicted time series is compared with the actual time series to infer the impact. This is the same approach employed by Xie et al. (2019) to assess the impact of the Hindi Wikipedia awareness campaign.
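The comparison step can be sketched as follows. This is a minimal illustration and not the code used in the analysis: it assumes the fitted model has already produced posterior draws of the counterfactual, and the names `quantile`, `pointwise_impact`, and `interval_includes_zero` are hypothetical helpers introduced here. The 95% level is the default because that is the interval reported in the results.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def pointwise_impact(actual, counterfactual_draws, level=0.95):
    """Summarise (actual - predicted counterfactual) for each day.

    actual: observed values, one per post-intervention day.
    counterfactual_draws: posterior draws, each a list with one
        predicted value per post-intervention day.
    Returns one (median, lower, upper) tuple per day.
    """
    tail = (1 - level) / 2
    summary = []
    for day, observed in enumerate(actual):
        diffs = sorted(observed - draw[day] for draw in counterfactual_draws)
        summary.append(
            (quantile(diffs, 0.5), quantile(diffs, tail), quantile(diffs, 1 - tail))
        )
    return summary


def interval_includes_zero(summary):
    """True for each day whose credible interval contains 0 (no clear impact)."""
    return [lower <= 0 <= upper for _, lower, upper in summary]


# Toy numbers for illustration only (not measurements from the study).
actual = [10, 10]
draws = [[8, 20], [9, 21], [11, 22], [12, 23]]
print(interval_includes_zero(pointwise_impact(actual, draws)))  # [True, False]
```

A day whose interval contains 0 gives no evidence that the actual traffic differs from the predicted counterfactual.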

The model of search engine-referred traffic among treated wikis included a local trend and several seasonality and autoregressive components:


 * AR(5)
 * Day of week
 * Week of year
 * Month of year
 * Christmas & New Year as holidays

as well as three "control" time series which we assume to be unaffected by the intervention:


 * direct (non-referred) traffic to treated wikis
 * search engine-referred traffic to "control" wikis
 * direct (non-referred) traffic to "control" wikis
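To make the calendar components listed above concrete, the sketch below derives the indices they key on from a date. It is an illustration only: in a BSTS model these effects are typically specified as state components rather than computed by hand, the function name `calendar_features` is hypothetical, and treating exactly December 25 and January 1 as the holidays is an assumption, since the source does not state the holiday windows.

```python
from datetime import date

# Assumed holiday dates (month, day); the source does not give the windows.
HOLIDAYS = {(12, 25), (1, 1)}


def calendar_features(day):
    """Return the calendar indices for one day of the traffic series."""
    return {
        "day_of_week": day.weekday(),         # Monday = 0 ... Sunday = 6
        "week_of_year": day.isocalendar()[1],  # ISO 8601 week number
        "month_of_year": day.month,            # 1 ... 12
        "holiday": (day.month, day.day) in HOLIDAYS,
    }


print(calendar_features(date(2018, 12, 25)))
```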

We utilized 10-fold forward-chaining cross-validation (CV) to estimate the mean absolute percentage error (MAPE) of the model and thereby assess its accuracy in predicting the counterfactual. Since we were analyzing 60 days of post-intervention traffic, we evaluated the model on 10 blocks of 60 days leading up to the intervention, training each time on all the data available before the evaluation block ("fold").
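The fold layout and the error metric can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and `seasonal_naive` is only a placeholder forecaster standing in for the BSTS model, which is not reproduced here.

```python
def forward_chaining_folds(n_days, horizon=60, n_folds=10):
    """Yield (train_indices, test_indices) pairs, oldest fold first.

    The last test block ends on the final pre-intervention day; each
    earlier block sits `horizon` days before the next one. Training
    always uses every day before the test block.
    """
    first_test_start = n_days - horizon * n_folds
    if first_test_start <= 0:
        raise ValueError("not enough history for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * horizon
        yield range(0, test_start), range(test_start, test_start + horizon)


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive(train, horizon, period=7):
    """Placeholder forecaster: repeat the last observed week."""
    last_week = train[-period:]
    return [last_week[i % period] for i in range(horizon)]


def cross_validated_mape(series, forecast, horizon=60, n_folds=10):
    """Return one MAPE score per fold for the given forecasting function."""
    scores = []
    for train_idx, test_idx in forward_chaining_folds(len(series), horizon, n_folds):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        scores.append(mape(actual, forecast(train, horizon)))
    return scores
```

Because every fold trains only on days that precede its evaluation block, no future information leaks into the forecast, which mirrors how the counterfactual is predicted after the intervention.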

Results and Discussion
Using a model trained on daily traffic from 2016-02-05 (when we began tracking search engine-referred traffic separately from externally-referred traffic in general) through 2018-11-14 (the day before the intervention) to forecast a counterfactual, we found no statistically significant evidence of SEO improvement. The daily 95% credible interval for the estimated impact consistently includes 0, so the data do not support the hypothesis that sitemaps increase visits from search engines. These results are consistent with those of a previous analysis (which employed a different methodology), wherein we did not find convincing evidence of impact. Given the lack of convincing evidence, we believe that sitemaps are not worth the effort of generation, deployment, and maintenance.