Reading/Search Engine Optimization/Sitemaps test

This page describes Wikimedia Audiences and Wikimedia Technology's work to improve Wikipedia's presence in search results by creating XML sitemaps for search crawlers (tracked in T198965 on Phabricator). This extended study is a follow-up to the inconclusive analysis of an earlier effort on Italian Wikipedia. Using Bayesian structural time series modeling to analyze the potential causal impact of sitemaps on daily search engine-referred traffic, we estimated the average effect of sitemaps on page views per day to be an increase of 3.38% on mobile and 1.54% on desktop, neither of which is statistically significant. The daily 95% credible intervals of the inferred SEO impact consistently included 0 during the 60 days post-release, in line with the earlier analysis, which also yielded statistically insignificant results. This lack of convincing evidence suggests that sitemaps are probably not worth the effort to generate and maintain for our sites.

Introduction
The following languages have had sitemaps created and submitted to the Google Search Console:


 * Indonesian (idwiki)
 * Italian (itwiki)
 * Korean (kowiki)
 * Dutch (nlwiki, nds_nlwiki)
 * Punjabi (pawiki, pnbwiki)
 * Portuguese (ptwiki)

The following languages have been kept out of the sameAs A/B test to be used as "controls" in this test:


 * Bhojpuri (bhwiki)
 * Cherokee (chrwiki)
 * Kazakh (kkwiki)
 * Catalan (cawiki)
 * French (frwiki)
 * Yoruba (yowiki)
 * Kalmyk (xalwiki)

Methods
We performed the analysis using the methodology introduced by Brodersen et al. (2015), wherein a Bayesian structural time series (BSTS) model is trained on the pre-intervention period using a set of control time series unaffected by the intervention. The model is then used to predict the counterfactual time series ("what if sitemaps were not deployed?" in our case), and the predicted time series is compared with the actual time series to infer the impact. This is the same approach employed by Xie et al. (2019) to assess the impact of the Hindi Wikipedia awareness campaign.

The model of search engine-referred traffic among treated wikis included a local trend and various seasonality & autoregressive components:


 * AR(5)
 * Day of week
 * Week of year
 * Month of year
 * Christmas & New Year as holidays

as well as three "control" time series which we assume to be unaffected by the intervention:


 * direct (non-referred) traffic to treated wikis
 * search engine-referred traffic to "control" wikis
 * direct (non-referred) traffic to "control" wikis
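The seasonal and holiday components listed above are all keyed on calendar position. As a rough illustration only (not the code used in this analysis), the sketch below derives those indices for a given date with the Python standard library. The function name `calendar_features` and the choice of December 25 and January 1 as the exact holiday dates are assumptions made for the example.

```python
from datetime import date

# Assumption: "Christmas & New Year" taken as exactly Dec 25 and Jan 1.
# The original model may have used a wider holiday window.
HOLIDAYS = {(12, 25), (1, 1)}


def calendar_features(day):
    """Return the calendar indices that the seasonal components key on.

    Hypothetical helper for illustration; the names are not from the study.
    """
    return {
        "day_of_week": day.isoweekday(),        # 1 = Monday ... 7 = Sunday
        "week_of_year": day.isocalendar()[1],   # ISO 8601 week number
        "month_of_year": day.month,             # 1 = January ... 12 = December
        "is_holiday": (day.month, day.day) in HOLIDAYS,
    }


if __name__ == "__main__":
    # First day of the post-intervention period mentioned in the results.
    print(calendar_features(date(2018, 11, 15)))
```

In a structural time series model these indices would select which seasonal state applies on a given day; the autoregressive and trend components are not covered by this sketch.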

We used 10-fold forward-chaining cross-validation (CV) to estimate the mean absolute percentage error (MAPE) of the models and thereby assess how accurately they predict the counterfactual. Since we were analyzing 60 days of post-intervention traffic, we evaluated the model on the 10 blocks of 60 days leading up to the intervention, each time training on all the data preceding the evaluation block ("fold").
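To make the fold layout concrete, here is a minimal sketch of forward-chaining splits with an expanding training window, together with a MAPE helper. It is an illustration under assumptions, not the code used for this analysis: the function names are hypothetical, and the splitter assumes the series is a plain sequence of daily observations ending the day before the intervention.

```python
def forward_chaining_folds(n_obs, horizon=60, n_folds=10):
    """Split indices 0..n_obs-1 into expanding-window train/test folds.

    The last n_folds * horizon observations are cut into consecutive test
    blocks; each fold trains on everything that precedes its test block.
    """
    first_test_start = n_obs - horizon * n_folds
    if first_test_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    folds = []
    for k in range(n_folds):
        test_start = first_test_start + k * horizon
        train = range(0, test_start)
        test = range(test_start, test_start + horizon)
        folds.append((train, test))
    return folds


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


if __name__ == "__main__":
    # 2016-02-05 through 2018-11-14 is 1014 daily observations.
    for train, test in forward_chaining_folds(1014):
        print(f"train on days 0..{train[-1]}, evaluate on days {test[0]}..{test[-1]}")
```

Each fold's model would be fit on `train`, used to forecast the 60 days in `test`, and scored with `mape`; averaging across the folds gives the CV estimate.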

Results and Discussion
Using a model trained on daily traffic from 2016-02-05 (when we began tracking search engine-referred traffic separately from externally-referred traffic in general) through 2018-11-14 (the day before the intervention) to forecast a 60-day counterfactual from 2018-11-15 through 2019-01-14, we found no statistically significant evidence of SEO improvement. We modeled mobile and desktop traffic separately, with one figure of results for each. In each figure, the top half shows the predictions $$\hat{y}$$ (with a 95% credible interval) in red and the actual time series $$y$$ in black; the bottom half is split between showing the estimated absolute impact $$y - \hat{y}$$ in blue and the estimated relative impact $$\frac{y - \hat{y}}{\hat{y}}$$ in green.
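The impact quantities plotted in the figures can be illustrated for a single day as follows. This is a hedged sketch, not the analysis code: it assumes the model yields a set of posterior predictive draws of the counterfactual for each day, takes their mean as the point prediction $$\hat{y}$$, and uses equal-tailed percentile intervals. The names `daily_impact` and `percentile` are hypothetical.

```python
import statistics


def percentile(sorted_vals, q):
    """Linearly interpolated quantile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def daily_impact(actual, draws, level=0.95):
    """Absolute and relative impact for one day.

    actual: observed value y for the day.
    draws:  posterior predictive draws of the counterfactual for the same day
            (assumed input; must be non-zero for the relative impact).
    """
    tail = (1 - level) / 2
    y_hat = statistics.mean(draws)                      # point prediction (assumption)
    abs_draws = sorted(actual - d for d in draws)       # y - y_hat, per draw
    rel_draws = sorted((actual - d) / d for d in draws) # (y - y_hat) / y_hat, per draw
    abs_interval = (percentile(abs_draws, tail), percentile(abs_draws, 1 - tail))
    rel_interval = (percentile(rel_draws, tail), percentile(rel_draws, 1 - tail))
    return {
        "absolute": actual - y_hat,
        "relative": (actual - y_hat) / y_hat,
        "absolute_interval": abs_interval,
        "relative_interval": rel_interval,
        # If the interval contains 0, the day shows no credible evidence of impact.
        "includes_zero": abs_interval[0] <= 0 <= abs_interval[1],
    }


if __name__ == "__main__":
    # Made-up numbers purely to exercise the function, not study data.
    print(daily_impact(103, [90, 95, 100, 105, 110]))
```

A day counts as showing evidence of impact only when `includes_zero` is false, which is the criterion applied to the daily intervals in this section.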

Although the estimated impact is above 0 on most days, suggesting a possible positive effect, the daily 95% credible interval consistently includes 0, which means our model has not found evidence of impact on visits from search engines with the data we have. These results are consistent with what we saw in the previous analysis (which employed a different methodology), wherein we did not find convincing evidence of impact. Given that lack of convincing evidence and our action plan, we are led to believe that sitemaps are probably not worth the effort of generation, deployment, and maintenance. However, if we wished to investigate sitemaps further on a second set of wikis, we would need to test on many more wikis and randomly select some of them to receive the treatment.