Executive Summary

Product leaders requested this report to study how turning off IP editing affected the number of edits on Portuguese Wikipedia. The analysis synthesizes a counterfactual with a forecast model that estimates the edits Portuguese Wikipedia would have received had IP editing not been turned off. It then estimates the impact by comparing these predicted edits with the actual edits. Based on this model, we found no conclusive evidence that turning off IP editing had a negative impact on editing activity.

Introduction

Portuguese Wikipedia turned off IP editing on October 4, 2020. In the following months, we observed a decrease in non-reverted edits (excluding bot and revert edits). In Q2 of fiscal year 2020/21, edits on Portuguese Wikipedia decreased by 0.91% year over year (Figure 1).

Because edits on all Wikipedias increased by 13.5% over the same period, we wanted to examine whether Portuguese Wikipedia would have seen a similar increase had IP editing not been turned off.

To answer this question, we used the Prophet time series forecasting method^[1] to estimate the impact of the intervention on edits on Portuguese Wikipedia.

Data Characteristics

To estimate edits in the absence of the intervention, we obtained monthly non-reverted edits (excluding bot and revert edits) from the wmf.mediawiki_history table. The data span 68 months, from July 2015 to February 2021, and include the following monthly variables: edits on Portuguese Wikipedia (ptwiki), edits on all other Wikipedias, and edits on all other wiki projects in the Portuguese language.

To explore the pattern of edits on Portuguese Wikipedia, we plotted the historical data (Figure 2).

Edits remained almost flat over the last five years, with a slight downward trend. This pattern is not dominated by yearly or monthly seasonality alone, which indicates that other factors also affect the number of edits and that a trend-only model cannot capture them. We therefore chose a causal model and used wikis whose edits are correlated with those on Portuguese Wikipedia as control regressors, so that the model reflects the impact of global events. After exploring 311 Wikipedias and 8 projects in the Portuguese language, we selected the following projects as control regressors based on their correlation coefficients:

Irish Wikipedia (gawiki)
Russian Wikipedia (ruwiki)
Sicilian Wikipedia (scnwiki)
Yiddish Wikipedia (yiwiki)
Portuguese Wikivoyage (ptwikivoyage)
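The report states only that the control regressors were chosen "based on correlation coefficient". A minimal sketch of such a screen, assuming Pearson correlation on the monthly series and a hypothetical cutoff of |r| >= 0.7; the function names, the cutoff, and the toy data are illustrative assumptions, not the report's actual procedure or figures:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length monthly series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)


def select_regressors(target, candidates, threshold=0.7):
    """Keep candidates whose |r| with the target reaches the (hypothetical) threshold,
    ordered from strongest to weakest correlation."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: abs(scores[name]), reverse=True)


# Toy data for illustration only; not the report's figures.
ptwiki = [100, 110, 105, 120, 115]
candidates = {
    "wiki_a": [200, 220, 210, 240, 230],  # moves in lockstep with ptwiki
    "wiki_b": [5, 3, 9, 1, 7],            # weakly (negatively) related
}
print(select_regressors(ptwiki, candidates))  # ['wiki_a']
```

Ranking by |r| keeps strongly negatively correlated wikis as candidates too; whether to allow that is a modeling choice the report does not specify.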

We also explored edits by country. 95% of edits on Portuguese Wikipedia come from Brazil and Portugal, and these edits correlate well with edits on English Wikipedia from Brazil and Portugal. However, these data are available only for a short period, which is not enough to model yearly seasonality and trend. With sufficient data, edits by country could be a good candidate control regressor.

Model Selection

After evaluating the data, we constructed four candidate models:

1) a model consisting of trend, seasonality, and Wikipedia regressors;

2) a model consisting of trend, seasonality, and Portuguese project regressors;

3) a model consisting of trend, seasonality, Wikipedia regressors, and a Portuguese Wikivoyage regressor;

4) a model consisting of trend, seasonality, Wikipedia regressors, a Portuguese Wikivoyage regressor, and a pageview regressor.

We trained the models on monthly data from July 2015 to September 2020 (the month before the intervention), ran 9-fold cross-validation to estimate the mean absolute percentage error (MAPE), and evaluated the accuracy of the models (Appendix A, Table 1; Appendix B, Table 2). Based on this analysis, we determined that the third model is the most reliable.
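The report does not state how the 9 folds were laid out. A minimal sketch of how per-horizon MAPE values like those in Appendix B, Table 2 can be computed, assuming an expanding training window whose forecast origin advances one month per fold. Prophet ships its own utilities for this (cross_validation and performance_metrics in prophet.diagnostics); this standalone version uses a hypothetical naive_forecast in place of the fitted model and only illustrates the calculation:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rolling_origin_splits(n_obs, n_folds, horizon):
    """Expanding-window splits: each fold trains on series[:train_end] and tests on
    the next `horizon` points; the origin advances one month per fold (an assumption)."""
    for k in range(n_folds):
        train_end = n_obs - horizon - (n_folds - 1 - k)
        yield train_end, list(range(train_end, train_end + horizon))


def mape_by_horizon(series, forecast_fn, n_folds, horizon):
    """For each step h = 1..horizon, average the absolute percentage error of the
    h-step-ahead forecasts across all folds."""
    errors = [[] for _ in range(horizon)]
    for train_end, test_idx in rolling_origin_splits(len(series), n_folds, horizon):
        preds = forecast_fn(series[:train_end], horizon)
        for h, (i, p) in enumerate(zip(test_idx, preds)):
            errors[h].append(abs((series[i] - p) / series[i]))
    return [100 * sum(e) / len(e) for e in errors]


def naive_forecast(history, horizon):
    """Hypothetical stand-in for the fitted model: repeat the last observed value."""
    return [history[-1]] * horizon


print(mape([100, 200], [110, 180]))  # 10.0
```

For example, with 63 monthly observations, 9 folds and a 5-month horizon, the first fold trains on the first 50 months and the last fold tests on months 59 to 63.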

The selected model is the sum of three components (Figure 3):

$$\text{Edits} = \text{EditsByTrend} + \text{EditsByYearlySeasonality} + \text{EditsByExtraRegressors}$$

where the extra regressors are the edits on gawiki, ruwiki, scnwiki, yiwiki, and ptwikivoyage.
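In Prophet's default additive mode, each extra regressor contributes a linear term. As a sketch, assuming the default standardize='auto' setting of add_regressor (which standardizes non-binary regressors with their training-period mean and standard deviation), and writing $x_{j,t}$ for the edits on control wiki $j$ in month $t$, $\mu_j$ and $\sigma_j$ for that regressor's training-period mean and standard deviation, and $\beta_j$ for its fitted coefficient expressed in edits (symbols introduced here for illustration), the regressor component is:

$$\text{EditsByExtraRegressors}_t = \sum_{j \in R} \beta_j \, \frac{x_{j,t} - \mu_j}{\sigma_j}, \qquad R = \{\text{gawiki}, \text{ruwiki}, \text{scnwiki}, \text{yiwiki}, \text{ptwikivoyage}\}$$

Because this term depends on the control wikis' observed edits, forecasting the counterfactual requires the actual edits on these wikis for the forecast months, on the assumption that they were not affected by the change on Portuguese Wikipedia.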

However, this model has room for improvement. We found autocorrelation in the residuals at a one-month lag (Appendix B, Table 3). Because Prophet is a packaged, high-level forecasting tool, fine-tuning the model (for example, adding an autoregressive component) would require other, more flexible statistical models. This could be a next step.
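The Durbin-Watson statistic in Appendix B, Table 3 is approximately related to the lag-1 residual autocorrelation $r_1$ by $DW \approx 2(1 - r_1)$, so $DW = 1.008$ corresponds to $r_1$ of roughly 0.5. A minimal stdlib sketch of both diagnostics follows; the function names and toy residuals are illustrative, and libraries such as statsmodels provide tested implementations:

```python
def durbin_watson(resid):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2.
    Values near 2 suggest no lag-1 autocorrelation; values well below 2 suggest
    positive autocorrelation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


def lag1_autocorr(resid):
    """Sample autocorrelation at lag 1, i.e. the first bar of an ACF plot."""
    n = len(resid)
    m = sum(resid) / n
    num = sum((resid[t] - m) * (resid[t - 1] - m) for t in range(1, n))
    den = sum((e - m) ** 2 for e in resid)
    return num / den


# Toy residual series, for illustration only.
print(durbin_watson([1, -1, 1, -1]), lag1_autocorr([1, -1, 1, -1]))  # 3.0 -0.75
print(durbin_watson([1, 1, -1, -1]), lag1_autocorr([1, 1, -1, -1]))  # 1.0 0.25
```

The second series, where consecutive residuals share a sign, gives a DW below 2 and a positive lag-1 autocorrelation, the same pattern reported for this model.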

Forecast

Using the trained model, we forecasted edits without the intervention for the five months from October 2020 to February 2021 (Figure 4).

In Figure 4, the black dots are the historical data in the pre-intervention period, the blue line and shaded blue area are the estimate and its 95% prediction interval, and the red line is the actual number of edits after the intervention. The actual number of edits falls within the 95% prediction interval in four of the five months; in January 2021 it (180,340) is slightly below the lower limit (181,810) (Appendix A, Table 1). The estimated absolute intervention impact (Actual - Prediction without intervention) is not consistently below zero, so there is no evidence that edits decreased specifically because of the intervention.
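The impact columns in Appendix A, Table 1 follow directly from the forecast: absolute impact = Actual - Prediction, and relative impact = absolute impact / Prediction. A short check using the table's own values (the helper name is illustrative) reproduces those columns and flags which months fall outside the 95% prediction interval:

```python
# Rows from Appendix A, Table 1: (month, predicted, lower_95, upper_95, actual)
TABLE_1 = [
    ("2020-10", 181569, 154498, 208989, 183585),
    ("2020-11", 182536, 155625, 208771, 172090),
    ("2020-12", 158032, 128804, 182893, 179582),
    ("2021-01", 207637, 181810, 233151, 180340),
    ("2021-02", 185493, 158854, 212808, 165528),
]


def intervention_impact(predicted, lower, upper, actual):
    """Return (absolute impact, relative impact in %, actual inside the 95% PI?)."""
    absolute = actual - predicted            # Actual - Prediction without intervention
    relative = 100 * absolute / predicted    # as a percentage of the prediction
    inside = lower <= actual <= upper        # within the 95% prediction interval?
    return absolute, round(relative, 2), inside


for month, pred, lo, hi, act in TABLE_1:
    print(month, *intervention_impact(pred, lo, hi, act))
```

Running it reproduces the table's impact columns and reports January 2021 as the only month whose actual edits fall outside (below) the interval.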

Conclusion

As described in the Forecast section, the actual edits in the post-intervention period were not significantly lower than the predicted edits without the intervention. This statistical analysis therefore does not support the hypothesis that turning off IP editing negatively impacted editing activity. However, as noted in the Model Selection section, the current model has some limitations and room for improvement, and a more accurate forecasting model might yield results that favor the hypothesis. Furthermore, additional interventions, especially in a randomized controlled experiment design, would help us learn more about the relationship between IP editing and editing activity.

Appendix A: Forecast

Table 1: Intervention Impact
Month	Prediction w/o intervention	Prediction Lower Limit (95% PI)	Prediction Upper Limit (95% PI)	Actual w/ intervention	Absolute intervention impact	Relative intervention impact
2020-10-01	181569	154498	208989	183585	2016	1.11%
2020-11-01	182536	155625	208771	172090	-10446	-5.72%
2020-12-01	158032	128804	182893	179582	21550	13.64%
2021-01-01	207637	181810	233151	180340	-27297	-13.15%
2021-02-01	185493	158854	212808	165528	-19965	-10.76%

Appendix B: Model Diagnostics

Table 2: Cross Validation
Horizon	MAPE
1 month	10.99%
2 months	7.77%
3 months	10.59%
4 months	8.44%
5 months	11.73%

Table 3: Residual Check
Assumption	Diagnostic method	Conclusion
Normality	Histogram	Histogram is fairly bell-shaped, normality assumption holds.
	Kolmogorov-Smirnov test	P value = 0.875 > 0.05; no evidence against the normality assumption.
Linearity	Residual plot	Residual plot has no obvious pattern. Linearity assumption holds.
Constant variance (Homoscedasticity)	Residual plot	Residual plot has no obvious pattern. Constant variance assumption holds.
Autocorrelation	Durbin-Watson test	Durbin-Watson statistic = 1.008 (well below 2); positive autocorrelation is affecting the model.
	ACF-test	The lag-1 ACF value stands out; the residuals are autocorrelated.

References

[1] "Prophet | Forecasting at scale. - Facebook ...." http://facebook.github.io/prophet/.