Analytics/Research and Data/status

Last update on: 2014-12-monthly

2013-08-monthly
In August, we attended WikiSym and Wikimania. Dario Taraborelli gave a keynote address on actionable Wikipedia research at WikiSym, where several other Wikipedia research papers were presented. At Wikimania, we hosted two sessions focused on Wikimedia data and analytics tools. We also worked with Platform engineering this month on analyzing and visualizing HTTPS failure rates by country, in preparation for the switch to HTTPS as a default. We released new dashboards for the launch of notifications on 5 other Wikipedias and continued to provide ad-hoc support to teams in Editor Engagement. Last, we continued screening and interviewing candidates for an open research analyst position.

2013-09-monthly
This month, Aaron Halfaker joined the research team as a full-time employee. We started to reorganize the team structure and engagement model in coordination with the Analytics developers. We performed a survival analysis of new editors in preparation for new experiments led by the Growth team, and worked with the team to iron out the data collection and experimental design for the forthcoming iteration of GettingStarted.

We worked with product owners to determine the initial research strategy for features with key releases scheduled for the next two quarters (Mobile Web, Beta Features, Multimedia, Flow, Universal Language Selector, Content translation). We started a cohort analysis of conversion rates for mobile vs desktop account registrations; the results will be published on Meta shortly.

We drafted a proposal to host tabular datasets in a dedicated namespace and solicited feedback from interested parties (particularly the Wikidata community). We also started fleshing out the Labs2 proposal, an outreach program for academic researchers and community members, launched at Wikimania 2013 in Hong Kong. We co-hosted the second IRC research office hours and prepared for the first Wikimedia research hackathon, an offline/online event to be held in various locations worldwide on November 9, 2013.

Last, we contributed to the September 2013 issue of the Wikimedia research newsletter.

2013-10-monthly
This month, we continued to support Growth and Mobile as the team's focus areas for this quarter. We published the results of the latest GettingStarted test run by the Growth team, completed the cohort analysis for Mobile user acquisition, and worked with the Mobile team to prepare the launch of a new test for new user activation, currently underway.

We analyzed active editor trends to determine whether the Wikipedia total active editors data for September 2013 (~67k) represented an anomalous departure from seasonality and the long-term trend, and concluded that this was not the case. The results of this analysis point to the need to apply time series analysis and forecasting methods to other key performance indicators that the Foundation publishes on a daily or monthly basis.

We continued to work with the analytics engineers to provide requirements for Wikimetrics (with a particular focus on UserMetrics feature parity), and to perform data QA and validate the output of the application for metrics that were recently implemented.

We completed a round of consultations with internal stakeholders to identify research needs of each team in the organization and determine their priority. We presented a review of our activities for Q1 and plans for Q2 at the Analytics Quarterly meeting. We identified "metric standardization" as one of the goals the team will focus on in this quarter.

We organized and announced the inaugural Wiki Research Hackathon, a global event hosted in 8 locations in 5 countries, bringing together Wikimedia researchers, academics and community members to work on wiki research projects. The hackathon — the first event organized in the context of the Labs2 initiative — will take place on November 9, 2013.

2013-11-monthly
This month, we started work on metrics standardization, one of the team's quarterly goals. We published a number of supportive analyses of new user acquisition, activation and retention as well as "active editors" to assess issues and potential benefits of new definitions. The outcome of this analysis will inform design decisions for new dashboards focused on editor engagement.

In collaboration with the Platform team, we ran an A/B test to determine performance gains of localStorage. The results indicate that the use of localStorage significantly improves the site's performance for the end user: Module storage is faster. Readers whose pages load slower tend to browse less. Mobile browsers don't seem to benefit substantially from caching.

We published the results of a test designed to explore if displaying a short tutorial could improve the first-edit completion rate of newly-registered users on mobile devices. The results support the hypothesis, indicating that edit guiders are a good onboarding strategy for new mobile users.

We ran an analysis of anonymous editor acquisition as background research for new onboarding strategies designed by the Growth team and found that editors who edit as an IP right before registering an account are our most productive newcomers.

On November 9, 2013 we hosted the inaugural Labs2 Wiki Research Hackathon: it was the first in a series of global events meant to "facilitate problem solving, discovery and innovation with the use of open data and open-source tools" (read the full announcement). Highlights from the event are available in the latest issue of the Research Newsletter. We are planning to host a new hackathon in Spring 2014 and we are actively [mailto:wrh@wikimedia.org seeking] volunteers to host local and virtual meetups.

2013-12-monthly
This month, we kicked off a series of monthly research showcases as an opportunity for the team to share what we're learning about Wikimedia editors and projects, and new features and programs the Foundation is rolling out. Aaron Halfaker presented research on anonymous editors. The first showcase was targeted at an internal audience but we're considering making future showcases open to anyone via a public stream.

We analyzed the cause and impact of major over-reporting on page views in the last months of 2013. We filtered bogus traffic from the data, and published updated reports.

We also continued work on metrics standardization and presented the rationale for this project and the results of the initial round of analysis we conducted.

This month also saw the completion of the third volume of the research newsletter, which this year covered a total of 196 publications reviewed by volunteer contributors. A retrospective of research covered in the newsletter in 2013 will be published later in January.

2014-01-monthly
We conducted a thorough review of traffic data and trends and confirmed a downward trend in desktop pageviews in 2013. This trend is not reflected in desktop unique visitors or mobile traffic. We are working on complementing pageviews with other traffic metrics that will help us better monitor readership trends. We engaged with external parties (Google and comScore) to obtain data about referral and mobile traffic respectively.

We completed research on article creation trends on the largest Wikipedias and found substantial differences between different language Wikipedias; specifically, where anonymous editors are allowed to create articles, their success rate (% of articles kept) is substantially higher than that of newly registered editors. We also found that articles that started as Articles for Creation (AfC) and userspace drafts have a near 100% success rate, but the transition that English Wikipedia made toward directing newcomers to start AfC drafts appears to have substantially reduced the number of successful articles created by newcomers, presumably due to the large review backlog.

We published an update on Visual Editor usage on Wikipedia projects where the editor is enabled by default.

We continued work on metrics standardization for the editor engagement vital signs project and published supportive analysis on definitions and parameter exploration for two proposed standardized user classes: new editor and productive new editor.

We worked with the Analytics Development and Legal teams to articulate use cases and the retention and anonymization strategy for data subject to the retention guidelines, in particular with respect to user agents.

We welcomed Sahar Massachi as a research contractor supporting the team with data analysis for fundraising tests and iterated on new modeling strategies for estimating test success (such as the number of dollars per banner impression). Before he joined us, Sahar worked with the fundraising team, where most recently he focused on writing tools to help the team easily and quickly understand the results of each test.

2014-02-monthly
This month, we welcomed Leila Zia as the newest addition to the team. Leila joins the Foundation as a research scientist after completing a PhD in management science and engineering at Stanford University. Her work will initially focus on modeling editor lifecycles to better understand what affects their survival and retention.

We hosted the first public Research and Data showcase, a monthly showcase of research conducted by the team and other researchers in the organization. This month, we presented two studies on Wikipedia article creation trends and on the measurement of mobile browsing sessions. The showcase is hosted at the Wikimedia Foundation and live streamed on YouTube every 3rd Wednesday of the month at 11:30am Pacific Time.

We attended the 17th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW '14) in Baltimore. Research on Wikipedia and wiki-based collaboration has been a major focus of CSCW in the past, and this year three Wikipedia research papers were presented. We hosted a session to discuss collaboration opportunities for researchers interested in tackling problems of strategic importance for Wikimedia (a detailed CSCW '14 report will follow on wiki-research-l).

We started creating public documentation for data sources and tools used by the team for research and data analysis and porting docs previously hosted on internal wikis (for example: analytics/geolocation).

We continued to provide ad-hoc support to various teams at the Foundation and worked closely with the Growth and Mobile teams to prepare and review results for their respective quarterly reviews.

2014-03-monthly
This month we concluded the first stage of work on metrics standardization. We created an overview of the project with a timeline and a list of milestones and deliverables. We also gave an update on metrics standardization during the March session of the Research and Data monthly showcase. The showcase also hosted a presentation by Aaron Halfaker on his research on the impact of quality control mechanisms on the growth of Wikipedia.

We published an extensive report from a session we hosted at CSCW '14 on Wikipedia research, discussing with academic researchers and students how to work with researchers at the Foundation.

We submitted 8 session proposals for Wikimania '14, authored or co-authored by members of the research team.

We attended the Analytics team's Q3 quarterly review during which we presented the work performed by the team in the past quarter and our goals for the upcoming quarter (April-June 2014).

We completed the handover of Fundraising analytics tools and knowledge transfer in preparation for a new full-time research position that we will be opening shortly to support the Fundraising team.

We continued to provide support to teams in our focus areas (Growth and Mobile) with an analysis of the impact of the rollout of the new onboarding workflows across multiple wikis, an analysis of mobile browsing sessions, and an ongoing analysis of mobile user acquisition tests. We also supported the Ops team in measuring the impact of the deployment of the ULSFO cluster, which provides caching for West USA and East Asia.

2014-04-monthly
<section begin="2014-04-monthly"/>This month we kicked off work on mobile traffic metrics. We presented a plan for modeling and implementing these metrics, in coordination with the Analytics Dev team. Oliver Keyes gave an update on pageview data for desktop, mobile web and apps at the monthly metrics meeting.

We continued to provide support for the Vital Signs project by working with the Dev team on metrics and code requirements, as well as visualization and data presentation options.

Aaron Halfaker presented his work on Snuggle – an observation and mentoring system for new Wikipedians – at the 2014 ACM Conference on Human Factors in Computing Systems (CHI '14).

We published longitudinal data on editor activation and mobile vs desktop new user acquisition across the largest Wikipedias.

We posted a job opening for a full-time Research Analyst to support Fundraising and become part of our team.

We started work on the editor lifecycle and editor trajectories, with the goal of understanding the drivers of active editors and power editors and modeling the survival of contributors to Wikimedia projects.

We provided ad-hoc support to the Product team for the onboarding of the new Executive Director.

This month we also released tools to perform analysis of Wikimedia data. Aaron Halfaker published a Python library called mediawiki-utilities for extracting and processing data from MediaWiki installations, slave databases and XML dumps. Oliver Keyes released WikipediR, an R wrapper for the MediaWiki API, aimed particularly at the Wikimedia 'production' wikis, such as Wikipedia.<section end="2014-04-monthly"/>

2014-05-monthly
<section begin="2014-05-monthly"/>This month we worked on a more granular definition of Active Editors to model the main drivers of this metric and its decomposition into different segments of the editor population.

We presented new results on the activation of newly registered users and on collaboration patterns in article creation (which triggered some on-wiki discussions).

We continued work on modeling active editor survival; preliminary results will be presented at the June Research Showcase.

We attended the Zurich Hackathon and worked with community members on various projects of analytics relevance, in particular:
 * Resources for wiki-tool builders
 * Data for GLAMs

We screened and interviewed candidates for the recently opened fundraising research position.<section end="2014-05-monthly"/>

2014-06-monthly
<section begin="2014-06-monthly"/>This month we refined the Editor Model – a proposal to model the main drivers of monthly active editors – and expanded the documentation of the corresponding metric definitions. We applied this model to teams designing editor engagement features (Growth, Mobile) and supported them in setting targets for the next fiscal year.

We analyzed the early impact of the tablet desktop-to-mobile switchover on traffic, total edit volume, unique editors, and new editor activation.

We hosted the June 2014 edition of the research showcase with two presentations on the effect of early socialization strategies and on predictive modeling of editor retention.

We released wikiclass, a library for performing automated quality assessment of Wikipedia articles.

We released longitudinal data on the daily edit volume for all wikis with VisualEditor enabled, since the original rollout.

We continued work on an updated definition for PageViews.

Finally, we held our quarterly review (Q4-2014) and presented our goals for the next quarter (Q1-2015).<section end="2014-06-monthly"/>

2014-07-monthly
<section begin="2014-07-monthly"/>This month, we completed the documentation for the Active Editor Model, a set of metrics for observing sub-population trends and setting product team goals. We also engaged in further work on the new pageviews definition. An interim solution for Limited-duration Unique Client Identifiers (LUCIDs) was also developed and passed to the Analytics Engineering team for review.

We analyzed trends in mobile readership and contributions, with a particular focus on the tablet switchover and the release of the native Android app. We found that in the first half of 2014, mobile surpassed desktop in the rate at which new registered users become first-time editors and first-time active editors in many major projects, including the English Wikipedia. An update on mobile trends will be presented at the upcoming Monthly Metrics meeting on July 31.

Development of a standardised toolkit for geolocation, user agent parsing and accessing pageviews data was completed.

We supported the multimedia team in developing a research study to objectively measure the preferences of Wikipedia editors and readers.

We hosted the July research showcase with a presentation by Aaron Halfaker of 4 Python libraries for data analysis, and a guest talk by Center for Civic Media's Nathan Matias on the use of open data to increase the diversity of collaboratively created content.

We prepared 8 presentations that we will be giving or co-presenting next week at Wikimania in London. We also organized the next WikiResearch hackathon that will be jointly hosted in London (UK) (during the pre-conference Wikimania Hackathon) and in Philadelphia (USA) on August 6-7, 2014.

We filled the fundraising research analyst position: the new member of the Research & Data team will join us in September and we'll post an announcement on the lists shortly before his start date.

Lastly, we gave presentations on current research at the Wikimedia Foundation at the Institute for Scientific Interchange (Turin) and at the DesignDensity lab (Milan).<section end="2014-07-monthly"/>

2014-08-monthly
<section begin="2014-08-monthly"/>This month we hosted the WikiResearch hackathon, a dedicated research track of the Wikimania hackathon. Three demos of research code libraries were broadcast during the event and several research ideas were filed on Meta. Highlights from the hackathon include: Quarry (a web client to query Wikimedia's slave databases on Labs); wpstubs (a social media bot broadcasting newly categorized stubs on the English Wikipedia); and an algorithmic classification of articles due to be re-assessed from the English Wikipedia WikiProject Medicine's stubs. We gave or participated in 8 presentations during the main conference.

We published a report on mobile trends expanding the data presented at the July 2014 Monthly Metrics meeting. We started work on referral parsing from request log data to study trends in referred traffic over time.

We generated sample data of edit conflicts and worked on scripts for robust revert detection. We published traffic data for the Medicine Translation Taskforce, with a particular focus on traffic to articles related to Ebola.

We wrote up a research proposal for task recommendations in support of the Growth team's experiments on recommender systems. We analyzed qualitative data to assess the performance of the Cirrus Search "morelike" feature for identifying articles in similar topic areas. We provided support for the experimental design of a first test of task recommendations. We analyzed the results of the second experiment on anonymous editor acquisition run by the Growth team.

We hosted the August 2014 research showcase with a presentation by Oliver Keyes on circadian patterns in mobile readership and a guest talk by Morten Warncke-Wang on quality assessment and task recommendations in Wikipedia.

We also gave presentations on Wikimedia research at the Oxford Internet Institute, INRIA, Wikimedia Deutschland (slides) and at the Public Library of Science (slides). At OpenSym 2014, Aaron Halfaker presented a paper he co-authored on the impact of the Articles for Creation workflow on newbies (slides, fulltext).<section end="2014-08-monthly"/>

2014-09-monthly
<section begin="2014-09-monthly"/>This month we onboarded Ellery Wulczyn as the newest addition to the Research & Data team. Ellery recently finished a Computer Science Masters program at Stanford and joins us as a full-time research analyst after completing a summer fellowship with University of Chicago's Data Science for Social Good program. His focus at WMF is going to be fundraising research and analytics. Welcome, Ellery!

We completed the definitions, documentation and requirements for a new set of metrics to be implemented in Vital Signs.

We completed a first draft of a page view definition, which is currently being discussed. We supported the mobile team with baseline traffic reports for Apps and Mobile Web.

We participated in the preparatory sessions for the design of an open consultation led by the Community Liaison team as well as in regular meetings to support the strategy consultation process.

We held our Q1-2015 quarterly review, reviewed the team's progress against Q1 goals and posted our proposed Q2 goals.<section end="2014-09-monthly"/>

2014-10-monthly
<section begin="2014-10-monthly"/>This month we started recruiting a full-time research analyst dedicated to traffic and readership research and analytics.

We created a number of reports on mobile readership, including: daily snapshots of pageviews by device class and access method; a report on OS version breakdown for iOS app users and overall app usage; updated stats on the impact of the tablet redirect.

We started the documentation on mobile microcontributions and provided instrumentation requirements and experimental design support for a series of tests planned by the Mobile team to be rolled out in this quarter. We also started background research on missing claims in Wikidata as a function of item classes and language, to help inform the design of new types of microcontributions.

We hosted our monthly research showcase, with a presentation by Aaron Halfaker on Wikipedia as a socio-technical system and a guest talk by BarcelonaMedia's David Laniado on sentiment analysis of editor discussions.

We made substantial progress on defining reader engagement metrics. We also supported several external requests for data, including a request to extract a complete set of references with PubMed Identifiers and a dump of revert metadata for the English Wikipedia.

We completed a preliminary analysis of HHVM's effect on newly registered editors.<section end="2014-10-monthly"/>

2014-11-monthly
<section begin="2014-11-monthly"/>In November, we supported the Fundraising team with the preparation and kickoff of the English fundraising campaign.

We started applying our new page view definition towards a number of reports and presentations, including an update for the Wikimedia board on readership trends.

We continued supporting the Mobile team with data on mobile traffic and prepared the launch of two controlled tests of microcontributions on the beta version of the mobile site. We performed preliminary analysis and QA of the data in preparation for a larger test to be launched on the stable site in January. We concluded data analysis for the test of HHVM and found no conclusive evidence that HHVM substantially affects newcomer engagement in Wikipedia, but hypothesized that HHVM would have effects elsewhere.

We hosted a research showcase with Yan Chen (University of Michigan) as a guest speaker. We finalized a formal collaboration with a team of researchers at Stanford University, to be launched in December. A workshop we submitted to CSCW '15 on creating industry/academic partnerships for open collaboration research was accepted and will be held at the conference in Vancouver on March 14, 2015.<section end="2014-11-monthly"/>

2014-12-monthly
<section begin="2014-12-monthly"/>We gave a presentation at the December 2014 Monthly Metrics meeting, highlighting growth and decline in pageviews to Wikimedia projects across different global areas and by access type. We continued to provide research support to the English fundraising campaign throughout the month and to the Mobile team with code instrumentation, experimental design and data analysis.

We kicked off a collaboration with a research team at Stanford University (Bob West and Jure Leskovec) aimed at understanding and improving link coverage in Wikipedia across languages. In collaboration with this team, we submitted a workshop proposal to ICWSM '15 that was accepted into the conference program; the workshop will be held in Oxford on May 26, 2015.

We kicked off a new IEG-funded project to bring a machine learning based revision scoring service to Wikimedia Labs with User:とある白い猫 and User:He7d3r.

We published an overview of the most edited articles in 2014 (press). We published an open data corpus from the Article Feedback pilot in collaboration with the Product team. The corpus contains over 1.5 million messages posted on three major Wikipedias (English, French, German) over the course of a year.

We hosted our monthly showcase with a presentation on mobile readership by Oliver Keyes and a guest talk on flu forecasting with Wikipedia data by Reid Priedhorsky (Los Alamos National Laboratory).<section end="2014-12-monthly"/>