Product Analytics/Reporting Guidelines

The Product Analytics team produces several types of reports:

One-time substantial reports after the completion of a project and/or analysis. These are typically published as either a wiki page or a PDF.
One-off analyses of specific questions from product teams. These are often published as comments on a Phabricator ticket.
Recurring reports such as weekly or monthly statistics about a project. These are typically published internally, or made available externally through a shared analytics resource.

Common guidelines

All types of reports should describe the purpose of the project and the analysis.
Metrics should be defined, with references to relevant standardized definitions where applicable (e.g. the standardized retention metric).
There's no need to create a standalone report if simply reporting the results in a relevant Phab ticket will suffice.

Types of reports

Phabricator comments

Sometimes it is enough to post results of your analysis as a comment on the relevant Phabricator task.

Provide sufficient detail about the methods used to gather data (e.g. provide a SQL query, define date ranges).
Define any relevant assumptions.
Uploading graphs directly to Phabricator is fine. If community members ask, the graphs might also be uploaded to Commons.

Generating tables

Python:

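# df is a pandas DataFrame; to_markdown() requires the tabulate package
# print() the returned string so the table renders with real line breaks
print(df.to_markdown(index=False))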

R:

library(knitr)

# kable() returns one string per table row, so print each on its own line
cat(kable(data_frame, format = "markdown"), sep = "\n")

In either case, remove the alignment row before posting. This is the second row, whose colons specify column alignment in most Markdown flavors but which does not work in Phabricator's flavor.
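Removing the alignment row can be automated. The following is a minimal sketch, assuming the table is already available as a Markdown string; the helper name strip_alignment_row and the sample data are hypothetical and not part of pandas, knitr, or wmfdata.

```python
import re

# Matches a Markdown alignment row such as "|:---|---:|:--:|".
ALIGNMENT_ROW = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def strip_alignment_row(markdown_table: str) -> str:
    """Return the table without its second (alignment) row, if it has one."""
    lines = markdown_table.splitlines()
    if len(lines) >= 2 and ALIGNMENT_ROW.match(lines[1]):
        del lines[1]
    return "\n".join(lines)


# Hypothetical sample table, shaped like the output of df.to_markdown(index=False).
example = "| wiki   |   edits |\n|:-------|--------:|\n| enwiki |     120 |"
print(strip_alignment_row(example))
```

With pandas, the helper could be applied as strip_alignment_row(df.to_markdown(index=False)) before pasting the result into a Phabricator comment.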

Recurring reports

Recurring reports are expected to be lightweight: they do not need to provide deeper discussion or analysis of the data and should instead serve as a data summary.
If the report is generated from a Jupyter Notebook, include a button to show/hide the code.
- The wmfdata package has a function for this: utils.insert_code_toggle()
- To create an HTML version from the command line on Jupyter, run python -m jupyter nbconvert --to html followed by the notebook's file name.
Make it clear when the report was last updated and what range of data it contains.
The report should be easily accessible to relevant stakeholders (e.g. by having it hosted publicly if possible).
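One way to state freshness and coverage is to generate a short header line each time the report is rebuilt. This is only a sketch: the function name report_header, the wording of the line, and the example dates are assumptions made for illustration.

```python
from datetime import date, datetime, timezone


def report_header(data_start: date, data_end: date, updated: datetime) -> str:
    """Build a one-line note stating when the report ran and what it covers."""
    return (
        f"Last updated: {updated:%Y-%m-%d %H:%M} UTC. "
        f"Data range: {data_start.isoformat()} to {data_end.isoformat()} (inclusive)."
    )


# Hypothetical usage: a report covering January 2024, generated just now.
print(report_header(date(2024, 1, 1), date(2024, 1, 31), datetime.now(timezone.utc)))
```

Placing the line at the top of the notebook or page lets readers check the data range before reading any figures.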

Substantial reports

(for lack of a better name)

These reports should include an executive summary of the results and the recommendations that follow from the analysis.
If the definition of a given method/metric becomes substantial, consider moving it to an Appendix and referring to it as reading for those who are interested.
Publishing the reports on-wiki (e.g. as a sub-page of a team's pages on MediaWiki-wiki) enables translation into other languages through standardized translation practices on wikis.
For PDF reports, we recommend using R Markdown and the wmfpar template.

Publishing reports

PDFs

For PDF reports – e.g. generated from R Markdown using the template from the {wmfpar} R package – the recommendation is to upload them to Wikimedia Commons. After uploading, edit the file to have the following meta information:

=={{int:license-header}}==
{{WMF-staff-upload|license=cc-by-sa-4.0}}
{{Wikimedia trademark}}

For an example, see Impact of sitemaps on Italian Wikipedia search engine-referred traffic.

HTML

These options apply to HTML reports, for example a Jupyter notebook converted to HTML.
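The notebook-to-HTML conversion can also be scripted, which helps when a report is rebuilt on a schedule. The sketch below wraps the jupyter nbconvert --to html command in Python; the function names are hypothetical, and the use of nbconvert's --output-dir option to choose where the HTML lands is an assumption about the desired setup.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def nbconvert_command(notebook: Path, output_dir: Path) -> List[str]:
    """Build the nbconvert command line for a static HTML export."""
    return [
        sys.executable, "-m", "jupyter", "nbconvert",
        "--to", "html",
        f"--output-dir={output_dir}",
        str(notebook),
    ]


def convert_to_html(notebook: Path, output_dir: Path) -> Path:
    """Run nbconvert and return the path of the HTML file it should write."""
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(nbconvert_command(notebook, output_dir), check=True)
    return output_dir / f"{notebook.stem}.html"
```

Using sys.executable instead of a bare python keeps the conversion in the same environment as the calling script.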

analytics.wikimedia.org

When publishing from the analytics cluster (stat100X hosts), follow these instructions. This will make pages available on analytics.wikimedia.org – e.g. https://analytics.wikimedia.org/published/reports/wikipedia-android-app/suggested-edits-v2.html and https://analytics.wikimedia.org/published/reports/wikipedia-android-app/metrics/

NOTE: For security reasons, Jupyter restricts a user's write permissions to their home directory. Be sure to copy or move files into /srv/published over SSH in Terminal, not in the Terminal inside Jupyter. Likewise, if you plan to schedule a recurring crontab job that re-runs and publishes a report, set it up over SSH (not in the Jupyter Terminal).
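The copy step can be scripted so that a scheduled job publishes the report the same way every time. This is a minimal sketch intended to be run from an SSH session rather than from inside Jupyter; the function name publish and the sub-directory layout are hypothetical, and only the /srv/published root comes from the note above.

```python
import shutil
from pathlib import Path

PUBLISHED_ROOT = Path("/srv/published")  # root directory named in the note above


def publish(report: Path, subdir: str, root: Path = PUBLISHED_ROOT) -> Path:
    """Copy a generated report into root/subdir and return the new path."""
    destination = root / subdir
    destination.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the file's modification time.
    return Path(shutil.copy2(report, destination / report.name))


# Hypothetical usage (sub-directory name is an assumption):
# publish(Path.home() / "report.html", "reports/my-project")
```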

people.wikimedia.org

There's also the option of hosting the page in your personal directory on people.wikimedia.org. For example, my listing of Wikimedia/Wikipedia-related R packages.

You can either upload the files using an SFTP client like Transmit or use scp in Terminal. Another method is to put the files in a public git repository, clone the repo to ~/public_html on people.wikimedia.org, and schedule a cron job to run git pull periodically.
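The periodic update of the clone could be handled by a small script that cron runs. The following is a sketch under stated assumptions: the function names are hypothetical, and the --ff-only flag is an added choice (not from the guidelines) so the job fails loudly instead of creating merge commits on the server.

```python
import subprocess
from pathlib import Path
from typing import List


def git_pull_command(clone_dir: Path) -> List[str]:
    """Build a fast-forward-only `git pull` for the clone at clone_dir."""
    # `git -C <dir>` runs the command as if started inside <dir>.
    return ["git", "-C", str(clone_dir), "pull", "--ff-only"]


def update_clone(clone_dir: Path) -> None:
    """Run the pull; raises CalledProcessError if git exits non-zero."""
    subprocess.run(git_pull_command(clone_dir), check=True)


# Hypothetical usage from a cron-invoked script:
# update_clone(Path.home() / "public_html")
```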

To restrict access to the files to users of the wmf and nda groups (similar to how Superset is restricted), follow these instructions: T290693#7343430

wikimedia-research.github.io

You can version control your analyses and reports as git repositories and store them on GitHub or GitLab. In addition to the wikimedia organization, we also have an organization on GitHub called wikimedia-research where we tend to store our analysis repos, but you can also store repos under your own account. Either way, you can publish HTML reports from those repositories with GitHub Pages.

In the repository's settings, find the section for GitHub Pages and pick the source. If the HTML file is in the repository root, select the main branch option. However, it is recommended to keep code and queries separate from the report: put the R Markdown document, named "index.Rmd", in a docs/ sub-directory and pick the docs/ folder option as the source. Whichever directory you use as the source to publish with GitHub Pages, be sure to include an empty .nojekyll file to tell the system not to use Jekyll for creating a website.^[1] The easiest way to do that is to run touch .nojekyll in Terminal before you commit.
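If the report is built by a script, the same script can create the .nojekyll marker so it is never forgotten. This is a small sketch equivalent to running touch .nojekyll; the function name ensure_nojekyll is hypothetical, and the docs/ directory in the usage comment is an assumption matching the layout recommended above.

```python
from pathlib import Path


def ensure_nojekyll(pages_source: Path) -> Path:
    """Create an empty .nojekyll file in the GitHub Pages source directory."""
    marker = pages_source / ".nojekyll"
    # touch() leaves an existing file untouched apart from its timestamp.
    marker.touch(exist_ok=True)
    return marker


# Hypothetical usage, run from the repository root before committing:
# ensure_nojekyll(Path("docs"))
```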

For an example repository, see docs/ inside Discovery-Search-Adhoc-RelevanceSurveys and the resulting page at https://wikimedia-research.github.io/Discovery-Search-Adhoc-RelevanceSurveys/

Future work

Using the Template:Wikimedia engineering project information to describe the report and defining the project's start and end dates will automatically categorize the report into the relevant date-based categories for WMF projects. [FIXME: have a Product Analytics-specific template for this]
If the report covers a large range of data, consider adding the ability to filter/focus on parts of the data through dynamic graphs. [FIXME: we need to know how to do this]

[1] https://github.blog/2009-12-29-bypassing-jekyll-on-github-pages/