Product Analytics/Superset Access

From mediawiki.org
This page refers to instances of Superset and Turnilo that provide access to data in the Analytics Data Lake. For information about how to access Fundraising's instance of Superset, see Fundraising Access Request or email fr-analytics@wikimedia.org.
Screenshot of an edit counts dashboard
Screenshot of a traffic dashboard
Gallery of chart types available in Superset
Screenshot of hourly edit counts in Turnilo

Hello, dear reader! If you're on this page you're probably interested in checking out Superset.

Apache Superset (available at superset.wikimedia.org) is a dashboarding and data exploration tool. The screenshots on the right show two examples, the reading and editing metrics dashboards created by Connie Chen, and illustrate core features such as formatted text (which can include links), menus for user input, and interactive charts. Superset offers many ways to slice and dice the data and many chart types for visualizing it (see the gallery of options on the right).

Turnilo (available at turnilo.wikimedia.org) is a data exploration tool and can be thought of as a much lighter version of Superset. All the datasets available in one are also available in the other because both tools connect to the same backend database, Apache Druid, where the datasets are stored as "data cubes" (aka "Druid Datasources" in Superset). Unlike Superset, Turnilo does not let you create dashboards, and its charting capabilities are very limited. If you're familiar with pivoting in spreadsheets, Turnilo is basically a tool for doing just that.
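If pivoting is new to you, here is a minimal sketch of the idea in plain Python. It is only an illustration: the rows, the dimension names (`wiki`, `platform`), and the `edits` measure are made up for this example and do not correspond to any real data cube, and this is not how Turnilo or Druid work internally.

```python
from collections import defaultdict

# Hypothetical rows, standing in for records in a data cube.
ROWS = [
    {"wiki": "enwiki", "platform": "desktop", "edits": 120},
    {"wiki": "enwiki", "platform": "mobile", "edits": 45},
    {"wiki": "dewiki", "platform": "desktop", "edits": 60},
    {"wiki": "dewiki", "platform": "mobile", "edits": 15},
    {"wiki": "enwiki", "platform": "desktop", "edits": 30},
]


def pivot(rows, row_dim, col_dim, measure):
    """Sum `measure` for every (row_dim, col_dim) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[measure]
    # Convert the nested defaultdicts to plain dicts for easy reading.
    return {r: dict(cols) for r, cols in table.items()}


if __name__ == "__main__":
    # One line per wiki, one column per platform.
    for wiki, cols in sorted(pivot(ROWS, "wiki", "platform", "edits").items()):
        print(wiki, cols)
```

Swapping the two dimension arguments turns the table on its side, which is the kind of slicing you do by dragging dimensions around in Turnilo.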

Access

If you've tried to open either of those links (or any Superset/Turnilo links in the past), you've seen something like this:

Wikimedia Developer Single Sign-On Portal

It's understandable (and completely okay, even!) if you're not sure what exactly it's asking you or how to get past that.

Let's demystify it a bit.

A Wikimedia developer account is the account you use for developer services hosted by WMF. It's also known as "LDAP account/credentials" within the Wikimedia tech world, and you can create one yourself. The UNIX shell username and the developer account password you pick are the login info you will use to access Superset/Turnilo.

Why?

Great question! There are two reasons:

  1. The data in Druid/Superset/Turnilo has not been cleared for public release. In a very few cases there are public versions of datasets, which have been altered for public consumption for security/privacy reasons (for example, the geoeditors dataset has a private version and a public version). In other cases the data may never become publicly available.
  2. LDAP info is how account access, group membership, and permissions are programmatically managed by the ops engineers on the SRE and Analytics Engineering teams. They have a whole system[1] set up for managing key-based authentication, permissions, and access to things like SWAP (for running queries and performing analyses), the data lake (which includes events from EventLogging and traffic data that has IP addresses), and BI tools like Superset. That system ties into the general one they use to manage all of Wikimedia's infrastructure.

Besides being able to use Superset and Turnilo, having a developer account enables you to contribute to Wikimedia projects (including MediaWiki Core and MediaWiki extensions) on Gerrit (similar to GitHub) and use Wikimedia Cloud Services like Cloud VPS (similar to Amazon EC2) and Toolforge for things like making your own Wikipedia & Commons bots.

Requesting access

Once you have a developer account, it's time to request membership in the appropriate groups, which determines your level of data access. Your first step depends on whether you need membership in the wmf or the nda LDAP group.

  • The wmf group is generally for full-time req# staff. You request membership through the Wikimedia Identity Management System (IDM), accessible here once you have a developer account.
  • The nda group is for external collaborators and contractors who have signed an NDA. You will request nda group membership via Phabricator in the next step.

Membership in the wmf or the nda LDAP groups will grant you basic access to Superset & Turnilo, but not private data-based dashboards. If you only want basic access and you've already requested wmf membership, you're done! If this does not apply to you, it's time to create a Phabricator Access Request task.

Using this form, create a new task.

  • For "Requested Group Membership", enter nda (if applicable to you) and/or analytics-privatedata-users if you need to access private data-based dashboards.
  • For "SSH public key", leave this blank. If you are looking for a deeper level of access than private Superset dashboards, you will need to provide an SSH key and the process is documented here.
  • Other fields are relatively self-explanatory. Here is an example to use as a reference.

It might take a few days for this request to be processed but once it's done, it's done. Congratulations, you're one of the cool kids now!
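To recap the group rules described above, here is a small Python sketch. It is only a reading aid, not how access is actually enforced: the `access_level` function and its return labels are hypothetical, and it assumes that private data-based dashboards require analytics-privatedata-users in addition to wmf or nda membership.

```python
# Group names are taken from this page; everything else is illustrative.
BASE_GROUPS = {"wmf", "nda"}
PRIVATE_GROUP = "analytics-privatedata-users"


def access_level(groups):
    """Return a rough label for what a set of LDAP groups lets you see."""
    groups = set(groups)
    if not groups & BASE_GROUPS:
        # Neither wmf nor nda: no access to Superset or Turnilo.
        return "none"
    if PRIVATE_GROUP in groups:
        # Assumption: the private-data group builds on wmf/nda membership.
        return "private dashboards"
    return "basic"


if __name__ == "__main__":
    print(access_level(["wmf"]))
    print(access_level(["nda", "analytics-privatedata-users"]))
    print(access_level([]))
```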

Aside: if this is your first time hearing about it, Phabricator is the tool used for project management, software bug reporting, and feature requests at Wikimedia. Refer to these instructions for creating a Phabricator account by logging in with your Wikimedia unified account or your developer account. Whichever one you don't pick can be linked to your Phabricator account later[2] so that you can log in with either.

Training

If you're interested in learning more about Superset/Turnilo and what you can do with them, you can book an appointment with anyone on the Product Analytics team to have a certified data professional give you a tour of Turnilo, walk you through dashboarding in Superset, and explain the various datasets available within those tools. Training can be done 1:1 or in groups. See our Office Hours page for more details.

We also have a training video available on YouTube to anyone with a @wikimedia.org account.

Further reading

References
