Product Analytics/Dashboarding Guidelines
Publishing/sharing

Before publishing or sharing your Superset dashboard, double-check that you have:

  • contact information
  • correct access and permissions for your data and your audience

Refer to the sections below for details.

Contact Info

Use the following Markdown template at the top or bottom of the dashboard:

This dashboard is maintained by {NAME}, [Product Analytics](https://www.mediawiki.org/wiki/Product_Analytics). If you have questions or feedback, please email {name}@wikimedia.org or product-analytics@wikimedia.org.

Permissions

Virtual datasets

For Presto-based charts that rely on virtual datasets derived from event data, make sure the stakeholder has been added to the analytics-privatedata-access group.

If they have not, ask them to request access through Phabricator; refer to T286746 as an example.

Physical datasets

This applies only to tables in Hive with files in Hadoop. Data ingested into Druid automatically has appropriate permissions.

For charts that rely on Hive tables added as physical datasets, make sure that users outside of your group have read access to the files in Hadoop:

hdfs dfs -chmod -R o+r <path to your table>
If you have a recurring job that updates the dataset, you need to re-run this command manually every time the job completes.
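If the recurring job is itself a Python script, the permission fix can be appended as its final step. A minimal sketch, assuming the hdfs CLI is on PATH; the function name and table path are illustrative, not part of the guideline:

```python
import subprocess

def chmod_world_readable(table_path, dry_run=True):
    """Build (and optionally run) the recursive other-read chmod for a table."""
    cmd = ["hdfs", "dfs", "-chmod", "-R", "o+r", table_path]
    if not dry_run:
        # Requires the hdfs CLI on PATH and access to the cluster.
        subprocess.run(cmd, check=True)
    return cmd

# Hypothetical table path; substitute your own. With dry_run=True this
# only prints the command instead of executing it.
print(" ".join(chmod_world_readable("/user/hive/warehouse/my_db.db/my_table")))
```

Calling this at the end of each run keeps viewers' read access from silently disappearing after a refresh.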

Example

Suppose your ETL produced a countries.csv that you then make available in Hive via:

import wmfdata as wmf

wmf.hive.load_csv(
    "countries.csv",
    field_spec="name string, iso_code string, economic_region string, maxmind_continent string",
    db_name="canonical_data",
    table_name="countries"
)

You add it as a physical dataset within Superset and create a chart that relies on it. To make sure that everyone can view that chart (and dashboard), you would update permissions with:

hdfs dfs -chmod -R o+r /user/hive/warehouse/canonical_data.db/countries

If you loaded data into Hive manually and have the data available elsewhere, change the path accordingly.
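To confirm the chmod took effect, you can check the permission column of `hdfs dfs -ls -R` output, where position 7 of a string like `-rw-r--r--` is the other-read bit. A small helper sketch; the function names are hypothetical, and the parsing assumes standard `-ls` output:

```python
import subprocess

def others_can_read(ls_output):
    """Return True if every file/dir listed has the other-read bit set.

    ls_output is the text produced by `hdfs dfs -ls -R <path>`; the first
    column is the permission string (e.g. '-rw-r--r--'). Header lines such
    as 'Found N items' are skipped.
    """
    perms = [line.split()[0] for line in ls_output.splitlines()
             if line[:1] in ("-", "d")]
    return all(len(p) >= 8 and p[7] == "r" for p in perms)

def table_is_world_readable(table_path):
    # Hypothetical usage; requires the hdfs CLI on PATH.
    out = subprocess.run(["hdfs", "dfs", "-ls", "-R", table_path],
                         capture_output=True, text=True, check=True).stdout
    return others_can_read(out)
```

If this returns False for your table's path, re-run the chmod command above before sharing the dashboard.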