Reading/Web/Advice when writing EventLogging schemas

When building a schema, there is always a risk of exposing information about users to an attacker. It is therefore important to decide which questions your schema needs to answer, and to be careful about the sort of data you log.

The analytics team has provided privacy best practices for writing schemas.

We advise that new schemas avoid trying to capture everything possible. We also advise talking to an analytics team member before deploying a schema to production. Fixing privacy problems after the fact can be expensive, so it's worth the investment up front!

More than one schema?
Some privacy issues can be avoided by splitting one schema into several. For example, if you plan to answer questions such as "What is the most popular page printed on Wikipedia?" as well as "How many people print in the Vector skin on mobile?", it might be necessary to use two schemas to capture this information without sacrificing our users' privacy. Both schemas would probably need to be sampled; otherwise, the timestamp field could be used to re-link events across them. If the schemas sample user sessions rather than individual events, the sessions should also be sampled independently in each schema.
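
As a rough illustration of independent session sampling, the sketch below decides whether a session is in the sample by hashing the session ID together with the schema name. This is not EventLogging's own sampling code: the function `in_sample`, the schema names `PrintPopularPages` and `PrintBySkin`, the session ID format, and the 10% rate are all hypothetical, chosen only to show the idea. Because the schema name is part of the hashed value, a session being sampled in one schema says nothing about whether it is sampled in the other.

```python
import hashlib


def in_sample(session_id: str, schema_name: str, rate: float) -> bool:
    """Return True if this session is sampled for the given schema.

    Hypothetical helper: salting the hash with the schema name makes the
    decision for one schema unrelated to the decision for another.
    """
    digest = hashlib.sha256(f"{schema_name}:{session_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare it
    # against the requested fraction of that range.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64


# Made-up session IDs, sampled at 10% in two hypothetical schemas.
sessions = [f"session-{i}" for i in range(10_000)]
popular = {s for s in sessions if in_sample(s, "PrintPopularPages", 0.1)}
skin = {s for s in sessions if in_sample(s, "PrintBySkin", 0.1)}

# Sessions that appear in both samples; with independent sampling this
# should be roughly rate * rate of all sessions, not the whole sample.
overlap = popular & skin
print(len(popular), len(skin), len(overlap))
```

If both schemas had used the same unsalted hash of the session ID, the two samples would contain exactly the same sessions, and events could be re-linked across schemas. With the salt, only a small fraction of sessions lands in both.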