Talk:EventLogging/UserAgentSanitization

Reasoning for processing at time of data collection
The lead paragraph sets up the reasoning for cleaning up user agents for the release of public datasets and then, seemingly without justification, states that processing will occur before user agents are stored. Why would we process *before* storing the user agent? It seems to me that a more logical time to perform this processing is prior to release of a dataset. I like this alternative better because processing user agents is messy and new patterns in user agent strings are likely to break our processing strategy from time to time. If we only store post-processed data, we don't have the opportunity to process it again with updated processing strategies. --Halfak (WMF) (talk) 15:58, 8 January 2014 (UTC)
 * Why wouldn't it? Not storing private data at all means we don't have to worry about privacy issues for the data in question. --Nemo 16:02, 8 January 2014 (UTC)