Topic on Talk:Product Analytics/Comparison datasets

Privacy friendly data storage

Ainali (talkcontribs)

It would be cool if you also stored the dataset on the Wikimedia servers. Thanks.

Neil Shah-Quinn (WMF) (talkcontribs)

Hmm, what exactly do you have in mind? It would be quite easy for us to export the data files and upload them somewhere (does Commons support CSV?).

If you're thinking about on-wiki lists, I personally (meaning, I can't speak for my manager 😊) would be excited to work on using this data to enhance on-wiki lists like meta:List of Wikipedias, but I don't know whether the Meta community would even support that, and at the moment I don't feel like I have time to investigate.

Ainali (talkcontribs)

Yes, there is a Data namespace on Commons. But really, that should have been the first option. It is really strange that WMF staff is using and linking to these kinds of services that are not only proprietary but well known for collecting the users' data.

Neil Shah-Quinn (WMF) (talkcontribs)

The reason I chose Google Sheets is that it provides an excellent experience browsing, filtering, and sorting data without requiring any setup from the viewer or maintenance from the data provider.

I did briefly consider the possibility of using a tool on Cloud Services. I'm sure there are some open-source tools that can provide a similar data browsing interface, but the overhead of configuring and maintaining something like that would be way too high.

I have heard of the Data namespace, although in general it is very little-known. Based on that alone, it shouldn't be so surprising that it wasn't the first choice.

But the much bigger issue is that it is a very poor experience for actually browsing the data: no sticky headers, no filtering, no nice formatting (alternating row colors, thousands separators, rounding of fractional parts, etc.), nothing like tabs to make a multi-page tool, and so on. (This isn't really a criticism of the Data namespace because its main purpose is storage, not presentation.)

Of course, it's possible to download data from the Data namespace and open it in the spreadsheet tool of your choice, and in fact we have started saving the data as CSVs in the GitHub repo, which provides the same option. (I'm planning to update the introduction in the spreadsheet to note this option; I can also do it in the Toolhub entry and the couple of other places where it's publicly linked.) We could potentially supplement this by uploading to the Data namespace.

But these options don't satisfy the use case of low-friction, user-friendly, zero-maintenance data browsing. For that, I'm not aware of anything better than Google Sheets, but I'm open to suggestions.

I understand the dislike for Google and its very questionable privacy practices. But I think we also have to balance that with pragmatism. DuckDuckGo is my search engine of choice, but sometimes it gives bad results, and when that happens, I also go to Google.
