Topic on Talk:Wikimedia Apps/App Analytics

NRuiz (WMF) (talk | contribs)

You should sample not because the backend cannot digest all events but because you do not need all events to get a signal from your data.

FYI, the Hadoop backend for EventLogging has no issue with scale (and we are working on making this storage friendlier to query).

NRuiz (WMF) (talk | contribs)

Also, any test/sampling configuration should be set server-side and read by the app (I think you mention this in passing further down in the document), since you do not want to have to release app updates to correct issues with analytics reporting.
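To make the server-side idea concrete, here is a minimal sketch, not a description of how the apps or EventLogging actually do it. It assumes the app fetches a small JSON config from a server; the payload shape, the `sampling_rate` field, the schema name, and both function names are hypothetical. The app hashes a stable install ID into a bucket so the in/out decision stays consistent for a given install, and changing the rate on the server changes app behavior without a release.

```python
import hashlib
import json

# Hypothetical payload the app would fetch from a server.
# The field names and schema name are assumptions for illustration only.
REMOTE_CONFIG_JSON = '{"schema": "ExampleSchema", "sampling_rate": 0.1}'


def parse_sampling_rate(config_json, default=0.0):
    """Return the sampling rate from the server config, clamped to [0, 1].

    Falls back to `default` (log nothing) if the config is malformed,
    so a bad config never makes the app log more than intended.
    """
    try:
        rate = float(json.loads(config_json)["sampling_rate"])
    except (ValueError, KeyError, TypeError):
        return default
    return min(1.0, max(0.0, rate))


def in_sample(install_id, rate):
    """Deterministically decide whether this install sends events.

    The install ID is hashed to a number in [0, 1); the install is in the
    sample when that number falls below the server-provided rate.
    """
    digest = hashlib.sha256(install_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


rate = parse_sampling_rate(REMOTE_CONFIG_JSON)
should_log = in_sample("example-install-id", rate)
```

Because the bucket depends only on the install ID, raising the rate on the server only adds installs to the sample; installs already in it stay in.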

JMinor (WMF) (talk | contribs)

Thanks, Nuria. I do think a value/cost calculation enters our decisions around sampling. You are saying there is diminishing signal value in storing all data, and since storing all data is a bad thing™ with associated costs and risks, it is not worth doing without some specific need. Either way, this is not motivated by EventLogging's ability to scale, and I can edit that out.
