Talk:Requests for comment/Server-side Javascript error logging

EventLogging
Rather than reimplement something EventLogging-ish, it probably makes sense to either require EventLogging or make it pluggable. There can be a core implementation (possibly using a MW API as the RFC suggests, with corresponding client-side JS module) and an EventLogging listener that sends it to the standard EventLogging server setup.
 * EventLogging is a tier-2 service that is not designed for bursts of traffic at all. In fact, bursts of traffic are likely to affect and break all users of the service. Also, as a tier-2 system (see: EventLogging/OperationalSupport), it is understood that EL can be down for a couple of days without issues, so it is not a good choice for client-side logging, which should be tier-1. NRuiz (WMF) (talk) 08:53, 6 August 2014 (UTC)

It may be enough to use mw.track; WikimediaEvents could subscribe and transform it into an EventLogging event. mw.track doesn't have any core subscribers, but perhaps a simple one (that could be turned off) that only listens to onerror could be added as part of this. Superm401 - Talk 00:16, 25 July 2014 (UTC)
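The subscriber idea above can be sketched roughly as follows. This is only an illustration: `createTracker` is a minimal stand-in loosely modelled on a track/subscribe pair, not the MediaWiki implementation, and the topic name `global.error`, the payload fields and `installErrorTracking` are assumptions made for the example.

```javascript
// Minimal publish/subscribe stand-in, loosely modelled on a
// track/subscribe pair. NOT the MediaWiki implementation.
function createTracker() {
  const subscribers = [];
  return {
    // Publish `data` under `topic` to every matching subscriber.
    track(topic, data) {
      subscribers.forEach(function (s) {
        if (topic.indexOf(s.prefix) === 0) {
          s.callback(topic, data);
        }
      });
    },
    // Receive every later event whose topic starts with `prefix`.
    subscribe(prefix, callback) {
      subscribers.push({ prefix: prefix, callback: callback });
    }
  };
}

// Hypothetical topic name, chosen for this sketch only.
const ERROR_TOPIC = 'global.error';

// Forward uncaught errors to the tracker. `target` is anything with an
// `onerror` property, normally `window`.
function installErrorTracking(target, tracker) {
  const previous = target.onerror;
  target.onerror = function (message, url, line, column, error) {
    tracker.track(ERROR_TOPIC, {
      message: String(message),
      url: url,
      line: line,
      column: column,
      stack: error && error.stack ? error.stack : null
    });
    // Keep any handler that was installed before this one.
    return typeof previous === 'function' ?
      previous.apply(this, arguments) : false;
  };
}
```

A listener such as an EventLogging bridge would then call `tracker.subscribe('global.error', handler)`, and a site that does not want error reporting simply never registers one.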


 * My main concern is that EventLogging has been known to have trouble under unexpectedly high load. If we deploy a software update that causes a JS error in a large percentage of users, that could REALLY spike the load on the system with millions of very-similar entries. If EL can handle this, then great! If not, what would it take to either improve the EL infrastructure or build something similar that can handle that? --brion (talk) 00:26, 25 July 2014 (UTC)

What I have heard is that the way EventLogging transmits events could scale to significantly higher loads but the way it stores them (in MySQL) not so much. And we probably want something more searchable anyway. Is it possible to reuse the EL infrastructure but do something custom at the last step instead of writing into a DB? --Tgr (WMF) (talk) 00:29, 25 July 2014 (UTC)
 * EventLogging is designed for discrete events and low load. As I pointed out above, EL is a tier-2 service (see: EventLogging/OperationalSupport), not tier-1 as error logging should be. While there are similarities between what EL does and what client-side logging would do, there are more differences, the main one being bursts of traffic. Also, EL data is neither public nor searchable. It seems to me that you need a specific client-side mechanism with adjustable sampling rates, client-side throttling and a searchable backend on which alarms can be set. You have no need for schema versioning (one of EL's strong points), very long-term storage or an event capsule. NRuiz (WMF) (talk) 08:53, 6 August 2014 (UTC)
 * Nuria, what I meant by reusing is that the error logger would have its own JS code and its own Varnish endpoint, would use the same mechanism as EventLogging (Kafka?) to send log events from Varnish, and would have a different consumer process at the end than EL. Do you think this could work, or would bursts still be problematic? --Tgr (WMF) (talk) 16:40, 6 August 2014 (UTC)
 * Tgr (WMF) I get your idea, I think. But it will still be problematic, as bursts are not handled well precisely by the part that sits between Varnish and Kafka. Also, EL is built around schema validation, which does not apply in this case. I understand there are similarities, but I would use a similar but much, much lighter system whose client, besides being subject to a sampling rate, has throttling. I would also log to flat files rather than to a DB, Kafka or Hadoop, or even to Logstash; by its very nature this is information that will be outdated after a quarter at most. Also, there aren't several consumers of the info but just one. Overall it is a much simpler system when it comes to design, but with a much higher volume than EL. Do reach me on IRC if you want to talk about this further. 87.221.167.158 17:39, 6 August 2014 (UTC)
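The adjustable sampling rate mentioned in this thread could look something like the sketch below. All names (`shouldSample`, `createSampledReporter`, the `send` callback) are hypothetical, and the sketch assumes the decision is made once per page view rather than once per error.

```javascript
// Decide whether this page view reports errors at all.
// `rate` is the fraction of page views that report (0 = none, 1 = all).
// `random` is injectable so the decision can be tested deterministically.
function shouldSample(rate, random) {
  const draw = (random || Math.random)();
  return draw < rate;
}

// Wrap a `send` function so that it only fires for sampled page views.
// The sampling decision is taken once, when the reporter is created.
function createSampledReporter(rate, send, random) {
  const enabled = shouldSample(rate, random);
  return function report(entry) {
    if (!enabled) {
      return false;
    }
    send(entry);
    return true;
  };
}
```

Deciding once per page view keeps all errors from a sampled client together, which may make a broken session easier to read than a random subset of its errors would be.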

The other way this could cause a spike is when errors are generated by some event which fires extremely often (say, a broken scroll or mousemove handler). This should probably be throttled on the client side though. --Tgr (WMF) (talk) 00:31, 25 July 2014 (UTC)
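A minimal sketch of such client-side throttling, assuming a fixed time window per error signature; `createThrottle`, `maxPerWindow` and `windowMs` are hypothetical names, and the choice of signature (for example message plus file and line) is left to the caller.

```javascript
// Allow at most `maxPerWindow` reports per error signature within each
// window of `windowMs` milliseconds. `now` is injectable for testing.
function createThrottle(options) {
  const maxPerWindow = options.maxPerWindow; // hypothetical setting
  const windowMs = options.windowMs;         // hypothetical setting
  const now = options.now || Date.now;
  const seen = new Map(); // signature -> { start, count }

  return function allow(signature) {
    const t = now();
    let bucket = seen.get(signature);
    // Start a fresh window if none exists or the old one has expired.
    if (!bucket || t - bucket.start >= windowMs) {
      bucket = { start: t, count: 0 };
      seen.set(signature, bucket);
    }
    bucket.count += 1;
    return bucket.count <= maxPerWindow;
  };
}
```

With this in place, a broken mousemove handler that throws the same error hundreds of times per second would produce at most `maxPerWindow` reports per window, while a different error on the same page is still reported.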

Reusability and data processing
I understand that your solution needs to scale to WMF needs, but please make it so that it is also usable by other sites.

We do have a solution at translatewiki.net which implements error catching and delivery to the server, but we lack good processing and display. Compared to PHP error logs, JS error logs are of much lower quality (lack of info, l10n as mentioned in the RFC), and they also contain lots of user-specific issues due to user scripts or browser extensions – something you cannot reproduce easily, and for which there isn't enough information to understand the cause. Hence it has not seen much use in practice.

I suggest you do more research into candidate solutions for the processing and display part – preferably so that we are not tied to one component which is not easy for third parties to install.

There is no mention of tracking the version of MediaWiki core and extensions, or which gadgets are enabled. I think this is important information to have when new errors appear or old ones go away.

--Nikerabbit (talk) 10:06, 25 July 2014 (UTC)

For small sites, errors would be logged to a dedicated channel via the normal MediaWiki logging architecture; they are free to configure whatever handling they want for that channel. Any code that we write for processing should be reusable in such a scenario, that's a good point (the easiest way would be to expose them as Monolog filters). For display, small sites can probably use software built specifically with this aim (see Non-MediaWiki examples).

Good point about the version information, added that to the proposal. --Tgr (WMF) (talk) 19:07, 25 July 2014 (UTC)
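As an illustration of what attaching version information might look like, here is a sketch of a report payload. The field names (`coreVersion`, `extensions`, `gadgets`) and the `env` object are assumptions made for this example; the proposal does not define a payload format here, and how the values are obtained is left open.

```javascript
// Build an error report that carries version information with it.
// `env` is a hypothetical description of the running site:
//   coreVersion: string, extensions: { name: version }, gadgets: [name]
function buildErrorReport(error, env) {
  return {
    message: error.message,
    stack: error.stack || null,
    coreVersion: env.coreVersion,
    // Copy so that later changes to `env` do not alter the report.
    extensions: Object.assign({}, env.extensions),
    // Sorted so that identical gadget sets compare equal.
    gadgets: env.gadgets.slice().sort()
  };
}
```

Grouping reports by these fields on the server side would show whether an error first appeared with a particular core or extension version, or only for users of a particular gadget.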

Target audience
Were it implemented, would gadgets, global scripts, user scripts be able to use the server-side logging? --Gryllida 07:02, 27 July 2014 (UTC)

Any Javascript error would be logged, regardless of the source. --Tgr (WMF) (talk) 23:20, 27 July 2014 (UTC)

Opting out
Would global scripts, user scripts, gadgets, or any code in principle be able to opt-out of server-side logging? Would you consider such ability to opt out necessary, and if so, in what circumstances? Gryllida 07:02, 27 July 2014 (UTC)

Probably not. What would be the point of opting out, anyway? I can see users wanting to opt out for privacy reasons, and supporting that should be straightforward, but why would a gadget need to opt out? --Tgr (WMF) (talk) 23:23, 27 July 2014 (UTC)

I had a debugging use case in mind, during which a user breaks things on purpose. In that case a user opt-out may work. :-)

I'll be happy to see more discussion on this topic, if needed, such as for other use-cases I may have missed. --Gryllida 08:55, 29 July 2014 (UTC)

Errors coming from user scripts / dynamic code can just be filtered out on the server side (and have to be anyway, since browser extensions tend to generate all sorts of crap), so there is no need for users to bother with that. The errors come with a stack trace, so it's easy to see whether they happen in MediaWiki core, an extension, a gadget or something else. --Tgr (WMF) (talk) 22:41, 29 July 2014 (UTC)
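A rough sketch of such stack-trace-based classification follows. The URL patterns are hypothetical and would need adapting to the actual deployment (for instance, scripts served through a bundling loader would not expose paths like these); the function simply returns the origin of the first stack line that matches any pattern.

```javascript
// Hypothetical URL patterns, checked in order for each stack line.
const ORIGIN_PATTERNS = [
  { origin: 'browser-extension', pattern: /(?:chrome|moz)-extension:\/\// },
  { origin: 'extension', pattern: /\/extensions\/[^\/]+\// },
  { origin: 'gadget', pattern: /MediaWiki:Gadget-/ },
  { origin: 'core', pattern: /\/resources\// }
];

// Return the origin of the first stack line that matches a pattern,
// or 'other' if nothing matches (including an empty or missing stack).
function classifyStack(stack) {
  const lines = String(stack || '').split('\n');
  for (const line of lines) {
    for (const rule of ORIGIN_PATTERNS) {
      if (rule.pattern.test(line)) {
        return rule.origin;
      }
    }
  }
  return 'other';
}
```

Using the first matching line is a simplification: an error thrown inside core code that was called from a gadget would be attributed to core, so a real filter might want to look at the whole trace.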

Other languages people
The proposal page is only in English. How would you like people from non-English projects (who don't visit mw.org and don't read mailing lists) to learn about this request for comment? --Gryllida 08:57, 29 July 2014 (UTC)

RfC meeting
This RFC has been scheduled to be discussed in the Architecture RfC meeting today, 2014-09-03. Sorry for the late notice.--Qgil-WMF (talk) 15:49, 3 September 2014 (UTC)