Architecture meetings/RFC review 2013-12-04

Wednesday, December 4, 2013 at 10:00 PM UTC at

Requests for Comment to review
Propose your own RFCs:


 * Requests for comment/Simplify thumbnail cache
 * Requests for comment/Structured logging
 * Requests for comment/Json Config pages in wiki (if it's in a stable enough state for discussion)

Meeting summary
Meeting started by MaxSem at 22:01:28 UTC (full logs).

 * https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-04 (TimStarling, 22:02:40)

 * RFC: Simplify thumbnail cache (TimStarling, 22:05:57)
   * https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache (TimStarling, 22:06:04)
   * https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes (paravoid, 22:19:35)
   * ACTION: AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage (TimStarling, 22:41:57)
   * option 5 generally favoured, possibly with modifications, we will proceed with design work on it (TimStarling, 22:43:57)
   * ACTION: bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (TimStarling, 22:44:18)
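Option 5 relies on hashing every thumbnail of one original to a single cache object and varying on size, so that one purge (one HTCP packet) clears all sizes at once. The following is a minimal Python sketch of that grouping idea only; it does not model Varnish or vcl_hash. The URL layout, `cache_keys` and `GroupedThumbCache` are illustrative assumptions.

```python
import re

# Assumed Wikimedia-style thumbnail URL layout (illustrative only):
#   /wikipedia/commons/thumb/a/ab/Foo.jpg/220px-Foo.jpg
# The original is the same path without "/thumb" and without the last segment.
THUMB_RE = re.compile(r"^(?P<thumb_dir>/.+?/thumb/.+?/[^/]+)/(?P<width>\d+)px-[^/]+$")


def cache_keys(url):
    """Map a request URL to (primary_key, variant).

    All sizes of one original share the primary key and the width is the
    variant, loosely mirroring "hash on the original, vary on the size".
    """
    m = THUMB_RE.match(url)
    if m is None:
        return url, None  # not a simple thumbnail: cache under the full URL
    original = m.group("thumb_dir").replace("/thumb/", "/", 1)
    return original, int(m.group("width"))


class GroupedThumbCache:
    """Toy in-memory cache: one entry per original, holding every cached size."""

    def __init__(self):
        self._entries = {}  # primary key -> {variant: body}

    def put(self, url, body):
        key, variant = cache_keys(url)
        self._entries.setdefault(key, {})[variant] = body

    def get(self, url):
        key, variant = cache_keys(url)
        return self._entries.get(key, {}).get(variant)

    def purge(self, original_path):
        """A single purge of the original drops all of its sizes at once."""
        return len(self._entries.pop(original_path, {}))


cache = GroupedThumbCache()
cache.put("/wikipedia/commons/thumb/a/ab/Foo.jpg/220px-Foo.jpg", b"small")
cache.put("/wikipedia/commons/thumb/a/ab/Foo.jpg/800px-Foo.jpg", b"large")
print(cache.purge("/wikipedia/commons/a/ab/Foo.jpg"))  # 2
```

URLs that do not match the assumed pattern (for example multi-page PDF/DjVu thumbnails) fall back to a per-URL key, which loosely matches the decision to leave those formats out of the first phase.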

 * RFC: Structured logging (TimStarling, 22:45:45)
   * https://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging (TimStarling, 22:45:58)
   * ACTION: ori-l to expand RFC (TimStarling, 22:59:49)
   * https://github.com/mhart/gelf-stream (gwicke, 23:00:41)
   * JSON generally favoured as long as a plain text format can be also made available (TimStarling, 23:00:50)
   * transport selection based on URI-style destination string (TimStarling, 23:01:21)
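JSON was favoured on condition that a plain-text form stays available, and bd808 stressed keeping records structured until they reach the emitter. A minimal Python sketch of that split, assuming a dict-shaped record: one structured record, two renderings. The record fields and the names `make_record`, `format_json` and `format_plain` are hypothetical and are not MediaWiki's or monolog's API.

```python
import json


def make_record(channel, level, message, **context):
    """Build a structured log record; nothing is stringified here."""
    return {"channel": channel, "level": level, "message": message, "context": context}


def format_json(record):
    """Machine-readable rendering (one JSON object per line)."""
    return json.dumps(record, sort_keys=True)


def format_plain(record):
    """Greppable plain-text rendering of the same record."""
    ctx = " ".join(f"{k}={v}" for k, v in sorted(record["context"].items()))
    line = f"[{record['channel']}] {record['level'].upper()}: {record['message']} {ctx}"
    return line.rstrip()  # drop the trailing space when there is no context


record = make_record("exception", "error", "Thumbnail scaling failed",
                     vhost="en.wikipedia.org", ip="127.0.0.1")
print(format_json(record))
print(format_plain(record))
# [exception] ERROR: Thumbnail scaling failed ip=127.0.0.1 vhost=en.wikipedia.org
```

Because both formatters read the same record, adding a field such as XFF (as RoanKattouw asked) would show up in both outputs without touching the call sites.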

Meeting ended at 23:05:10 UTC (full logs).

Action items

 * 1) AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
 * 2) bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer ✅
 * 3) ori-l to expand RFC

Action items, by person

 * AaronSchulz
   * AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
   * bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer ✅
 * bd808
   * bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer ✅
 * ori-l
   * ori-l to expand RFC

People present (lines said)

 * 1) TimStarling (90)
 * 2) paravoid (74)
 * 3) gwicke (44)
 * 4) AaronSchulz (44)
 * 5) bd808 (42)
 * 6) ori-l (36)
 * 7) parent5446 (24)
 * 8) aude (15)
 * 9) RoanKattouw (8)
 * 10) bawolff (6)
 * 11) MaxSem (5)
 * 12) subbu (3)
 * 13) meetbot-wm (3)
 * 14) Krinkle (2)

Generated by MeetBot 0.1.4.

Full log
22:01:28 <MaxSem> #startmeeting
22:01:28 <meetbot-wm> Meeting started Wed Dec 4 22:01:28 2013 UTC. The chair is MaxSem. Information about MeetBot at https://bugzilla.wikimedia.org/46377.
22:01:28 <meetbot-wm> Useful Commands: #action #agreed #help #info #idea #link #topic.
22:01:37 <MaxSem> #chair TimStarling
22:01:37 <meetbot-wm> Current chairs: MaxSem TimStarling
22:01:45 <parent5446> Ah there we go
22:02:02 <MaxSem> yay, I hacked a bot!:P
22:02:34 <TimStarling> ok, so there are 3 RFCs on the wiki page
22:02:40 <TimStarling> #link https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-04
22:03:11 <TimStarling> do we have people here who want to talk about them, and are there any others that those present want to add?
22:03:29 <bd808> Ori would like to request that the logging rfc be "not first" as he is AFK until 22:30Z
22:04:03 * aude waves :)
22:04:21 <TimStarling> well, we have you and paravoid, we could talk about "Simplify thumbnail cache"
22:04:30 <paravoid> indeed
22:04:34 <paravoid> that's why I'm here :)
22:04:48 <TimStarling> ah, and there's the third author
22:04:49 <paravoid> and now AaronSchulz too :)
22:05:33 <bd808> Sounds good to me
22:05:56 <paravoid> so, bd808 since you proposed this for discussion (and wrote all the text :)), do you want to take point?
22:05:57 <TimStarling> #topic RFC: Simplify thumbnail cache
22:06:04 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
22:06:32 <bd808> I mostly just collected notes from paravoid and AaronSchulz :)
22:06:46 <bd808> But sure.
22:06:55 <TimStarling> bd808: which is your preferred option?
22:07:03 <paravoid> is the problem statement clear enough to everyone?
22:07:14 <gwicke> pretty clear to me
22:07:15 <parent5446> So basically we want to get thumbnails off of Swift.
22:07:29 <bd808> And make purges easier
22:07:41 <aude> as long as we can still generate thumbnails of arbitrary size ( on cache miss), it seems fine
22:07:50 <aude> they don't have to be stored forever
22:08:23 <TimStarling> well, I don't think bd808 does want thumbnails off of swift, based on his talk page comments
22:08:39 <gwicke> is there an implementation of the purge pattern match already?
22:09:01 <RoanKattouw> The RFC text suggests that thumbs would move off of Swith
22:09:03 <RoanKattouw> *swift
22:09:09 <RoanKattouw> "3. Configure MediaWiki imagescalers to stop storing generated thumbnails in Swift"
22:09:21 <AaronSchulz> right
22:09:22 <aude> what exactly are the imagescalers (excuse my ignorance)
22:09:34 <bd808> gwicke: There is not an implementation yet, but AaronSchulz has been dying to start working on that
22:09:35 <AaronSchulz> RoanKattouw: I think some would stay for a while though
22:09:41 <aude> in this context*
22:09:51 <AaronSchulz> like media that supports pages and can have many thumbs for one file version
22:09:56 <bd808> I think that "most" would move off of swift.
22:09:59 <gwicke> I would imagine the idea is something like hashing different thumbs to the same cache entry, and then vary on the size?
22:10:03 <RoanKattouw> aude: They are Apache machines dedicated to image scaling
22:10:05 <paravoid> aude: mediawiki application servers that scale uploaded content to thumbnails per request
22:10:06 <AaronSchulz> if we don't use vcl_hash tricks on those, they will have to work the old fashioned way
22:10:11 <aude> RoanKattouw: paravoid thanks
22:10:15 <MaxSem> what about thumbs that are extremely slow to render?
22:10:16 <AaronSchulz> until they get refactored somehow or something
22:10:22 <parent5446> On the note of the imagescalers, do we know if they can handle that 5x increase in utilization?
22:10:25 <TimStarling> the problem is infinite growth of thumbnail storage
22:10:35 <RoanKattouw> HTTP request for nonexistent thumb comes in, thumb is generated locally, stored, HTTP response with thumb goes out
22:10:37 <TimStarling> MaxSem: the thing that is slow is the fetch of the original
22:10:44 <bd808> AaronSchulz has pointed out that some media types are very time consuming to extract thumbs from and should probably be retained in durable storage.
22:10:47 <AaronSchulz> MaxSem: we use ssds in varnish (not just memory cache)
22:10:51 <TimStarling> the actual image scaling part is pretty fast, and can easily be scaled up
22:11:17 <TimStarling> so that should answer parent5446's question also
22:11:23 <parent5446> Yep, thanks.
22:11:27 <AaronSchulz> we also have some simple "ping limiting" in place for thumb.php
22:11:33 <TimStarling> yes, we can absolutely scale 5x as many images, but we can't fetch the originals that fast
22:11:38 <aude> bd808: what about having some fixed size thumbnails for some stuff?
22:11:46 * aude thinking of gigantic tiffs and videos
22:11:48 <AaronSchulz> to avoid too much LRU churn or wasted I/O and CPU from trolling a bit
22:11:57 <bd808> The new MediaViewer feature has shown that generating everything on the fly may be slower than people are used to.
22:11:58 <aude> then stuff can be scaled from those?
22:11:58 <paravoid> TimStarling: why do you think so?
22:12:13 <AaronSchulz> bd808: new thumbnail sizes?
22:12:40 <paravoid> bd808: could you talk a little about that? I haven't heard anything and this sounds interesting
22:12:40 <AaronSchulz> we are also still replicated writes across a DC in a synchronous manner that I can't stand
22:12:43 <bd808> aude: That would be a possibility and actually the subject of https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes
22:12:43 <TimStarling> paravoid: why do I think we can't fetch originals that fast?
22:12:46 <AaronSchulz> *replicating
22:12:52 <aude> bd808: i know
22:12:56 <AaronSchulz> bd808: that doesn't help ;)
22:12:58 <TimStarling> paravoid: I thought there was a comment from you to that effect
22:13:16 <AaronSchulz> aude: we do something like that with TMH
22:13:32 <AaronSchulz> if two different sized thumbs are requested for the same time position of a video
22:13:32 <bd808> MediaViewer asks for new thumb sizes
22:13:40 <AaronSchulz> a reference frame is used for scaling
22:13:40 <aude> it's contrary to allowing arbitrary sized, but maybe certain cases it makes sens to have special handling for some types of files
22:14:09 <aude> AaronSchulz: makes sense
22:14:11 <bd808> Perfomance is getting better with some changes made by the team, but initial testing was found to be 2-5 seconds for many thumbs to generate
22:14:29 <TimStarling> we can start the image scaling at parse time
22:14:39 <bd808> The changes they have made are basically to "bucket" thumb sizes
22:15:37 <bd808> I really like TimStarling's idea of adding a 3rd layer of varnish
22:15:45 <paravoid> I didn't understand the bucket thumb sizes part
22:16:17 <gwicke> we could also consider generating small thumbs from a smaller standard thumb size
22:16:32 <parent5446> "4. Store "standard" thumbnails permanently and others with TTL (and possibly last use updating)"
22:16:36 <parent5446> Also something worth considering
22:16:46 <gwicke> that would also help IO
22:17:01 <gwicke> and should be faster for video thumbs too
22:17:10 <paravoid> videoscaling is not part of this discussion
22:17:12 <aude> gwicke: essentially what i tried to say
22:17:22 <paravoid> or video thumbs
22:17:25 <bd808> paravoid: The original extension used the screen width of the browser to call for a thumb. Now they are using a series of sizes (histogram basically) and calling for the size closest to the screen width
22:17:25 <bawolff> I assume if we do the three layers of varnish thing, we would increase the max cache time?
22:17:29 <AaronSchulz> video thumbs already do that in TMH and indeed are not part of the rfc
22:17:45 <gwicke> k
22:18:15 <bd808> bawolff: I would guess that the 3rd layer would be backed by spinning disk and use LRU eviction based on sapce
22:18:22 <bd808> *space
22:18:37 <paravoid> and TTLs, and manual PURGEs
22:18:55 <paravoid> it does add some complexity, though.
22:18:56 <AaronSchulz> if you store standard sizes the URLs to purge are known (thus don't need swift)
22:19:11 <AaronSchulz> though changing the standard sizes would require running a one-off script
22:19:20 <paravoid> so, the standard sizes is a separate discussion
22:19:25 <paravoid> there is a separate RFC
22:19:29 <paravoid> it's very much related, though.
22:19:35 <paravoid> https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes
22:19:52 <AaronSchulz> and may be useful for the file types exempt from the bucketing
22:20:00 <TimStarling> bd808: it sounds like this MediaViewer extension needs some wider review
22:20:02 <gwicke> I was just considering it as an option for speeding up generation of non-standard thumbs
22:20:14 <gwicke> especially for speeding up the IO part of that
22:20:34 <bd808> TimStarling: I'm sure they would welcome feedback. Brion has been giving them some attention.
22:20:52 <paravoid> about the IO: storing millions of tiny files in spindles is very inefficient
22:20:56 <TimStarling> where will MediaViewer be used?
22:21:16 <bd808> It is currently deployed to all wikis I believe.
22:21:24 <paravoid> I don't expect Varnish to change that much, although it would change the fact that we are not going to store 3 copies
22:21:26 <TimStarling> on what pages is it activated?
22:21:36 <paravoid> TimStarling: it's part of the new "beta features" thing
22:21:37 <TimStarling> page views or image description pages?
22:21:41 <AaronSchulz> The main thing with this rfc is about having run-of-the-mill jpgs/pngs stored only in varnish and totally LRU and I wouldn't see much benefit to use reference thumbnails for that
22:21:46 <bawolff> Its hidden behind a preference
22:21:54 <paravoid> TimStarling: so you have to explicitly enable it as an experimental feature
22:22:02 <Krinkle> So it's deployed everywhere, but opt-in via beta preferences. It is exposed by clicking on an image thumb after enabling it.
22:22:04 <TimStarling> yes, and after you enable it, where does it appear?
22:22:08 <AaronSchulz> paravoid: that's how it always starts :)
22:22:10 <bd808> TimStarling: It's in the "beta features" program
22:22:18 <TimStarling> thanks Krinkle
22:22:22 <gwicke> AaronSchulz: if the IO portion of scaling can handle that, then that would certainly be simpler
22:22:29 <Krinkle> TimStarling: I had trouble discovering it as well, because we're trained to think that clicking a thumb opens the file page :)
22:22:56 <paravoid> gwicke: handle what?
22:23:02 <gwicke> AaronSchulz: but if IO becomes a bottleneck then reference thumbnails (even a single 1024x1024 bounding box one) could help a lot
22:23:07 <paravoid> sorry, lost in the subthreads of this discussion :)
22:23:29 <gwicke> paravoid: handle potential spikes in miss rates
22:23:57 <gwicke> in case varnish machines go down, there is a deploy issue or the like
22:24:29 <paravoid> so your preference seems to be alternative strategy (5), correct?
22:24:48 <bd808> Failure tolerance and (ab)use of vcl_hash I think are the big open questions with any of the schemes
22:25:03 <bd808> paravoid: personal I like (5) the best
22:25:18 <TimStarling> well, regarding vcl_hash, there is the secondary key feature mentioned
22:25:22 <paravoid> bd808: except the "implementing LRU in a Swift middleware" schemes
22:25:24 <gwicke> paravoid: a single thumb could also live in swift
22:25:32 <TimStarling> which might be "months" away, which doesn't sound so long to wait really
22:25:48 <paravoid> varnish 4.0 technology preview 1 got released... today
22:25:49 <gwicke> not sure that it would need to be LRUed
22:25:51 * AaronSchulz doesn't really get 5
22:25:57 <bawolff> Having 1 htcp packet purge everything sounds really nice
22:26:15 <paravoid> I haven't checked if it includes surrogate keys, though, and a deployment within the WMF is many months away indeed.
22:26:29 <gwicke> so close to 3) combined with the vcl_hash proposal
22:26:44 <TimStarling> AaronSchulz: 5 was my suggestion on the talk page
22:26:52 <bd808> LRU in swift is good but there was some question as to the performance of swift in deleting files
22:26:52 <bd808> I think you actually raised that paravoid ?
22:27:00 <TimStarling> AaronSchulz: follow the ref link
22:27:27 <paravoid> yes, as an open question, not as a known issue
22:27:57 <TimStarling> also, I am not sure if list traversal in a vcl_hash scheme is really worth worrying about
22:28:15 <TimStarling> there are two ways to look at the performance of it: throughput and latency
22:28:21 <paravoid> to be clear, we are excluding TIFF/PDF/Djvu from this discussion, correct?
22:29:07 <TimStarling> throughput: multiply the *mean* number of thumbnails per source by the time per link traversal
22:29:21 <AaronSchulz> paravoid: pretty much
22:29:23 <TimStarling> now, the mean is not large, you don't need to worry about djvu/pdf for that
22:29:37 <TimStarling> maybe for a djvu with 1000 pages it might take 1ms to traverse
22:29:45 <TimStarling> but that doesn't impact the throughput very much
22:29:53 <TimStarling> the other way to look at it is latenc
22:29:54 <TimStarling> y
22:30:03 <bawolff> Why exclude tiff. Tiff with many pahes are very rare
22:30:10 <paravoid> we have tons of of pdf/djvus with hundreds of pages * 5 thumbnails per page
22:30:10 <bawolff> *pages
22:30:18 <TimStarling> then you ask: what is the largest possible number of thumbnails on a given image and will that add user-visible latency to requests for that image?
22:30:34 <AaronSchulz> I think they could be added if it's fine on average, but they were to be excluded in first phases
22:30:46 <TimStarling> the limit there would be say 50ms of latency
22:30:54 <paravoid> that's an interesting approach, TimStarling
22:31:46 <TimStarling> I would expect linked list traversal in phk's style of C to be pretty fast
22:31:50 <gwicke> is there a need to have all entries end up on a single backend varnish?
22:31:56 <TimStarling> like, a lot less than a microsecond
22:31:59 <bd808> With the current application logic the upper bound is something like the width of the original image.
22:32:08 <gwicke> the purge requests are going to all varnishes I guess
22:32:12 <paravoid> I think it was mark who was mostly concerned about that, I don't have counterarguments.
22:32:47 <bd808> Actually width * number of pages I suppose. Do we vary on other dimensions?
22:32:49 <AaronSchulz> TimStarling: so 5 is just vcl_hash+second cache layer to deal with those eviction issues, OK
22:33:26 <TimStarling> AaronSchulz: yes
22:33:41 <paravoid> "second", but yes :)
22:33:50 <AaronSchulz> I was confused at first since I thought it was a complete alternative
22:33:55 <bawolff> Bd808: not normally. Svg has language, tiff has lossless/lossy
22:33:57 <paravoid> I'd say "additional, spindle-backed"
22:34:13 <AaronSchulz> miser! :p
22:34:39 <TimStarling> AaronSchulz: third cache layer, really
22:34:44 <gwicke> so can't the variants for a single thumb be distributed across several backends to limit request latency?
22:35:01 <paravoid> TimStarling: or fourth, for esams/ulsfo
22:35:08 <bd808> memory -> ssd -> disk -> scaler
22:35:09 <paravoid> let's stop counting cache layers, though :)
22:35:10 <TimStarling> yeah
22:35:30 <paravoid> gwicke: we'd have to write an custom director for this
22:35:33 <AaronSchulz> are you counting frontend+backend varnish (e.g. CARP)?
22:35:40 <AaronSchulz> I assume swift would not be part of this
22:35:45 <paravoid> the current ones are "random", "wrr" and "chash" (which mark wrote)
22:35:48 <paravoid> it's not rocket science
22:35:50 <gwicke> paravoid, would that be difficult?
22:36:00 <bd808> Swift would only be used in (5) to fetch originals
22:36:10 <AaronSchulz> right, but not a cache layer
22:36:10 <TimStarling> wouldn't chash do it already?
22:36:22 <gwicke> probably depends on what you feed into the hash
22:36:45 <paravoid> well, yeah, I guess you could hack it up by appending a random replica number to the URL in vcl_hash
22:36:47 <gwicke> if that can be manipulated in VCL, then it might be relatively simple
22:37:06 <paravoid> it's a bit ugly, though, but either way, possible
22:37:07 <AaronSchulz> so we are over halfway though this meeting just to note
22:37:18 <TimStarling> paravoid: by variants, gwicke means thumbnail sizes, right?
22:37:24 <TimStarling> which are already in the URL
22:37:32 <gwicke> TimStarling, yes
22:37:43 <TimStarling> AaronSchulz: well, people seemed to take a while to warm up
22:37:50 <gwicke> they'd map to the same linked variant chain though
22:37:56 <gwicke> in storage
22:37:57 <TimStarling> it seems like the longer we run with it, the faster we make progress
22:38:10 <AaronSchulz> not saying we need to stop, just noting the time
22:38:19 <gwicke> but that's in the backend
22:38:49 <gwicke> if chash is purely url-based (which it is afaik), then we should already get a quasi-random distribution across backends
22:39:01 <paravoid> correct
22:39:05 <gwicke> so latency might not be that bad
22:39:06 <paravoid> I understood something different, I'm sorry.
22:39:12 <TimStarling> ok, so paravoid, what do you think of option 5?
22:39:23 <TimStarling> we need some conclusions and action items now
22:39:24 <AaronSchulz> vcl_hash + extra cache layer, starting with png/jpg and doing others later sounds reasonable?
22:39:52 <TimStarling> is anyone against option 5?
22:39:59 <gwicke> fine with me
22:40:02 <paravoid> I'm okay with option 5
22:40:03 <paravoid> but
22:40:14 <paravoid> I think we might need to consider just expanding the existing cache layer
22:40:14 <gwicke> although I could also live with storing a handful standard sizes in swift
22:40:24 <gwicke> at least one 'large screen size' thumb
22:40:31 <AaronSchulz> paravoid: right
22:40:40 <TimStarling> ok, well either way, we need the same MW support
22:40:44 <paravoid> SSDs are getting cheaper these days, it might not be worth it
22:40:51 <paravoid> nod, either way it doesn't matter much
22:40:59 <AaronSchulz> do we care if vcl_hash puts more hot thumbnails on single boxes?
22:41:02 <TimStarling> MW needs to be adapted to stop storing thumbnails, to just stream them out instead
22:41:26 <TimStarling> who will plan that? AaronSchulz?
22:41:29 * AaronSchulz is open to playing around with the hash since the htcp stream hits everything anyway, they'd still get the purges (as noted)
22:41:57 <TimStarling> #action AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
22:42:07 <AaronSchulz> TimStarling: it would be a config switch I always assumed
22:42:14 <bd808> Will this need to be a feature flag option or can core change unilaterally?
22:42:17 <TimStarling> easy action for you then
22:42:23 <paravoid> mediawiki streams out thumbnails now anyway
22:42:28 <AaronSchulz> I also want it to send a header for vcl to use to determine the hash
22:42:37 <paravoid> it just stores them too
22:42:39 <AaronSchulz> I don't want some ugly regexes in vcl trying to look for thumbs
22:42:49 <AaronSchulz> it would be cleaner for the vcls to look for a custom header IMO
22:43:00 <paravoid> that's not possible I'm afraid
22:43:08 <paravoid> vcl_hash is called on the request path, not the response path
22:43:18 <bd808> AaronSchulz: It has to match the request URL right?
22:43:33 * bd808 doesn't type as fast as paravoid
22:43:42 <AaronSchulz> paravoid: hmm, right
22:43:57 <TimStarling> #info option 5 generally favoured, possibly with modifications, we will proceed with design work on it
22:44:09 <paravoid> thank you TimStarling
22:44:18 <TimStarling> #action bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer
22:44:35 * bd808 nods
22:44:53 <paravoid> do we need an action item for mediawiki to treat PDF/Djvu in a different way?
22:45:03 <paravoid> or is this part of the previous "stream out" action?
22:45:10 <TimStarling> paravoid: you can put notes on the talk page about that
22:45:20 <paravoid> okay
22:45:22 <TimStarling> we have time for a very quick look at one other RFC
22:45:32 <paravoid> ori-l just joined :)
22:45:36 <paravoid> right on time
22:45:37 <parent5446> Ori's here so we can briefly look at logging.
22:45:45 <TimStarling> #topic RFC: Structured logging
22:45:58 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging
22:46:04 <AaronSchulz> csteipp, ori-l: http://pastebin.com/phDgyNHi
22:46:16 <gwicke> +1 on using JSON
22:46:27 <ori-l> AaronSchulz: thanks
22:47:00 <bd808> gwicke: I looked at other alternatives but json seemed the clear winner
22:47:05 <parent5446> OK, so my one question with this is why we need to specify our own serialization format for logs. Maybe it'd be nice to have a "MediaWiki serialization format", but at the same time our logging system should be open to whatever format the sysadmin wants to output into.
22:47:24 <RoanKattouw> ori-l: This looks sweet
22:47:32 <ori-l> RoanKattouw: it's bd808's!
22:47:35 <gwicke> bd808, https://www.mediawiki.org/wiki/Talk:Requests_for_comment/Structured_logging#We_are_considering_a_similar_approach_for_Parsoid_36348
22:47:48 <ori-l> parent5446: I mostly agree, but JSON also constrains the type of data you can emit
22:47:49 <RoanKattouw> Would the recorded fields like vhost and ip be extensible? On the WMF cluster I'd like to add XFF, for instance
22:47:58 <TimStarling> would this have multiple backends? structured and plain text?
22:48:02 <RoanKattouw> (Had to hack that up manually not to long ago to debug 127.0.0.1 problems)
22:48:09 <paravoid> +1 to a modular approach
22:48:13 <parent5446> That's why I proposed we use something like monolog.
22:48:15 <bd808> RoanKattouw: yes. It should be extensible
22:48:27 <parent5446> It allows us to add our JSON format, while also supporting literally everything else.
22:48:34 <bd808> I would actually support monolog as well
22:48:40 <MaxSem> <3 the greppability of plaintext
22:48:40 <ori-l> TimStarling: you could have a PlainTextLogEmitter that munges the array into something human-readable
22:48:47 <TimStarling> ori-l: yeah
22:49:01 <ori-l> a la getTraceAsString
22:49:18 <parent5446> Actually, monolog already has a JsonFormatter.
22:49:27 <TimStarling> regarding "Live exception object to be stringified by the log event emitter"
22:49:30 <parent5446> We'd just need to use a Processor to put in the data we want
22:49:31 <gwicke> it is pretty simple to write a json grepper I guess
22:49:33 <bd808> The important part is keeping the log records structured internally until the emitter is reached
22:49:46 <paravoid> gwicke: jq
22:49:47 <TimStarling> do you mean Exception::__toString or something else?
22:50:01 <paravoid> gwicke: https://github.com/stedolan/jq
22:50:10 <ori-l> presumably the exception object itself
22:50:11 <gwicke> paravoid, oh, nice
22:50:14 <paravoid> (sorry, not relevant to this discussion)
22:50:33 <bd808> TimStarling: It's an implementation detail. Ideally the formatting of the exception would be left up to the output formatter
22:50:58 <ori-l> the thing that I wanted to flag actually is that we have two subsystems that half-implement the concept of pluggable logging backends
22:51:17 <TimStarling> you mean json_encode($exception)?
22:51:22 <TimStarling> I'm not sure that would work
22:51:31 <ori-l> TimStarling: we already have json-encoded exceptions in core
22:51:35 <TimStarling> some exception objects will have references to massive parents
22:51:41 <paravoid> forgive me for the naive question, is this the discussion for the logging format (plain, json, ...), the transport (udp2log, syslog, gelf, ...), or both?
22:51:56 <ori-l> TimStarling: see exception-json.log on fluorine :P
22:52:06 <TimStarling> I'll file a bug
22:52:15 <ori-l> TimStarling: we redact those from the JSON log
22:52:17 <parent5446> paravoid: the RFC focuses on format, but ideally we should replace our entire logging system
22:52:31 <ori-l> I'm not sure a bug is warranted
22:52:40 <ori-l> but anyways, to finish my point: there's wfDebugLog & co., which recognize udp://, tcp://, and file paths
22:52:40 <bd808> Here's an example of monolog logging an exception: http://pastebin.de/37759
22:52:45 <TimStarling> I assumed it would be 0mq
22:52:49 <TimStarling> since it is ori-l writing it
22:52:52 <parent5446> (Also, I know I'm evangelizing monolog here, but it also cooperates with exception workflow.)
22:52:56 <parent5446> :P
22:53:05 <ori-l> and there's the recent change stream implementation that vvv wrote
22:53:37 <ori-l> the latter lets you specify an emitter class
22:53:38 <gwicke> we should also consider logs from non-PHP services
22:53:45 <ori-l> i wrote a redis one as a way of trying out the API, it's in core too
22:53:52 <ori-l> we should consolidate all of these, obviously
22:53:58 <TimStarling> UDP is sucky lazy rubbish
22:54:05 <AaronSchulz> heh
22:54:10 <ori-l> and make recent changes be a special case of logging
22:54:16 <TimStarling> asynchronous messaging on the cheap
22:54:18 <bd808> gwicke: Unifying across languages would be nice.
22:54:20 <gwicke> if we can agree on a standard set of keys for stuff like host name etc, then those can directly tie into the same infrastructure
22:54:31 <TimStarling> if you have an asynchronous messaging system that isn't prone to losing its messages, why not use it?
22:54:31 <ori-l> cf rcfeeds/RedisPubSubFeedEngine.php for an example
22:54:42 <TimStarling> syslog is ridiculously old and crusty and limited
22:54:57 <TimStarling> like 1024 byte packet limit, and integer facility fields
22:55:01 <parent5446> ori-l: monolog also has a Redis handler
22:55:03 <ori-l> TimStarling: I agree, but I think this is the uninteresting part of the problem
22:55:14 <paravoid> I think we need to split those two discussions
22:55:18 <ori-l> if you have pluggable backends people who love UDP can use UDP
22:55:22 <paravoid> it can be the same RFC
22:55:38 <paravoid> but split the parts of "which format" from "which transport"
22:55:40 <TimStarling> you know that I have mostly driven the adoption of UDP at WMF
22:55:47 <TimStarling> that is because I am lazy and cheap
22:56:17 <TimStarling> and because the queueing options at the time I started were not as good as they are now
22:56:28 <ori-l> we won't use UDP
22:56:49 <paravoid> we could use AMQP, or 0mq, or even Kafka.
22:57:03 <ori-l> can we rely on URL prefixes for dispatcher configuration?
22:57:13 <paravoid> but first agree on the format? :)
22:57:17 <ori-l> this would be consistent with wfDebugLog, the PHP stream API
22:57:29 <ori-l> and partly with the existing RC implementation
22:57:49 <TimStarling> ori-l: yeah, should work
22:57:51 <ori-l> i.e.: $wgLogHandlers[] = "zmq://foo/topic"
22:58:02 <gwicke> is everybody on board with the choice of JSON?
22:58:05 <parent5446> Rather than discussing WMF-specific logging implementations, we should first establish how we'd incorporate a structured logging system.
22:58:12 <parent5446> Where would the loggers go?
22:58:14 <parent5446> In ContextSource?
22:58:29 <TimStarling> gwicke: no, I am in favour of dual logging of JSON and plain text
22:58:31 <parent5446> Whether it's JSON or whatever comes after we have the modular system in place.
22:58:47 <paravoid> unstructured json?
22:58:52 <gwicke> TimStarling: it seems to be easy enough to convert JSON to plain
22:59:00 <ori-l> I propose we limit ourselves to the set of types available in JSON
22:59:03 <paravoid> or an existing structure, like gelf?
22:59:09 <ori-l> but that we make the actual serialization format configurable
22:59:12 <bd808> I would suggest a global logger factory. It could be a singleton or accessed via some convenient god object
22:59:19 <parent5446> ori-l: Agreed with this idea. It makes modular dispatching easier.
22:59:30 <ori-l> actually, maybe that's not a good idea
22:59:44 <ori-l> maybe you should just pass off to the serializer the richest possible objects you have
22:59:49 <TimStarling> #action ori-l to expand RFC
22:59:50 <gwicke> I'd be in favor of standardizing on something
22:59:53 <paravoid> hehe
22:59:56 <paravoid> gwicke: http://www.graylog2.org/gelf#specs ?
22:59:58 <AaronSchulz> ;)
23:00:07 <paravoid> gwicke: and https://github.com/robertkowalski/gelf-node I guess ;)
23:00:18 <gwicke> paravoid, we are considering https://github.com/trentm/node-bunyan
23:00:23 <gwicke> has a gelf backend too it seems
23:00:41 <gwicke> https://github.com/mhart/gelf-stream
23:00:43 <ori-l> if you have the proper abstractions in place implementing backends is trivial, right?
23:00:50 <TimStarling> #info JSON generally favoured as long as a plain text format can be also made available
23:01:03 <ori-l> i mean, log messages are messages, and message queues tend to provide good APIs for queueing messages
23:01:21 <TimStarling> #info transport selection based on URI-style destination string
23:01:29 <ori-l> weeee
23:02:18 <parent5446> URI-based selection might not be the best idea. What if you want a logger to only log certain levels, i.e., warnings or errors?
23:02:23 <ori-l> TimStarling: maybe as a final action-item, agree to the consolidation of RC logging with logging in general?
23:02:33 <TimStarling> <parent5446> Where would the loggers go?
23:02:33 <TimStarling> <parent5446> In ContextSource?
23:02:41 <TimStarling> parent5446: I suggest you comment on the RFC talk page
23:02:59 <parent5446> OK, will do that now.
23:02:59 <ori-l> parent5446: zmq://dest.eqiad.wmnet?loglevel=warn
23:03:03 <bd808> <bd808> I would suggest a global logger factory. It could be a singleton or accessed via some convenient god object
23:03:12 <parent5446> ori-l: Ah that works I guess
23:03:19 <TimStarling> ori-l: put it on the RFC
23:03:23 <aude> ori-l: i think RC is a separate consideration, maybe worth own rfc
23:03:38 * aude at least needs more details
23:03:42 <subbu> i've used log4j in other contexts which has notions of formatter, target (file, socket, console, etc.) and log-level (warn, info, debug, etc.) which can all be configured. this proposal seems similar by separating those concerns.
23:03:59 <gwicke> parent5446, re levels: I think that is both a source and consumer concern; the source selects the min level to send, while the consumer can further filter based on the level in the message
23:04:04 <TimStarling> ok, we are out of time now, please put your ideas on the RFC talk page if possible
23:04:10 * ori-l nods
23:04:28 <paravoid> thank you TimStarling for chairing.
23:04:30 <bd808> Thanks for all the great feedback
23:04:36 * subbu paged into the window rather late ..
23:04:45 * subbu will read scrollback and post on talk page
23:04:48 <ori-l> TimStarling: what bug were you going to file?
23:05:10 <TimStarling> #endmeeting
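The closing exchange converges on three ideas: transport selection from a URI-style destination string (ori-l's `zmq://dest.eqiad.wmnet?loglevel=warn`), JSON output with a plain-text form still available, and gwicke's point that level filtering happens both at the source and at the consumer. The following is a minimal Python sketch of how those pieces could fit together, offered as an illustration only and not as MediaWiki's, Monolog's, or the RFC's design. The level names, the `format` query parameter, and the names `Dispatcher`, `parse_destination`, `serialize`, and `consumer_filter` are all hypothetical, and real transports (ZeroMQ, files) are stubbed out as an in-memory list.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical level vocabulary; the meeting did not settle on one.
LEVELS = ["debug", "info", "warn", "error"]


def parse_destination(uri):
    """Split a URI-style destination into (transport, address, min_level, fmt).

    The scheme picks the transport (e.g. "zmq"). The "loglevel" query
    parameter comes from the meeting; the "format" parameter is an
    assumption made for this sketch. Both are optional.
    """
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    min_level = query.get("loglevel", ["debug"])[0]
    fmt = query.get("format", ["json"])[0]
    if min_level not in LEVELS:
        raise ValueError(f"unknown loglevel: {min_level}")
    if fmt not in ("json", "plain"):
        raise ValueError(f"unknown format: {fmt}")
    return parts.scheme, parts.netloc + parts.path, min_level, fmt


def serialize(record, fmt):
    """Render one record as JSON, or as plain text derived from the same data."""
    if fmt == "json":
        return json.dumps(record, sort_keys=True)
    extra = " ".join(f"{k}={v}" for k, v in sorted(record.items())
                     if k not in ("level", "message"))
    return f"[{record['level']}] {record['message']} {extra}".rstrip()


class Dispatcher:
    """Source-side filtering: each destination skips records below its min level."""

    def __init__(self, uris):
        self.destinations = [parse_destination(u) for u in uris]
        # Stand-in for real transports: records what would be sent where.
        self.sent = []

    def log(self, level, message, **context):
        record = {"level": level, "message": message, **context}
        for transport, address, min_level, fmt in self.destinations:
            if LEVELS.index(level) >= LEVELS.index(min_level):
                self.sent.append((transport, address, serialize(record, fmt)))


def consumer_filter(json_lines, min_level):
    """Consumer-side filtering on the level carried inside each JSON message."""
    return [line for line in json_lines
            if LEVELS.index(json.loads(line)["level"]) >= LEVELS.index(min_level)]


if __name__ == "__main__":
    d = Dispatcher([
        "zmq://dest.eqiad.wmnet?loglevel=warn",
        "file:///var/log/mw.log?format=plain",
    ])
    d.log("info", "cache miss", host="web01")
    d.log("error", "db timeout", host="web01")
    for entry in d.sent:
        print(entry)
```

With this configuration the `zmq` destination receives only the `error` record, as JSON, while the `file` destination receives both records as plain text. Keeping the level and format in the destination string follows the "URI-style destination string" #info item; a consumer reading the JSON stream can still apply its own, stricter threshold with `consumer_filter`.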