Reading/Multimedia/Architecture/Tech Debt Backlog

Loosely prioritized list of technical debt projects related to the Multimedia team.

Backlog

 * Chunked upload love:
   * Stop using session data
   * More reliable job queue
   * Fix open bugs or know why we can't
 * Thumbnail changes:
   * Versioned URLs to help stop cache problems
   * Varnish cache changes so we don't need to keep a list of names
   * Store generated assets differently to reduce replica clutter
   * More robust monitoring and logging of failures in generation
 * Improve large file operations:
   * Allow rename without copy
   * Reduce lock contention
   * Examine the possibility of queued operations for things we can't make faster
 * Improve SVG rendering:
   * Lots of SVG bugs in Bugzilla
   * Support for multilingual SVGs
   * Make sure rsvg and fonts are up to date
   * Consider adding more fonts for rendering support (possibly including non-free fonts)
 * UploadWizard improvements:
   * Finish up map support and deploy it
   * Finish up drag and drop support and deploy it
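
As an illustration of the "versioned URLs" item under thumbnail changes, here is a minimal Python sketch. It assumes the version is the upload timestamp of the current file revision, embedded as a path segment. The function name `versioned_thumb_url`, the host, and the path layout are all hypothetical and not the existing URL scheme.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def versioned_thumb_url(base: str, name: str, width: int, uploaded: datetime) -> str:
    """Build a thumbnail URL that changes whenever the file is re-uploaded.

    The path layout and the position of the version segment are hypothetical.
    """
    # 14-digit UTC timestamp (YYYYMMDDHHMMSS) of the current file version.
    version = uploaded.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    safe_name = quote(name.replace(" ", "_"))
    return f"{base}/thumb/{version}/{safe_name}/{width}px-{safe_name}"


if __name__ == "__main__":
    first = datetime(2013, 8, 23, 12, 0, 0, tzinfo=timezone.utc)
    second = datetime(2013, 8, 27, 9, 30, 0, tzinfo=timezone.utc)
    print(versioned_thumb_url("https://upload.example.org", "Example file.jpg", 120, first))
    print(versioned_thumb_url("https://upload.example.org", "Example file.jpg", 120, second))
```

Because a re-upload produces a new URL, a cache that still holds the old thumbnail can no longer serve it for the new version. The old entries would simply age out rather than needing an explicit purge.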

2013-08-23 Grooming Meeting

 * User:Bawolff, User:BDavis_(WMF), User:MarkTraceur, User:Fabrice_Florin_(WMF)
 * Discussed high-level issues
 * Helped Fabrice understand the terms "Varnish" and "Squid", and caching in general
 * Discussed difference between a feature and tech debt
 * User:Bawolff gave his list of top items, including relative priority:
   * Chunked upload: session handling and more reliability in general
   * Upload class: make it more extensible (the chunked uploading case is very ugly)
   * Deadlocks and long-held locks in the filebackend / file class when moving, deleting, or uploading a file
   * The current image thumbnail should have a timestamp in it for better cache clearing
   * Partial failures in filebackend: if things fail, either nothing should happen or everything should happen. Sometimes pages get deleted but not the file, or vice versa
   * Various Swift layer improvements and thumbnail caching (maybe caching thumbs at a different layer; storing content at a SHA-1 location and exposing pretty URLs only to the user)
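
The "either nothing should happen or everything should happen" requirement for partial failures can be sketched as a batch of steps, each paired with an undo action. This is a minimal Python illustration under stated assumptions, not how filebackend works today. The names `run_all_or_nothing`, `delete_page`, and `delete_file` are hypothetical. A real implementation would also need to survive a crash mid-batch and handle undo actions that themselves fail.

```python
from typing import Callable, Dict, List, Tuple

# Each step is a (do, undo) pair of zero-argument callables.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_all_or_nothing(steps: List[Step]) -> None:
    """Run every step; if one fails, undo the completed ones in reverse order."""
    undo_stack: List[Callable[[], None]] = []
    try:
        for do, undo in steps:
            do()
            undo_stack.append(undo)
    except Exception:
        for undo in reversed(undo_stack):
            undo()
        raise


if __name__ == "__main__":
    state: Dict[str, bool] = {"page": True, "file": True}

    def delete_page() -> None:
        state["page"] = False

    def restore_page() -> None:
        state["page"] = True

    def delete_file() -> None:
        # Simulated storage failure (hypothetical).
        raise OSError("storage backend unavailable")

    def restore_file() -> None:
        state["file"] = True

    try:
        run_all_or_nothing([(delete_page, restore_page), (delete_file, restore_file)])
    except OSError:
        pass
    print(state)  # the page deletion was rolled back
```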

Copied from Multimedia/Architecture
Copied list and discussion from prior home on Multimedia/Architecture.


 * API upload is all kinds of bad. Everything under the /upload directory would be nice to rewrite.
   * Though please be sure to test well with Upload Wizard and other API consumers; it's very easy to accidentally break edge cases (e.g. blacklist failures, duplicate checks, and so on) -- User:Eloquence
 * Backlink invalidation on Commons would be nice to fix (bug 22390)
 * Redirecting the old filename on file move would be nice (for hotlinkers like the WMF blog, and because caches on Commons client wikis are not purged)
   * This is hard to implement as a general MediaWiki feature without changing to run files through PHP by default, but could be done with our special file backends... -- User:Brion_VIBBER
   * Our use case is probably the most important. I don't think third parties use file redirects that much. Patch https://gerrit.wikimedia.org/r/#/c/80135/ -- User:Bawolff
 * Versioned URLs for thumbnails
   * +1 (or any other solution to outdated thumbnails getting cached) -- User:Kaldari
 * Large file support in general: deadlocks abound
 * The way FileRepo stores files is bad in general; it requires pessimistic locking in general (see Requests for comment/Refactor on File-FileRepo-MediaHandler)
 * Why chunked upload should be fixed:
   * Upload actions that go through the job queue should be more reliable.
   * Suggest that, long term, we stop using session data for this (people log in, log out, etc.).
   * Possibly several pieces here: fix job queue to be sane, fix upload jobs to not (ab?)use session storage, and ? -- User:Brion_VIBBER
 * HTCP purges need active monitoring - DONE
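
To make the "stop using session data" suggestion for chunked uploads concrete, here is a minimal Python sketch. Upload progress is keyed by an opaque upload key and tied to the user ID, so it survives a logout and login. `UploadStore`, `UploadState`, and their methods are hypothetical names, and the in-memory dictionary stands in for whatever shared store (database table, key-value store) would actually be used.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class UploadState:
    owner_id: int
    filename: str
    total_size: int
    received: int = 0


class UploadStore:
    """In-memory stand-in for a shared store that outlives any login session."""

    def __init__(self) -> None:
        self._uploads: Dict[str, UploadState] = {}

    def begin(self, owner_id: int, filename: str, total_size: int) -> str:
        # The key is handed to the client and sent back with every chunk.
        key = uuid.uuid4().hex
        self._uploads[key] = UploadState(owner_id, filename, total_size)
        return key

    def append_chunk(self, key: str, owner_id: int, offset: int, chunk: bytes) -> UploadState:
        state = self._uploads[key]
        if state.owner_id != owner_id:
            raise PermissionError("upload belongs to another user")
        if offset != state.received:
            raise ValueError(f"expected offset {state.received}, got {offset}")
        if state.received + len(chunk) > state.total_size:
            raise ValueError("chunk exceeds declared file size")
        # Only progress is tracked here; storing the bytes is out of scope.
        state.received += len(chunk)
        return state

    def is_complete(self, key: str) -> bool:
        state = self._uploads[key]
        return state.received == state.total_size


if __name__ == "__main__":
    store = UploadStore()
    key = store.begin(owner_id=42, filename="Video.webm", total_size=8)
    store.append_chunk(key, 42, 0, b"abcd")
    # The user may log out and back in here; only the key and user ID matter.
    store.append_chunk(key, 42, 4, b"efgh")
    print(store.is_complete(key))
```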

Faidon via email; 2013-08-27
"An additional point I'd like to raise is technical debt surrounding the image scaler platform that I don't see mentioned there at all. The current way we do image scaling is crude and error-prone. See for example BZ #49118 but there are quite a few other short-comings arising from how the whole platform works (fire-and-forget launching of shell scripts). I think a designed-from-scratch service that spawns well-contained & well-monitored processes, failing gracefully and logging errors is long overdue but not unreasonably difficult to build, and I think it would fit well under the multimedia team's agenda."
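
As a rough illustration of the contrast Faidon draws between fire-and-forget shell scripts and "well-contained & well-monitored processes", here is a minimal Python sketch. It runs one scaling command with a timeout, captures its output, and logs every failure mode. The function name `run_scaler` and the logger name are hypothetical, and the command line of the real scaler is not assumed.

```python
import logging
import subprocess
import sys
from typing import List

log = logging.getLogger("scaler")


def run_scaler(command: List[str], timeout_s: float = 30.0) -> bool:
    """Run one scaling command; return True on success, log and return False otherwise."""
    try:
        # subprocess.run kills the child if the timeout expires.
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        log.error("scaler timed out after %.1fs: %s", timeout_s, command)
        return False
    except OSError as exc:
        log.error("scaler could not start: %s (%s)", command, exc)
        return False
    if result.returncode != 0:
        log.error("scaler exited with %d: %s", result.returncode, result.stderr.strip())
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in commands; a real caller would pass the scaler's own command line.
    print(run_scaler([sys.executable, "-c", "pass"]))
    print(run_scaler([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

The caller always gets a definite success or failure and every failure leaves a log line, which is the property the fire-and-forget approach lacks.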

IRC chat 2013-08-27
12:04	Aaron|home	I was thinking about what things are the most actionable
12:04	Aaron|home	bd808: bug 53400 might be an OK start
12:05	Aaron|home	basically, writeToDatabase should at least use onTransactionIdle and put the REPLACE in a callback/closure there
12:07	Aaron|home	bd808: in terms of RfC'ish stuff, the thumbnail coalescing isn't too bad of a place to dig into
12:07	Aaron|home	the amount of code change needed wouldn't be that huge, though some of it would be some small varnish module code
12:08	Aaron|home	bd808: the whole issue of large uploads bothers me because it relates to a bunch of problems that are hard to fix without rewriting everything (or horribly hacking around with job queue + persistent locks)
12:08	Aaron|home	we can make large uploads work better for the first stage of the pipeline (upload) though re-upload, move, delete, restore will still suck horribly
12:09	bd808	I'm not 100% sure, but I think roblaAWAY is open to major rewrite type projects
12:09	Aaron|home	that said, if videos tend to just be uploaded once and not changed, and it's badly wanted, it could be worth it I suppose
12:10	Aaron|home	well, there are different levels of "huge rewrites" ;)
12:11	Aaron|home	bd808: I think for someone new to MW, the thumbnail thing is a better place to get started rather than going down that rabbit whole just yet (which still scares me after all these years)
12:12	bd808	*listens to sage advice*
12:12	Aaron|home	of course, if the priority for the quarter was already decided, I guess you don't have much choice though ;)
12:12	paravoid	thumbnail coalescing?
12:13	Aaron|home	paravoid: whatever you call, fudging vcl_hash to group them for PURGEs
12:13	Aaron|home	maybe not using swift/ceph anymore for this and not having 7 copies of everything
12:13	paravoid	ah, that
12:13	paravoid	so, I was looking a bit at that in the past
12:13	bd808	I think the real priority at this point is "do things to make multimedia less sucky"
12:13	paravoid	remember the linear search issue?
12:14	Aaron|home	yes
12:14	bd808	but smoothing problems in the upload path seems to be a recurring theme
12:14	paravoid	so Tim was saying that he didn't expect this to be a huge problem, how many thumbs can a file have
12:14	paravoid	then you know what I pointed out?
12:14	paravoid	PDF and multi-page TIFFs
12:14	paravoid	1000-page PDF with 3-4 thumb sizes
12:15	paravoid	that's not uncommon at all
12:15	ori-l	heh
12:15	paravoid	there's a few wikis that use that a lot
12:15	Aaron|home	our djvu/pdf handling sucks too
12:15	Aaron|home	bd808: oh, wait, I told you that already
12:15	paravoid	arwikisource I think?
12:15	Aaron|home	like loading the whole text is metadata and slowing down category views
12:15	Aaron|home	only fixed the OOM aspect of that
12:15	ori-l	questions of the grammatical form "how many ___ could possibly ___..." are prayers to sauron
12:16	paravoid	but the solution could be handling pdf/tiff/djvu entirely differently
12:16	Aaron|home	paravoid: if nothing else, one could except by file extension and use the old system for those
12:16	paravoid	right
12:16	paravoid	heh
12:16	Aaron|home	and fix that crap later
12:16	paravoid	but yeah, this needs to be done with care
12:16	Aaron|home	we don't want to get caught in the spiderweb of having to redoing everything though, but breaks things into bits
12:17	paravoid	I'll leave that up to the people actually doing the work :)
12:17	paravoid	I'm merely pointing out the issue
12:17	Aaron|home	paravoid: sure
12:18	paravoid	but yeah, not having to store millions of tiny thumb files into media storage would be hugely appreciated
12:19	bd808	I'm of the naive opinion that "we" need to document the use cases and acceptance tests, evaluate current impl and design next-gen solution.
12:19	bd808	Then we need to figure out how to build that solution in smallish chunks
12:19	bd808	but I'm also talking out of my ass as to the specifics
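
The idea of "fudging vcl_hash to group them for PURGEs" amounts to deriving one cache key shared by an original file and all of its thumbnails, so that a single purge covers the whole group. The real change would live in Varnish configuration; the Python sketch below only illustrates the key derivation. It assumes thumbnails live at `<root>/thumb/<shard path>/<name>/<variant>` and originals at `<root>/<shard path>/<name>`. The function name `purge_group_key` and the example host are hypothetical.

```python
from urllib.parse import urlsplit


def purge_group_key(url: str) -> str:
    """Return a key shared by an original file and all of its thumbnails.

    Assumes thumbnails live at <root>/thumb/<shard path>/<name>/<variant>
    and originals at <root>/<shard path>/<name>.
    """
    parts = urlsplit(url).path.strip("/").split("/")
    if "thumb" in parts:
        i = parts.index("thumb")
        # Drop the "thumb" segment and the trailing variant name.
        parts = parts[:i] + parts[i + 1:-1]
    return "/" + "/".join(parts)


if __name__ == "__main__":
    base = "https://upload.example.org/wikipedia/commons"
    print(purge_group_key(f"{base}/a/ab/Example.pdf"))
    print(purge_group_key(f"{base}/thumb/a/ab/Example.pdf/page1-120px-Example.pdf.jpg"))
    print(purge_group_key(f"{base}/thumb/a/ab/Example.pdf/page999-800px-Example.pdf.jpg"))
```

The group size is what paravoid's objection is about: a 1000-page PDF at 4 thumbnail sizes puts 1000 × 4 = 4000 variants under one key, so any per-group linear search has to cope with groups of that size.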

Resources
People to talk to about things and stuff:
 * Aaron -- all things cache related
 * User:Bawolff -- all the MM things
 * Brad -- API
 * Faidon -- file storage infrastructure, imagescalers, operational/systems aspects in general