Talk:Flow/Architecture

2013-03-04 better Flow API
There are various benefits we get from a REST API, the biggest being that cache-control headers are much easier to manage on individual resources. The current API responses join together disparate bits of data into a response that we cannot reasonably tell Varnish to cache; invalidation is just too difficult. A RESTful resource-based API breaks things up into smaller pieces and multiple requests, but modern protocols like SPDY and the coming HTTP/2.0 are specifically designed to handle many smaller requests rather than a single large one. The reason for nodejs is partially tied to those smaller requests: MediaWiki has a fairly large startup time. If we were to issue one request that returns a list of topics on a page and then 5 more requests, one for each of the new topics, MediaWiki would potentially pay the startup cost 6 times, whereas nodejs has almost no request initialization delay.
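The caching argument can be illustrated with a minimal sketch. The cache class is a toy stand-in for an HTTP cache such as Varnish, and the `/flow/topic/...` URLs are hypothetical, not actual Flow endpoints.

```python
class PurgeableCache:
    """Toy stand-in for an HTTP cache such as Varnish, keyed by URL."""

    def __init__(self):
        self._entries = {}

    def store(self, url, body):
        self._entries[url] = body

    def purge(self, url):
        # Purging an unknown URL is a no-op.
        self._entries.pop(url, None)

    def cached_urls(self):
        return sorted(self._entries)


cache = PurgeableCache()
# Hypothetical resource URLs: one cache entry per topic, not one per board.
for topic_id in ("t1", "t2", "t3"):
    cache.store(f"/flow/topic/{topic_id}", f"<topic {topic_id}>")

# A new reply in topic t2 invalidates only that one resource.
cache.purge("/flow/topic/t2")
print(cache.cached_urls())  # ['/flow/topic/t1', '/flow/topic/t3']
```

With a single combined response, the same edit would force the whole board response to be purged, which is the invalidation problem described above.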

The other half of this is related to the SOA RFC (approved with overwhelming support at the architecture summit). ... Part of the implication of a service-oriented approach is that a service returns data suitable for a computer to interact with, rather than HTML, which is more suitable for human interaction. It's unwritten but expected that a service should return structured metadata rather than user-facing HTML. Any kind of user interface, be it a native mobile app or the MediaWiki extension rendering a page, is built on top of the API responses.
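A rough sketch of that separation, with the service returning structured data and a client layering HTML on top. The field names, the `get_post` function and the markup are all hypothetical; they are not Flow's actual API.

```python
import html


def get_post(post_id):
    """Hypothetical service response: structured data, no user-facing markup."""
    return {
        "id": post_id,
        "author": "ExampleUser",
        "content": "Hello <world>",
        "actions": ["reply", "edit"],
    }


def render_post_html(post):
    """One possible client: builds HTML on top of the structured response."""
    author = html.escape(post["author"])
    content = html.escape(post["content"])
    links = " ".join(
        f'<a data-action="{html.escape(action)}">{html.escape(action)}</a>'
        for action in post["actions"]
    )
    post_id = html.escape(post["id"])
    return f'<div class="post" id="{post_id}"><b>{author}</b>: {content} {links}</div>'


print(render_post_html(get_post("p1")))
```

A native mobile app could consume the same `get_post` data and ignore the HTML renderer entirely.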

-- Erik Bernhardson e-mail
 * Does nodejs really have no startup time? Or is it that Parsoid etc were written to be continuously-running daemons while MediaWiki's PHP currently isn't, and similar results could have been realized by writing a service daemon in PHP while making better use of the existing codebase? Anomie (talk) 13:57, 7 March 2014 (UTC)
 * Actually nodejs has worse startup time than PHP :) That's part of why I called it request initialization for nodejs. It's specifically about the daemon nature, where much of the application can be loaded and booted ahead of time rather than per request. Part of the implication there is that the application needs to be mostly stateless, such that multiple requests can run through the same library routines. Much of the work done passing IContextSource around rather than globals is probably a big help in this direction. I've heard of a couple of attempts to daemonize PHP, but not much concrete, so this is fairly unexplored territory. EBernhardson (WMF) (talk) 20:37, 7 March 2014 (UTC)
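The per-request versus daemon cost discussed in this thread can be put into a small arithmetic sketch. The millisecond figures are hypothetical placeholders, not measurements of MediaWiki or nodejs.

```python
def total_server_ms(requests, init_ms, work_ms, init_per_request):
    """Total server time spent on `requests` requests."""
    if init_per_request:
        # The application is loaded and booted again for every request.
        return requests * (init_ms + work_ms)
    # A long-running daemon boots once, ahead of time, so each request
    # only pays for its own work.
    return requests * work_ms


REQUESTS = 6   # 1 topic-list request + 5 per-topic requests
INIT_MS = 50   # hypothetical initialization cost
WORK_MS = 10   # hypothetical per-request work

per_request = total_server_ms(REQUESTS, INIT_MS, WORK_MS, init_per_request=True)
daemon = total_server_ms(REQUESTS, INIT_MS, WORK_MS, init_per_request=False)
print(per_request, daemon)  # 360 60
```

The difference is exactly `REQUESTS * INIT_MS`, which is why splitting one response into several resources matters more when initialization is paid per request.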

2013-03-04 conversation with Gabriel Wicke
This is follow-on from Flow providing a better API for mobile apps.

Parsoid HTML doesn't match PHP parser HTML
Image HTML is different, video HTML is different and links are broken, interlanguage links are losing the :colon prefix when saved, etc.

Regarding images, Parsoid HTML is intentionally cleaner and better. gwicke wants it to render like the PHP parser's output so that eventually Parsoid can show HTML output for visual fidelity comparisons. Flow should work with the VE team to get the CSS needed for this moved out of VE.

BadImageList handling is integrated into the PHP wikitext parser. Parser.php's replaceInternalLinks2 calls wfIsBadImage.

Parsoid and link metadata
Parsoid will soon provide redlink info and will output more metadata about links, categories, etc. This overlaps with the parsing of links and templates that Flow has to do for WhatLinksHere, Special:LinkSearch, and Category support; see Flow/Link table spec. Flow should take advantage of the information Parsoid can provide.
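As a rough illustration of reusing Parsoid's link metadata instead of re-parsing wikitext, the sketch below collects link targets from Parsoid-style HTML. It assumes links are marked with `rel="mw:WikiLink"` and `rel="mw:ExtLink"`; the sample HTML is hand-written, and `LinkCollector` is a hypothetical helper, not part of Flow or Parsoid.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects link targets from Parsoid-style HTML by their rel values."""

    def __init__(self):
        super().__init__()
        self.wikilinks = []
        self.extlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        rel = (attr.get("rel") or "").split()
        if "mw:WikiLink" in rel:
            self.wikilinks.append(href)
        elif "mw:ExtLink" in rel:
            self.extlinks.append(href)


sample = (
    '<p><a rel="mw:WikiLink" href="./Main_Page">Main Page</a> and '
    '<a rel="mw:ExtLink" href="http://example.org/">example</a></p>'
)
collector = LinkCollector()
collector.feed(sample)
print(collector.wikilinks)  # ['./Main_Page']
print(collector.extlinks)   # ['http://example.org/']
```

The two lists correspond roughly to what WhatLinksHere and Special:LinkSearch would need in a link table.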

Parsoid will update this information in its cache as articles change, and maybe regenerate HTML, e.g. if an image changes size. Flow could take advantage of this for some content, but maybe not all – e.g. templates in Flow header should update, but not (?) templates in Flow posts.

Parsoid caches its HTML in Varnish; Rashomon will be a permanent store of it.

Leveraging Rashomon storage
Flow could store the HTML of its "items" in Rashomon instead of ExternalStore. A Flow board is the wrong level to cache at; we would store either topics or individual posts, which relates to whether the Flow API operates at the level of topics or at the level of posts. Leaning towards the former.

Rashomon knows about items via a page-like name, but a post reference like Talk:Sandbox?topic_postId=rqa8q1vz478mmrtu&workflow=rp7to9rygxadbohw doesn't qualify. The Flow team has talked about providing direct access to items at Special:Flow/post/, Special:Flow/topic/, Special:Flow/header/, but the semantics of Special pages mean these would not be treated like pages without some work.
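A minimal sketch of mapping such a query-string reference to a page-like name. Appending the post ID directly after `Special:Flow/post/` is an assumption made for illustration; the notes do not specify what follows the prefix, and `page_like_name` is a hypothetical helper.

```python
from urllib.parse import parse_qs


def page_like_name(reference):
    """Map a query-string post reference to a hypothetical page-like title."""
    _title, _sep, query = reference.partition("?")
    params = parse_qs(query)
    post_ids = params.get("topic_postId")
    if not post_ids:
        raise ValueError("reference has no topic_postId parameter")
    # Assumed naming scheme: prefix plus post ID.
    return "Special:Flow/post/" + post_ids[0]


ref = "Talk:Sandbox?topic_postId=rqa8q1vz478mmrtu&workflow=rp7to9rygxadbohw"
print(page_like_name(ref))  # Special:Flow/post/rqa8q1vz478mmrtu
```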

Why not allow access to Flow topics in a dedicated namespace? That's what LiquidThreads does. The Flow team considered it early on but hasn't pursued it.

Templating
Background: Requests for comment/HTML templating library.
 * Flow needs a templating solution soon for Shahyar's front-end rework, see
 * In order for Flow's API to return saner information as well as, or instead of, full client HTML (with reply/edit/hide/permalink links, and textareas for new posts), we really want server-side templating as well.

An implementation of a subset of KnockoutJS is a strong contender; see Requests for comment/HTML templating library/Knockout - Tassembly. The subset of Knockout already works with Knockout.js on the client. Matt Walker is working on a PHP implementation.

Will this be ready in a month for us to use for Shahyar's Flow front-end rework?

There isn't really an alternative with both JS and PHP implementations that is likely to get WMF support.

gwicke suggests: rather than templating in PHP, try using data attributes (?) and have CSS pull out the information.

Later talk to mwalker
The PHP implementation of our KnockoutJS subset (knockoff? dropout? :) ) isn't ready, but he still thinks using KnockoutJS for Flow front-end templating would be the best choice.

Next steps

 * Flow API: talk again in a week
 * Shahyar may try KnockoutJS in Flow front-end rework.