Developer Wishlist/2017/Backend

Kill wgLegacyEncoding
Since version 1.5, MediaWiki has stored all data in Unicode. Before that, the encoding was configurable; $wgLegacyEncoding allows current MediaWiki to work with old non-Unicode database values. This is one of our oldest pieces of technical debt. Retire it and automatically convert database rows on upgrade instead.
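As a rough illustration of what a per-row conversion could look like, here is a minimal Python sketch. The function name, the default legacy encoding and the "already valid UTF-8" heuristic are all assumptions made for the example; a real upgrade step would more likely rely on whatever per-row information the database records about the encoding, since some legacy byte sequences also happen to be valid UTF-8.

```python
def to_utf8(raw: bytes, legacy_encoding: str = "iso-8859-1") -> bytes:
    """Return `raw` as UTF-8 bytes, converting from `legacy_encoding` if needed.

    Hypothetical helper: rows that already decode as UTF-8 are left untouched,
    everything else is assumed to be in the wiki's configured legacy encoding.
    """
    try:
        raw.decode("utf-8")  # already valid UTF-8, nothing to do
        return raw
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding).encode("utf-8")


if __name__ == "__main__":
    legacy_row = "caf\u00e9".encode("iso-8859-1")
    print(to_utf8(legacy_row))
```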

Structured data side channel for wikitext
The problem of passing structured data from wikitext to external applications comes up in a wide variety of contexts, and a garden of ugly workarounds has grown around it, usually consisting of encoding the data in the HTML rendered from wikitext in some way, then external applications parsing it out and restoring the structure. Examples include #CommonsMetadata, the various services (#mcs, all kinds of Tool Labs tools) exposing mainpage/featured content (article/picture of the day, anniversaries, in the news etc), article maintenance / warning templates, infoboxes, using Wiktionary for word translation.

Eventually these issues should be handled by separating wikitext and structured data (e.g. with T107595: [RFC] Multi-Content Revisions), but that's a huge project and will take a while. A quick win that would be possible right now, and would make life easier for developers mining structured data from wikitext (and for editors maintaining the wikitext), would be to create a side channel where wikitext code can output structured data (with a dedicated parser function or Lua method) in a simple hierarchical key-value format. The data could be exposed by the parser and the parse API, and eventually morph into a virtual MCR slot.
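To make the "simple hierarchical key-value format" more concrete, here is a small Python sketch of what such a side channel might collect during a parse. The class name, the dotted-path convention and the example keys are all hypothetical; nothing here reflects an existing MediaWiki interface.

```python
import copy
from typing import Any


class StructuredDataChannel:
    """Hypothetical collector for structured data emitted while parsing a page."""

    def __init__(self) -> None:
        self._data = {}

    def set(self, path: str, value: Any) -> None:
        """Store `value` under a dotted path such as 'featured.article.title'."""
        *parents, leaf = path.split(".")
        node = self._data
        for key in parents:
            node = node.setdefault(key, {})
            if not isinstance(node, dict):
                raise ValueError(f"'{key}' already holds a plain value")
        node[leaf] = value

    def export(self) -> dict:
        """Return a copy of the collected data, e.g. for an API response."""
        return copy.deepcopy(self._data)


if __name__ == "__main__":
    channel = StructuredDataChannel()
    channel.set("featured.article.title", "Example")
    channel.set("featured.article.date", "2017-02-06")
    print(channel.export())
```

An external tool could then read the exported structure directly instead of scraping it back out of rendered HTML.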

Showcase how the separation of concerns should work between MediaWiki API and web
MediaWiki API modules and special pages contain lots of business logic, often duplicated between the two in similar-but-not-quite-identical ways. The business logic in these pages also tends to be inaccessible internally (so MediaWiki code that wants to access the functionality does horrible things like instantiating a  object or making   calls to the API). Everyone agrees the current situation sucks; no one seems to be sure what exactly the right way would look like, so newly written code does not necessarily end up in better shape.

We should pick some special pages and API modules (probably two of each, since the answer will look very different for something that does paged queries and for everything else), refactor them, and turn them into a showcase that can be used as guidance for future work.
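One possible shape for the non-paged case, sketched in Python purely for illustration: the business logic lives in a service object that knows nothing about request parameters or output formats, and the API module and the special page are thin adapters around it. Every name below (WatchService, api_watch, special_page_watch) is made up for the example and is not a proposal for actual class names.

```python
import html
from dataclasses import dataclass


@dataclass
class WatchResult:
    title: str
    watched: bool


class WatchService:
    """Business logic only: no request parsing, no HTML, no API formatting."""

    def __init__(self) -> None:
        self._watched = set()

    def watch(self, user: str, title: str) -> WatchResult:
        if not title.strip():
            raise ValueError("empty title")
        self._watched.add((user, title))
        return WatchResult(title=title, watched=True)


def api_watch(service: WatchService, params: dict) -> dict:
    """API adapter: maps request parameters in, structured data out."""
    result = service.watch(params["user"], params["title"])
    return {"watch": {"title": result.title, "watched": result.watched}}


def special_page_watch(service: WatchService, user: str, title: str) -> str:
    """Web adapter: same service call, HTML out."""
    result = service.watch(user, title)
    return f"<p>{html.escape(result.title)} has been added to your watchlist.</p>"
```

Internal callers would use the service directly, which removes the need to go through either adapter.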

Support (T156872)

 * 1) This, that and the other (talk) 07:54, 6 February 2017 (UTC)

Problem
There seems to be no way to reset all caches programmatically. One example is the ResourceLoader minification cache. In extension development and when updating MediaWiki installations, I ran into issues with improper cache invalidation. A complete manual reset of all caches helped, yet a script would be very helpful.

Who would benefit
Developers and MediaWiki maintainers

Proposed solution
Create a maintenance script which invalidates all caches, preferably with the option to selectively invalidate some types of caches, e.g. ResourceLoader minification.
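The command-line behaviour described above might look roughly like the following Python sketch. The cache type names, the "type:" key prefix convention, the --type option and the in-memory dictionary standing in for the cache backend are all assumptions for the sake of the example.

```python
import argparse

# Hypothetical cache types; a real script would take these from configuration.
KNOWN_CACHE_TYPES = ("resourceloader", "parser", "message")


def invalidate(store: dict, cache_types=None) -> int:
    """Delete entries of the selected types (all types if None); return the count."""
    selected = set(KNOWN_CACHE_TYPES if cache_types is None else cache_types)
    unknown = selected - set(KNOWN_CACHE_TYPES)
    if unknown:
        raise ValueError("unknown cache type(s): " + ", ".join(sorted(unknown)))
    # Keys are assumed to look like "<type>:<rest>".
    doomed = [key for key in store if key.split(":", 1)[0] in selected]
    for key in doomed:
        del store[key]
    return len(doomed)


def main(argv=None, store=None) -> int:
    parser = argparse.ArgumentParser(description="Invalidate caches (sketch).")
    parser.add_argument("--type", action="append", dest="types",
                        choices=KNOWN_CACHE_TYPES,
                        help="cache type to invalidate; repeatable, default: all")
    args = parser.parse_args(argv)
    store = {} if store is None else store
    removed = invalidate(store, args.types)
    print(f"Invalidated {removed} cache entries")
    return removed
```

Calling main(["--type", "resourceloader"], store) would clear only the ResourceLoader entries, while main([], store) clears everything.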

Endorsements (T156695)

 * 1) This is annoying. I often find myself doing TRUNCATE TABLE objectcache; (since I use CACHE_DB). Sometimes I only want to get rid of some part of the cache (like only ResourceLoader caches) and the TRUNCATE is a nuclear option, but other alternatives are very tedious. This, that and the other (talk) 07:56, 6 February 2017 (UTC)

Support (T156695)

 * 1) Shizhao (talk) 07:00, 6 February 2017 (UTC)

Problem
When we edit Lua or JavaScript code, we cannot see what we have changed, and keeping track of it often takes a lot of mental effort.

Who would benefit
All Lua or JavaScript coders.

Proposed solution
When we edit Lua or JavaScript code, highlight differences as Gerrit does (- and + parts of a line). Do this when we explicitly ask for differences, but also, as an option, for any change from the previous version.

Description of the task
The way Gerrit displays changes in source code seems better than some others. In the code panels for the old and new code, Gerrit colours each changed character in red or green.

Then we could use it for other source code: JavaScript for gadgets and user scripts, Lua modules, Lisp for templates.

Some choices to make
Gerrit offers two modes; which one should be used? In the mixed mode, Gerrit interleaves the old and new code lines, marks them with - and + signs, and highlights each changed character in red or green. In the column mode, Gerrit puts the old code in the left column and the new code in the right column, and highlights each changed character in red or green.

These styles also interfere with the usual syntax highlighting of each language.
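The per-character highlighting described above can be approximated with a standard sequence-matching diff. The following Python sketch uses difflib from the standard library; the [-...-] and {+...+} markers stand in for the red and green colouring, and the function name is made up for the example. It is not a description of how Gerrit itself computes its diffs.

```python
from difflib import SequenceMatcher
from typing import Tuple


def inline_diff(old: str, new: str) -> Tuple[str, str]:
    """Mark removed characters in `old` and added characters in `new`."""
    old_out, new_out = [], []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            old_out.append(old[i1:i2])
            new_out.append(new[j1:j2])
            continue
        if i2 > i1:  # characters present only in the old line
            old_out.append("[-" + old[i1:i2] + "-]")
        if j2 > j1:  # characters present only in the new line
            new_out.append("{+" + new[j1:j2] + "+}")
    return "".join(old_out), "".join(new_out)


if __name__ == "__main__":
    before, after = inline_diff("local x = 1", "local x = 2")
    print("- " + before)  # mixed mode: old line prefixed with -
    print("+ " + after)   # mixed mode: new line prefixed with +
```

The two returned strings can be shown one under the other (mixed mode) or side by side (column mode); only the layout differs.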

Where should these display modes be used? Perhaps also in wikitext for wiki content? Perhaps also when displaying a revision diff ("Difference between revisions")? Even in VisualEditor?

When should these display modes be used? Each time we edit a script, module or template and then click "Show changes"? Also when we click "Preview" to see the effect on a test page? Start by offering them for code panels, and later extend them to other places?

When should these modes be activated? When the user chooses them for each code panel? When the user chooses them in their preferences?

Support (T156048)

 * 1) David1010 (talk) 06:43, 6 February 2017 (UTC)
 * 2) Shizhao (talk) 07:00, 6 February 2017 (UTC)
 * 3) This, that and the other (talk) 07:56, 6 February 2017 (UTC)

Problem
The PSR3 logging interface has been introduced in MediaWiki to support structured logging, but no coordinated effort has been made to deprecate the use of,  ,  , and. Several bugs are open in the #mediawiki-debug-logger project about the lack of parity between debug log usability on the Wikimedia Foundation production cluster and in a typical development environment or external deployment of MediaWiki. These are directly related to bd808 taking the structured logging project to a point where it is useful for the WMF but not pushing that usability further for other MediaWiki deployments.

Who would benefit

 * MediaWiki site operators who want better insight into their operational issues
 * MediaWiki developers who don't want to think about choosing between two largely compatible but very different debug logging layers

Proposed solution

 * Replace all usage of  in MediaWiki core with direct PSR3 usage.
 * Add Monolog as a core dependency and the default debug logging solution.
 * Make configuring Monolog easier by making helpers in the  namespace.
 * Remove  from core. (It could be made a library if there are people who really love it and want to keep maintaining a homegrown debug log formatting and routing layer.)
 * Deprecate,  ,  , and.

[Task] Add Lua function to get Wikibase entity by site link (title)
Please add a Lua function similar to mw.wikibase.getEntityObject for getting an entity by site link (title), just like Special:ItemByTitle or the wbgetentities API module with titles parameter.

On cswiki, articles translated from other language wikis are marked with the Translated template (with the source wiki, article and revision parameters filled in). The template could detect (using Module:Wikidata) whether the article is connected with that source article, and categorize it if not. But if the source article is connected with a different article, that is a false positive (it could be categorized too, but in a different maintenance category). I cannot single out these false positives, because I cannot find out (using a Lua function in Module:Wikidata) the Q-id for a given wiki:page.

On enwiki, the Module:Wikidata talk page has another request from czar: QID lookup from enwp article title.
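For comparison, this is roughly how an external client performs the same lookup today through the wbgetentities API module mentioned above. The Python helper names are made up for the example, and the response handling assumes that entities are keyed by ID and that an unmatched title comes back as an entry carrying a "missing" key; treat both as assumptions to verify against the API documentation.

```python
from typing import Optional
from urllib.parse import urlencode


def entity_lookup_url(site: str, title: str,
                      api: str = "https://www.wikidata.org/w/api.php") -> str:
    """Build a wbgetentities request that resolves a site link to an entity."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "info",
        "format": "json",
    }
    return api + "?" + urlencode(params)


def extract_entity_id(response: dict) -> Optional[str]:
    """Return the first entity ID in a decoded response, or None if not found."""
    for entity_id, entity in response.get("entities", {}).items():
        if "missing" not in entity:
            return entity_id
    return None


if __name__ == "__main__":
    print(entity_lookup_url("cswiki", "Douglas Adams"))
```

The requested Lua function would give templates the same answer without a round trip through the web API.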

Support (T74815)

 * 1) Info-farmer (talk) 05:22, 6 February 2017 (UTC)

Allow excluding soft redirected categories on Special:UnusedCategories
Special:UnusedCategories is flooded by tons of soft-redirected categories. But the purpose of this special page was to show really unused categories, not all unused "aliases", i.e. redirecting categories. It's necessary to define a special magic word which can be placed in "Template:Category redirect" and will prevent categories containing it from showing up in Special:UnusedCategories. Such a thing is really necessary, because some Wikipedias have over 15,000 redirected categories.

A similar technique is used for Special:DisambiguationPages, which lists all pages that contain the __DISAMBIG__ tag.

ApiQueryImageInfo is crufty, needs rewrite
The code is a mess, the limit semantics make no sense, and we have several other options that don't really fit non-images.

The best thing to do here is probably to just write a prop=fileinfo module from scratch so we don't have to worry about backwards compatibility, and then deprecate prop=imageinfo.

Current plans:


 * Right now, iilimit specifies the max number of revisions to return per file, which is inconsistent with the rest of the API and isn't particularly sane. For fileinfo, filimits will limit the number of file-info-objects returned per result, and a separate "fioldversions" property (default 0, values integers or 'all') will specify the max number of revisions to be returned per file.
 * fistart/fiend may result in the info for the current revision not being returned.
 * iiprops has three different metadata properties. There really should be only one, and if possible it should be key-value pairs rather than a list of objects with key and value properties.
 * Metadata needs to be separately continuable, see T86611: API does not fail gracefully when data is too large.
 * Figure out something sane to replace iiurlwidth/iiurlheight/iiurlparam. Maybe multi-valued fiparams?
 * prop=stashimageinfo is very odd, it's a prop module but doesn't use any titles. It would make sense to me for prop=fileinfo to have a fifilekeys parameter instead of having a whole separate module for this.
 * prop=videoinfo really isn't needed either. Instead we should make it possible for extensions to add additional info to the fileinfo response.
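The metadata point in the list above (key-value pairs rather than a list of objects with key and value properties) can be illustrated with a short Python sketch. It assumes the current list entries use "name" and "value" keys and that nested metadata repeats the same shape; the function name is hypothetical.

```python
def metadata_to_dict(items: list) -> dict:
    """Convert [{'name': ..., 'value': ...}, ...] into plain key-value pairs."""
    result = {}
    for item in items:
        value = item["value"]
        # Nested metadata is assumed to be a non-empty list of the same shape.
        is_nested = (
            isinstance(value, list)
            and bool(value)
            and all(isinstance(v, dict) and "name" in v and "value" in v
                    for v in value)
        )
        result[item["name"]] = metadata_to_dict(value) if is_nested else value
    return result


if __name__ == "__main__":
    old_style = [
        {"name": "Model", "value": "ExampleCam"},
        {"name": "GPS", "value": [{"name": "Lat", "value": 50.1}]},
    ]
    print(metadata_to_dict(old_style))
```

Clients can then read a field with a single lookup instead of scanning the list for a matching name.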

Add global logging context
Certain kinds of information would be useful to have in the log context, but not possible or convenient to add manually. When such information is available from the log processing code, we use a Monolog processor to add it (e.g. IP, URL) but that's not always possible. For example, logging the canonical special page name would be handy but that's controller-level information and the log processing code has no access to the controller (and if it is logging an exception, the controller might be in some unexpected state).

The straightforward solution is to have some globally available temporary storage which application code can write into and log processing code can read from. (log4j calls this a diagnostic context.)
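A minimal Python sketch of such a diagnostic context follows, using the standard contextvars and logging modules. MediaWiki itself would do this in PHP with a Monolog processor; the function and field names here (add_log_context, special_page) are made up for the example.

```python
import contextvars
import logging

# Globally reachable, per-request storage that application code writes into.
_log_context = contextvars.ContextVar("log_context", default={})


def add_log_context(**fields) -> None:
    """Called by application code, e.g. once the special page name is known."""
    _log_context.set({**_log_context.get(), **fields})


class ContextFilter(logging.Filter):
    """Log-processing side: copies the current context onto every record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.context = dict(_log_context.get())
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(ContextFilter())
    handler.setFormatter(logging.Formatter("%(message)s %(context)s"))
    logger = logging.getLogger("sketch")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    add_log_context(special_page="Recentchanges")
    logger.info("query finished")
```

The logging code never touches the controller; it only reads whatever the application stored earlier, which also works when an exception is logged from an unexpected state.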

API's list=recentchanges should have rcrelated parameter (provide Special:RelatedChanges/Special:RecentChangesLinked functionality via API)
... to implement something akin to Special:RelatedChanges.