User:Dantman/Code Ideas

Just a random spot for code ideas I feel like working on and may or may not get around to:

  • Write a TorBlock-like extension for http://stopforumspam.com/ - Extension:StopForumSpam
    • Keep in mind dnsbl.tornevall.org should also be used to lessen load.
  • Consider implementing an alternate form of DNSBL support which, like TorBlock, can be enabled to block people only selectively, and which can initially be enabled with blocking disabled so you can look at rc tags to understand which set of edits would actually be blocked before enabling full blocking.
  • Help improve Extension:MediaWikiAuth.
  • Try adding a degree of mobile support to MediaWiki. ie: Try to get as much of the Wikimedia mobile stuff working in core, without requiring a full extra ruby server.
  • Add iPad support to monaco-port (issue: requires a donated iPad).
  • Write a querying framework for MediaWiki?
    • A lot of our core and extension code writes ugly SQL queries and table joins that depend directly on table structure, which gets in the way of any large changes to the db because piles of simple extensions end up depending heavily on the database schema.
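
    A minimal sketch of what such a querying layer could look like, assuming a hypothetical WikiQuery builder class. Nothing here exists in core; it only illustrates hiding table structure behind a higher-level API:

      <?php
      // Hypothetical query builder: extensions describe what they want instead of
      // hand-writing SQL and joins, so the schema underneath can change freely.
      $rows = WikiQuery::pages()                 // invented entry point
          ->inNamespace( NS_MAIN )               // NS_MAIN is a real core constant
          ->inCategory( 'Stubs' )                // would join categorylinks internally
          ->where( 'is_redirect', false )
          ->orderBy( 'touched', 'DESC' )
          ->limit( 50 )
          ->fetch();

      foreach ( $rows as $row ) {
          // Rows are abstract records rather than raw database rows.
          echo $row->getTitle()->getPrefixedText() . "\n";
      }
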
  • Write an abstract API for &action=s and article types.
    • Right now we make various assumptions when it comes to actions. We usually just check $wgRequest->getText('action', 'view');. However this has oddities with things like view/purge/empty and edit/submit, it's not a very good way to do things, and then there are diffs. So it would be good to have an abstract API that declares what type of page is being displayed and that skins can call so they know how to style the page.
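
    A rough sketch of what the abstract API could look like, using an invented PageView class and type constants (purely illustrative, not an existing interface):

      <?php
      // $context here stands for the current RequestContext.
      $view = PageView::newFromContext( $context ); // hypothetical factory
      $type = $view->getType();
      // view/purge/empty collapse into one type, edit/submit into another,
      // and diffs finally get a type of their own.
      if ( $type === PageView::TYPE_DIFF ) {
          // A skin could key its styling off the declared type instead of
          // sniffing request parameters, e.g. add a 'page-type-diff' body class.
      }
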
  • Write a diff/rdiff-based dump format for page text, with an exporter and importer, so that hosts that use external storage can dump text alongside a database dump in a way that the customer they are exporting for can actually import somewhere else.
    • The format would probably be an xml file containing elements with diffs for contents (to be more space efficient) and an attribute marking what text id they were associated with in the db (and whatever extra fields were necessary).
    • The import script would be run after you do a database import; it would walk over the text dump, find revisions by text id, import the text into whatever text/external storage is configured for the wiki (by default the text table), and update the revision rows to use the new text id.
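
    A rough sketch of the import pass, assuming an XML text dump whose <text> elements carry the source wiki's text id in an oldid attribute; the diff/rdiff reconstruction, error handling, and external storage targets are glossed over:

      <?php
      $dbw = wfGetDB( DB_MASTER );
      $reader = new XMLReader();
      $reader->open( 'textdump.xml' );
      while ( $reader->read() ) {
          if ( $reader->nodeType !== XMLReader::ELEMENT || $reader->name !== 'text' ) {
              continue;
          }
          $oldTextId = (int)$reader->getAttribute( 'oldid' );
          $content = $reader->readString(); // the stored diff chain would be applied here
          // Store into whatever text storage is configured (the text table by default).
          $dbw->insert( 'text',
              array( 'old_text' => $content, 'old_flags' => 'utf-8' ),
              __METHOD__ );
          $newTextId = $dbw->insertId();
          // Point the already-imported revision rows at the new text row.
          $dbw->update( 'revision',
              array( 'rev_text_id' => $newTextId ),
              array( 'rev_text_id' => $oldTextId ),
              __METHOD__ );
      }
      $reader->close();
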
  • Write a built-in, job-queue-based replacement extension supporting patterns and selective replacement.
    • pywikipedia is too complex for most people, and it's annoying to set up a client-side bot. It would be easier if you just gave the wiki a list of replacements, it found all the things it would replace, and it gave you a quick way to peruse the diffs in an easy-to-read webpage so you could give your ok.
    • ReplaceText is too simple: no complex patterns. Additionally it does ugly things with SQL, a MediaWiki bad practice which makes it incompatible with any method of efficiently storing a wiki's content (ie: it breaks on compression or use of external storage).
    • Plans on how this could work:
      • Submit a task, including some sort of modification: regexp replacement, append/prepend text, simple string replacement, perhaps even a Lua script if Scribunto is installed; as well as a page filter (namespace, title patterns, categories, links to, transcludes, etc...)
      • The extension runs a job over all pages (use the same batching self-spawning job technique SMW uses; see the sketch after this list). Or, for the case of some filters, it runs over a query in some way (maybe we join on the other tables we need, order by page_id, and handle page_id similarly to how SMW does it). For each page the modification is run. If the modification actually changes something, the page and a delta are added to a special table (for pages that don't match we may want to increment a "Skipped pages" counter for the task, or insert as normal but use a flag for skipped=bool).
      • The special page for the job lists the page deltas that have been mapped so far.
      • On the special page deltas can be confirmed individually, in batches, and also (less preferably) the job can be told to confirm every delta including ones for pages that haven't been scanned yet.
      • When deltas are confirmed a separate job is spawned (or rather we make sure that self-spawning job has been spawned if it isn't running) which makes the actual page edits in batches.
        • The delta records what the latest revision was at the time of the run; if the revision has since changed, the modification is re-run, an unconfirmed replacement delta is made, and a note about the edit conflict is made (perhaps we'll use the old delta as the edit-conflicted marker and then insert a new unconfirmed delta).
        • After an edit is made the delta is deleted from the table to clean things up.
        • Edits are made under the username of the user that initiated them.
        • If an auth system is introduced these edits can be put under an internally implemented application.
        • The edit summary can be defined when the task is created or when you start confirming edits. Whatever edit summary used (custom or automatic) it'll be prefixed with a link to the task.
        • If a "Mark edits as bot" checkbox is checked (probably on by default) all the edits will be marked as bot. We can probably make this bot functionality available to everyone who can make batch edits. We'll simply make a log entry saying that a user has made a mass replacement marked as bot on ~<x> pages on the wiki (Actually we might as well make bot always-on).
      • Naturally the special page will have a button to abort a task.
      • Sometimes you get a job slightly wrong. So for that case the special page should have an edit form. Using the edit form should cause the current task to be aborted and a new one to be submitted in its place.
      • When Echo is implemented the extension should send notifications for completion of the table scanning job, completion of the page replacement job, and edit conflicts.
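
    A very rough sketch of the batching, self-spawning scan job described above. The class name, job type, and delta handling are invented; JobQueueGroup::singleton()->push() is the 1.21+ way to queue a job (older code would use Job::batchInsert()):

      <?php
      // Registered via $wgJobClasses['replaceTaskScan'] = 'ReplaceTaskScanJob';
      class ReplaceTaskScanJob extends Job {
          const BATCH_SIZE = 100;

          public function __construct( $title, $params ) {
              parent::__construct( 'replaceTaskScan', $title, $params );
          }

          public function run() {
              $dbr = wfGetDB( DB_SLAVE );
              $res = $dbr->select( 'page',
                  array( 'page_id', 'page_namespace', 'page_title' ),
                  array( 'page_id > ' . (int)$this->params['offset'] ),
                  __METHOD__,
                  array( 'ORDER BY' => 'page_id', 'LIMIT' => self::BATCH_SIZE ) );

              $lastId = 0;
              foreach ( $res as $row ) {
                  $lastId = (int)$row->page_id;
                  // Run the task's modification against the page text and, if it
                  // changed anything, insert an unconfirmed delta row for review.
              }

              if ( $lastId ) {
                  // Self-spawn the next batch, SMW style.
                  JobQueueGroup::singleton()->push(
                      new self( $this->title, array( 'offset' => $lastId ) + $this->params )
                  );
              }
              return true;
          }
      }
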
  • A formal extension definition file. Right now there is no up-front specification of what it takes to set up an extension; a file explaining this (and doing it better than just saying to run update.php or some install.php script) would be useful for future and my own wiki farm experiments. This should detail the steps for setting up a database and whatnot, rather than just requiring that update.php or an extension-specific script be run. At the very least the tasks should be described in an abstract enough way that they can be imported into some other specialized piece of code specific to the installer that knows how best to do things. It should really be abstract enough that even a Wikimedia-level pattern of installation can be started automatically from the extension definition without a bunch of manual sql work.
  • An anti-spam network daemon (lisp?). Various wiki from different groups tie in to a daemon and submit hashes of content that is being created on the wiki. When the daemon sees the same content being added to a bunch of different wiki it starts to warn other wiki to be more strict about allowing that data (presumably spam) to be posted: via requiring captchas, requiring autoconfirmed, some other method, or just flat out saying "Hey, this looks like spam going to a bunch of wikis, you'll have to come back later to post it."
  • Implement support for Solr as a search engine in an extension. Solr supports query-suggest now, shards, and updating on the fly. It's not as complex to set up as lucene-search, or at least it's a standard daemon built by a project focused on that. Solr could be the best alternative to Lucene-search for general use, rather than Sphinx:
    • Sphinx is not compatible with revision compression or external storage because it accesses the SQL database directly.
    • Sphinx does not support updating the index live, ie: you can't update the index on-save so changes show up instantly in the searches
    • The SphinxSearch extension's query suggest is really a spell check run through aspell/pspell, this is a bad way to do query suggest. Solr's query suggest should be based on the index properly (though it might not be as good as Lucene-search's as rainman speculates, it's still better than SphinxSearch's)
    • SphinxSearch's search interface is out of date.
    • Solr also supports 'cores' to separate indexes, it's probably easier to setup in environments with multiple wiki that want to share a search daemon.
  • Build a website and web spider which accepts urls of MediaWiki installations — and also tries to track them down independently — and looks at the wiki's installation and gives it a rating:
    • The rating will be a 5-star rating: a series of "common bad setup" checks will be made, and a wiki that trips none of them will get a 4/5 (or maybe just 3/5); the last point will be awarded based on a series of checks for extra things added to some good quality installs.
      • (-) Using a major version which is not supported anymore, perhaps even more points off for using extremely out of date installs.
      • (-) Within the major version installed not being up to date with most recent security releases for that major (in addition to major version checks, so running 1.15.1 would be two points off).
      • (-) Urls with index.php in them. 1.5 for those with index.php?title= in them, just .5 for those with index.php/
      • It may be a good idea to search the wiki for spam, but that's more intensive than the checks planned for this setup. Perhaps a spider to track down abandoned wiki with heavy spam would be a separate worthy project.
  • A special format for dumping users. Ideally this format would be encrypted in some way. ie: Some way to encrypt each user entry by their own password hash, so only the user themselves can decode the data... though that's perhaps a little bit of a pipe dream, don't know how useful that is.
  • A tool to allow you to paste a blob of text, a url, or a reference to a page on a wiki.
    • The tool would scan for any urls in the blob of text and check them against various public blacklists. This can help you determine if adding certain public SpamBlacklists would help with your spam.
  • Build sophisticated enough replacements for shared dbs that we can deprecate them in favor of things that actually work in other dbs like PostgreSQL.
    • Better sharing of users between wiki (while we're at it we could build it integrated in a way that works even better than sharing a user table).
    • Rewrite the interwiki system to pull interwiki data from multiple configurable sources: primary database, custom databases (ie: shared), flat files.
      • Replace the default interwiki link database insertion functionality with a built-in default flat file.
  • Update SpecialPage so that we can stop using $wgOut. Instead something like $this->getOut(), $this->getOutput(), $this->getOutputPage(), or $this->out should be used. - Done
  • Stop supporting $wgUser->getSkin( $title ); it's essentially useless, not even used right, and the wrong way to do things.
    • OutputPage should start managing what skin is being used to output the current page; this should become our central point bringing together things like what title is being used, what skin is being used, and the other things relevant to the output of the page. Skin should also start getting its title from out.
  • Implement a way to map arbitrary paths to contents in namespaces, eg: While /wiki/Foo goes to Foo, make /help/Foo go to Help:Foo.
  • Add a hook to allow for arbitrary overriding of what title MW extracts from a path.
  • Remove the functional logic from Skin:
    • The list of what actions (move, etc...) are relevant to a page should be handled by a new base class below classes like Article, SpecialPage, etc... that handle what a page is.
  • Add Sender: support to our e-mail config. Some SMTP services will limit what address you can send from but support using the verified address in the Sender: iirc, while letting the From: be anything. We should support that for e-mails sent 'from' users.
  • Add array( 'class' => array( 'foo' => false ) ) support to Html::
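
    For illustration, the intended call might look like this (a sketch of the proposed behaviour, where boolean values toggle class names on and off):

      <?php
      echo Html::element( 'div', array(
          'class' => array(
              'mw-collapsible' => true,   // included
              'mw-collapsed'   => false,  // dropped
              'mw-made-up-example',       // plain entries are always included
          ),
      ), 'content' );
      // Intended output: <div class="mw-collapsible mw-made-up-example">content</div>
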
  • Fix up rel=canonical
    • Output rel=canonical on all pages regardless of whether they are a redirect or not (someone could have linked to a ?query externally that should be dropped, there are wikis with tricks that stop normal 301 redirects that could use a rel=canonical, etc...)
    • Merge the code outputting language variant canonicals with the other rel=canonical code
    • Do this inside of the instance handling body, tabs, etc... ie: the viewer/article/specialpage instance.
      • This way the most authoritative class on this info can decide when the canonical is not applicable (like on a permalink)
      • Include some simple code like setCanonicalURL
  • Add a config that lets us leave / alone and not redirect
  • Switch selected and new classes in SkinTemplate personal_urls, content_navigation, etc... to array keys
  • An easy Special:Allpages mode to list all pages on the wiki without namespace restriction
  • Consider dropping the $wgUsePathInfo requirement for customized $wgArticlePaths (anything other than the default /index.php/$1), since it's not set to true for non-apache/non-mod_php servers (eg: Nginx, Lighttpd) even though they can easily be configured to support these paths. If the user has configured an $wgArticlePath they typically don't expect that they'll ALSO need to force $wgUsePathInfo on; providing a custom $wgArticlePath typically implies that you have set up the webserver to pass the paths to MediaWiki in a way we can decode.
  • Clean table based rc (there's a bug I'm watching about this I should look at)
  • Finish up moving Linker like methods out of HistoryPage
  • Move methods using $wgUser, $wgLang, $wgRequest, Request::userCan, and Title::quickUserCan (with the exception of deprecated methods and methods using stub threshold we hope will just disappear)
    • Deprecate formatSize and have people use and escape Language::formatSize directly
  • Based on skin template ideas implement a context sensitive template syntax for use in general programming
  • [PageView/pageoutput branch] New opt-in SpecialPage interface. Replace execute with a method that has no $par. Auto detect SpecialPage name. Add a getTarget that takes $par and WebRequest into account and knows about the target type of the SpecialPage (eg: Title or User)
  • Create a flipped version of vector with the pieces of the layout on the opposite sides of the layout. Call it NegativeVector, perhaps "−Vector" in i18n.
  • Create an extension that can be used on MediaWiki.org to replace Version lifecycle. A Special: page should provide the ability to update the version information. Minor releases should mark older minor releases as obsolete or insecure. Major releases should permit an EOL date. All releases should permit a release date. The information should be exposed in a special page, through the api, and in a parser function we can use on Version lifecycle.
  • Create a "real" Subpage listing special page. Consider adding a link to it in the toolbox.
  • Implement a "Not what you're looking for? <Search for pages containing "...".>" feature when you use the 'Go' search. Probably using sessionStorage.
  • Make MediaWiki use a proper ETag with user info instead of Last-Modified and a Logged out cookie hack.
  • Implement a user-switcher that lets users be logged into multiple accounts (ie: for bot users, etc...)
  • Consider a Google like dropdown for personal urls in some skin.
  • An extension that will let you build tables of data and store them, give you a good spreadsheet UI to edit them, allow you to import spreadsheets, and provide a tag that allows the table of data to be inserted into (and perhaps formatted in) a page, as well as edited inline.
  • User:Dantman/Anti-skin
  • Use http://publicsuffix.org/ to optionally validate e-mail addresses.
  • Package MediaWiki as a .phar archive:
    • Look at how short urls and shared hosts play with this (could be an interesting improvement)
    • Try jumping directly to installation instead of showing the index we have.
    • If we ever get a config database consider a .phar mode where it auto-detects a sqlite db with the same name as the .phar and auto-loads that then uses it for config.
    • Also have a mode where if we are say mediawiki-1.22.phar with no .db and we notice that there is a mediawiki-1.21.phar with a .db for it prompt the user with the question "We've detected the presence of a previous version of MediaWiki and a sqlite db. Would you like to copy the database and run the upgrade? You may wish to make that wiki read-only. <Yes, upgrade my wiki>, <No, install MediaWiki>"
  • Add meta description into core in a better way than I do it in Description2:
    • Have the parser output of every parse include a bit of meta info that contains a "summary" of the contents which is usable as a description.
    • Do this in a way that extensions can hook in and override the summaries generated, to improve the way summaries are extracted if they don't work out that great on some wiki. And so that extensions can also provide things like parser functions and tag hooks that let users specify an explicit description for a page.
    • From the OutputPage or whatever, take the ParserOutput for the primary content of the page, extract the summary, and add a meta description tag with its contents to the page output.
    • Consider doing this in a way that exposes the primary ParserOutput in a hook such that it can also be used by extensions to expose say an og:description as well.
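
    A minimal sketch of the wiring, using hook points that already exist (the OutputPageParserOutput hook, ParserOutput page properties, OutputPage::addMeta()); the actual summary extraction and the extension override hooks are left out:

      <?php
      // During the parse, the default extractor (or an extension's parser function
      // / tag hook) would record the summary on the ParserOutput:
      //   $parser->getOutput()->setProperty( 'description', $summary );

      // When the primary content is added to the OutputPage, copy it into a meta tag.
      $wgHooks['OutputPageParserOutput'][] = function ( &$out, $parserOutput ) {
          $desc = $parserOutput->getProperty( 'description' );
          if ( $desc !== false && $desc !== '' ) {
              $out->addMeta( 'description', $desc );
              // An OpenGraph extension could reuse the same property for og:description.
          }
          return true;
      };
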
  • Implement some method of marking images as either content or presentational and one as primary.
    • Maybe a |presentational| inside the normal [[File:...]] syntax. Or perhaps __PRESENTATIONAL__ on File pages?
    • By default the first content image can be considered the 'primary' image for a page, but we may want to consider a method of letting people set another image as primary.
    • The api should expose this information on parse or in another part of the api, for 3rd parties to make use of.
    • We may want to expose this in a special page interface so that it's easy to find out when a page has presentational images defaulting as content.
    • This information should be easily available to an extension so that for example an extensions that makes lists of pages can use the primary image information to display article images instead of titles or content.
    • We should probably add the presentational boolean to whatever api we use to add images to the parser output, this way any extension hooking in to add additional images in a programmatic way can easily start marking images as content or presentational.
    • OpenGraphMeta (or perhaps the OpenGraph exporting data could be moved right into core) could use this information to output a relevant set of og:image data for 3rd parties to extract and use when presenting something like a summary of the page.
    • Add a parser function that outputs the primary image for the article whose title is given to the parser function. (A sketch of the proposed flags follows below.)
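
    A purely hypothetical sketch of the marking API; the flags argument to ParserOutput::addImage() and the accessor shown here do not exist, they only illustrate the shape of the idea:

      <?php
      // Hypothetical: an extra flags argument when images are registered during parse,
      // so extensions adding images programmatically can mark them too.
      $parserOutput->addImage( 'Example_cover.jpg',
          ParserOutput::IMG_CONTENT | ParserOutput::IMG_PRIMARY );  // invented flags
      $parserOutput->addImage( 'Decorative_border.png',
          ParserOutput::IMG_PRESENTATIONAL );                       // invented flag

      // A list-building extension or OpenGraphMeta could then ask for the primary
      // content image instead of guessing from the first thumbnail:
      $primary = $parserOutput->getPrimaryImage();                  // invented accessor
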
  • In OpenGraph metadata export (whatever ends up being the final thing doing that job) consider outputting a special dedicated og:description (probably not the same as the meta description) for diff pages:
    • The current summary that gets auto extracted by 3rd party services for diffs look something like this: https://plus.google.com/u/0/100207479247433590565/posts/6UaWiuGZPZa
    • For a diff page we may consider a summary more like: "Revision as of 05:58, 28 October 2011 by Dantman; Revision comment: ..."
    • We'll probably want permalinks to just output the normal content in the description instead of output about the change.
  • Write a simple bash script that will download php 5.4, ./configure it with a local --prefix, start up a php built in cli-server with a nicely setup router, and use mw's cli based installer to install a standalone dev environment with common defaults.
  • Rewrite or improve the search suggestions code:
    • Take the code for vector's simple search and mwsuggest and either replace both or drop one and make it so we only have one suggestion engine to use across all skins.
    • Allow extensions to hook into the suggestion code and add custom suggestions and re-order them.
      • eg: For a wiki about movie series or television series a search suggestions extension should be able to key in on search suggestions for articles that are a series or movie and reorganize the results to group them at the top and then modify the suggestion to add in things like a tag below the name saying it's a series or movie (or maybe a group header) and some cover art to the left of the suggestion.
  • Add a JS feature to Special:SpecialPages that allows for users to start typing into the box and have matching special pages highlighted so that they can be easily found and tabbed through with the arrow keys then one can use enter to go to the selected page (like this File:SpecialPage Filter JS.png)
  • Changes to language handling:
    • Add a new variable to the MediaWiki messages files that will define the lang="" code to use inside the browser (useful for special languages like simple). (Also add a function to the language class to return it)
    • Make sure to use wfBCP47 on all language codes outputted to the browser.
      • Make sure that wfBCP47 handles the special case casing for 2 char, 1 char, and 3/4+ chars.
    • Add hooks that will make MediaWiki use a different file for loading the messages / class of a language (for custom extension based private languages).
    • Add a hook to the i18n Names.php that will let extensions define new language names.
  • Build an abstract link matching filter API and interface.
    • Core and extensions can define lists of url matching filters with a name. This would be used by the image embed whitelist and the spamblacklist url filter.
    • An abstract api would give access to this interface. Something like LinkMatchFilter::factory( 'spamblacklist' )->matchIsPresent( $url );
    • This would include a special page that would allow the user to edit the list.
    • The interface would either give multiple possible types of matches, domain match, regex, etc... or some sort of intelligent match editing interface that would allow for simple matches to be expressed intuitively.
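
    Sketching the interface suggested above; the class and method names follow the example in this idea and are not an existing core API:

      <?php
      interface LinkMatchFilterList {
          /** @return bool Whether any filter in this named list matches the URL */
          public function matchIsPresent( $url );
          /** @return string[] The filters/patterns that matched, for reporting */
          public function getMatches( $url );
      }

      class LinkMatchFilter {
          /**
           * @param string $name e.g. 'spamblacklist' or 'image-embed-whitelist'
           * @return LinkMatchFilterList
           */
          public static function factory( $name ) {
              // Would look the named list up (core/extension registrations plus the
              // user-edited special page data) and return a LinkMatchFilterList.
          }
      }

      // Usage, as in the idea above:
      //   LinkMatchFilter::factory( 'spamblacklist' )->matchIsPresent( $url );
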
  • PageLayouts
  • Write an extension to embed the Ace editor: - Extension:CodeEditor
    • Enable it on .css and .js pages.
    • If possible see if it can be enabled somehow for <source> tags.
    • Look and see how it can be integrated into the Lua extension for editing.
    • See if a WikiText mode can be implemented (this may help with that <source> idea, perhaps a bit of Lua too).
      • If implemented, be sure to push it upstream.
      • Make sure this is optional, something that can be toggled on and off.
      • See if the WikiText mode display can be less code like, since when editing WikiText you don't think of it completely like code.
      • Look at integration into WikiEditor so it'll be WMF compatible, we can have both a nice looking editor chrome and editing area, and we can have a nice toggle button built into WikiEditor's toolbar.
  • Examine references.
    • See if there is a general intuitive way to input citations. (Visual editor ideas too)
    • See if there is some sort of microdata format we can use for output.
      • Use itemref in a link to actual microdata connection?
    • Think of how we can swap normal references sections for inline things, other kinds of popups, etc...
  • MWCryptRand
    • Tim says php will do a buffered read of 8k, not 4k. It seems that reading from /dev/urandom can also starve /dev/random. Use stream_set_read_buffer if available (added in 5.3.3) to read as little as possible; see the sketch after this list. If it's not available then read the entire 8k instead of reading 4k and discarding the other half of the buffer.
    • Try adding EGD support (http://egd.sourceforge.net/)
      • Try checking for /dev/egd-pool and look at where else OpenSSL might look for it. If it's not a /dev node then we should try to double-check that it is in fact a random stream; if we can't check that it's a unix socket and not a real file then we can at least open it 2 or 3 times, read a little from each, and compare the output to ensure that none of them are the same (to ensure we're not reading out of a static file in a dangerous way).
      • Add a $wg variable to allow the path or host:port to an EGD socket to be defined, eg: $wgEGDSocket = "localhost:7894"; or $wgEGDSocket = "/home/foo/.gnupg/entropy";
      • If there's a common egd config path we might consider trying to read the config
    • Consider opening /dev/random if /dev/urandom is not available (careful with it though, it can block, etc...; make sure to use stream_set_timeout to avoid blocking waiting for randomness)
    • Look into mcrypt_create_iv. The MCRYPT_DEV_URANDOM source may or may not be useful, but definitely take MCRYPT_RAND into consideration. Though note that on 5.3+ it seems that URANDOM works even on Windows. Figure out what's really going on in the background.
    • (Tim) On Windows try shelling out to vbscript to access Windows' random pool.
    • Deeply examine srand.php's technique looking for clock drift to improve on Drupal's technique.
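
    A sketch of just the buffered-read fix as a standalone helper (MWCryptRand's real code is organized differently); fopen(), fread(), and stream_set_read_buffer() are standard PHP:

      <?php
      function readUrandom( $bytes ) {
          $f = fopen( '/dev/urandom', 'rb' );
          if ( !$f ) {
              return false;
          }
          if ( function_exists( 'stream_set_read_buffer' ) ) {
              // PHP 5.3.3+: unbuffered, so we only consume what we actually need
              // instead of letting PHP read ahead a full 8k from the pool.
              stream_set_read_buffer( $f, 0 );
          }
          // If stream_set_read_buffer() isn't available, the whole buffered read
          // should be used rather than half-discarded, per the note above.
          $data = fread( $f, $bytes );
          fclose( $f );
          return strlen( $data ) === $bytes ? $data : false;
      }
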
  • Change Special:Preferences so that you don't manually input the watchlist token anymore. - Done
    • Drop the random token generated on each preferences page view.
    • Change the watchlist token field into a read-only input or div (<output>?)
    • Follow the watchlist token field with a checkbox to reset the token
    • When resetting consider just blanking it. Remember that Special:Watchlist automatically generates it when you need it.
  • Write a tool that lets you go through all the domains used in external links on the wiki and individually decide if links should be follow or nofollow.
    • Consider expanding this into a generic link management tool:
      • Let users control nofollow.
      • Let users blacklist/whitelist from a ui.
  • Create a system for vouching for other users; instead of having to use a top down hierarchy of control like in ConfirmAccount or restricting edit to groups.
    • A user's account is on hold when first created and they cannot edit.
    • Another user on the wiki that may know them may vouch for them. At which point they gain the ability to edit.
    • A user who vouches for a lot of users a large percentage of whom get banned for bad behavior loses their ability to vouch and potentially ends up automatically banned themselves. (ie: Spammers who try to create spam sockpuppets get banned themselves. But a user who accidentally trusts one spammer out of a number of good people doesn't get banned.)
    • Some amount of rate-limiting may also prevent spammers from easily gaining a number of accounts. Perhaps you may also require a period of time or even multiple vouches before you are allowed to vouch for other users.
    • Users can un-vouch for users, so if you begin to think someone is untrustworthy you can un-vouch for them. Doing so may lead to them losing the ability to edit (as well as anyone they vouched for). However this only happens if you were the only person vouching for them. An established member of the community may have multiple people vouching for them; if one starts to distrust them and un-vouches for them they still have rights, since the other members of the community trust them. Meanwhile a spammer who only vouches for their sockpuppets will lose rights on all accounts once the people that vouched for them stop vouching for them. Unvouching for a user may still leave some logs behind, so that a spammer cannot get around vouching by unvouching for their spam accounts after they get banned.
  • Add an API for getting things like the upload url and login url. These are a complete mess to get currently. Add in a way for extensions to hook in and change the url.
  • Write a FUSE module that exposes a Kyoto Tycoon (perhaps even Memcached or EhCache) server as a caching filesystem. Such a thing could be used for thumbnail storage alongside a 404 handler to allow unused files to disappear and not take up space.
  • Write an auth system that uses user certificates.
    • Use <keygen> to generate the pubkey and privatekey.
    • Sign and return a certificate to the user. The authority isn't important, each server could just have its own self-signed certificate.
    • Don't bother with CA auth on the server. Instead just save each pubkey in the database and use it to do authentication.
    • Allow multiple certificates (named too) and the ability to revoke them.
  • User:Dantman/Abstract Revision System: Implement an abstract revision system. Such a system would replicate the whole system of revisions, parents, history, revdelete, etc... but in a system that also adds type/origin separation that allows extensions to create new types of things which can have a whole set of revisions but without them needing to be tied to a specific page. A system like this could allow for extensions to create LQT posts, forum posts, blog comments, request forms (like our dev requests, etc...), problem reports (article comments, like the feedback we end up appending to an article), form submissions, infoboxes, css/js, gadgets, etc... all where each item has a complete history but without requiring any of them to be tied to a page. These revisions would be separate from the normal edit system so their contents could be anything in any syntax and would not be subject to editors being able to modify them outside of whatever interface the extension provides. (Also consider expanding the protection system to support this)
  • Special new pager system for special pages:
    • Uses clean high level templates to define the output format
      • These templates can be used both by PHP and by JS
      • Either light object based. PEG based. Or maybe PHP parsed then delivered as an object based syntax to the client.
      • Information like header areas, footer areas, potentially custom pagers, and what rows are like and their location are marked up in these templates.
    • Registering classes into this pager system automatically registers them into a special API module.
    • Using this pager system for output automatically makes special pages very dynamic:
      • An abstract series of scripts takes over handling for any links and forms inside the pager. The client side scripting automatically does fetches to the automatically exposed API and then uses the template to format output and intuitively update the page. It may also use pushState to update the url.
  • Things like delete, protect, block, etc... pages shouldn't be top level special pages or actions. (Actually if it can fit into an action then maybe it's not even top-level at all?) We should have an interface where these can be defined as smaller abstract components. Defining these actions to those pages will make it possible to automatically expose all pages defined with this interface as ajax UIs as well. eg: A protect dialogue that pops up on the page being protected instead of changing pages.
  • CloudFlare Extension:
  • TorWiki: A MediaWiki mode that makes it safe to use MW over a Tor hidden service. (IP addresses are hidden, etc...)
  • Postprocessing parser api. Basically a stripstate that lets you insert tokens into the parser output which later on will be replaced by calling a specified function with given parameters.
  • Add an extra option to DISPLAYTITLE so when $wgRestrictDisplayTitle is false any title that does not normalize to the actual title (ie: would usually be restricted) will have a subtitle added below the title. Something like "Title: Foo (bar)" if you used {{DISPLAYTITLE:Foo}} on "Foo (bar)".
  • Create an extension for archive page listing. Use a parser function like {{#archivelist:Archive_$1|ul}} to list archives.
  • Implement a two way router:
    • In addition to routing paths to arguments the router will also be used to generate MW's urls for different purposes.
    • In addition to the planned support for different entrypoints such as 'api', 'thumb', 'index'... this router would understand various types of urls:
      • A different name for action/view/etc... replacements in both indexed and nonindexed variants; ie: indexed being our articlepath. And now having the ability to use nonindexed paths besides /path/to/index.php; eg: /wiki/Foo & /w/page/Foo?action=edit the latter being a fancy url that is still behind robots.txt
      • 'view' (viewing of an article)
      • 'simplediff' and 'diff' as separate entries; simplediff being ones that can be expressed with less parameters
      • Perhaps names will be defined in an inheriting way; eg: something like 'edit' -> 'action' -> 'index', such that 'index' might be user-set to the default /index.php (with extra arguments being put into the query string automatically) whilst 'action' and 'edit' may be unset. So in this case a url-building request for 'edit' [ title = 'Foo' ] may end up as /index.php?title=Foo&action=edit (the &action=edit coming from the definition of 'edit', with 'action' being passed on). See the sketch below.
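
    A hypothetical sketch of the inheriting route table and the two-way usage; WikiRouter and these array keys are invented for illustration only:

      <?php
      $routes = array(
          'index'  => '/index.php',                 // default: unknown args go to the query string
          'view'   => '/wiki/$1',                   // indexed article path
          'action' => '/w/page/$1?action=$action',  // non-indexed, behind robots.txt
          'edit'   => null,                         // unset: falls back to 'action', then 'index'
      );
      $router = new WikiRouter( $routes );

      // Forward: decode an incoming path into an entrypoint name plus arguments.
      $match = $router->parse( '/w/page/Foo?action=edit' );
      // => array( 'name' => 'edit', 'title' => 'Foo' )

      // Reverse: build a url for a purpose, walking edit -> action -> index
      // until a defined pattern is found.
      $url = $router->getUrl( 'edit', array( 'title' => 'Foo' ) );
      // => '/w/page/Foo?action=edit', or '/index.php?title=Foo&action=edit'
      //    if only 'index' were defined.
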
  • Create a special class for handling various special urls inside mediawiki:
    • Primary urls for this would be upload, login, createaccount, and logout.
    • For many of these urls some sort of extension or configuration would have rationale to override them. For upload, a foreign-files-only wiki would want to point upload to the foreign wiki. And in an SSO auth setup where signin was entirely done with an external login form it would make sense for the login urls to be overridden to point directly to the SSO.
    • Many of these urls are used in multiple locations, most with no way to override. As a result extensions cannot easily supplant these things.
  • "Instead of MediaWiki implementing AuthWordPress, AuthDrupal, and AuthPhpBB; and WordPress implementing AuthMediaWiki, AuthDrupal, and AuthPhpBB; and Drupal implementing AuthMediaWiki, AuthWordPress, and AuthPhpBB; and phpBB implementing AuthMediaWiki, AuthWordPress, and AuthDrupal; etc... MediaWiki, WordPress, Drupal, phpBB, etc... all just implement a special AuthProvider interface and a general Auth plugin that uses said interface. Every system only needs to implement two things (or even one) and then you can configure every system to use users from another."
    • Write up a standard for an Auth Provider/Consumer interface for systems belonging to the same owner:
      • Use HTTP to avoid issues with loading, separate servers, language differences, etc...
      • Use a shared key between the auth provider and consumer.
      • Have client configure auth url, shared key, site id, etc...
      • Have provider configure site id, shared key, login/logout/etc... webhook url, etc... for each site.
    • Implement an AuthProvider and AuthPlugin for MediaWiki. Also consider implementing others for WordPress, Drupal, etc...
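
    A sketch of what consumer-side verification of a provider callback could look like under the shared-key idea; the field names, payload layout, and 5-minute window are illustrative. hash_hmac() is standard PHP; hash_equals() needs PHP 5.6+ (older code would need its own constant-time compare):

      <?php
      function verifyAuthCallback( array $post, $sharedKey, $expectedSiteId ) {
          if ( $post['site'] !== $expectedSiteId ) {
              return false;
          }
          if ( abs( time() - (int)$post['timestamp'] ) > 300 ) {
              return false; // stale request; crude replay protection
          }
          // The provider signs the payload (site id, user name, timestamp, ...)
          // with the shared key; the consumer recomputes and compares.
          $payload = $post['site'] . '|' . $post['user'] . '|' . $post['timestamp'];
          $expected = hash_hmac( 'sha256', $payload, $sharedKey );
          return hash_equals( $expected, $post['sig'] );
      }
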
  • Implement a proper "library" mode for MediaWiki.
    • One where a package manager can place the codebase someplace like /usr/share/mediawiki-X.XX/ with only one extra file to tell MediaWiki how to handle the packaging.
    • Some helper script can set MediaWiki up in some directory. A reference to the codebase can be formed using an Alias, symbolic link, or rewrite.
    • MediaWiki will suggest install points relative to that directory instead of relative to where the codebase is.
    • The library config will allow locations for executables, etc... to point to their standard installed locations.
    • Library config may also give hints like preferred database info.
    • Packages would preferentially be done with separate mediawiki-X.XX packages and a 'mediawiki' package that points to the latest one and ensures it's installed (or two, one that points to the LTS and one that points to the latest release).
    • Upgrading would be done on individual wikis (perhaps with a helper script) by changing their Alias/symlink to the new mediawiki-X.XX source and then running the updater.
  • Introduce a new custom RL module that will auto-generate the css for external link icons. 'mediawiki.skinning.content.externallinks' module
    • Skins can use this instead of implementing it in their own css.
    • Skins don't need to copy or refer to other skin's link icons.
    • We can standardize the icon set while perhaps leaving room for some skins like MonoBook to override.
    • I wanted actual auto generation from php, with the ability for skins to define alternate images in an array. But this might be fine. Daniel Friesen (Dantman) (talk) 22:02, 27 January 2015 (UTC)
  • Give ResourceLoader an API/Extensions like description/documentation system.
    • Add description message keys (like the ones we use for extension credits descriptions) to every Resource Loader module.
    • In the i18n system give a description (and lite documentation?) of the module.
    • Setup a special page that will list all the RL modules available on a wiki.
    • When we start generating documentation for extension scripts and RL modules consider making it so that RL modules have enough information for the special page to link to generated documentation on the modules.
  • Add support to maintenance/dev/ for multiple instances:
    • Support an optional instance argument to scripts.
    • When in use, use a different db location/name in the installer and settings.
    • Tell the installer to save the settings file to a different location.
    • When starting the webserver tell the installer to use a different local settings location. (MW constant from router? env var? env var that router sets constant for?)
  • Define a way for extensions to define missing dependency errors without using die.
    • Add a constant when we do so extensions can still use die on older versions of MW.
    • Define a function/method they can use to declare these errors at the top of their main extension file, then return from the file.
    • Output wfDebug messages for these errors and also output full warnings on Special:Version.
  • Change category next/prev links to /wiki/Category:World_Wide_Web?pagefrom=Xsa%0AXSA#mw-pages and add rel next/prev <link>s so search engines can index complete categories.
  • Create a spamdelete extension:
    • Spamdelete will move page table rows to a spam_page table similar to how an archived_page table (or page_deleted/deleted_page table?) would work.
      • If that RFC is implemented maybe we can add a column to that table instead of adding a third page table.
    • Spammed pages will linger in that table for about a month to permit undeletion and avoid abuse of the feature. Then they and all their revisions will be purged from the database.
    • If an extension is implemented that allows purging of users who don't have any edits, etc... that their user_id is needed for in the database, then we'll add a hook to spamdelete to trigger that extension for the creator of the first revision of the spamdeleted page when the page is purged from the spam_page table.
  • Add a new tool to analyze/extract some data from extension .php files: (should now be doable with extension.json!)
    • Use token_get_all to parse the php files in extensions (see the sketch after this list).
    • Iterate over all the extensions we have in gerrit git repositories regularly.
    • Track wgHooks and Hook::register registration of hooks.
    • MAYBE track wfMessage and ->msg message key usage. Maybe. Meh, maybe not.
    • Track wgResourceModules registration of RL modules.
    • Maybe track definition of global variables.
    • Maybe track modifications to wgGroupPermissions and wgAvailableRights
    • Maybe track wgSpecialPages registration of special pages.
    • Maybe track NS_ constant registration, $namespaceNames in i18n (bonus parse wgExtensionMessagesFiles definition to find the i18n files), and wgExtraNamespaces to also track NS id usage.
    • After extracting all this data store it in a database.
    • Display this information in tables on a public web frontend, probably on labs.
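
    A rough sketch of just the $wgHooks part of the scan using token_get_all(); the real tool would also follow Hooks::register() calls, wgResourceModules, and the rest, and would write its findings to the database:

      <?php
      function findHookRegistrations( $file ) {
          $tokens = token_get_all( file_get_contents( $file ) );
          $hooks = array();
          $count = count( $tokens );
          for ( $i = 0; $i < $count; $i++ ) {
              $t = $tokens[$i];
              if ( is_array( $t ) && $t[0] === T_VARIABLE && $t[1] === '$wgHooks' ) {
                  // Look ahead a few tokens for the string index: $wgHooks['SomeHookName']
                  for ( $j = $i + 1; $j < $i + 5 && $j < $count; $j++ ) {
                      $n = $tokens[$j];
                      if ( is_array( $n ) && $n[0] === T_CONSTANT_ENCAPSED_STRING ) {
                          $hooks[] = trim( $n[1], "'\"" );
                          break;
                      }
                  }
              }
          }
          return array_unique( $hooks );
      }
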
  • Make sure we use https://github.com/lowe/zxcvbn for password strength meters.
  • Start recording file history of MediaWiki for future use in upgrade tools:
    • Maintain a list of glob paths for files that are part of MW: includes/**.php, ...
    • Write a maintenance script that'll iterate over all files in the glob paths and then save checksums for them into a file (see the sketch after this list).
      • Be sure to avoid bugs with \n being converted to \r\n by FTP in the checksumming (ie: convert all \r\n to \n before checksumming.)
      • Incorporate this script into the release process so future versions of MediaWiki will come bundled with this checksum file. (This will allow an upgrader to verify that core files have not been modified.)
    • Also maintain a list of files that are removed from MW across versions and we would want to have deleted when an upgrade is performed.
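
    A sketch of the checksum pass; the directory list and manifest format are placeholders, and the important detail is normalizing \r\n back to \n before hashing so FTP-mangled files still match:

      <?php
      $paths = array( 'includes', 'languages', 'resources', 'skins' ); // placeholder glob list
      $manifest = array();
      foreach ( $paths as $dir ) {
          $it = new RecursiveIteratorIterator( new RecursiveDirectoryIterator( $dir ) );
          foreach ( $it as $file ) {
              if ( !$file->isFile() ) {
                  continue;
              }
              $contents = file_get_contents( $file->getPathname() );
              $contents = str_replace( "\r\n", "\n", $contents ); // undo FTP line-ending damage
              $manifest[$file->getPathname()] = sha1( $contents );
          }
      }
      file_put_contents( 'checksums.json', json_encode( $manifest ) );
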
  • Consider tweaking Hooks::register so that in debug mode it uses the stack to register the block (function/method/global), file, and line that a hook was registered at.
  • Implement a snapshot feature for ResourceLoader.
    • Allow RL module classes to indicate whether they can or cannot be snapshotted (some like the user dependent ones would be too dynamic).
    • Create a "ResourceLoaderModuleSnapshotable" interface for RL Module classes to implement handling to output and input snapshots for their instances.
      • Snapshot data should be handled in such a way to minimize dependence on any MediaWiki resource such that the output will remain the same even on a different version of MediaWiki. Something with a high degree of variance in compiled output may want to snapshot the pre-compiled input instead of the output. While LESS compiled styles may depend on extra files in the include path or functions implemented inside PHP having differences between MW versions and hence would be better outputting post-compiled data into snapshots.
    • Implement a maintenance script that will create a snapshot that can apply to the current &version= of modules by iterating over all current modules, asking for a snapshot, and then saving it.
    • Set up ResourceLoader to understand module snapshots, so that when the &version= of a load.php query is old it attempts to serve the module from the snapshot instead of serving it normally.
    • ((The fundamental idea here is that when doing an upgrade of MediaWiki, setups like WMF may first create a snapshot, then upgrade, and load the snapshot into the new version of MediaWiki. This will allow old cached pages to remain as-is without breaking even if a module is modified, renamed, or even deleted.))
  • Write some sort of extensions.php maintenance script.
    • ((Or just write some node.js tool since this is just a hack/workaround))
      • If I make this a node tool, I could make it interactive and easier to use.
      • I could also accept this is a hack and have a labs tool that crawls all our extensions and builds the database that way for an API instead of making it a MediaWiki org thing.
      • Allow extensions to have a .dotfile that exposes some preferences like git tag usage, that REL branches should be preferred in some situations, or that installation via git is broken and composer should be used (SMW).
    • Have a MediaWiki.org API that knows about all our extensions: knows what ones use git (and which repo), what ones use REL branches and what ones use version tags (and what format), and what ones use composer.
    • Have extensions.php list list the downloaded extensions in extensions/, what version/source they are using (git master, git tag, composer version, or static files).
    • Have extensions.php outdated list the downloaded extensions in the same way, but check if the extensions are out of date (git fetch and check if the git branch is out of date, a new version tag is available, or composer says it is outdated).
      • A warning may be output noting that a static or ExtensionDistributor installed extension may have no comparison.
    • Have extensions.php download Foo look up the preferred method of installation (composer, git tag, git master, or ExtensionDistributor download) and acquire the files that way.
      • Use -f --force to replace an existing extension and destructively switch installation method (replace static download with git repo, replace git repo with composer install).
      • Support downloading multiple extensions at once.
    • Have extensions.php update Foo update the extension (git pull, git checkout tag, composer update, or ExtensionDistributor download).
      • If using git-rel this might check the MediaWiki version and do a git checkout if the REL branch needs changing.
    • Have extensions.php switch Foo (git-master,git-tag,git-rel,composer,extensiondistributor) switch between installation methods.
      • Use -f --force if this will be a destructive switch (git <-> composer, ed <-> anything, etc...).
      • The primary use for this will be switching between git-master and git-rel types.
      • Most git based extensions will support multiple git-* installation types.
      • Most extensions will support extensiondistributor installs.
  • Make something that scans all of our classes and reports the method definitions and global functions ([static], [public/private/protected] Foo::Bar ($argname, $argname = null, ...)); see the sketch after this list.
    • Do this for all releases as they come out and use it to build a report of new/removed/modified function definitions and new/removed classes.
    • It doesn't do breakages inside functions, but it would work as a partial breaking changes report.
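
    A sketch of the per-class report using PHP's Reflection API; running this over each release and diffing the output would give the partial breaking-changes report described above:

      <?php
      function describeClass( $className ) {
          $lines = array();
          $rc = new ReflectionClass( $className );
          foreach ( $rc->getMethods() as $m ) {
              $args = array();
              foreach ( $m->getParameters() as $p ) {
                  $args[] = '$' . $p->getName()
                      . ( $p->isDefaultValueAvailable()
                          ? ' = ' . var_export( $p->getDefaultValue(), true )
                          : '' );
              }
              $lines[] = ( $m->isStatic() ? '[static] ' : '' )
                  . ( $m->isPublic() ? 'public' : ( $m->isProtected() ? 'protected' : 'private' ) )
                  . ' ' . $className . '::' . $m->getName()
                  . ' (' . implode( ', ', $args ) . ')';
          }
          return $lines;
      }
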

Infrastructure

  • Try Kyoto Tycoon in place of Memcached; Ability to have a disk based cache.
  • wiki host experiment; Replace MongoDB with a combination of Riak (for config document and whatnot storage), Memcached (or Kyoto Tycoon) for performance fetching of that data, and MySQL for indexing the stuff that ends up as lists in management interfaces rather than easy key-value lookups. Riak has MapReduce, but it's probably not what I'd want to use for those lists. However since it's possible for MySQL to end up with a data desync, we can probably use MapReduce when we need to re-populate the data in MySQL. (Note: Perhaps I should make sure management UIs bypass the cache and always fetch from Riak)
  • Implement a Bayesian spam filter extension using http://php-classifier.com/ or https://github.com/fieg/bayes.

Small code changes

  • Consider moving related user and related title out of Skin and into either OutputPage or RequestContext (Actually perhaps the page/viewer)
  • Update special pages to use context properly
  • Add a new hook inside of RequestContext::getSkin() to allow extensions to override what skin is loaded in certain contexts, such as different skins on certain pages, or a different default skin for admins.
  • Stop array_reverse-ing the personal, namespace, and view link lists in vector and flip the float instead. Reversing them creates incorrect orders in text browsing, screen readers, and when tabbing through lists.
  • Tweak installer to use WikiPage instead of Article.
  • Reserve the skin name 'default' and make it so that &useskin=default always uses the $wgDefaultSkin.
  • Implement rel=next and rel=prev support on most paginated pages (AllPages, query pages, categories). Consider rel=index or something for the instance where there are so many pages we display an index.
  • mediawiki.special.search.js's autofocus should be in a common module.
  • maintenance/changePassword.php should support password input via stdin instead of requiring an unsafe argument.
  • Finally drop css targeting #toc and #firstHeader now that the classes have been around for a while

ToDo / Fixme

  • rel=archive is still not present in history links in Vector; recode Vector to stop using its attributes key and use makeListItem.
  • re-introduce pre style changes (without overflow changes). Put the new style in commonElements and the old style in MonoBook.
  • makeListItem should support 'tag' => false?
  • Fix emails so that page deletion and user rights changes don't use "[...] page [...] has been created by [...]" as their subject.
  • Fix hide trick use for mw-jump. Should use less buggy clip() trick.
  • Move common*.css files into modules.
  • Introduce 'skinPath' => 'skname'; and $wg variables for both extensions and skins to override default paths for specific ones.
  • Implement MySQLi Database class so we can test and ease into migration away from PHP's deprecated ext/mysql (PHP 5.5 is actually supposed to start outputting E_DEPRECATED errors). Should we do PDO too?
  • Double check we're not using any of 5.5's deprecated features http://php.net/manual/en/migration55.deprecated.php (try to check extensions too).
  • Remove XHTML1 support from MediaWiki. (RFC "Drop XHTML 1.0" created.)
    • $wgHtml5 won't be needed anymore so remove it.
    • Consider supporting XHTML5 when someone sets $wgMimeType to an XML/XHTML mimetype.
      • Use $wgXhtmlNamespaces when an XML mimetype is used.
        • For posterity we might as well expand Html::isXmlMimeType to support all */*+xml mime types instead of just application/xhtml+xml.
      • Output an xmlns="http://www.w3.org/1999/xhtml" in the <html> element when an XML mimetype is used.
    • Neither HTML5 nor XHTML5 use a full doctype, so $wgDocType and $wgDTD can be removed.
  • Move charset <meta> above the <title>.
  • Write an RFC for my thoughts on moving categories out of wikitext.
  • Implement OpenGraph and description generation support in core.
    • Every ParserOutput should contain a description. Certain parts of the code like article handling will then assign the right one to the output page.
    • Refine my old Output Metadata ideas and use them for outputting the RDFa stuff.
  • Add some special case handling to Html.php for the <a> tag's download attribute. This attribute can be both boolean (<a ... download> when xml well-formedness is not necessary) and non boolean (<a ... download="filename.ext">).
  • Add a Html::escape to escape non-attribute html like Html::element does and update our escaping to use str_replace instead of the less efficient strtr. (Gerrit change 67605)
  • bug 49232
  • http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/70395 kill $wgPasswordSalt after making our backwards compat code work without it simply by testing both the salted and unsalted versions of an ancient password hash.
  • See if wfHostname could make use of gethostname.
  • Output an x-default for /wiki/ when we output rel=alternate hreflang as suggested by http://googlewebmastercentral.blogspot.ca/2013/04/x-default-hreflang-for-international-pages.html
  • Add support for &uselang=x-default as a way of bypassing user lang temporarily and viewing the site in the content lang.