User:Dantman/Code Ideas
From MediaWiki.org
Just a random spot for code ideas I feel like working on and may or may not get around to:
- Write a TorBlock like extension for http://stopforumspam.com/
- Keep in mind dnsbl.tornevall.org should also be used to lessen load.
- Consider implementing an alternate form of dnsbl which, like TorBlock, can block people selectively, and can be enabled initially with the blocking features disabled, so you can look at rc tags to understand what set of edits would actually be blocked before enabling full blocking.
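A rough sketch of the two pieces such an extension needs: building the DNSBL query name for an IP, and a "tag only" mode that precedes full blocking. The function names and the `enforce` flag are made up for illustration, the sketch is IPv4 only, and the actual DNS lookup is left out.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the host name to resolve when checking an IPv4 address against a DNSBL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4")
    # DNSBLs are queried with the octets reversed, followed by the zone name.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def decide(listed: bool, enforce: bool) -> str:
    """Return 'allow', 'tag' (only mark the edit in rc) or 'block'.

    `enforce` is a hypothetical setting: False means the blocking features
    are disabled and matching edits are only tagged.
    """
    if not listed:
        return "allow"
    return "block" if enforce else "tag"
```

With `enforce` off, an admin can watch which edits get tagged before switching to real blocks.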
- Help improve Extension:MediaWikiAuth.
- Try adding a degree of mobile support to MediaWiki. ie: Try to get as much of the Wikimedia mobile stuff working in core, without requiring a full extra ruby server.
- Manual:Gallery of user styles/iWiki is the mobile-friendly skin I was talking about last night in IRC when that GSoC discussion came up (if you were there, I can't remember). There also appears to be Manual:Gallery_of_user_styles#WPtouch, but I don't know how well either of those functions. Peachey88 07:37, 3 February 2011 (UTC)
- Add iPad support to monaco-port (issue: requires a donated iPad).
- Write a querying framework for MediaWiki?
- A lot of our core and extension code writes ugly sql queries and table joins that depend on the table structure. This of course gets in the way of any large change to the db, because piles of simple extensions now depend heavily on the database structure.
- Write an abstract api for &action='s and article types.
- Right now we make various assumptions when it comes to actions. We usually just check $wgRequest->getText('action', 'view');. However, this of course has oddities with things like view/purge/empty and edit/submit. It's not a very good way to do things, and then there are diffs. So it would be good to have an abstract api to declare what type of page is being displayed, and allow skinning to call it so it knows how to style the page.
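A minimal sketch of what "declaring the type of page" could look like, written in Python for brevity. The grouping table and the `page_type` name are assumptions, not an existing MediaWiki api; the point is only that view/purge/empty and edit/submit collapse into one type each, and that diffs are detected separately from the action.

```python
# Hypothetical grouping of raw &action= values into page types.
ACTION_GROUPS = {
    "": "view",
    "view": "view",
    "purge": "view",
    "edit": "edit",
    "submit": "edit",
}


def page_type(query: dict) -> str:
    """Return the kind of page being displayed for a parsed query string."""
    # A diff is requested through its own parameter, not through the action.
    if "diff" in query:
        return "diff"
    action = query.get("action", "view")
    # Unknown actions (history, delete, ...) are their own page type.
    return ACTION_GROUPS.get(action, action)
```

A skin would then ask for the page type instead of inspecting the request itself.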
- Write a diff/rdiff based dump format for page text. An exporter and importer, so that hosts that use external storage can dump text as well as a database dump in a way that the customer they are exporting for can actually import it somewhere else.
- The format would probably be an xml file containing elements with diffs for contents (to be more space efficient) and an attribute marking what text id they were associated with in the db (and whatever extra fields were necessary).
- The import script would be run after you do a database import, it would run over the text dump finding revisions using the text id importing the text into whatever text/external storage is configured for the wiki (by default the text table) and updating the revision rows to use the new text id.
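A small sketch of the exporter/importer pair, assuming each text is stored as a delta against the previous one in the dump. The element names (`textdump`, `text`), the JSON encoding of the delta and the copy/insert operations are all invented for illustration; a real format would likely use an established diff/rdiff encoding, and the importer would write into the configured text storage and update revision rows rather than return a dict.

```python
import difflib
import json
import xml.etree.ElementTree as ET


def make_delta(old: str, new: str) -> list:
    """Describe `new` as line ranges copied from `old` plus newly inserted lines."""
    a, b = old.splitlines(keepends=True), new.splitlines(keepends=True)
    ops = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])
        elif tag in ("replace", "insert"):
            ops.append(["insert", b[j1:j2]])
        # "delete" needs no operation: those lines are simply never copied.
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Rebuild the new text from the old text and a delta from make_delta()."""
    a = old.splitlines(keepends=True)
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(a[op[1]:op[2]])
        else:
            out.extend(op[1])
    return "".join(out)


def export_dump(revisions: list) -> str:
    """revisions: ordered (text_id, text) pairs. Returns the XML dump as a string."""
    root = ET.Element("textdump")
    previous = ""
    for text_id, text in revisions:
        element = ET.SubElement(root, "text", {"id": str(text_id)})
        element.text = json.dumps(make_delta(previous, text))
        previous = text
    return ET.tostring(root, encoding="unicode")


def import_dump(xml_text: str) -> dict:
    """Replay the deltas in order and return {text_id: full text}."""
    previous, texts = "", {}
    for element in ET.fromstring(xml_text).iter("text"):
        previous = apply_delta(previous, json.loads(element.text))
        texts[int(element.get("id"))] = previous
    return texts
```

Because every entry depends on the one before it, the importer has to read the dump in order; grouping deltas per page would limit how far one corrupt entry can spread.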
- Write a built-in job queue based replacement extension, supporting patterns, and selective replacement.
- pywikipedia is too complex for most people, and it's annoying to set up a client side bot. It would be easier if you just gave the wiki a list of replacements, it found all the things it would replace, and it gave you a quick way to peruse the diffs in an easy to read webpage so you could give your ok.
- ReplaceText is too simple, no complex patterns. Additionally it does ugly things with SQL in a MediaWiki bad practice which makes it incompatible with any method of trying to efficiently store a wiki's content (ie: breaks on compression or use of external storage).
- A formal extension definition file. Right now there is no up front specification of what it takes to setup an extension, a file explaining this (and doing it better than just saying to run update.php or some install.php script) would be useful for future and my own wiki farm experiments. This should detail the steps for setting up a database and whatnot in a way that update.php or an extension specific script must be run. At the very least the tasks should be done in an abstract enough way they can be imported into some other specialized piece of code specific to the installer that knows how best to do things. Should really be abstract enough that even a wikimedia level pattern of installation can be started automatically from the extension definition without a bunch of manual sql work.
- An anti-spam network daemon (lisp?). Various wiki from different groups tie in to a daemon and submit hashes of content that is being created on the wiki. When the daemon sees the same content being added to a bunch of different wiki it starts to warn other wiki to be more strict about allowing that data (presumably spam) to be posted. This could be done by requiring captchas, requiring autoconfirmed, some other method, or just flat out saying "Hey, this looks like spam going to a bunch of wikis, you'll have to come back later to post it."
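The core bookkeeping of such a daemon is small. Here is a sketch in Python (not lisp), where the class name, the whitespace/case normalization and the `warn_at` threshold are all assumptions: wikis submit a hash, and the answer turns "suspicious" once the same hash has come from enough distinct wikis.

```python
import hashlib
from collections import defaultdict


class SpamHashTracker:
    """Counts on how many distinct wikis the same content hash has shown up."""

    def __init__(self, warn_at: int = 3):
        self.warn_at = warn_at
        self.seen = defaultdict(set)  # content hash -> ids of wikis that submitted it

    @staticmethod
    def content_hash(text: str) -> str:
        # Collapse whitespace and case so trivial variations hash the same.
        normalized = " ".join(text.split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def submit(self, wiki_id: str, digest: str) -> str:
        """Record a submission and return 'ok' or 'suspicious'."""
        self.seen[digest].add(wiki_id)
        return "suspicious" if len(self.seen[digest]) >= self.warn_at else "ok"
```

What a wiki does with "suspicious" (captcha, autoconfirmed only, outright refusal) stays a local policy decision.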
- Implement support for Solr as a search engine in an extension. Solr supports query-suggest now, shards, and updating on the fly. It's not as complex to set up as lucene-search, or at least it's a standard daemon built by a project focused on that. Solr could be the best alternative to Lucene-search for general use, rather than Sphinx:
- Sphinx is not compatible with revision compression or external storage because it accesses the sql directly
- Sphinx does not support updating the index live, ie: you can't update the index on-save so changes show up instantly in the searches
- The SphinxSearch extension's query suggest is really a spell check run through aspell/pspell, this is a bad way to do query suggest. Solr's query suggest should be based on the index properly (though it might not be as good as Lucene-search's as rainman speculates, it's still better than SphinxSearch's)
- SphinxSearch's search interface is out of date.
- Solr also supports 'cores' to separate indexes, it's probably easier to setup in environments with multiple wiki that want to share a search daemon.
- Build a website and web spider which accepts urls of MediaWiki installations — and also tries to track them down independently — and looks at the wiki's installation and gives it a rating:
- The rating will be a 5 star rating, a series of "common bad setup" checks will be made and a wiki with none of them will get a 4/5 (or maybe just 3/5), the last point will be awarded based on a series of checks for extra things added to some good quality installs.
- (-) Using a major version which is not supported anymore, perhaps even more points off for using extremely out of date installs.
- (-) Within the major version installed not being up to date with most recent security releases for that major (in addition to major version checks, so running 1.15.1 would be two points off).
- (-) Urls with index.php in them. 1.5 for those with index.php?title= in them, just .5 for those with index.php/
- It may be a good idea to search the wiki for spam, but that's more intensive than the checks planned for this setup. Perhaps a spider to track down abandoned wiki with heavy spam would be a separate worthy project.
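The scoring described above could be sketched like this. The one-point deduction per version check is inferred from the "two points off" example, the url deductions are taken from the list, and the function name, parameters and the floor at zero are assumptions.

```python
def rate_wiki(major_supported: bool, has_latest_security_release: bool,
              sample_url: str, bonus_checks_passed: bool = False) -> float:
    """Return a star rating between 0 and 5 for one wiki installation."""
    score = 4.0  # a wiki with none of the "common bad setup" problems
    if not major_supported:
        score -= 1.0
    if not has_latest_security_release:
        score -= 1.0
    # Url style: index.php?title= is penalized more than index.php/
    if "index.php?title=" in sample_url:
        score -= 1.5
    elif "index.php/" in sample_url:
        score -= 0.5
    if bonus_checks_passed:
        score += 1.0  # the last point, for extras found on good quality installs
    return max(score, 0.0)
```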
- A special format for dumping users. Ideally this format would be encrypted in some way. ie: Some way to encrypt each user entry by their own password hash, so only the user themselves can decode the data... though that's perhaps a little bit of a pipe dream, don't know how useful that is.
- A tool to allow you to paste a blob of text, a url, or a reference to a page on a wiki.
- The tool would scan for any urls in the blob of text and check them against various public blacklists. This can help you determine if adding certain public SpamBlacklists would help with your spam.
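A sketch of the scanning step, assuming each blacklist is just a named list of regex fragments matched against host names. Fetching a url or a wiki page, and downloading the real public blacklists, are left out; the function names and the blacklist structure are illustrative only.

```python
import re
from urllib.parse import urlsplit

# Rough url matcher: stops at whitespace and common wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s<>\"\]\[|]+", re.IGNORECASE)


def extract_hosts(text: str) -> set:
    """Return the lowercased host names of all urls found in a blob of text."""
    hosts = set()
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def check_against_blacklists(text: str, blacklists: dict) -> dict:
    """Return {blacklist name: sorted matching hosts} for blacklists with at least one hit."""
    hosts = extract_hosts(text)
    report = {}
    for name, fragments in blacklists.items():
        patterns = [re.compile(fragment, re.IGNORECASE) for fragment in fragments]
        hits = sorted(h for h in hosts if any(p.search(h) for p in patterns))
        if hits:
            report[name] = hits
    return report
```

Running a sample of known spam through this shows at a glance which public blacklist would have caught the most of it.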
- Build sophisticated enough replacements for shared dbs that we can deprecate them in favor of things that actually work in other dbs like PostgreSQL.
- Better sharing of users between wiki (while we're at it we could build it integrated in a way that works even better than sharing a user table).
- Rewrite the interwiki system to pull interwiki system from multiple configurable resources. Primary database, custom databases (ie: shared), flat files.
- Replace the default interwiki link database insertion functionality with a built-in default flat file.
- Update SpecialPage so that we can stop using $wgOut. Instead something like $this->getOut(), $this->getOutput(), $this->getOutputPage(), or $this->out should be used.
- Stop supporting $wgUser->getSkin( $title );, it's essentially useless, not even used right, and is the wrong way to do things.
- OutputPage should start managing what skin is being used to output the current page, this should become our central point bringing together things like what title is being used, what skin is being used, and the other things relevant to the output of the page. Skin should also start getting its title from out.
- Implement a way to map arbitrary paths to contents in namespaces, eg: While /wiki/Foo goes to Foo, make /help/Foo go to Help:Foo.
- Add a hook to allow for arbitrary overriding of what title MW extracts from a path.
- Remove the functional logic from Skin:
- The list of what actions (move, etc...) are relevant to a page should be handled by a new base class below classes like Article, SpecialPage, etc... that handle what a page is.
- Add Sender: support to our e-mail config. Some SMTP services will limit what address you can send from but support using the verified address in the Sender: iirc, while letting the From: be anything. We should support that for e-mails sent 'from' users.
- Add array( 'class' => array( 'foo' => false ) ) support to Html::
- Fix up rel=canonical
- Output rel=canonical in all pages regardless of whether they are a redirect or not (someone could have linked to a ?query externally that should be dropped, there are wikis with tricks that stop normal 301 redirects that could use a rel=canonical, etc...)
- Merge the code outputting language variant canonicals with the other rel=canonical code
- Do this inside of the instance handling body, tabs, etc... ie: the viewer/article/specialpage instance.
- This way the most authoritative class on this info can decide when the canonical is not applicable (like on a permalink)
- Include some simple code like setCanonicalURL
- Add a config that lets us leave / alone and not redirect
- Switch selected and new classes in SkinTemplate personal_urls, content_navigation, etc... to array keys
- An easy Special:Allpages mode to list all pages on the wiki without namespace restriction
- Consider dropping the $wgUsePathInfo requirement for customized $wgArticlePath's (not the default /index.php/$1) since it's not set to true for non-apache/non-mod_php eg: Nginx, Lighttpd even though they can easily be configured to support these automatically. If the user has configured an $wgArticlePath they typically don't expect that they'll ALSO need to force $wgUsePathInfo on, typically providing a custom $wgArticlePath implies that you have setup the webserver to pass the paths to MediaWiki in a way we can decode.
- Clean table based rc (there's a bug I'm watching about this I should look at)
- Finish up moving Linker like methods out of HistoryPage
- Move methods using $wgUser, $wgLang, $wgRequest, Request::userCan, and Title::quickUserCan (with the exception of deprecated methods and methods using stub threshold we hope will just disappear)
- Deprecate formatSize and have people use and escape Language::formatSize directly
- Based on skin template ideas implement a context sensitive template syntax for use in general programming
- [PageView/pageoutput branch] New opt-in SpecialPage interface. Replace execute with a method that has no $par. Auto detect SpecialPage name. Add a getTarget that takes $par and WebRequest into account and knows about the target type of the SpecialPage (eg: Title or User)
- Create a flipped version of vector with the pieces of the layout on the opposite sides of the layout. Call it NegativeVector, perhaps "−Vector" in i18n.
- Create an extension that can be used on MediaWiki.org to replace Version lifecycle. A Special: page should provide the ability to update the version information. Minor releases should make older minor releases marked as obsolete or insecure. Major releases should permit a EOL date. All releases should permit a release date. The information should be exposed in a special page, through the api, and in a parser function we can use on Version lifecycle.
- Create a "real" Subpage listing special page. Consider adding a link to it in the toolbox.
- Implement a "Not what you're looking for? <Search for pages containing "...".>" feature when you use the 'Go' search. Probably using sessionStorage.
- Make MediaWiki use a proper ETag with user info instead of Last-Modified and a Logged out cookie hack.
- Implement a user-switcher that lets users be logged into multiple accounts (ie: for bot users, etc...)
- Consider a Google like dropdown for personal urls in some skin.
- An extension that will let you build tables of data store it, give you a good spreadsheet ui to edit them, allow you to import spreadsheets, and use a tag to allow the table of data to be inserted into and perhaps formatted in a page, as well as edit it inline.
- User:Dantman/Anti-skin
- Use http://publicsuffix.org/ to optionally validate e-mail addresses.
- Package MediaWiki as a .phar archive:
- Look at how short urls and shared hosts play with this (could be an interesting improvement)
- Try jumping directly to installation instead of showing the index we have.
- If we ever get a config database consider a .phar mode where it auto-detects a sqlite db with the same name as the .phar and auto-loads that then uses it for config.
- Also have a mode where if we are say mediawiki-1.22.phar with no .db and we notice that there is a mediawiki-1.21.phar with a .db for it prompt the user with the question "We've detected the presence of a previous version of MediaWiki and a sqlite db. Would you like to copy the database and run the upgrade? You may wish to make that wiki read-only. <Yes, upgrade my wiki>, <No, install MediaWiki>"
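The upgrade detection boils down to a file name comparison. A sketch, assuming the naming scheme `mediawiki-X.Y.phar` with the sqlite db named `mediawiki-X.Y.db` next to it (the exact db name is a guess at what "same name as the .phar" means), and with a made-up function name:

```python
import re

PHAR_RE = re.compile(r"^mediawiki-(\d+)\.(\d+)\.phar$")


def find_upgrade_candidate(current: str, filenames: list):
    """Return the newest older .phar that has a .db next to it, or None.

    Also returns None when the current .phar already has its own .db,
    since then there is nothing to prompt about.
    """
    match = PHAR_RE.match(current)
    if not match:
        return None
    names = set(filenames)
    if current[:-len(".phar")] + ".db" in names:
        return None
    current_version = (int(match.group(1)), int(match.group(2)))
    candidates = []
    for name in names:
        other = PHAR_RE.match(name)
        if not other:
            continue
        version = (int(other.group(1)), int(other.group(2)))
        db_name = name[:-len(".phar")] + ".db"
        if version < current_version and db_name in names:
            candidates.append((version, name))
    # Compare versions numerically so that 1.9 sorts below 1.21.
    return max(candidates)[1] if candidates else None
```

If this returns a file name, the installer would show the "copy the database and run the upgrade?" prompt instead of the normal install screen.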
- Add meta description into core in a better way than I do it in Description2:
- Have the parser output of every parse include a bit of meta info that contains a "summary" of the contents which is usable as a description.
- Do this in a way that extensions can hook in and override the summaries generated, to improve the way summaries are extracted if they don't work out that great on some wiki, and so that extensions can also provide things like parser functions and tag hooks that let users specify an explicit description for a page.
- From the OutputPage or whatever, take the ParserOutput for the primary content of the page, extract the summary, and add a meta description tag with its contents to the page output.
- Consider doing this in a way that exposes the primary ParserOutput in a hook such that it can also be used by extensions to expose say an og:description as well.
- Implement some method of marking images as either content or presentational and one as primary.
- Maybe a |presentational| inside the normal [[File:...]] syntax. Or perhaps __PRESENTATIONAL__ on File pages?
- By default the first content image can be considered the 'primary' image for a page, but we may want to consider a method of letting people set another image as primary.
- The api should expose this information on parse or in another part of the api, for 3rd parties to make use of.
- We may want to expose this in a special page interface so that it's easy to find out when a page has presentational images defaulting as content.
- This information should be easily available to an extension so that, for example, an extension that makes lists of pages can use the primary image information to display article images instead of titles or content.
- We should probably add the presentational boolean to whatever api we use to add images to the parser output, this way any extension hooking in to add additional images in a programmatic way can easily start marking images as content or presentational.
- OpenGraphMeta (or perhaps the OpenGraph exporting data could be moved right into core) could use this information to output a relevant set of og:image data for 3rd parties to extract and use when presenting something like a summary of the page.
- Add a parser function that outputs the primary image for the article whose title is given to the parser function.
- In OpenGraph metadata export (whatever ends up being the final thing doing that job) consider outputting a special dedicated og:description (probably not the same as meta description) for diff pages:
- The current summary that gets auto extracted by 3rd party services for diffs look something like this: https://plus.google.com/u/0/100207479247433590565/posts/6UaWiuGZPZa
- For a diff page we may consider a summary more like: "Revision as of 05:58, 28 October 2011 by Dantman; Revision comment: ..."
- We'll probably want permalinks to just output the normal content in the description instead of output about the change.
- Write a simple bash script that will download php 5.4, ./configure it with a local --prefix, start up a php built in cli-server with a nicely setup router, and use mw's cli based installer to install a standalone dev environment with common defaults.
- Rewrite or improve the search suggestions code:
- Take the code for vector's simple search and mwsuggest and either replace both or drop one and make it so we only have one suggestion engine to use across all skins.
- Allow extensions to hook into the suggestion code and add custom suggestions and re-order them.
- eg: For a wiki about movie series or television series a search suggestions extension should be able to key in on search suggestions for articles that are a series or movie and reorganize the results to group them at the top and then modify the suggestion to add in things like a tag below the name saying it's a series or movie (or maybe a group header) and some cover art to the left of the suggestion.
- Add a JS feature to Special:SpecialPages that allows for users to start typing into the box and have matching special pages highlighted so that they can be easily found and tabbed through with the arrow keys then one can use enter to go to the selected page (like this File:SpecialPage Filter JS.png)
- Changes to
- Add a new variable to the MediaWiki messages files that will define the lang="" code to use inside the browser (useful for special languages like simple). (Also add a function to the language class to return it)
- Make sure to use wfBCP47 on all language codes outputted to the browser.
- Make sure that wfBCP47 handles the special case casing for 2 char, 1 char, and 3/4+ chars.
- Add hooks that will make MediaWiki use a different file for loading the messages / class of a language (for custom extension based private languages).
- Add a hook to the i18n Names.php that will let extensions define new language names.
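For reference, the casing conventions recommended by BCP 47 (RFC 5646, section 2.1.1) can be sketched as below. This is a Python illustration of those conventions, not the actual wfBCP47 code, and it does no validation of the subtags.

```python
def bcp47(code: str) -> str:
    """Apply the casing conventions of BCP 47 to a language code."""
    parts = code.lower().split("-")
    out = []
    after_singleton = False
    for index, part in enumerate(parts):
        if index == 0 or after_singleton:
            # The primary language subtag, and everything that follows a
            # singleton such as "x", stays lowercase.
            out.append(part)
        elif len(part) == 1:
            out.append(part)
            after_singleton = True
        elif len(part) == 2:
            out.append(part.upper())       # region, eg: GB
        elif len(part) == 4 and part.isalpha():
            out.append(part.capitalize())  # script, eg: Hans
        else:
            out.append(part)               # variants and the rest: lowercase
    return "-".join(out)
```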
- Build an abstract link matching filter api and interface.
- Core and extensions can define lists of url matching filters with a name. This would be used by the image embed whitelist and the spamblacklist url filter.
- An abstract api would give access to this interface. Something like LinkMatchFilter::factory( 'spamblacklist' )->matchIsPresent( $url );
- This would include a special page that would allow the user to edit the list.
- The interface would either give multiple possible types of matches, domain match, regex, etc... or some sort of intelligent match editing interface that would allow for simple matches to be expressed intuitively.
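To make the idea concrete, here is a sketch of such a filter in Python. Only the `LinkMatchFilter::factory( ... )->matchIsPresent( $url )` shape comes from the notes above; the in-memory registry, `add_domain` and `add_regex` are invented to show how domain matches and regex matches could live behind one call.

```python
import re
from urllib.parse import urlsplit


class LinkMatchFilter:
    """A named list of url matchers supporting domain and regex match types."""

    _registry = {}  # filter name -> shared LinkMatchFilter instance

    def __init__(self):
        self.domains = set()
        self.patterns = []

    @classmethod
    def factory(cls, name: str) -> "LinkMatchFilter":
        # Always hand back the same filter object for a given name.
        if name not in cls._registry:
            cls._registry[name] = cls()
        return cls._registry[name]

    def add_domain(self, domain: str) -> None:
        self.domains.add(domain.lower())

    def add_regex(self, pattern: str) -> None:
        self.patterns.append(re.compile(pattern, re.IGNORECASE))

    def match_is_present(self, url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        # A domain entry matches the domain itself and any of its subdomains.
        for domain in self.domains:
            if host == domain or host.endswith("." + domain):
                return True
        return any(p.search(url) for p in self.patterns)
```

Usage would look like `LinkMatchFilter.factory("spamblacklist").match_is_present(url)`, with the special page editing the same named list.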
- PageLayouts
- Write an extension to embed the Ace editor:
- Enable it on .css and .js pages.
- If possible see if it can be enabled somehow for <source> tags.
- Look and see how it can be integrated into the Lua extension for editing.
- See if a WikiText mode can be implemented (this may help with that <source> idea, perhaps a bit of Lua too).
- If implemented, be sure to push it upstream.
- Make sure this is optional, something that can be toggled on and off.
- See if the WikiText mode display can be less code like, since when editing WikiText you don't think of it completely like code.
- Look at integration into WikiEditor so it'll be WMF compatible, we can have both a nice looking editor chrome and editing area, and we can have a nice toggle button built into WikiEditor's toolbar.
- Examine references.
- See if there is a general intuitive way to input citations. (Visual editor ideas too)
- See if there is some sort of microdata format we can use for output.
- Use itemref in a link to actual microdata connection?
- Think of how we can swap normal references sections for inline things, other kinds of popups, etc...
- MWCryptRand
- Tim says php will do a buffered read of 8k, not 4k. It seems that reading from /dev/urandom can also starve /dev/random. Use stream_set_read_buffer if available (added in 5.3.3) to read as little as possible. If it's not available then read the entire 8k instead of reading 4k and discarding the other half of the buffer.
- Try adding EGD support (http://egd.sourceforge.net/)
- Try checking for the /dev/egd-pool and look at where else OpenSSL might look for it. If it's not a /dev then we should try and double check it is in fact a random stream: if we can't check that it's a unix socket and not a real file, then we can at least open it 2 or 3 times, read a little from it, and then compare the output to ensure that none of them are the same (to ensure we're not reading out of a static file in a dangerous way).
- Add a $wg variable to allow the path or host:port to an EGD socket to be defined, eg: $wgEGDSocket = "localhost:7894"; or $wgEGDSocket = "/home/foo/.gnupg/entropy";
- If there's a common egd config path we might consider trying to read the config
- Consider opening /dev/random if /dev/urandom is not available (careful with it though, it can block, etc...; make sure to use stream_set_timeout to avoid blocking waiting for randomness)
- Look into mcrypt_create_iv. The MCRYPT_DEV_URANDOM may or may not be useful, but definitely take MCRYPT_RAND into consideration. Though note that on 5.3+ it seems that URANDOM works even on Windows. Figure out what's really going on in the background.
- (Tim) On Windows try shelling out to vbscript to access Windows' random pool.
- Deeply examine srand.php's technique looking for clock drift to improve on drupal's technique.
- Change Special:Preferences so that you don't manually input the watchlist token anymore.
- Drop the random token generated on each preferences page view.
- Change the watchlist token field into a read-only input or div (<output>?)
- Follow the watchlist token field with a checkbox to reset the token
- When resetting consider just blanking it. Remember that Special:Watchlist automatically generates it when you need it.
- Write a tool that lets you go through all the domains used in external links on the wiki and individually decide if links should be follow or nofollow.
- Consider expanding this into a generic link management tool:
- Let users control nofollow.
- Let users blacklist/whitelist from a ui.
- Create a system for vouching for other users; instead of having to use a top down hierarchy of control like in ConfirmAccount or restricting edit to groups.
- A user's account is on hold when first created and they cannot edit.
- Another user on the wiki that may know them may vouch for them. At which point they gain the ability to edit.
- A user who vouches for a lot of users a large percentage of whom get banned for bad behavior loses their ability to vouch and potentially ends up automatically banned themselves. (ie: Spammers who try to create spam sockpuppets get banned themselves. But a user who accidentally trusts one spammer out of a number of good people doesn't get banned.)
- Some amount of rate-limiting may also prevent spammers from easily gaining a number of accounts. Perhaps you may also require a period of time or even multiple vouches before you are allowed to vouch for other users.
- Users can un-vouch for users, so if you begin to think someone is untrustworthy you can un-vouch for them. Doing so may lead to them losing the ability to edit (as well as anyone they vouched for). However, this is based on you being the only person vouching for them. An established member of the community may have multiple people vouching for them. If one starts to distrust them and un-vouches for them, they still have rights since the other members of the community trust them. A spammer who only vouches for their sockpuppets, on the other hand, will lose rights on all accounts once the people that vouched for them stop vouching for them. Unvouching for a user may still leave some logs behind. That way a spammer cannot get around vouching by unvouching for their spam accounts after they get banned.
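The vouch/un-vouch rules can be modelled as reachability in a graph. A sketch, assuming a set of initially trusted "founder" accounts to bootstrap from (that part is my assumption; something has to be trusted first). Bans, rate-limiting and the log of past vouches are left out, and the class and method names are hypothetical.

```python
from collections import defaultdict


class VouchGraph:
    """A user can edit if a chain of current vouches leads back to a founder."""

    def __init__(self, founders):
        self.founders = set(founders)
        self.vouchers = defaultdict(set)  # user -> users currently vouching for them

    def vouch(self, voucher: str, user: str) -> None:
        if not self.can_edit(voucher):
            raise PermissionError(f"{voucher} is not trusted and cannot vouch")
        self.vouchers[user].add(voucher)

    def unvouch(self, voucher: str, user: str) -> None:
        self.vouchers[user].discard(voucher)

    def can_edit(self, user: str) -> bool:
        # Invert the mapping: voucher -> users they vouch for.
        vouched_by = defaultdict(set)
        for vouched_user, vouchers in self.vouchers.items():
            for voucher in vouchers:
                vouched_by[voucher].add(vouched_user)
        # Walk outwards from the founders along current vouches.
        trusted = set(self.founders)
        frontier = list(self.founders)
        while frontier:
            current = frontier.pop()
            for vouched_user in vouched_by[current]:
                if vouched_user not in trusted:
                    trusted.add(vouched_user)
                    frontier.append(vouched_user)
        return user in trusted
```

Because trust is recomputed from the founders each time, a ring of sockpuppets vouching only for each other never becomes trusted, while an established user with several vouchers survives losing one of them.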
- Add an API for getting things like the upload url and login url. These are a complete mess to get currently. Add in a way for extensions to hook in and change the url.
- Write a FUSE module that exposes a Kyoto Tycoon (perhaps even Memcached) server, or EhCache, as a caching filesystem. Such a thing could be used for thumbnail storage alongside a 404 handler to allow unused files to disappear and not take up space.
- Write an auth system that uses user certificates.
- Use <keygen> to generate the pubkey and privatekey.
- Sign and return a certificate to the user. The authority isn't important, each server could just have its own self-signed certificate.
- Don't bother with CA auth on the server. Instead just save each pubkey in the database and use it to do authentication.
- Allow multiple certificates (named too) and the ability to revoke them.
Infrastructure
- Try Kyoto Tycoon in place of Memcached; Ability to have a disk based cache.
- wiki host experiment; Replace MongoDB with a combination of Riak (for config document and whatnot storage), Memcached (or Kyoto Tycoon) for performance fetching of that data, and MySQL for indexing the stuff that ends up as lists in management interfaces rather than easy key-value lookups. Riak has MapReduce, but it's probably not what I'd want to use for those lists. However, since it's possible for MySQL to end up with a data desync, we can probably use MapReduce when we need to re-populate the data in MySQL. (Note: Perhaps I should make sure management UIs bypass the cache and always fetch from Riak)
Small code changes
- Consider moving related user and related title out of Skin and into either OutputPage or RequestContext (Actually perhaps the page/viewer)
- Update special pages to use context properly
- Add a new hook inside of RequestContext::getSkin() to allow extensions to override what skin is loaded in certain contexts, such as different skins on certain pages, or a different default skin for admins.
- Stop array_reverse-ing the personal, namespace, and view link lists in vector and flip the float instead. Reversing them creates incorrect orders in text browsing, screen readers, and when tabbing through lists.
- Tweak installer to use WikiPage instead of Article.
- Reserve the skin name 'default' and make it so that &useskin=default always uses the $wgDefaultSkin.
- Implement rel=next and rel=prev support on most paginated pages (AllPages, query pages, categories). Consider rel=index or something for the instance where there are so many pages we display an index.
- mediawiki.special.search.js's autofocus should be in a common module.
ToDo / Fixme
- rel=archive is still not present in history links in Vector, recode Vector to stop using its attributes key and use makeListItem.
- re-introduce pre style changes (without overflow changes). Put the new style in commonElements and the old style in MonoBook.
- makeListItem should support 'tag' => false?
- Fix emails so that page deletion and user rights changes don't use "[...] page [...] has been created by [...]" as their subject.