User:Dantman/Code Ideas

Just a random spot for code ideas I feel like working on and may or may not get around to:
 * Write a TorBlock like extension for http://stopforumspam.com/
 * Keep in mind dnsbl.tornevall.org should also be used to lessen load.
 * Consider implementing an alternate form of dnsbl which like TorBlock can be enabled, only block people selectively, and can also be enabled initially with blocking features disabled so you can look at rc tags to understand what set of edits will actually be blocked before enabling full block.
 * Help improve Extension:MediaWikiAuth.
 * Try adding a degree of mobile support to MediaWiki. ie: Try to get as much of the Wikimedia mobile stuff working in core, without requiring a full extra ruby server.
 * Manual:Gallery of user styles/iWiki is the mobile friendly skin I was talking about lastnight in IRC when that GSOC disucssion came up (if you were there, I can't remember), there also appears to be Manual:Gallery_of_user_styles, But I don't know how well either of those functions. Peachey88 07:37, 3 February 2011 (UTC)
 * Add iPad support to monaco-port (issue: requires a donated iPad).
 * Write a querying framework for MediaWiki?
 * A lot of our core and extension code writes a lot of ugly sql queries and table joins, which are dependent on table structure. Which of course gets in the way of any large changes to the db because piles of simple extensions now depend heavily on the database structure.
 * Write an abstract api for &action='s and article types.
 * Right now we make various assumptions when it comes to actions. We usually just check . However this of course has oddities with things like view/purge/empty and edit/submit. It's not a very good way to do things, and then there are diffs. So it would be good to have an abstract api to declare what type of page is being displayed and allow skinning to call it so it knows how to style the page.
 * Write a diff/rdiff based dump format for page text. An exporter and importer, so that hosts that use external storage can dump text as well as a database dump in a way that the customer they are exporting for can actually import it somewhere else.
 * The format would probably be an xml file containing elements with diffs for contents (to be more space efficient) and an attribute marking what text id they were associated with in the db (and whatever extra fields were necessary).
 * The import script would be run after you do a database import, it would run over the text dump finding revisions using the text id importing the text into whatever text/external storage is configured for the wiki (by default the text table) and updating the revision rows to use the new text id.
 * Write a built-in job queue based replacement extension, supporting patterns, and selective replacement.
 * pywikipedia is too complex for most people, it's also annoying to setup a client side bot, it would also be easier if you just gave the wiki a list of replacements, it found all the things it would replace, and gave you a quick way to peruse the diffs in an easy to read webpage so you could give your ok.
 * ReplaceText is too simple, no complex patterns. Additionally it does ugly things with SQL in a MediaWiki bad practice which makes it incompatible with any method of trying to efficiently store a wiki's content (ie: breaks on compression or use of external storage).
 * A formal extension definition file. Right now there is no up front specification of what it takes to setup an extension, a file explaining this (and doing it better than just saying to run update.php or some install.php script) would be useful for future and my own wiki farm experiments. This should detail the steps for setting up a database and whatnot in a way that update.php or an extension specific script must be run. At the very least the tasks should be done in an abstract enough way they can be imported into some other specialized piece of code specific to the installer that knows how best to do things. Should really be abstract enough that even a wikimedia level pattern of installation can be started automatically from the extension definition without a bunch of manual sql work.
 * A anti-spam network daemon (lisp?). Various wiki from different groups tie in to a daemon and submit hashes of content that is being created on the wiki. When the daemon sees the same content being added to a bunch of different wiki it starts to warn other wiki to be more strict about allowing that data (presumably spam) to be posted. Via requiring captchas, requiring autoconfirmed, some other method, or just flat out saying "Hey, this looks like spam going to a bunch of wikis, you'll have to come back later to post it."
 * Implement support for Solr as a search engine in an extension. Solr supports query-suggest now, shards, and updating on the fly. It's not as complex to setup as lucene-search, or at least it's a standard daemon built by a project focused on that. Solr could be the best alternative to Lucene-search for general use rather than Sphinx
 * Sphinx is not compatible with revision compression or external storage because it accesses the sql directly
 * Sphinx does not support updating the index live, ie: you can't update the index on-save so changes show up instantly in the searches
 * The SphinxSearch extension's query suggest is really a spell check run through aspell/pspell, this is a bad way to do query suggest. Solr's query suggest should be based on the index properly (though it might not be as good as Lucene-search's as rainman speculates, it's still better than SphinxSearch's)
 * SphinxSearch's search interface is out of date.
 * Solr also supports 'cores' to separate indexes, it's probably easier to setup in environments with multiple wiki that want to share a search daemon.
 * Build a website and web spider which accepts urls of MediaWiki installations — and also tries to track them down independently — and looks at the wiki's installation and gives it a rating:
 * The rating will be a 5 star rating, a series of "common bad setup" checks will be made and a wiki with none of them will get a 4/5 (or maybe just 3/5), the last point will be awarded based on a series of checks for extra things added to some good quality installs.
 * (-) Using a major version which is not supported anymore, perhaps even more points off for using extremely out of date installs.
 * (-) Within the major version installed not being up to date with most recent security releases for that major (in addition to major version checks, so running 1.15.1 would be two points off).
 * (-) Urls with index.php in them. 1.5 for those with index.php?title= in them, just .5 for those with index.php/
 * It may be a good idea to search the wiki for spam, but that's more intensive than the checks planned for this setup. Perhaps a spider to track down abandoned wiki with heavy spam would be a separate worthy project.
 * A special format for dumping users. Ideally this format would be encrypted in some way. ie: Some way to encrypt each user entry by their own password hash, so only the user themselves can decode the data... though that's perhaps a little bit of a pipe dream, don't know how useful that is.
 * A tool to allow you to paste a blob of text, a url, or a reference to a page on a wiki.
 * The tool would scan for any urls in the blob of text and check them against various public blacklists. This can help you determine if adding certain public SpamBlacklists would help with your spam.

Infrastructure

 * Try Kyoto Tycoon in place of Memcached; Ability to have a disk based cache.
 * wiki host experiment; Replace MongoDB with a combination of Riak (for config document and whatnot storage), Memcached (or Kyoto Tycoon) for performance fetching of that data, and MySQL for indexing the stuff that ends up as lists in management interfaces rather than easy key-value lookups. Riak has MapReduce, but it's probably not what I'd want to use for those lists. However since it's possible for MySQL do end up with a data desync, we can probably use Map Reduce when we need to re-populate the data in MySQL. (Note: Perhaps I should make sure management UIs bypass the cache and always fetch from Riak)