Extension:DidYouMean

DidYouMean is an extension under development by Hippietrail that adds links to articles with "similar" titles, where similarity is defined by a PHP function.

History
The English Wiktionary has long used the templates  and   to link articles which differ only by capitalisation, use of accents, or nonletter characters such as hyphens, apostrophes, and spaces.


 * ivy / Ivy; faith / Faith; frank / Frank
 * façade / facade; café / cafe; naïve / naive
 * showoff / show off / show-off
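
The pairs above all collapse to the same string once case, accents, and non-letter characters are ignored. The following is a minimal sketch of such a key function, written in Python purely for illustration (the extension itself is PHP). The name `similarity_key` is hypothetical, and keeping digits is an assumption the source does not address.

```python
import unicodedata


def similarity_key(title: str) -> str:
    """Reduce a title to a key that ignores case, accents, and punctuation.

    Hypothetical helper, not the extension's actual function.
    """
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", title)
    # Keep letters and digits only; this drops combining marks, hyphens,
    # apostrophes, and spaces. Keeping digits is an assumption.
    kept = "".join(ch for ch in decomposed if ch.isalnum())
    # casefold() is a more aggressive lower() intended for caseless matching.
    return kept.casefold()


for group in (["ivy", "Ivy"], ["façade", "facade"], ["showoff", "show off", "show-off"]):
    print(group, "->", {similarity_key(t) for t in group})
```

Titles are then considered similar exactly when their keys are equal, so each group printed above yields a single key.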

Automation
The extension will look up its extra database tables under three conditions:


 * 1) A page is displayed but other pages have similar titles and might be the one the user really wants.
 * 2) A page doesn't exist under the exact title given by the user but others with similar titles do.
 * 3) Search didn't find the exact title the user entered but similar titles exist.
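
One way to read the three conditions is that they all reduce to the same lookup: find every known title that shares a key with the requested one, leaving out the requested title itself. The sketch below assumes that reading. It is in Python for illustration only, an in-memory dict stands in for the extension's database tables, and `build_index` and `similar_titles` are hypothetical names.

```python
def build_index(titles, key=str.casefold):
    """Group existing titles by similarity key (case folding only, for brevity)."""
    index = {}
    for title in titles:
        index.setdefault(key(title), set()).add(title)
    return index


def similar_titles(index, requested, key=str.casefold):
    """Return titles that share the requested title's key, excluding the title itself.

    The same call serves a page that exists (condition 1), a page that does
    not exist (condition 2), and a failed exact-title search (condition 3).
    """
    return sorted(index.get(key(requested), set()) - {requested})


index = build_index(["ivy", "Ivy", "IVY", "frank"])
print(similar_titles(index, "ivy"))  # existing page with similar neighbours
print(similar_titles(index, "iVy"))  # missing page, but similar titles exist
print(similar_titles(index, "oak"))  # nothing similar is known
```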

Interface

 * The output will initially be designed to mimic the English Wiktionary's output for its template.
 * If a page has a  template, any items listed there will also be output. This allows adding redlinks for titles the similarity function does not yet know about.

Design issues

 * 1) No hook yet exists for condition #2, and there are three places in the code where the   message is displayed, so I'm not sure where to add such a hook.
 * 2) Metadata must be tracked whenever a title enters or leaves the scope of the DidYouMean extension and whenever it changes names. Hooks specifically designed for metadata tracking do not yet exist and there are issues with the hooks that I have used.
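
The bookkeeping in the second point can be sketched as three operations on an index: a title enters scope, leaves scope, or is renamed. The Python class below only illustrates that idea. `SimilarTitleIndex` and its method names are hypothetical and do not correspond to any MediaWiki hook.

```python
class SimilarTitleIndex:
    """In-memory stand-in for the extension's extra database tables."""

    def __init__(self, key=str.casefold):
        self._key = key
        self._groups = {}  # similarity key -> set of titles

    def on_create(self, title):
        """A title enters scope (page created or undeleted)."""
        self._groups.setdefault(self._key(title), set()).add(title)

    def on_delete(self, title):
        """A title leaves scope; drop its group once the group is empty."""
        k = self._key(title)
        group = self._groups.get(k)
        if group is None:
            return
        group.discard(title)
        if not group:
            del self._groups[k]

    def on_move(self, old_title, new_title):
        """A rename is treated as a delete followed by a create."""
        self.on_delete(old_title)
        self.on_create(new_title)

    def group_of(self, title):
        """All tracked titles that share this title's key."""
        return sorted(self._groups.get(self._key(title), set()))


idx = SimilarTitleIndex()
idx.on_create("Faith")
idx.on_create("faith")
idx.on_move("faith", "hope")
print(idx.group_of("FAITH"), idx.group_of("Hope"))
```

Treating a move as delete plus create keeps the index consistent even when the new name falls under a different key than the old one.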

Similarity
There are many ways in which article titles might be considered similar apart from what matters to the English Wiktionary.


 * Soundex and Metaphone are well-known algorithms used for finding words based on their pronunciation.
 * Textonyms: book, cool, cook, conk, amok, and bonk are all entered into a telephone keypad as 2665.
 * Anagrams: coordinate, decoration, and carotenoid are all spelled with the same letters.
 * Stemming: work, works, worked, working; and perhaps worker and workers etc.

Soundex and Metaphone in particular would be useful on all wikis, including Wikipedia. The others might only be of interest to some Wiktionaries.
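
Two of the simpler measures listed above, textonyms and anagrams, can each be expressed as a key function so that similar titles share a key. The sketch below is in Python for illustration only. `textonym_key` and `anagram_key` are hypothetical names, and the mapping assumes the standard telephone keypad letter layout.

```python
# Standard telephone keypad layout (digits 2 to 9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def textonym_key(word: str) -> str:
    """Digits pressed to enter the word; characters not on the keypad pass through."""
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in word.lower())


def anagram_key(word: str) -> str:
    """The word's letters in sorted order, so all anagrams share one key."""
    return "".join(sorted(word.lower()))


print({textonym_key(w) for w in ["book", "cool", "cook", "conk", "amok", "bonk"]})
print({anagram_key(w) for w in ["coordinate", "decoration", "carotenoid"]})
```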

With a little more work it should be possible to add full spellchecker-style support for errors such as missing letters, extra letters, and transposed letters. This would certainly be useful for all wikis.
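
As a rough illustration of those three error types, the following Python sketch classifies how a typed string differs from an existing title. It is not part of the extension, `typo_kind` is a hypothetical name, and it covers only a single error per title.

```python
from typing import Optional


def typo_kind(typed: str, title: str) -> Optional[str]:
    """Say how `typed` differs from `title` by exactly one error, else None."""
    if typed == title:
        return None
    # Missing letter: deleting one character from the title gives the typed text.
    if len(typed) + 1 == len(title):
        if any(title[:i] + title[i + 1:] == typed for i in range(len(title))):
            return "missing letter"
    # Extra letter: deleting one character from the typed text gives the title.
    if len(typed) == len(title) + 1:
        if any(typed[:i] + typed[i + 1:] == title for i in range(len(typed))):
            return "extra letter"
    # Transposed letters: swapping one adjacent pair in the typed text gives the title.
    if len(typed) == len(title):
        for i in range(len(typed) - 1):
            swapped = typed[:i] + typed[i + 1] + typed[i] + typed[i + 2:]
            if swapped == title:
                return "transposed letters"
    return None


for attempt in ["wiktonary", "wikktionary", "wikitonary", "dictionary"]:
    print(attempt, "->", typo_kind(attempt, "wiktionary"))
```

Comparing a typed title against every existing title this way would be slow on a large wiki; a real implementation would more likely generate the candidate variants and look each one up in the index.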

Ready for testing
Brion Vibber has kindly hosted a development Wiktionary for testing extensions at http://wiktionarydev.leuksman.com/

Please go try it out and report any bugs. Please check for corner cases.