Extension:Smart Index

The Smart Index extension lets the user create an index on any wiki page. It consists of two parts: a parser tag, placed wherever the index should appear, and a special page, used to set up and update the associated database tables. After installation, and before the parser tag is used, the index words should be updated from the special page. This creates the database table in which the index words are stored.
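The split between an update step (special page) and a lookup step (parser tag) can be sketched as follows. This is a minimal illustration, not the extension's code: the table name smart_index, its two-column layout, the word-splitting rule, and the functions rebuild_index and pages_for are all hypothetical, and an in-memory SQLite database stands in for the wiki's database.

```python
import re
import sqlite3

# Hypothetical table layout: one row per (word, page) pair. The extension's
# real schema is not described on this page.
SCHEMA = """
CREATE TABLE smart_index (
    word TEXT NOT NULL,
    page TEXT NOT NULL,
    UNIQUE (word, page)
)
"""


def rebuild_index(conn: sqlite3.Connection, pages: dict[str, str]) -> None:
    """Recreate the index table from a {page title: page text} mapping.

    Plays the role of the special page's "update" action.
    """
    conn.execute("DROP TABLE IF EXISTS smart_index")
    conn.execute(SCHEMA)
    for title, text in pages.items():
        # Assumption: index words are lowercased runs of word characters.
        words = set(re.findall(r"\w+", text.lower()))
        conn.executemany(
            "INSERT INTO smart_index (word, page) VALUES (?, ?)",
            [(word, title) for word in words],
        )
    conn.commit()


def pages_for(conn: sqlite3.Connection, word: str) -> list[str]:
    """Look up the pages for one index word, as a parser tag might at render time."""
    rows = conn.execute(
        "SELECT page FROM smart_index WHERE word = ? ORDER BY page", (word.lower(),)
    )
    return [page for (page,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rebuild_index(conn, {"Main Page": "Welcome to the wiki", "Help": "Wiki help page"})
    print(pages_for(conn, "wiki"))  # ['Help', 'Main Page']
```

Rebuilding the whole table on each update keeps the sketch simple; it also reflects why the index must be updated before the tag is used, since the lookup only reads what the last update stored.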

The user can optionally specify which characters are stripped from the output. The following characters are removed by default: #.!'[]{}*,"=; -?†—:“”„ (plus the newline character). If additional characters are to be removed, I recommend including the default characters in the list as well.
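A minimal sketch of the stripping step, assuming that text is first split on whitespace (which also disposes of newlines) and that a user-supplied character list replaces the defaults rather than adding to them. That second assumption is a reading of the recommendation above, not something the page states outright. The function clean_words and its strip_chars parameter are hypothetical.

```python
from typing import Optional

# Default characters from the list above; newline is handled by whitespace splitting.
DEFAULT_STRIP = "#.!'[]{}*,\"=;-?†—:“”„"


def clean_words(text: str, strip_chars: Optional[str] = None) -> list[str]:
    """Split text on whitespace and delete unwanted characters from each token.

    Assumption: a custom strip_chars list replaces DEFAULT_STRIP entirely,
    so callers should pass DEFAULT_STRIP + extra to keep the defaults.
    """
    chars = DEFAULT_STRIP if strip_chars is None else strip_chars
    table = str.maketrans("", "", chars)
    cleaned = (token.translate(table) for token in text.split())
    # Drop tokens that consisted only of stripped characters.
    return [word for word in cleaned if word]


if __name__ == "__main__":
    print(clean_words('[[Main Page]] is "great"!\nSee: help?'))
    # ['Main', 'Page', 'is', 'great', 'See', 'help']
    print(clean_words("(note)!", "()"))                  # ['note!']
    print(clean_words("(note)!", DEFAULT_STRIP + "()"))  # ['note']
```

The last two calls show why the defaults are worth repeating: under the replacement assumption, a list containing only the new characters lets the default ones back into the output.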

The user can also supply a list of words that should not be included in the index ("stop words"). To do so, the user creates a wiki page in namespace 0 (the main namespace, which is normally the default) and lists the stop words on it, one per line. The user then enters the name of this page in the.
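Reading such a page and applying it can be sketched in a few lines. This assumes the page text is available as a plain string, that blank lines are ignored, and that matching is case-insensitive; none of these details are stated above, and the function names are hypothetical.

```python
def parse_stop_words(page_text: str) -> set[str]:
    """Read a stop-word page: one word per line, blank lines ignored.

    Assumption: matching is case-insensitive, so words are lowercased.
    """
    return {line.strip().lower() for line in page_text.splitlines() if line.strip()}


def remove_stop_words(words: list[str], stop_words: set[str]) -> list[str]:
    """Keep only the index words that are not on the stop-word list."""
    return [word for word in words if word.lower() not in stop_words]


if __name__ == "__main__":
    stop_words = parse_stop_words("the\nAnd\n\nof\n")
    print(remove_stop_words(["The", "index", "of", "words"], stop_words))
    # ['index', 'words']
```

Storing the stop words in a set keeps each membership check constant-time, which matters when the filter runs over every word in the wiki.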

Stop words can also be generated from the frequency of words in the wiki. The advantage of this approach is that the user does not have to keep the stop-word list up to date: words that have become so common that they are no longer useful will drop out of the index once the database is updated. Currently, Smart Index offers two measures: simple frequency (the number of times a word appears in the wiki) and inverse document frequency (based on the number of pages on which a word appears). The easiest way to find a threshold value for stop words is to first generate the index as a table and then sort it by each word's "score." Likely stop words will cluster together in the table.
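The two measures, and the sort-then-threshold approach described above, can be sketched as follows. Simple frequency is a plain count. For inverse document frequency the sketch uses the standard formula idf(w) = log(N / df(w)), where N is the number of pages and df(w) the number of pages containing w; this is an assumption, since the page does not give the extension's exact formula. The input mapping of page titles to already-cleaned word lists, the threshold comparisons, and all function names are likewise hypothetical.

```python
import math
from collections import Counter


def simple_frequency(pages: dict[str, list[str]]) -> Counter:
    """Number of times each word appears anywhere in the wiki."""
    counts: Counter = Counter()
    for words in pages.values():
        counts.update(words)
    return counts


def inverse_document_frequency(pages: dict[str, list[str]]) -> dict[str, float]:
    """Standard idf: log(N / df), with N pages and df pages containing the word."""
    doc_freq: Counter = Counter()
    for words in pages.values():
        doc_freq.update(set(words))  # count each page at most once per word
    n_pages = len(pages)
    return {word: math.log(n_pages / df) for word, df in doc_freq.items()}


def ranked(scores, most_common_first: bool = True) -> list:
    """Sort (word, score) pairs so that likely stop words cluster at the top.

    High frequency and low idf both mean "common", so pass
    most_common_first=False when ranking idf scores.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=most_common_first)


def stop_words_by_frequency(counts, threshold: int) -> set[str]:
    """Words appearing at least `threshold` times in total."""
    return {word for word, count in counts.items() if count >= threshold}


def stop_words_by_idf(idf, threshold: float) -> set[str]:
    """Words whose idf is at or below `threshold` (i.e. found on many pages)."""
    return {word for word, score in idf.items() if score <= threshold}


if __name__ == "__main__":
    pages = {
        "A": ["the", "wiki", "the"],
        "B": ["the", "index"],
        "C": ["the", "wiki", "page"],
    }
    idf = inverse_document_frequency(pages)
    print(ranked(idf, most_common_first=False)[0])      # ('the', 0.0)
    print(sorted(stop_words_by_idf(idf, 0.5)))          # ['the', 'wiki']
    print(stop_words_by_frequency(simple_frequency(pages), 3))  # {'the'}
```

Because the stop words are recomputed from the data on every update, a word that spreads across more pages falls below the idf threshold automatically, which is the maintenance advantage the paragraph above describes.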

Tag Parameters
The following parameters can be used with the parser tag: