Extension:BayesianFilter/GSoC 2013/Project updates

Monthly Reports
June


 * Looked for data sets on wiki spam. Downloaded the STiki data set and analyzed whether it could be used as training data. Unfortunately it did not work out, because STiki labels vandalism, not spam.
 * Investigated the MediaWiki API and its control flow in preparation for developing my extension.
 * Thoroughly examined the SpamBlacklist extension and how it works, as it is very close to my extension.

July

Till Now


 * After discussion on IRC and with Chris, decided to make an extension that registers spam rather than a gadget.
 * Created the skeleton of the BayesianFilter extension, which currently registers reverted edits.
 * Studied database access and implemented registration of undo and rollback edits.
 * Implemented a checkbox "Mark this Spam" beside "Watch this page" for undo actions.
 * The source code is available at https://github.com/anubhav914/BayesianFilter.
 * Made changes to the reverted_edits table and the code as suggested by Platonides.
 * My earlier plan was to get the data and build the training model first, but the STiki data did not work out and writing the data-gathering extension took time, so I have instead built a basic skeleton showing how the extension will look.
 * Added the checkSpam functionality, which cleans the text and then calculates the Bayesian probability that it is spam.
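
To illustrate the idea behind checkSpam (clean the text, then compute a Bayesian spam probability), here is a minimal sketch in Python. It is not the extension's code, which is written in PHP and may work differently. The function names, the cleaning rules, and the add-one smoothing are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Strip markup and return lowercase word tokens (illustrative rules only)."""
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = re.sub(r"[\[\]{}|='*#]", " ", text)  # drop common wikitext punctuation
    return re.findall(r"[a-z0-9]+", text.lower())


def train(labeled_docs):
    """Count token frequencies per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, is_spam in labeled_docs:
        counts[is_spam].update(clean_text(text))
        docs[is_spam] += 1
    return counts, docs


def spam_probability(text, counts, docs):
    """Naive Bayes estimate of P(spam | text), using add-one smoothing (an assumption)."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    total_docs = docs[True] + docs[False]
    tokens = clean_text(text)
    log_score = {}
    for label in (True, False):
        # Smoothed class prior, then one smoothed likelihood term per token.
        score = math.log((docs[label] + 1) / (total_docs + 2))
        class_total = sum(counts[label].values())
        for tok in tokens:
            score += math.log((counts[label][tok] + 1) / (class_total + vocab_size + 1))
        log_score[label] = score
    # Normalise in log space to avoid underflow on long texts.
    top = max(log_score.values())
    spam = math.exp(log_score[True] - top)
    ham = math.exp(log_score[False] - top)
    return spam / (spam + ham)
```

With a training set of edits labelled as spam or not (for example, those collected through the "Mark this Spam" checkbox), `train` builds the per-class token counts and `spam_probability` scores a new edit between 0 and 1. The threshold at which an edit is treated as spam is left open here.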

Future Plan of Action
 * Plug the extension into MediaWiki.
 * Gather training data and sample it (a huge task) to build an effective training model.