User:Nischayn22/Gsoc

Identity
Name: Nischay Nahata
Email: nischayn22@gmail.com
Project title: Green SMW

Contact/working info
Timezone: Asia/Kolkata (UTC+5:30)
Typical working hours: to (flexible)
IRC handle(s): Nischayn22
Skype name: nischay.nahata
gtalk nick: nischayn22

Project summary
This project aims to improve the performance of Semantic MediaWiki (SMW), as suggested by Markus Krötzsch. Badon is also interested in this project and has offered to help. SMW makes extensive use of the data in a wiki so that it can be processed and queried dynamically, but it currently has performance problems when deployed on very large sites such as Wikipedia. The following performance issues have been reported with SMW:
 * Some special pages require expensive database queries, which put extra load on the servers.
 * Writes to the database are frequent (they occur even if nothing has changed).
 * RDF exports and RSS feeds are requested frequently by users. Feed generation requires many string concatenations in PHP, which are inefficient.
 * Each page view triggers many SMW read queries, especially when further SMW extensions are used. Pages that display large query results or use complex templates can issue hundreds of database queries, slowing down the whole site.
 * The SMW database was designed for a varying number of DataItem types, but the number was later fixed at 10, so the database design needs some refinement.

My proposal is to improve performance by using the database more efficiently and by adding caching where it is currently missing. This will be done in the steps described in the deliverables below.

Required deliverables
1. SMW does not perform database write operations when no property values have changed.
 * The hash of every INSERT SQL query is stored in a table. Before performing a query, SMW hashes it and compares the result with the stored hash; only if they differ is the write performed and the stored hash updated. (Care is needed here: a DELETE query may already have been executed by this point.)
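The hash check in deliverable 1 could look roughly like the following. This is a minimal sketch in Python for readability (the real change would be PHP inside SMW's SQL store), and it assumes the hashes are kept per page; `stored_hashes`, `hash_inserts`, `update_page_data` and the `run_queries` callback are hypothetical names. Comparing hashes before any DELETE is issued is one way to address the caveat above, because an unchanged page then never loses its old rows.

```python
import hashlib
from typing import Callable

# Hypothetical stand-in for the proposed hash table:
# page ID -> hash of the INSERT queries last written for that page.
stored_hashes: dict[int, str] = {}


def hash_inserts(insert_queries: list[str]) -> str:
    """Hash a page's INSERT queries; sorting makes the hash order-independent."""
    digest = hashlib.sha1()
    for query in sorted(insert_queries):
        digest.update(query.encode("utf-8"))
        digest.update(b"\x00")  # separator, so ["ab", "c"] and ["a", "bc"] differ
    return digest.hexdigest()


def update_page_data(
    page_id: int,
    insert_queries: list[str],
    run_queries: Callable[[int, list[str]], None],
) -> bool:
    """Write a page's semantic data only if it differs from the last write.

    The comparison happens before any DELETE is issued, so an unchanged page
    keeps its existing rows. Returns True if a write was performed.
    """
    new_hash = hash_inserts(insert_queries)
    if stored_hashes.get(page_id) == new_hash:
        return False  # nothing changed: skip both DELETE and INSERT
    run_queries(page_id, insert_queries)  # caller deletes old rows, then inserts
    stored_hashes[page_id] = new_hash
    return True
```

Sorting before hashing is a design choice that assumes the order of the INSERTs carries no meaning, so reordered but otherwise identical data does not trigger a write.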

2. Expensive special pages are identified and cached.
 * This can be done in one of two ways: using memcached, or implementing them with MediaWiki's QueryPage class.
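For the memcached option, the usual pattern is cache-aside: look up a key and run the expensive query only on a miss. The sketch below assumes a small in-process `TTLCache` class as a stand-in for memcached (it is not a memcached client), and the key scheme `smw:specialpage:<name>` and the one-hour default TTL are illustrative choices, not existing SMW settings.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-process stand-in for memcached: key -> (value, expiry time)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict[str, tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None  # miss, or the entry has expired
        return entry[0]

    def set(self, key: str, value: str, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)


def render_special_page(
    name: str, cache: TTLCache, compute: Callable[[], str], ttl: float = 3600
) -> str:
    """Cache-aside: serve cached HTML if present, else run the expensive query."""
    key = f"smw:specialpage:{name}"  # hypothetical key scheme
    html = cache.get(key)
    if html is None:
        html = compute()  # the expensive database query and rendering
        cache.set(key, html, ttl)
    return html
```

The QueryPage route would instead rely on MediaWiki's own cached query results, refreshed periodically (for example by the updateSpecialPages.php maintenance script), trading freshness for near-zero cost at view time.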

3. Frequently polled RDF and RSS exports are cached.
 * Exports that are polled with high frequency are cached using MediaWiki's file caching.

4. A special page that allows administrators to set cache expiration times for individual RDF and RSS exports.
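Deliverables 3 and 4 fit together: a file cache whose freshness is checked against a per-export expiration time that administrators could edit. The Python sketch below assumes exports are identified by a string key, stores each one as a file named after the MD5 of that key, and uses a hypothetical `export_ttls` dictionary in place of whatever storage the special page would write to. It is not MediaWiki's file cache API.

```python
import hashlib
import os
import time
from typing import Callable

DEFAULT_TTL = 3600  # seconds; hypothetical site-wide default

# Hypothetical per-export expiration times in seconds, as administrators
# might set them on the proposed special page. Keys identify an export.
export_ttls: dict[str, int] = {}


def cache_path(cache_dir: str, export_key: str) -> str:
    """Map an export key to a file name; hashing keeps the name filesystem-safe."""
    name = hashlib.md5(export_key.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, name + ".cache")


def get_export(cache_dir: str, export_key: str, generate: Callable[[], str]) -> str:
    """Return a cached RDF/RSS export if it is still fresh, else regenerate it."""
    path = cache_path(cache_dir, export_key)
    ttl = export_ttls.get(export_key, DEFAULT_TTL)
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < ttl:
        with open(path, encoding="utf-8") as f:
            return f.read()  # fresh enough: serve the cached export
    content = generate()
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(content)
    os.replace(tmp_path, path)  # atomic rename on POSIX: no half-written reads
    return content
```

Writing to a temporary file and renaming it means a concurrent reader sees either the old or the new export, never a partial one.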

5. Pages that have large inline queries or complex templates are identified and cached using memcached.
 * Whether a query is complex enough to be cached is decided by a new threshold variable set in LocalSettings.php.
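A sketch of how a complexity threshold might gate caching of inline query results. `CACHE_MIN_CONDITIONS` and `CACHE_MIN_LIMIT` are hypothetical stand-ins for the new LocalSettings.php variable, and measuring complexity by condition count and result limit is an assumption made for illustration, not how SMW defines query size.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical thresholds standing in for the new LocalSettings.php variable(s).
CACHE_MIN_CONDITIONS = 3
CACHE_MIN_LIMIT = 100


@dataclass
class InlineQuery:
    conditions: list[str]  # e.g. "[[Category:City]]"
    printouts: list[str] = field(default_factory=list)  # e.g. "?Population"
    limit: int = 50

    def key(self) -> str:
        """Unambiguous cache key built from all parts of the query."""
        return repr((tuple(self.conditions), tuple(self.printouts), self.limit))


def is_complex(q: InlineQuery) -> bool:
    return len(q.conditions) >= CACHE_MIN_CONDITIONS or q.limit >= CACHE_MIN_LIMIT


_result_cache: dict[str, list[str]] = {}


def run_inline_query(
    q: InlineQuery, execute: Callable[[InlineQuery], list[str]]
) -> list[str]:
    if not is_complex(q):
        return execute(q)  # cheap query: not worth a cache entry
    key = q.key()
    if key not in _result_cache:
        _result_cache[key] = execute(q)  # first run of a complex query
    return _result_cache[key]
```

A real implementation would also need to invalidate cached results when the pages they depend on change, for example with a memcached expiry time, which this sketch omits.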

6. Measuring performance and documentation
 * Profile all pages where caching is implemented to measure the performance gain.
 * Document the new LocalSettings.php variables that enable or tune the caching features.
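To compare pages before and after caching, it helps to record call counts and cumulative time per function. MediaWiki has its own profiling infrastructure, which the real measurements would use; the decorator below is only a Python illustration of the idea, and `profiled`, `profile_stats` and `report` are hypothetical names.

```python
import functools
import time
from collections import defaultdict

# Per-function totals: function name -> [call count, cumulative seconds]
profile_stats: dict[str, list] = defaultdict(lambda: [0, 0.0])


def profiled(func):
    """Record how often func is called and how long it takes in total."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            stats = profile_stats[func.__name__]
            stats[0] += 1
            stats[1] += time.perf_counter() - start

    return wrapper


def report() -> None:
    for name, (count, total) in sorted(profile_stats.items()):
        print(f"{name}: {count} calls, {total * 1000:.2f} ms total, "
              f"{total / count * 1000:.3f} ms avg")
```

Running the same page views with caching disabled and then enabled, and comparing the `report()` output for the database-facing functions, gives a rough measure of the gain.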

If time permits

 * Add caching mechanisms to the Semantic Forms and Semantic Drilldown extensions as well.

Project schedule
This project has a broad scope, so the schedule is likely to change somewhat.
 * Until April 25: fully identify the places in the database design and implementation that need changes
 * Until May 7: university exams
 * May 8-20: Design the database changes and cron job implementations after open discussion with my mentor and other developers
 * May 21 onwards: Coding :)

About you
Hi! My name is Nischay Nahata, and I am an engineering student from India. In a year I will complete my B.Tech in Information Technology. I generally write code in C and PHP (and I am learning Python now). I am no longer new to FOSS: my love of PHP brought me here more than a month ago, and since then I have spent most of my time coding for MediaWiki. This serves two purposes for me: working with experienced developers from different parts of the world, and contributing to Wikipedia through code. Recently I have been strongly influenced by SMW's idea of a Semantic Wikipedia, so I also look forward to contributing to Wikidata in the future.

Participation
I have been working with MediaWiki for more than a month now and am familiar with the various modes of communication used here.
 * I am mostly online on IRC, Skype and Gtalk. (I usually try to find answers myself; otherwise I ask on IRC, and if that fails I ask specific developers.)
 * I plan to use GitHub to host the code so that my mentor can track my work and post reviews.
 * I have already started blogging my daily activities at greensmw.wordpress.com.

Past open source experience
My open-source journey started with MediaWiki itself. I have written an extension (not yet usable) and added two new formats to the Semantic Result Formats extension. Besides that, I have fixed bugs, implemented enhancements, and communicated with other developers. A summary of all my MediaWiki work is on my user page, User:Nischayn22. I am good at understanding existing code and am currently going through the core code of the SMW extension family.