User:Nischayn22/Gsoc

This project has now been completed. Expect results to be in SMW 1.8. For the progress, see [results]; to view my commits, see Gerrit.

Identity
 * Name: Nischay Nahata
 * Email:
 * Project title: Green SMW
 * Project Blog: greensmw.wordpress.com

Contact/working info
 * Timezone: Asia/Kolkata UTC +5:30
 * Typical working hours: to (flexible)
 * IRC handle(s): Nischayn22
 * Skype name: nischay.nahata
 * gtalk nick: nischayn22
 * Phone: +917676568898

Project summary
This project is about improving the performance of SMW, as suggested by Markus Krötzsch. Semantic MediaWiki makes extensive use of the data in a wiki so that it can be processed and queried dynamically, but on wikis with a large amount of content it has a few performance issues (see discussion). Here are a few of them:
 * Frequent writes to the database (writes occur even if nothing has changed).
 * Some Special pages require expensive database queries, so loading them takes a long time.
 * RDF exports and RSS feeds (frequently generated by users) require many string concatenations in PHP, which are not efficient.
 * Pages that display large inline query results or use complex templates can trigger hundreds of database queries, leading to slow performance.
 * The SMW database was designed with a varying number of DataItems in mind, but the number of DataItems was later fixed at 10, so the database design needs a few refinements.

My proposal is to improve performance by making efficient use of resources and implementing caching, which also reduces SMW's energy consumption. This can be achieved in the steps explained in the deliverables.

Deliverables
This project has a lot of scope, so more items can always be added (see the "If time permits" section).

Required deliverables
1. Hash table to validate if write queries are new.

2. Expensive Special Pages are identified and cached

3. Special:ExportRDF identifies queries that are frequent and uses cached data.

4. Special Page to allow the admins to control caching for different RDF and RSS exports.

5. Pages that have large inline queries or complex templates are identified and cached using memcache.

6. Improvements to SMW's accesses to the database.

7. Profiling and documentation.


 * Profile all the pages where caching is implemented to measure performance.
 * Document the new variables used in LocalSettings.php to enable or modify the caching features.
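
As a rough illustration of deliverables 2 and 5, caching an expensive result for a limited time could look like the sketch below. It is written in Python only to keep it short and self-contained; SMW itself is PHP and would rely on memcached or MediaWiki's own caching. The class `TtlCache`, its `get_or_compute` method and the 60-second lifetime are hypothetical names and values chosen for this example, not part of SMW.

```python
import time


class TtlCache:
    """Hypothetical in-memory cache whose entries expire after a fixed time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self.entries = {}       # key -> (expiry time, cached value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]     # still fresh: skip the expensive work
        value = compute()       # missing or expired: recompute and store
        self.entries[key] = (now + self.ttl, value)
        return value


def expensive_special_page_query():
    # Stand-in for a slow database query behind a Special page.
    return ["Property A", "Property B"]


cache = TtlCache(ttl_seconds=60)
rows = cache.get_or_compute("Special:Properties", expensive_special_page_query)
```

Repeated requests for the same key within the lifetime return the stored value without running the query again. Choosing the lifetime is a trade-off between freshness and load, which is why the proposal makes it configurable.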

If time permits

 * Add caching mechanisms to the Semantic Drilldown extension as well.

Community bonding Period

 * Identify all Special Pages that are expensive.
 * Figure out the common performance pitfalls of SMW with input from its users (Wikia Inc., etc.).
 * Analyze the problems with the outdated database and the simplicity required; discuss with Markus and Jeroen.

Coding Period

 * Milestone 1 (1 week) Implement a table to store hash values of queries. Before performing a write query, hash it and compare the result with the hash stored in the table; if they differ, perform the write and update the hash value in the table. (Careful! We might have done a Delete query already.)
 * Milestone 2 (1 week) Implement caching in expensive Special Pages.
 * This can be done in two ways: using memcache or implementing MW's QueryPage.


 * Milestone 3 (2 weeks) Highly polled RDF and RSS exports are cached.
 * Exports that are polled with high frequency are cached using MW's file caching techniques.


 * Documentation (1 week) Clean up code and write documentation.


 * Milestone 4 (2 weeks) Implement an algorithm to identify complex inline queries and complex templates, and cache these queries.


 * Milestone 5 (1 week) Implement Special Page to manage specific cache settings.


 * Milestone 6 (3 weeks) Redesign/improve the code that accesses the database, with input received from my mentor and other developers.


 * Testing and Profiling (1 week) Implement profiling and add documentation wherever needed.
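
The hash check described in Milestone 1 can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than SMW's PHP code, and it uses an in-memory dictionary in place of a database table. The class `HashedWriteStore`, its methods and the choice of SHA-256 over a JSON serialization are assumptions made for this example only.

```python
import hashlib
import json


class HashedWriteStore:
    """Hypothetical store that skips writes when the data is unchanged."""

    def __init__(self):
        self.rows = {}        # page id -> stored semantic data
        self.hashes = {}      # page id -> hash of the data last written
        self.write_count = 0  # number of writes actually performed

    @staticmethod
    def digest(data):
        # Sort keys so that equal data always produces the same hash.
        payload = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def save(self, page_id, data):
        new_hash = self.digest(data)
        if self.hashes.get(page_id) == new_hash:
            return False      # nothing changed: skip the write
        self.rows[page_id] = data
        self.hashes[page_id] = new_hash
        self.write_count += 1
        return True

    def delete(self, page_id):
        # Remove the hash together with the data. Otherwise a later,
        # identical save would be skipped although the row is gone.
        self.rows.pop(page_id, None)
        self.hashes.pop(page_id, None)


store = HashedWriteStore()
store.save("Berlin", {"Population": 3500000})  # performs the write
store.save("Berlin", {"Population": 3500000})  # skipped, data unchanged
```

The `delete` method reflects the caveat in Milestone 1: if a delete has already happened, a stale hash must not cause the following write to be skipped.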

About you
Hi! My name is Nischay Nahata, and I am an engineering student from India. In a year I shall complete my B.Tech in Information Technology. Most of my code is written in C and PHP (I am also trying Python now). I started working on FOSS about a month ago, when my search for an internship and my love of PHP landed me here, and since then I have spent most of my time coding for MediaWiki. This serves two purposes for me: getting to work with experienced developers from different parts of the world, and contributing code to Wikimedia. Recently I have been highly influenced by the idea of a Semantic Wikipedia (the concept of Wikidata), wherein we can not only search but also query Wikipedia. Hence I chose this project: it will let me get to know the SMW code better and therefore help me contribute to both SMW and Wikidata in the future.

Participation
I have been working for more than a month now with MediaWiki, and am familiar with the various modes of communication used here.
 * I am mostly online on IRC, Skype and Gtalk (I usually try to find answers myself; otherwise I try IRC, and if that fails I ask specific developers).
 * I plan to use GitHub to maintain the code so that my mentor can keep track of my work and post reviews.
 * I have already started to blog my daily activities at greensmw.wordpress.com.

Past open source experience
My open-source journey started at MediaWiki itself. To date I have made two extensions, added two new formats to the Semantic Result Formats extension, and added caching to a few Special Pages of SMW. I have also worked on bugs and enhancements and have been communicating with other developers. A summary of all my work for MW is at User:Nischayn22. I am good at understanding existing code (PHP) and am currently going through the core code of SMW.

Any other info
Markus and Jeroen are possible mentors. Badon is also interested in this project and has promised to help.