User:Omegat/Project Proposal for OPW

Contact Information

Name : Manpreet Kaur

Email : manpreetkaur9411@gmail.com

IRC : maverick_

Gerrit : username - Maverick

Location : BITS Pilani, Goa Campus, India

Education : 3rd Year Undergraduate, Electrical and Electronics (B.E.) at BITS Pilani

About Me

I am Manpreet Kaur, a final-year Electrical and Electronics student. I am keen to be part of the open source culture, which has always inspired me. I heard about the OPW program two years ago from members of my college's IEEE group. The OPW application process has given me a good start in contributing to an open source project. Besides Wikimedia, I also wish to contribute to one of the science libraries of the Python Software Foundation, as I am deeply motivated by science and math. My academic interests are Artificial Intelligence and Quantum Computation. I have prior experience using Python's open source library scikit-learn for machine learning.

Experience with FOSS projects

I have used products of various FOSS organizations, such as the Python Software Foundation (NumPy, scikit-learn), Debian, Ubuntu and Red Hat Linux. I have also worked with Arduino, an open source development board, for communication and control purposes.

Experience with MediaWiki

I became interested in contributing to MediaWiki after attending a Wikimedia workshop, where I was first introduced to software bots that perform automated tasks on Wikipedia. Motivated by this, I explored further by setting up the environment for pywikibot-core and compat and running some scripts locally. I enjoyed this and hence chose a MediaWiki project for OPW. I find MediaWiki a very active and enthusiastic community in which it is easy to exchange ideas. I feel proud to be working for an organization that reaches millions of people around the globe.

I found the MediaWiki documentation for beginners, covering topics like local setup and making a first contribution, easy to understand. Manual pages exist for most PyWikiBot (PWB) scripts and are very helpful. The other PWB contributors are always willing to help on IRC, and the mentor I approached guided me throughout my contribution phase with great enthusiasm. It was a great experience to make a small difference to MediaWiki.

Contributions to MediaWiki

https://gerrit.wikimedia.org/r/#/c/166948/ (merged) - Modified the file watchlist.py to use CachedRequest (introduced in core) for caching. watchlist.py had its own caching process, which was replaced by CachedRequest so that MediaWiki API responses are cached in a uniform manner.

https://gerrit.wikimedia.org/r/#/c/168029/ (not merged) - Created family file of ProofWiki

https://gerrit.wikimedia.org/r/#/c/167831/ (not merged) - Created family file of ReutersWiki

The suggested microtask of creating a family file for WeRelate was not done, as WeRelate does not have an API. I checked with other PWB developers and they couldn’t find the API either. WeRelate is therefore an interesting case to solve later in the project. After discussion with the potential mentor, it was replaced by ReutersWiki and ProofWiki.

Interested Project

Project Idea

PyWikiBot currently supports only a few wiki projects. By the end of this project, the benefits of PWB's task automation will extend to the MediaWiki sites, non-MediaWiki wiki sites and non-wiki sites listed on the InterWiki Map. I also propose to add support for one specific wiki engine (to be chosen after the IWM part of the project, depending on feasibility and popularity) and for XML-RPC. The following sections analyze this further.

Project Description

Wikimedia is an online, open-source organization that aims to bring free educational content and information to the world. Wikimedia has various projects and chapters and hosts millions of web pages. To automate and maintain a web-base of this size, several Wikimedia projects (the Wikimedia Foundation projects) use MediaWiki, an open-source wiki package. Pywikibot (PWB) is a MediaWiki tool written in Python that automates various tasks, such as categorization of sites or making API calls on MediaWiki sites. However, not all wiki projects run on MediaWiki. Such projects are listed on the InterWiki Map (IWM). My project goal is to provide support for all sites on the IWM so as to extend Pywikibot’s functionality to these wikis.

Further implementation can then be added to the new Site classes created for sites listed on the IWM. This means the benefits of a new Site class will extend to any other website running on the same wiki engine. There is also an interface called XML-RPC, which many wiki applications implement and which makes a number of tasks much easier to perform. Supporting this interface will allow a wiki to be accessed from other applications and will enable testing on multiple wiki engines. I find this quite interesting and have included it as a part of my project.

Project Mentor

A possible mentor for this project is John Mark Vandenberg. He is also the official mentor for this project listed on the Project List page: https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Experimental_wiki_engine_support. I have been in constant touch with him, and he has guided me very well throughout the application process.

Implementation Plan

The project requires adding support to PyWikiBot for all sites listed on the InterWiki Map. Most sites that are already powered by MediaWiki can be brought into the PWB framework by creating a Family class for each. However, this task has its challenges; for example, a site might be running an older version of MediaWiki. PWB therefore needs to be made backward compatible with sites running older versions of MediaWiki.

Non-MediaWiki sites listed on the IWM are either wikis running on a non-MediaWiki engine (like Py Wiki or MoinMoin) or non-wiki websites meant to support wiki work (e.g. Bugzilla for bug tracking, Gerrit for code review). These sites can raise many compatibility issues, as PWB has no Site class through which to interact with them and they may be powered by a different wiki engine. Hence, as a first step, a script must be written to detect which engine each site runs on. Full-fledged support for these sites is not possible, as they are not MediaWiki sites and differ in functionality. PWB therefore needs only limited but crucial information about them.
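The detection step could start from something as simple as the sketch below. It relies on the generator meta tag that MediaWiki writes into its pages; whether other engines on the IWM emit a comparable tag is an assumption that would need checking per engine, and the names GeneratorParser and detect_engine are hypothetical, not part of PWB.

```python
from html.parser import HTMLParser


class GeneratorParser(HTMLParser):
    """Collect the content of <meta name="generator"> if the page has one."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")


def detect_engine(html):
    """Guess (engine, version) from a page's HTML.

    Hypothetical helper: returns (None, None) when no generator tag is found,
    in which case another detection method would be needed.
    """
    parser = GeneratorParser()
    parser.feed(html)
    if not parser.generator:
        return None, None
    name, _, version = parser.generator.partition(" ")
    return name, version or None


# Illustrative page fragment, not taken from a real site.
page = '<html><head><meta name="generator" content="MediaWiki 1.23.5"/></head></html>'
engine, version = detect_engine(page)
```

A real script would fetch each IWM URL first and fall back to other hints (API endpoints, page footers) when the tag is missing.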

Also, each non-MediaWiki wiki site on the IWM needs to be represented by an instance of a separate Site class. Several points must be considered when creating Site instances for sites not powered by MediaWiki. A few of them are:

  • Raise NotImplementedError for attributes which the wiki engine supports but PWB does not yet implement.
  • Raise NotSupportedError for attributes the wiki engine does not support.
  • Create a hierarchy of classes of the wiki engines used by the site.
  • Sites with multiple entries under different names should be instantiated only once.
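The points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the get_site registry and the choice of which method counts as unsupported are all hypothetical, and NotSupportedError is defined locally here because the proposal names it without saying where it would live.

```python
class NotSupportedError(Exception):
    """Hypothetical error: the wiki engine itself lacks the feature."""


class NonMediaWikiSite:
    """Hypothetical base class for sites that do not run MediaWiki."""

    engine = None

    def __init__(self, url):
        self.url = url

    def all_pages(self):
        # The engine may offer this, but no code has been written for it yet.
        raise NotImplementedError("all_pages is not implemented for %s" % self.engine)


class MoinMoinSite(NonMediaWikiSite):
    """One level of the engine hierarchy; further engines would sit beside it."""

    engine = "MoinMoin"

    def log_events(self):
        # Assumed, for illustration only, to be a feature the engine lacks.
        raise NotSupportedError("log_events is not supported by %s" % self.engine)


_sites = {}


def get_site(url, site_class):
    """Return one shared instance per URL, however many IWM entries point to it."""
    key = url.rstrip("/")
    if key not in _sites:
        _sites[key] = site_class(url)
    return _sites[key]


# Two IWM entries with the same target resolve to the same object.
first = get_site("http://wiki.example.org/", MoinMoinSite)
second = get_site("http://wiki.example.org", MoinMoinSite)
```

Keeping the two error types distinct lets a test suite tell "work still to do" apart from "will never work on this engine".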

These standard checks can be automated with a unittest metaclass. It is, however, very likely that the PWB framework does not support these sites in its current form. Depending upon the test results, modifications to PWB will be made. One way to make these non-MediaWiki sites more compatible is to extend support to the wiki engines themselves. The second half of my project proposal is to add basic support for major wiki engines that are conceptually similar to MediaWiki. This can be done by adding more features to the Site classes created in the IWM project. The support would also include handling alternate wiki syntax and allowing transfer of content between wiki engines. This is closely related to the InterWiki project and serves the purpose of extending PWB functionality to more wiki projects.
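One possible shape for such a unittest metaclass is sketched here. The SITES mapping and the check performed are placeholders chosen for illustration; the real checks would exercise the Site classes themselves.

```python
import unittest

# Hypothetical sample: IWM prefix -> engine detected for that site.
SITES = {"examplewiki": "MediaWiki", "examplemoin": "MoinMoin"}

KNOWN_ENGINES = {"MediaWiki", "MoinMoin"}


class SiteTestMeta(type):
    """Add one test method per entry in SITES to the class being created."""

    def __new__(mcs, name, bases, namespace):
        for prefix, engine in SITES.items():
            # Default arguments bind the current loop values to each method.
            def test(self, prefix=prefix, engine=engine):
                self.check_site(prefix, engine)

            test.__name__ = "test_" + prefix
            namespace[test.__name__] = test
        return super().__new__(mcs, name, bases, namespace)


class TestInterwikiSites(unittest.TestCase, metaclass=SiteTestMeta):

    def check_site(self, prefix, engine):
        # Placeholder check: every listed site must map to a known engine.
        self.assertIn(engine, KNOWN_ENGINES, prefix)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestInterwikiSites)
result = unittest.TestResult()
suite.run(result)
```

Because each site gets its own named test method, a failure report points directly at the IWM entry that caused it.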

I also understand the importance of documentation: a well-documented project is clearer and makes code easier to review and maintain. Hence, I will document my code as I write it.

Timeline (December 2014 - March 2015)

Community Bonding Period

1 Nov – 12 Nov

  • Familiarize myself more with the community and code.
  • Take advice from community members on what additional features should be added to the Site class.
  • Read and understand the relevant existing code of PWB.

Week 1 – Week 2

13 Nov – 28 Nov

If accepted, I would like to get a head start by starting early.

Goal : Add support for all MediaWiki sites.

  • Create a Family class for each site.
  • Run the Python test suite on each MediaWiki site, skipping tests which are not relevant to the site.

Week 3 – Week 5

14 Dec - 4 Jan

Goal : Add support for all the non-MediaWiki wiki sites.

  • Write a distinct Site class for each non-MediaWiki site.
  • Make each new Site class raise the errors described in the Implementation Plan.
  • Add features after discussion with mentors.

Week 6

5 Jan – 11 Jan

Goal - Add support for all non-wiki sites.

  • While non-wiki sites will not require any functionality to be added to PWB, there are many non-wiki sites on the IWM. It will be time consuming to analyze each of them.
  • Generate a document listing all the sites on IWM and describing their wiki engines, version, etc.
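A first draft of the site listing could be generated along these lines. The sample data is hypothetical and only mimics the shape of a MediaWiki API siteinfo response requested with siprop=interwikimap; the engines mapping stands in for the results of the engine detection step, and the function names are illustrative.

```python
import json
from collections import defaultdict

# Hypothetical sample shaped like the output of
# api.php?action=query&meta=siteinfo&siprop=interwikimap&format=json
SAMPLE_RESPONSE = json.loads("""
{"query": {"interwikimap": [
  {"prefix": "examplewiki", "url": "http://wiki.example.org/$1"},
  {"prefix": "ew", "url": "http://wiki.example.org/$1"},
  {"prefix": "examplebugs", "url": "http://bugs.example.org/show?id=$1"}
]}}
""")


def group_prefixes_by_url(response):
    """Collect every prefix that points at the same URL pattern."""
    groups = defaultdict(list)
    for entry in response["query"]["interwikimap"]:
        groups[entry["url"]].append(entry["prefix"])
    return dict(groups)


def render_report(groups, engines):
    """Return one 'prefixes | url | engine' line per distinct site."""
    lines = []
    for url in sorted(groups):
        prefixes = ", ".join(sorted(groups[url]))
        engine = engines.get(url, "unknown")
        lines.append("{} | {} | {}".format(prefixes, url, engine))
    return "\n".join(lines)


groups = group_prefixes_by_url(SAMPLE_RESPONSE)
# Assumed detection results; sites without an entry are reported as unknown.
report = render_report(groups, {"http://wiki.example.org/$1": "MediaWiki"})
print(report)
```

Grouping by URL also shows which sites appear on the IWM under several prefixes, so each is analyzed only once.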

Week 7 – Week 9

12 Jan – 1 Feb

Goal - Add basic functionality for non-MediaWiki wiki sites.

  • Discuss with mentors about various wiki engines that can be added and advantages or difficulties in adding them. Choose an engine which is most widely used by the sites listed in the IWM.
  • Two basic functions to be added for the export functionality are:
    1. Get list of all pages
    2. Get page wikitext
  • Add more implementation to the new Site class after discussion with mentors.
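The two export functions might be expressed as a small interface that each engine-specific Site class fills in. This is a sketch only: ExportableSite, export and InMemorySite are hypothetical names, and the in-memory class exists purely to exercise the interface without a real wiki.

```python
from abc import ABC, abstractmethod


class ExportableSite(ABC):
    """Hypothetical minimum interface needed to export a wiki's content."""

    @abstractmethod
    def all_pages(self):
        """Return the titles of all pages on the site."""

    @abstractmethod
    def page_text(self, title):
        """Return the wikitext of the page with the given title."""


def export(site):
    """Read every page once and return a title -> wikitext mapping."""
    return {title: site.page_text(title) for title in site.all_pages()}


class InMemorySite(ExportableSite):
    """Stand-in used only to exercise the interface without a real wiki."""

    def __init__(self, pages):
        self._pages = dict(pages)

    def all_pages(self):
        return sorted(self._pages)

    def page_text(self, title):
        return self._pages[title]


dump = export(InMemorySite({"FrontPage": "Welcome", "Help": "Ask on IRC"}))
```

With only these two methods in place, content transfer between engines reduces to exporting from one Site and writing to another.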

Week 10 – Week 12

1 Feb – 28 Feb

Goal - Add support for the wiki XML-RPC interface.
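To give an idea of what XML-RPC support involves, the sketch below builds and decodes a request with Python's standard xmlrpc.client module, without contacting any server. The method name wiki.getPage and the page name are assumptions for illustration; the methods a given engine actually exposes would have to be confirmed against that engine's documentation.

```python
import xmlrpc.client


def build_call(method, *params):
    """Serialize an XML-RPC request body for the given method and parameters."""
    return xmlrpc.client.dumps(params, methodname=method)


# Assumed method and page name, used only for illustration.
request_body = build_call("wiki.getPage", "FrontPage")

# Decoding the body again shows what a server on the other end would receive.
params, method = xmlrpc.client.loads(request_body)

# Against a live endpoint, the same call would go through a proxy object:
#   proxy = xmlrpc.client.ServerProxy(endpoint_url)  # endpoint_url is site-specific
#   text = proxy.wiki.getPage("FrontPage")
```

Because the request format is the same for every engine that implements the interface, one client class in PWB could serve several wiki engines.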

Week 13

1 March – 7 March

A buffer week left for debugging, testing and refining documentation.

Evaluation Period

8 March - 9 March

PS: Documentation will run in parallel with the code written each week.


Project Preparation

I can devote 40-50 hours per week to the project, as I have no other major commitments that would disturb my timeline during the internship period. I will be occupied only during the first week of the official project period, i.e. 9 – 14 December, which I have compensated for in the third week of November.

I have also had many discussions about the project with my mentor and have his agreement on the timeline and other specific details. I will use the time before the official project period begins to become more familiar with the project.