Google Summer of Code/2021

Read information for participants  Read information for mentors

Program timeline
Also see the full timeline.

Ideas for projects
'''Watch this space for project ideas! We will add ideas below between now and when the application period opens.'''

Add autocompletion to Page Forms spreadsheet display
The Page Forms extension provides a spreadsheet-style editing display in two places: in the page Special:MultiPageEdit (for editing many pages at the same time) and in regular forms, with the setting "display=spreadsheet" (for editing multiple instances of the same template within one page). This spreadsheet-style editing can make use of a number of helpful input types (dropdown, date, etc.) for different cells but, unlike regular Page Forms forms, it does not support autocompletion. This project would add this capability.

More details: T273530

Skills required: PHP, JavaScript

Mentors: Yaron Koren, Sahaj Khandelwal

Upgrade WebdriverIO to the latest version for all repositories
MediaWiki is a free and open-source wiki application. It powers many websites, including Wikipedia, Wiktionary, and Wikimedia Commons.

Delivering high-quality software with unique and innovative features is always a priority. However, these qualities cannot be guaranteed without evaluating software components under various expected and unexpected conditions. Therefore, every software component, large and small, is tested. To increase the efficiency of the testing process and to maximize test coverage, automation is imperative. Automation not only improves the quality of testing but also makes it faster and reduces the cost involved.

We use WebdriverIO as the browser driver for our test automation framework. We are currently using version 6 (two repositories are still at v4). WebdriverIO v7 was released on February 9, 2021. CodeSearch: v4, v6

The major goal of the Google Summer of Code 2021 internship is to migrate all repositories to v7.

More details: T274579

Skills required: JavaScript (Node.js), WebdriverIO

Mentors: Soham Parekh, Vidhi Mody

Update the front-page of Wikimedia projects
In 2016, the front page of Wikipedia, www.wikipedia.org, underwent a subtle refresh. The code was moved from a series of scripts on meta.wikimedia.org into a Git repository utilizing Mustache templates and a build step to generate the final HTML page. Unfortunately, Wikimedia’s other projects, like Wikiquote, Wikisource, Wikibooks, etc., were left out of this refresh and their pages are still generated via the scripts on meta.wikimedia.org.

This project aims to convert these pages into HTML templates so that they can run through the same build step as www.wikipedia.org.

More Information:


 * Phabricator tag: Wikimedia-Portals
 * Workboard: https://phabricator.wikimedia.org/project/board/1619/
 * Task details: T273179

Skills required: This project is well suited for anyone interested in semantic HTML & CSS and front-end build steps. The specific technologies involved are: Handlebars templates, Less CSS, plain JavaScript, Node.js scripts and a build step powered by Gulp.js. There is no server-side component to this project.

Mentor: Jan Drewniak

Develop a web dashboard or a command line tool to help inventory and/or monitor database and backup objects
Wikimedia uses over 200 MariaDB instances to store content and metadata for Wikipedia and other free knowledge projects. It also uses Bacula, mydumper and xtrabackup to perform backups of its hundreds of terabytes of data in its infrastructure. While standard open source tools for both monitoring and automation are used when possible, there are some tasks that require custom development.

The following are a few suggestions to choose among for improving existing monitoring tools, or developing web frontends for them, in order to maintain an inventory and track objects distributed across the database and backup infrastructure:


 * MariaDB instances inventory (zarcillo web interface)
 * Database object inventory
 * Database backup inventory improvements
 * Improve WMF Bacula monitoring
 * MySQL account metadata inventory
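
As an illustration of the kind of check a backup inventory could support, here is a minimal Python sketch that finds database instances whose newest backup is missing or too old. Everything in it is an assumption made for illustration: the `BackupRecord` fields, the function names and the seven-day threshold are hypothetical and do not describe the existing Wikimedia tooling or its schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BackupRecord:
    """One finished backup of one database instance (hypothetical schema)."""
    instance: str          # instance identifier, made up for this sketch
    finished_at: datetime  # when the backup completed
    size_bytes: int        # size of the resulting backup


def latest_backups(records: Iterable[BackupRecord]) -> Dict[str, BackupRecord]:
    """Return the most recent backup record for each instance."""
    latest: Dict[str, BackupRecord] = {}
    for record in records:
        current = latest.get(record.instance)
        if current is None or record.finished_at > current.finished_at:
            latest[record.instance] = record
    return latest


def stale_instances(
    instances: Iterable[str],
    records: Iterable[BackupRecord],
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed threshold
) -> List[str]:
    """List instances with no backup, or whose newest backup is older than max_age."""
    latest = latest_backups(records)
    stale = []
    for name in instances:
        backup = latest.get(name)
        if backup is None or now - backup.finished_at > max_age:
            stale.append(name)
    return sorted(stale)
```

A dashboard or command line tool could call `stale_instances` with the inventory of known instances and the list of recorded backups, then display or alert on the result.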

More Information:
 * Phabricator task: T274636

Skills required: CLI or web development basics, Python (preferred, probably with Django or Flask), PHP, basic knowledge of databases and SQL/file management

Mentors: Jaime Crespo aka #jynus (@jcrespo), Manuel Arostegui (@marostegui).

Write MediaWiki userscript tutorial with adventure tour
User scripts are programs written in JavaScript by users for use on Wikimedia projects. They enable a Wikimedia user account to do many things it otherwise couldn't, such as changing the DOM, appending HTML snippets to the DOM, and changing the interface in response to browser events.

The project is about creating a guided adventure tour on MediaWiki and Meta-Wiki to give users insight into “How to create a userscript on Wikimedia projects”. The adventure will be designed like The Wikipedia Adventure and will have 3-4 missions.

More Information:
 * Phabricator task: T274635

Skills required: HTML, CSS, JavaScript, and jQuery

Mentors: Jayprakash and KCVelaga

Add zooming and panning to the Wikisource Pagelist Widget
The Wikisource Pagelist Widget is an OOUI based widget that streamlines the process of creating a pagelist for new (and existing) users of Wikisource.

Currently, while using the Pagelist widget, the user is presented with the picture of a scanned page and is asked to identify the page number on the scan. However, there is no option to zoom or pan the scanned image inside the Pagelist widget. Adding the option to zoom and/or pan the image will allow users to see the page number for scans that have a very small font or lots of text (for example, newspaper scans).

During the course of the project, the following should be accomplished:
 * Review and test various zooming and panning libraries available.
 * Integrate the library with the current code of the pagelist widget.
 * Work on integrating the library with ResourceLoader (the system used by Wikimedia to serve JavaScript, CSS and image assets).
 * If time permits, work on replacing the old jQuery-based zooming and panning library used by the Page: namespace editor.

More details

 * Phabricator Task: T262146
 * Skills required: JavaScript (a basic knowledge of PHP may be required)
 * Mentors: Sohom Datta, Satdeep Gill, Sam Wilson

Gamified Knowledge Base Completion Plugin for Wikibase/Wikidata
Open data collections -- like Wikidata -- are created and maintained by volunteers and are thriving on community knowledge. Missing or new information needs to be added by community members, otherwise the knowledge base will dry out and be obsolete after a while. Hence, completing a knowledge base is a crucial task within any community-driven initiative. Consequently, helping people to integrate their knowledge into a knowledge base will benefit the growth, correctness, and topicality of Wikidata.

In our earlier research [1] we already showed that it is possible to find outliers in graph-based knowledge bases which need to be checked by experts to ensure data quality. Our recent implementation of a tool called Wikidatacomplete ( https://wikidatacomplete.org ) (cf. [2]) shows how facts are extracted from text and offered to users for validation. After this step, the validated fact is pushed to Wikidata for later integration. Hence, identifying demand for Wikidata completion is possible and has already proven its value.

We propose here to implement a Wikibase plugin that is dedicated to facilitating the Wikidata completion process. While the user navigates through Wikidata, the plugin will show facts extracted from textual sources as well as from other knowledge bases (e.g., Wikipedia) which need to be validated. Hence, the Wikibase plugin shows users suggestions of facts that should be added or changed within the Wikidata knowledge base. Previously developed services will be used to compute the suggestions.

Additionally, a badge Web service interface needs to be integrated, allowing users to display their badges on their profiles (e.g., on Wikidata’s user page, GitHub profile, LinkedIn profile) to show their dedication and motivate other users to contribute, too. A rule-based system for earning badges needs to be implemented.
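
One possible shape for such a rule-based system is sketched below in Python, purely to illustrate the idea; the actual plugin would more likely be written in JavaScript or PHP. The badge names, the thresholds and the statistic keys (`validated_facts`, `active_days`) are all hypothetical and are not part of the proposal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Per-user contribution counters, e.g. {"validated_facts": 12} (hypothetical keys).
UserStats = Dict[str, int]


@dataclass(frozen=True)
class BadgeRule:
    """A badge together with the condition a user must meet to earn it."""
    name: str
    condition: Callable[[UserStats], bool]


# Example rules; names and thresholds are made up for illustration.
RULES: Sequence[BadgeRule] = (
    BadgeRule("First validation", lambda s: s.get("validated_facts", 0) >= 1),
    BadgeRule("Fact checker", lambda s: s.get("validated_facts", 0) >= 100),
    BadgeRule("Steady contributor", lambda s: s.get("active_days", 0) >= 30),
)


def earned_badges(stats: UserStats, rules: Sequence[BadgeRule] = RULES) -> List[str]:
    """Return the names of all badges whose rule is satisfied by the given stats."""
    return [rule.name for rule in rules if rule.condition(stats)]
```

Keeping each rule as data (a name plus a predicate) means new badges can be added without touching the evaluation logic, and the badge Web service would only need to expose the result of `earned_badges` for a given user.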

[1] Didier Cherix, Ricardo Usbeck, Andreas Both, and Jens Lehmann (2014). Lessons learned—the case of crocus: Cluster-based ontology data cleansing. In European Semantic Web Conference (pp. 14-24). Springer, Cham.

[2] Bernhard Kratzwald, Guo Kunpeng, Stefan Feuerriegel, and Dennis Diefenbach. IntKB: A Verifiable Interactive Framework for Knowledge Base Completion. International Conference on Computational Linguistics (COLING), 2020

More details

 * Phabricator Task: T275102
 * Skills required: JavaScript (a basic knowledge of PHP may be useful)
 * Mentors: DD063520, AnBo-de, Gabinguo, Aleksandr.perevalov

Contact

 * Reach out for general questions on the #gsoc21-outreachy22 Zulip chat (preferred) or send an email to the organization administrators: Srishti Sethi (ssethi@wikimedia.org), Pavithra Eswaramoorthy (pavithraes@outlook.com), Gopa Vasanth (gopavasanth1999@gmail.com) and Ankit Maity (ankitmaity@outlook.com).
 * Ask a technical question in the support places.