Extension:RDFIO/Template matching for RDFIO

Visual Import and Template Matching for RDFIO

 * Public URL: https://www.mediawiki.org/wiki/User:Zahara/OPW_Proposal_Round_8
 * Bugzilla report: Bug 61999
 * Announcement: http://lists.wikimedia.org/pipermail/wikitech-l/2014-March/075286.html

Name and contact information

 * Name: Ali King
 * Email: 1.alison.king[at]gmail.com
 * IRC or IM networks/handle(s): zahara
 * Web Page / Blog / Microblog / Portfolio: @ali_king
 * Resume (optional):
 * Location: Edinburgh, Scotland
 * Typical working hours: 9:30-18:00 BST (UTC + 1)

Synopsis
There are a number of areas open for development in the RDFIO extension. I intend to address one main area (adapting the extension to use semantic templates) as well as several smaller ones. The extension originated in a previous GSoC project, and I hope that my work will improve its usability and encourage its adoption by Semantic MediaWiki users.

The main focus of this project is matching RDF data imported through the RDFIO extension to Semantic MediaWiki (SMW) template types. Suitable templates for the data can be calculated from the entity types and relationships it contains. This may produce more than one match; in any case, the user will be able to select one of the top matches or any other template that fits the data.
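
As a rough illustration of how candidate templates might be ranked, the sketch below scores each template by the share of an entity's RDF predicates that its fields could hold, and returns the best few for the user to choose from. The template names, their predicate mappings and the `rank_templates` function are hypothetical (they are not part of RDFIO or SMW), and simple coverage is only one possible scoring heuristic.

```python
# Hypothetical mapping from SMW template names to the RDF predicates
# (as prefixed names) that their fields are assumed to accept.
TEMPLATE_FIELDS = {
    "Person": {"foaf:name", "foaf:mbox", "foaf:knows", "foaf:homepage"},
    "Organization": {"foaf:name", "foaf:homepage", "foaf:member"},
    "Document": {"dc:title", "dc:creator", "foaf:topic"},
}


def rank_templates(entity_predicates, templates=TEMPLATE_FIELDS, top_n=3):
    """Return up to top_n (template, score) pairs, best match first.

    The score is the fraction of the entity's predicates that the
    template has a field for; templates covering nothing are dropped.
    """
    predicates = set(entity_predicates)
    if not predicates:
        return []
    scored = []
    for name, fields in templates.items():
        covered = len(predicates & fields)
        if covered:
            scored.append((name, covered / len(predicates)))
    # Highest coverage first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Example: an entity described with three FOAF predicates.
print(rank_templates(["foaf:name", "foaf:mbox", "foaf:homepage"]))
# [('Person', 1.0), ('Organization', 0.6666666666666666)]
```

Returning a ranked list rather than a single answer fits the plan of offering the user the top matches while still letting them pick any other template.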

This first requires mapping the most common ontologies (such as FOAF) to Semantic MediaWiki template types. More specialised ontologies reference these common types in the definitions of their own types, and these references can be used to select likely candidate templates. User input matching these types will then inform the mapping of those ontologies to templates. This matching data could itself be stored as a triple store, or in a conventional database if that proves more practical.
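
One way the fallback for specialised ontologies could work is to follow a type's rdfs:subClassOf links upward, one level at a time, until reaching a type that already has a template mapping. In this hypothetical sketch, `TYPE_TO_TEMPLATE` stands in for the initial FOAF mapping, `SUBCLASS_OF` for subclass statements read from an ontology, and the `ex:` types are invented for illustration.

```python
# Hypothetical starting mapping from common (FOAF) classes to SMW templates.
TYPE_TO_TEMPLATE = {
    "foaf:Person": "Person",
    "foaf:Organization": "Organization",
}

# Hypothetical rdfs:subClassOf statements: class -> list of direct superclasses.
SUBCLASS_OF = {
    "ex:Researcher": ["ex:Academic"],
    "ex:Academic": ["foaf:Person"],
    "ex:University": ["foaf:Organization"],
}


def candidate_templates(rdf_type, mapping=TYPE_TO_TEMPLATE, parents=SUBCLASS_OF):
    """Return the templates mapped to the nearest ancestor types of rdf_type.

    Walks rdfs:subClassOf links breadth-first, one level at a time, and stops
    at the first level where any type has a template. The `seen` set guards
    against cycles in the class hierarchy.
    """
    seen = {rdf_type}
    level = [rdf_type]
    while level:
        hits = sorted({mapping[t] for t in level if t in mapping})
        if hits:
            return hits
        next_level = []
        for t in level:
            for parent in parents.get(t, []):
                if parent not in seen:
                    seen.add(parent)
                    next_level.append(parent)
        level = next_level
    return []


print(candidate_templates("ex:Researcher"))  # ['Person']
print(candidate_templates("ex:Unknown"))     # []
```

When a user confirms a match for a specialised type, that choice could simply be added to the mapping (whether kept as triples or in a database table), so later imports of the same type resolve directly.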

A related proposed project is Wikidata Article Generation, which would create ad hoc pages from data without requiring an editor to write them explicitly. I will consult its author on using data to populate templates, and on whether this shared functionality could be incorporated into core.

Another item on the project roadmap is removing the extension's dependency on the Wiki Object Model. This needs further investigation into the complexity of the parsing involved, to assess whether it is feasible to implement within the timeframe.

There is also a need for improved documentation and tutorial examples to make the extension as user-friendly as possible. I will document the new developments as I work on them, and will also make sure that all existing functionality is covered.

The main benefit of this project is to open up the extension to less technically skilled users who may not be familiar with manipulating RDF data, and to encourage the use and import of existing data stores. It also improves the extension's integration with Semantic MediaWiki, in terms of how it stores and processes data using templates.

 * Possible mentors: Joel Sachs, Samuel Lampa

Deliverables
Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in (discuss these first with the mentor of the project):

Participation
So far, I have discussed details with the mentors over Google Hangouts. I have also used HipChat as a team communication tool in the past, but am open to any channel that works for the team as a whole.

I have also joined the project's Trello board, which has been very useful for seeing the overall project roadmap and status. In the past I have used 15five and WeekDone to communicate weekly aims and achievements, so I will discuss with the mentors which progress-reporting methods best suit their schedules.

The source code for RDFIO is currently hosted on GitHub, so I intend to keep it there and follow standard MediaWiki development workflows.

I will also attend meetups such as those organised by the Open Knowledge Foundation, which are great for meeting experts in open data and semantic technology, and will arrange specific meetings with other people who may be able to offer advice.

About you
In September 2012 I left my public sector job to become a professional programmer. I found a mentor and began working at a startup company on projects using JavaScript, jQuery, XSLT, Ruby and Rails, along with other Ruby gems.

I also gained my first contract, developing reports in MS SQL Server for the National Health Service. This involved database design, visual dashboard design, training staff in SQL, and remote working.


 * Education completed or in progress:

(All completed; I am currently undertaking informal learning.)

Computing for Data Analysis (Coursera, completed with distinction)

CIW Database Design, Web Languages (JavaScript & Perl), Web Design, Web Foundation

Studied for a BEng in Electronics with Music at the University of Glasgow.


 * How did you hear about this program?

Tweet from the Ada Initiative


 * Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?

No. I am a contract IT worker, and my current contract will have finished by the time the program begins. I do not plan to start my next contract until September.


 * We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?

No. I am not a student, so I am not eligible for GSoC.

Past experience

 * Please describe your experience with any other FOSS projects as a user and as a contributor:

I have not previously contributed to open source, although it is something I have wanted to do for a while, and I now feel my skills and experience are up to the job. I have, however, researched the feasibility of using open data exchange formats, and attended Open Knowledge Foundation events in Edinburgh for that purpose.


 * Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them (include links):

The most important thing I have learned from previous projects is how important it is to make data import technically accessible to your target users. At Skills Development Scotland I worked on a project implementing a bulk import facility using the XCRI schema. The first implementation asked learning providers to submit their course data as a schema-compliant XML file, which few of them had the skills or resources to do. A second implementation used a spreadsheet template that mapped to the schema, but gave insufficient feedback on input errors. I was not in a position to influence the original project design, but I gained a great deal of insight into the issues users faced. While researching other upload interfaces during this project and my subsequent work at a startup company in the same field, I came to understand the importance of visual feedback in the data import process.

 * What project(s) are you interested in (these can be in the same or different organizations)?