Extension:RDFIO/Template matching for RDFIO

Template Matching for RDFIO

Public URL
https://www.mediawiki.org/wiki/Extension:RDFIO/Template_matching_for_RDFIO
Bugzilla report
Bug 61999
Announcement
http://lists.wikimedia.org/pipermail/wikitech-l/2014-March/075286.html

Name and contact information

Name
Ali King
Email
1.alison.king[at]gmail.com
IRC or IM networks/handle(s)
zahara
Web Page / Blog / Microblog / Portfolio
Blog Twitter
Resume (optional)
LinkedIn
Location
Edinburgh, Scotland
Typical working hours
9:30-18:00 BST (UTC+1)

Synopsis

The RDFIO extension has a number of open development points. I intend to address one major item (adapting the extension to use semantic templates) along with several minor ones. The extension originated from a previous GSoC project, and I hope that my work will improve its usability and encourage its adoption by Semantic MediaWiki users.

The main focus of this project is matching RDF data imported through the RDFIO extension to Semantic MediaWiki (SMW) templates. The appropriate templates for the data can be determined from the entity types and relationships. This may produce more than one match; in every case, the user will be able to choose one of the top matches or any other template that fits the data.

This requires the initial mapping of the most common ontologies (such as FOAF) to Semantic MediaWiki template types. Other specialised ontologies reference these common types in the definition of their own types, which can be used to select likely candidate templates. User input to match these types will then inform the mapping of these ontologies to the templates. This matching data could be stored as a triple store itself, or in a conventional database if this is more practical. The existing code uses the Equivalent URI special property to match pages and properties, so I will investigate the best way to incorporate this into the new functionality.

A related proposed project is Wikidata Article Generation, which would create ad hoc pages from data without requiring an editor to write them explicitly. I will consult with its author on the use of data to populate templates, and on whether this common functionality might be incorporated into core.

There has also been some discussion of merging RDFIO with other extensions that offer similar functionality. Although this is beyond the scope of the project, I will look at the differences between these extensions and seek to communicate with their developers on the subject of harmonisation.

Another item on the project roadmap is to remove the extension's dependency on the Wiki Object Model. I need to investigate how complex the required parsing is before I can assess whether this is feasible within the timeframe.

There is also a requirement for improved documentation and tutorial examples to make the extension as user-friendly as possible. I will write these as part of the new development work, and will also make sure that all of the existing functionality is covered.

The main benefit of this project is to connect Semantic MediaWiki more strongly with the rest of the semantic web, allowing much easier interoperability with existing resources. It will also open up the extension to users who manage domain-specific knowledge but are less familiar with manipulating RDF data, and encourage the use and import of pre-existing data stores. Finally, it improves the extension's internal integration with Semantic MediaWiki, in terms of how it stores and processes data using templates.

Possible mentors

Joel Sachs, Samuel Lampa

Deliverables

Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in (discuss these first with the mentor of the project):

Week | Dates | Task
-1 | 19th March - 20th April | Research and exploration of the extension. Fix Bug 61027 (incorrect file path). Attend Edinburgh Wikimedia meetup (March 30th) to discuss ideas.
0 | 21st April - 18th May | Research and investigation of RDFIO and Semantic MediaWiki internal workings, contacting other developers whose expertise may be of use. Assisting with preparing presentation for SMWCon. Assessment of work required to refactor RDFIO to remove Wiki Object Model dependency, and whether it can be included in this schedule.
1 | 19th May - 25th May | Travel to Canada in order to attend SMWCon and meet with mentor Joel Sachs in person. Prepare and deliver presentation at SMWCon covering development so far and design decisions, and soliciting feedback from SMW developers and users. Use case elicitation/development.
2 | 26th May - 1st June | Diagramming the existing extension architecture and functionality for documentation purposes, and proposed changes to the system. Use case elicitation/development. Deliverable: system diagrams to add to project documentation.
3 | 2nd June - 8th June | Prototyping of template-mapping functionality. Use case refinement. Discussion and planning of development workflows and setup.
4 | 9th June - 15th June | Prototyping. Use case refinement. Establishment of development setup. Testing of existing functionality and design of new unit tests.
5 | 16th June - 22nd June | Publish prototype as a demo and solicit feedback. Deliverable: demo published to GitHub.
6 | 23rd June - 29th June | Mid-term assessment: review of project progress and priorities for next phase.
7 | 30th June - 6th July | Incorporation of feedback into prototype, refactoring where required.
8 | 7th July - 13th July | Addition of prototype to new feature branch, integration with existing extension. 'Lightning talk' presentation at Open Knowledge meetup at the Scottish Parliament.
9 | 14th July - 20th July | Integration development.
10 | 21st July - 27th July | Development/testing.
11 | 28th July - 3rd August | Development/testing.
12 | 4th August - 10th August | Deployment and user testing, update documentation. Deliverable: new version of extension with added functionality.
13 | 11th August - 18th August | Final patches and documentation. Deliverable: code changes where needed, documentation.

Participation

I have so far discussed details with the mentors using Google Hangouts. I've also previously used HipChat as a team communication tool, but am open to any channel which works for the team as a whole.

I have also joined the project board on Trello, which has been very useful for seeing the overall project roadmap and status. In the past I have used 15five and WeekDone to communicate aims and achievements on a weekly basis, so I will discuss with the project mentors which progress-reporting methods best suit their schedules.

The source code for RDFIO is currently on GitHub, so I intend to keep it there and use the standard MediaWiki workflows.

I will also attend meetup events such as those organised by the Open Knowledge Foundation, which are great for meeting experts in open data and semantic technology, and also arrange specific meetings with other people who may be able to offer advice.

Project progress reports

About you

In September 2012 I quit my public sector job in order to become a professional programmer. I secured a mentor and began working at a startup company on projects using JavaScript, jQuery, XSLT, PostgreSQL, Ruby, Rails and other gems.

I also gained my first contract work doing report development work in MS SQL Server for the National Health Service. This involved database design, visual dashboard design, training staff in SQL, and remote working.

Education completed or in progress

(all completed - currently undertaking informal learning)

Computing for Data Analysis (Coursera, completed with distinction)

CIW Database Design, Web Languages (JavaScript & Perl), Web Design, Web Foundation

Studied for a BEng in Electronics with Music at the University of Glasgow.

How did you hear about this program?

Tweet from the Ada Initiative

Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?

No - I am a contract IT worker, and my current contract will have finished by the time the program begins. I do not plan to start my next contract until September.

We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?

I am not a student, so I am not eligible for GSoC.


Past experience

Please describe your experience with any other FOSS projects as a user and as a contributor

I have not previously contributed to open source, although it is something I have wanted to do for a while, and I now feel my skills and experience are up to the job. I have, however, researched the feasibility of using open data exchange formats, and attended Open Knowledge Foundation events in Edinburgh to this end.

Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them (include links)

The most important thing I have learned from previous projects is the importance of making data import technically accessible to your target users. I worked on a project at Skills Development Scotland implementing a bulk import facility using the XCRI schema. This initially asked for learning providers to submit their course data as a schema-compliant XML file, which few had the skills or resources to do. A second implementation used a spreadsheet template which mapped to the schema, but gave insufficient feedback on any errors in input. I was not in a position to influence the original project design, but gained a great deal of insight into issues faced by the users. In researching other upload interfaces during the course of this and my subsequent work at a startup company in the same field, I have come to understand the importance of feedback and usability in the data import process.

What project(s) are you interested in (these can be in the same or different organizations)?

I would also be interested in the two Semantic Forms projects if this one is unavailable.

Any other info

See also