User:SHL/GSoC2010

Identity
Name: Samuel Lampa
Email: samuel.lampa[at]gmail.com
Project title: General RDF export/import functionality for Semantic MediaWiki

Contact/working info
Timezone: Sweden (GMT +1)
Typical working hours: 14:00 - 02:00
IRC or IM networks/handle(s): Skype: samuel_lampa, IRC: freenode/samuell

Project summary
Extend the import/export functionality of Semantic MediaWiki (SMW) to also allow full, general RDF import.

The background for the idea (for me) is to enable the use of SMW as a general collaborative RDF editor that can be integrated with workflow systems or scriptable workbench software (such as Bioclipse), enabling workflows of the type:

Import RDF to Wiki --> Collaboratively edit --> Export back in same format

The project would, however, include a general reworking of the import/export functionality, which that specific use case can then take advantage of.

Suggested project plan
(Based on mail conversation with Denny Vrandecic)

1) Replace the RAP connection, using it as a starting point for connecting ARC with SMW, and probably with the recently introduced SMWWriter API, to allow the creation of a SPARQL/SPARQL Update API to the wiki knowledge base.

2) Improve the equivalent URI functionality so that it can be used for mapping URIs to wiki articles at import and export, and replace the current vocabulary import feature.

The order of 1) and 2) does not matter.

3) Connect the SPARQL endpoint with the equivalent URI feature so that users can use their own vocabulary when querying the wiki.

4) Design an improved RDF export feature that allows specifying an ontology to use for export.

5) Implement import and update of RDF to the wiki, preferably using SPARQL Update as the interface. Implement a namespace-based mapping of wiki titles to RDF base URIs for that (more info below).

6) Define a way of using mapping properties in the RDF, and implement a mapping tool. The latter is optional.

During the whole time: document, support, release.
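To make the idea of using SPARQL Update as the import interface a bit more concrete, here is a minimal Python sketch that turns a list of triples into an `INSERT DATA` request body. It assumes the endpoint accepts standard SPARQL 1.1 Update syntax; the syntax actually supported by ARC may differ. The helper names `term` and `insert_data` are hypothetical and only for illustration, and the IRI detection is deliberately naive.

```python
def term(value: str) -> str:
    """Format a value as an IRI if it looks like one, else as a plain literal.

    Naive on purpose: anything starting with http:// or https:// is
    treated as an IRI, everything else becomes a quoted string literal.
    """
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def insert_data(triples) -> str:
    """Build a SPARQL Update INSERT DATA request from (s, p, o) tuples."""
    lines = [f"  {term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    # Hypothetical example data, not taken from any real wiki.
    print(insert_data([
        ("http://example.org/chem/Benzene",
         "http://example.org/chem/hasFormula",
         "C6H6"),
    ]))
```

The resulting string would be sent to the wiki's SPARQL Update endpoint, which would then be responsible for creating or updating the corresponding wiki pages.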

Using namespace mapping for wiki titles on fresh wikis
On importing RDF with nodes that have no corresponding wiki article, one has to choose the name of the wiki articles somehow.

(This is especially the case in the original use case of connecting an empty Semantic MediaWiki to a workflow tool for use as a collaborative RDF editor).

Using the full URIs as titles is impractical, and using just the part after the "base URI" risks creating duplicate pages.

The suggested solution is to allow a mapping (specified in the import data or in a config article) of RDF base URIs to MediaWiki (pseudo-)namespaces. (The fallback behaviour, where the mapping for a certain base URI is missing, would be to use the full URI as the title).

Example:

So, one would for example map the wiki pseudo namespace " " with " ", so that on importing a triple containing " " results in the wiki article " ", " " might similarly result in " " etc.
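The mapping described in this section could be sketched roughly as follows in Python. The base URIs, the pseudo-namespace names and the function names (`title_for_uri`, `uri_for_title`) are all hypothetical placeholders chosen for illustration; they are not taken from SMW or from the original example. MediaWiki title restrictions (illegal characters, first-letter capitalisation) are not handled here.

```python
# Hypothetical mapping from RDF base URIs to wiki pseudo-namespaces.
BASE_URI_TO_NAMESPACE = {
    "http://example.org/chem/": "Chem",
    "http://example.org/bio#": "Bio",
}


def title_for_uri(uri, mapping=BASE_URI_TO_NAMESPACE):
    """Return a wiki title for an RDF node URI (import direction)."""
    # Try the longest base URI first so nested bases resolve predictably.
    for base in sorted(mapping, key=len, reverse=True):
        if uri.startswith(base) and len(uri) > len(base):
            return f"{mapping[base]}:{uri[len(base):]}"
    # Fallback: no mapping for this base URI, use the full URI as title.
    return uri


def uri_for_title(title, mapping=BASE_URI_TO_NAMESPACE):
    """Return the RDF URI for a wiki title (export direction)."""
    namespace, sep, local = title.partition(":")
    if sep:
        for base, ns in mapping.items():
            if ns == namespace:
                return base + local
    # Fallback titles are full URIs already, so return them unchanged.
    return title


if __name__ == "__main__":
    print(title_for_uri("http://example.org/chem/Benzene"))
    print(uri_for_title("Chem:Benzene"))
```

Keeping both directions driven by the same table is what would make the "import, edit, export back in the same format" round trip possible.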

About you
I'm a 27-year-old biotechnology student at Uppsala University with a strong interest in systems biology, computational biology, system design, semantic technologies and web development. I am currently finishing my M.Sc. degree in biotechnology (focusing on systems biology and bioinformatics).

Much of my technical experience comes from outside my studies: doing web design for 10+ years, web development with Drupal and MediaWiki for some 4 years, as well as summer work as a PC support technician/(Windows) network admin etc. The web development has been done through my father's small firm RIL Partner AB, where we also provide web hosting for a few customers, running our own dedicated (Ubuntu) servers (which we optimized for MediaWiki and Drupal) that I administer. At RIL Partner we've been playing around quite a bit with MediaWiki and Semantic MediaWiki, testing out different ideas.

In the last few years I've actively focused on getting more hands-on coding experience, and hence did a PHP/MediaWiki web interface project at uni, took bioinformatics courses, wrote a little Java web crawler for use with the Sphinx search engine etc. In my degree project, I'm gaining experience in Java coding, Eclipse RCP development and Prolog, as well as getting to know the W3C semantic formats and technologies.

The borders between studies, work and hobby tend to get a bit blurred for me (I'm typically easier to reach by e-mail or Skype than by phone :) ), which doesn't leave very much spare time. The time that is left over I typically spend hanging out with my family.

In the near future I hope to be able to work in the bioinformatics sector, or with systems and knowledge management tools for the Life Sciences. I'll probably continue open source development for Bioclipse and MediaWiki to some extent in the future, as I see both of them as great platforms for the kind of functionality I want to implement and work with. The above proposed GSoC project is highly interesting to me as it would be a killer feature for Bioclipse to be able to export data for community collaboration, and then retrieve it back again.

What drives me is a vision of enabling better and more systematic knowledge discovery and integration in the Biology / Life Sciences domain, by integrating semantic technologies with computational and simulation tools.

Required deliverables
The order of 1) and 2) does not matter.
 * 1) Replace the RAP connection, using it as a starting point for connecting ARC with SMW, and probably with the recently introduced SMWWriter API, to allow the creation of a SPARQL/SPARQL Update API to the wiki knowledge base.
 * 2) Improve the equivalent URI functionality so that it can be used for mapping URIs to wiki articles at import and export, and replace the current vocabulary import feature.
 * 3) Connect the SPARQL endpoint with the equivalent URI feature so that users can use their own vocabulary when querying the wiki.
 * 4) Design an improved RDF export feature that allows specifying an ontology to use for export.
 * 5) Implement import and update of RDF to the wiki, preferably using SPARQL Update as the interface. Implement a namespace-based mapping of wiki titles to RDF base URIs for that (more info [#Using_namespace_mapping_for_wiki_titles_on_fresh_wikis above]).
 * 6) Define a way of using mapping properties in the RDF.

During the whole time: document, support, release.
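As a rough illustration of how equivalent URIs could be applied at export, the Python sketch below looks up a wiki page title in an equivalent-URI table and falls back to a wiki-local URI when no equivalent is defined. The table contents, the wiki base URL and the function name `export_uri` are assumptions made for this example; in particular, the fallback URI scheme is simplified and is not SMW's actual export URI scheme.

```python
from urllib.parse import quote

# Hypothetical base URL of the wiki; not SMW's real export URI scheme.
WIKI_BASE = "http://wiki.example.org/wiki/"

# Hypothetical equivalent-URI table: wiki page title -> external URI.
EQUIVALENT_URIS = {
    "Property:Has name": "http://xmlns.com/foaf/0.1/name",
}


def export_uri(title, equivalents=EQUIVALENT_URIS):
    """Return the URI to use for a wiki page when exporting RDF."""
    if title in equivalents:
        # An equivalent URI is defined: export with the external vocabulary.
        return equivalents[title]
    # Otherwise build a wiki-local URI, using underscores for spaces.
    return WIKI_BASE + quote(title.replace(" ", "_"), safe=":/")


if __name__ == "__main__":
    print(export_uri("Property:Has name"))
    print(export_uri("Samuel Lampa"))
```

The same table, read in the opposite direction, could let a SPARQL query written against the external vocabulary be translated to wiki properties before it is evaluated.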

If time permits

 * Implement a mapping tool.

Project schedule
...

Participation
I prefer having daily (or so) contact over chat such as IRC or Skype (I'm hanging out daily at #bioclipse right now for my degree project), plus e-mail for longer discussions. I also very much like the idea of using a blog (and really using it!) to document my progress (and make sure I don't forget things learned), and of using GitHub (or similar) for publishing source code.

Past open source experience

 * SWI-Prolog integration plugin for Bioclipse (GitHub repo, Project blog, Screencast)
 * A little patch to Yaron Koren's Semantic Forms.

Any other info

 * Made a web interface for a protein analysis tool (project done at the LCB in Uppsala), using MediaWiki, the MediaWiki API + external PHP scripts (Screencast)
 * A Java-based web crawler for use in combination with the Sphinx search engine
 * MediaWiki skin (demo)
 * Drupal theme