User:SHL/GSoC2010

Identity
Name: Samuel Lampa
Email: samuel.lampa[at]gmail.com
Project title: General RDF export/import functionality for Semantic MediaWiki

Contact/working info
Timezone: Sweden (GMT +1)
Typical working hours: 14:00 - 02:00
IRC or IM networks/handle(s): Skype: samuel_lampa, IRC: freenode/samuell

Project summary
Extend the import/export functionality of Semantic MediaWiki (SMW) so that SMW can be used as a general collaborative RDF editor that is integrated into workflows. That is, workflow systems (or scriptable workbench software, like Bioclipse) should be able to export data to, and import data from, SMW in a uniform way: Export RDF from workflow --> Collaboratively edit (in SMW) --> Import RDF back into workflow.

Semantic MediaWiki already has RDF export (both per page and as a maintenance script), and can map properties to RDF predicates (with this extension). What seems to be missing is full, general RDF import, and the ability to also map normal wiki pages to RDF URIs.

So, in practice, (at least) two things need to be done:
 * 1) Extend the Vocabulary import feature of Semantic MediaWiki (which can map semantic properties to RDF predicates) so that mappings can also be set up for "normal" wiki pages/terms, not only for semantic properties. Probably one would have to map MediaWiki namespaces to RDF namespace equivalents, so as not to get too long titles while still being specific enough. (For example, one might collaborate on chemicals identified with ChEBI ids, where the URIs look like this: " http://bio2rdf.org/chebi:16183 ". One might then map the RDF base URL (" http://bio2rdf.org/chebi: ") to the MediaWiki namespace "chebi", so that the wiki title can be just "chebi:16183".)
 * 2) General RDF import functionality, not restricted to OWL ontologies, that makes use of the mappings defined according to the point above. That is, some ability to parse RDF (RDF/XML, N3 or similar) into the MediaWiki import format, maybe/probably using the parser functionality of the ARC PHP library (?).
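The two points above can be illustrated with a minimal sketch. It is written in Python only for brevity (the actual work would most likely be PHP inside SMW), and everything named here is an assumption made for illustration: the `NAMESPACE_MAP` table, the function names, the "rdfs" prefix, and the choice to emit SMW-style `[[property::value]]` annotations. It only handles very simple N-Triples lines (URIs and plain quoted literals, no escapes, language tags or datatypes), so it is not a real RDF parser.

```python
import re

# Hypothetical mapping table: RDF base URI -> wiki namespace prefix.
# The chebi entry is the example from the text; the rdfs entry is an
# extra assumption so that a predicate can be shortened as well.
NAMESPACE_MAP = {
    "http://bio2rdf.org/chebi:": "chebi",
    "http://www.w3.org/2000/01/rdf-schema#": "rdfs",
}


def uri_to_title(uri):
    """Return a short wiki title for a URI, or None if no base URI matches."""
    # Try the longest base URI first so that more specific mappings win.
    for base in sorted(NAMESPACE_MAP, key=len, reverse=True):
        if uri.startswith(base):
            return f"{NAMESPACE_MAP[base]}:{uri[len(base):]}"
    return None


def title_to_uri(title):
    """Inverse mapping: expand a 'namespace:local' wiki title to a full URI."""
    prefix, sep, local = title.partition(":")
    if not sep:
        return None
    for base, namespace in NAMESPACE_MAP.items():
        if namespace == prefix:
            return base + local
    return None


# Matches only the simplest N-Triples form: <s> <p> <o> .  or  <s> <p> "literal" .
TRIPLE_RE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.\s*$'
)


def triples_to_wikitext(ntriples):
    """Group triples by subject page and emit SMW-style annotations per page."""
    pages = {}
    for line in ntriples.splitlines():
        match = TRIPLE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything this sketch cannot parse
        subj, pred, obj_uri, obj_literal = match.groups()
        # Fall back to the full URI when no namespace mapping applies.
        title = uri_to_title(subj) or subj
        prop = uri_to_title(pred) or pred
        if obj_uri is not None:
            value = uri_to_title(obj_uri) or obj_uri
        else:
            value = obj_literal
        pages.setdefault(title, []).append(f"[[{prop}::{value}]]")
    return pages
```

With this table, `uri_to_title("http://bio2rdf.org/chebi:16183")` gives "chebi:16183" and `title_to_uri` reverses it, which is the property needed to get the same URIs back when the data is exported again after editing.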

About you
I'm a soon-to-graduate biotechnology engineering student at Uppsala University, Sweden, focusing on bioinformatics and computational biology.

I also have 10+ years of web design and development experience, and always tend to "get stuck" in some web projects besides my studies :), especially ones related to MediaWiki and Drupal. Some of these can be seen in the portfolio over at RIL Partner AB.

I have a deep interest in semantic technologies, and in how they can be used (probably in tight integration with computational and simulation tools) to facilitate better knowledge discovery and integration in the life sciences, making it possible to understand biological systems at the systems level ... with "an engineer's glasses". This is what really drives me ... and gives me an interest in helping to build the tools needed for this. Part of realizing this is my current degree project, where I'm integrating a Prolog-based semantic framework into Bioclipse (an open source scriptable life sciences workbench) and comparing it with conventional technologies.

I hope to be able to work in the bioinformatics sector ... and will probably continue open source development for Bioclipse in the future, as I see it as a great platform for developing and/or using new bioinformatics functionality. The GSoC project proposed above is highly interesting to me, as it would be a killer feature for Bioclipse to be able to export data for community collaboration and then retrieve it again.

I'm also a fan of open source and open standards, and prefer Ubuntu as OS (desktop and server), Java for general programming, and PHP for web programming (and Prolog for logic programming :) ).

Deliverables

 * Extension of the SMW [Vocabulary import], to also allow mapping of any wiki page to an RDF URI (probably by mapping wiki namespaces to RDF namespaces).
 * Full, general RDF import functionality (not restricted to OWL ontologies), making use of these mappings.

Project schedule
...

Participation
I prefer having contact daily (or so) on a chat such as IRC or Skype (I'm hanging out daily at #bioclipse right now for my degree project), plus e-mail for longer discussions. I also very much like the idea of using a blog (and really using it!) to document my progress (and to make sure I don't forget things learned), and of using GitHub (or similar) for publishing source code.

Past open source experience

 * SWI-Prolog integration plugin for Bioclipse (GitHub repo, Project blog, Screencast)
 * A little patch to Yaron Koren's Semantic Forms.

Any other info

 * Made a web interface for a protein analysis tool (project done at the LCB in Uppsala), using MediaWiki and the MediaWiki API plus external PHP scripts (Screencast)
 * A Java-based web crawler for use in combination with the Sphinx search engine
 * MediaWiki skin (demo)
 * Drupal theme