Chemical Markup support for Wikimedia Commons

 * Public URL: https://www.mediawiki.org/wiki/Chemical_Markup_support_for_Wikimedia_Commons
 * Bugzilla report: bugzilla:16491
 * Announcement: [//lists.wikimedia.org/pipermail/wikitech-l/2014-March/075524.html wikitech-l], [//lists.wikimedia.org/pipermail/commons-l/2014-March/thread.html commons-l]

Name and contact information

 * Name: Rainer Rillke
 * Email: @wikipedia.de
 * IRC or IM networks/handle(s): rillke
 * Web Page / Blog / Microblog / Portfolio: https://commons.wikimedia.org/wiki/User:Rillke
 * Resume (optional): http://osrc.dfm.io/rillke
 * Location: Germany
 * Typical working hours: 16:30 - 22:00 UTC (weekday)

Synopsis
Wikipedia articles covering chemical reactions or chemical compounds are often illustrated with SVG graphics showing chemical equations or compounds. However, SVG is a graphics format, not a chemical one. It is therefore not easy to re-mix these files, and one has to draw the whole compound again (or pull it from a database). A common scenario: ''"Quack" started an article about a compound and "Cheming" wants to contribute how to synthesize that compound. "Cheming" has to re-draw the whole compound.''

Goals
 * Server-side support: Allow uploading of, and implement rendering for, MDL molfiles. The format is specified, human-readable and commonly used: “The molfile is sufficiently common that most, if not all, cheminformatics software systems/applications are able to read the format, though not always to the same degree. It is also supported by some computational software such as Mathematica.” (en:Chemical table file)

There are client-side JavaScript creators for web browsers available and 2image converters for the server side.

 * Client-side molecule editor: Provide a JavaScript molecule editor so that editors do not have to install software and then jump through hoops, choosing the correct format(ting) and uploading the files. File upload can be accomplished via AJAX.


 * Possible mentors: Gilles Dubuc, Brian Wolff, Bryan Davis

Deliverables
Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in (discuss these first with the mentor of the project):

Below is a list of existing client-side and server-side solutions.

The client side
Although not discussed with the mentors yet, I believe the most viable option for reaching the goal of a working prototype or, better, advancing into production, is using  +.

Workflow

 * 1) Client on   file (existing or non-existing file)
 * 2) Client loads molfile editor. Editor allows import/export of molfile, export of SMILES and export of SVG (server created SVG).
 * 3) User edits file and saves
 * 4) FormData is used for file upload
 * 5) Molfile is stored; do MDL molfiles contain notable metadata that have to be extracted or converted?
 * 6) SVG is created from   file through sdf2svg and stored - file name, where?
 * 7) SVG is thumbnailed through rsvg (building on existing SVG support/approach) creating PNG thumbs
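
Step 5 asks whether molfiles carry metadata worth extracting. As a rough illustration: a V2000 molfile starts with three free-text header lines followed by a fixed-width counts line, so a check before storing could look like the minimal Python sketch below. The sample file contents, the `MolSummary` class and the `parse_v2000` function are hypothetical names made up for this illustration; the extension itself would be written in PHP, and V3000 files are not handled here.

```python
from dataclasses import dataclass


@dataclass
class MolSummary:
    title: str         # header line 1: molecule name
    program_line: str  # header line 2: program / timestamp information
    comment: str       # header line 3: free-text comment
    atoms: list        # element symbols in atom-block order
    bonds: list        # (first atom, second atom, bond type) triples


def parse_v2000(text: str) -> MolSummary:
    """Read the header, counts line, atom block and bond block of a V2000 molfile."""
    lines = text.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("not a V2000 molfile")
    counts = lines[3]
    # The counts line is fixed-width: 3 characters each for atom and bond counts.
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    if len(atom_lines) != n_atoms or len(bond_lines) != n_bonds:
        raise ValueError("atom or bond block is truncated")
    # Atom lines: three 10-character coordinates, a space, then the symbol.
    atoms = [line[31:34].strip() for line in atom_lines]
    bonds = [(int(b[0:3]), int(b[3:6]), int(b[6:9])) for b in bond_lines]
    return MolSummary(lines[0].strip(), lines[1].strip(), lines[2].strip(),
                      atoms, bonds)


# Hypothetical two-atom sample (heavy atoms only); header texts are made up.
SAMPLE_MOLFILE = "\n".join([
    "Methanol",
    "  hypothetical-editor",
    "sample comment",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
])

if __name__ == "__main__":
    print(parse_v2000(SAMPLE_MOLFILE))
```

The three header strings are the only free-form metadata in this sketch; whether they are worth storing separately is exactly the open question of step 5.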

Non-obvious challenges

 * Either the molfile editor gets a full security audit (we might even consider prettifying and adding comments to the source code [creating something maintainable], although that is not nice because it is an upstream library) or it is included through an, loaded from a different domain
 * Internationalization of the molecule editor
 * Option for turning on/off atom coloring on a per-site and per-inclusion basis:

sdf2svg

 * Aromatic bonds not shown
 * Some editors write  into molfiles... sdf2svg should be able to read this
 * Padding is often too small, cutting off atom labels
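
Two of the points above can be illustrated independently of sdf2svg. In the V2000 bond block, bond type 4 denotes an aromatic bond, so a pre-check can tell whether a file contains bonds the converter may not draw. The label clipping could be avoided by padding the bounding box of the atom coordinates before it becomes the SVG viewBox. The Python sketch below is not sdf2svg code; `aromatic_bonds`, `padded_viewbox`, the sample bond block and the padding value are all hypothetical.

```python
AROMATIC = 4  # V2000 bond type code for an aromatic bond


def aromatic_bonds(bond_lines):
    """Return (first atom, second atom) pairs for all bonds of type 4."""
    found = []
    for line in bond_lines:
        # Bond lines are fixed-width: 3 characters per field.
        first, second, bond_type = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if bond_type == AROMATIC:
            found.append((first, second))
    return found


def padded_viewbox(points, pad):
    """Return (min_x, min_y, width, height) enlarged by pad on every side."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - pad, min(ys) - pad,
            max(xs) - min(xs) + 2 * pad, max(ys) - min(ys) + 2 * pad)


# Bond block of a toluene-like skeleton: six ring bonds of type 4
# plus one single bond to the methyl carbon.
BOND_LINES = [
    "  1  2  4  0",
    "  2  3  4  0",
    "  3  4  4  0",
    "  4  5  4  0",
    "  5  6  4  0",
    "  6  1  4  0",
    "  1  7  1  0",
]

if __name__ == "__main__":
    print(aromatic_bonds(BOND_LINES))
    print(padded_viewbox([(0, 0), (2, 1)], 1))
```

A fixed padding is the simplest choice; scaling it with the label font size would be an alternative, since the clipped material is text rather than bonds.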

Participation

 * Style: MediaWiki extension, similar to Extension:TimedMediaHandler or Extension:PagedTiffHandler
 * Progress and experiences will be logged at /office desk (including future visions, what's missing, etc.) and, in a narrower frame, at /microtasks (commits, code review).
 * Code will be hosted at Wikimedia Git (Git/New repositories/Requests). The repo will be named after Extension:MolHandler, thus . But I will also run a   repository at GitHub, allowing me to push changes quickly, create as many branches as I like, test different options, and still show that I am not idle.
 * Every time I commit something to the mw-repo, it will have to be reviewed, and thus I learn how to do it correctly. However, do not expect me to commit something to that repo every day, but at least once per week.
 * MediaWiki has great help resources for self-study (this wiki, the doxygen-generated documentation and, finally, the source code, which also looks sane), but for "best practices" I will certainly need the help of my mentors. Expect me to ask a lot of "What is the best approach for … " questions, especially regarding the PHP part. This is also the reason I wish for two mentors knowledgeable about file handling on the server side. Depending on what turns out to be more efficient, I'll bug them by e-mail or on IRC.
 * I'll occasionally notify and gather feedback at the chemistry project at Wikimedia Commons, so that it does not end up as vapourware for lack of acceptance.

About you

 * Education completed or in progress: In progress; something closely related to the enhancements this extension will bring. But well, [//www.youtube.com/watch?v=lOA6jSf6gMM I am German]. I am careful when it comes to sharing all kinds of data with the whole world. In other words, I would appreciate it if you did not force me to publish anything specific.

 * How did you hear about this program? I read a post on a mailing list raising the point that there was not enough diversity of origin among the applicants. I intended to change that with my participation.

 * Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program? Some of my time will go into the Pronunciation Recording Gadget, but that has a more flexible schedule and I'll have plenty of time this spring/summer. Otherwise there are no specific plans for activities like internships or vacation yet.

 * We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)? Outreach Program for Women? I have not looked into the details, but *I think* this doesn't apply to me.

Past experience

 * Please describe your experience with any other FOSS projects as a user and as a contributor: I could answer this by providing links to Gerrit, GitHub and Special:CentralAuth/Rillke and telling you to look through the rights logs and contributions as well as the user pages, but here is a brief summary of my experience at Wikimedia: I registered at Wikipedia in 2010, became a Wikimedia Commons addict in 2011 and learned a lot about JavaScript, became an administrator in November 2011 (so I was able to maintain my scripts), started reporting bugs at Bugzilla in 2012 and started using Toolserver and Gerrit in 2013. In 2014, I created some tools at Toollabs (learning PHP) that are still up and running, most notably the database query services [//tools.wmflabs.org/octodata/ OctoData] and [//tools.wmflabs.org/rillke/jsonapi.php?action=sha1lookup&sha1=C5729501FEBE7F9A33C74AD3C2ED1E7E5F318DBA sha1lookup] for the old_image table, which is not exposed through the regular mw-API. I have [//toolserver.org/~rillke/docs/ created documentation using JSDuck], but the software it is for isn't in use yet ...


 * Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them (include links): So far, I've mainly contributed to Wikimedia projects, for example UploadWizard, but I also wrote a bunch of user scripts at Wikimedia Commons: [//commons.wikimedia.org/w/index.php?title=Commons:User_scripts/File_Analyzer&withJS=MediaWiki:FileAnalyzer.js FileAnalyzer], VisualFileChange, [//commons.wikimedia.org/w/index.php?title=Commons:MyGallery&withJS=MediaWiki:JSONListUploads.js&gUser=Rillke GalleryTool], a script using the chunked upload protocol and [//commons.wikimedia.org/w/index.php?title=Commons:User_scripts/Invisible_charaters&withJS=MediaWiki:Invisible_characters_unveiled.js Title checker]. I maintain a lot more (more than I am able to maintain, or let's say more than is fun to maintain, given all the recent JavaScript deprecations). Furthermore, I worked with molfiles and a molfile editor in the past. Proof can be provided upon request, discreetly. Not to forget the daily media-related work at Wikimedia Commons.

 * What project(s) are you interested in (these can be in the same or different organizations)? I prefer projects where I can see the light at the end of the tunnel and where past experience has proven them successful, hence my late registration at Wikipedia. Thematically, I like projects around chemistry, media files, uploading, and involving communities and feedback cycles. I believe that asking the users who are the target of the software about their needs, using specific questions and coming up with different suggestions, is a crucial part of software development. Head over to Meta if you want to see these points proven.

Any other info

 * Test the mol file editor!