Chemical Markup support for Wikimedia Commons
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on any information on this page.
This was a Google Summer of Code/2014 project/proposal.
- Public URL
- Issue tracker
- Maniphest (a "Phabricator" application)
- Issue tracker for Extension
- MediaWiki Bugzilla
- Task board
- Project Board (a "Phabricator" application)
- Gerrit WM
- Bugzilla report
- wikitech-l, commons-l
Name and contact information
- Rainer Rillke
- IRC or IM networks/handle(s)
- Web Page / Blog / Microblog / Portfolio
- Resume (optional)
- Typical working hours
- 08:00 - 20:00 UTC; Tue, Wed, Thur: 08:00 - 16:00 UTC
Wikipedia articles covering chemical reactions or chemical compounds are often illustrated with SVG graphics showing chemical equations or compounds. However, SVG is a graphics format and carries no chemical structure information. It is therefore not possible to easily re-mix these files, and one has to draw the whole compound again (or pull it from a database). A common scenario: "Quack" started an article about a compound and "Cheming" wants to contribute how to synthesize that compound. "Cheming" has to re-draw the whole compound.
- Server-side support
Allow uploading and implement rendering for MDL-molfiles. The format is specified, human readable and commonly used.
“The molfile is sufficiently common that most, if not all, cheminformatics software systems/applications are able to read the format, though not always to the same degree. It is also supported by some computational software such as Mathematica.” -en:Chemical table file
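To illustrate why the format is easy to handle on the server, here is a minimal Python sketch that reads the three header lines and the counts line of a V2000 molfile. It is not part of the proposed extension; the names `MolfileSummary`, `summarize_molfile` and `build_water_molfile` are hypothetical, and the fixed column positions follow the published V2000 layout as I understand it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MolfileSummary:
    title: str        # header line 1: molecule name (may be blank)
    program: str      # header line 2: program / timestamp information
    comment: str      # header line 3: free-text comment
    atom_count: int
    bond_count: int
    elements: List[str]


def summarize_molfile(text: str) -> MolfileSummary:
    """Read header, counts line and element symbols of a V2000 molfile."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected three header lines and a counts line")
    counts = lines[3]
    if not counts.rstrip().endswith("V2000"):
        raise ValueError("this sketch only handles V2000 counts lines")
    # Counts line: atoms in columns 1-3, bonds in columns 4-6.
    atom_count = int(counts[0:3])
    bond_count = int(counts[3:6])
    atom_lines = lines[4:4 + atom_count]
    if len(atom_lines) < atom_count:
        raise ValueError("atom block is shorter than the counts line claims")
    # Atom line: x, y, z in 10-character fields, a space, then the symbol.
    elements = [line[31:34].strip() for line in atom_lines]
    return MolfileSummary(lines[0].strip(), lines[1].strip(),
                          lines[2].strip(), atom_count, bond_count, elements)


def build_water_molfile() -> str:
    """Build a small sample file so the column widths are exact."""
    atoms = [("O", 0.0, 0.0), ("H", 0.9572, 0.0), ("H", -0.24, 0.9266)]
    bonds = [(1, 2, 1), (1, 3, 1)]
    out = ["Water", "  hypothetical-sketch", ""]
    out.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for symbol, x, y in atoms:
        out.append(f"{x:10.4f}{y:10.4f}{0.0:10.4f} {symbol:<3}"
                   + " 0" + "  0" * 11)
    for first, second, order in bonds:
        out.append(f"{first:3d}{second:3d}{order:3d}" + "  0" * 4)
    out.append("M  END")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(summarize_molfile(build_water_molfile()))
```

The three header lines are the only free-form metadata in a plain molfile, so a handler that wanted to expose metadata would most likely start there.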
- Client side molecule editor
- Possible mentors
- Gilles Dubuc, Brian Wolff, Bryan Davis
Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in (discuss these first with the mentor of the project):
Below a list of existing client-side and server-side solutions.
Server side or fully integrated solutions
|Approach / third-party dependency||Pros||Cons|
|Client-side SVG creation, embedding the molfile into the generated SVG||fewer security issues anticipated; rapid deployment possible||requires creating some kind of custom format spec; only users whose UAs support SVG creation would be supported; AGPL license [dubious]: the only (nice) SVG-creating molfile editor I found is Ketcher by GGA (there is another one, but it is compiled from Java with Google Web Toolkit and I don't even want to look at the output); users could trick the system and violate integrity, because the user renders the file and the molfile could differ from the SVG|
|Server-side molfile rendering: indigo-depict||pure PHP, easy to review, rewrite and deploy||SVG must subsequently be processed by rsvg for thumbnails; PHP is not the fastest approach|
|Server-side molfile rendering: indigo by GGA||maintenance by a notable company; precompiled binaries available (good for testing)||requires installing binaries or phpize or something like that; requires C/C++ security review|
|Server-side molfile rendering: ChemAzTech||incorporates a lot of features and is already translated into French||Python required for converting to images|
|Server-side molfile rendering: OpenBabel||a wide variety of formats supported; C/C++ native code, so fast processing with almost zero impact on servers is expected (since chemical markup is not too commonly used in WMF projects); Ubuntu package for Ubuntu 12.04 available||a large framework, similar to indigo|
The client side
|Ketcher by GGA||draws on SVG that could be sent and stored at server||
advertising must be removed
|ChemDoodle Web Components||
nice and fast
code without any helpful comments; looks like it was concatenated from multiple files; still readable, but far from a pleasure
draws on canvas
advertising must be removed
draws on SVG that could be sent and stored at server
BSD license (the compatible one)
Windows 3.1 look
SVG produced doesn't look good
JS compiled from GWT/Java - almost impossible to read the
draws on SVG that could be sent and stored at server (in theory)
Apache version 2 license (which should work as MW is GPL.v2+, meaning GPL.v3 as an option) but probably not preferred
does not look completely ready yet
Although I have not discussed this with the mentors yet, I believe the most viable option for reaching a working prototype (or better, advancing into production) is ChemDoodle Web Components.
|Setup environment (vagrant, gerrit, git), /microtasks||04/03/14 - 28/03/14||took a look at vagrant, other stuff was installed before||Done|
|Create GitHub repository, legal check, etc.||28/03/14 - 07/04/14||Code will be hosted at Wikimedia Git. The repo will be named after Extension:MolHandler, thus
|Aim for a working proof-of-concept||08/05/14 - 20/06/14||get the whole pipeline running, even if it only works on a local install, the code isn't clean or tested, etc.; something that shows that all the moving parts can work together from upload to file page with a generated thumbnail||In progress|
|Mid Term Evaluation||20/06/14 - 23/06/14|
|Prettify, prepare for production||23/06/14 - 15/07/14||making it clean, giving it test coverage, and writing all the things necessary for production deployment (presumably things like puppet scripts to deploy the server-side things, production config changes, etc.).|
|Writing documentation, deploying changes to labs, letting folks test there, then WMF-side deployment||15/07/14 - 10/08/14|
|Final Report Submission||20/08/2014|
- Client on .molfile (existing or non-existing file)
- Client loads molfile editor. Editor allows import/export of molfile, export of SMILES and export of SVG (server created SVG).
- User edits file and saves
- FormData is used for file upload
- Molfile is stored; do MDL molfiles contain notable metadata that have to be extracted or converted?
- SVG is created from .molfile through indigo-depict and stored - file name, where?
- SVG is thumbnailed through rsvg (building on existing SVG support/approach) creating PNG thumbs
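The server-side half of this workflow (molfile to SVG, then SVG to PNG thumbnails) can be sketched as a short list of commands. This is only an illustration in Python, not the extension's code: the function names and output file naming are hypothetical, and I assume that `indigo-depict` takes an input and an output path and that `rsvg-convert` is the installed rsvg front end with its `--width` and `--output` options.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def depict_command(molfile: Path, svg: Path) -> List[str]:
    # Assumed invocation: indigo-depict <input> <output>
    return ["indigo-depict", str(molfile), str(svg)]


def thumbnail_command(svg: Path, png: Path, width: int) -> List[str]:
    # rsvg-convert scales the SVG to the requested pixel width.
    return ["rsvg-convert", "--width", str(width),
            "--output", str(png), str(svg)]


def plan_pipeline(molfile: Path, out_dir: Path,
                  widths: Iterable[int]) -> List[List[str]]:
    """Return the commands for one upload: one SVG, then one PNG per width."""
    svg = out_dir / (molfile.stem + ".svg")  # hypothetical naming scheme
    plan = [depict_command(molfile, svg)]
    for width in widths:
        png = out_dir / f"{molfile.stem}-{width}px.png"
        plan.append(thumbnail_command(svg, png, width))
    return plan


def run_pipeline(plan: List[List[str]]) -> None:
    """Run each step in order and stop at the first failing command."""
    for command in plan:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for step in plan_pipeline(Path("Ethanol.mol"), Path("thumbs"), [120, 240]):
        print(" ".join(step))
```

Keeping the planning separate from the execution makes it possible to inspect or test the command list without having either binary installed.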
- Either the molfile editor gets a full security audit (we might even consider prettifying and adding comments to the source code to create something maintainable, although that is not nice because it is an upstream library) or it is included through an <iframe>, loaded from a different domain
- Internationalization of the molecule editor
- Option for turning on/off atom coloring on a per-site and per-inclusion basis:
- Aromatic bonds not shown
- Some editors write $RXN into molfiles... sdf2svg should be able to read this
- Padding often too small, cutting off atom labels
- --> We went with indigo-depict.
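Whichever renderer is used, the `$RXN` case above suggests checking what kind of chemical table file was actually uploaded before rendering it. A minimal Python sketch, assuming that reaction files start with a `$RXN` line and that molfiles carry a version tag at the end of their fourth line; the function name `classify_ctfile` is hypothetical.

```python
def classify_ctfile(text: str) -> str:
    """Return 'rxn', 'mol' or 'unknown' for the given file content."""
    lines = text.splitlines()
    if not lines:
        return "unknown"
    # Reaction files announce themselves on the very first line.
    if lines[0].startswith("$RXN"):
        return "rxn"
    # A molfile may have a blank title, so look at the counts line instead.
    if len(lines) >= 4 and lines[3].rstrip().endswith(("V2000", "V3000")):
        return "mol"
    return "unknown"


if __name__ == "__main__":
    sample = "\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    print(classify_ctfile(sample))
    print(classify_ctfile("$RXN\n\n  sketch\n\n  1  1\n"))
```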
- Style: MediaWiki extension, similar to Extension:TimedMediaHandler or Extension:PagedTiffHandler
- Progress and experiences will be logged at /office desk (including future visions, what's missing etc.) and more in a more narrow frame at /microtasks (commits, code review).
- Code will be hosted at Wikimedia Git (see Git/New repositories/Requests). The repo will be named after Extension:MolHandler, thus mediawiki/extensions/MolHandler. But I will run a -dev repository at GitHub, allowing me to push changes quickly, to create as many branches as I like, to test different options and still to show that I am not idle.
- Every time I commit something to the mw-repo, it will have to be reviewed, so I learn how to do it correctly. However, do not expect me to commit something to that repo every day, but at least once per week.
- MediaWiki has great help resources for self-study (this wiki, doxygen-generated documentation, and the source code itself looks sane), but for "best practices" I will certainly need the help of my mentors. Expect me to ask a lot of "What is the best approach for … " questions, especially regarding the PHP part. This is also the reason I wish for two mentors knowledgeable about file handling on the server side. Depending on what turns out to be more efficient, I'll bug them by e-mail or on IRC.
- I'll occasionally notify and gather feedback from the chemistry project at Wikimedia Commons, so that the result does not end up as vapourware because the community does not accept it.
- Education completed or in progress
- In progress: something closely related to the enhancements the extension will provide. But well, I am German. I am careful when it comes to sharing all kinds of data with the whole world. In other words, I would appreciate it if you did not force me to publish anything specific.
- How did you hear about this program?
I read a post on a mailing list raising the point that there was not enough diversity of origin among the applicants. I intended to change that with my participation.
- Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?
Some of my time will go into the Pronunciation Recording Gadget. But this has a wider schedule and I'll have plenty of time this spring/summer. Otherwise there are no specific plans for activities like internships or vacation, yet.
- We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?
Outreach Program for Women? I have not looked into the details, but *I think* this doesn't apply to me.
- Please describe your experience with any other FOSS projects as a user and as a contributor
- Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them (include links)
- What project(s) are you interested in (these can be in the same or different organizations)?
I prefer projects where I can see the light at the end of the tunnel, and where past experience has proven they're successful, hence my late registration at Wikipedia. Thematically, I like projects around chemistry, media files, uploading, involving communities and feedback cycles. I believe that asking the users who are the target of the software about their needs, by using specific questions and coming up with different suggestions, is a crucial part of software development. Head over to Meta if you want to see these points proven.