User:Damonwang

Identity
Name: Damon Wang

Email: damonwang uchicago edu

Project title: Towards a Python port of Texvc

Contact/working info
Timezone: Chicago or New York (UTC-5 or -4)

Typical working hours: noon to four, eight to midnight (UTC 17:00-21:00, 1:00-5:00)

IRC or IM networks/handle(s): damonwang on freenode

Project summary
MediaWiki pretty-prints math formulae by taking a subset of AMS-TeX and displaying either a PNG rendered via dvipng or, if the formula is simple enough, an HTML approximation. However, AMS-TeX has numerous features that allow arbitrary code execution, unbounded render time, and similarly undesirable behavior. The current solution is texvc, an OCaml parser which accepts only a safe subset of AMS-TeX and produces the appropriate HTML or PNG output. Unfortunately, the obscurity of the implementation language seems to have discouraged maintenance: texvc has some fifty open bugs going back six years, and of the texvc bugs that have been resolved, very few involved edits to the OCaml code. It was therefore suggested that texvc be ported to a more popular language.

A PHP port would have been the ideal solution: MediaWiki already has a large pool of PHP developers, calling the external binary is currently the source of a major outstanding bug, and a PHP dependency for a project written in PHP is as good as eliminating a dependency entirely. In fact, a PHP port has been attempted, but it omitted all validation of the input, probably because PHP has no existing parsing packages. Judging from the LaTeX2MathML code, which does implement a LaTeX parser in PHP, it would be a step backward to write the parser manually. Much of the clarity, concision, and robustness of the OCaml texvc comes from the fact that we need only maintain a BNF-like input to a parser-generator, rather than the parser itself. As it would be well beyond the scope of GSoC to write a general PHP parsing package, I propose instead to port texvc to Python, which offers the following benefits:


 * popularity: easier to find maintainers, easier for sysadmins to install, easier for new developers to "read themselves into" the codebase for quick patches
 * ubiquity: although it does need an interpreter at run-time, this is not a difficult dependency to satisfy; probably even easier than an OCaml compiler at install-time or a binary distribution
 * several mature parsing packages: potentially no regression from the OCaml version in terms of concision, clarity, and "elegance"
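
To make the idea of a "safe subset" concrete, here is a minimal sketch of a whitelist check in Python. It is only an illustration under stated assumptions: the names ALLOWED_COMMANDS, ALLOWED_SYMBOLS, rejected_commands and is_safe are hypothetical, the whitelist is far smaller than anything texvc actually accepts, and it uses the standard-library re module rather than one of the parsing packages mentioned above. It checks control sequences lexically and does not validate argument structure, which is the part a grammar fed to a parser-generator would handle.

```python
import re

# Hypothetical whitelists, for illustration only; not texvc's real lists.
ALLOWED_COMMANDS = {"frac", "sqrt", "alpha", "beta", "sum", "int", "left", "right"}
ALLOWED_SYMBOLS = {"{", "}", ","}

# A token is a control word (\name), a control symbol (backslash plus one
# other character), or any other single character.
TOKEN = re.compile(r"\\([A-Za-z]+)|\\(.)|(.)", re.DOTALL)


def rejected_commands(formula):
    """Return the control sequences in `formula` that are not whitelisted."""
    bad = []
    for match in TOKEN.finditer(formula):
        word, symbol, char = match.groups()
        if word is not None:
            if word not in ALLOWED_COMMANDS:
                bad.append("\\" + word)
        elif symbol is not None:
            if symbol not in ALLOWED_SYMBOLS:
                bad.append("\\" + symbol)
        elif char == "\\":
            # A lone backslash at the very end of the input.
            bad.append("\\")
    return bad


def is_safe(formula):
    """True if every control sequence in `formula` is on the whitelist."""
    return not rejected_commands(formula)
```

A caller would render the formula only when is_safe returns True, and could report the list from rejected_commands to the user otherwise.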

About you
I am a third-year undergraduate who first started programming in a high school data structures course taught with Scheme and C. Perhaps because I started out with homework exercises, my interest was in small toy problems such as those offered by Project Euler, the USA Computing Olympiad, the Python Challenge, and the system scripts on my laptop. As a relative newcomer to open-source software, especially on the scale of MediaWiki, it's a relief to find a rather isolated corner of the project like texvc where I can make a concrete contribution without needing to grok the entire codebase.

Although Scheme was my first language in high school and Haskell my first language at university, my work as a sysadmin on campus has taught me that big libraries and side effects are often the easiest way to get a job done. For practical programming, Python is my first choice, with shell a close second and Perl tying with Common Lisp as very distant thirds.

We don't just care about your project -- you are a person, and that matters to us! What drives you? What makes you want to make this the most awesomest wiki enhancement ever?

You don't need to write out your life story (we can read your blog if we want that), but we want to know a little about what makes you tick. Are you a Wikipedia addict wanting to make your own experience better? Did a wiki with usability problems run over your dog, and you're seeking revenge? What does making this project happen mean to you?

Deliverables
It should be possible to break down your project into some bullet points describing particular features or milestones which can be reached individually. Consider that we may wish to roll out the system for testing when at an intermediate stage of completion, and that time estimates might vary, leaving you with more time than you expected or (more likely) a lot less -- some features can be pushed back if you end up short.

Project schedule
Try to break your deliverables into "milestone" points which can be reached in sequence. Block out your estimated schedule of when you'll reach each functional milestone. Don't forget that real time may change -- leave enough wiggle room for your required features to be completed!

Participation
We don't just want to know what you plan to accomplish; we want to know how. Briefly describe your work style: how you plan to communicate progress, where you plan to publish your source code while you're working, how and where you plan to ask for help. (We will tend to favor applicants who demonstrate a clear vision for what it means to be an active participant in our development community.)

Past open source experience
Do you have any past experience working in open source projects (MediaWiki or otherwise)? If so, tell us about it!

Any other info
If there's other relevant information -- UI mockups, references to related projects, a link to your proof of concept code, whatever -- include it here. There are no specific requirements, but we love to see people who love what they're doing. Show us you're excited about this project and have an interest in the background and are considering how best to make your idea work.