User:Jarry1250/GSoC 2012 roadmap

From mediawiki.org

This is the application by me, Harry Burt (User:Jarry1250), to take part in the 2012 edition of Google Summer of Code.

Identity

Name
Harry Burt
Email
<myusername>@gmail.com
Project title
TranslateSvg: Bringing the translation revolution to Wikimedia Commons

Contact/working info

Timezone
London (GMT/UTC+0, very shortly BST/UTC+1)
Typical working hours
9am to 6pm, perhaps (see below)
IRC or IM networks/handle(s)
jarry1250 on Freenode
Other contact details
Onwiki: Jarry1250 (SUL), most likely to be found hanging out at the English Wikipedia (user page) or on Wikimedia Commons (user page). Twitter: @harryaburt.

Project summary

The currently rough and ready TranslateSvg extension in operation

In what few hours I managed to find over Christmas, I threw together a quick extension called Extension:TranslateSvg. This proposal seeks the resources needed to turn it into a powerful tool and a viable WMF deployment. The existing extension provides both a starting point and a proof of concept, but will require fundamental improvements before it could function on the kind of production wiki where it is "sorely needed", according to one developer with whom I have corresponded.

TranslateSvg has the potential to revolutionise the ability of Wikimedia's diverse groups of image maintainers to work together creating and improving the same communal set of SVG (vector) images. At the moment, providing alternative translations of SVG files typically requires "forking" the image. This drastically increases the image's maintenance burden and thereby discourages image improvement. Where such improvement does take place, it is seldom shared between different language versions. TranslateSvg would completely change this suboptimal workflow by removing the need for the image to be forked; instead, translations would be saved inside the image itself, in accordance with the SVG 1.1 specification. The translation process would be handled on a streamlined special page with a native feel, allowing for seamless and easy translation of files, even by users who have little or no knowledge of the mechanics of SVG files, are unfamiliar with translation in general, or both.
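
As a rough illustration of what saving translations inside the image could look like: SVG 1.1 defines a `switch` element and a `systemLanguage` attribute for conditional processing, and the sketch below assumes that this is the mechanism used. The helper name `add_translation` and the sample strings are hypothetical, and the example is written in Python purely for brevity; it is not the extension's code.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise without an "ns0:" prefix


def add_translation(root, source_text, lang, translated_text):
    """Wrap each matching <text> in a <switch> holding a translated copy.

    Hypothetical helper: returns the number of <text> elements wrapped.
    """
    text_tag = f"{{{SVG_NS}}}text"
    switch_tag = f"{{{SVG_NS}}}switch"
    wrapped = 0
    # Snapshot the elements first, because the tree is modified while walking.
    for parent in list(root.iter()):
        if parent.tag == switch_tag:
            continue  # already wrapped; a fuller version would extend it
        for index, child in enumerate(list(parent)):
            if child.tag != text_tag or child.text != source_text:
                continue
            translated = copy.deepcopy(child)  # keeps x/y and styling
            translated.text = translated_text
            translated.tail = None
            translated.set("systemLanguage", lang)
            switch = ET.Element(switch_tag)
            switch.append(translated)  # language-specific alternative first
            switch.append(child)       # untagged original acts as fallback
            switch.tail = child.tail
            child.tail = None
            parent[index] = switch
            wrapped += 1
    return wrapped


svg = ET.fromstring(
    f'<svg xmlns="{SVG_NS}"><text x="10" y="20">Water</text></svg>'
)
add_translation(svg, "Water", "de", "Wasser")
print(ET.tostring(svg, encoding="unicode"))
```

A renderer that honours `switch` draws the first child whose `systemLanguage` matches the viewer's language and otherwise falls through to the untagged original, which is why the original is appended last.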

The file, complete with these embedded translations, could then be displayed in either the language of a wiki, the user's preferred interface language, or any given arbitrary language. If the SVG file were to be served directly, it would helpfully display in the user's system language where such a translation was available, aiding reuse possibilities. When I originally raised this idea it received the support of several Wikimedia Commons users as well as WMF developers.
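
The choice of display language described above can be sketched as a simple preference chain. The order used here (an explicitly requested language, then the user's interface language, then the wiki's language) is an assumption for illustration, as are the function name `choose_language` and the fallback from a regional variant to its base language code.

```python
def choose_language(available, requested=None, user_lang=None, wiki_lang=None):
    """Return the first preferred language that has a translation, else None.

    `available` is the set of language codes embedded in the file.
    The priority order is an illustrative assumption, not a specification.
    """
    for candidate in (requested, user_lang, wiki_lang):
        if candidate is None:
            continue
        if candidate in available:
            return candidate
        # Fall back from a regional variant such as "pt-br" to "pt".
        base = candidate.split("-")[0]
        if base in available:
            return base
    return None  # caller renders the untranslated default text


print(choose_language({"de", "fr"}, requested="fr", wiki_lang="de"))
print(choose_language({"de", "pt"}, user_lang="pt-br", wiki_lang="de"))
```

Returning `None` rather than raising keeps the untranslated original as the natural last resort, mirroring how a directly served file would fall back when no translation matches.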

Deliverables

Required deliverables

The final extension should:

  • be able to handle the translation of 99%+ of text embedded in SVGs, taking account of (for example) meaningful tspans, italics, bold, superscript, subscript and other formatting using well-documented methods;
  • provide a functioning, internationalisable and polished interface able to adjust for translations (with a "native" feel and a minimum of visual clutter):
    • in different writing scripts,
    • with different x/y positioning,
    • and in right-to-left (RTL) languages;
  • modify file description pages, to enable visitors to view files in different languages and provide them with easy access to the translation mechanism;
  • be written to cope with "evolving" SVG files, i.e. those which go through a repeated translation-modification cycle;
  • be well documented, or, even better, be sufficiently simple that it needs little in the way of official documentation;
  • implement logical and informative permissions and error-handling;
  • and do all of the above in a resource-efficient manner, even for large SVGs; or, if this proves impossible, at least implement a reasonable upper limit with regard to performance (similar to the long-time situation with large PNG files).
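
To make the first and last requirements above more concrete, here is a minimal sketch of pulling translatable fragments out of `text` elements, including text nested in `tspan` children, with a crude size cap standing in for the "reasonable upper limit". The names `extract_strings` and `MAX_SVG_BYTES`, and the cap's value, are hypothetical; a real implementation would also need to record formatting and positioning.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
MAX_SVG_BYTES = 1_000_000  # hypothetical cap, not a figure from the proposal


def extract_strings(svg_source):
    """Return one list of non-empty text fragments per <text> element."""
    if len(svg_source.encode("utf-8")) > MAX_SVG_BYTES:
        raise ValueError("SVG too large to translate")
    root = ET.fromstring(svg_source)
    results = []
    for text_el in root.iter(f"{{{SVG_NS}}}text"):
        # itertext() walks the element's own text, then each nested tspan
        # and the text that follows it, in document order.
        fragments = [frag.strip() for frag in text_el.itertext() if frag.strip()]
        if fragments:
            results.append(fragments)
    return results


sample = (
    f'<svg xmlns="{SVG_NS}">'
    '<text>H<tspan baseline-shift="sub">2</tspan>O</text>'
    '<text><tspan x="0" dy="1em">First line</tspan>'
    '<tspan x="0" dy="1em">Second line</tspan></text>'
    '</svg>'
)
print(extract_strings(sample))
```

The two sample elements show why tspans need care: in the first, the fragments together form one label with a subscript, while in the second each tspan is a separate line of text.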

If time permits

The final extension could:

  • provide quality control via a special logging action (reverting being handled natively via MediaWiki);
  • separate a new SVG-translation user right from the usual upload user right to allow for more granular permissions;
  • ensure the SVG-file-parsing system is sufficiently logical and/or modular to be easily extensible by future developers.

Project schedule

I am submitting this under an unusual timetable, but please do bear with me, as I feel it's very much an achievable one:

  • 20 April to 2 June: "Ramp up period" (approximately 12 hours a week), fixing priorities, methods and designs (both for the main translation page and the additions to the file description page). This would include interviews with translators – in order to understand their needs fully – as well as a quantitative analysis of the SVG files extant on Wikimedia Commons, looking at which structures will need to be accommodated. Agree performance-friendly methods with mentor and other developers. Blog enthusiastically about the project.
  • 1 June to 3 June: Potential visit to Berlin hackathon, time permitting.
  • 2 June to 9 June: There's not much time available for me in this week, but it's enough time to work on (and probably complete) the changes to file description pages. Continue blogging.
  • 9 June to 23 June: **temporary lull due to unavoidable university-wide examinations** (responding to inquiries, continuing discussions and emails only)
  • 23 June to 22 July: main bulk of development work, 45 hours per week, semi-regular blog posts and tweets.
    • 23 June to 30 June: finish proposed new interface, open up for testing and comments, regular blog posts.
    • 30 June to 7 July: half-hours due to (already booked) holiday; responding to testing and comments full time, making adjustments as required.
    • 7 July to 14 July: do most of work with regard to the extension handling the full plethora of SVG structures, iterative fixes to interface, more communications.
    • 12 July to 14 July: Potential visit to Wikimania, scholarship application permitting.
    • 14 July to 21 July: finish work on handling SVG files, including permissions and error handling.
  • 21 July to 11 August: Heavy-duty code review, documentation, and several rounds of testing with real-world translators.
  • 11 August to 20 August: Final polish, integration-focussed elements. Pencils down on official project.
  • 20 August to end of October (provisional): Continue to work on the project part-time, prepare for deployment if required, publish roundup blog posts and plenty of time for "wind down" engagement.

About you

I'm a 19-year-old student from Colchester in Essex, England. I'm currently studying combined Philosophy, Politics and Economics (PPE) at the University of Oxford. That and my position as the top A-level examinations student in England for my year group are both characteristic of my eternal urge to take on new and exciting challenges, especially where those can make a real difference to the world. PPE isn't a degree that lends itself naturally to a sideline in computer programming, but it's certainly a challenging one that has over the past six months enabled me to perfect my time management and communications skills as well as putting to the test my problem solving and critical thinking capabilities. In my spare time I've continued to program actively, building up contributions to MediaWiki and other open-source projects (details below).

I have chosen internationalisation as the topic of my project because of its ability to engage and empower so many potential contributors so easily. Thus it seems to be an ideal area in which to gain traction even during the limited time period available for full-time development work as part of Google Summer of Code. I've felt for some time as though a great deal could be achieved in this area if only I could set aside a block of time to sit down and work on it—that's why I'm excited to be applying for a Google Summer of Code placement.

Participation

Friends rarely complain that I communicate too little, and talking is a habit I would be unlikely to break during the Google Summer of Code programme. I've got my own blog (already on Planet Wikimedia), which will be the central focus for progress reports, though I'm also an occasional tweeter. Source code will be pushed regularly (probably daily) to the extension's Git repository. I'll be lurking in IRC throughout and I'm also very responsive to emails when awake, allowing both updates and support to flow forwards and backwards whenever necessary. Testing itself will be documented on extension subpages on MediaWiki.org, allowing editors to engage with the development process in the mode they are most likely to be happy with. I am reasonably well-known at present, and am typically collegial and drama-intolerant, even to a fault – though the benefit of this is that I feel that I could draw on a wide and varied support network if I needed to without fear of partisanship.

Past open source experience

My contributions to MediaWiki itself include a number of patches (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, possibly others); several of the more recent ones have taken the form of Gerrit patchsets (example). In the past, I've also submitted patches to Mozilla, as well as putting my many other code contributions (including a successful website, a sizeable collection of Toolserver pages, a sizeable proportion of the Peachy bot framework, bot code, and development work on AutoWikiBrowser) under various libre and semi-libre licenses. Of these, one Toolserver tool I overhauled and rewrote (SVGTranslate) is of particular relevance as it represents my first work in the SVG translation field; a comment made by one of its users provided the idea behind this proposal. Elsewhere, I'm familiar with Git, am already set up with Gerrit (as previously mentioned), and as the regular "Technology report" writer for the English Wikipedia-based Signpost electronic newspaper, am already fluent in the language of Wikimedia-MediaWiki development and deployment processes. I subscribe to the Open Knowledge Foundation's mailing list, and have participated in several of their themed hackdays, the product of which is always openly licensed. Finally, the Brighton Hackathon enabled me to put faces to several of the Wikimedia names with which I was so familiar whilst working on a Toolserver tool.

Any other info