User:Baconprime/GSoC2012

Identity

 * Name: Connor Bartol
 * Project Title: Automatic Taxobox Usability and Feature Enhancements
 * Email: @gmail.com

Contact Information

 * Timezone: Vancouver (PST/UTC-8)
 * Typical Working Hours: 0900-1700 (although this is highly flexible)
 * IRC or IM networks/handle(s): baconprime on freenode

What The Heck is an Automatic Taxobox
Being developers, you're probably familiar with the concept of taxonomy: abstracted a little, it's the practice of organizing things into hierarchical tree structures. You've probably seen biological taxonomy in action before, such as in the term Homo sapiens (that's us, btw); here, Homo refers to the genus (the parent taxon of ...), whereas sapiens refers to the species (... its child). Now that that's out of the way: what is a taxobox?

Take a look at the Wikipedia article on Homo sapiens. Notice the box on the right-hand side near the top? It's called a taxobox (short for taxonomy box), and it summarizes some data about the taxon in question (recall that a 'taxon' can be a genus, species, family, order, etc.) in an easily digestible manner. The trick is in making them quickly and well; at the moment, creating a box that looks and functions the way one wants involves a lot of template trickery and help-page spelunking. My project is to create a GUI that abstracts away writing and editing the template calls directly (and, in milestone three, the templates themselves), making life easier for all parties involved.

Automatic taxobox = taxobox that automatically generates the taxonomic hierarchy above and optionally below the indicated taxon. This is useful because one would only have to correct the name or lineage of a taxon in one place, as opposed to at every reference.
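To make the "correct it in one place" idea concrete, here is a minimal Python sketch of deriving a lineage from per-taxon parent links. It is not the actual template code (which lives in wiki templates); the `PARENT` table and the `lineage` function are hypothetical names introduced for illustration, and the table deliberately skips intermediate ranks.

```python
# Hypothetical parent table: each taxon records only its immediate parent,
# so a correction made here is picked up by every taxobox that walks through it.
# Intermediate ranks are skipped to keep the sketch short.
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
    "Mammalia": None,  # the sketch stops here; a real tree continues upward
}


def lineage(taxon, parent=PARENT):
    """Return the chain of taxa from `taxon` up to the top of the table."""
    chain = []
    seen = set()
    while taxon is not None:
        if taxon in seen:
            # A malformed table could loop forever; fail loudly instead.
            raise ValueError(f"cycle detected at {taxon!r}")
        seen.add(taxon)
        chain.append(taxon)
        taxon = parent.get(taxon)  # unknown taxa simply end the chain
    return chain
```

Calling `lineage("Homo sapiens")` walks from the species up to the last entry in the table. Changing a single entry in `PARENT` changes the result for every taxon beneath it, which is the property the automatic taxobox relies on.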

Internationalization
Unfortunately, there doesn't appear to be an equivalent to the automatic taxobox system on other language Wikipedias (at least, not on the fr Wikipedia). That being said, a WYSIWYG editor for regular taxoboxes would still be very useful regardless of language, and there are sure to be private wikis that implement similar taxonomic systems (not to mention the prospect of representing any sidebox with this GUI, but I'm getting ahead of myself).

With these points in mind, I believe that internationalization should be a low priority; however, any spare time at the end of the project period (and afterwards!) will be dedicated to generalizing and porting the interface for use elsewhere.

Objective
Expose the automatic taxobox's main and most-used features as a GUI. It would ship as an extension, but most of the work would be done in gadget-land; the exceptions would be caching, some autocomplete-related functionality and some page editing (which could potentially be done entirely in JS). An emphasis would be placed on implementing features in an intuitive fashion; there should be a _very_ shallow learning curve for users who have been using the template previously, and a moderate to minimal learning curve for those who are new to the taxobox -- the only way to have it used more frequently in new articles is to make it usable and accurate.

The focus will be on functionality that makes the GUI usable for editing existing taxoboxes or for facilitating the conversion of manual taxoboxes to automatic ones.

To scope the project: the interface should be able to
 * 1) (Entry) Create a new automatic taxobox if one does not already exist
 * 2) (Edit) Edit existing automatic taxobox entries (whether or not they were generated by this script)

Required

 * (One) Modify template expansion semantics so as to generate easily parsable output.
 ** This will help tremendously with parsing and editing, and might ease some data-structure-related woes.
 * (One) Editing of automatic taxoboxes by a graphical interface, specifically:
 ** Taxon hierarchy up to nth level.
 ** Image
 ** Common name/Synonyms
 ** Authority
 ** Caption
 * (Two) Detection (and guided correction) of red links and errors:
 ** For common errors to emphasize checks for, see here.
 * (Three) Template (for hierarchies) generation wizard.
 * (Two) Provide options to format certain sections of output:
 ** Bolding/italicizing of taxon
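The parsing side of the editing requirement can be sketched as follows. This is a rough, language-neutral illustration in Python (the gadget itself would be JavaScript), not MediaWiki's parser: `split_top_level` and `parse_template_call` are hypothetical helpers, the parameter names in the usage note are only illustrative, and the sketch handles a single template call with named parameters.

```python
def split_top_level(body):
    """Split on '|' characters that are not inside [[...]] or nested {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            # A pipe at depth zero separates two template fields.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template_call(wikitext):
    """Parse a single '{{Name | key = value | ...}}' call into (name, params)."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    name, *fields = split_top_level(text[2:-2])
    params = {}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:  # named parameter; positional ones are ignored in this sketch
            params[key.strip()] = value.strip()
    return name.strip(), params
```

For example, `parse_template_call("{{Automatic taxobox | taxon = Panthera | image = [[File:Lion.jpg|thumb]]}}")` keeps the pipe inside the link as part of the `image` value instead of treating it as a field separator. A GUI could populate its form fields from the resulting dictionary.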

If Time Permits

 * (One) Gracefully fall back to a manual taxobox if parsing fails.
 * (Four) Include a live preview:
 ** This will require some backend PHP work so that the page isn't just auto-generated every time that one presses a key.
 * (Four) Autocomplete of common fields.
 * (Three) Template editor:
 ** This will also require some PHP backend work; I suspect that convincing MediaWiki to edit two pages at once will be non-trivial. Alternatively, provide a wizard for creating/editing a taxonomic template on the template page itself.
 * (Three) Include a taxon creation wizard (i.e. that formalizes this).
 * (Five) Categorize and create widgets for all UIs.
 * (Three) Cover special cases and correct ordering/formatting in speciesbox/subspeciesbox/infraspeciesbox.
 * (Three) Autosave and edit persistence (usability for the interface).
 * (Six) Port to non-English Wikipedias.
 ** Make direct hierarchy input more robust; on wikis without the taxonomic templates set up, all taxobox input is manual.
 ** Replace en-specific terminology in templates with a generic system.
 ** Allow for language-specific features (e.g. maybe fr's taxobox template allows for the blink tag, whereas en's doesn't).
 ** Allow for language-specific formatting (e.g. maybe ja's taxobox template can define word directionality).
 ** Translate.
Project Schedule
It's expected that ~1.8*n weeks (where n is the listed estimate) are needed to complete each milestone -- the extra time will be used for testing against existing code, testing against other humans and as a buffer against real life (e.g. laptop bursts into flame, internet connection dies, etc.).
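As a quick worked example of that padding rule, the following Python sketch scales an estimate range by the stated factor. The helper name `padded_weeks` is hypothetical, and the inputs in the usage note are simply the listed ranges from the milestones below.

```python
PADDING = 1.8  # multiplier stated in the schedule


def padded_weeks(low, high, factor=PADDING):
    """Scale an estimate range (in weeks) by the padding factor."""
    return round(low * factor, 2), round(high * factor, 2)
```

For instance, a milestone listed at 1 to 1.5 weeks becomes `padded_weeks(1, 1.5)`, i.e. roughly 1.8 to 2.7 weeks once testing and buffer time are included.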

One
~2.5-3 weeks
 * 1) Internal representation (1 week)
 ** Break down into a 'hierarchy'.
 ** Code in logic (e.g. which elements can work with which others, etc.).
 ** Think of an optimal way to store all of those properties (or just use one long object).
 ** Integrate with rdfQuery.
 * 2) Create GUI (1-1.5 weeks)
 ** Group functionality, work out how best to organize UI.
 ** Make some flow diagrams.
 ** Float a mockup by as many current taxobox users as possible, iterate until something sticks.
 * 3) Modify Template Expansion (.5 weeks)
 ** Inject RDF tags in opportune locations in the template parser.
 ** Verify standards adherence.

Two
~1-1.5 weeks
 * 1) Conversion (~4 days)
 ** Determine empirically what is 'safe' and what isn't safe to convert to an automatic taxobox.
 ** Extract the explicit taxonomy from the taxobox and compare it to the generated one.
 ** Provide recommendations for the user (e.g. when it's 100% safe to replace, when there are unknown tags, when there's a known conflict, etc.).
 * 2) Error Detection (~3 days)
 ** Improve the parser from milestone one to handle broken or non-compliant code.
 *** e.g. forgot a pipe character, referenced an unknown parameter
 ** Document resolutions to errors.
 ** Augment UI to provide user feedback on broken sections.
 * 3) Create GUI (~3 days)
 ** Determine how best to represent detected flaws.
 ** Mockup, test, rinse and repeat.
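The conversion check described in this milestone could look roughly like the Python sketch below. It is only an illustration under assumptions: `conversion_advice` is a hypothetical function, both taxonomies are assumed to have already been extracted into rank-to-name dictionaries, and the three verdicts mirror the recommendation cases listed above.

```python
def conversion_advice(explicit, generated, known_ranks):
    """Compare a manual taxobox's rank -> taxon map against the generated lineage.

    Returns (verdict, details) where verdict is 'safe', 'unknown' or 'conflict'.
    """
    # A known rank whose generated value differs (or is missing) is a conflict.
    conflicts = {
        rank: (name, generated.get(rank))
        for rank, name in explicit.items()
        if rank in known_ranks and generated.get(rank) != name
    }
    if conflicts:
        return "conflict", conflicts
    # Ranks the tool does not recognize need a human to look at them.
    unknown = sorted(rank for rank in explicit if rank not in known_ranks)
    if unknown:
        return "unknown", unknown
    return "safe", {}
```

With a generated lineage of `{"ordo": "Carnivora", "familia": "Felidae"}`, a manual taxobox stating `familia = Canidae` is reported as a conflict carrying both values, so the UI can show the user exactly what disagrees before anything is replaced.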

Three
~1.5-2 weeks
 * 1) Create GUI (1-1.5 weeks)
 ** Work up some flow diagrams.
 ** Iterate through mockups until something catches users' eyes.
 ** Represent tree structure in an intuitive fashion.
 * 2) Backend (1 week)
 ** Determine whether or not it would be feasible to implement as a) a wizard on a template page or b) a wizard on the article that accesses the template.
 ** If working from the article (editing two documents at once), determine/establish best practice for multi-page editing.
 ** Work out object representations in the template (hint: trees).
 ** Implement logic -- for example, a child taxon cannot contain a taxon that dominates it.
 ** Implement edge cases, such as polyphyly.
Four
~3.5-4 weeks
 * 1) Live-preview (2 weeks)
 ** Create a caching layer for templates. (non-trivial)
 ** Write a client-side parser that can apply certain changes without server retrieval (e.g. changing names of already fetched items, setting a different image URL, etc.).
 ** Augment UI to cleanly add and remove preview (i.e. add controls to host window, make some transition effects).
 ** Work out triggers for preview (e.g. per-key for text fields, on enter for images, etc.). This isn't necessarily trivial, as people have preconceived notions about how previews should function.
 * 2) Autocomplete (1-1.5 weeks)
 ** Find a good place to store a dictionary.
 ** Hook into jQuery's autocomplete library.
 * 3) Visual Editor Integration (.5 weeks)
 ** Lift wizard functionality for images and formatting.
 ** Interface with display code.
Five
~1 week
 * 1) Document All Parameters (~2.5 days)
 ** Determine sets of valid input.
 ** Write a short description of how to use the parameter.
 ** Encode where not to use the parameter (e.g. when it conflicts).
 * 2) Create GUI (~3.5 days)
 ** Categorize the many, many parameters. (note: are there parameters that can be referenced more than once in the template?)
 ** Mockup, test, rinse and repeat.
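The parameter documentation from this milestone could double as machine-readable validation data. The Python sketch below is one possible shape under assumptions: `ParamSpec` and `check_params` are hypothetical names, and any parameter names, allowed values or conflicts fed into it would have to come from the real template documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParamSpec:
    name: str
    description: str                        # short usage note shown in the GUI
    allowed: Optional[frozenset] = None     # None means free text
    conflicts_with: frozenset = frozenset()  # parameters that must not co-occur


def check_params(params, specs):
    """Return a list of human-readable problems found in `params`."""
    problems = []
    for name, value in params.items():
        spec = specs.get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
            continue
        if spec.allowed is not None and value not in spec.allowed:
            problems.append(f"invalid value for {name}: {value}")
        for other in sorted(spec.conflicts_with.intersection(params)):
            problems.append(f"{name} conflicts with {other}")
    return problems
```

Keeping the description, valid inputs and conflicts in one record means the same data can drive the widget's help text, its input controls and the error detection from milestone two.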

Six
~? (post-Summer)
 * 1) Generalize GUI for all Sideboxes
 * 2) Generalize to Infobox
 ** Extend RDF annotations as required.
 ** Document best practices 'in the wild'.
 ** Classify workflow so as to optimize for both infoboxes and taxoboxes.
 ** Generate and edit valid, complete infoboxes.
 * 3) Generalize to Sideboxes
 ** Define the typical sidebox template.
 *** List common parameters and positions.
 *** Either integrate with a WYSIWYG framework, or implement a click-based selection interface yourself.
 ** Implement an editor that functions well in the general case.
 * 4) Internationalize

Participation
I like coding in a social environment; after settling into a community or group, I chat regularly to get feedback on my work after I implement each 'chunk' (e.g. a particularly beefy method, a small class, etc.). I plan to post regular progress updates on a subpage of my profile, in addition to equally regular updates on the mailing list (to hopefully attract more eyes to my code). My university courses taught me to use SVN, but I'm competent with Git (did the OpenHatch tutorial, read a few more online...) and willing to use it heavily.

I have a strong bias towards asking questions in a medium that allows for quick clarifications; the IRC channel will always be my first choice for seeking coding advice, followed by the mailing list. If my mentor has some other means of communication (IM, prioritized email address), I will pester them endlessly to make sure that what I'm coding makes sense and that I'm not overlooking anything important.

Past Open Source Experience

 * None at the moment, but I'll be fixing minor bugs shortly!