User:Baconprime/GSoC2012

Identity

 * Name:
 * Connor Bartol


 * Project Title:
 * Automatic Taxobox Usability and Feature Enhancements


 * Email:
 * @gmail.com

Contact Information

 * Timezone:
 * Vancouver (PST/UTC-8)


 * Typical Working Hours:
 * 0900->1700 (although this is highly flexible)


 * IRC or IM networks/handle(s):
 * baconprime on freenode

What The Heck is an Automatic Taxobox
Being developers, you're probably familiar with the concept of taxonomy: abstracted a little for the sake of comparison, it's the result of organizing things into hierarchical tree structures. You've probably seen taxonomy in action before, such as in the term Homo sapiens (that's us, btw); here, Homo refers to the genus (the parent taxon), whereas sapiens refers to the species (its child). Now that that's out of the way: what is a taxobox?

Click on the link to the Wikipedia article on Homo sapiens. Notice the box on the right-hand side near the top? It's called a taxobox (short for taxonomy box), and it summarizes some data about the taxon in question (recall that a 'taxon' can be a genus, species, family, order, etc.) in an easily digestible manner. The trick is in making them quickly and well; at the moment, creating a box that looks and functions the way one wants involves a lot of template trickery and help-page spelunking. My project is to create a GUI that abstracts away writing and editing the template calls directly (and the templates themselves, as in milestone three), so as to make life easier for all parties involved.

An automatic taxobox is a taxobox that automatically generates the taxonomic hierarchy above, and optionally below, the indicated taxon. This is useful because the name or lineage of a taxon only has to be corrected in one place, as opposed to at every reference.
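As a minimal sketch of that idea (not the actual template mechanics), assume each taxon records only its rank and its parent; the lineage above any taxon can then be derived by walking parent links. The `TAXA` table and the `lineage` function are hypothetical names, and the hierarchy is deliberately simplified.

```python
# Hypothetical, simplified data: each taxon stores only (rank, parent).
TAXA = {
    "Homo sapiens": ("species", "Homo"),
    "Homo": ("genus", "Hominidae"),
    "Hominidae": ("family", "Primates"),
    "Primates": ("order", None),
}


def lineage(taxon, taxa=TAXA):
    """Return [(rank, name), ...] from the top-most known ancestor down to taxon."""
    chain = []
    seen = set()
    current = taxon
    while current is not None:
        if current in seen:
            # A taxon reachable from itself means the stored data is broken.
            raise ValueError(f"cycle in hierarchy at {current!r}")
        seen.add(current)
        rank, parent = taxa[current]
        chain.append((rank, current))
        current = parent
    chain.reverse()
    return chain
```

Because every box is derived from the same table, changing the parent of `Homo` once changes the lineage shown for every taxon beneath it.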

Internationalization
Unfortunately, there doesn't appear to be an equivalent to the automatic taxobox system on other language Wikipedias (at least, not on the French Wikipedia). That being said, a WYSIWYG editor for regular taxoboxes is still of great use regardless of language, and there are sure to be private wikis that implement similar taxonomic systems (not to mention the prospect of representing any sidebox with this GUI! but I'm getting ahead of myself).

With these points in mind, I believe that internationalization should be of low priority; however, any spare time at the end of the project period (and afterwards!) will be dedicated to generalizing and porting the interface for use elsewhere.

Objective
Expose the automatic taxobox's main and most-used features as a GUI. This would be packaged as an extension, but most of the work is done in gadget-land; the exceptions would be caching, some autocomplete-related functionality and some page editing that could potentially be done entirely in JS. An emphasis would be placed on implementing features in an intuitive fashion; there should be a very shallow learning curve for users who have used the template previously, and a minimal to moderate learning curve for those who are new to the taxobox -- the only way to have it used more frequently in new articles is to make it usable and accurate.

A focus will be placed on functionality that makes the GUI usable for editing existing taxoboxes and for facilitating the conversion of manual taxoboxes to automatic ones.

To scope the project: the interface should be able to
 * 1) (Entry) Create a new automatic taxobox if one does not already exist
 * 2) (Edit) Edit existing automatic taxobox entries (whether or not they were generated by this script)

Required

 * (Milestone One) Modify template expansion semantics so as to generate easily parsable output.
 ** This will help tremendously with parsing and editing, and might help ease some data structures-related woes.
 * (M. One) Editing of automatic taxoboxes by a graphical interface, specifically:
 ** Taxon hierarchy up to nth level.
 ** Image
 ** Common name/Synonyms
 ** Authority
 ** Caption
 * (M. Two) Detection (and guided correction) of red links and errors:
 ** For common errors to emphasize checks for, see here.
 * (M. Three) Template (for hierarchies) generation wizard.
 * (M. Two) Basic preview capabilities.
 * (M. Two) Provide options to format certain sections of output:
 ** Bolding/italicizing of taxon
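
The red-link check in the list above could be sketched roughly as follows. This is an illustration only: `red_links` is a hypothetical helper, the set of existing page titles is assumed to have been fetched elsewhere, and MediaWiki's title normalization (first-letter case, underscores) is ignored.

```python
import re

# Captures the target of [[Target]], [[Target|label]] and [[Target#Section]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")


def red_links(wikitext, existing_titles):
    """Return link targets in wikitext that are absent from existing_titles."""
    missing = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target and target not in existing_titles and target not in missing:
            missing.append(target)
    return missing
```

The returned list keeps first-seen order, so a guided correction step could walk the user through the missing targets in the order they appear in the box.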

If Time Permits

 * (Milestone One) Gracefully fall back to manual taxobox if the existing taxobox cannot be parsed.
 * (M. Four) Include a live preview:
 ** This will require some backend PHP work so that the page isn't just auto-generated every time that one presses a key.
 * (M. Four) Autocomplete of common fields.
 * (M. Three) Template editor:
 ** This will also require some PHP backend work; I suspect that it'll be non-trivial to convince MediaWiki to edit two pages at once. Alternatively, provide a wizard for creating/editing a taxonomic template on the template page itself.
 * (M. Three) Include a taxon creation wizard (i.e. that formalizes this).
 * (M. Five) Categorize and create widgets for all UIs.
 * (M. Three) Cover special cases and correct ordering/formatting in speciesbox/subspeciesbox/infraspeciesbox.
 * (M. Three) Autosave and edit persistence (usability for the interface).
 * (M. Six) Port to non-English Wikipedia.
 ** Make direct hierarchy input more robust; on wikis without the taxonomic templates set up, all taxobox input is manual.
 ** Replace en-specific terminology in templates with a generic system.
 ** Allow for language-specific features (e.g. maybe fr's taxobox template allows for the blink tag, whereas en's doesn't).
 ** Allow for language-specific formatting (e.g. maybe ja's taxobox template can define word directionality).
 ** Translate.
 * (M. Six) Handle Sideboxes Generically
 ** Extend the editor to function with the city infobox.
 ** Remove any taxobox-specific structure.
 ** Test on non-English MediaWiki deployments.
 ** Improve the interface to function with edge-case shaped sideboxes.

Project Schedule/Milestones
It's expected that ~1.8*n weeks (where n is the listed estimate) are needed to complete each milestone -- this time will be used for testing against existing code, testing against other humans and as a buffer against real life (e.g. laptop bursts into flame, internet connection dies, etc.)
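
The buffer arithmetic can be made concrete with a small sketch. The 1.8 multiplier is the one stated above; the function names and the rounding to whole days are assumptions, and the dates listed under each milestone below are the proposal's own figures rather than output of this code.

```python
from datetime import date, timedelta

BUFFER_FACTOR = 1.8  # multiplier stated in the schedule


def padded_weeks(estimate_weeks):
    """Calendar weeks allowed for a task estimated at estimate_weeks."""
    return BUFFER_FACTOR * estimate_weeks


def padded_end(start, estimate_weeks):
    """End date after applying the buffer, rounded to whole days (an assumption)."""
    days = round(padded_weeks(estimate_weeks) * 7)
    return start + timedelta(days=days)
```

For example, a task estimated at 2.5 weeks is given roughly 4.5 calendar weeks.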

I will only be taking one or two classes in university over the summer, so they will not interfere with my outlined schedule. I also have no vacations planned.

One
Able to edit certain parameters on existing taxoboxes, as well as create new ones. No preview; just a form with a set of key-value pairs (e.g. taxon name - Canis lupus, etc.). ~2.5-3 weeks (April 23rd -> May 21st-27th)
 * 1) Internal representation (1 week)
 ** Break down into 'hierarchy'.
 ** Code in logic (e.g. which elements can work with which others, etc.)
 ** Think of an optimal way to store all of those properties (or just use one, long object).
 ** Integrate with rdfQuery.
 * 2) Create GUI (1-1.5 weeks)
 ** Group functionality, work out how best to organize UI.
 ** Make some flow diagrams.
 ** Float a mockup by as many current taxobox users as possible, iterate until something sticks.
 * 3) Modify Template Expansion (.5 weeks)
 ** Inject RDF tags in opportune locations in the template parser.
 ** Verify standards adherence.
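
One possible shape for the key-value internal representation is sketched below, assuming a simple template call with named parameters only. `parse_template_call` and `render_template_call` are hypothetical helpers and the parameter names in the usage note are illustrative; real wikitext with nested templates or piped links needs a proper parser, so the sketch returns `None` in those cases and a caller could fall back to manual editing.

```python
def parse_template_call(wikitext):
    """Parse '{{name | key = value | ...}}' into (name, params), or None."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        return None
    body = text[2:-2]
    if "{{" in body or "[[" in body:
        return None  # nested templates or links may contain pipes of their own
    parts = [part.strip() for part in body.split("|")]
    name, params = parts[0], {}
    for part in parts[1:]:
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            return None  # positional parameters are out of scope for this sketch
        params[key.strip()] = value.strip()
    return name, params


def render_template_call(name, params):
    """Serialize (name, params) back into one-parameter-per-line wikitext."""
    lines = ["{{" + name]
    lines += [f"| {key} = {value}" for key, value in params.items()]
    lines.append("}}")
    return "\n".join(lines)
```

A form-based editor could parse a box into the dictionary, let the user change values such as `taxon` or `image`, and render the result back into the page.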

Two
System can detect common errors (e.g. specifying a grandparent name when impossible, etc.), and can offer to convert taxoboxes losslessly to automatic taxoboxes (if so desired). User-initiated previews are added. ~1-1.5 weeks (May 21st-27th -> June 3rd-June 15th)
 * 1) Conversion (~4 days)
 ** Determine empirically what is 'safe' and what isn't safe to convert to an automatic taxobox.
 ** Extract and compare generated to explicit taxonomies in the taxobox.
 ** Provide recommendations for user (e.g. when it's 100% safe to replace, when there are unknown tags, when there's a known conflict, etc.)
 * 2) Error Detection (~3 days)
 ** Document resolutions to errors.
 ** Augment UI to provide user feedback on broken sections.
 * 3) Create GUI (~3 days)
 ** Determine how best to represent detected flaws.
 ** Mockup, test, rinse and repeat.
 ** Hook into API:Parsing to provide a user-activated preview.
 *** Lay the groundwork for the live preview in milestone four.
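
The comparison step in the conversion task might look something like this sketch. It assumes both the manually written taxonomy and the generated one are available as rank-to-name mappings; `compare_taxonomies` and its three verdict strings are hypothetical, loosely mirroring the "safe", "unknown tags" and "known conflict" recommendations listed above.

```python
def compare_taxonomies(explicit, generated):
    """Classify a manual taxobox against the automatically generated lineage.

    Both arguments map rank -> taxon name. Returns (verdict, details), where
    verdict is 'safe', 'unknown' or 'conflict'.
    """
    conflicts = {
        rank: (name, generated[rank])
        for rank, name in explicit.items()
        if rank in generated and generated[rank] != name
    }
    unknown = sorted(rank for rank in explicit if rank not in generated)
    if conflicts:
        return "conflict", conflicts
    if unknown:
        return "unknown", unknown
    return "safe", {}
```

Conflicts are reported ahead of unknown ranks on the assumption that a disagreement about a known rank is the more serious obstacle to a lossless conversion.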

Three
GUI is polished and will allow for the guided generation of new template pages. Allows for editing of existing template pages. ~1.5-2 weeks (June 3rd-June 15th -> June 22nd-July 10th)
 * 1) Create GUI (1-1.5 weeks)
 ** Work up some flow diagrams.
 ** Iterate through mockups until something catches users' eyes.
 ** Represent tree structure in an intuitive fashion.
 * 2) Backend (1 week)
 ** Determine whether or not it would be feasible to implement as a) a wizard on a template page or b) a wizard on the article that accesses the template.
 ** Hook into API:Edit.
 ** Work out object representations in the template (hint: trees).
 ** Implement logic -- for example, a child taxon cannot contain a taxon that dominates it.
 ** Implement edge cases, such as polyphyly.
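
The rule that a child taxon cannot contain a taxon that dominates it amounts to keeping the hierarchy acyclic. A minimal sketch, assuming the hierarchy is held as a child-to-parent mapping (`ancestors` and `can_set_parent` are hypothetical names, and edge cases such as polyphyly are not handled):

```python
def ancestors(taxon, parent_of):
    """Yield the ancestors of taxon, nearest first."""
    seen = set()
    current = parent_of.get(taxon)
    while current is not None and current not in seen:
        seen.add(current)
        yield current
        current = parent_of.get(current)


def can_set_parent(child, new_parent, parent_of):
    """True if making new_parent the parent of child keeps the tree acyclic."""
    if child == new_parent:
        return False
    # If child already sits above new_parent, the edit would create a loop.
    return child not in ancestors(new_parent, parent_of)
```

A template wizard could call a check like this before saving, and explain the rejected edit to the user instead of writing a broken hierarchy.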

Four
For certain types of modifications, the preview window will be able to show live changes. Certain fields autocomplete. Emphasis stays on form-based (key-value pair) manipulation. ~3.5-4 weeks (June 22nd-July 10th -> August 3rd)
 * 1) Live-preview (2 weeks)
 ** Create a caching layer for templates (should still rely heavily on ). (non-trivial)
 ** Modify the client-side such that it can effect certain changes without server retrieval (e.g. changing names of already fetched items, setting a different image URL, etc.)
 ** Augment UI to cleanly add and remove preview (i.e. add controls to host window, make some transition effects).
 ** Work out triggers for preview (e.g. per-key for text fields, on enter for images, etc.). This isn't necessarily trivial, as people have preconceived notions about how previews should function.
 * 2) Autocomplete (1-1.5 weeks)
 ** Find a good place to store a dictionary.
 ** Hook into jQuery's autocomplete library.
 * 3) Visual Editor Integration (.5 weeks)
 ** Clean up code; start to strip out any taxobox-specific structures.
 ** Get as much working as well as possible in the time allotted.
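
The dictionary lookup behind autocomplete can be sketched independently of the front-end widget. The version below is written in Python rather than the JavaScript the gadget would use, and shows only a case-insensitive prefix search over a sorted list; `build_index` and `complete` are hypothetical names, and where the dictionary is stored is left open, as in the plan above.

```python
from bisect import bisect_left


def build_index(terms):
    """Return a sorted list of (lowercased term, original term) pairs."""
    return sorted((term.lower(), term) for term in set(terms))


def complete(prefix, index, limit=5):
    """Return up to limit terms that start with prefix, ignoring case."""
    needle = prefix.lower()
    # A one-element tuple sorts before any pair sharing the same first item.
    start = bisect_left(index, (needle,))
    matches = []
    for lowered, original in index[start:]:
        if not lowered.startswith(needle) or len(matches) == limit:
            break
        matches.append(original)
    return matches
```

Binary search keeps each lookup cheap enough to run on every keystroke, even for a dictionary of many thousands of taxon names.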

Five
GUI is expanded to include wizards, switches and fields for all applicable options. Emphasis is placed on keeping the main editing window free of excess clutter. ~1 week (August 3rd -> August 10th)
 * 1) Document All Parameters (~2.5 days)
 ** Determine sets of valid input.
 ** Write a short description of how to use the parameter.
 ** Encode where not to use the parameter (e.g. when it conflicts).
 * 2) Create GUI (~3.5 days)
 ** Categorize the many, many parameters. (note: are there parameters that can be referenced more than once in the template?)
 ** Mockup, test, rinse and repeat.
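
The parameter documentation could double as machine-readable validation rules. The following is a sketch under that assumption; `ParamSpec`, `validate` and the example parameter names in the usage note are hypothetical and are not taken from the actual template documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    name: str
    description: str                          # short usage note shown in the GUI
    allowed: frozenset = frozenset()          # empty set means free text
    conflicts_with: frozenset = frozenset()   # parameters it must not appear with


def validate(params, specs):
    """Return a list of human-readable problems for a dict of parameters."""
    problems = []
    for key, value in params.items():
        spec = specs.get(key)
        if spec is None:
            problems.append(f"unknown parameter: {key}")
            continue
        if spec.allowed and value not in spec.allowed:
            problems.append(f"invalid value for {key}: {value}")
        for other in sorted(spec.conflicts_with & set(params)):
            problems.append(f"{key} conflicts with {other}")
    return problems
```

For instance, a spec for an `italic_title` switch might allow only `yes` and `no`, and the GUI could render it as a checkbox while reusing the same spec to flag bad values in existing boxes.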

Six
~? (post-Summer)
 * 1) Generalize GUI for all Sideboxes
 * 2) Generalize to Infobox
 ** Extend RDF annotations as required.
 ** Document best practices 'in the wild'.
 ** Classify workflow so as to optimize for both infoboxes and taxoboxes.
 ** Generate and edit valid, complete infoboxes.
 * 3) Generalize to Sideboxes
 ** Define the typical sidebox template.
 *** List common parameters and positions.
 *** Either integrate with a WYSIWYG framework, or implement a click-based selection interface yourself.
 ** Implement an editor that functions well in the general case.
 * 4) Internationalize

About Me
I'm 19 years old, starting my third year at Simon Fraser University in a joint major (BSc) in Computational Linguistics. My absolute favourite programming language at the moment is JavaScript (because it's marvelously functional), although in terms of experience I'm split between it and Java. I've dabbled with PHP, but in order to write at a reasonable speed I need a reference virtually cracked open in front of me.

My minor's in cognitive science, and I adore creating well-designed user interfaces. No really! Writing usable, intelligent software is always a goal in the back of my mind (an objective that extends to making good, clean and reusable code), so building web UIs is a natural interest of mine.

I'm hoping to get a start with open source coding through GSoC (and in particular, with the MediaWiki community!). Nearly every project I've worked on in my life has been assigned for a class; the few that weren't were the only interesting ones, so the prospect of working on anything that I want, on software that I use regularly, appeals greatly to me.

Oh, and I'm bilingual in French and English, in case there's a mentor who's made it this far down the page and finds that important.

Participation
I like coding in a social environment; after settling into a community or group, I chat regularly to get feedback on work after I implement each 'chunk' (e.g. a particularly beefy method, a small class, etc.). I plan to post regular progress updates on a subpage of my profile, in addition to equally regular updates on the mailing list (to hopefully attract more eyes to my code). I've been taught through my university courses to use SVN, but I'm competent with Git (did the OpenHatch tutorial, read a few more online...) and willing to use it heavily.

I have a strong bias towards asking questions in a medium that allows for quick clarifications; the IRC room will always be my first choice for seeking coding advice, followed by the mailing list. If my mentor has some other means of communication (IM, prioritized email address) I will pester them endlessly to make sure that what I'm coding makes sense and that I'm not overlooking anything important.

Past Open Source Experience

 * None at the moment, but I'll be fixing minor bugs shortly!