User:Bawolff/GSoC2010

Identity

 * Name: Brian Wolff
 * Email: special:emailuser/bawolff or bawolff+gsoc at gmail dot com
 * Project title: Improve metadata support for uploaded media in MediaWiki by displaying embedded IPTC and XMP metadata

Contact/working info

 * Timezone: MST (UTC -7)
 * IRC or IM networks/handle(s): bawolff (on #wikinews, #wikinews-en and #mediawiki on freenode)

Project summary
Currently MediaWiki only supports displaying JPEG Exif metadata on image pages. Some other metadata can also be returned using the API; however, this is not very useful to the average viewer, since it is not displayed. I propose as my project (should I be accepted and all that) to improve MediaWiki's support for the metadata of uploaded files. This would include supporting metadata for more media formats and displaying the metadata on image pages where it is useful. Considering Wikimedia's general stance on copyright/Free-ness, I think being able to view file metadata, especially copyright-related metadata, would be a benefit, particularly for projects like Commons.

About you
I am currently a first-year computer science student. I first became involved with the Wikimedia world as a contributor to Wikinews several years ago. I also have commit access to MediaWiki and have made some patches, but nothing major as of yet. I would like to participate in Google Summer of Code as I think it would be a great way to become involved in MediaWiki development, as well as an excellent learning experience.

Required deliverables

 * Have some method to output more complex metadata displays than the current table.
 * Improve oggHandler to display the Comment metadata on ogg files (currently collected but not displayed)
 * Output currently collected metadata for pdf files
 * display IPTC metadata for JPEG images
 * parse and display (common) XMP metadata for PNG, JPEG, and TIFF files
 * parse metadata from SVG files
 * display metadata for svg
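
To illustrate what the PNG part of the XMP deliverable involves, here is a minimal sketch, written in Python purely for illustration (MediaWiki itself is PHP). It assumes the usual convention that XMP is stored in a PNG iTXt chunk with the keyword XML:com.adobe.xmp. The function name extract_xmp_from_png is hypothetical, and the sketch does not verify chunk CRCs.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
XMP_KEYWORD = b"XML:com.adobe.xmp"  # keyword conventionally used for XMP in iTXt


def extract_xmp_from_png(data):
    """Return the raw XMP packet from PNG bytes, or None if there is none.

    Hypothetical helper for illustration; chunk CRCs are not checked.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"iTXt":
            keyword, _, rest = body.partition(b"\x00")
            if keyword == XMP_KEYWORD and len(rest) >= 2:
                compressed = rest[0]  # rest[1] is the compression method
                _lang, _, rest = rest[2:].partition(b"\x00")
                _translated, _, text = rest.partition(b"\x00")
                return zlib.decompress(text) if compressed else text
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # header (8) + body + CRC (4)
    return None
```

The returned bytes are still an unparsed RDF/XML packet; turning them into displayable fields is a separate step.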

If time permits

 * Support XMP metadata in other formats (pdf and djvu perhaps)
 * On wiki method of editing metadata in jpeg and png files
 * Support the more complex ogg metadata formats
 * perhaps do something with extracting album art from ogg audio files. (not sure if that would actually be useful or not)

Project schedule

 * per the fixme in the source code, make formattedMetadata more flexible, while retaining backwards compatibility.
 * Since ogg comment data is already parsed, implement formattedMetadata to display the relevant metadata.
 * Similarly for PDFs (although the currently parsed PDF metadata is rather uninteresting)
 * add IPTC data to the metadata collected from jpeg.
 * Address the issue of having the metadata refresh without killing servers somehow (?)
 * Determine a way to present the metadata such as to show both exif and other formats.
 * Extract the XMP info from jpeg
 * Parse extracted XMP metadata
 * Add this xmp to the metadata displayed for jpegs
 * extract XMP metadata from PNGs
 * Make png files output the collected metadata
 * extract XMP from tiffs
 * Make tiff image page output metadata
 * extract and parse metadata from svg files
 * display the svg metadata
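
The "extract the XMP info from jpeg" step above can be sketched roughly as follows. This is an illustrative Python sketch, not MediaWiki code (which would be PHP). It assumes the standard layout in which the XMP packet sits in an APP1 segment beginning with the namespace string http://ns.adobe.com/xap/1.0/ followed by a NUL byte. The function name is hypothetical, and extended XMP split across several segments is not handled.

```python
import struct

# Identifier that prefixes an XMP packet inside a JPEG APP1 segment.
XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def extract_xmp_from_jpeg(data):
    """Return the raw XMP packet from JPEG bytes, or None if absent.

    Hypothetical helper for illustration; only the main XMP packet is read.
    """
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the segment structure
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2  # standalone markers carry no length field
            continue
        if marker in (0xD9, 0xDA):
            break  # EOI or start of scan: no metadata segments follow
        # Segment length counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_HEADER):
            return payload[len(XMP_HEADER):]
        pos += 2 + length
    return None
```

Exif data also lives in APP1 segments, which is why the sketch checks the namespace prefix rather than the marker alone.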

Participation
I plan to hang around #mediawiki and wikitech-l, as those seem to be ideal places to ask for help, and learn from others. I expect to talk with my mentor quite regularly, and ask for advice when needed, as well as receive comments on code review. I expect to submit code as a branch in svn.

Past open source experience
I've made some minor patches to MediaWiki, and I also maintain several JavaScript tools on Wikinews; beyond that, however, I do not have much experience.

Any other info
To address the issue of formattedMetadata not being flexible enough (per comments in the source code), I was thinking of moving the current code in the image page that turns the array returned by formattedMetadata into a table to a static method of MediaHandler. formattedMetadata would then normally return an HTML string, which would normally be created by a method of MediaHandler. If specific MediaHandler subclasses need to do some complicated formatting, they can just return the appropriate HTML. To retain backwards compatibility, if the image page gets an array from formattedMetadata, it sends it to the static method of MediaHandler. This is just an initial thought.
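
A minimal sketch of that backwards-compatible dispatch, in Python for illustration only (the real change would be in PHP). The names render_metadata and metadata_table_html are hypothetical, and the legacy array is simplified here to a list of (name, value) pairs, which is an assumption and not the actual structure formattedMetadata returns.

```python
from html import escape


def metadata_table_html(rows):
    """Stand-in for the proposed static helper: legacy rows to an HTML table.

    `rows` is assumed to be a list of (name, value) pairs (a simplification).
    """
    cells = "".join(
        f"<tr><th>{escape(str(name))}</th><td>{escape(str(value))}</td></tr>"
        for name, value in rows
    )
    return f'<table class="metadata">{cells}</table>'


def render_metadata(formatted):
    """What the image page would do with a handler's return value."""
    if not formatted:
        return ""  # handler has nothing to show
    if isinstance(formatted, str):
        return formatted  # new style: handler already produced HTML
    return metadata_table_html(formatted)  # old style: array of rows
```

With this shape, existing handlers that still return an array keep working unchanged, while a handler needing a more complex layout simply returns its own HTML string.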

I was thinking (this is just an initial thought) that perhaps the metadata table should look like the current table, but with separate sections for the different types of metadata. On the other hand, that approach somewhat over-emphasizes the different technical formats, which the average user doesn't care about. For ogg files, the metadata might be split up by stream ("Audio stream", "Video stream", etc.). An example would be the metadata for File:Wikinews11Apr2005 Demo (high quality).ogg, with some of the more technical fields hidden by default using the JS toggle.
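
The sectioned layout could be modelled along these lines. This is a rough Python sketch under stated assumptions: the function name group_by_section, the four-field tuple format, and the sample values in the usage example are all invented for illustration and do not come from MediaWiki or from the file mentioned above.

```python
def group_by_section(fields):
    """Group metadata fields by section, separating out technical fields.

    `fields` is an iterable of (section, name, value, is_technical) tuples;
    this tuple format is a hypothetical one chosen for the sketch.
    """
    sections = {}
    for section, name, value, is_technical in fields:
        buckets = sections.setdefault(section, {"visible": [], "collapsed": []})
        # Technical fields go into the part hidden by default behind the toggle.
        bucket = "collapsed" if is_technical else "visible"
        buckets[bucket].append((name, value))
    return sections


# Usage example with made-up values:
sample = [
    ("Audio stream", "Codec", "Vorbis", False),
    ("Audio stream", "Serial number", "12345", True),
    ("Video stream", "Codec", "Theora", False),
]
grouped = group_by_section(sample)
```

Sections come out in the order they are first seen, so the display order would follow the order of the streams in the file.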

If I do have time, I'd imagine a metadata editor implemented as a special page (perhaps an extension) where users can modify or add fields. On save, it would be roughly equivalent to uploading a new file (like how reverting an image is like uploading a new file), but with a different action in the upload log. I think this would be a really cool feature, but I don't want to bite off more than I can chew in the project proposal. If time permits, I would definitely work towards a metadata editor. Currently, when people download a file, most of the metadata on the image description page is lost. If they then find the file on their hard drive six months later, they have no idea where it came from. Imagine if all the information contained in commons:template:description was also in the metadata. Then, when people open the file six months later, their image editing program could tell them the source of the image, its URL at Commons, its description, etc.