User:Umang13/Gsoc14

Support for new media file types (X3D or COLLADA) on Commons:
URL: https://www.mediawiki.org/wiki/User:Umang13/Gsoc14

Related bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=1790

Announcement: http://lists.wikimedia.org/pipermail/wikitech-l/2014-March/075381.html

Contact Information
Umang Sharma

OBH, Palash Niwas,

IIIT Hyderabad, Gachibowli - 500032

Hyderabad, India

Mobile Number: +919985341596

Email: umange@gmail.com, umang.sharma@students.iiit.ac.in

Use Cases

 * Jack feels that the NASA ISS model available on the NASA website should be available to the entire community. However, Commons supports no format that would let Jack show this model in its full glory.


 * Albert has just discovered a discrepancy in the molecular DNA model he was studying. He feels the best way to publish his views and findings is to involve the community. Ideally, people would not have to imagine his theory; instead, it would be presented to them as a self-explanatory 3D model. If only there were support for a format that would let him upload the model from his PC to Commons.

Outline
Wikimedia Commons is a database of millions of media files in many formats. However, there have been requests for several file types that are currently unsupported. One common request is for X3D, a file type used to represent 3D computer graphics, and in this project I want to focus on implementing support for it. Closely related is COLLADA, which serves more or less the same purpose as X3D. The main objective of my project is to add one of these two file types (whichever the community needs more) to the file types supported on Commons. Making these files uploadable is not the only challenge. Their content has to be validated to ensure that it is well formed and does not contain malware, trojans, or potential cross-site scripting (XSS) vectors. The implementation must also work in as many browsers as possible (at least the popular ones), with appropriate fallbacks for those that do not support it.
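COLLADA and the XML encoding of X3D are both XML-based, so one possible first step is to tell them apart by their root element before any format-specific handling. The sketch below is in Python for brevity (MediaWiki itself is written in PHP). The function names are hypothetical, and it assumes X3D files use the XML encoding (an `<X3D>` root) rather than the ClassicVRML or binary encodings, and that COLLADA files have a `<COLLADA>` root in whichever schema namespace their version uses.

```python
import io
import xml.etree.ElementTree as ET
from typing import Optional


def local_name(tag: str) -> str:
    """Strip an ElementTree namespace prefix such as '{http://...}COLLADA'."""
    return tag.rsplit("}", 1)[-1]


def sniff_3d_format(data: bytes) -> Optional[str]:
    """Return "x3d", "collada", or None, judging only by the root element."""
    try:
        # iterparse reports the root's "start" event first, so we can stop
        # there without building the full element tree.
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
            root = local_name(elem.tag)
            if root == "X3D":
                return "x3d"
            if root == "COLLADA":  # COLLADA versions use different namespace URIs
                return "collada"
            return None
    except ET.ParseError:
        pass  # not (well-formed) XML
    return None
```

For example, an upload starting with `<X3D profile="Interchange" version="3.3">` would be classified as `"x3d"`, while an SVG file would return `None` and fall through to the existing handlers.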

Possible mentor: Bryan Davis

Deliverables

 * A complete solution that enables Wikimedia Commons to support either X3D or COLLADA files and allows users to upload them.


 * Fixes and updates to the raster image generator so that it can produce previews of these files without relying on third-party plugins.


 * Security measures that reject malformed files and check for malware or trojan content.


 * Fallback procedures for browsers that cannot run the developed solution.
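To illustrate what a plugin-free raster preview could look like at its very simplest, the sketch below collects the vertices of X3D `Coordinate` nodes and projects them orthographically onto the XY plane as a binary PGM image. This is a toy under stated assumptions, not a proposed renderer: it ignores faces, transforms, lighting, and every node other than `Coordinate`, and both function names are hypothetical. A real pipeline would also convert the output to PNG for thumbnailing.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def collect_points(x3d: bytes) -> list[tuple[float, float, float]]:
    """Gather every vertex listed in <Coordinate point="x y z, x y z, ..."> nodes."""
    root = ET.fromstring(x3d)
    points: list[tuple[float, float, float]] = []
    for coord in root.iter("Coordinate"):
        # X3D separates numbers with whitespace and/or commas.
        nums = [float(v) for v in coord.get("point", "").replace(",", " ").split()]
        # Group into (x, y, z) triples; a trailing incomplete triple is dropped.
        points.extend(zip(nums[0::3], nums[1::3], nums[2::3]))
    return points


def project_to_pgm(points, width: int = 64, height: int = 64) -> bytes:
    """Orthographic projection onto the XY plane, returned as a binary PGM (P5)."""
    pixels = bytearray(width * height)  # all zeros = black background
    if points:
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        min_x, min_y = min(xs), min(ys)
        span = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid division by zero
        # One scale factor for both axes keeps the model's aspect ratio.
        scale = min(width - 1, height - 1) / span
        for x, y, _z in points:  # z is simply discarded by the projection
            col = round((x - min_x) * scale)
            row = (height - 1) - round((y - min_y) * scale)  # image rows grow downwards
            pixels[row * width + col] = 255  # plot the vertex as a white dot
    header = f"P5 {width} {height} 255\n".encode("ascii")
    return header + bytes(pixels)
```

Writing the returned bytes to a `.pgm` file produces an image that common viewers can open, which makes it easy to eyeball whether the projection is sensible.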

(BONUS FEATURES)

If time permits:


 * Add support for the file type not selected initially. Once the first format is supported, the second may be straightforward to add along similar lines; in that case, I will implement it as well.


 * A separate JavaScript viewer that can render and manipulate the 3D model directly in the browser from a wiki page. This would also make future additions to the supported file formats easier to handle. (This might require considerable time.)

Implementation Outline
During implementation, special care has to be taken to ensure that the files are clean. We should validate file contents during upload to make sure that the files are well formed. Further, there should be protection against threats such as trojans, malware, and potential cross-site scripting (XSS) elements that may be smuggled in with uploaded files. The existing upload security checks may not suffice, because they must be extended to new file types, which are generally large and structured differently.

Once the files have been validated, the model has to be represented on the wiki page. This requires extracting the file's contents (particularly its metadata) so that they can be presented on the wiki page. The extraction process may differ between file formats and therefore has to be implemented separately for each.

Finally, we need to render these models into a format that Commons already supports. The result must display in any web browser without JavaScript or third-party plugins. The method MediaWiki has used so far is to render files to PNG and then thumbnail them. However, applying the same procedure to a 3D file may be slow and may not work as expected, so raster image generation needs to be efficient in both speed and memory. The existing renderer may also need improvements to handle these new file types.
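As a rough illustration of the upload-time validation described above, the sketch below parses an X3D file and flags a few obvious script vectors: XML entity declarations (which enable expansion attacks), X3D `Script` nodes (which can carry ECMAScript), `on...` event-handler attributes that would become live if the markup were ever embedded in an HTML page, and `javascript:`/`ecmascript:` URLs. The function name and the exact rule set are assumptions for illustration only; a production filter should whitelist known X3D elements and attributes rather than blacklist a few bad ones.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical blacklist for this sketch only; a whitelist would be safer.
FORBIDDEN_ELEMENTS = {"script"}
FORBIDDEN_URL_SCHEMES = ("javascript:", "ecmascript:", "vbscript:")


def local_name(tag: str) -> str:
    """Strip an ElementTree namespace prefix such as '{http://...}name'."""
    return tag.rsplit("}", 1)[-1]


def check_x3d_upload(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means this check passed."""
    if b"<!ENTITY" in data:
        # Entity declarations enable "billion laughs"-style expansion bombs.
        return ["document declares XML entities"]
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems: list[str] = []
    if local_name(root.tag) != "X3D":
        problems.append("root element is not <X3D>")
    for elem in root.iter():
        name = local_name(elem.tag)
        if name.lower() in FORBIDDEN_ELEMENTS:
            problems.append(f"forbidden element <{name}>")
        for attr, value in elem.attrib.items():
            attr_name = local_name(attr).lower()
            # "on" alone is a legitimate X3D field (e.g. on lights); "onclick" etc. are not.
            if attr_name.startswith("on") and attr_name != "on":
                problems.append(f"event-handler attribute {attr_name!r} on <{name}>")
            # X3D url fields are quoted string lists, so search rather than prefix-match.
            if any(scheme in value.lower() for scheme in FORBIDDEN_URL_SCHEMES):
                problems.append(f"script URL in {attr_name!r} on <{name}>")
    return problems
```

Returning every problem rather than stopping at the first would let the upload form explain all the reasons a file was rejected.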

Timeline

 * 21st April – 18th May: Get familiar with the community and the code base. Set up all communication channels required for the project.
 * 19th May – 29th May: Start the project. The first aim is to enable validation for this category of files: ensure that each file is well formed and check all of its contents. This may require upgrading the existing verifier or building a new one along similar lines.
 * 31st May – 10th June: Extract the metadata from the files. This data will be needed later for the rendering process.
 * 12th June – 22nd June: Use MediaWiki's existing method of representing files, i.e. creating thumbnails. For this, we need to generate a high-quality raster image in reasonable time without any third-party plugins.
 * 24th June – 1st July: Improve the code developed so far to fully support X3D/COLLADA files. Midterm evaluations fall in this period.
 * 2nd July – 15th July: Create thumbnails from the generated raster images and embed them in the MediaWiki page. Integration period.
 * 17th July – 8th August: Most of the work should be complete by now. Evaluate the other file type (whichever of the two I did not work on). If implementing it is straightforward given the work already done, add support for it as well. Work on the JavaScript viewer and other bonus features. The project should be ready at this point.
 * 8th August – 18th August: If I am working on the JavaScript viewer or the second file type, I will continue with it. Otherwise, other small features can be implemented as the mentor wishes. All submitted code will be reviewed.
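For the metadata-extraction milestone (31st May – 10th June), the XML encoding of X3D already carries document-level metadata as `meta` elements with `name` and `content` attributes inside the `head` element. A minimal sketch, assuming XML-encoded X3D and a hypothetical output layout, could collect those entries together with the profile, version, and a shape count for display on the file description page.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def extract_x3d_metadata(data: bytes) -> dict:
    """Collect header metadata and simple statistics from an XML-encoded X3D file."""
    root = ET.fromstring(data)
    meta: dict[str, list[str]] = {}
    head = root.find("head")
    if head is not None:
        for entry in head.findall("meta"):
            name, content = entry.get("name"), entry.get("content")
            if name and content is not None:
                # Names may repeat (e.g. several "creator" entries), so keep a list.
                meta.setdefault(name, []).append(content)
    return {
        "profile": root.get("profile"),
        "version": root.get("version"),
        "meta": meta,
        # Count <Shape> nodes anywhere in the scene graph as a rough size indicator.
        "shape_count": sum(1 for _ in root.iter("Shape")),
    }
```

Storing this dictionary alongside the file could let the description page show it without re-parsing the model on every view.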

Participation
As shown in the timeline, I have divided this project into several phases, and the work in each time bracket will be a milestone. I have left short breaks between consecutive brackets, which I may also use for documenting the work as well as for rest. I prefer to work at night. Since I live in India (UTC+5:30), my working hours may fall at a different time of day for my mentors. My typical working hours would be 1200 hrs - 1700 hrs and 2200 hrs - 0400 hrs IST (0630 - 1130 and 1630 - 2230 UTC). This is my preference; however, I am flexible and can change my schedule if needed.

I am open to communicating with my mentors through any medium. I have a decent internet connection at home and will be available on whichever channel suits my mentor. I would prefer regular video chats, as I feel they provide the best coordination. Apart from that, I will be a regular on IRC, as I feel the best help I can get is from the community itself. Email is of course an option when nothing else works. I plan to keep my mentors informed of every update to my project and to get feedback on the relevant IRC channels after completing each milestone.

About Me
Hi! I am Umang, a second-year student at IIIT-H pursuing a Bachelor's degree in Computer Science and Engineering. It is one of the best universities in India for computer science. I am a regular Linux user and comfortable with it. I heard about this program from a friend who applied in the past. The project will be my sole focus and first priority, as it falls entirely within my summer break from college. This is also the only project I am applying for this year. I feel this project will solve a long-standing problem (about 9 years old, I believe) faced by many people in the community. I also believe that, with all the technology available, people should be able to express their views in whatever way best describes their findings. Not being able to represent computer graphics properly is clearly a gap that should be fixed.

I am relatively new to open source but have worked on projects of a similar kind before. As part of a college project, I worked on debugging the interface of a language translator (ANUSAARAKA). The project involved PHP, JavaScript, and HTML and lasted three months.