User:Livnetata/OPWproposal

Wikipedia article translation metrics

Public URL
Wikipedia article translation metrics
Announcement
https://lists.wikimedia.org/pipermail/wikitech-l/2014-October/079121.html

Name and contact information

Name
Neta Livneh
Email
neta.livneh@gmail.com
IRC or IM networks/handle(s)
Livnetata
Web Page / Blog / Microblog / Portfolio
My LinkedIn
Location
Israel
Typical working hours
09:00-18:00

Synopsis

The number and percentage of translated articles in Wikipedia are unknown, yet a non-negligible number of articles are being translated. A good estimate would help in understanding that ratio and, more importantly, which content is developed in the different Wikipedias. This insight would help the ContentTranslation team understand in which languages their product is most needed. Moreover, it might give us insights into the way people contribute to Wikipedia. This project aims to build a model that estimates whether a page is translated or not, using statistical analysis and machine learning tools.

Possible mentors
Amir E. Aharoni

Deliverables

Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in (discuss these first with the mentor of the project):

Day 1-5:

  • Learn about the meta process of how people translate pages.
  • Work with the content translation team and with the analysis team to find possible variables that might signal these pages.

Day 6-15:

  • Research the MediaWiki documentation for relevant variables in the database tables.
  • Choose one language to start from.

Day 16-25:

  • Get the data.
  • Clean the data.

Day 26-45:

  • Work on possible models that will give a good estimate for translated pages.
  • Optimize.

Day 46-60:

  • Midway evaluation.
  • More optimization.

Day 61-75:

  • Generalize the model to other languages.

Day 76-90:

  • Write proper documentation for the code and create graphs.
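
The modeling steps in the timeline above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the model the project will end up with: the three features (`first_rev_kb`, `summary_mentions`, `has_langlink`), the toy rows and the choice of logistic regression are all hypothetical and chosen for illustration. The real variables are to be found together with the mentors during the first weeks.

```python
import math

# Hypothetical per-page features (assumptions, not taken from the proposal):
#   first_rev_kb     - size of the first revision, rescaled to roughly 0..1
#   summary_mentions - 1.0 if an edit summary mentions a translation
#   has_langlink     - 1.0 if the page links to another language edition
# Label: 1 = translated, 0 = not translated. The rows are made-up toy data.
TOY_PAGES = [
    ((0.9, 1.0, 1.0), 1),
    ((0.7, 1.0, 1.0), 1),
    ((0.8, 0.0, 1.0), 1),
    ((0.1, 0.0, 1.0), 0),
    ((0.2, 0.0, 0.0), 0),
    ((0.1, 0.0, 0.0), 0),
]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability that a page with these features is translated."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(rows, epochs=2000, learning_rate=0.5):
    """Fit logistic regression with plain full-batch gradient descent."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            error = predict_proba(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Average the gradient over all rows before taking a step.
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def estimated_translated_share(weights, bias, pages, threshold=0.5):
    """Fraction of pages whose predicted probability reaches the threshold."""
    if not pages:
        return 0.0
    flagged = sum(1 for f in pages
                  if predict_proba(weights, bias, f) >= threshold)
    return flagged / len(pages)


if __name__ == "__main__":
    weights, bias = train(TOY_PAGES)
    pages = [features for features, _ in TOY_PAGES]
    print(estimated_translated_share(weights, bias, pages))
```

Applying the per-page estimate to every page of one Wikipedia and averaging, as `estimated_translated_share` does, would give the ratio of translated articles that the synopsis asks for. Generalizing to other languages would then mean retraining or re-checking the same features on another wiki.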

Participation

As part of the FOSS OPW internship, I will write a weekly report about my progress, what I learned and what I plan to accomplish in the following week. I plan to send interesting results to my mentor and to other people who might be involved. I plan to publish my code in the project directory, working with Gerrit. For help, I will use IRC, Google chat or Skype, depending on the kind of help needed.

About me

Education completed or in progress

I have a BA in Mathematics and Cognitive Sciences (2006-2009) and a master’s degree in Cognitive Sciences (2009-2012), both from the Hebrew University. I'm just starting my PhD studies at the School of Business Administration at the Hebrew University. By conducting large-scale experiments in online communities, I plan to research social influence bias in order to better understand how an individual's behavior changes in response to information from their social network. As side projects, I am interested in analyzing data generated by people, because interesting insights about how people think can be drawn from it.

How did you hear about this program?

I heard about the program through a Facebook post advertising it, which was shared by Facebook friends.

Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?

I will have classes and school work (should take around half a day each week).

We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?

No, I am only applying to FOSS OPW.

Past experience

Please describe your experience with the organization's product as a user and as a contributor (include the information, as well as a link or an attachment, for the required contribution you made to the project you are interested in here)

I have used Wikipedia for at least a decade now, and it’s the first source that I choose when I want to learn about a new subject or just answer a question I don't know the answer to. I think it is an amazing project that has made a change in the world.

Please describe your experience with any other FOSS projects as a user and as a contributor

I have used other FOSS projects such as R, Gephi and Python. Using R especially has made me understand that FOSS projects not only break the boundaries between the user and the contributor, but also build a community that is united under the same goal: answering its needs and making the product better. I hope to be able to contribute code to these projects in the future.

Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them (include links)

This is my first FOSS project. However, I have previously worked on data analysis projects. In my previous job as an online marketing manager, I looked for interesting trends in the data on a daily basis. Moreover, during my studies, I worked on analyzing data from Twitter in order to find "hot" trends. Before that, I was part of a research lab that tried to understand the influence of semantic memory on episodic memory. During my time there, I published an article, and another one has been submitted for publication. For these projects, I used R and Python, and learned machine learning techniques.

What project(s) are you interested in (these can be in the same or different organizations)?

I am interested in the Wikipedia article translation metrics project. I didn’t know the project before doing the microtask, but now I think that having data about users’ actions might improve our understanding of the way they engage with the site, and of how we can bring more (new) users to contribute.

Microtask

For the application I contributed two changes to the ContentTranslation project.

  1. Fixed a bug in the file SpecialContentTranslationStats.php that caused a division-by-zero error message when no pages were found. See the change here: gerrit.wikimedia.org/r/#/c/167656/.
  2. Added a function to the class ContentTranslationStats that returns the publish date and other metrics for published pages. This change still needs to be implemented in order to replace the manually created list. See the change here: gerrit.wikimedia.org/r/#/c/168101/.
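
The kind of guard involved in the first change can be illustrated as follows. This is only a sketch in Python; the actual fix is in PHP and is not reproduced here, and the function name `percentage` and its return value for an empty total are hypothetical choices made for the example.

```python
def percentage(part, total):
    """Return part/total as a percentage, or 0.0 when there is nothing to divide by."""
    # Hypothetical helper: guard the empty case (for example, no pages found)
    # instead of letting the division raise an error.
    if total == 0:
        return 0.0
    return 100.0 * part / total
```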