User:Isaac (WMF)
About me
I joined the Research team at the Wikimedia Foundation in October 2018 as a Research Scientist. I currently live in the glorious New York City, NY, USA.
My work
My background is in geography and human-computer interaction, with a special focus on understanding (and trying to do something about) how structural inequalities find their way online and into algorithmic systems. Since joining WMF, I have also been heavily involved in research on reader needs and behavior, on how to model and make predictions about Wikimedia content in a language-agnostic manner, and on the impact of external re-use of Wikimedia content.
Tools
A collection of tools that I've built (or helped build) for showcasing some of our research work:
- Language-agnostic content tagging models
- List-building models
- Social media traffic report
- Differential privacy parameter exploration
- Search referral data for Wikipedia
- User scripts for visualizing link data
And specifically, a number of Python packages:
- mwparserfromhtml: parsing Wikipedia HTML (parsoid output)
- mwedittypes: structured analysis of wikitext diffs
- wiki-nlp-tools: word / sentence tokenization for Wikimedia content (under development)
- mwsql: parsing Wikimedia SQL dumps
Musings
Various writings about topics relevant to Wikimedia data, research, etc.
- Trade-offs between performance and sustainability in language modeling for Wikimedia
- Potential content tagging models for Wikimedia
- Data gaps that inhibit equitable and effective ML for the Wikimedia projects
- Various analysis "gotchas" when working with Wikimedia data
- Standard approaches to various Wikimedia research tasks
- Aspects to consider when comparing/studying different Wikipedia language editions
Projects
Last updated on 11/07/2022
Active: projects that I am currently working on.
Completed: completed research projects and reports.
Backburner: projects that I've started, but had to put down for the moment.