Wikimedia Technical Talks

Overview
Technical talks are presentations created by and for members of the Wikimedia technical community. Technical talks cover technical concepts and ideas that make it easier for others to contribute to Wikimedia projects. Formal technical talks are typically broadcast live, recorded, and posted publicly for others to view.

This page contains information about Wikimedia technical talks and how to view or contribute to them. You will also find links to additional information about setting up and creating technical talks.

Who can give a technical talk?
Short answer: You!

Long answer: Technical talks are open to anyone who wants to share what they know about the technology we use on Wikimedia projects.

Why give a technical talk?
Technical talks are a good way to share information with other people who are working in technical spaces on Wikimedia projects. We all benefit from each other's knowledge, and we all benefit from having different ways to learn.

When you produce a technical talk, you build your skills as a public speaker, storyteller, trainer, and teacher. You become more visible to others in the community, and you contribute valuable knowledge to the Wikimedia movement.

Who watches technical talks?
Technical talks are broadcast live and archived for anyone to view in the future. Most people in your audience will be technical contributors just like you.

Wikimedia monthly technical talk
The Wikimedia Foundation currently supports a monthly technical talk with A/V support and hosting for speakers. This talk is generally 45 minutes long, with a live question-and-answer session at the end. These talks are broadcast using Google Hangouts On Air and are immediately available on the MediaWiki YouTube channel. Viewers can ask questions on the IRC channel, which is reserved for the talk.

Wikimedia monthly technical talks are announced via email lists and social media. Speakers are encouraged to upload supporting material, including slideshows and videos, to Wikimedia Commons following their talk.

Propose a Wikimedia monthly technical talk
Monthly technical talks can be scheduled up to 6 months ahead of time. You can propose a technical talk for 2020 by creating a task in Wikimedia Phabricator. (See Phabricator/Help.)

An organizer will reach out to you to discuss the proposed talk and to schedule a time if the talk is appropriate for this series.

Other technical talk opportunities
The Wikimedia monthly technical talk is only one forum for sharing technical information with others. Some potential speakers may want to start with a shorter talk, or they may want to record their own talk or video to share.

Learn more about recording technical talks and video tutorials.

Tips for successful tech talks
Need some guidance to help you get started with your talk? See these tips!

Episode 8: Modern Event Platform
September 23, 2020 at 17:00 UTC

Join Youtube stream:

Slides: TBA

Speaker: Andrew Otto

Topic Areas: Technology

Description: TBA

Episode 7: openZIM/Kiwix ETL toolchain for Wikipedia dumping
August 26, 2020 at 17:00 UTC

Join Youtube stream:

Slides: TBA

Speaker: Emmanuel Engelhart / Kelson

Topic Areas: Technology

Description: Summary of the talk: Enjoying Wikipedia offline wherever, whenever is easy with Kiwix. But behind the scenes, a bunch of tools are needed to make it work. From article selection to dump publishing through scraping, optimisation and packaging: here is a quick overview of how we do it.

Episode 6: Retargeting extensions to work with Parsoid
August 12, 2020 at 17:00 UTC

Join Youtube stream: Youtube

Slides: TBA

Speaker: Subramanya Sastry

Topic Areas: Technology

Description: The Parsing team is aiming to replace the core wikitext parser with Parsoid for Wikimedia wikis sometime late next year. Parsoid models and processes wikitext quite differently from the core parser (all that Parsoid guarantees is that the rendering is largely identical, not the specific process of generating the rendering). This means that extensions that extend the behavior of the parser will need to adapt to work with Parsoid to provide similar functionality [1]. With that in mind, we have been working to specify more clearly how extensions need to adapt to the Parsoid regime. At a high level, here are the questions we needed to answer:

 * How do extensions "hook" into Parsoid?
 * When the registered hook listeners are invoked by Parsoid, how do they process any wikitext they need to process?
 * How is the extension's output assimilated into the page output?

Broadly, the (highly simplified) answers are as follows: Extensions now need to think in terms of transformations (convert this to that) instead of events (at this point in the pipeline, call this listener). So, more transformation hooks and fewer parsing-event hooks. Parsoid provides all registered listeners with a ParsoidExtensionAPI object, which extensions can use to interact with Parsoid and to process wikitext.

The output is treated as a "fully-processed" page/DOM fragment. It is appropriately decorated with additional markup and slotted into place into the page. Extensions need not make any special efforts (aka strip state) to protect it from the parsing pipeline.

In this talk, we will go over the draft Parsoid API for extensions [2] and the kind of changes that would need to be made. While in this initial stage, we are primarily targeting extensions that are deployed on the Wikimedia wikis, eventually, all MediaWiki extensions that use parser hooks or use the "parser API" to process wikitext will need to change. We hope to use this talk to reach out to MediaWiki extension developers and get feedback about the draft API so we can refine it appropriately.

[1] https://phabricator.wikimedia.org/T258838

[2] https://www.mediawiki.org/wiki/Parsoid/Extension_API

Episode 5: Beyond Wikipedia - Knowledge that even a computer can understand
July 22, 2020 at 17:00 UTC

Join Youtube stream: Youtube

Slides: Google Slides

Speaker: Zbyszko Papierski

Topic Areas: Technology

Description: Everybody knows what Wikipedia is, right? This magnificent source of knowledge has been helping countless people with their everyday lives for nearly two decades. Whether you want to know how to calculate the circumference of a circle, whether hyenas are pack animals or what really happened to the Ottoman Empire - Wikipedia’s got your back.

Well, unless you happen to be a computer.

One issue with Wikipedia is that knowledge there isn’t very well structured. There are links to other pages, sure - but unless you actually understand the text, you won’t understand what the link actually is. This is, of course, a field day for AI/ML experts - and there are a lot of people already scavenging Wikipedia for any meaningful relations. Fortunately, this is not the only way.

Enter Wikidata - Wikipedia’s younger sister. Wikidata is also a source of knowledge curated and provided by a community of volunteers, but presented in a relational graph format. Structuring the knowledge has huge ramifications - it not only makes it easier to digest by software, but also allows you to infer new knowledge.

There are different ways for developers to interact with Wikidata, but we’ll focus on Wikidata Query Service - a service my team is responsible for. It provides a queryable interface - using an RDF graph language called SPARQL (not to be confused with a hundred other things in IT with “spark” in the name).

Let’s do some discovery!
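
As a rough illustration of what a request to the Wikidata Query Service can look like, the sketch below builds a URL for the public SPARQL endpoint. The example query (items that are an instance of house cat) and the helper name `build_wdqs_url` are illustrative assumptions and are not taken from the talk. The code only constructs the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query (an assumption, not from the talk):
# five items that are an instance of (P31) house cat (Q146), with English labels.
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_wdqs_url(sparql: str) -> str:
    """Hypothetical helper: return a GET URL asking the endpoint for JSON results."""
    params = {"query": sparql.strip(), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the URL; fetching it is left to the HTTP client of your choice.
    print(build_wdqs_url(EXAMPLE_QUERY))
```

Opening the printed URL in a browser or an HTTP client should return the query results as JSON, provided the service is reachable.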

Episode 4: API portal and gateway project
June 05, 2020 at 17:00 UTC

Join Youtube stream: Youtube

Slides:

Speaker: Evan Prodromou

Topic Areas: API, technology

Description: How does Wikimedia become "the essential infrastructure in the ecosystem of free knowledge"? One way is by making a platform that helps software developers become successful. In this talk, Evan Prodromou, Product Manager for APIs in the Platform Team, discusses the ongoing work to provide a Wikimedia developer platform. With this platform, app creators can include Wikimedia data and content in their software in new and emergent ways. From modernizing our API paradigm, through unified user authorization, documentation, and developer onboarding, the Platform team is working to make a developer experience that rivals those from other major Internet players.

Links
 * API Gateway project page

Episode 3: The basics of cryptography using OpenPGP and GnuPG
April 29, 2020 at 17:00 UTC

Join Youtube stream: Youtube

Slides: TBA

Speaker: Lars Wirzenius

Topic Areas: Technology, structured data

Description: OpenPGP is the prevalent standard for cryptography for secure software distribution and GnuPG is its prevalent open source implementation. This talk introduces things at a conceptual level: what cryptography is for, why it is useful, and the basic use of GnuPG by creating cryptographic keys, using digital signatures, and encryption. No previous experience with GnuPG or OpenPGP is needed, but all examples will be using the Linux command line.

Episode 2: Understanding Wikimedia Maps and its challenges
March 25, 2020 at 18:00 UTC

Join Youtube stream: Youtube

Slides:

Speaker: Mateus Santos, Software Engineer

Topic Areas: Maps, product, Site reliability

Description: The WMF Product Infrastructure Team has been maintaining the Wikimedia Maps service for the last year and a half with help from SRE. This talk will share the challenges and work of creating a better development environment to enhance productivity, solve technical debt and keep up with platform modernization.

Episode 1: Data and Decision Science at Wikimedia
February 26, 2020 at 18:00 UTC

Join Youtube stream: Youtube

Slides: TBA

Speaker: Kate Zimmerman, Head of Product Analytics at Wikimedia

Topic Areas: Technology, data, data visualization

Description:

How do teams at the Foundation use data to inform decisions? Sarah Rodlund talks with Kate Zimmerman, Head of Product Analytics at Wikimedia, about what sorts of data her team uses and how insights from their analysis have shaped product decisions.

Kate Zimmerman holds an MS in Psychology & Behavioral Decision Research from Carnegie Mellon University and has over 15 years of experience in quantitative and experimental methods. Before joining Wikimedia, she built data teams from scratch at ModCloth and SmugMug, evolving their data capabilities from basic reports to strategic analysis, automated dashboards, and advanced modeling.

Links mentioned in talk:


 * Product Analytics
 * Data dumps
 * Wikimedia Stats

Episode 11: Structured data on Commons
December 11, 2019 at 18:00 UTC 

Join Youtube stream: Youtube

Slides: TBA

Speaker: Cormac Parle, Software Engineer

Topic Areas: Technology, structured data

Description:

The talk will cover Structured Data on Commons:


 * what structured data is
 * the structured data we store for a media file on Commons
 * where we store it
 * how it helps with search
 * the UI and the API calls we use to manipulate it

Episode 10: Wikidata, behind the curtain
November 20, 2019 at 19:00 UTC, 45 Minutes

Join Youtube stream: Youtube

Slides:

Speaker: Amir Sarabadani, Software Engineer

Topic Areas: Technology, Wikidata

Description:

Wikidata is a complex and large-scale project. We all know how to use it and how to contribute to it but it's a little bit hard to understand how it actually works, how it scales and what parts are tricky about it. To lots of developers, it's a black box and this is not good. This talk plans to explain internals of Wikidata to other developers and explain future changes to Wikidata on its technical layer.

Episode 9: ResourceLoader tips and tricks
October 23, 2019, 45 Minutes

Join Youtube stream: Youtube

Slides: TBA

Speaker: Roan Kattouw, Principal Software Engineer

Topic Areas: Technology

Description:

Did you know that you could require files in JavaScript? That you could make your own icon modules with 10 lines of code? That there's a new way to export configuration variables to JavaScript?

Learn about new ResourceLoader features introduced this year, and how you can use them to improve your code. We'll start with a quick introduction to ResourceLoader, then dive into some of the advanced features like require, config var bundling, generated JSON files and icon modules.

Episode 8: How to compare text across multiple languages
September 25, 2019 at 18:00 UTC, 45 Minutes

Join Youtube stream: Youtube

Slides: TBA

Additional Links: TBA

Speaker: Diego Saez-Trumper, Research Scientist

Topic Areas: Technology, languages

Description:

This talk will explain how cross-lingual word embeddings work and how they can be used to measure the semantic distance between words and documents across different languages, as well as showing some use cases in our section and template alignment work.
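
To make the idea of semantic distance concrete, here is a minimal sketch that assumes words from different languages have already been mapped into one shared embedding space. The three-dimensional vectors are made up for illustration, and cosine distance is one common choice of metric; the abstract does not say which model or metric the talk uses.

```python
import math

# Made-up vectors standing in for words already mapped into a shared
# cross-lingual embedding space (assumption: real embeddings come from a
# trained model and have hundreds of dimensions).
TOY_EMBEDDINGS = {
    ("en", "dog"): [0.9, 0.1, 0.0],
    ("es", "perro"): [0.85, 0.15, 0.05],
    ("en", "car"): [0.0, 0.2, 0.95],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity; smaller values mean closer in meaning."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    dog = TOY_EMBEDDINGS[("en", "dog")]
    perro = TOY_EMBEDDINGS[("es", "perro")]
    car = TOY_EMBEDDINGS[("en", "car")]
    # With these toy vectors, the translation pair is much closer than the
    # unrelated pair.
    print("dog vs perro:", cosine_distance(dog, perro))
    print("dog vs car:  ", cosine_distance(dog, car))
```

A document-level distance could be sketched the same way, for example by averaging the vectors of a document's words before comparing them, though that is again an assumption rather than the method described in the talk.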

Episode 7: Documenting Wikimedia technical projects
September 4, 2019 at 18:00 UTC, 45 Minutes

Join Youtube stream: Youtube

Slides: TBA

Speaker: Sarah R. Rodlund

Topic Areas: Technology, technical writing, technical documentation, Toolforge, Wikimedia Cloud Services

Description:

This talk will discuss what technical writers do, and why they are critical members of our technical community. You will learn more about the skills needed to be a technical writer and how to build these skills by participating on Wikimedia and other open source projects.

The talk will also cover some ongoing initiatives to improve technical documentation for Wikimedia projects.

Episode 6: A Deployment Pipeline Overview
July 10, 2019 at 16:00 UTC, 45 Minutes

Join Youtube stream: Youtube

Slides:

Speaker: Alexandros Kosiaris

Topic Areas: Technology, Deployment, MediaWiki

Description:

The deployment pipeline project has been ongoing for a while, sometimes with more resources poured into it, sometimes less, but it's finally in a state that is ready to be used (it's already being used!). This tech talk presents the project to a wider technical audience, discussing the goals of the project, the implementation decisions and how it's meant to be used and adopted by the deployers of services (and eventually MediaWiki) in the coming months.

Episode 5: Just what is Analytics doing back there?
June 25, 2019 at 18:00 UTC, 45 Minutes

Join Youtube stream: Youtube

Slides:

Speaker: Dan Andreescu

Topic Area: Data flow, Analytics Infrastructure

Description:

We take care of twelve systems. Data flows through them to answer the many questions that our community and staff have about our piece of the open knowledge movement. Let's take a look at how these systems fit together to answer questions. Let's also look at an example trick we use to join big data in a distributed world.

Episode 4: Wikimedia and W3C
May 23, 2019 at 15:00 UTC, 45 Minutes

Join Youtube stream: Youtube

Slides:

Speaker: Evan Prodromou and Gilles Dubuc

Topic Area: Standards

Description:

The Wikimedia Foundation is now a member of the W3C, as of April. We will walk you through how you can join working groups, what to expect of W3C participation, what we hope Wikimedia staff can achieve through W3C and we will share our own experiences as W3C members.

Episode 3: Sharing global opportunities for new developers in the Wikipedia community
April 24, 2019 at 18:00 UTC, 45 Minutes

Join Youtube stream: Youtube

Slides: Wikimedia Commons

Speaker: Srishti Sethi, Developer Advocate, Wikimedia Foundation

Topic Area: Developer Advocacy, onboarding new technical contributors

Description:

Wikimedia offers a plethora of opportunities for newcomers to get involved; however, as with many other free software projects, getting involved with the Wikimedia technical community can be a daunting prospect for newcomers. This talk is a gentle introduction to the Wikimedia ecosystem, and gives pointers on how to get involved as a volunteer. I will delve into the various ways newcomers can make successful contributions in areas ranging from design to documentation, from programming to testing, and much more.

Episode 2: Ouch, I have an OOUI: using OOUI without pain
March 27, 2019 at 18:00 UTC, 45 Minutes

Join Google Hangout Meet Live: Youtube

Slides: TBA

Speaker: Moriel Schottlender

Topic area: OOUI

Description: OOUI is the interface widget library we are using for UI in the Wikimedia projects. The library is meant to allow implementers to create useful interfaces that automatically meet the internationalization needs that are unique to the global nature of our projects. Right-to-left support, support for old browsers, accessibility, and so on are things that OOUI handles in the background for you. This tech talk will present OOUI’s history, basic and advanced usage, and demonstrate how to create great interfaces without (much) pain within our wiki ecosystem.

Links mentioned in the talk:
 * Code: mw:User:MSchottlender-WMF/oouiExamples/basicWidgets.js
 * RCfilters, cf.: Edit_Review_Improvements/New_filters_for_edit_review
 * Notifications, cf.: mw:Extension:Echo
 * Both RCFilters and mw:Special:ContentTranslation
 * Content Translation v2, cf.: Content_translation/V2

Episode 1: The long and winding road to making Parsoid the default MediaWiki parser
February 27, 2019 at 19:00 UTC, 45 Minutes

Video: Youtube

Slides:

Speaker: Subbu Sastry, Principal Software Engineer

Topic area: Parsoid, Wikitext Parsing

Description:

This will be a talk in two parts: The first part will provide a bunch of background to make sense of the roadmap presented in part 2. The second part will have 3 components: (a) Parsoid history (b) Porting Parsoid to PHP: the whys and wherefores (c) From here to Parsoid as the default.

Parsoid started in 2012 as a project to support Visual Editing and since then has gone on to support a number of products (Flow, Content Translation, Kiwix, and Android app). Given that (a) Parsoid's annotated HTML output enables clients to infer things about wikitext without having to parse wikitext, (b) the PHP parser cannot support Visual Editor and other products, and (c) we cannot continue to have two parsers, it is inevitable that Parsoid will be the default parser for MediaWiki. This has been known since at least 2015 but while we are nearer to that goalpost, we are still not quite there yet.

In this talk, we'll talk about what else needs to be completed, and what the porting of Parsoid to PHP means for this goal.

Older tech talks
You can browse through past tech talk recordings in the Commons category and on the MediaWiki YouTube channel.


 * 31 October 2017 - Selenium tests in Node.js, by Željko Filipin (Selenium/Node.js/Write, T173488)
 * 9 February 2017 - A Gentle Introduction to Wikidata for Absolute Beginners (including non-techies!)
 * 15 November 2016 - Using Kibana4 to read logs at Wikimedia - Bryan Davis
 * 31 May 2016 - Integrating user behavior to design better products
 * 18 April 2016 - UX Prototype Labs: Understanding Wikipedia Readers
 * 22 March 2016 - Reflections on WMF: Community Dynamics
 * 18 March 2016 - New readership data. Some things we've been learning recently about how Wikipedia is read (video, slides)
 * 29 February 2016 - Automated citations in Wikipedia: Citoid and the technology behind it (YouTube)
 * 8 February 2016 - Tech Law Training: Privacy, Security, Licensing, & Beyond (private event)
 * 8 February 2016 - A Hands-on Estimation Exercise
 * 14 January 2016 - Creating Useful Dashboards with Grafana (abstract)
 * 15 December 2015 - The creation of Histography, from concept to design
 * 9 December 2015 - Secure Coding For MediaWiki Developers
 * 3 November 2015 - The making of a MediaWiki skin (abstract)
 * 2 November 2015 - Nothing Left but Always Right: The Twisted Road to RTL Support (YouTube, slides)
 * 23 October, 2015 - Introduction to Free and Open Source Licensing at Wikimedia - Stephen LaPorte
 * 20 August, 2015 - ELK: Elasticsearch, Logstash and Kibana at Wikimedia (WebM format on Commons, also on YouTube) - Bryan Davis (slides)
 * 18 August, 2015 - Let's talk about web performance - Peter Hedenskog
 * 15 June, 2015 - Kanban: An alternative to Scrum? - Kevin Smith (slides)
 * 14 May, 2015 - Graphs! Visualize maps and data graphs live on Wikipedia - Yuri Astrakhan and Dan Andreescu
 * 14 April, 2015 - The state of Team Health across Wikimedia Engineering - Kristen Lans
 * 4 March, 2015 - Hack - An Evolution of PHP - Josh Watzman from Facebook
 * 6 January, 2015 - A developer's-eye view of API client libraries - Frances Hocutt
 * 11 December, 2014 - Phabricator for Wikimedia Projects - Quim Gil and Andre Klapper
 * 25 November, 2014 - MediaWiki-Vagrant What is New With MediaWiki Vagrant with Bryan Davis and Dan Duvall (slides, video)
 * 03 November, 2014 - Language Engineering: Content Translation Tool - Joel Sahleen
 * 22 October, 2014 - Design Research in Product Development - Abbey Ripstra
 * 06 October, 2014 - The Dashboarding Problem - Dan Andreescu and Nuria Ruiz
 * 24 September, 2014 - The Very Basics of Phabricator - Andre Klapper
 * 29 July, 2014 - HHVM in production: what that means for Wikimedia developers - Paul Tarjan
 * 15 July, 2014 - Hadoop and Beyond. An overview of Analytics infrastructure - Andrew Otto
 * 11 June, 2014 - How, What, Why of WikiFont - May Galloway and Monte Hurd
 * 15 May, 2014 - Elasticsearch - Nik Everett
 * 15 April, 2014 - A preliminary look at Parsoid internals - Subramanya Sastry and Gabriel Wicke

Nominations/Ideas for future tech talks

 * The work Analytics engineering is doing and how we could help
 * An understanding of Flow, where it's at and where it's going
 * on the new visual design for MediaWiki/mediawiki.ui or whatever that library is called these days.
 * QA (Željko? Chris?) about browser testing
 * Daniel Kinzler on core refactoring of Title and other classes.
 * Pau Giner on dos and don'ts in user testing
 * Niklas Laxström on conversing with robots that know Wikipedia (part of his PhD research)
 * Antoine Musso on Jenkins - status: emailed Antoine/Antoine on paternity leave. hold.
 * Moriel Schottlender on right-to-left support (adapting her blog post into a talk)
 * Wikidata - Lydia/Community member - open to all
 * Tony Thomas - ?
 * Webinar on Zotero translator coding
 * Aaron Halfaker on machine learning support for wiki-workflows.
 * C. Scott Ananian on using Parsoid output to implement bots/scrapers/offline readers