Wikimedia Developer Summit/2018/Participants
Not all participants have filled out the participant registration form, so some participants and position statements are still missing and will be added to this table as they register.
A co-founder of Siri Inc. and formerly a director of engineering in the iPhone group at Apple.
Co-founder of Skyliner, worked at Stripe, and helped build Etsy.
Relevant essay: http://mcfunley.com/choose-boring-technology
The John L. Hinds Professor of History of Science in the History Department at Stanford University and Director of the EU/US Gendered Innovations in Science, Health & Medicine, Engineering, and Environment Project. From 2004-2010, Schiebinger served as the Director of Stanford's Clayman Institute for Gender Research. She is a member of the American Academy of Arts and Sciences. Lengthier bio: http://web.stanford.edu/dept/HPST/schiebinger.html
Data Science Lead at Clover Health, a health insurance startup in San Francisco. She studied Linguistics and Math/CS at MIT and dropped out of a CS PhD at UPenn. Previously, she worked at Oracle on machine learning, at Klout on natural language processing, and at Twitter on big data ad munging.
Fan Bu: Google, Tech Lead Manager of WikiQuality Team
Jiang Bian: Google, Tech Lead Manager of Dataz (formerly Wiki-Infra) Team
John Bennett: Wikimedia Foundation, Director of Security
Katherine Maher: Wikimedia Foundation, Executive Director
Toby Negrin: Wikimedia Foundation, Chief Product Officer
Tony Sebro: Wikimedia Foundation, Deputy General Counsel
Zainan Victor Zhou: Google
Please let rfarrand know if there have been any errors in position statement posting
Aaron Halfaker (halfak): The future of responsible AI design is auditing systems
When we deploy AIs that make inherently subjective judgments that affect people and their work, we must also provide a means for them to audit and critique the AI. Did the AI mark the wrong thing as vandalism? Then it can silence a contribution. Did the AI fail to note a high quality article? Then we might direct traffic away from good content. Did the AI recommend the wrong type of thing? Then we might keep people in a filter bubble rather than helping them broaden their knowledge. There's a lot of conversation in the public sphere about how AIs cause ethical and social problems. Google's image search suggests that all CEOs are men. Facebook's feed filter reduces the visibility of conflicting opinions. The general call among researchers and ethicists is for transparency. At Wikimedia, transparency is an old idea. We've always developed our technologies transparently. But this hasn't made us immune to the problems that AIs cause. Auditing systems are the future. They are a means towards giving users power over the AIs that govern our experience. We should be talking about how to build them.
The future of digital information will likely be brokered by major platform providers such as Google, Apple, Amazon, Microsoft, and international equivalents and social networks. We're thankful they extend our reach, even as we seek to help consumers on the platforms join our movement. We could help platform providers, their users, and our users solve problems better through adoption of the open standard Schema.org into Wikipedia pages mapped with templates and, ideally, federated and synchronized Wikidata properties. Benefits:
- Wikipedia will have even better presentation and placement in search engines and other data rich experiences.
- We provide an opportunity for a more consistent data model for template authors and people/bots filling template values. And the richly defined Schema.org entities provide a good target to reach on all entities represented in the Wikipedia/Wikimedia corpora. Standardization can reduce duplication of effort and inconsistencies.
- We introduce an easier vector for mobile contribution, which could include simpler and different data entry, mapping, and modeling.
- We can elevate an open standard and push its adoption forward while increasing the movement's standing in the open standards community.
- Schema.org compliant data is more easily amenable to machine learning models that cover data structures, the relations between entities, and the dynamics of sociotechnical systems. This could bolster practical applications like vandalism detection, coverage analysis, and much more.
- This might provide a means for the education sector to teach students about knowledge creation, data modeling, and more. It might also afford scientists and other practitioners a further standardized way to model the knowledge in their fields.
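To make the template mapping idea concrete, here is a minimal sketch of how filled template parameters could be turned into Schema.org JSON-LD. The template parameter names and the mapping table are hypothetical and are not taken from any existing Wikipedia template. Only the Schema.org property names (name, birthDate, birthPlace, jobTitle on the Person type) come from the standard itself.

```python
import json

# Hypothetical mapping from infobox template parameter names to Schema.org
# Person properties. The parameter names on the left are illustrative only.
TEMPLATE_TO_SCHEMA = {
    "name": "name",
    "birth_date": "birthDate",
    "birth_place": "birthPlace",
    "occupation": "jobTitle",
}


def template_to_jsonld(params, schema_type="Person"):
    """Convert filled template parameters into a Schema.org JSON-LD dict."""
    doc = {"@context": "https://schema.org", "@type": schema_type}
    for param, value in params.items():
        prop = TEMPLATE_TO_SCHEMA.get(param)
        if prop and value:  # skip unmapped or empty parameters
            doc[prop] = value
    return doc


infobox = {
    "name": "Ada Lovelace",
    "birth_date": "1815-12-10",
    "birth_place": "London",
    "image": "Ada.jpg",  # no mapping defined here, so it is dropped
}
print(json.dumps(template_to_jsonld(infobox), indent=2))
```

A real mapping would presumably live alongside the template definitions or be derived from Wikidata properties, so that template authors and bots share one data model.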
What would it take? And can this be done in harmony with the existing
I believe that the technical community should strive to collect and effectively disseminate technical knowledge as per the Wikimedia mission statement. Our ability to grow the technical community can be compared with one's own ability to gain knowledge in technical spaces within the Wikimedia movement. Currently, there are many barriers to entry that have been surfaced year after year, with only a little movement past them. To scale and ready the community, we should push forward and enable the use of emerging trends in technology, such as knowledge-retaining Q&A platforms. There are many other organizations and software projects that do this much better than Wikimedia. We should learn from them. Looking at Q&A platforms specifically, talk pages have never really been a good place to ask questions and retain knowledge in a searchable way for use in the future. Stack Overflow, as an example, has proven to be an invaluable resource for people in technical spaces and we can learn from that. MediaWiki is an amazing piece of software, but we should not feel 'boxed in' by it. The Wikimedia Foundation is not the MediaWiki Foundation, and MediaWiki does not always have to be just a wiki page. Our commitment to Open Source often slows down actions within the movement; however, this is not something that should change, as it is integral to what Wikimedia stands for at its core. We should embrace our Open Source commitments and reach out to and engage with organizations using our software more. Wikimedia Germany does this outreach specifically with the Wikibase extension, looking for other users and engaging them to discover how they are using it, why, and how it can be better. The Wikibase extension, and specifically the Wikibase Query Service, shows us that not everything has to be a wiki page, as the query service disseminates knowledge under a free licence effectively.
I hope that the summit will agree that entry to our technical space and increasing knowledge persistence within it need some thought and work, and that we should stay committed to MediaWiki as software and as a platform, while accepting that it can look, feel and act differently as Wikimedia stays true to its mission.
Amanda Bittaker: Frameworks to connect infrastructure to the mission
We will better achieve social impact, succeed in our strategy, and fulfill our mission when the Foundation uses nonprofit programmatic frameworks when prioritizing and planning improvements to MediaWiki and other technologies.
Impact is an intangible, abstract social benefit and it can be difficult to consider how changes we make in MediaWiki will help or harm it. To illustrate the connections between infrastructure choices and impact and to incorporate those connections into our plans, we can use programmatic frameworks developed in the nonprofit professional communities. Frameworks used by these nonprofit communities for various types of programs and impact can explicitly and concretely link our engineering choices to the movement strategy and the social benefit we create. This increased attention to the social impact of our technical decisions and investments will in turn create increased investment from our communities, partners and potential allies beyond our community towards fulfilling our mission. WMF programs such as New Readers and Structured Data on Commons, and Wikimedia community programs, such as Wiki Loves Monuments, model how building technology for well-defined social impact can structure our engineering and infrastructure choices towards more strategic and mission-driven impact. These programmatic frameworks can be helpful during annual planning, quarterly check-ins, and throughout the process of deciding on, planning, implementing, and evaluating technological changes. We would be able to weigh and design intentionally for broad end users while also supporting the targeted and specific organizing communities who use our technology towards our desired social impact. We could expand the impact that we achieve by consulting expert communities, such as educators, librarians, and activists, who will design additional social-impact programs and processes on top of those tools. We could also identify parts of our communities which already create desired impacts, and build technologies and technological services which increase the scale, effectiveness and efficiency of organizing contributors to fulfill our mission.
Socio-technological decisions in our movement can be most successfully achieved when considering both social and technological benefits.
Anne Gomez: Wikimedia properties need to keep pace with the norms of browsing and information consuming behavior to stay relevant
Wikimedia properties need to keep pace with the norms of browsing and information-consuming behavior to stay relevant, grow readership, and bring new editors to add their knowledge to the repository. We need to support smaller content types, both for contributions and for consumption. At the same time, we need to support multimedia content, from video to interactive graphics to augmented reality. Structured data will allow us to be more flexible in our presentation of information, and create more complex interactions with that information. Video and audio will open the doors to new contributors and new projects. Content consumers online now, whether among the highly connected or using the internet for the first time, are looking for the right information available to them at the right time. They don't necessarily want long, encyclopedic content, but instead prefer snippets of information served to them just when they need it. And they learn through more immersive experiences (video, augmented reality, interactive graphics) rather than long-form text. Even beyond that, huge portions of the world can't access our content for a number of reasons: they don't have internet access, they can't read, their languages don't have keyboard support, or there isn't content in their language. The internet as a whole is evolving to meet these changing needs. Messaging apps support walkie-talkie-like communication, Google serves just the right answer to any question (in English), and language support for smaller languages is growing cross-platform. Our infrastructure needs to meet these needs.
ArielGlenn: Think like a Pirate: How to beat Internet censorship
Universal access to a digital good, such as the knowledge curated and made available via Wikimedia projects, presupposes access without censorship. Censorship and circumvention methods become more advanced over time. Censorship ranges from blocks of single articles to targeting DNS providers to seizing servers to shutting off Internet access completely. Some of these methods are in use right now against Wikimedia projects.
One form of censorship evasion has proven virtually impossible to stamp out: piracy of copyrighted content, in particular music and movies. Let's look at the methods used by the pirates and adapt them for use by Wikimedia content providers and users. We would like our content to be widely shared, available everywhere. Here is what we need to get started:
- Content must be downloadable and usable off-line. Content meant to be used online, that requires contact with an external server, fails this test. Movies and music do not.
- Content must be partitionable. You don't grab all alternative music for 2017, but just the albums from the artists you want. Users will likely not need or want all of the English language Wikipedia (for example) but only subsets.
- Content must be usable off-line by applications everyone has. Movies and music are downloaded in formats that play in apps that come standard with every OS on every platform. Usability must include navigation and search of content.
- Downloadable content must be easy to find, both before and after censorship. You ask Google to find the music or movie you want on YouTube or elsewhere, click and download. Failing that, there is a fallback (see below).
- Tech-savvy downloaders must be able to seed the distribution of content to everyone else. For music or movies, folks who download from private torrent trackers make copies to give to all their friends; six degrees of separation later, we have reached saturation.
- Content must be popular enough to be widely shared. If a group of consumers cannot locate a content source or redistributor, the distribution chain breaks. Poorly seeded torrents are the classic example.
- People must not rely solely on the original online content source for access. If no one has downloaded or mirrored a copy before access to the original content source is blocked, this approach fails. Note that most people will have little incentive to save copies of content for offline use from a reliable site, unless Internet access itself is spotty, or the content bundles for download add value.
In some jurisdictions, it may be dangerous to possess certain content, including that of the Wikimedia projects. This issue is outside the scope of this proposal.
Related topics: https://www.mediawiki.org/wiki/Wikimedia_Apps/Offline_support, https://www.kiwix.org/, http://xowa.org/ and so on
Currently Wikimedia distributes its content almost exclusively using the Internet. However, the Internet is controlled by gatekeepers in the form of governments and ISPs. While historically these entities rarely controlled the flow of information, more recently we have seen an increase in censorship, particularly by governments. Since Wikimedia is distributed almost entirely over the Internet, we are vulnerable to their whims. The risk of having our distribution lines interfered with is an existential threat to our mission. While at present only a few geographic locations practise such interference, the future is unknowable and does not appear to be heading in a comforting direction. Furthermore, in the face of such interference, there is very little we can do. Tor is often spoken of as a solution to censorship, but any such on-Internet system will either have to be obscure or rely on secret information (e.g. Tor bridges) to avoid blocking, and thus cannot be used by the public at large. The most effective solution to censorship so far seems to be political pressure, combined with bundling to make censorship decisions as broad as possible. When much content is bundled together, such as entire domains with TLS, or GitHub and the New York Times, it can reduce censorship if there is political will to censor a specific part, but not the whole thing. However, political opinion is fickle, and cannot be relied upon. Thus, we should reduce this risk by diversifying how we distribute our content. Multiple distribution routes mean no single point of failure. I see two ways of doing this: First, by expanding offline versions of Wikimedia. Kiwix already provides an offline version of Wikimedia sites. We need to expand this capability to allow for better updating. Offline apps should be able to efficiently update their contents in a scenario where users only have intermittent access to the open Internet.
More importantly, offline apps should be able to update in a P2P fashion with other apps. In a community with limited access to the open Internet, a single person with an up-to-date version of Wikipedia should be able to easily synchronise his/her app with other people's apps to spread the knowledge. This could be especially helpful in a scenario where a small number of people have access via methods such as Tor, but such methods are too burdensome for most people. Second, we could experiment with broadcasting recent edits widely. Broadcasting HTML versions of all main namespace pages recently edited on English Wikipedia would only require about 12 KBps. This is not a huge amount of bandwidth. During the Cold War it was common to broadcast propaganda using short wave radio, which could be listened to across the world. Perhaps we could broadcast everything that is edited across the world in a similar fashion, allowing users to stay up to date regardless of their connectivity. This could be combined with the P2P app, so a few power users could listen in to the RC stream, and then spread the data among their communities.
Based on a very rough experiment, ?action=render of a Wikipedia page gzips to roughly the size of the raw wikitext. From there, the 12 KBps number is based on the enwiki result of:
SELECT SUM(l) / (1024 * 3600 * 24)
FROM (
  SELECT MAX(rc_new_len) AS l
  FROM recentchanges
  WHERE rc_namespace = 0
    AND rc_timestamp BETWEEN '20170926000000' AND '20170927000000'
    AND rc_type <= 1
  GROUP BY rc_cur_id
) t;
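The arithmetic behind that query can be checked with a short sketch. The query sums the largest new length in bytes of each main namespace page edited in one day, then divides by 1024 × 3600 × 24 to turn bytes per day into KiB per second. The helper names below are illustrative. The only input taken from the text is the quoted rate of about 12 KBps; the daily byte total is derived from it, not measured.

```python
SECONDS_PER_DAY = 3600 * 24


def kib_per_second(total_bytes_per_day):
    """Mirror the SQL expression sum(l) / (1024 * 3600 * 24)."""
    return total_bytes_per_day / (1024 * SECONDS_PER_DAY)


def bytes_per_day(rate_kib_per_s):
    """Invert the conversion: sustained KiB/s back to bytes per day."""
    return rate_kib_per_s * 1024 * SECONDS_PER_DAY


daily_bytes = bytes_per_day(12)
print(daily_bytes)                  # 1061683200
print(kib_per_second(daily_bytes))  # 12.0
```

So a sustained 12 KBps stream corresponds to roughly 1.06 GB of compressed page content per day, under the stated assumption that gzipped rendered HTML is about the size of the raw wikitext.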
Benoît Evellin (Trizek (WMF)): How to build a discussion system that would ease user interactions and content creation on the wikis?
I believe that Structured Discussions are a must-have for MediaWiki. Building such a system will reduce the communication gap on the wikis, ease newcomers' first steps, empower all users and allow powerful interactive tools to be built. It will also greatly increase the adoption of MediaWiki as a knowledge creation system. The MediaWiki community has a strategic priority decision to make on this topic. The Wikimedia communities and organizations, through MediaWiki, want to give everyone a way to create (free) knowledge collaboratively, for all users from everywhere. Imagine doing it without a powerful discussion tool that can handle international interactions, scale, and keep everyone aware of the ongoing work. Experience with MediaWiki-powered wikis has proven that it is not possible.
Unstructured messages are based on a blank page which hasn't evolved since 2002. You can do anything using a blank talk page. But discussions as they are now don't provide basic things people are used to on social networks or Gdocs, for example. Among many missing features, users can't reply to a discussion by email or using the mobile interface; users have to know where to post and how to use a unique technical etiquette to discuss; and more.
The current default discussion system does not welcome everyone. Several communities, like Wikimedia and WikiHow, have created inventive ways to structure discussions a bit: templates, content preloads, mentions, overloading discussions with HTML, local scripts and bots… These are not unified, and are supported by no one other than the communities themselves. Some wikis have decided to use Flow and expect improvements for a better experience. Some other communities, often the small ones, prefer to use Facebook or other social networks to discuss, which are not free, safe and open environments.
The approach supported by the Wikimedia Foundation is the Structured Discussions extension (re-scoped from Flow), which focuses on user-to-user discussions. Treating that extension as a high-priority MediaWiki building block is a political decision the MediaWiki community needs to make. It will make it possible to build strong and diverse communities by lowering technical barriers. Building that discussion system requires a clear strategy and resources, as was done for the visual editor a few years ago. Any effort of this size will have side effects that benefit other projects (as the VE project did, notably by developing Parsoid): other extensions or services could build on discussions to create very powerful features, like in-article notes or suggestions, or easier request systems.
Work on discussions on the Web is not a new topic. We can benefit from studies of online discussions, covering both UX design and technical implementation. The MediaWiki community also has some experience of what is not possible or not desirable, taken from LiquidThreads and Flow.
Birgit Müller: Refactoring the Open: First steps to get ready for the next level
Wikimedia's technical environment has grown into a very complex system throughout the past 15 years. Measured in internet years, parts of the software are ancient. When implementing a new feature, refactoring of a piece of the (extended) MediaWiki software is often required first. Following this principle of a.) refactoring and b.) implementation of something new, I suggest starting the discussion of the future technology direction by reflecting on (and possibly refactoring) the current Open Source practices and processes within the Wikimedia context.
A mono perspective won't let us survive (and is less fun, too)
When we talk about "Open Source" within Wikimedia we're not only talking about free licenses and open code repositories. We're talking about global collaboration and the technical contributions of many: Through this, we ensure that the Wikimedia projects stay alive and evolve, that we constantly develop new ideas, that multiple and diverse perspectives shape the development of our infrastructure and tools.
We are great at having ideas, and we are good at trying things out. But we still partly fail at prioritising the problems we know we have and addressing them accordingly.
I believe that we should better maintain the Technical Community and find ways to grow by
- allocating stable code review resources from paid staff for volunteer and 3rd party developers
- improving the documentation of the code base
- providing a single entry point that is easily accessible for interested developers
- building up partnerships with Open Source communities with which we might share interests in the future (for example, communities around audio, video or translation technologies)
constantly take diverse perspectives into account by
- finding better ways to gather and address feedback from smaller language communities and non-Wikipedia sites
- being less Wikipedia-centric when it comes to research: Not yet existing or emerging communities might not be interested in creating articles, but in contributing data or multimedia content or in building tools to reuse data and multimedia content
build more bridges across local wikis and increase knowledge of local requirements by
- fostering cross-wiki exchange (example activity: template Hackathon)
- increasing the knowledge of the requirements that come along with different languages (example activity: multilingual support conference)
Open Source doesn't mean anything is possible - does it?
We have established processes and regulations for contributions to MediaWiki itself. But we lack processes and practices for local developments that ensure both the freedom and space to experiment for the Technical Community and the stability and reliability of tools for users.
I believe that we should e.g.
- raise priority for implementing a code review process for JS/CSS pages on Wikimedia sites
- start thinking about a technical sysop user right
- make it clear which user scripts/gadgets/tools are maintained, which are stable and which are proofs of concept or prototypes (for example: provide a (central) "store" of maintained gadgets/tools with different levels: "stable version", "experimental version" …)
Let's start refactoring.
C. Scott Ananian (cscott): One World, One Wiki!
Instead of today's many siloed wikis, separated by language and project, our goal should be to re-establish a unified community of collaborators. We will still respect language and cultural differences — there will still be English, German, Hebrew, Arabic, etc. Wikipedias; they will disagree at times — but instead of separate domains, we'll embrace a single user experience with integrated navigation between projects and languages and the possibility of split screen views aligning related content. On a single page we can work on articles in different languages, or simultaneously edit textbook content and encyclopedia articles. Via machine translation we can facilitate conversations and collaborations spanning languages and projects, without forcing a single culture or perspective.
Machine translation plays a key role in removing these barriers and enabling new content and collaborators. We should invest in our own engineers and infrastructure supporting machine translation, especially between minority languages and script variants. Our editing community will continually improve our training data and translation engines, both by explicitly authoring parallel texts (as with the Content Translation tool) and by micro-contributions such as clicking yes/no on a proposed translation or pair of parallel texts ("bandit learning"). Using "zero-shot translation" models, our training data from "big" wikis can improve the translation of "small" wikis. Every contribution further improves the ability of our tools to make additional articles from other languages available.
A translation suggestion tool will suggest an edit in one language whenever an edit is made to a parallel text in another language. The correspondences can be manually created (for example, via the Content Translation tool), but our translation engine can also automatically search for and score potential new correspondences, or prune old entries when the translation has drifted. Again, each new correspondence trains the engine and improves its ability to suggest further correspondences and edits.
Red-links and stubs are replaced with article text from one of the user's preferred fallback languages, perhaps split-screened with a machine translation into the user's primary language. This will keep "small" language wikis sticky, and prevent readers from getting into the habit of searching in a "big" language first.
We should build clusters specifically for training translation (and other) deep learning models. As a supplement to our relationships with statistical translation tools Moses and Apertium, we should partner with the OpenNMT project for modern neural machine translation research. We should investigate whether machine translation can replace LanguageConverter, our script conversion tool; conversely, our editing fluency in ANY language pair should approach what LanguageConverter provides for its supported languages.
By embracing unity between projects and erasing barriers between languages, we encourage the flow of diverse content from minority languages around the world into all of our wikis, as well as improving the availability of all of our content into indigenous languages. Language tools route around cultural or governmental censorship: by putting parallel texts and translations in the forefront of our UX we expose our differences and challenge preconceptions, learning from each other.
How and with whom should we partner to create the technologies needed to support the mission?
A substantial, growing community of MediaWiki users and developers outside the Wikimedia movement has evolved, creating wikis that vary in size, number of editors, number of readers, access restrictions, and activity. The Wikimedia movement benefits from this third party MediaWiki developer and user community's technology contributions and innovation. Similarly, this community benefits from the Wikimedia movement's stewardship of MediaWiki as the foundational technology in support of Wikipedia and its sister projects. There are many areas in which the needs of these two groups are identical, including stable, well-performing software that supports community authoring. Partnering with the third party MediaWiki community will result in a platform that is better for all parties.
How should MediaWiki evolve to support the mission?
There is much knowledge in the world that cannot find its place within Wikipedia or its sister projects. MediaWiki is powerful software crafted especially to support the expression of all knowledge. Third party MediaWiki wikis can provide a home for knowledge that does not belong in Wikipedia, supporting the mission of sharing in the sum of all knowledge. In order for the third party MediaWiki community to continue to thrive and to grow, several impediments to MediaWiki adoption that especially affect that community must be addressed:
- Installing and maintaining all but the smallest and most basic MediaWiki installation currently requires a high level of craftsmanship and expertise.
- While a large number of novel MediaWiki extensions exist to support third party applications, it is difficult to ascertain the level of maturity and support of these extensions.
- Some enterprise consumers require a guaranteed level of support and/or service level agreements before adopting a technology.
- The barrier to entry for those wishing to experiment with MediaWiki in production quality environments is high.
How do we maintain and grow the technical community and ready it for the mission ahead?
The third party MediaWiki community already contributes significantly to the code base. In the last two years, 22% of the commits to MediaWiki core were made by third party contributors, and 62% of the authors of commits to MediaWiki core were third party contributors. Even more striking, in the last two years, 40% of the commits to MediaWiki extensions hosted on Gerrit were made by third party contributors, and 67% of the authors of the commits were third party contributors. The third party MediaWiki community is a significant training ground for skilled MediaWiki developers used to tackling a diverse set of challenging problems who sit poised to help forge the path ahead for MediaWiki.
Dan Andreescu / milimetric
Our strategic goals include scaling our communities to a truly global level, and expanding our understanding of human knowledge. To do this, in my opinion, we need to have a much better understanding of our communities' actual work. We have tens of thousands of people doing millions of hours of work every month, and nobody knows exactly what is being done, what the definition of "done" is, and how fast or slow the progress is. We are the leaders of the free knowledge movement, and we are mostly blind except for some big picture notions like pageviews and edits. It is my opinion that we need to develop a good understanding of the work being done on the wikis. Very capable people have already spent lots of time trying to do this, but I believe we have largely failed because of technical limitations. This is a big data and big compute problem, and we have not yet approached it as such. A close collaboration between our communities, Analytics, Research, and Audiences teams is needed, as well as the power of the WMF Hadoop cluster. I have had sessions on this topic already, and am excited to finish planning and transition to actual work. There are some very valuable implications of taking on and finishing this work. Most importantly, we will all be able to more objectively talk about frustrations in the community over changes that cause "more work". For example, when we launched Visual Editor there was huge backlash about the amount of work this change implied for our community. But because this was largely based on subjective opinions, emotions got involved and it took years to calm the negative effect of those emotions. This effort would also give us, for the first time, a way to celebrate these millions of hours of work. People could see, share, and take pride in their part of building human knowledge (if they wanted to, privacy is of course one of our top priorities). 
I am also interested in expanding our Open Source efforts, and examining changes that we can make to spur more collaboration. My reading of the strategic goals for 2030 is that the WMF will not have enough resources to execute by itself. That's where collaboration will be crucial, and where problems like in-house developed libraries without true Open Source presence will slow us down. We let documentation and third-party user support lag behind because we're busy with other stuff, and that's arguably fine for our scale so far. But this approach will not allow us to grow the way our Strategy is defined.
I would like to discuss how assumptions drive our day to day work, and how to make sure we properly understand and regularly challenge these assumptions. I'm particularly interested in how technological assumptions shape product decisions, and how product assumptions shape technological decisions. Three major axioms come to mind:
- MediaWiki needs to run in a shared hosting environment: This has been an explicit requirement for a long time now, but the baseline product that actually does run in such an environment (LAMP with no root access) is becoming more and more sub-par. We are already struggling to provide a decent mobile browsing experience there, not to mention search or WYSIWYG editing. So we should have a discussion about how long we want to keep this requirement, what the consequences would be of dropping it, and what alternative platform we should target for the baseline installation of MediaWiki.
- The primary medium for knowledge sharing is text: This assumption used to be hard-coded into MediaWiki until the introduction of ContentHandler, and it still seems to be hard coded in the minds of many long term contributors, to the software and to the wikis. I believe that it is high time to invest into exploring other media formats and alternative forms of collaboration. It seems to me like "Beyond Wikitext" is the major technological challenge that has come out of the movement strategy process, and that we should start thinking and talking about it - from the technological side as well as the product side.
Daren Welsh: What technologies are necessary for embracing mobility?
How and with whom should we partner to create the technologies needed to support the mission?
Multilateral, asynchronous, bidirectional synchronization of wikis: How astronauts taught the world to wiki on the go
The world is certainly shifting its internet usage from the desk to the mobile device. Let's not limit our focus to mobile devices that always have an internet connection. Let's talk about the millions of travelers stuck in a moving vehicle with nothing better to do than look out their window. I'm talking about passengers on planes, trains, and automobiles. The easiest target here is the millions of people who fly. As passengers onboard a plane for hours, we're lucky if we have an in-flight movie system. But what if the plane offered a local intranet with a copy of Wikipedia? What if the airlines gave promotions to those who contributed? What if each flight competed with other flights for most contributions? The same approach could be applied to passenger trains, buses, subways, and ferries. The main limitation here is a technical one. If you have thousands of Wikipedia clones buzzing around, each collecting contributions during their offline time, how do you reconcile the changes with the master database? While tools like Kiwix already offer an offline copy of Wikipedia, there is much work needed to support thousands of wiki clones reconciling changes every few hours. This will require revolutionary branch management and revision conflict handling. But if you pull this off, it might kick off the biggest surge in user participation in years. With whom should you partner to accomplish this? Why not start with NASA? They use MediaWiki to train astronauts and plan for spacewalks. Begin this development by running wiki servers onboard the International Space Station. Get astronauts to contribute to the same wiki used for their training while they are putting all that knowledge to use. Once the NASA wiki synchronization between the ground and the ISS is working, expand this model to Wikipedia. Yes, have a clone of Wikipedia onboard the ISS. Astronauts love to share their experience, their story, and their photos from their 6-month stays aboard the station.
These lucky few represent countries from around the world and they have a huge influence on the rest of us on the ground. Once people see astronauts contributing to Wikipedia during their journey, they will want to join the movement on their travels (albeit aboard slightly less cool vehicles).
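The reconciliation problem described above can be illustrated with a deliberately small sketch. It assumes that each offline clone records which master revision an edit was based on; the names `OfflineEdit` and `reconcile`, and the in-memory `master` dictionary, are hypothetical and not part of MediaWiki or Kiwix. Edits whose base revision is still current are applied, and everything else is set aside as a conflict for later merging.

```python
from dataclasses import dataclass


@dataclass
class OfflineEdit:
    page: str
    base_rev: int   # master revision the clone held when the edit was made
    new_text: str


def reconcile(master, edits):
    """Apply offline edits to `master` (dict: page -> (rev_id, text)).

    An edit is applied only if the page has not changed on the master
    since the clone was taken; otherwise it is reported as a conflict.
    Returns (applied_pages, conflicting_pages).
    """
    applied, conflicts = [], []
    for edit in edits:
        # Pages unknown to the master are treated as being at revision 0.
        current_rev, _ = master.get(edit.page, (0, ""))
        if current_rev == edit.base_rev:
            master[edit.page] = (current_rev + 1, edit.new_text)
            applied.append(edit.page)
        else:
            conflicts.append(edit.page)
    return applied, conflicts
```

A real system would need an actual merge step for the conflicting edits (for example a three-way text merge), which is where most of the hard work lies; this sketch only shows how conflicts could be detected.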
In a Wikimedia cultural orientation, the moderator instructed the class by explaining that "technology is not part of our mission, technology is only a means to an end." While it may be appropriate to use digital technology in order to disseminate free educational content effectively and globally, the existence of Wikimedia Technology is not strictly part of Wikimedia's mission. Wikimedia is therefore stuck in a precarious position of maintaining a large open source software project only as a means to an end. This produces a double-minded mentality within the movement: satisfying the mission versus satiating a massive software operation. It is with this understanding that Wikimedia ought to consider partnering with an existing open source community in order to evolve MediaWiki to support the mission, maintain and grow the technical community, and build technologies necessary for embracing mobility.
While Wikimedia's mission statement does not cover technology, the mission of Drupal "is to build the best open source content management framework." In addition, Drupal is more than capable of handling all of Wikimedia's traffic needs and is flexible and modular enough to allow us to implement all of MediaWiki's features in a UI and API backwards-compatible way. In fact, every feature of Drupal core is an extension and every extension is a first-class citizen that has full control over every aspect of Drupal. Perhaps the next major version of MediaWiki ought to be a collection of Drupal extensions that can be run independently and are also available in a pre-configured "MediaWiki" distribution of Drupal.
The primary user of MediaWiki is, by far, Wikimedia. While the software can be run by others outside of Wikimedia, its usage outside of Wikimedia is extremely low. Because of the low adoption rate, it is difficult to gain any users outside of Wikimedia. As a result, there are very few developers outside of the movement that contribute towards its development and almost no outside financial commitment to the project. In contrast, Drupal's usage within the top 1,000 websites is 14 times that of MediaWiki. "The Drupal community is one of the largest open source communities in the world." By building MediaWiki on top of Drupal, Wikimedia would be tapping into a user and developer base that is substantially larger than MediaWiki's. Wikimedia would be assisting Drupal in their mission and Drupal would be helping Wikimedia in theirs.
Drupal is committed to an API-first strategy. This strategy has enabled Drupal to expose all of its resources in a consumable, highly-cacheable API. They believe strongly in this strategy, because it's part of their mission, and in doing so, helps others like Wikimedia achieve the mission on a global scale. By embracing the API-first strategy, Wikimedia would propel its mobile development into the future.
To further Wikimedia's mission, the foundation should consider using Drupal as the foundation of its software. Doing so would facilitate evolving MediaWiki to support the mission, maintaining and growing the technical community, and building technologies necessary for embracing mobility.
David Chan: Embracing real-time collaboration
Real-time collaboration (like Google Docs and Etherpad) has many benefits but also imposes certain workflow requirements. There is prototype code that can enable real-time collaboration within VisualEditor. But rolling out collaborative editing requires more than technical work. It will require a coordinated effort to re-imagine what editing is like. We will need mechanisms to create user groups, real time chat mechanisms, mechanisms to temporarily persist collaborative sessions, perhaps even new core mechanisms for describing revisions. We also need to think about social mechanisms and preventing harassment and vandalism of collaborative sessions. In exchange, we will gain improved mechanisms for mentoring, translating long articles, reporting on current events, and assisting non-native speakers.
We should embrace this opportunity to reimagine our platform, starting by organizing a number of trials to gain insights into whether, or how best, a real-time collaborative editing option would benefit our projects.
Sessions in previous Wikimanias / Hackathons / Developer summits identified potential uses, including for:
- Translating long articles
- Current events
- Assisting non-native speakers
Potential issues identified include:
- How to log authorship
- Who decides when to publish
- Preventing in-session abuse
- Coexisting with non-real-time editors
Derk-Jan Hartman: Growing and complexity
Our strategy is pointing us towards a bold and inclusive world in terms of projects and people. Almost by definition this will lead to increased complexity, not simply of our technology, but also of how to deliver our technology and enable people to make use of it. In the last few years we have spent energy in creating more APIs and a more service-oriented architecture. An area where we have not made such major changes, however, is how we design for and work with the front end of the software, which is where the majority of people are actually using all the other stuff we make. Here we continue to think in larger products and problems to solve, and quite often tend to fail and even clash with our own 'customers'. By taking on a more diverse strategy, we risk being even more vulnerable to this. I have two suggestions: Smaller engineering. Allowing more time for smaller projects, smaller bugs, smaller tests of ideas and refinement of existing software. Let's embrace the success of Community Wishlists and be closer to our communities by writing more Gadgets or tools (Toolforge) when we can, instead of going for 'the big fix'. Have three 1-week tests instead of one 6-month beta. Fix small bugs that annoy many and that make our website feel amateurish, and improve the experience for everyone. Working more often on the needs of smaller projects, giving them a bigger voice and sprucing up our own solutions by gaining a more diverse experience. Be closer to our communities by working nearer to them. The second point that we should work on is to stop thinking of our platform as a website. It is a work environment for an increasingly diverse crowd. We have a limited amount of space on the screen and a huge number of tasks that various people want to do. Gadgets and even more so userscripts are hugely helpful, but have long since become unmanageable. It is time to think beyond the simple APIs and widget kits.
We need to take a step towards becoming an application environment. We need users to be able to install and use complete apps made from recognizable and reusable building blocks. I want to see and use Gadgets as my browser uses extensions. I want those extensions to put apps in recognizable and consistent spots, to allow for fullscreen or splitscreen views, to have a familiar UI, but without having to cram everything into the limited shared space that we have. Apps as gateways for diversifying the specific solutions we build.
Eric Evans: More than just servers
A modern approach to tooling and infrastructure is needed, not only so that we may scale future content and users, but the deployment of new technologies and services as well. Our way of approaching infrastructure is in need of modernization. What we do, we do well, but if we are to grow our capacity, not just for storing and serving content, but our capacity to create the technologies that empower movement strategies, we need a departure from the idea of infrastructure as merely clusters of servers. We need high-level, easily consumed platforms for computation, storage, deployment, and management. We need environments that make experiments cheap, and allow teams to fail fast or iterate on the next stage quickly. We need systems that are distributed by nature, self-service, secure, and that are able to provide insight into availability and performance. Some efforts have been (or are being) made here. Examples include recent work toward a Kubernetes deployment, a change-propagation service, and RESTBase. These efforts, while worthwhile, are piecemeal, and no holistic strategy exists. It is my belief that as we discuss the future of our platform, we should consider the requirements in the context of the bigger picture, and discuss a strategy aimed at modernizing our infrastructure.
Erik B: Empowering Editors with Machine Learning
Advances in machine learning, powered by open source libraries, are becoming the foundational backbone of technology organizations the world over. Many tedious, time-consuming tasks that previously required 100% human involvement can now be augmented with human-in-the-loop machine learning to empower editors to get more done with the limited time they have available to contribute to the sum of all human knowledge.
- Invest directly in applying known quantity machine learning, such as pre-trained ImageNet classifiers, to add structured data to our multimedia repositories to increase their discoverability. Perhaps via tools that provide editors with lists of appropriate items that they can easily click to add if appropriate to the multimedia.
- Engage academia to work with Wikimedia data sets and employ developers to move the most promising results from research into production. There is already a significant amount of work being done in academia to test and evaluate machine learning with our data sets, but little to none of that work ever makes it back into Wikimedia sites. With more focus on collaboration we can encourage research that is specifically applicable to deployment goals.
- Wikimedia has the ability to collect significant amounts of implicit user data via browsing sessions, searches, watchlists, editing histories, etc. that can be used for machine learning purposes. We need to be continuously thoughtful of the privacy implications of how we use this data.
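The human-in-the-loop idea in the first point above (suggest labels, let editors click to confirm) could look roughly like the sketch below. The function names, the score dictionary, and the threshold values are assumptions made for illustration; in practice the scores would come from a classifier such as a pre-trained ImageNet model, which is not shown here.

```python
def suggest_labels(scores, min_score=0.8, limit=5):
    """Return up to `limit` labels scoring at least `min_score`, best first.

    `scores` maps a candidate label to the classifier's confidence (0..1).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked if score >= min_score][:limit]


def apply_confirmed(suggestions, confirmed):
    """Keep only the suggestions an editor explicitly confirmed.

    Nothing is added to the file's structured data without a human click.
    """
    return [label for label in suggestions if label in confirmed]
```

The point of splitting this into two steps is that the model only proposes; the decision to add structured data stays with the editor.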
Giuseppe Lavagetto (Joe): The future of the MediaWiki infrastructure at the Wikimedia Foundation
MediaWiki is at the core of the infrastructure that serves all of the Wikimedia projects, and the current setup of MediaWiki in production poses various challenges: from the future of our current runtime (HHVM), to the ability to serve MediaWiki from multiple datacenters, to long-standing issues such as resource usage efficiency and flexibility. Here are some of the things we have to tackle in the future.
Transition off of HHVM: Since the HHVM team has made it clear they're parting ways with full PHP compatibility, and since maintaining support for both HHVM and PHP in MediaWiki would be arduous, we need to make plans to move off of HHVM, back to PHP 7.x. This transition, while technically necessary, should not come at a cost for our users: page load times should not degrade. We can proceed by marking responses coming from either engine, collecting metrics and analyzing data. In order to achieve this, we should run the two runtimes in parallel on the same servers (which have plenty of capacity, given no MediaWiki cluster has a utilization over 40%), and we will then be able to programmatically route individual users or a percentage of traffic, or even specific wikis, to one or the other. The deadline for this transition is set for the end of 2018 (EOL of the last compatible version of HHVM), and planning and resources should be allocated to this goal.
Multi-Datacenter support: We currently use our datacenters in an active/passive setting, as far as MediaWiki is concerned. While this is acceptable in principle, it is a huge waste of resources: it means that 50% of our servers are doing nothing at all, and it limits our ability to expand the number of core datacenters we can use. Diverting the read load to secondary datacenters could also allow us to use caching in a less aggressive way when not needed. There is already a program underway to add first-class multi-DC support to MediaWiki, so we can focus on what specifically needs to be done in order to achieve this longtime goal: our final goal should be to serve reader traffic from all datacenters, and to be able to switch the "master" datacenter in a matter of minutes.
Elasticity, resource usage efficiency: At the moment, our infrastructure is plainly inadequate to react to sudden spikes of non-wiki content production and to changes that generate a lot of asynchronous jobs, such as a change to a popular template. The issues with the current jobqueue are widely known and publicized, but even the current transition to a new model won't solve the resource starvation that results in a degraded user experience. Moreover, a single editor uploading videos via video2commons can easily overflow our media processing capacity for weeks. This happens because we allocate our resources statically (we have 4 videoscalers per datacenter, for example), we consume resources inefficiently, and reallocating servers requires time and effort. Modern application stacks are elastic, meaning the operation of scaling up or down the capacity of a single cluster or functionality can be handled programmatically and/or manually whenever the need occurs, allowing the infrastructure to react to such changes. For economic and privacy/security reasons, Wikimedia doesn't make use of external cloud services, so the only way to achieve such flexibility is to build a serviceable infrastructure that can serve MediaWiki and any other project Wikimedia will support: the effort to do that is underway with the rollout of our Kubernetes-based IaaS in production. I think we should work, sooner rather than later, on moving the MediaWiki application stack (and maybe its semi-ephemeral caching) to the Kubernetes platform. While the advantages of such an approach seem clear, it won't come without costs: specifically, habits around code deployment, testing and configuration changes will need to be completely revisited and superseded by new approaches.
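The routing described above (individual users, a percentage of traffic, or specific wikis) could be sketched as follows. This is only an illustration under stated assumptions: the function name `pick_runtime`, its parameters, and the idea of hashing a user identifier into 100 buckets are hypothetical and do not describe how Wikimedia's traffic layer is actually configured.

```python
import hashlib


def pick_runtime(user_id, wiki, php7_percent,
                 php7_wikis=frozenset(), php7_users=frozenset()):
    """Decide which runtime should serve a request: "php7" or "hhvm".

    Explicit opt-ins (specific users or whole wikis) win; everyone else
    is assigned to one of 100 stable buckets derived from a hash of the
    user identifier, so the same user always gets the same runtime.
    """
    if user_id in php7_users or wiki in php7_wikis:
        return "php7"
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "php7" if bucket < php7_percent else "hhvm"
```

Because the bucket is derived from a hash rather than chosen at random per request, raising `php7_percent` only ever moves users from HHVM to PHP 7, which keeps per-user metrics comparable while the rollout progresses.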
Fundamentally, Wikimedia's technologies are tools to achieve our mission – absolutely vital tools, but not objectives in themselves. Where a tool has dulled we should sharpen it, where it has rusted we should polish it, and where it has blunted we should replace it. The majority of our tools have sprouted over time in response to immediate needs, and grown ad hoc when we've spotted something they can also do, or been pruned back when they proved too unwieldy to retain. Our communities have taken these tools and built amazing things with them, often despite rather than in line with their intended use. Subsequently these unplanned use patterns have shaped what we think about the tools and how they should be used, when we do so. This haphazard, tactical development has worked well enough, but has limited us in several ways. We often fail to serve some of our audience because we rush in with a quick fix that listens to a few voices and decides that that's the best thing to build. When we've tried to build more systemic change, it's often been unrooted in serious evidence, and so is like constructing ivory towers into the clouds: baffling, hopeless, and unfamiliar. We should develop comprehensive methods to collect and monitor actionable data on how well our tools are serving their purposes, and where we can improve. This should come from all stakeholders, covering our great, already-empowered, experienced editors in major languages but also those from whom we rarely hear – those contributing in and speaking smaller languages or not interacting with other users on meta-editing issues, and those with a looser relationship to the movement like readers and casual editors. We should have numbers clearly attached to our tools as to how we expect them to perform. How these are obtained will differ.
Sometimes quick numbers like success rates of false positives against false negatives from anti-abuse features, or how many users having made changes try to press the submit button, will work. Sometimes simple surveys with expected happiness thresholds will be appropriate. In others we may need to work harder to come up with the right way to understand how different tools and experiences interact with each other, like how much "knowledge" readers successfully glean from the article, or whether the burden of allowing logged-out editing is worth the mindshare of "anyone can edit" feeling true. Ideally, changes to user features and especially introductions of new features should progressively roll out based on these numbers – and if they have adverse effects, they should be automatically rolled back. This is how others operate, but it's very distant from today. It's a far-off dream now, but I believe we can build it.
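The metric-driven rollout with automatic rollback described above can be sketched in a few lines. The function name, the step sizes, and the single-number `metric` are assumptions chosen for illustration; a real system would track several metrics per feature and would need a way to judge statistical significance.

```python
def next_rollout_step(current_percent, metric, threshold,
                      steps=(1, 10, 50, 100)):
    """Return the share of users (in percent) a feature should reach next.

    If the observed metric falls below the expected threshold, the
    feature is rolled back to 0%. Otherwise it advances to the next
    larger step, or stays put once it is fully rolled out.
    """
    if metric < threshold:
        return 0
    for step in steps:
        if step > current_percent:
            return step
    return current_percent
```

For example, a feature at 10% whose success rate stays above its threshold would move to 50% at the next evaluation, while one that drops below it would go back to 0% without anyone having to intervene.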
Jan Dittrich (UX Design/Research)
I believe that we need to achieve a better separation of concerns – in code as well as in work on product and our communication with the communities – to reduce the debt we build up in these areas. Therefore, I want to suggest three interrelated topics:
- Use of modern MVVM frameworks for our front end code, to develop more efficiently
- Provision of a modern customization infrastructure, to decouple gadgets from our code
- Participation beyond code and feature wishes
Provision of a modern customization infrastructure: The introduction and larger use of an MVVM framework could also be a chance to provide clear frontend APIs for Gadgets. They currently use DOM hacks, which break continuously and would not be possible anyway when using a modern frontend framework (due to DOM flushing). Why should we bother? Because we have a large user base in which different tasks are shared using specific tools, just as each kind of manual work has many different specialized, often even customized tools. Additionally, gadgets/userscripts could provide a low-barrier opportunity to onboard new developers. Other organizations successfully show that user-provided extensions can enhance an ecosystem with user-driven innovation and help with onboarding developers, e.g. Firefox's and Chrome's WebExtensions as well as LibreOffice. I would like to work on finding a way to fulfil the possibilities of gadgets and extend them while providing sustainable and secure infrastructure for doing so.
Participation beyond code and feature wishes: We already do extensive user research. A large area for expansion and further development is doing this research and sense-making *with* the community. This may already be done, often implicitly, based on feature- or UI-focused requests of community members. But this has large caveats: The solution may not be feasible or sustainable to implement. Furthermore, without understanding the underlying need, we risk building technical and UX debt and give away the possibility of learning from our community. To achieve an active, needs-based involvement of communities in design and research we could build on existing participatory design methods. They could be used and integrated in our research and product planning frameworks. Clearly integrating the community in up-front research could enable us to gather needed knowledge, have community participation and reach a better understanding between the Wikimedia Foundation and communities as well as of the communities among each other. I want to define future participatory design strategies to be used on our way towar
WMF should focus on the technical issues it is uniquely positioned to handle, and let the volunteers have the fun stuff. When we think about what technical work the WMF is engaging in, I don't believe enough time is spent considering volunteer motivation, and the great potential we are systematically choosing to ignore, or end up devaluing entirely due to the inherent unpredictability of volunteer work. I do believe that there is a long enough history of deeply understaffed WMF engineering teams getting set up to tackle fancy front-facing projects, only to have those teams simultaneously struggle to deliver, and deter everyone else from getting too near decision-making in their territory. It is time to change our approach. I would like to talk about what it would take, to refocus the majority of WMF's technical work away from taking full ownership of all the "important" new ideas, and toward making it as easy as possible for momentarily highly motivated outside parties to make meaningful contributions to new features. I imagine many new tools would be required to scale release engineering, security, and the technical community in general. We would have to take a greater role in mentoring interested parties. There are also known big hairy unsolved problems in the way we currently think of maintainership. Major changes would have to be made in our current approach to product timelines and product/project management. Of course, there will always be things that do require a high level of predictability in the outcomes. Donor money can and should be spent on ensuring predictability around the things we absolutely cannot function without. However, there is a whole world of ideas that absolutely do not have to be accomplished on a strict "shipping" timeline, and it seems that the WMF will always hold the keys to that door. 
I would like to figure out how the WMF could start embracing that unpredictability at every level, and move much more deliberately from "bottleneck" to "enabler".
We have well established that volunteers are the lifeblood of the Wikimedia movement. We prioritize their contributions and work to ensure they are given the tools they need to succeed. But in the Wikimedia development community, we've neglected volunteers instead of nurturing them – and this is a serious problem that we need to rectify. There are a lot of areas where we can improve, but I'm going to focus on just one: improving the volunteer developer's code review experience. While Wikimedia Foundation product teams are building new things, it's usually the volunteers who are keeping critical tools that the community depends upon alive (AbuseFilter, CheckUser, etc.). The MediaWiki codebase has gotten so massive that it's not practical to have the Wikimedia Foundation attempt to maintain all of it. It would not be a good use of movement funds either. Instead, I'm proposing that we utilize our volunteer base and ensure they are valued and respected members of the Wikimedia development community. I think we can do it in three steps: first, set reasonable standards for code and the review process; second, prioritize code review of patches coming from volunteers; and finally, empower volunteers to be maintainers and owners of code and create a sustainable community.
1. Set Reasonable Standards for Code and the Review Process
The status quo is that depending on who reviews your code, you will have a wildly different experience. Some reviewers will mandate that principles like dependency injection are followed; others will require 100% test coverage. And others might not care about any of that and just ensure the code does what it is supposed to before merging. But the people who face the worst of it are volunteers – WMF staff will have consistent reviewers through teammates who have already communicated standards for merging code. So we need reasonable standards for code we accept, and we need to use those throughout the review process. As an example of "reasonable", if someone is trying to fix a bug in legacy code that is difficult to test, it would be unreasonable to mandate a test case before merging the fix.
2. Prioritize code review of patches coming from volunteers
Our current process of reviewing volunteers' patches after finishing code review for teammates isn't working – we have a giant pile of unreviewed patches. When you start your day and look through your list of reviews, pick one or two patches from a volunteer and review them first. Most likely it'll take minimal time, but for previously-neglected volunteers, it will make a big difference.
3. Empower volunteers to be maintainers and owners of code
Some of our volunteers have been around for quite a while and are well trusted. Let's give them +2 rights! There's nothing that makes you feel better than getting an email from someone telling you that your contributions are valued and they'd like to nominate you for +2 access (exactly how I got hooked). And quite a few years later I'm still around, so it must have worked.
Combined knowledge as a service (KAS) and knowledge equity (KE) is identified as our strategic direction (draft). We have decided to focus on knowledge in a broader sense and beyond just encyclopedic knowledge, create KE, and become the infrastructure that offers KAS. In this position paper, I offer some of my early thoughts on where we should focus our efforts to move in this strategic direction. Given the limits of word-count, I will not go through the details of research methods and techniques that can be used to address each point.
As the central focus of the strategic direction is knowledge, we need to arrive at a unified working definition of knowledge. English Wikipedia defines knowledge as familiarity, awareness, or understanding of someone or something which is acquired through experience or education, by perceiving, discovering, or learning. This definition, however, is not a working definition that can help us decide what new content to include. Research on user behavior, needs, and learning patterns can help us define knowledge.
Our goal is to remove structural inequalities that limit our ability to represent knowledge from all people and by all people. To this end, we need to meet our users where they are. Today:
- language is a barrier to sharing in knowledge. Content should be available to our users in their languages.
- text-only knowledge is a blocker for gathering knowledge, especially from parts of the world that are already left behind. Our systems should become technologically receptive to accepting and allowing editability of new forms of knowledge (e.g., voice for oral knowledge).
- limits in proficiency and literacy are a blocker for our users. The content and its presentation will need to become a function of these parameters.
Knowledge as a service
Our goal is to offer KAS: both in terms of the infrastructure that supports it as well as the content of it. To do this, we need to:
- empower our users to learn, create, and go beyond consuming content: Wikimedia projects' talk and discussion pages are an asset for building systems that can help our users think critically and learn how to deliberate. We need to do research to surface this critical thinking and step by step deliberation to gain insights from it, and share it with others as part of our KAS effort.
- do research and development on building systems where deliberation and decision making can be possible at scale. Today, there is no such system available but one of the building blocks of KAS is infrastructure for discussion, deliberation, and decision making.
- empower our users with ways to assess the trustworthiness of the content. Trust and reputation become especially important as we move to new forms of knowledge such as oral knowledge. We should do research to build trust and reputation models for Wikimedia and its users and understand how to surface such metrics as measures of reliability of the knowledge we serve.
Lucie-Aimée Kaffee (User:Frimelle): Languages in the world of Wikimedia
One of the central topics of Wikimedia's world is languages. Currently, we cover around 290 languages in most projects, more or less well covered. In theory, all information in Wikipedia can be replicated and connected, so that different cultures' knowledge is interlinked and accessible no matter which language you speak. In reality, however, this can be tricky. Hecht and Gergle (2010) show that even English Wikipedia's content is in large part not represented in other languages, even in other big Wikipedias. And the other way around: the content in underserved languages is often not covered in English Wikipedia. A possible solution is translation by the community, as done with the content translation tool. Nevertheless, that means translation of all language articles into all other languages, which is a never-ending effort and, especially for small language communities, barely feasible. And it's not all about Wikipedia: the other Wikimedia projects will need a similar effort! Another approach for a better coverage of languages in Wikipedia is the ArticlePlaceholder. Using Wikidata's inherently multi- and cross-lingual structure, AP displays data in a readable format on Wikipedias, in their language. However, even Wikipedia has a lack of support for languages, as we have been able to show. The question is therefore how we can get more multilingual data into Wikidata, using the tools and resources we already have, and eventually how to reuse Wikidata's data on Wikipedia and other Wikimedia projects in order to support under-resourced language communities and enable them to access information in their language more easily. Accessible content in a language will eventually also mean they are encouraged to contribute to the knowledge. Currently, we are investigating machine learning tools in order to support the display of data and the gathering of new multilingual labels for information in Wikidata.
It can be assumed that over the coming years, language accessibility will be one of the key topics for Wikimedia and its projects, and it is therefore important to invest in the topic now and enable an exchange about it.
 Hecht, B., & Gergle, D. (2010, April). The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 291-300). ACM.
Lydia Pintscher: Breaking Down Barriers To Cross-Project Collaboration
"... a world in which every single human being can freely share in the sum of all knowledge." Wikipedia has been our flagship for many years now and our main means of achieving our vision. As the information consumption and expectations of our readers change, Wikipedia needs to adapt. One crucial building block for this adaptation is re-using and integrating more content from the other Wikimedia projects and other language versions of Wikipedia. Connecting our projects more closely is vital, especially for helping our smaller communities serve their readers. Surfacing more content from the other Wikimedia projects also gives them a chance to shine, find their audience and do their part in sharing in the sum of all knowledge. This integration comes with a lot of challenges. For many years the Wikipedias have lived largely independently of each other and of the other Wikimedia projects. This is changing. Sharing and benefiting from, for example, data on Wikidata means collaborating with people from potentially very different projects, speaking different languages. It brings a perceived loss of local control. At the moment, editors see themselves first and foremost as editors of "their" Wikipedia and often don't perceive this integration as worth the effort, especially on the larger projects. We need to address this on both the social and the technical level if we want to bring our projects closer together and have them benefit from each other's strengths and compensate for each other's weaknesses. We need to think about and find answers to these questions: What can we do in order to bring our projects together more closely? How can we help break down perceived and real barriers for cross-project work? What can we do to make cross-project collaboration easier?
"We are in the business of democratizing knowledge, and I believe that lowering and removing technical barriers to entry, and creating a culture of inclusion in our technical spaces is essential to our success." The Knowledge as a Service aspect of our strategic directions focuses on building infrastructure and platforms that help create and share open knowledge. The key to successfully building and scaling such infrastructure, in the context of the Wikimedia Technical spaces, is enabling everyone, irrespective of their experience or backgrounds to be able to utilize and create research, data, and tools on top of our infrastructure. When designing infrastructure and other technical products, we often fail to take into account technical barriers, inessential complexities and social costs that can discourage or prevent people from being able to leverage them. For instance, is it enough to build a dataset and store it in a database, if we do not provide friendly ways for researchers to access and analyze this data? Is it enough to put out a call to contribute to a project, but not provide easy-to-setup development environments to be able to test changes? Is it sufficient to have a state of the art environment to host applications, but not design good, simple processes around gaining access and deploying to them? These conversations are crucial, because we are not building products for technology's sake, but are in fact trying to build a culture where it is easy to use and contribute to our technical projects, whether you are a volunteer who has a few hours to spare or a paid employee; a newcomer or a long time contributor. We also want our technical communities to be diverse, and these complex systems and processes, and unsaid social constructs around how to interact with our projects, often bias against traditionally underrepresented populations in technology. 
I have always worked on or pushed for creating and supporting simple graphical interfaces that provide unified access to data sources, building platforms and processes that let people just create tools/APIs/dashboards and painlessly host them, developing tutorials and good documentation for getting involved in our projects, and codifying friendly and inclusive social norms and promoting a culture of being excellent to each other in our technical spaces. When talking about the future directions of new and existing projects, we should take into account the costs and barriers to access, and who we may be failing to include as a result. I hope to be this voice at the Developer Summit.
It is foreseeable that the way our content is consumed will change a lot in the coming years, as both the demographics of the internet and the devices used by our readers are changing. We should adapt to this by offering our content in ways suited to many user scenarios. To achieve this, we need to modularize and structure our content so that it can be easily re-interpreted and used in many different ways. Wikidata lets us cater for this trend by providing machine-readable data about any subject, which can be formatted and presented in a wide variety of ways and languages. Wikidata makes it easier to maintain up-to-date information on subjects in many languages, without the burden of manual data maintenance. We should strengthen this by improving the integration of Wikidata with other Wikimedia projects, by providing easy ways to use and profit from Wikidata, especially for small communities, and by making the power of Wikidata more visible. While all Wikimedia communities will profit from this, it can be especially worthwhile for small communities that currently don't have the resources to manage data, such as infoboxes, themselves. Example projects that will help in this area are the "ArticlePlaceholder", which serves Wikidata data about a subject if there is no article about it, the plans for automatic infoboxes derived solely from Wikidata, and other means of using Wikidata data on Wikimedia projects. While these projects can have a big community impact, they need to fit in with the current infrastructure. They also pose new scalability and data presentation challenges that need to be addressed. Furthermore, Wikidata's information should be easy to reuse by third-party projects, to increase visibility and to gain contributions and data donations, making Wikidata the true data hub of the internet.
This goal raises longstanding issues with the current Wikimedia dump infrastructure, which is neither very flexible nor does it provide a machine readable interface for data consumers. Also bringing more individual editors and organizations into Wikidata poses various infrastructure and scalability issues coping with the sheer amount of data and changes happening, as well as providing convenient tools for establishing and maintaining data quality.
When Marshall McLuhan said "The medium is the message", he was saying that how a message is understood is affected by what is used to present it. MediaWiki is a fundamental part of the medium used to present Wikimedia's work (the "message"). Because the medium is an integral part of the message, it requires comparable attention to its availability and accessibility. For example, effort is made to ensure that people in remote areas have access to selected content through Kiwix, but only a very limited effort has been made to incorporate their knowledge into the "sum of all knowledge." While there are efforts underway that include copying edits into Wikipedia by hand, it should be possible to provide people in remote areas with an editable copy of Wikipedia so that their edits could be incorporated with less intervention. Improvements in the installation and resource consumption of a simple MediaWiki installation could be made without sacrificing the current PHP-based application, such that someone could, for example, run a current MediaWiki installation on an un-rooted Android phone. Work could then be done to automate the synchronization of that MediaWiki with the current Wikipedia content. This work on MediaWiki could, of course, be used by people other than the WMF who use the tool, which could create a virtuous cycle that would benefit the Foundation. In fact, deeply incorporating McLuhan's thinking into WMF culture would mean that, while Wikipedia would remain the most visible product of the Foundation, there would be more room to focus on expanding MediaWiki's capabilities beyond what fits into the current focus on GLAM efforts, the website, etc. Most of the world does not use Wikipedia every day, but many people use something they've learned as a result of reading from or contributing to Wikipedia every day.
Making it easier for people to deploy MediaWiki where the potential users do not have the resources of the WMF (for example, in a place that doesn't have a stable Internet connection) could encourage more people to actively embrace Wikimedia's vision of freely sharing knowledge.
There is a huge potential for MediaWiki development outside of the Foundation's organized tech world. Thousands of organisations are running MediaWiki on the internet or intranet. They are investing time and money to make it their platform for information sharing, knowledge management or collaborative work. Yet, a lot of the development and design work stays contained on those installations instead of being published and provided to the greater MediaWiki community. I think this is not because of seclusiveness of the authors, but because we make it hard for externals to contribute. So how can we tap into this potential? I think there are a number of measures we can take. Among others, these are:
- Support standard ways for code contribution. For example, a lot of developers have a GitHub account and know the GitHub workflow of forking and opening pull requests. However, there is currently no way for them to contribute their code directly; instead, they have to set themselves up with our Gerrit infrastructure. This is a hurdle many will not take.
- Maintain extensions as a community. There are a lot of extensions which are not actively maintained by their authors. In order to get them working, you have to wait for the maintainer to +2 your code. Although I have +2 rights, it is not clear under which circumstances I should actually +2 code, nor is there a general review queue for extensions. We can establish a group of volunteers who review changes to extensions on a regular basis.
- Create a template and gadget repository. A lot of work goes into site customisation using gadgets, templates or on-site-CSS. There are brilliant solutions out there, but we do not have a structured way to centrally collect this content or even curate it.
- Make it attractive for professional developers and consultants to build their projects on top of MediaWiki, for example by increasing the visibility of highly used extensions on MediaWiki.org, by providing good entry points for technical documentation or by adding automated quality checks to the extensions. There are already some initiatives pursuing the general goal of fostering an ecosystem, e.g. MediaWiki Stakeholders or the recently announced Enterprise MediaWiki Consortium. Together with the Foundation, they can encourage MediaWiki maintainers to contribute their ideas and code and be part of the MediaWiki world.
MediaWiki is built on many other open source tools, libraries, packages and other kinds of software. Our ability to write, run and use MediaWiki depends on their availability, upstream support and maintainability. A few examples are Debian, the OS the WMF runs on; PHP; and Elasticsearch, our search back end. In light of recent discussions about migrating from HHVM to Zend PHP as our runtime, I would like to raise the question of what our position is, within the open source world, towards the underlying parts of our stack: whether we choose to be just a user of what upstream produces, or whether we want to actively influence the decisions made while the software is written. As the well-known phrase says, "decisions are made by those who show up". To influence those decisions we will need to show up in those communities, take an active part in them and contribute, in exactly the same manner we hope third-party MediaWiki re-users will contribute, discuss, send patches and show up. If we choose this path, it has resource implications: time, money and dedication to involvement in other communities. For instance, sponsoring a PHP developer to work on our needs upstream might be a good investment, but it might also be a waste as a whole. I would like to have an open discussion about this approach: whether it is desired, feasible, and worth the effort. I think it might affect where our tech stack will be in the years to come, and it makes a significant statement towards the outside open source ecosystem. Thank you.
We need to re-evaluate scaling, on both the technical community side and the content side. On the technical side, too often we think as if we were an isolated organization, rather than a respected leader that many wish to collaborate with. This causes us to ask ourselves the wrong questions and get the wrong answers. For example, we asked ourselves whether we should limit ourselves to existing open source translation tools, or use proprietary translation services to fill in the gaps. Instead, we should have stayed committed to open source, and asked how we can use our engineering and financial resources to advance open source translation. This is a major problem that no organization can solve on its own. However, we have both the motivation and resources to be a major contributor to the solution. We also asked whether we should support the proprietary MP4 format, or limit ourselves to weak device support for open formats. Instead, we should be staying committed to open standards, and working to support their uptake among software developers and device manufacturers. We already have significant relationships with wireless carriers that give us a foot in the door with such manufacturers. By seeking important partnerships, where we are prepared to put in significant effort, we can greatly scale both our own efforts and those of the broader movement.
On the content side, to achieve sustained long-term growth, we need to grow every type of user activity, including writing, editing, discussion, organization, curation, maintenance, workflows, and moderation. We have historically provided good (and improving) support for writing, editing, discussion, and moderation. However, we have neglected the related processes of organization (e.g. categorization, tagging), maintenance (e.g. tracking articles that need fixes, updating them as they become out of date), curation (e.g. quality images, featured articles), and workflows (used in multiple areas, but particularly supporting organization, maintenance, curation, and moderation). It is vital that we improve discussion, curation, workflows, and moderation tools. Otherwise, we will be unable to keep up with increasing content and activity as our improvements to writing and editing succeed. We should look at past successes (e.g. the Teahouse) and failures (e.g. Article Feedback) and learn lessons. In both cases, we made a very specific product, which then succeeded or failed. This is not scalable to hundreds of wikis, and it is hard to iterate in response to lessons learned. Instead, we should focus on platforms, such as workflow systems. In order to keep up with the community, we need to give them the flexibility to constantly use the software according to their needs.
I'm highly interested in having a deep discussion about our technical debt. I've been active on this front myself and I really care how we fare in this respect; however, I see a lack of consensus on tech debt in the development community at large. We deprecate things and then continue using them. People get irritated when their extensions break due to the slightest core changes, even when the extensions themselves are misusing the core interfaces. We can't really run many types of static analysis against our code base, because the sheer number of problems detected would make the signal-to-noise ratio unacceptable. Developing skins for MediaWiki is incredibly painful. Our tests access the database a lot; as a result they're slow and fragile. These are just a few examples of pain points haunting our code base and exacting their daily toll from everyone working on it. I would like to have the Tech Debt SIG work in person on addressing these issues. We should define code quality metrics, identify problem areas and create some actionables to address them. We should also discuss approaches to handling this without causing too much discontent in the broader developer community. I believe this would be an important step towards making MediaWiki a better ecosystem and improving our development pace.
How do we maintain and grow the technical community and ready it for the mission ahead? Maintaining and growing a technical community is difficult, particularly when the majority of that community is contributing their time and code on a volunteer basis. However, we can look at other successful projects for guidance, to see what we can learn and apply to our own movement. Clearly articulating the value for participants: It's important that we articulate what participants will get (socially, professionally, personally) from contributing to our projects, and it's important to socialize that value through feedback loops, communication, and positive reinforcement. One of my favorite projects, the Smithsonian Transcription Project, hired a full-time community manager for their volunteer community. It was her role to pair participants with projects, follow up to ensure things were going well, and intervene if changes needed to be made. Creating feedback loops that reinforce the value for participants: It's not enough to get people in the door; we must continually reinforce why participation is meaningful for both participants and the mission of free and open knowledge. People will have different reasons for participating: some want to build a skill set, others want to contribute to a meaningful project, still others are completing an assignment. The value for all of these participants differs, and the messaging/communication should reflect that. Finding pathways to participants through non-technical means: GitHub does this particularly well. They want to reach students, so they have a space, https://education.github.com/, aimed at teachers. This is a particularly smart strategy. How do we reach participants where they are, and think about conduits who might identify possible participants?
#100WikiCodeDays: The project #100WikiDays is successful because it creates a habit for participants, gives them ample feedback, provides them with community support, and gives them a goal. Are there similar efforts that we could think about re: code contributions? Continually communicate the value: The best open source projects continually communicate to participants and the larger world about what's happening. Someone files their first bug report? Great, maybe they get an email saying "Here's the next step you can take." Someone creates a tool for Tool Lab? It's amazing? Send them to the blog for a profile. Let's elevate their work and use it to bring others in.
Michael Holloway: Free Software is Fundamental to Our Mission
MediaWiki is a prominent free software project, and the Wikimedia projects have always run on free and open-source technology, but our relationship to free and open-source software needs clarification. We should formalize that we are committed to making, using, and leading in the development of free software, even when doing so is more difficult or less efficient in delivering user value than adopting closed solutions, as a central part of our educational mission.
How does free software relate to the free knowledge movement? In this movement we are building a body of open knowledge, curated collectively and accessible to all. We develop the software that powers these projects in the open, and we run our backing infrastructure on free and open-source technology. We choose to do these things not because they are easy, but because they are hard. Existing free software is not always, or even most of the time, practically superior. We work in the open so that others can contribute to and learn from our processes; our work product is educational content in its own right, and in that way directly contributes to our mission. By ensuring that our tools and processes are open, and working through problems with free software projects rather than rejecting them in favor of closed solutions, we empower others everywhere to join us in doing this hard work, or to launch like-minded projects of their own. It's often tempting to conclude that our users could be better served by adopting closed or proprietary software solutions to our engineering problems, rather than adapting free software to meet our needs or writing our own. This may be true in the short term, but over the long term this contributes to the cloistering of software engineering expertise in closed commercial enterprises. Our goal is to expand and not restrict the knowledge of software engineering principles and practices, and we are playing a long game. What could a formal commitment to free software mean in practical terms? This is intended as an open-ended question for discussion, but here are a few ideas:
- We should take a leadership role in the development of free software languages and technologies on which we depend (e.g., PHP).
- Where we develop software for closed platforms (such as the mobile apps), we should promote free alternatives for their distribution channels (e.g., F-Droid) and ensure they can be run without depending on proprietary software.
- We should encourage and recognize contributions by our engineers in the broader free software community.
Mingli Yuan / User:Mountain: Embracing a new era with only small language obstacles
Recent progress on neural machine translation gives us better translation results. The industry invests huge amounts of money in this area for a promising future. For the first time people can communicate with only small language obstacles. We should be prepared for this near future by evaluating our position and understanding the impact. Also we should seek new opportunities, and contribute to the trend.
- Cooperate with the industry to enhance our translation infrastructure
- Continuously release our translation data as an open corpus
- Evaluate the impact. For example, probably very radical, how about setting up one unified Wikipedia in the future?
All of the Wikimedia projects have, in technical terms, MediaWiki (the software) at their core. Thanks to this fact, MediaWiki has become a widely deployed system, drawing many volunteer developers. Alas, there is a disparity in scale between the WMF-run install and other, external set-ups, which hinders the speed with which the platform supporting the Wikimedia projects can evolve. On the other hand, architecting microservices has proven to be a good way of achieving scalability, increasing developer productivity, improving maintenance and reducing technical debt. Gradually moving towards 'de-monolithising' our core infrastructure will enable developers (both WMF staff and volunteers) to start working on all sorts of interesting features, ranging from simple add-ons to full-blown companion sub-systems. While this transition is (arguably) already happening, everything still gravitates around MediaWiki, the software. Instead of focusing our efforts on compatibility in scale (e.g. one JobQueue system for WMF, another for external installs), we should focus on the products and features that allow the projects to grow, both in terms of the number of projects and the features they offer, and in the number (and diversity) of their users. Microservices can greatly help in achieving this goal, since all installs can select the components they want to run based on the resources at their disposal and their potential reach or scale. Much like the advent of extensions enabled various parties to complete their systems with sought-after functionality, microservices can refocus our technical community to think about features and components without worrying about scaling them (up and/or down). If we want our developers to assist the Wikimedia projects and their communities, we need to bring our core infrastructure into the 21st century. Let's not leave the technology behind: it is central to the success of the communities we are trying to enable.
Moriel Schottlender: Spreading knowledge with our code libraries
The Wikimedia Foundation is a leader in many fields, but in none is that as obvious, or the field as underserved elsewhere, as in language and accessibility. We are not just the fifth biggest site online, or one of the biggest open source endeavors available; we are the de facto leaders in technology that commercial companies consider “edge case” and “less profitable”. This gives us the advantage of developing tools that don’t just help our own audience, but could, and should, serve as a repository that allows everyone online to reach, support, and embrace these audiences with minimal effort.
We have many of the tools available already, for our own users and products, but they are still limited when it comes to sharing and using them outside the movement. And why? Developing our tools to be accessible to outside projects — and to cloud tools, to bots and to other Open Source organizations — is a doable task that is not just worthy in general, it also follows our mission.
What better way to empower “every single human being [to] freely share in the sum of all knowledge” than to share our own powerful tools with others, allowing everyone to prioritize support for language, accessibility and right-to-left technologies and to push these technologies forward?
I suggest we look across our technologies and libraries — from OOjs UI to CSSJanus, ResourceLoader to wfMessage(), and many others — and work to better generalize these to serve our own users better in their projects, bots, and cloud tools — and to place ourselves firmly and officially as the leaders of this technology that we already are.
Now that my position is known, my direction is unknowable — Heisenberg Uncertainty Principle.
So let’s break reality, and figure out both.
Moritz Schubotz (physikerwelt): Developing software in a wiki way
Over the last 15 years, MediaWiki evolved from a simple PHP script to a complex and highly integrated family of products and services, serving knowledge to billions of humans. Every change might cause an instability or a complete failure of the system. Thus, measures including code review, automated unit testing and code/ product ownership, have been established to guarantee the stability of the software. The drawback of this approach is that improving the software became very challenging for volunteer contributors. This proposal seeks to lower the barriers for volunteer contributors while maintaining the stability of the system.
- Reduce the effort of code review by applying Artificial Intelligence methods. Thus, reviewers can focus on non-formal comments.
- Develop a dialog platform that ensures that volunteer contributors are aware of the next steps and the roadmap for their change on the way to production.
- Establish a team that supports volunteer developers, who want to make a difference that is not listed in the annual plan by providing temporary code or product ownership.
- Improve testing and evaluation to measure the effect of every single change and to identify code or even whole services that are no longer necessary and can be switched off.
Niharika Kohli: Investing in our communities
This position statement captures my thoughts about why and how we should be investing in our communities. There are a lot of ways we could encourage and support them that we currently don't use. Prioritizing building tools for our communities is a crucial step for the long-term survival of our projects. It's fairly common knowledge that a lot of our communities suffer from toxicity. It's incredibly hard for newcomers to edit, to stick around and to stay engaged in the midst of the existing toxicity in the community. The problem frequently also exists in smaller communities. Just recently, the English Wikipedia community pushed the WMF into implementing ACTRIAL, preventing brand new users from creating articles on the site. These are signs that all is not well with our communities. If we envision an active, thriving editor community 15 years from now, we have to become more aware of how our communities function and do more to support them than we do today. The problems also exist on the technical side. Communities without technical resources lose out on gadgets, templates, editing toolbar gadgets and so on. The editors on these wikis are still forced to do a lot of things the hard way. Non-Wikipedia projects are probably the worst affected. Quite often our software projects cater to the bigger projects, often just the Wikipedias. I am sure we can't solve everything, but I'm sure we can try to help solve at least some of the problems. We can invest in better tools for new users to create articles, to edit and to experiment with wikitext markup. We can build a better onboarding experience for new users. For example, English Wikipedia currently has the "Article Creation Wizard", which is outdated, poorly maintained and often very confusing. We can think about a more standardized solution that would be useful across wikis. We can also try to showcase user contributions in a better way, to build user engagement.
Various wikis have been striving for a while to create and sustain "wikiprojects", with the result that several big Wikipedias have come up with their own homegrown solutions. These are things the Foundation can help build and standardize for all wikis. For the technical problems, there is a big backlog of projects which are long overdue. Global cross-wiki watchlists and global gadgets, templates and Lua modules have been requested by the community for many years now. There are many more such projects to be found on Phabricator and in the wishlist survey. These projects can be building blocks in making our communities more sustainable and thriving places. They are big and important enough that they should make it into the product roadmaps of teams outside of Community Tech. Another important thing we should think about is tools. Some tools, such as pageviews analysis, are among the most important volunteer-maintained tools out there. What happens when one stops being maintained? When is a tool important enough for the Foundation to start thinking about incorporating its functionality into an extension or core? These are all important discussions to be had.
Nikerabbit: Translation as a way to grow and connect our communities
The Wikimedia movement depends a lot on translation, but I believe we are not currently using its full potential. This affects us in many ways, most importantly:
- language barriers isolate communities, but we all need to work together,
- our content is not accessible to every human,
- our movement is massively multilingual, but not the forerunner in using translation and other language technology.
We should improve our translation tools and leverage machine translation in a sustainable way. Translation should be a core part of our infrastructure and integrate into our projects seamlessly. It will help our communities to grow, as demonstrated by the Content Translation tool. I suggest three focus areas:
- Find partners to build high quality open-source machine translation Our projects run on free software. Currently, we depend a lot on proprietary data-driven (statistical) machine translation. For translation to be an essential part of our infrastructure, then this is neither sustainable nor acceptable. We already use expert-driven (rule-based) open-source machine translation software, e.g. Apertium, which provides some high quality language pairs. However, the proprietary services cover a lot more language pairs, albeit with lower quality. Building machine translation engines is hard work, therefore we should find partners to pursue both data-driven and expert-driven engines. The impact of this could be big and extend beyond our movement.
- Bring translation everywhere We already have good translation tools, but we need to move beyond user interface and Wikipedia pages. We should integrate translation tools into our discussion systems to support multilingual discussions as well as to understand discussions in foreign languages. This should be combined with summarizing tools. We have a lot of (structured) content that can be translated but doesn't have a proper tooling for translation, e.g. Wikidata and Commons image description, labels in SVG files. We should adapt and integrate our existing translation tools to support these types of content. We should also make language selection available to all users, including those not logged-in in our multilingual projects, such as Wikidata, to show the translations.
- Improve our translation tools Our translation tools have serious issues that result in slower translations, or in content not being translated at all. Our translation memory is not working well: it often fails to suggest good matches. This is apparent when translating the Weekly Tech News. Translators' time is wasted when they need to re-translate (introducing inconsistencies) or search for previous translations manually. Without improvement, our translation memory is not suitable for use in Content Translation either. When translating documentation pages, announcements, etc. using the Translate extension, a significant amount of extra markup is added to the wikitext. Editors find this markup inconvenient and justifiably resist using this tool. This feature should be improved so that it works with Visual Editor and doesn't require additional markup in the wikitext.
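To make "suggesting good matches" concrete, here is a minimal sketch of fuzzy lookup in a translation memory. It is not how the Translate extension implements its memory: the `suggest` function, the threshold, and the sample entries are all hypothetical, and similarity is measured with Python's standard `difflib` for simplicity.

```python
from difflib import SequenceMatcher


def suggest(memory, source, threshold=0.75):
    """Return (score, stored_source, translation) tuples, best match first.

    `memory` maps previously translated source strings to their translations.
    """
    matches = []
    for stored_source, translation in memory.items():
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score >= threshold:
            matches.append((score, stored_source, translation))
    return sorted(matches, reverse=True)


# Hypothetical entries; the "translations" are dummy placeholders, not real text.
memory = {
    "The new version of MediaWiki will be on test wikis from Tuesday.": "<translation 1>",
    "You can now edit templates with the visual editor.": "<translation 2>",
}

hits = suggest(memory, "The new version of MediaWiki will be on test wikis from Wednesday.")
for score, stored_source, translation in hits:
    print(f"{score:.2f}  {stored_source}  ->  {translation}")
```

A recurring newsletter sentence that differs only in the weekday is found again, so the translator can adapt the earlier translation instead of starting over.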
Users should not be punished for logging in
WMF wikis are slower for logged-in users than for anonymous users, which is unhelpful for trying to get users to contribute. This is a long-standing problem that's hard to solve, but we should have a vision for how we're going to solve it. WMF has caching data centers in strategic locations around the world (Amsterdam, San Francisco and soon Singapore), which make the wikis faster for users who are not near the primary data center (in Virginia) but are near a caching location. However, this only benefits anonymous users. For logged-in users, every page view contains their user name and other user-specific information in the personal tools area, so logged-in page views are considered uncacheable and are always routed to the primary data center. This means that if a new user browses the site for a while, then creates an account because they want to contribute (or makes an anonymous edit), the wiki suddenly becomes slower for them. All users are affected, because uncached requests are slower to serve than cached ones, but users outside North/South America are affected the most, because their traffic now has to cross an ocean that it didn't have to cross before. It's not nice that a new user's "reward" for creating an account is a slower experience, but it's especially not nice that users in emerging communities are affected the most. If we want to encourage readers to become contributors, slowing the site down as soon as someone contributes is not very helpful. Some requests will always have to go to the primary data center, such as POST requests saving an edit, and those are always going to be slower for users outside North America. But for logged-in page views this isn't fundamentally necessary, and serving them from the edge caches would speed up the site for logged-in users and reduce the load on the app servers. There are different ways that this could be done, each with their own obstacles.
For example, a single-page application for MediaWiki could use a content service to retrieve only the new page's contents when navigating, but this would require modifying or rewriting a lot of code in MediaWiki; ESI could be used to have Varnish inject cached page contents into a user-specific chrome, but that would require using advanced and partly unproven Varnish features. In both cases, we'd have to reimplement certain rendering preferences using CSS or a post-processing step. It's far from trivial, but let's start talking seriously about how we can address this problem.
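As a rough illustration of the idea common to both approaches, the toy model below caches the user-independent page content at the edge and wraps it in cheap, user-specific chrome per request. It does not use Varnish or ESI; the `Origin`, `Edge`, and `personal_tools` names are hypothetical and exist only for this sketch.

```python
import html


class Origin:
    """Stands in for the primary data center; counts expensive renders."""

    def __init__(self):
        self.renders = 0

    def render_content(self, title):
        self.renders += 1
        return f"<main>Content of {html.escape(title)}</main>"


def personal_tools(username):
    """Cheap, user-specific chrome that never needs the origin in this model."""
    if username is None:
        return "<nav>Log in</nav>"
    return f"<nav>{html.escape(username)} | Log out</nav>"


class Edge:
    """Stands in for a caching data center near the user."""

    def __init__(self, origin):
        self.origin = origin
        self.fragments = {}  # title -> cached, user-independent content

    def page(self, title, username=None):
        # Only the first request for a title reaches the origin.
        if title not in self.fragments:
            self.fragments[title] = self.origin.render_content(title)
        # The user-specific part is composed at the edge on every request.
        return personal_tools(username) + self.fragments[title]


origin = Origin()
edge = Edge(origin)
anon = edge.page("Amsterdam")
alice = edge.page("Amsterdam", "Alice")
bob = edge.page("Amsterdam", "Bob")
print(origin.renders)  # one origin render served three different users
```

The sketch ignores everything that makes the real problem hard, such as per-user rendering preferences, cache invalidation on edit, and session validation at the edge.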
Sam Reed: Security is important
Although Wikimedia/MediaWiki has a generally good track record, we should always be striving for better, ideally in an automated fashion: by testing the code, and by providing a decent, easy-to-use framework inside MediaWiki that lets people do this without excess work or effort. In some cases, we can do this through documentation; we have some options that allow complex things to be done, but it's not very clear to developers that what they are doing may lead to security issues down the road. In the same way that we run phpcs, and are moving towards running phan, it would be very nice to improve our automated testing with a security focus, pointing people to potential future pain points or to general bad practices now. As Tim Starling said a few years ago, people shouldn't be doing CR for code style etc. It doesn't make sense; it's a waste of time. This should be automated away as much as possible, which is happening and is slowly improving the MediaWiki codebase. How can we use this basic idea to improve the MediaWiki software and its extensions for security best practices?
Sam Wilson: Encourage use of MediaWiki outside WMF projects
tldr: Encourage use of MediaWiki outside WMF projects, because doing so furthers the mission (and improves the software).
MediaWiki is primarily a tool that helps the Wikimedia movement. The movement is not primarily about MediaWiki, but MediaWiki is the central tool with which we are currently fulfilling our mission. Lots of other people use MediaWiki too, but their needs are not the focus of WMF development. I think we should do more to encourage 3rd party use of MediaWiki, and in doing so broaden the developer community and end up with higher-quality software that is easier to use for more people. Our mission is about empowerment and education, and people running their own wikis should be seen as part of that. Just because content isn't hosted on WMF servers doesn't mean it's not part of the movement. I imagine a future in which MediaWiki is as common for collaborative websites as WordPress is for blogs, and people don't have to rely on Facebook, YouTube, GitHub, etc. to host their content. A couple of parallels (sort of):
- OpenStreetMap hosts the central database of their map, but they actively discourage people doing anything other than editing on the OSM infrastructure. Instead, there is a large ecosystem of tools and systems for serving that data. This is mainly because it would be impossible for OSM to provide the bandwidth and required formats etc. for all the possible uses — similarly, the set of Wikimedia sister projects are never going to provide every wiki that people want.
- Automattic runs wordpress.com and also manages the open-source development of WordPress; the latter is seemingly in conflict with the former, but because of the long history (and perhaps the relatively late start of wordpress.com) the software has remained a favorite for self-hosted websites. (I'm ignoring the obvious security arguments for now.) One reason that MediaWiki is not better for 3rd party users could be that there are not all that many non-WMF people making a living out of developing for it, at least not in comparison with other web frameworks (Drupal, WordPress, etc.). WordPress had to be easy to install on cheap web hosts because that's all there was; MediaWiki seems these days to have that characteristic only as a historical hangover, and it could well end up moving away from being aimed at amateur sysadmins altogether. Perhaps there's a fundamental clash between the scale requirements of the WMF sites and the ease-of-administration requirements of 3rd party wikis, but if there's a choice to be made, it should be explicit and well-communicated. At the moment, it feels like it's happening by attrition.
Santhosh Thottingal: MediaWiki is one of the rare software systems where i18n is done right
This infrastructure needs timely improvement and maintenance. The technology, and the resources for supporting it, are important, as the 2017 movement strategy states: “We will build the technical infrastructures that enable us to collect free knowledge in all forms and languages.” But most of this infrastructure runs on volunteer capacity, with no official team responsible.
- Open-source strategy - MediaWiki language technology is isolated and little known in the general open-source ecosystem. It needs proper ownership, maintenance, and feature enhancements, like any good open-source project, so that contributions come from other multilingual projects while we help with our expertise.
- (a) Our localization file formats and the libraries on top of them are very advanced and support more languages than any other system. But it was only when the libraries were made independent of MediaWiki that other projects started noticing them. We use that independent library (jquery.i18n) for VE, ULS, and OOJS-UI, and it is present in MediaWiki core. But it is not actively maintained: issues and pull requests are not addressed because nobody at the Foundation is now in charge of it, apart from volunteer time. There is a lot of demand for a non-jQuery, general-purpose JS version of it. The code is aged, and tech debt is increasing.
- (b) We developed one of the largest repositories of input methods (100+ languages) to support typing in various languages, along with an input method library, jquery.ime. This is a critical piece of software for many small wikis. The code is aged and is not actively maintained by anybody at the Foundation, except by some in their volunteer time. It has not been updated to take advantage of browser technology updates for IMEs. This is a MediaWiki-independent open source library.
- (c) Universal Language Selector - a language selection and switching mechanism for our large list of languages, which also delivers input methods and fonts - needs ownership and tech debt removal. Navigating between different wikis is done using this, and the team that authored this system no longer exists. This is also a MediaWiki-independent open source library. VE, Translate, ContentTranslation, and Wikidata depend on this library.
- (d) MediaWiki core i18n features (PHP) are also starting to show their age. There were plans to turn some of them into standalone open-source libraries, but this has not happened. Nobody is officially responsible for this infrastructure either.
- The Translate extension - which helps make the MediaWiki interface available in 300 languages, something we are always proud of - is not officially maintained by the Foundation now. The localization happens because of volunteers, including the volunteers maintaining the Translate extension code. Moreover, translatewiki.net, which hosts Translate and is where volunteers do the localization, is also outside Foundation infrastructure.
- It is time to have machine translation infrastructure within Wikimedia. Content Translation uses machine translation, but that is an isolated product. Translation of content can be used in various contexts for readers. CX tries to provide a service API for MT, and there is a lot of potential in that. Multiple MT services, even proprietary ones, might be needed to cover all the languages. At the same time, our content and translations are important for training new open-source MT engines.
- Wikipedia follows a very traditional approach to typography and layout. The Language team had limited webfont delivery aimed at the missing-font issue, but the code is old and has not received any updates in the last 3 years or so. The Language team planned to abandon that feature due to the maintenance burden, but that has not happened, and no team now owns it. Other than this, a few wikis use common.css hacks to customize default fonts. The typography refresh attempt from the Reading team was for Latin. Every script has its own characteristics regarding font size, preferred font family sequences, line heights, etc. Presenting knowledge in all these language wikis, in 2017 or later, needs serious thought about the readability, typography, and general aesthetics of Wikipedia in a language compared to other websites in that language.
Our platforms should refocus on collaborating, drafting, and experimenting. Currently much focus is on polished presentation and restriction, hindering experiments and limiting participation.
- Editing tools focus on fast smooth drafting: multiple simultaneous editors, suggested changes. A terse, readable history highlights major revisions of an article. Discussion is integrated into the draft interface, and toggled on / off.
- Articles can be forked & merged, supporting all sorts of experimentation. Different groups can work on parallel forks, merging later if they like. Newcomers not following a policy can be channeled to an individual branch while sorting it out, avoiding edit wars. Sandboxing helps avoid "deletion": questionable or disputed contributions can be sandboxed to a hidden or low-visibility personal page. [This is also conducive to distributing an online/offline federation of editors, e.g. over IPFS]
- Editing, creation, and uploading are encouraged prominently in every page interface. Matchmaking tools help creators find others with similar interests, learn and collaborate. Tools for similarity checking, merging, metadata / license review, and meta-moderation, help anyone contribute and learn new ways to do so. Deleted material [unless oversighted] is reviewable by all who know where to look.
- The reading experience focuses on contextual connections and human connections. Real-time conversation is available as an overlay while reading. Data-rich interfaces help readers browse multiple versions of an article, and get a sense of persistence, reliability, and interest. For instance, heatmaps for revised / controversial / commented areas; wikiblame for granular provenance; different colors for different sorts of cites; visual cues about how much complementary or conflicting knowledge is available in other articles, files, languages or Projects.
Cultural aspects (& related tools)
- Namespaces include every potentially useful topic: completeness, notability, and copyright uncertainty affect how things are presented, not whether they exist. Similarly, media repositories include all useful material that is legal to host.
- File uploads are welcome as contributions to the global commons even when they need work. Files are transcoded to free formats where possible. File formats with no free-codec options, or that cannot be thoroughly checked for malware, are stored in their own flexible repository [such as the Internet Archive]: using the same Wikimedia upload interface and metadata, and providing similar wikilinks to reference files from within the Projects.
- The newcomer experience is simple, flexible, and protected. Contributions from people who "don't know how to do it right" are welcome, and kept separate from the flow of updates from regulars, with their own visibility defaults. Matchmaking tools help newcomers find active work in their area. Blocks, deletions, and warnings happen only for spam / vandalism. Other concerns at worst hide their work from public view, with a friendly review with a peer after the first weeks. A broad group of peers can protect newcomers, for instance by redirecting concerns and complaints about a newcomer to themselves.
It is time to move away from a "single latest revision viewable by all" model, and the conservative policies designed around it. We need a more flexible model embracing multiple working copies, long-lived drafts, and a greater freedom to experiment, collaborate, and create.
Subbu Sastry: Transparent typing layer over wikitext
Problem: To satisfy the 'Knowledge as a service' theme, in addition to providing access to full page content, Wikimedia APIs should provide access to semantic units at: - an abstract document level (sections, headings, tables, etc.) - a domain specific level (infoboxes, geolocation, taxoboxes, etc.) Wikitext, the core content creation technology on wikis, evolved as a string-processing language where one set of strings is replaced with another set of strings mostly via regular expression matches to yield the output HTML string. There is no notion of document structures here. This lack of structural semantics gets in the way of being able to robustly identify semantic units and developing tools and features that operate on a page structurally at sub-page granularities.
Solution: Transparent typing layer over wikitext
Types improve abstraction, reasoning, and tooling abilities in programming languages. A transparent typing layer on top of wikitext can provide similar benefits.
A: Enforce structural types on output of wikitext constructs including templates and extensions
- Specify that all wikitext constructs have an output with type: String, DOM, CSS property, HTML-attribute, or a List of one of those
- Extensions and templates can specify the expected output type. All other core wikitext constructs have the DOM output type.
- Parser enforces the output type of all wikitext constructs. Examples: For DOM types, unclosed tags and misnested tags are fixed up. For String types, HTML tags are escaped, wikitext strings are nowikied. For CSS types, values are sanitized. Among other benefits, this basic typing mechanism enables MediaWiki to provide an API to extract and edit document fragments without introducing adverse side-effects on the rest of the page.
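The enforcement step for two of the listed types can be sketched as follows. This is an illustration only, not Parsoid or MediaWiki code: the `enforce` function and the `SAFE_CSS_VALUE` pattern are hypothetical, the CSS check is far stricter than a real sanitizer would be, and the DOM fix-up and nowiki handling described above are left out.

```python
import html
import re

# Deliberately strict: accepts plain values such as "red", "#ff0000" or "1.5em".
SAFE_CSS_VALUE = re.compile(r"[A-Za-z0-9#%., -]+")


def enforce(output_type, raw):
    """Coerce the raw output of a template or extension to its declared type."""
    if output_type == "String":
        # HTML tags in a String-typed output are escaped, not interpreted.
        return html.escape(raw)
    if output_type == "CSS":
        # Anything that does not look like a simple value is dropped entirely.
        return raw if SAFE_CSS_VALUE.fullmatch(raw) else ""
    raise ValueError(f"unsupported output type: {output_type}")


print(enforce("String", "<b>bold</b>"))
print(enforce("CSS", "#ff0000"))
print(repr(enforce("CSS", "red; background: url(x)")))
```

Because each construct's output is normalized on its own, a malformed template result cannot leak markup into the rest of the page, which is what makes fragment-level extraction and editing safe.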
B: Unified typing mechanism to expose domain-specific semantic information
Editors impose structure in documents through a rich library of templates, policies, and maintenance processes they have developed over the years. If this semantic information (infoboxes, navboxes, sports rankings, railway timetables, etc.) is mapped to a centralized ontology system (Wikidata, schema.org, something else), the parser can expose this information in HTML, and MediaWiki APIs can expose it in a wiki-neutral way. There are multiple disparate mechanisms today through which template authors specify metadata about templates (templatedata, templatestyles, possibly others?). Instead of creating newer mechanisms for specifying structural output types and semantic information types for templates, it is better to provide a consolidated mechanism that unifies all this template metadata into a single user-defined type declaration. This lets newer applications and capabilities be developed in the future without code changes to the core mechanism.
This typing layer only affects template authors. Editors that use source editing won't see any impact (besides fewer markup errors). Editors that use visual editing might see improved tooling. Even for template authors, this is meant to be an opt-in mechanism with gradual migration over to the new model. The proposal here is a logical extension of what Parsoid does today. Parsoid provides an illusion of structured wikitext and demonstrates what is possible (VE, CX, Linter, Flow among others) by embracing structured semantics.
- Commentable version: Sister project incubator
- See also alternative proposals: Make MediaWiki commercially viable and Turn MediaWiki core into an embeddable library
We need infrastructure for cheap creation of new wikis and experimentation with new project types and their different technical and social needs. Wikipedia was an unlikely bet that paid off extremely well. It does not cover the entirety of human knowledge, however - it's limited to long-form, text-based representations of encyclopedic topics on subjects well covered by reliable sources. There have been recurring discussions in the past about covering new types of knowledge (genealogy, fact checking, 3D models, oral history, collaborative video, etc.) but they never went anywhere, because creating new projects is a lot of work and the outcome is unpredictable (some things only work in practice), so the WMF was unwilling to take the risk. If we want to make meaningful progress on our vision of curating the sum of *all* knowledge, and truly become the infrastructure of free knowledge, we need to overcome this barrier. We need infrastructure for low-effort, low-risk experimentation with new projects - a "wiki nursery". A system where new wikis can be created at minimal cost, technical and social experiments can be performed on them without undue risk to other projects, and they can be discarded if they prove unsuccessful. Other organizations have used such mechanisms with great success (Wikia has tens of thousands of wikis; Stack Exchange has nearly two hundred sites). Such a system has to be somewhat separate from the existing wiki cluster; a closely integrated approach like incubator.wikimedia.org is both too inflexible and too insecure. It needs to have some level of operational and legal maturity (more than what Cloud VPS offers). For risk segregation and efficiency, it would have to be different from our production cluster in a number of ways:
- single sign-on which does not rely on the local wiki for authentication
- no shared database and configuration access
- ensure even distribution of server resources without constant human supervision
- flat domain structure (to avoid cookie leaks) with affordable certificate management
- new management interfaces and corresponding core support to avoid spending developer time on common tasks (wiki creation and destruction, configuration and extension management)
While there is a significant one-time cost to setting up such a system, keeping it running is cheap, and the benefits are well worth it: information on what content types and what policies enable productive collaboration; a space free of the conservatism and change aversion of large established projects, where new concepts can prove themselves; an entry point for projects with a less western-centric interpretation of knowledge; and lots of small projects that are welcoming to new editors and provide lots of low-hanging fruit for contribution. (Also, the same feature set is needed for a paid wiki hosting service, which is one way to create non-WMF income streams into the MediaWiki software ecosystem - a valid strategic goal on its own.)
A key question for me is how we can maximise the richness of our feature offering despite having entered a period of slow growth in revenue and staff numbers. Wikimedia serves a very large number of users, with a diverse set of needs -- nobody can say that the site as it stands is sufficient to satisfy all of them. There are two main threats to our goal of providing a rich feature set. One is maintenance burden. We are faced with the prospect of sunsetting features because we find the maintenance burden to be too great. But there is no incontrovertible rule in software engineering which says that code, once written, must constantly be rewritten. Maintenance burden most commonly arises from changes in the platform on which the code is implemented. Minimising maintenance burden for a given feature set thus necessitates choosing a stable platform. We need to consider the programming languages we use, and the libraries we require, through this lens. The second threat is needless complexity. Concepts which are hard to understand, and which thus restrict related development to highly skilled developers, are appropriate only if hidden behind a module boundary. In order to enable contributions from developers less skilled than ourselves, and to minimise the time required for learning and familiarisation, the bulk of our code should be simple. Complexity is alluring because it provides developers the opportunity to take pride in their work. But for the benefit of the organisation as whole, its efficiency, and thus the richness of its product offering, we should introduce complexity only with due caution. Code which is complex but stable can be valuable, presenting no great risks. For example, the diff algorithm we currently use in wikidiff2 has its origin in Perl code written in approximately 1998. Only in the last year have we considered adding substantial features to it. We have a PHP port and a C++ port, and neither requires significant maintenance. 
This is because the requirements are stable and the two respective platforms (C++ and PHP) are stable. Contrast this to OCG, which is at risk of sunsetting only three years after its original deployment. The reason is that its input and output formats are constantly changing, that is, it has changing requirements; and it was written on a modern and rapidly changing platform. Its main developer wrote "the architecture which was state-of-the-art in 2014 is already looking a little dated in 2016". My goals for the developer summit are to encourage people to think carefully about writing code on top of a conceptually complex, rapidly changing platform. I want WMF and the MediaWiki community to write code which is stable and long-lasting, and can thus support a richly featured website into the future.
Timo Tijhof: Embrace open-source and keep our software to the same standards we hold other open-source software
This would prevent our software from becoming isolated, hard to maintain, or hard to contribute to.
Scale the contributor experience.
Ensure our content remains of high quality and value to readers; ultimately to avoid failing our mission. I envision this requires a radical shift in how our application is served, by involving a non-static service capable of scaling to the traffic of our CDN and yet vary responses by user.
We must understand the dangers of producing software that isn't reusable. Such software may be hard to maintain, hard to contribute to, both for future contributions, and our future selves. "Current needs" only exist to serve our long-term needs. Losing track of long-term needs can make software too specific to a current need, risking a trend of releasing software that is only open-source as a courtesy, for transparency, and without being re-usable. Reusable software has a defined purpose and serves it well. It tends be easy to install, well-documented, and easy to contribute to. Re-use between internal services, and external use. Such as for community tools, cloud services, or other third parties. Having a defined goal also encourages designing APIs in a way that we can agree not to break or change too often, because they are public.
Our current infrastructure is highly optimised for the passive reader that doesn't contribute. We serve a static CDN response to most users. For users having logged-in, or made contributions, we bypass these layers for all page loads. As a result, their document load time increases by 5x-10x (eg. NavTiming metric responseStart). In 2015, Ori mentioned the danger of this in (<https://blog.wikimedia.org/2014/12/29/how-we-made-editing-wikipedia-twice-as-fast/>), saying optimising our backend will "allow us to dissolve the invisible distinction between passive and active users". And "enable microcontribution [features] that draw [in] passive readers". Banners (CentralNotice) are a good example of our needs being at odds with our infrastructure. We want banners to show as part of the page, and for banners to vary by user, location, plus random variance. Our current infrastructure can't do so without bypassing the CDN on all requests. As such, the current way is entirely client-side and completes well after page load. In few cases where we do ask readers for data, it is for statistical purposes or to improve the software. Direct (or indirect) contributions to our content remains limited to complex actions like "edit". Moving our contributor's experience to match some of the capabilities and performance of the reader experience, would enable us to start accepting micro contributions that actually produce a change in content (either directly, or e.g. by consensus). It also opens the door to making our web platform work offline (e.g. ServiceWorkers) which further enables high-performant interactions that can be uploaded later.
The Wikimedia movement strategy focuses on serving more kinds of knowledge and sharing them with allies and partners. I believe that the most important groundwork for reaching these goals is to focus on the ongoing project of moving MediaWiki from a "wikitext plus media file" collaboration system to a platform that lets people collaborate on many kinds of content and organise them in a cohesive way. Two axes seem important to me to pursue this goal:
- Build platforms and tools allowing contributors to create and clean metadata about these contents in order to build together the broadest cohesive set of knowledge ever available and increase its reusability.
Going in these directions would allow us to:
- Allow sister projects (and possible new ones) to use relevant content structure for their projects instead of a designed-for-everything wiki markup. It should lead to an increase of their reusability and their user-friendliness, just like what the Structured Metadata on Commons project is aiming for.
- Build powerful APIs to retrieve and edit content just like Wikidata has and so, make working with partners easier.
- Increase the connections between our contents and their discovery using their metadata
- Build better task-focused mobile viewing and editing interfaces
- Be more ready for possible new environment changes like voice-powered interfaces
Some examples of projects we could work on in order to move in this direction:
- Use the multiple content revision facility to migrate progressively the data that could be structured out of Wikitext on all our projects (like the structured metadata on Commons project is aiming for files)
- Federate all our structured content into a "Wikimedia Query Service" that would allow to do unified and powerful analytics and to ease the discoverability of all our contents
- The logical granularity of Wikisource, Wikibooks and Wikiversity contents (and maybe other projects?) is not the wiki page but the set of wiki pages storing a book or a course. MediaWiki should be able to support such use cases by providing a "collection" system allowing us to add metadata and to do operations (renaming, add to watchlist) on sets of wiki pages.
- Switch projects that store fairly structured data in wikitext templates (like Wiktionary or Wikispecies) from wikitext storage to structured storage. Build on top of them user interfaces to edit their local contents (and maybe also the relevant data from Wikidata), and provide nice displays and APIs so that both humans and machines can retrieve these contents.
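The "collection" idea in the list above (metadata plus operations on a set of wiki pages) could look roughly like the sketch below. Nothing here is an existing MediaWiki API: the `Collection` class, its fields, and the subpage naming convention are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A hypothetical set of wiki pages forming one book or course."""

    title: str
    pages: list
    metadata: dict = field(default_factory=dict)

    def rename(self, new_title):
        """Rename the root page and all its subpages; return old -> new titles."""
        prefix = self.title + "/"
        moves = {}
        for i, page in enumerate(self.pages):
            if page == self.title:
                new_page = new_title
            elif page.startswith(prefix):
                new_page = new_title + "/" + page[len(prefix):]
            else:
                continue  # page is not named after the collection; leave it alone
            moves[page] = new_page
            self.pages[i] = new_page
        self.title = new_title
        return moves

    def watch(self, watchlist):
        """Add every page of the collection to a user's watchlist (a set)."""
        watchlist.update(self.pages)


book = Collection(
    title="Alice in Wonderland",
    pages=["Alice in Wonderland", "Alice in Wonderland/Chapter 1"],
    metadata={"author": "Lewis Carroll"},
)
moves = book.rename("Alice's Adventures in Wonderland")
watchlist = set()
book.watch(watchlist)
print(moves)
```

The point of the sketch is that renaming or watching a book becomes one operation on the collection rather than one manual action per page.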
Volker E. (WMF): Developer resources & collaboration on Wikimedia
User Interface Style Guide
Presenting the Wikimedia User Interface Style Guide, targeted at developers' needs. The style guide, including all its resources, is hosted on GitHub as a test case for design collaboration. During ideation of the new style guide, it was emphasized that, to be a successful resource, it has to address both designers' and developers' needs. So far, a big part of the overarching and visual style principles has been accomplished, alongside one development "base" layer: WikimediaUI Base variables, already used in OOUI and Marvin as a building block. Within the next couple of weeks, there will also be "Components" and "Resources" sections where developers are presented with design principles "in action", combined with demos hot-linking.
The presentation would be on principles behind and application of the components:
- Ideation and design: This is for everyone, internationalization/language
- Open to collaboration
- Design consistency
- Trustworthy yet joyful
- Usability and UX best practices and underlying user research
- Responsive design and mobile first principles, and
- Accessibility measurements
There are clearly topics overlapping with the questions posed in the thematic overview, such as maintaining and growing the technical community, the role of open source, scaling, and tools for embracing mobility. At the end of the presentation, an open questions/feedback slot is planned to gather ideas on how to extend and improve the style guide even more.
Previously accepted, but no longer attending
My purpose in attending the Dev Summit is to enjoy the benefit of collaborating in person with others who are passionate about technology that brings information to the world in a variety of languages. When I imagine a world where everyone really can share in all knowledge, I don't imagine all of them doing so in their native language. The most important foundation for language technologies that will reach as many people as possible is informed realism—with insights from both linguistics and computer science.
- The most common estimate of the number of languages is 6,000. An unfortunate number are critically endangered, with only dozens of speakers; 50-90% of them will have no speakers by the end of the century. Providing knowledge to everyone in their own language is unrealistic. We should always seek to support any community working to document, revive, or strengthen a language, but expecting to create and curate extensive knowledge repositories in a language with barely half a dozen octogenarian speakers whose grandchildren have no interest in the language is more fantasy than goal.
- Statistical machine translation has eclipsed rule-based machine translation for unpaid, casual internet use and building it doesn't require linguists or even speakers. But it does require data, in the form of large parallel corpora, which simply aren't available for most languages. Even providing knowledge in translation is impractical for most of the world's languages.
- English speakers are notoriously monolingual, but in many places multilingualism is the norm, with people speaking a home language and a major world language. A useful planning tool would be an assessment of the most commonly spoken languages among people whose preferred language does not have an extensive Wikipedia. Whether building on the model of Simple English or increasing the readability of the larger Wikipedias, we can bring more knowledge to more people through Hindi/Urdu, Indonesian, Mandarin, French, Arabic, Russian, Spanish, and Swahili (all of which boast on the order of 100 million non-native speakers or more) than by trying to create a thousand Wikipedias for less commonly spoken languages.
- English is particularly suited to simple computational processing, a fact often lost on English speakers; it uses few characters, has few inflections, and words are conveniently separated. Navigating copious amounts of knowledge requires search. The simplest form of search just barely works for English, but often fails in Spanish (with dozens of verb forms), Finnish (with thousands of noun forms), Chinese (without spaces), and most other languages. Fortunately, for major world languages we have software that can overcome this by regularizing words for indexing and search.
Again, none of this is to say that we should ever stop or even slow our efforts where there is a passionate language community, or even one passionate individual, working to build knowledge repositories or language-enabling software. But we must be realistic about what it takes to reach the majority of people in a language they understand.
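The search point above, that exact-match lookup fails on inflected languages until words are regularized, can be illustrated with a minimal sketch. The suffix list and the `toy_stem` function are hand-written assumptions for this example only; they are not a real Spanish stemmer and not the software used in production search.

```python
# Toy illustration only: a tiny hand-picked suffix list, not a real stemmer.
SPANISH_VERB_SUFFIXES = ("amos", "aron", "aba", "an", "as", "ar", "o", "a")


def toy_stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in sorted(SPANISH_VERB_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact_search(query: str, docs: list[str]) -> list[str]:
    """Return documents containing the query as an exact word."""
    return [d for d in docs if query.lower() in d.lower().split()]


def stemmed_search(query: str, docs: list[str]) -> list[str]:
    """Return documents containing any word with the same toy stem."""
    stem = toy_stem(query)
    return [d for d in docs if stem in {toy_stem(w) for w in d.split()}]


DOCS = ["ellos hablan mucho", "nosotros hablamos poco", "ella canta bien"]

exact_hits = exact_search("hablar", DOCS)      # the infinitive never appears
stemmed_hits = stemmed_search("hablar", DOCS)  # inflected forms share a stem
```

Searching for the infinitive "hablar" finds nothing by exact match, while the regularized search finds both documents that use an inflected form of the verb.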
The strategic direction that has emerged has two components: "Knowledge as a service" and "Knowledge equity". "Knowledge as a service", which focuses on infrastructure, seems like the one most related to technology. This proposal is about exploring the less obvious intricacies between the two components, and in particular the technology implications of Knowledge Equity. Wikimedia is a complex socio-technical system, so it's not really possible to separate people from technology when talking about it. A direction of Knowledge Equity invites the contributors of the Wikimedia movement to take a critical look at themselves and assess their biases and privileges. This, in turn, can help identify structural biases that have been reproduced and ingrained in our technical platform. For example, MediaWiki is currently doing a great job at providing a localized interface in many languages. However, beyond language, interaction design and UX patterns seem very specific to Western culture. Similarly, when our strategic direction talks about building strong and diverse communities, this invites us to consider whether the current tools available to contributors enable them to provide an environment where newcomers can experiment, be mentored, and fail safely. Beyond software, little effort has been invested in exploring alternative interfaces beyond the connected browser. Our primary interface for contribution (the web site) may work well for middle-class contributors from Europe and North America, but isn't necessarily what enables people from other backgrounds or geographies to contribute. These are some of the topics I would like to bring up for discussion at the Developer Summit.
Keerthana S: Breaking the ice and catering to the could-be student Wikipedia contributors
Most of the valuable contributors, especially to technically advanced articles, come from academia. So my paper is going to discuss why it can be valuable to introduce university students to contributing to Wikipedia and give them enough guidance to stick around, the existing infrastructure that helps this cause, and some points on how it can improve. As the infrastructure of MediaWiki evolves and becomes a platform where beginners to open source find it easy to contribute to the project, with a really well documented code base, a friendly community and the many outreach programs, we should also think about introducing university students to contributing to Wikipedia. Wikipedia serves as an invaluable tool for students worldwide, helping them to assimilate their course content. They write term papers as part of their course work, so it only makes sense that making students aware of contributing to Wikipedia and giving them guidance can be a source of reliable and high-quality contributions to Wikipedia.
Existing Infrastructure
WikiEduDashboard, a project of the Wiki Education Foundation, caters to universities where students are required to contribute to Wikipedia articles as part of their course assessment, and provides tools for instructors to guide the students in doing so.
Machine Learning tools for Guidance
There are existing automatic mechanisms in Wikipedia to detect plagiarism, promotional content or any form of spam in edits. Automatically rating Wikipedia articles is, to an extent, achieved by the Scoring Platform Team. This is being used by many bots in Wikipedia to spot potential vandalism. This score prediction tool can also be used to give immediate feedback to newbie editors in a more friendly manner and point out the faux pas in their edits.
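As a rough sketch of how a predicted score could be turned into friendly feedback for a new editor: the function below takes two probabilities as plain numbers. The input names `damaging` and `goodfaith`, the thresholds, and the message wording are all assumptions made for this illustration; nothing here calls or describes the Scoring Platform Team's actual API.

```python
# Assumed cut-offs for illustration; real values would need tuning.
DAMAGING_THRESHOLD = 0.3
GOODFAITH_THRESHOLD = 0.5


def friendly_feedback(damaging: float, goodfaith: float) -> str:
    """Turn two probabilities (0.0 to 1.0) into a message for a new editor."""
    for name, value in (("damaging", damaging), ("goodfaith", goodfaith)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if damaging < DAMAGING_THRESHOLD:
        return "Thanks for your edit! It looks good."
    if goodfaith >= GOODFAITH_THRESHOLD:
        # Likely a well-meant mistake: encourage rather than warn.
        return ("Thanks for contributing! This edit may need another look: "
                "please check your sources and wording.")
    return "This edit was flagged for review by an experienced editor."
```

The design choice is to let the good-faith estimate decide the tone, so that a well-meaning student who makes a mistake receives guidance instead of a vandalism warning.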
Other Position Statements
If you were not accepted to the Wikimedia Developer Summit but would like your position statement published, please add your name to the table below and we will add your original position statement.