Wikimedia Technical Conference/2018/Session notes/Product Vision WMF/WMDE

https://phabricator.wikimedia.org/T206063

Some slides have speaker notes embedded, so this doc's notes are just a light transcription of any tangents, plus any Q&A.

Toby:
 * Welcome and thanks to organizers
 * We'll be digging into Themes after lunch. Please talk to the people in charge of each theme.
 * [Slides and presentation]
 * 1: Thank you Deb. I’m really happy to be here. It’s great to be part of the newly imagined Tech Conf and it’s positive and constructive that the Audiences organization is so intimately involved and we’re looking forward to a super productive week! It’s not every day you get to plan the future of the best site on the Internet and we should continue to show the world the benefits of open collaboration and working together!
 * 2: I’m going to do three things in the next 30 minutes. The product team has been working hard on 3 - 5 year planning and beyond and I’m going to talk about the process and the framework we’ve used. I’m going to talk about the themes that we’ve identified as we’ve worked through this process. And finally, the slides you’ve been waiting for, I’ll talk about the emergent platform requirements that are making themselves known throughout the process.
 * A quick word about the presentation -- it’s pretty dense. The process stuff is interesting but not essential to master. We really want to give you an idea of how we’re articulating our needs -- we’re not just throwing darts at a board. You’ll get a chance to dig in to the themes right after the presentation at the posters in the courtyard. The emergent requirements will of course be a big part of the Conference and we’ll be able to talk more about those during the week and in the planning period beyond.
 * 3: We call this entire process “product modernization” -- you’ll see why throughout the presentation.
 * 4: I want to share the process we are using to create our 3 - 5 year product strategy. You don’t need to know this by heart but it’s important that we talk about this for a couple of reasons. The first is that it’s important for everyone to understand the level of structure and rigor we are putting into our planning. This is really how it’s done -- strategy doesn’t just happen -- it’s the result of a lot of research and hard work. Second is that it helps connect the dots between our activities and how they line up to the big picture. Our board needs this. Our community needs this. The Foundation needs this.
 * It all begins with aspirations -- the change we want to see in the world. The next step is establishing values and principles which is the culture that we share and the philosophy behind the decisions we make. Then comes perspectives -- integrating the current state of the world with the values and aspirations. Once this is done, we start to plan, bringing in more structured and systematic analysis of impact, costs and timelines. After a prioritization process, we can put together a roadmap.
 * The levels of the pyramid also correspond to how quickly things change. The vision and mission of an institution should not change very often if ever, but perspectives, plans and definitely roadmaps need to be able to respond to change in the world and new opportunities and threats.
 * 5: This slide shows where we are in the process and will help everyone understand what’s next. The base of the pyramid is movement strategy -- the results of the process we went through last year to establish a movement wide vision of the future. We’ve established our product principles which Josh Minor shared during an activities meeting a couple of months ago. The last few months we’ve been doing the heavy lifting of establishing our perspectives and we’re presenting them today as our themes. Later in the year, we’ll prioritize and in cooperation with the board, the executive team and the community we’ll establish the roadmap we can use for the next 3 - 5 years and beyond.
 * The rest of the deck will be a summary and discussion of these steps with an application of what they mean to our technical partners. How we’re going to get to a 3-5 year plan...
 * 6: It all begins with Movement strategy -- the output of the community process that Katherine led to decide our direction for the next 15 years. The goal is for the movement to be the essential infrastructure of free knowledge with two components -- knowledge equity -- to bring in those left out of the current structures of power and privilege and knowledge as a service -- to be a platform that serves knowledge to the world across interfaces and communities. All of the knowledge in all of the places.
 * I don’t talk about movement strategy again in this document but it drives our 3 - 5 year plans.
 * 7: I’m a numbers and models person so I have another way I like to think about movement strategy. In a very general, binary-where-it’s-not-actually-binary way, we can think about the world as being divided into four quadrants (you thought you were getting out of here without a 2x2 matrix?)
 * Right now, the sum of our knowledge trends mainly toward male subjects in rich countries.
 * 8: But by 2030, because literally billions of people are coming online in low and middle income countries, our relative knowledge will be much less. Much less.
 * 9: But if we act now, with intention and collaboration, we can be here or at least on the way.
 * For me, this is the root of knowledge equity and service. Our mission will never be complete, but if we don’t act now, we’re actually moving backwards.
 * 10: This is what we’re moving towards. All of the World’s Knowledge is available to Every Person on Earth, in perpetuity, and for Free.
 * 11: Now I’m going to talk about principles
 * 12: The next step up in the pyramid is our principles -- how we integrate the current state of the world with our core values. We have four:
 * Community-centric: Wikipedia is a place not a layer -- it’s about the community. We may be syndicating knowledge across the world but there always has to be a Wikipedia. Now if the community isn’t welcoming or willing to change this is a problem but it’s always going to be central to what we do.
 * Usable for all: Good UX is equity -- if you can’t figure out how something works or if it’s not on a platform you have, you aren’t going to use it. Expectations and competition are much greater than 10 or even 5 years ago.
 * Intentionally transparent: a core value -- we surface how knowledge is created, referenced and verified; another example is that we tell you what data we’re collecting and how we use it.
 * Extensible and sustainable: Built for extensibility: templates and categories have helped the community grow. Let’s not lose this. We also have to help people and machines use our tools and our knowledge. But they have to give back via attribution, education and branding. The whole system stops working if this doesn’t happen.
 * These principles provide guideposts as we move up the planning pyramid and help us make consistent decisions across the department.
 * 13: I want to call out the amount of work that we’ve put into this process which was designed and led by Margeigh Novotny and Josh Minor. We’ve put together essays on 6 themes with dozens of contributors and hundreds of citations from communities, academics and industry. Many of you in this room have helped. Strategy is a lot of work especially at global scale.
 * 14: Perspectives -- this is the area that we’ve been focusing on in the last quarter -- doing the research and the writing and thinking that will establish the actual products that we build in the next 3-5 years. We call the output of this section “themes” and I’ll talk about these in detail in a few slides.
 * 15: We got to our themes in three steps. First Margeigh talked to many people to establish a set of high level aspirational goals.
 * 16: Then she facilitated a reverse roadmapping session where we worked backwards to establish how we might meet these aspirations. From this work, 6 themes were identified and we set to work to investigate them.
 * 17: These are the themes (read them) -- I’ll go into these in the next section.
 * 18: Finishing up the pyramid, we’ll continue to expand and explore the themes during the rest of the quarter. We’ll need to have at least a roadmap for a year for the annual budgeting cycle in March so this date will drive the proposals and the planning. How we’re going to get to a 3-5 year plan...
 * 19: -
 * 20: The first theme that emerged from our roadmapping and other exercises was trust. This is a complicated concept in Wikipedia -- that a crowd-sourced model can produce verified accurate content is counterintuitive to almost everyone. But in the era of fake news, global platforms struggling with content moderation and security, and lack of respect for traditional norms of veracity, it’s more important than ever that people can trust the encyclopedia.
 * The questions that knowledge integrity asks of trust are also complex -- if some people don’t see their perspective or viewpoint represented in Wikipedia will they trust the projects in general? What if their knowledge does not conform to current notions of references and verifiability? How can we make the on-wiki processes more transparent both to new readers and new editors? And finally, in an era of government and platform censorship, how can we make our projects resilient and signal to readers that they should trust the knowledge they find in Wikipedia?
 * 21: The second theme that was identified was experience -- as in user experience -- per Jakob Nielsen, this encompasses the end user’s interaction with an institution, its products and services. For us, this primarily focuses on the product. Embedded in experience is rich content and new form factors, as well as how people find content on the internet and customize it to their own needs.
 * The biggest takeaway here is that our sites are behind -- users have different expectations today than they did even 5 years ago. If we can’t meet these expectations -- for audio and video content, new form factors and the like, we risk losing our vitality as both readers and editors move to other sites. Benjamin Mako Hill, professor at UWashington and noted Wikipedia researcher has written that the mechanics of peer production have been co-opted by large corporations, and we now have to figure out new models of collaboration with new media.
 * This is a massive topic that touches many aspects of the stack. What are the technical needs for storing and streaming video? How do we integrate structured data? How can ontologies and folksonomies change to support better discovery? What is the role of social interactions in discovering content?
 * 22: Scale is another dense topic. Wikipedia is one of the largest sites on the Internet -- hundreds of thousands of editors, probably billions of readers. Hundreds, potentially thousands of community developers. And our strategic direction says we need to grow, to reach the billions of people who will be coming online in the next decade.
 * So how do we do this? How do we manage the new editors that we need to achieve knowledge equity and integrate them with our existing communities? How do we scale our development and product teams? How do we integrate community developed code?
 * There are also questions about resilience and ubiquity that speak to some of the issues we brought up in experience such as form factor. Syndication and re-use also come up here.
 * Finally, there’s the content itself. How do we identify and fill knowledge gaps, either by humans or machines or both? What is the relationship of content to reading and editing Wikipedia? How do we grow while still keeping our values around verifiability?
 * 23: Too much knowledge. Too many languages. Too many sources to verify. How can humans manage all of this? Machine learning is an obvious and very relevant technology today but wikis have been using bots to create and moderate content from almost the very beginning -- rambot created 34,000 articles from census data in 2002.
 * We’ve analyzed impacts of augmentation in 3 areas -- content creation, moderation/curation and governance. Machine translation and generated content at scale are here -- they will become increasingly important parts of the content creation and consumption cycles.
 * Governance is also critical -- bias detection, surfacing of the impact of algorithms, continuing the community bot approval models. As systems like JADE show, Wikipedia can lead the way in establishing positive models of AI use on the Internet.
 * I just want to say that while some people may find all this threatening, it’s a huge opportunity to let the human editors focus on the things they do best: narrative, judgement and nuance, and freeing them from the menial tasks they might be doing today.
 * 24: Charlotte Gauthier, who led the culture theme, identified very early on the double meaning of the word -- it can refer to the internal cultures of communities -- how people treat each other -- as well as the culture you might find in a museum -- that comprises the knowledge they document.
 * These are interrelated of course -- a community with a strong culture of verifiability might find it difficult to incorporate knowledge that does not meet internal definitions of acceptable references. From this perspective it’s easy to see how culture is critical to achieving both knowledge equity and knowledge as a service.
 * Inclusion is a critical topic here. How can we work with communities to incorporate new points of view? How will communities change as they become more diverse? How will we fill content gaps? Finally, it’s important to call out that definitions of inclusion and diversity are different around the world and we need to make sure that we are culturally sensitive and contextual while also pursuing our goals.
 * 25: The final theme that emerged was tools. We say that the audience team creates tools for editors to make great content for readers and it’s good to finish up the themes with something that’s less high level and a bit more grounded in our everyday work.
 * We explore the needs of developers, contributors, moderators and organizers here. Issues include building on some of Bryan Davis’ work in thinking about how to build and support tools communities need and finding a balance between Foundation and community support.
 * We haven’t focused on building tools for moderators and administrators recently and this theme has caused us to reflect on how important these groups are to the knowledge creation process.
 * Community organizers are another group that does a lot for the projects and has received relatively little support from the product team. These people are doing a lot of the frontline work on editor recruitment, content diversity and so forth and it’s time we helped them in their efforts.
 * 26: So — lots of questions, but not many answers. What next with this? As I’ve said, we’re continuing to dig into the themes and prepare proposals for addressing these issues that will land later this quarter and into the next.
 * 27: I looked across our themes and identified specific technical directions that we’ll need to support them. Eight emerged and I’m going to finish up and talk about them now.
 * 28: Just to be clear, this isn’t a problem today. The uptime of our sites is awesome and we’d like to keep it that way! Initiatives such as the caching layer move to ATS and the work on failover are heartily appreciated. We’d like this work to continue and extend to active-active operation. Yahoo had this in 2008; we should be able to give our users this level of support 10 years later.
 * 29: We know that there is significant work here across both technology and audiences and it’s critical. We need to be able to employ modern product development principles and be able to iterate quickly and the deployment stack needs to support this.
 * Embedded in this ask is also the need for more QA. Right now our ratios across the two departments are something like 20 to 1 which is frankly unacceptable for a top 10 Internet site. We will need to invest here in people as well as technology.
 * 30: I like to think of content flexibility as “moving past the article”. To support syndication and radically new presentation mechanisms such as voice assistants, we need to be able to have structured machine readable knowledge (Wikidata) available but also presentation information as well. We need to be able to combine and remix content from Commons and Wikidata across the other sites and support seamless editing on a much more granular level than is currently available. Multi content revisions is the first step here and to see it launched in Commons last week was awesome!
 * 31: People expect audio and video on the Internet today and we need to move assertively in this direction. While we don’t know exactly what form video contribution will look like on Wikipedia, we know that if we don’t advance our offerings here, we will fall behind other sites like YouTube that are already the go-to place for some types of knowledge for many people. We need to start planning for the bandwidth and storage needs of video, as well as making progress on navigating the technology and patent landscapes.
 * 32: This need was explained to me as Global gadgets and to be totally honest, it took about a year for me to truly understand the meaning! Right now, only communities with substantial technical skills can create tools that help contributors and moderators alike and this is a substantial barrier in growing both communities and quality content. The Foundation can help by porting individual tools but every community, small and large alike, will benefit from being easily able to adopt tools and workflows developed elsewhere. (I know this will be hard)
 * 33: The audiences team is taking this on this year, but I do want to call it out as something we need to get better at. Speaking again to the need to learn from our users and iterate quickly, data needs to be a first class platform attribute. Fundraising can AB test a dozen banners a day but right now we can at best do one every couple of weeks. This is a major gap that needs to be filled.
 * To be clear, this doesn’t mean tracking individual users or selling data. These are important values. But we do need to be able to track sessions and see the effectiveness of our contributor and reading funnels.
 * 34: This is probably the longest horizon request -- we need to be able to ensure that the content is tamper resistant and we have methods for users to circumvent censorship and blocking. New technologies such as blockchain and IPFS provide potential paths forward alongside more mature technologies such as Tor. The state of our world demands that we take on these problems.
 * 35: We have ORES, we’ll have JADE and we want more! We need to continue our investment in high quality platforms that make it easy both to integrate AI with our products but also make sure that our communities both existing and new can help correct bias and provide responsible uses of algorithms in every aspect of knowledge creation.
 * We’ll have to wrestle with problems such as how to surface algorithm usage across the site and research ways to make users aware of AI usage within their workflows. We’ll also have to explore the role of humans in knowledge production as machines start helping with many of the low level repetitive tasks that communities have traditionally performed.
 * Our view is that while machines and automated translation will create more and more content, we will always need humans to create meaning and narrative. We have a strong role to play in the evolving nature of the Internet.
 * 36: How we’re going to get to a 3-5 year plan...
 * 37: Thanks everybody!

Lydia:
 * [Slides and presentation]
 * We have many similar projects with shared traits, but some distinct and unique requirements for specific areas.
 * Want to talk about what the common traits are amongst the family
 * We need to decide where to stop trying to fit distinct products into the framework of collaborative encyclopedia software. How does knowledge as a service change the DNA of our projects?
 * We've been putting a lot of effort into making Wikibase more usable outside our projects. Firmly believe it can be immensely useful outside our ecosystem. Should be easy to install for third parties and be more integrated into all Wikimedia projects. It brings so many benefits to preventing duplicated efforts in knowledge curation.

Cindy:


 * [no slides]
 * MediaWiki as core platform for Wikimedia and as third-party software
 * Why do we support third party use?
 * One of our key principles is FLOSS. Many 3rd parties have used MediaWiki for a wide variety of innovative reasons.
 * Opens new paths through the software, surfaces issues
 * Brings in additional people who find bugs and contribute fixes and new ideas.
 * Creating entirely new features, from visualizing data, authenticating enterprise systems, determining who is watchlisting pages, and so many more.
 * I’m not happy with the “third party” terminology, trying to come up with a better term.  (“Non-Wikimedia MediaWiki users”?)
 * There are opportunities for us in the foundation to make better use of wikis and to support the use cases of enterprise extensions.  Plenty of knowledge creation functionality is not used in our projects but still deserves our support.
 * E.g. SNPedia: collects snippets of genetic info
 * E.g. DIY wikis for makers
 * E.g. Gamepedia farm (22m monthly unique users, 315m monthly pageviews)
 * E.g. government uses, from City of Vienna to NASA (who use a number of internal wikis to support the ISS)
 * E.g. General wikifarms
 * E.g. Semantic MediaWiki
 * E.g. Hallo Welt! Building the BlueSpice distribution