Wikimedia Search Platform/Why Search is Important

From mediawiki.org

Why does the Search Platform team even exist? It would obviously be easier to hand on-wiki search over to an external search engine (some wiki projects have decided to offer that option), and the results would probably be pretty good, certainly in English.

Privacy…

In keeping with our mission and our values, our default stance is staunchly pro-privacy, and any deviation from that default is carefully considered, with privacy always at the forefront. Data retention, data sharing, and user profiling would all be easier and more useful, and might even yield better user experiences, if we held on to all our data forever. Instead, we consciously throw almost everything away after 90 days; the rest is rigorously depersonalized and aggregated. No for-profit company is likely to give up such a valuable and exploitable resource.

…and, by extension, Safety

Casual American and European users of Wikipedia probably don't think about this often, because they don't have to, but not everyone who uses Wikipedia can do so safely. The best way to protect data from any entity that would exploit it is for the data not to exist: data that doesn't exist can't be leaked, stolen, or forced into public view.

Underserved Communities

We have often joked that the team at any search engine company that works on a language with a "mid-sized customer base"—for example, Turkish, with about 75 million speakers—is probably larger than our whole Search team by an order of magnitude. Thousands of engineers and billions of dollars dedicated to search can accomplish a whole lot.

On the other hand, it's never going to be particularly profitable for such companies to spend time or effort on Mirandese, Breton, or Dagbani, because they can already reach most speakers of those languages in Portuguese, French, or English. Preserving and sharing knowledge and culture is not their priority.

Admittedly, we sometimes move slowly when working with smaller communities, and we can't regularly spend time on each of the 300+ languages we support. We do what we can, when we can, and our principle of considering "local impact" lets us prioritize big or easy fixes for smaller communities, fixes that would fall below any profit-based threshold for what is worth working on.

Our Principles

In addition to the principles discussed above—privacy, safety, sharing and preserving knowledge, and caring about all our communities regardless of profitability—there are other tenets we hold close.

Our commitment to open-source software is a natural extension of our goal of sharing knowledge. The work we do primarily serves the Wikimedia community, but because it is open source, it is available to anyone who uses MediaWiki or Elasticsearch. Freely sharing our work, expertise, and resources also enables others, including teams within the Foundation and individuals outside it, to build tools, interfaces, and user experiences that both leverage and add to the value of the knowledge we have collectively built in Wikipedia, Wikidata, and other projects.

Despite the considerable resources the Foundation has, we hope our non-profit status confers a level of trust from the public (including our readers and editors), because they know that they are not our product. We don't hoard or exploit their data for profit, and while not everyone agrees with the Foundation's decisions 100% of the time, they can hopefully see that we have the same broad goals—reflected in our mission statement, values, projects, and longer-term plans—that they do. Keeping on-wiki search out of the hands of a for-profit corporation serves that trust.

For Readers, For Editors, For Projects and Teams

Popular search engines generally set the bar for the public's expectations of search, sometimes too high for us to meet completely. (Again, with the thousands of engineers and billions of dollars!) But if on-wiki search is to serve the purposes outlined above and be successful, we need to keep up with the key features users expect. We can't match the big search engines feature-for-feature, of course, but we can provide the basic functionality that our communities, taken in the broadest sense, have come to expect.

We do have an ace in the hole: we provide wiki-specific features that the big search engine companies can't (or, more realistically, won't) offer, because those features aren't profitable, generalizable, or feasible to scale to the whole internet! Regular expression searching and keywords such as insource, deepcat, hastemplate, and articletopic provide really useful and powerful features for readers and editors, as well as for other teams in the Foundation working on search-related projects.

Discovering knowledge—and features and projects that support that knowledge—is made easier by features like cross-project results, cross-language results, and the as-yet-undeployed "explore similar" feature that can lead a reader to all sorts of related intra-wiki and inter-wiki content.


For readers and editors, and for the open-source ecosystem, our search is beneficial. For underserved communities, and for other teams within the Foundation, our search is unique. For our community's principles, for privacy and safety, our search is essential.

That's why search is important.