Talk:Wikimedia Discovery/FAQ


Be clearer on data sources

The first mention of "data sources" is

and incorporating new data sources for our projects

But that just links to a map, which seems to be a different way to display search results. Please give actual potential data sources instead of an unclear link, thanks. -- SPage (WMF) (talk) 19:47, 9 November 2015 (UTC)

The section below calls out that it's supported by external OSM data. That data corpus includes items (buses, trains, etc.) that are outside of what our elastic indices include. RU Wikivoyage and soon EN Wikivoyage will default to our tiles and are already starting to surface transit, points of interest, and articles for discovery of new content. As for other data I cite 'census, national gallery, etc' in but that's really up for a community discussion of what data sources can help in the same way that OSM did. Tfinc (talk) 19:57, 10 November 2015 (UTC)
If Fox News or TeleSUR have the slightest chance of appearing as data sources of this searching project, I will campaign to stop it. --NaBUru38 (talk) 14:27, 15 February 2016 (UTC)
I'm right there with you NaBUru38. There are data sources, like OpenStreetMap, census data, and other bits of useful open data that we could pull into search results. These would provide a richer search experience - on wiki - than what we currently have. Of course, we want to engage early and frequently with the community to determine what would be acceptable to include. I created a task (phab:T126980) to track this concern. I encourage you to reference it as we move forward.
To be honest, we probably won't get around to this for a while. We have more immediate improvements in this quarter and are working on our longer-term plans, which is where something at this level most likely resides! CKoerner (WMF) (talk) 15:38, 15 February 2016 (UTC)

OpenStreetMap is not incorporated into search (e.g. in elastic) and is separate. Of course, search results could be displayed on a map with OSM tiles. Parts of OSM data could also be used as an overlay for Wikivoyage, but I don't think it should be mentioned as part of the grant and "search engine" in this way. Aude (talk) 19:36, 17 February 2016 (UTC)


This page has been up for a week but, as of the time of writing this, literally nothing links to this page; it's an orphan. That's kind of ironic given that it's a strategy document for the 'discovery' team :-) Is there a plan for when this 'not-a-knowledge-engine' strategy will be announced widely? Wittylama (talk) 23:26, 9 November 2015 (UTC)

Thanks for the feedback. I've linked it from so it's no longer an orphan. Next we'll be adding a set of wiki pages to complement the discussions that have been happening on Phabricator, email lists, and on wiki to bring it all together. Tfinc (talk) 21:13, 10 November 2015 (UTC)
Also would love your feedback on the Discovery Roadmap linked in the FAQ. Tfinc (talk) 21:17, 10 November 2015 (UTC)

"Are you building Google?"

This FAQ included the question "Are you building a search engine?" But after a complex edit history from a few days before the November 2015 Board of Trustees meeting, that is not at all the question that is addressed; specifically, the answer begins by stating "We are not building Google," and then includes a couple sentences I basically don't understand. This does not even come close to addressing the important question "Are you building a search engine," and leaving the section title intact is IMO highly deceptive to the casual reader. I'm not qualified or positioned to improve the answer (though I think that should be done). But I do think it's important that the question reflect what is actually said, which is why I have (for the second time) changed the question title to "Are you building Google?" Pinging Nocturnalnow who reverted this the first time. Happy to hear your thoughts, but I hope you can at least agree that this question is not addressed in this section? -Pete F (talk) 03:57, 14 February 2016 (UTC)

"There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors. - Jeff Atwood"
Discovery is not building a competitor to Google in the sense that Google searches everything it can find. We are trying to improve search across Wikimedia projects to provide better results. That's it. Imagine being able to search for "leaning tower in Italy" and not only see the Wikipedia article in your language, but photos from Commons, a map of its location, and information on its physical properties from Wikidata.
I'm new to all of this. Let me ask around and see if we can get some clarification. CKoerner (WMF) (talk) 18:11, 15 February 2016 (UTC)
Pete F As you may be aware a blog post went up yesterday. Lila's also hosting an AMA-style (ask me anything) on Meta. I believe Tomasz is looking into an interview as well with the Signpost. I don't know if any of that helps with clarity, but I wanted to highlight the efforts to bring some clarity to things. I'm updating the FAQ with some of the questions that have been asked on this talk page and others. Again, more than happy to help where I can. CKoerner (WMF) (talk) 19:42, 17 February 2016 (UTC)

More questions

The answer to the first question is very unhelpful. Whatever you are calling it, please actually describe it. That is what all the questions below are about. In what follows, please replace "Knowledge Engine" with whatever it is you are calling it. The name is not what is important. What you are working toward, is important.

From what I have been able to piece together, the Knowledge Engine is a) a bunch of data, contained in or linked to the Wikidata database; b) an interface to receive a query; c) algorithms that create and display an answer to the query based on the data, formatted sort of like a WP article.

I have no idea how KE search results are envisioned to relate to existing WP content; if the notion here is just to archive existing content, or somehow fragment it and import it all into Wikidata, or if existing content will somehow remain in existence and available to the public. I have no idea if the WMF intends to put any further energy into making existing WP content more available to the public in the form in which it currently exists. Please address all that.

I don't understand what role the existing editing community is intended to play in all this.

If I am way off track, would you please describe in some detail what the Knowledge Engine actually is imagined to be, and what it will do, and how it will relate to the Wikipedia that exists today -- concretely, so it is understandable to the average reader of WP? (without technobabble) Three key questions there.

More concretely

1) Please rewrite the question "Would users go to Wikipedia if it were an open channel beyond an encyclopedia?" in plain English.

The "Wikipedia" we all know is an encyclopedia full of articles created and maintained by people.

I don't know what an "open channel" means.

I don't know what "an open channel beyond an encyclopedia" means. The question implies that the "encyclopedia" won't exist anymore - instead, the "open channel" would exist. Is that what is envisioned? If so, what happens to the Wikipedia content that exists today?

2) What would a Knowledge Engine search result look like? Are this and this prototypes of what you have in mind?

3) How exactly does Wikidata fit into the Knowledge Engine vision? Is the vision here that results like those above will be created on the fly by the algorithms based on Wikidata when someone makes a query to the Knowledge Engine? And that WMF will aggregate a bunch of other source data into Wikidata, or link to, or whatever?

4) Will there be any content curated by editors as there is today, or will the editing community become curators of Wikidata?

Please incorporate answers into the FAQ.

Thanks Jytdog (talk) 01:22, 17 February 2016 (UTC)

Good feedback Jytdog, thank you for taking the time to share. I updated the FAQ a little. Please have a look and let me know if there's something that is still not clear. CKoerner (WMF) (talk) 22:41, 17 February 2016 (UTC)
CKoerner (WMF) Thanks for your reply!
Let me first say that if you are not aware of it (and I can't believe you are not :) ) your management's behavior has generated a lot of distrust. So I am looking for clear answers that make sense in light of other information that is out there, primarily the actual Knight grant agreement.
Second, I have worked a lot with grant agreements in academia, and I understand how they work, and I know that you cannot change the scope unilaterally. Would you please let me know either way, if WMF and Knight have amended the agreement to change the scope? I am assuming the scope has not changed....
Third, I understand you are not using "Knowledge Engine". What do we call this?
OK, specific responses.
a) A bunch of the changes made today don't address my questions which were "where is this going - what is the vision" questions. Many of the answers are down in the weeds of what is happening in this phase of discovery. If that is mostly what you are going to talk about here, that is fine, but please be explicit about that so I can ask somewhere else, where I can get the answers I am looking for. Please don't waste my time. What I am looking for is a simple statement of the big picture, like this: "the whatever-you-are-calling-it system is envisioned to be a) a bunch of data in the WMF domains and outside it that are linked to the Wikidata database; b) an interface to receive a query at; c) algorithms that create and display an answer to the query based on the data, formatted sort of like a WP article sort of like this." Something high level and understandable. Can you do that?
b) You simply removed reference to the "Would users go to Wikipedia if it were an open channel beyond an encyclopedia?" question, which is part of what you are meant to be exploring in this phase of the grant and probably the most alarming aspect for me. I don't see that you re-wrote this and moved it elsewhere. Why have you removed reference to this? What would be really helpful would be an FAQ like "What is 'open channel beyond an encyclopedia'?" with a clear explanation.
c) I understand that the current WMF line is that the KE is meant to search "Wikimedia projects." This contradicts what WMF said in the grant application, which is nowhere limited to WMF domains but instead talks about "the internet". Again, unless you have amended the agreement, you have to be dealing with "the internet." Now my sense is that you want to link other freely-available data sources on the internet to Wikidata or have the "KE" directly query them too. But please give an answer here that deals with "the internet" in reasonably plain English.
d) WMF is putting out what are to me misleading statements saying "What are we not doing? We’re not building a global crawler search engine." and here you have an FAQ that says "Are you building a search engine to replace Google?" Nobody thinks you are doing either thing, and it is a bit frustrating. (I find the former especially... bad as I have never seen anyone imply you are building a "crawler") But to your FAQ... I don't think that anybody thinks that WMF ever intends to do all the things that Google does nor even all the things a Google search can produce (e.g. show me relevant flights or movie times). But the grant makes it very clear that the WMF finds great fault with "commercial search engines" and is proposing to do something much better - more transparent, more privacy, and not driven by money.
I think you are intending to provide "better" answers for certain kinds of queries than commercial search engines provide and the WMF wants people to come to (our main search page) to do them. And as is noted explicitly in the revised FAQ, there is also a desire to keep users who are already within WMF domains, within them, instead of losing them when they can't find stuff and having them go off and do a Google/Bing etc search. Wanting people to come to instead of Google/Bing/etc for certain queries, and not losing users who are already here to Google/Bing/etc, is competing with them. It just is.
Would you please revise the FAQ to deal with the heart of the matter? The current FAQ is really a distraction and doesn't ask nor answer the real question. A question like: "Why would I search with the envisioned search engine instead of Google's or Bing's" or "How would the results of a search through our engine be different than a search in Google or Bing or other search engines?" might be good.
e) You don't say anything about what a "query result" is envisioned to look like. Please do. (This is really important to me, at least.) Without understanding what a search result is envisioned to look like, and whether it would lead to actual WP articles or if it will lead to a machine-generated "knowledge graph" or mini-article, I cannot make sense of this whole thing. I really can't, and this is a very key issue about what people will find when they come to "Wikipedia". Will they find an encyclopedia, or will they find a "channel beyond" it? Please do clarify.
f) This is a funny edit note. But see above. And please note that I have seen this video which looks an awful lot like content created by a "robot" (to go with your funny term) in response to a query, and I am aware of Approach 6 discussed here. This really is a big vision thing - is the WMF walking away from having search point to articles and making article content more available to the public? Where are they taking us? I really (really!) do see the value in having search work better and many other benefits to what I think the KE could do.... I just don't see the vision of how that fits with the WP-that-exists. That is what I am looking for. And I understand that you might not be the one to articulate that. But someone needs to. Please point me to them if it is not you. Thanks.
And thanks for your patience. Jytdog (talk) 03:20, 18 February 2016 (UTC)

Structure of the page

Merging in the Knight grant stuff makes sense, but probably not at the very top of the page. I would propose having a few top-level sections, like Knowledge Engine, Knight Grant, and Discovery work. And then all the existing questions could fit within one of the top-level buckets. While the term "knowledge engine" is interesting from a historical perspective, and the Knight grant is interesting from a transparency perspective, I suspect a lot of people just want to know what the Discovery department has been doing, is doing now, and plans to do in the future. --KSmith (WMF) (talk) 00:04, 18 February 2016 (UTC)

Yes, sorry for any confusion there KSmith (WMF). I saw we had two FAQs with related topics and wanted to bring clarity to things. If you want to take on the restructure, please do. CKoerner (WMF) (talk) 16:17, 18 February 2016 (UTC)
What's appropriate here? These are conversations and I'd hate to move things around and upset anyone. CKoerner (WMF) (talk) 18:13, 18 February 2016 (UTC)
Never mind, I've been looking at too many Talk pages today and got confused. CKoerner (WMF) (talk) 18:14, 18 February 2016 (UTC)
I created the basic structure I envisioned. I'm pretty sure it could be improved. --KSmith (WMF) (talk) 21:00, 18 February 2016 (UTC)

Are you really trying to improve plain old WP search??

Is part of what you are trying to improve searches through this? If so I am wildly happy as that search engine is awful. I waste so much time looking for stuff -- especially trying to find things in old Talk page discussions or archived discussions on notice boards. I waste so. much. time. with that search engine. Just in en-wiki, which is where the information I want is. If that is what you were really doing (outside of the Knight Foundation grant stuff) I would be very happy. Please do tell if that is part of what you are fixing. Thanks! Jytdog (talk) 03:24, 18 February 2016 (UTC)

Yes. I've been a volunteer for about 5 years now and a visitor even longer. I too keep going crazy with the current search. It's getting better. We're a relatively new team mind you, and are already making small, progressive improvements. We have a beta feature now for the Completion Suggester which is a small step in the right direction. That should be going out to everyone in the near future (by the end of March, communication and feedback withstanding). We also have a list of goals for this current quarter if you want to keep up. There's a lot to improve. We're tracking our thoughts and progress in Phabricator. If you see something you're interested in knowing more about, let us know. CKoerner (WMF) (talk) 18:49, 18 February 2016 (UTC)
This would be amazing. A lot of what I have heard has been about making all the various WM content and WP content more accessible when someone searches. Like I said I get especially frustrated trying to find specific diffs as well as old discussions. I am sorry to ask this, but is your team aware that if you search (for example) the ANI archives, you get results like this? There is no discernable time-order, and no way to refine the search - not even old school boolean works. I waste so much time going from those results, to what I am looking for, and I have always wondered "How can this be?" Jytdog (talk) 19:51, 20 February 2016 (UTC)
It will be amazing! :) There's some smart folks working on it. Check out the beta feature if you haven't already. It's a small, but apparent improvement. I've reached out to the Discovery team to ask how/if some of our near-term goals will impact the types of searches you are asking about. I'll let you know what I find out. CKoerner (WMF) (talk) 21:27, 23 February 2016 (UTC)
That would be great. Jytdog (talk) 02:41, 24 February 2016 (UTC)

The "robots" question

The following FAQ is framed in a pretty disrespectful way, and it is not at all clear to me, that the response here speaks for the WMF board and the ED. I am parking it here for now.

Is the Wikimedia Foundation looking into replacing editors with robots?

No. We think technologies like machine learning and similar tools can help with aggregation of all the rich content humans have created across our projects, like the work our colleagues have done with ORES in improving the quality of article content.

At no point are we trying to replace or subvert the work of our human editors. We want to figure out smarter ways to return search results that answer visitors' questions - even when those searches currently return zero results. Imagine in the future searching for something we don't have an article for in a particular language Wikipedia - but we do have books in Wikisource, or quotes in Wikiquote, or photos in Commons. Wouldn't it be great to have a link to those items in search results instead of nothing?

-Jytdog (talk) 15:42, 20 February 2016 (UTC)

I wrote that answer. You singled out the word "robots" in quotes. Do you feel that is a disrespectful description? I thought about using the word "bot" as in "Internet bot", but I thought for clarity I'd be explicit and spell out the (more common) word. We got this question, or variants along the lines of 'automatically written articles by a bot' quite a bit in other venues (mailing lists, other talk pages). I thought adding it to the FAQ would answer the common question and provide clarity about how we are currently looking at utilizing bots. Sorry for any confusion. If I've missed your concern, could you be more specific with what you feel is disrespectful? CKoerner (WMF) (talk) 20:47, 23 February 2016 (UTC)
Thanks for replying. Nobody has had robots in mind that I have ever heard of. I and others have been concerned about the WMF's strategy with regard to the relationship between user-generated and computer-generated (or bot-generated, or "automated generation of") content.
In any case, is there a long term plan to start including bot-generated content more systematically in any/all of the projects, or to use such content more? If you don't know, please say that. Thanks. Jytdog (talk) 02:32, 24 February 2016 (UTC)
btw, two of the most useful things I have seen written about this question, and in this context, were two posts by Jheald at Jimmy's talk page - this and this. Those were real efforts to communicate clearly, I learned a lot, and I remain very grateful for them. I know they are too long for use here, but they recognize the concern and speak to it (even with the jabs :) ) so I wanted you to see them. Jytdog (talk) 03:33, 24 February 2016 (UTC)
Thanks for the heads-up. Those are pretty good summaries of the work that's going on with regard to automation or bots. I am not aware of any long-term plans beyond what's in our current quarter goals and the Discovery Year 0-1-2 presentation. As mentioned on Jimbo's talk page, work in these areas is at a very early stage of initial development. CKoerner (WMF) (talk) 15:02, 24 February 2016 (UTC)
Thank you for linking to that presentation. I was not aware of it (I know, I know it has been sitting there for a long time. There are users for you. Surprisingly missing things.). I am taking "I am not aware" as meaning that you don't know if there are long-term plans nor what they might be. I don't know if you are in a position to know. I really don't. (If you are in a position to know, then your saying "I am not aware of X" means that X doesn't exist. If you are not in a position to know, then you are just saying "I don't know"). Chris, I want to point you to the following in the Knight grant - on the first page.
It says that the goal is "To advance new models for finding information by supporting stage one development of the Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet."
It says: Over the next six months, the Wikimedia Foundation will:
• Answer key questions:
•• Would users go to Wikipedia if it were an open channel beyond an encyclopedia?
And it says starting on that same page:
"Knowledge Engine by Wikipedia will create a model for surfacing high quality, public information on the Internet. The project will pave the way for non-commercial information to be found and utilized by Internet users. The discovery stage will lay the foundation for the project. During this period, the team will establish core usage and performance metrics that will help determine what is built."
Please stop now, and really stop. Imagine you are me and you are on the outside, and are trying to figure out what has actually been going on. Now read the goal, and read the first "key question", and read the Outcome. The only thing that makes any sense about what was envisioned (and for all we know, is still envisioned) - and especially in light of the takeover of the page, and the things that Denny has said, and the stuff that has been said about extending Discovery to include other sources of cc-licensed content (and not things like the Stanford Encyclopedia), and everything that has been said about Discovery currently "just exploring" - is that the WMF was (or still is) planning dramatic changes to this place - to what people would actually encounter when they come to There is no mention of how existing, user-generated content fits into that vision. And really importantly, do you see how the Discovery Year 0-1-2 presentation fits exactly into that notion that the WMF is planning big changes to what people who come to experience?
Can you see that? I am not asking you to confirm or deny it or agree to it - I am just asking if you can see what I am seeing. Can you?
And please also note that all of this makes it clear that the work of Discovery in exploring things, is very much Stage one of a larger plan. So everybody insisting that Discovery is "just exploring" now is just ... exasperating. We know that. It is not the point. It is the larger plan that everyone is concerned about. Can you see that, too?
And finally - and please do answer this - is the Discovery Year 0-1-2 presentation still the operative strategy for your group? This is something you should know, so I am looking for a clear answer on this.
Please do respond to all three questions. Thanks! Jytdog (talk) 18:02, 24 February 2016 (UTC)
Oh also, what is the "Licensing" referring to, on slide 7 of the 3 year plan? Thanks. Jytdog (talk) 22:39, 24 February 2016 (UTC)
  • quick note, see the last bullet in the "For What" section here. I know this is only a brainstorming, strategy-thinking meeting. I know that. It is just this thing I have been saying is apparently not completely la-la land. I get the problem that the Product team is trying to solve there - how to keep what the WMF does, relevant. And I see how doing cool stuff via the portal is way more nimble/scalable/etc and... tractable than dealing with this crazy nuthouse of user-generated content. (And again, I know that is only brainstorming; I see no "smoking gun" about commitments made there.) Jytdog (talk) 02:40, 25 February 2016 (UTC)

@Jytdog: I'll start by saying that I'm finding it difficult to respond to this post. It seems you're quite frustrated by the situation here, and your post seems to be a kind of stream-of-consciousness post, which is totally fine, but it does make it harder to have a conversation. Please bear that in mind. :-) So, I'm going to try to respond by pointing a few things out that might be helpful.

  1. We (as in, Discovery Department) don't have all the answers. Many of the questions you're asking don't have answers right now. That's normal! We'll figure out answers as the questions get closer to undertaking work that needs them answering.
  2. Roadmaps tend to get fuzzier and less certain the further out you get. Your questions about the third year probably don't have answers either yet, because the work is years out and we're focussing on what we're doing right now. The roadmap is not an immutable commitment to do the work, it is a description of the thinking as of the time it was written.

I hope this helps. Perhaps it doesn't. But, I am trying, at least. :-) --Dan Garry, Wikimedia Foundation (talk) 01:31, 27 February 2016 (UTC)

Deskana I get that and I am sorry. I keep trying to provide context, so there is some frame for my questions; I realize that it is too much. Please let me ask a really concrete question. The three-year plan deck. Was that actually presented by a person, who talked through it? If so, where? Thanks. Jytdog (talk) 03:49, 27 February 2016 (UTC)
@Jytdog: That I know of, it was presented once, by Wes Moran, to the Community Department at the Wikimedia Foundation. I don't know of any other presentations performed using that slide deck, but since it's openly accessible and downloadable by anyone, it may well have been presented by others in different venues. --Dan Garry, Wikimedia Foundation (talk) 19:31, 28 February 2016 (UTC)
Thanks for answering. It would have been really interesting to hear the story that went with that deck, and to understand the larger arc of which this is a part. Thanks again. Jytdog (talk) 20:31, 28 February 2016 (UTC)
If anybody else is aware of other places where that deck was actually presented, it would be good to hear, btw. And it would be great if anybody who actually heard it presented, could summarize the narrative arc that went with it. Jytdog (talk) 18:33, 29 February 2016 (UTC)

Edits today

I was BOLD and made some edits to this today, to try to help the Discovery team address the concerns in the community. I am sorry if this was offensive, and if I wrote anything wrong. If I did write anything wrong, please correct it. Happy to discuss. Jytdog (talk) 18:05, 20 February 2016 (UTC)

Thank you for the edits. I made a few tweaks as well. I hope this better addresses some of your concerns. CKoerner (WMF) (talk) 21:16, 23 February 2016 (UTC)

What do you call the search function?

I notice this page calls it a "search mechanism" but that is not a common term. Is it a search engine? Are there bots that crawl WP and the other projects and index them? Thanks. Jytdog (talk) 02:49, 24 February 2016 (UTC)

so it is Extension:CirrusSearch which uses en:Elasticsearch, which is a search engine for en:enterprise search. and you are looking at improving that search engine, first for intra-WM stuff and then to other open sources of information. OK. Jytdog (talk) 02:05, 25 February 2016 (UTC)
(edit conflict) @Jytdog: According to the definition given in w:Search engine (computing), yes, the Discovery Department is working on a search engine. That's nothing new, however; the CirrusSearch extension has existed for years, and search functionality in MediaWiki has been worked on many years before that. Search is also only part of our efforts to improve content discovery on Wikimedia projects; we're also working on maps as a content discovery mechanism, for example. --Dan Garry, Wikimedia Foundation (talk) 02:12, 25 February 2016 (UTC)
yes that was very helpful, thanks. Jytdog (talk) 02:42, 25 February 2016 (UTC)

Staff notes

this is great.

Two snippets, with comments...

  • "Moiz: Follow-up: The initial grant was just a small step toward a lot more funding. Who will be funding that in the future, given the bad press?"


"Lila: We never counted on Knight to cover team expenses, only to supplement. Staffing comes out of the common bucket. We budget the work first, and then apply any grant moneys where they fit. Annual fundraising goal includes grants. So this won’t change how we fund the team. Just potentially the total amount of money available."

About that, and relevant to what I was saying above. You are happy to talk about near term goals. You are happy to say "not Google". What is the strategy? What are the projected budgets for it? The persistent not-saying it, is the problem that is breeding all kinds of very bad feelings. And

  • "...We have been trying to work with CL to get the message out. Two office hours have not attracted anyone from communities." Office hours? Where are they publicized? Jytdog (talk) 22:18, 24 February 2016 (UTC)
    • While it's possible things have changed, my experience with office hours has been very high noise-to-signal ratio. If office hours have become a standard approach for engaging community, I don't find it surprising to find that nobody is showing up. I was glad to see some discussion about whether other approaches might yield better results -- I hope that discussion progresses. -Pete F (talk) 18:17, 25 February 2016 (UTC)
Sorry I am completely ignorant here. I reckon "office hours" means somebody available for a live chat at some designated time and "place". Would someone please say where is that site, and how are they publicized? Sorry to be ignorant. Jytdog (talk) 18:38, 25 February 2016 (UTC)
@Jytdog: You're not being ignorant by asking questions, don't worry! IRC office hours are publicised on meta:IRC office hours, and sometimes also village pumps and discussion forums on-wiki. I'm not really involved in their organisation, but if you have suggestions on how they could be improved, I'd suggest leaving those suggestions on meta:Talk:IRC office hours. --Dan Garry, Wikimedia Foundation (talk) 00:36, 27 February 2016 (UTC)
Thanks. OK god they are on IRC. oy. Jytdog (talk) 03:51, 27 February 2016 (UTC)

FAQ: If you're adding new data sources, isn't that a search engine?

It is unclear to me how this is helpful, so moving it here for discussion. Seems mostly redundant.

If you're adding new data sources, isn't that a search engine?

If you define "search engine" as including a web crawler that indexes the whole web, which is the most common definition, no.

We do have a goal of improving the search function to make it work better in each project, and across projects, and yes we want to expand what it reaches to other sources of high-quality, public data.

The goal is to expand the amount of knowledge presented in search results and expand the context beyond just textual search. We want to begin by showcasing content from other wiki projects including appropriate languages based on query input.

The data could be used to potentially evolve and improve the quality of our existing search experience.

Our first new data source outside of Wikimedia projects is OpenStreetMap data for Maps, which our Wikivoyage community (wikivoyage:Wikivoyage:Travellers' pub#Announcing the launch of Maps) is already starting to experiment with. There are other data sets that we could potentially surface (census, national gallery, etc.) but that will be up to our communities to decide. Some of these could certainly show up in search results and we have Phabricator tasks around improving GeoData content (phab:T112026).

-Jytdog (talk) 02:36, 25 February 2016 (UTC)[reply]


This page appears to no longer be active. Should it be deleted? It includes for example a link to Lila's FAQ which turned out to provide no information and was abandoned. The discussions above have not been responded to. Delete this, perhaps? Jytdog (talk) 21:03, 8 March 2016 (UTC)[reply]

If I understand you correctly you're asking about both the FAQ itself and the ensuing conversations here on the talk page. Is that right? As for the FAQ, it is still relevant to the work the Discovery team is pursuing and has value for new folks not as familiar with our work. Compared to other teams at the foundation like Editing and Reading, we're still relatively young (and have a weird name that isn't as apparent at first blush (like Editing) on what we do). As for the conversation, is there something specific you want to know that hasn't been answered yet? CKoerner (WMF) (talk) 17:24, 10 March 2016 (UTC)[reply]
Thanks for replying CKoerner (WMF).
  • I have asked three times here what "an open channel beyond an encyclopedia" meant and/or means. And in response to the first time I asked the question, you a) removed reference to it from the FAQ, and b) didn't answer, as far as I can see. This is meant to be something WMF is working towards with the money from the Knight Fdn - it is the first-listed "key question" that WMF is answering. So that is my first question. I'll add an additional question now - what if anything is WMF actually doing to answer the question, "Would users go to Wikipedia if it were an open channel beyond an encyclopedia?" If whatever this was, has been abandoned, please say so. But then I would still like to understand what it meant. Thanks.
  • it would be great to include something about your collaboration with Wikidata on federated search with SPARQL queries as discussed by JHeald here. That sounds like something to brag about (honestly!) to me.
  • What ways is Discovery exploring presenting search results? JHeald also talked about Article Placeholders and Resonator etc. here. He described that as only a Wikidata project - has Discovery collaborated on that too or is that outside your remit? If anything Discovery is doing is part of that, it would be great to describe it here. If not, you can deal with this by pointing to the stuff that Wikidata is doing. This circles back to the in-my-view-unhelpfully-framed "robot" FAQ discussed above.
  • what is the "Licensing" referring to, on slide 7 of the 3 year plan?
btw, if that 3 year plan is still what you are running under, it would be great to include a link to it in the FAQ. If it is not what Discovery is running under, would you please say so?
OK, I think that is enough for now! Jytdog (talk) 02:23, 14 March 2016 (UTC)[reply]
CKoerner (WMF).? Jytdog (talk) 03:01, 16 March 2016 (UTC)[reply]
Jytdog, I haven't got a lot of time this week, but let me give you a straight answer to your question about "an open channel beyond an encyclopedia" so that you will quit worrying about it. I understand how you were confused; it's a bunch of unnecessary computer jargon (that's probably why they removed it), but it's really not a complicated idea.
Special:Search is currently a "closed channel". This means that, if you are searching for something at the Haitian Creole Wikipedia, then you will get results for what's on the Haitian Creole Wikipedia, and nothing else. This is less than ideal for many users, because there are 100 times as many articles at the English Wikipedia alone, and most people who read Haitian Creole can also read French, English, and/or Spanish.[1]
An "open channel" is one that searches (or can search, if you set your prefs to do so) beyond the local site. "Beyond an encyclopedia" assumes that you're starting at a Wikipedia (we can ding the author for being Wikipedia-centric later), but the principle actually applies to all WMF sites. If Special:Search were an "open channel", it could give you results from the sister projects (it'd be very handy to find results at Wikisource if you're looking for information about an old book, no?) or even possibly some hand-selected non-WMF sources (e.g., OpenStreetMaps). There are still many questions to be answered about exactly how to implement this,[2] but that's the general idea.
The question that Discovery posed to the Knight Foundation is this: Millions of users are accustomed to searching the English (or French or German or whatever) Wikipedia and getting only results for pages at that Wikipedia. If that behavior changes, would users actually be pleased? Or would they be unhappy that their search results were "polluted" with all of this other stuff, and thus avoid using Special:Search? (Most reasonably savvy internet users can figure out how to type Shakespeare into their favorite web search engine if they don't like the results they're getting from Special:Search.)
If you are still confused by this, then let's talk about it at home next week, when I hope to be a little less busy. Faidon just updated the master schedule for that major data center migration, and I've got dozens of announcements to make right away. Whatamidoing (WMF) (talk) 17:56, 16 March 2016 (UTC)[reply]
  1. In fact, it is so obviously less than ideal that some small Wikipedias have set up the "Wikipedia:" namespace as a redirect to the English Wikipedia. If your search fails locally, you repeat it with "Wikipedia:" at the front, and end up at the English Wikipedia article by the same name.
  2. For example, if you search for "Shakespeare" at the Haitian Creole Wikipedia (which has no article under that title, although there is a two-sentence stub at w:ht:William Shakespeare), should it give you only Haitian Creole results (there are seven pages containing that name), pages from the English and French Wikipedias, and/or pages from Wikisource? Another question: What is the minimum acceptable level of free license? Would we be willing to include search results from a respectable open-source journal that was CC-BY-NC? Would we include search results from an image repository that's hosted in the US, public domain in the US, but not public domain everywhere in the world?
Thanks for that, WAID. That was indeed a very helpful translation into plain English, and a great explanation of what Discovery is doing now. The binary open/closed is very helpful. Open = "not closed". Thank you. This will probably frustrate you (and I am sorry about that) but this still leaves the question of how open was this ever considered? Right? There's "open" beyond whatever domain you are in, but still within WM domains. There is "open" including other open-source data sets. And then there is "wide open" as in the whole internet. Damon was apparently shopping the latter, which never got very far and is not the issue. Lila (and... some others apparently) appear to have been shopping the middle one at some very serious scale and ambition. Discovery is doing the first-named part of that now and has some tentative feelers into the middle. Additionally, the middle one, which it seems was described in the Knight Foundation grant application, really focused on becoming a go-to search portal for knowledge... so that was not about starting within Wikipedia. Much of what is really difficult in this whole thing, is that people (like you) are providing answers to questions about the KE project that deal only with what Discovery is doing now, and nobody is telling the story of how these things evolved so that we can make sense out of what happened over the last year and see how what Discovery is doing now fits into that. Let me be frank - the only reason I am asking these questions has nothing to do with what Discovery is doing now per se and those answers are really just kind of frustrating, as well-intentioned as they are. It has to do with where the work was going under Lila and where for all we know, it still might be going in the longer-term planning. (I don't know how much of her plans remain in place...) That stuff matters to me, because it was what James claims was at the center of his dismissal.
And it is starting to matter even more, because no one seems willing or able to actually tell the story - the more people don't answer, the uglier it looks and the more correct James' claims appear to be in the whole he said/she said of all this. (I am aware that this may sound like the way conspiracy theorists think; but I am not prone to that sort of stuff, as I hope you know.) I appreciate your offer to continue this on your Talk page but this should be a more centralized discussion and this is the appropriate venue for that. And I really appreciate you taking the time to answer in the midst of the migration work. I understand that will keep you busy for a while. Jytdog (talk) 19:39, 16 March 2016 (UTC)[reply]
Apologies for not being more clear in my edits. I was not trying to remove anything without answering it. Because of your initial question around the phrase “Would users go to Wikipedia if it were an open channel beyond an encyclopedia?” I reworded to be more clear. Your initial question asking to rewrite, my edit to the FAQ to clarify what was meant by the question.
Again, sorry if I didn’t make that clear. I was attempting to help.
This particular phrase comes from the grant. What does it mean? Another way to phrase it would be something like “Would people go to Wikipedia to find information that was not solely encyclopedic?” Could we use the popularity and incredible work of Wikipedia to bring more folks information from other sources - like Commons, Wiktionary, and other Wikimedia projects - along with external sources of information like OpenStreetMaps? Could Wikipedia be a place for folks to start their journey of discovery? That’s just a question. We don’t know the answer, but one of the things we want to do is figure out the answer.
So, what are we doing to answer this question? Well I’m sure you’ve seen some of the work the team has done around the portal. One of the things we’ve done already is to include photos in the search results as you type. We’re also investigating ways to provide relevant links to sister projects and maybe even trending or recommended articles to visitors. These are small baby steps we’re taking to try to figure out what we can.
Stas (on Discovery) is actively working on the Wikidata Query Service, which is the API to make these queries possible. It’s been amazing to see what others have created on top of WDQS. I have it on my todo list and plan to enhance more of our documentation, even beyond any FAQ.
JHeald’s work is interesting, but at this time it is not incorporated into any of the work of the Discovery team. It does look like the Reading team and Wikidata folks are working on something. Granted that’s an “Epic” (geek code for really big project) and I have no insight into a timeline there.
CKoerner (WMF) (talk) 18:28, 23 March 2016 (UTC)[reply]
User:CKoerner (WMF) Thanks for that! You have again done a great job explaining what Discovery is doing now. A part of my question - perhaps the key part - is as I have said: "what 'an open channel beyond an encyclopedia' meant and/or means.... But then I would still like to understand what it meant." It is very clear that the scope of work under the Knight grant changed. What did this mean in the original proposal? If you don't know, please tell me, and please also tell me who could tell me. Thanks! Jytdog (talk) 22:40, 26 March 2016 (UTC)[reply]

Knowledge engine[edit]

I will add here, that I don't know if anybody at WMF "owns" the Knowledge Engine thing anymore, but I and others remain very interested in hearing the story of what actually went on with this. Who was involved (and who wasn't, which I am coming to believe is a big part of the problems around it), how far did planning actually get, what was actually presented to the Knight people originally, etc. The story! Not high level non-informative things or point negations, but the actual story. Which I imagine will include one or more serious haircuts or pivots, leaving Discovery where it is now. Wherever that is. Having bits of it, and all kinds of negative emotion, and no forthright and coherent telling of the actual story.... is just a bad situation. Please resolve it!

Lila wrote here that the plans evolved, in dialogue with the Knight Foundation. I responded to her here and noted that grants generally include a provision that if the scope is changed, that needs to be done in writing and agreed to by the parties to the agreement, and asked if WMF would post the actual scope that WMF and the Knight Foundation have agreed would be done. That would be amazing to get, in conjunction with and supporting the story I am asking for above.

Thanks. Jytdog (talk) 02:44, 14 March 2016 (UTC)[reply]

CKoerner (WMF)., what do you know about this story? If you are not aware of the story, can you point me to who at WMF is? I am trying to work through appropriate channels; if you are not it, please do tell me. Thanks. Jytdog (talk) 03:02, 16 March 2016 (UTC)[reply]
CKoerner (WMF) following up. Jytdog (talk) 17:10, 23 March 2016 (UTC)[reply]
To be clear, the entire grant was published. As far as I know, the grant deliverables have not changed since it was approved, and Knight is satisfied with Discovery's plans (which are public). An earlier (more ambitious) presentation to Knight was "leaked", and the stories of Damon's vision of the Knowledge Engine have been posted on public forums. I'm guessing you want more details about how the concept went from a glimmer in an executive's eye to the presentation and then to the grant. As mentioned in our conversation on another page, much of that story would probably have to be filled in by Damon, Lila, and Wes, and quite possibly several other people. I'm almost positive that CKoerner wouldn't be able to contribute much, since he joined the foundation long after all the "interesting" stuff had already happened, and few of the early ideas were shared with individual contributors in Discovery. Presumably CKoerner could consolidate all the existing public information into one place...but so could you, or anyone else. --KSmith (WMF) (talk) 17:53, 23 March 2016 (UTC)[reply]
What Kevin said is a suitable summary. CKoerner (WMF) (talk) 18:12, 23 March 2016 (UTC)[reply]
OK, I am escalating this. User:Katherine (WMF). The above is not a "suitable summary" and that is not for you to determine by fiat, Chris. I will point you to this bit from here, with some bolding added:
"Max: My concern is that we still aren’t communicating it clearly enough. This morning’s blog post is the truth, but not all of the truth. Namely that we had big plans in the past. It would have been much easier to say that we did have big plans, but they were ditched soon after that mock-up was made; in the summer; when Damon left. There is clear evidence of something, but we still haven’t acknowledged it. We can’t deny it.
Lila: I tried to be more specific. But not sure where it would be better to go into more detail.
Tomasz: We need a narrative. Not independent data points. We need to bridge the different facts.
Lila: How do we explain the story now? The original idea was a broader concept. Never a crawler. We abandoned some ideas during the ideation phase, but we haven’t been clear what/when we abandoned.
Max: Right now we are doing damage control. Ideally we should have just said it clearly in the beginning. Reminds me of the board’s response to recent scandals. Even though we eventually communicate it right, we still do it at the point where it doesn’t really matter.
Lila: I agree. We are where we are. We have made mistakes with community communication. We have learned; we know what we would have done differently. What are the next steps we should take? "
So - Where is the narrative? Where is the evidence that anything has been learned? Jytdog (talk) 20:13, 23 March 2016 (UTC)[reply]
Oh boy. I should have added a "of the story from my limited perspective". I don't have anywhere near a complete picture of the genesis. And with the departure of Damon and now Lila, we all, both staff and the community, may never get a complete picture. In time we may be able to build a better understanding but that isn't going to be instant since people still need to get on with the great work they do day to day. I was at this meeting. What exactly are you demanding? A narrative that ties together all the little bits that each of us know? CKoerner (WMF) (talk) 20:27, 23 March 2016 (UTC)[reply]
As you are the communications liaison for the Discovery team, I do not take anything you say as coming from "your personal perspective". It is in that role that I have been pinging you and addressing you. What I have been asking for, nicely, for quite a while now, is exactly what was discussed in that meeting. A narrative that strings together the bits of facts that have been made public - whatever it was Damon was planning, whatever was pitched to the Knight Foundation originally, the Knight agreement that we have all seen which locked into words-on-paper the vision at whatever point that was written, how the scope of work under the Knight grant changed in subsequent discussions with them, the "moonshot" and $32M budget that Doc James heard discussed, the Discovery three year plan, the actual longer term plan that Discovery is working under now if the 3-year plan is no longer relevant, and what Discovery is actually doing now. The narrative of what transpired over the past couple of years around "search" at the WMF. Exactly what is discussed in the snippet above. This is not something that members of the editing community should have to do detective work around and assemble ourselves (which would still have tremendous gaps in any case) and the suggestion is unhelpful. This is something the WMF should provide. It still matters. Jytdog (talk) 20:32, 23 March 2016 (UTC)[reply]
Hey Jytdog, we haven't particularly crossed paths before but for all intents and purposes I am Chris' opposite number in Advancement. I need to ask you for some patience with this. You're right there is no complete narrative. Hopefully in time it will be possible to provide that. But it isn't going to happen right this very moment in time. The reason is because quite frankly no-one knows the complete picture. The steps of where we got to from this time last year to now are pretty complicated involving a number of individuals across a number of departments (mine included) and some key individuals who are no longer with the Wikimedia Foundation as has been pointed out.
What must not happen is a repeat of the situation we were in a month or two or three months ago. We need to make sure that we get this right. I do not want a repeat of little bits of information here and little bits there that contradict and confuse staff, board and community alike. Let us get our shop in order but also understand that we also have to go through that period of learning as an organisation. We are actively going through a phase trying to understand what happened and what went wrong, what we need to do differently and how we ensure we are going to do that. We should do this properly, otherwise we are still going to be having the same circular conversations months from now. And quite frankly I am sure neither I nor yourself want to be in that position. Jseddon (WMF) (talk) 21:26, 23 March 2016 (UTC)[reply]
Thanks User:Jseddon (WMF)! A commitment to provide an answer is great. What is lacking, is a date that you expect to be able to provide the narrative, to manage my and other folks' expectations. Also, I struggle with the claim that "no one knows the answers", as there are folks in higher management who appear to have been close to this who are still around... whether people in liaison are getting access to them, I don't know. In any case, would you please let me know approximately when you expect to post the narrative so I can look out for it and follow up at that time? I want my expectations to be managed.  :) Thanks. Jytdog (talk) 21:35, 23 March 2016 (UTC)[reply]
I can't give you any approximate date right at this moment because what I wrote was a hope and not a commitment. The dependencies on doing this are more than just me. Any date I give would be meaningless. I am sure that's frustrating since realistically it could be interpreted as "at some point in the indefinite future". But I would rather say that than make some commitment I cannot keep. The latter happens all too often in our movement. So my answer is right now I simply don't know when. Not days. Probably not weeks. Probably months. When I have a better answer to give you I will do so. That I can commit to.
Also understand that "no one knows the answers" and "no one knows the complete picture" are very different statements, and it was the latter I said. The genesis, evolution and subsequent implementation for everything that has been discussed doesn't sit with one person. Lots of cogs, lots of information and in lots of places. You can believe me or not but I assure you I am 100% correct on this. And it will take time to paint that complete picture.
The final reason for not giving a date is that pen to paper isn't happening right at this moment as we speak. The last two months have been chaos to put it frankly and I can certainly speak for myself that after announcement of the departure of our previous ED it was simply nice to just get on with my job for a change without it seeming like everything was blowing up around me. To simply, then or now, drop everything to navel-gaze would be irresponsible and quite frankly unhealthy. Understanding and healing is happening, planning ahead is happening, both while getting on with the realities of our day to day jobs. I hope it will be possible to do what you ask as those play out.
I realise that much of what I have written will not quell your concerns or manage your expectations. It wouldn't quell mine if I were in your position. But it is a frank and honest description of where we are at and I think the best I or anyone else could say at this time. Jseddon (WMF) (talk) 22:47, 23 March 2016 (UTC)[reply]
Thanks for your frank response. That's useful. Maybe I am misunderstanding some really basic thing here. I am asking these questions to the WMF and I am getting back responses from individuals speaking as individuals - as though you and Chris are volunteers like me, not as people who have roles in a hierarchy. As people who work for the WMF. I do not understand this. If you and Chris cannot answer this, why are you not referring me to your boss, or to someone who can actually make it the WMF's business to address this? Thanks. Jytdog (talk) 03:35, 24 March 2016 (UTC)[reply]
  • Over a year later, and still nothing. Jytdog (talk) 02:18, 11 June 2017 (UTC)[reply]

No-click satisfaction[edit]

User engagement with search results - If users do not click on results, then we haven't given them the results they wanted.

This afternoon, Google made a horrific update to my search result format (it followed for my wife a few hours later). There's now all kinds of extra visual clutter, and this visual clutter replaces snippets that were formerly long enough on which to base a sound judgement.

New Google snippet for the Wikipedia result on searching for "machine learning":

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit ...

Yahoo! search (never tried before today) returns this snippet:

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence.

I find the sentence fragment from the new Google interface spectacularly hard to mentally assimilate.

Then I tried to search the proper roasting temperature for raw cashews (I seem to always manage to get busy and scorch them) and it was the ugliest results page from Google I've ever seen. It was a giant sea of similar, impossible-to-discriminate links (lacked sufficient snippet text). Over at Yahoo! many of the longer snippets actually contained temperatures like "325" and "350".

On this occasion, I was going to click through regardless. But if I had just been checking to remind myself, I would not have clicked through. I would have instead enjoyed a tiny instance of no-click satisfaction.

You don't always have to click to be well served. MaxEnt (talk) 04:35, 16 January 2020 (UTC)[reply]