Talk:Wikimedia Discovery/FAQ

Be clearer on data sources
The first mention of "data sources" is
 * and incorporating new data sources for our projects

But that just links to a map, which seems to be a different way to display search results. Please give actual potential data sources instead of an unclear link, thanks. -- SPage (WMF) (talk) 19:47, 9 November 2015 (UTC)
 * The section below calls out that it's supported by external OSM data. That data corpus includes items (buses, trains, etc.) that are outside of what our elastic indices include. RU Wikivoyage and soon EN Wikivoyage will default to our tiles and are already starting to surface transit, points of interest, and articles for discovery of new content. As for other data, I cite 'census, national gallery, etc' in https://www.mediawiki.org/wiki/Wikimedia_Discovery/FAQ#If_you.27re_adding_new_data_sources.2C_isn.27t_that_a_search_engine.3F but that's really up for a community discussion of what data sources can help in the same way that OSM did. Tfinc (talk) 19:57, 10 November 2015 (UTC)


 * If Fox News or TeleSUR have the slightest chance of appearing as data sources of this searching project, I will campaign to stop it. --NaBUru38 (talk) 14:27, 15 February 2016 (UTC)
 * I'm right there with you, NaBUru38. There are data sources, like OpenStreetMap, census data, and other bits of useful open data, that we could pull into search results. These would provide a richer search experience, on wiki, than what we currently have. Of course, we want to engage early and frequently with the community to determine what would be acceptable to include. I created a task (T126980) to track this concern. I encourage you to reference it as we move forward.


 * To be honest, we probably won't get around to this for a while. We have more immediate improvements in this quarter and are working on our longer-term plans, which is where something at this level most likely resides! CKoerner (WMF) (talk) 15:38, 15 February 2016 (UTC)

OpenStreetMap is not incorporated into search (e.g. in elastic) and is separate. Of course, search results could be displayed on a map with OSM tiles. Parts of OSM data could also be used as an overlay for Wikivoyage, but I don't think it should be mentioned as part of the grant and "search engine" in this way. Aude (talk) 19:36, 17 February 2016 (UTC)

orphan
This page has been up for a week but, as of the time of writing, literally nothing links to it; it's an orphan. That's kind of ironic given that it's a strategy document for the 'discovery' team :-) Is there a plan for when this 'not-a-knowledge-engine' strategy will be announced widely? Wittylama (talk) 23:26, 9 November 2015 (UTC)
 * Thanks for the feedback. I've linked it from https://www.mediawiki.org/wiki/Wikimedia_Discovery so it's no longer an orphan. Next we'll be adding a set of wiki pages to complement the discussions that have been happening on Phabricator, email lists, and on wiki to bring it all together. Tfinc (talk) 21:13, 10 November 2015 (UTC)
 * Also would love your feedback on the Discovery Roadmap linked in the FAQ. https://www.mediawiki.org/wiki/File:Discovery_Year_0-1-2.pdf Tfinc (talk) 21:17, 10 November 2015 (UTC)

"Are you building Google?"
This FAQ included the question "Are you building a search engine?" But after a complex edit history from a few days before the November 2015 Board of Trustees meeting, that is not at all the question that is addressed; specifically, the answer begins by stating "We are not building Google," and then includes a couple sentences I basically don't understand. This does not even come close to addressing the important question "Are you building a search engine," and leaving the section title intact is IMO highly deceptive to the casual reader. I'm not qualified or positioned to improve the answer (though I think that should be done). But I do think it's important that the question reflect what is actually said, which is why I have (for the second time) changed the question title to "Are you building Google?" Pinging who reverted this the first time. Happy to hear your thoughts, but I hope you can at least agree that this question is not addressed in this section? -Pete F (talk) 03:57, 14 February 2016 (UTC)
 * "There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors. - Jeff Atwood"
 * Discovery is not building a competitor to Google in the sense that Google searches everything it can find. We are trying to improve search across Wikimedia projects to provide better results. That's it. Imagine being able to search for "leaning tower in Italy" and not only see the Wikipedia article in your language, but photos from Commons, a map of its location, and information on its physical properties from Wikidata.
 * I'm new to all of this. Let me ask around and see if we can get some clarification. CKoerner (WMF) (talk) 18:11, 15 February 2016 (UTC)
 * Pete F, as you may be aware, a blog post went up yesterday. Lila's also hosting an AMA-style (ask me anything) discussion on Meta. I believe Tomasz is looking into an interview with the Signpost as well. I don't know if any of that helps, but I wanted to highlight the efforts to bring some clarity to things. I'm updating the FAQ with some of the questions that have been asked on this talk page and others. Again, more than happy to help where I can. CKoerner (WMF) (talk) 19:42, 17 February 2016 (UTC)

More questions
The answer to the first question is very unhelpful. Whatever you are calling it, please actually describe it. That is what all the questions below are about. In what follows, please replace "Knowledge Engine" with whatever it is you are calling it. The name is not what is important. What you are working toward is what is important.

From what I have been able to piece together, the Knowledge Engine is a) a bunch of data, contained in or linked to the Wikidata database; b) an interface to receive a query; c) algorithms that create and display an answer to the query based on the data, formatted sort of like a WP article.

I have no idea how KE search results are envisioned to relate to existing WP content; if the notion here is just to archive existing content, or somehow fragment it and import it all into Wikidata, or if existing content will somehow remain in existence and available to the public. I have no idea if the WMF intends to put any further energy into making existing WP content more available to the public in the form in which it currently exists. Please address all that.

I don't understand what role the existing editing community is intended to play in all this.

If I am way off track, would you please describe in some detail what the Knowledge Engine actually is imagined to be, and what it will do, and how it will relate to the Wikipedia that exists today -- concretely, so it is understandable to the average reader of WP? (without technobabble) Three key questions there.

More concretely

1) Please rewrite the question "Would users go to Wikipedia if it were an open channel beyond an encyclopedia?" in plain English.

The "Wikipedia" we all know is an encyclopedia full of articles created and maintained by people.

I don't know what an "open channel" means.

I don't know what "an open channel beyond an encyclopedia" means. The question implies that the "encyclopedia" won't exist anymore - instead, the "open channel" would exist. Is that what is envisioned? If so, what happens to the Wikipedia content that exists today?

2) What would a Knowledge Engine search result look like? Are this and this prototypes of what you have in mind?

3) How exactly does Wikidata fit into the Knowledge Engine vision? Is the vision here that results like those above will be created on the fly by the algorithms based on Wikidata when someone makes a query to the Knowledge Engine? And that WMF will aggregate a bunch of other source data into Wikidata, or link to it, or whatever?

4) Will there be any content curated by editors as there is today, or will the editing community become curators of Wikidata?

Please incorporate answers into the FAQ.

Thanks Jytdog (talk) 01:22, 17 February 2016 (UTC)