Talk:Wikimedia Discovery

Activity?
According to Wikimedia Foundation Engineering reorganization FAQ, "Search and Discovery" is a team, not an activity. If so, AFAICS it shouldn't use that template and it should be named Wikimedia Search and Discovery or similar (available information on "and" vs. "&" and casing is inconsistent). --Nemo 17:11, 30 April 2015 (UTC)
 * +1. The other point is that "search" redirects to "Search and Discovery" (in my view an argument for deleting: when searching for "Search", I don't normally expect to find a team's page), and for "mediawiki search" in Google this page has a relatively good rank, but it doesn't describe a function in the MediaWiki software or (as you said) an activity. So I suggest moving this page to a page with a clear title. --Florianschmidtwelzow (talk) 08:00, 13 May 2015 (UTC)
 * The page has been moved to Discovery, so no longer has the word "search" in it, nor "&". --KSmith (WMF) (talk) 22:59, 10 June 2015 (UTC)
 * Nemo just moved this page, without consultation. I don't necessarily object, but some discussion would have been polite. Unlike all the other Wikimedia xxx team pages mentioned at team prefix:Wikimedia, Discovery is an entire Department, like Reading and Editing. --KSmith (WMF) (talk) 05:35, 27 July 2015 (UTC)
 * There is a discussion about WMF team pages, their names and locations, at meta:Meta:Babel which I hope you will contribute to. Rogol Domedonfors (talk) 20:59, 27 July 2015 (UTC)

CirrusSearch component
There are several interesting reports in the CirrusSearch component. In Phabricator I see quite some activity around recent things, while reports formerly considered normal/high priority (i.e. useful to improve search results) are mostly inactive. Does someone plan to go through all the reports and triage them? It would probably take less than a day for one of the ElasticSearch persons. --Nemo 11:28, 28 August 2015 (UTC)
 * user:Nemo bis, have you noticed any improvement in the management of older reports? John Vandenberg (talk) 04:39, 6 January 2016 (UTC)

Community Liaison job opening at WMF
Hi. There's a new job posting for a Community Liaison to work with the Discovery department. Please pass it along, if you know someone who might be interested or a good fit. Thanks. Quiddity (WMF) (talk) 20:06, 21 October 2015 (UTC)

Offsite homepage report
This page links to http://ironholds.org/misc/homepage_presentation.html, which I initially thought was a broken page because the '>' link is not easy to find (ironic for the Discovery team?) due to color scheme used. I assume it was work for WMF, and therefore should be posted onto a Wikimedia server, preferably a public wiki, and not a private website without a free content license. Could that report be posted on MediaWiki, User:Okeyes (WMF), please? John Vandenberg (talk) 02:58, 6 January 2016 (UTC)
 * To my knowledge there's nowhere to host this kind of report. If you can point me to a public Wikimedia server that lets me host arbitrary HTML content, I would be interested to hear where. I have tried to export it as PDF but the format doesn't make it easy; I'll probably put some work into putting it together as a more structured report a la our other ones.
 * Your claim that it lacks a free content license is not the case; it is MIT-licensed and openly released, as is all of my work. That it lacks a copyright template is because I do not believe in releasing my work with any restrictions, which means CC-0 or MIT. I would be interested to know how you encountered the report (I've had multiple people poke me about it so I assume it's being discussed somewhere?) Ironholds (talk) 23:43, 6 January 2016 (UTC)
 * I encountered it because the link is on this page. I'm not aware of any other discussions occurring about it.  Your website doesn't mention that it is MIT, and it doesn't link to https://github.com/wikimedia-research/wp_home, so there is no way a reader can ascertain its copyright status.  A PDF of the content on Commons would be great, even if some of the functionality is lost.  Is it possible to post the HTML using GitHub Pages? That way the rendered version is more clearly linked to the repo where the license is declared. John Vandenberg (talk) 23:56, 6 January 2016 (UTC)
 * I'll see if I can generate a HTML version for that, sure, but I'd much rather the PDF, which I'll work on today. Thanks, Ironholds (talk) 16:05, 7 January 2016 (UTC)
 * The report can now be found here. Ironholds (talk) 02:28, 9 January 2016 (UTC)

Wikipedia portal only
Is wikipedia.org the only portal that is under the purview of Discovery? This is strictly a clarification question about scope, as everything I see related to this appears to be wikipedia.org only, but the other portals are not explicitly excluded from the scope.

I assume wikipedia.org is the only portal with traffic significant enough to warrant it being a legitimate target for optimising knowledge pathways at present, but will improvements made to it also trickle down to the other portals? John Vandenberg (talk) 03:03, 6 January 2016 (UTC)
 * Caveat that the product manager can and will provide a better answer, which might contradict this one: all the portals are within our demesne. If changes also improve the other portals, absolutely. Most of the changes we're looking at are around UX design and should transfer nicely. Ironholds (talk) 16:06, 7 January 2016 (UTC)

Congratulations on the Knight Foundation grant!
Great to hear about your success! I was wondering if you would be willing to share the full application that you sent in on-wiki? Having examples of successful applications can help other Wikimedia organizations in their work with external project grants. We have recently started a list to gather positive examples here. Kind regards, John Andersson (WMSE) (talk) 15:58, 8 January 2016 (UTC)
 * I think there's a better approach than that. Keep in mind that at large funding levels between major partners, the process is not "send in an application and hope for the best" but a series of meetings to explain and gain mutual understanding about a set of objectives.  Training in that for chapters by the people who were involved is a great idea!--Jimbo Wales (talk) 11:06, 1 February 2016 (UTC)

Organizing Main Discovery Page
Should each project (Search, Portal, Maps, etc.) have a landing page that talks more in-depth about the work (ala Wikipedia.org Portal Improvements)? Right now search goes to an extension page, but that's not really what search is all about. It's the technical implementation of a much larger corpus of work. I think things could be a little more balanced but wanted some feedback. CKoerner (WMF) (talk) 01:27, 29 January 2016 (UTC)

Platypus
There is a demo from some French students in theoretical computer science. They wrote an open source project whose aim is to create an open source question answering framework and a demo of it. Just for the info. --Molarus (talk) 09:04, 31 January 2016 (UTC)
 * See:


 * http://askplatyp.us/ Ask Platypus questions in English about general knowledge and math. (example Where is Chinandega?)
 * https://tools.wmflabs.org/magnus-toolserver/thetalkpage/ Talk to Wiri (by Magnus Manske)
 * https://tools.wmflabs.org/bene/ask/ (example Where is Chinandega?)
 * https://tools.wmflabs.org/ppp-sparql/ (example Where is Chinandega?) --Atlasowa (talk) 14:05, 10 February 2016 (UTC)

Discovery vs. original research
What is the difference between "discovery" and "original research"? Between "discovery" and "search"? The terms "search" and "original research" are well understood in the context of Wikipedia, but what, exactly does "discovery" mean? Is it a concept of science fiction? Wbm1058 (talk) 04:15, 1 February 2016 (UTC)
 * I am speaking only for myself and I am not a staff member. "Discovery" is a broader term than just "search".  On Wikipedia, people discover things in a number of ways: the most basic links to other articles, series of articles, categories, sequences and timelines, the front page, and, yes, the search box.   Another aspect of "discovery" is how people find Wikipedia - search engines, links from other websites, re-use of our content by people who link back to us, sharing on social media, etc.
 * We can also think of "Discovery" in the context of readers and in the context of editors. Currently, as an editor, if I visit an article without an image and I think "Gee, I wish this had an image" then I probably go to commons and use the search box there.  Can we make that process easier and more efficient?  Currently, as an editor, if I see a link to an outside source I may wonder what other Wikipedia entries link to that source.  Can we make that process easier and more efficient?  Etc.
 * "Discovery" has nothing to do with "original research" - which is an entirely different concept and entirely different concern.--Jimbo Wales (talk) 11:05, 1 February 2016 (UTC)

Is the knowledge engine a tool for data mining? Does it use machine learning or genetic programming for the purposes of knowledge discovery? Wbm1058 (talk) 04:29, 1 February 2016 (UTC)
 * The WMF has discontinued the use of the term 'knowledge engine' - presumably because it was causing people to ask just this kind of question. In my view, we can think of the entire workings of Wikipedia and the Wikimedia projects, including the editors, the software, discovery elements, APIs, etc. as a global "knowledge engine".  But that's just a way of thinking, not a specific plan.--Jimbo Wales (talk) 11:05, 1 February 2016 (UTC)

About "I probably go to commons and use the search box there.": Actually I did just that yesterday. I have to say first that I'm an experienced editor. I was looking for an icon, but I didn't know what icons are there. I started with the searchword "icon" and then moved to the categories commons:Category:Icons. With categories I can search without knowing the right name of the file. I don't know if the searchbox will ever do the same, but I understand that new editors don't know categories. By the way, I think there is AI software for tagging pictures. Categorizing new pictures by software could be a help, I think; at least I remember that I had read somewhere that this is a big part of the work commons editors have to do. PS: I do searching this way in Wikipedia too. PPS: Maybe another aspect of search is that I sometimes use Wikipedia to search for searchwords. Since I'm no native English speaker, I don't know the right English word. Therefore searching in WP and switching from one language into the other is sometimes the first step before going to a search engine. I learned this while I was researching things on the internet for writing articles. Now I'm doing this quite often. --Molarus (talk) 12:48, 1 February 2016 (UTC)

My view is that both "knowledge engine" and "discovery" are ambiguous terms, so I'm not sure switching from one to the other is helpful. Perhaps "super search" or "enhanced search" would be better. Maybe just view "discovery" as a "code word" while in the R&D phase, and wait until actual product(s) emerge from this to give them more permanent names.

I see that one example of enhanced search would be "what links here" both to and from external websites, and other Wikimedia projects. Wbm1058 (talk) 16:28, 3 February 2016 (UTC)
 * Hello Wbm1058, I'm the Community Liaison for the Discovery team. Happy to help answer any questions you or other folks have about the work the team is doing. I thought it might be helpful to clarify a few things you've mentioned.


 * You're right that the team name of Discovery is a bit ambiguous. That's intentional. The work the team is doing is not exactly "just search"; they're working on the discovery, or finding, of information across the various Wikimedia projects. Like how our Editing team takes care of editing, the Reading team, well, readers. Discovery is the team looking at those folks looking for information. Could be search, could be embedding editable maps, could be how people enter into our projects - heck it could even be how people use our API to pull data out of our projects for analysis and use elsewhere.


 * So Discovery is the group of projects that we're all working on. I'll talk about a few briefly, but you can learn more on the main team page (something I'm working on improving). One project is Search - making it easier to search on say English Wikipedia without having to go back to a search engine to find what you're looking for. One small example is improvements to the suggestion tool (demo here) that is more lenient and allows for things like misspellings (happens to the best of us). Future ideas might be showing images from Commons or quotes from Wikiquotes when you search for something like "Albert Einstein".


 * Another is the portal. Did you know that millions of folks visit wikipedia.org every day? Not a specific language wiki, but that landing page. That's a big opportunity to introduce the projects to people who might be totally new to the movement, in new and useful ways. We've already done some testing (using A/B tests) to show that a few small improvements can result in more visitors finding content within our projects. We have a draft article we hope to share with folks soon you can read for more info (and I'd love any feedback on the article itself!).


 * We are looking at ways of learning from users of our technology to understand how it's being used and how to improve it. I appreciate your feedback and thoughts and hope you continue to join us in the journey. CKoerner (WMF) (talk) 19:46, 4 February 2016 (UTC)

Completion suggester
Hello, I've been reading a very interesting mailing list thread on the new completion suggester.

Rather than talk about the weight of pageviews, I was wondering another thing: why does the search return suggestions on the first character? Shouldn't it return a list on the third character?

I mean, if I type "p", I will almost surely type more characters. Why request suggestions so quickly? --NaBUru38 (talk) 02:57, 10 February 2016 (UTC)
 * Hello NaBUru38, which mailing list thread? I'm still learning and would love to be aware of the conversation. Other folks who are more knowledgeable might chime in here, but here's my two cents. I think part of the reason we start completing with a single character, like "p", is that we want people to get feedback on their search immediately instead of delayed by a certain character limit. Another reason is that we actually have articles to point people to that are only a single character long, like the article on English Wikipedia for P or P the American alternative rock band! CKoerner (WMF) (talk) 16:23, 10 February 2016 (UTC)


 * Hello, I meant this. --NaBUru38 (talk) 23:31, 10 February 2016 (UTC)
 * I just saw your response NaBUru38. That made me chuckle. We have to be careful, apparently computers have dirty minds. :) CKoerner (WMF) (talk) 21:37, 15 February 2016 (UTC)
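
For readers following the question above about a minimum prefix length, here is a minimal sketch of the trade-off. It is not how the completion suggester is actually implemented: the `suggest` function, its `min_chars` parameter and the `TITLES` list are hypothetical names made up for illustration, and there is no ranking by pageviews or tolerance for misspellings.

```python
from bisect import bisect_left


def suggest(titles, prefix, min_chars=1, limit=5):
    """Return up to `limit` titles that start with `prefix` (case-insensitive).

    `titles` must already be sorted by lowercase form.
    Nothing is returned while the prefix is shorter than `min_chars`.
    """
    if len(prefix) < min_chars:
        return []
    key = prefix.lower()
    lowered = [t.lower() for t in titles]
    # Jump to the first title that could match, then scan forward.
    start = bisect_left(lowered, key)
    results = []
    for title in titles[start:]:
        if not title.lower().startswith(key):
            break
        results.append(title)
        if len(results) == limit:
            break
    return results


# Hypothetical sample titles, including single-character ones.
TITLES = sorted(["P", "P (band)", "Paris", "Pearl", "Physics", "Q"], key=str.lower)

one_char = suggest(TITLES, "p")                  # suggestions from the first keystroke
three_min = suggest(TITLES, "p", min_chars=3)    # nothing until three characters
print(one_char)
print(three_min)
```

With a three-character minimum, a reader looking for a single-character title such as "P" would never be offered it as a suggestion, which is one of the reasons given in the reply above.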

Searching two terms in the same section only
Right now there are people complaining at the Wikipedia Reference Desk that the Reference Desk, they think, is useless and not part of the encyclopedia. I believe they are wrong, in part, because they fail to understand that the daily Q-and-A of the Refdesk is just phase 1 of a multiphase operation. At some point, we need to process the voluminous archives we've accumulated to produce lists of answered questions, in which we've separated each question, rephrased it to be more readable and match the answer we were able to give, and provided specific, sourced answers, that can be more effectively searched.

A very, very basic step in this would be to make it easier to search the archived questions we have now. Presently, if you want to search out something about Jupiter's atmosphere, you get back results for any day where people talked about Jupiter in one question and the Earth's or someone else's atmosphere in another. At the very minimum, I'd like a way to do a search for Jupiter and atmosphere only when they appear in the same section (and by section, in this instance, I mean h2 but ignore h3 and below...)

To be honest, there could be an option in the search for this right now and I don't know - the search documentation is ... someplace... I saw it once... it doesn't really jump out at me when I do a search, now does it? I mean, in a link like this - what newbies are given when they hit the button "search reference desk archives" - they grudgingly spill the beans that there's a "prefix:" magic word, but they certainly don't point you toward the full list.

In general, I think that a Knowledge Engine might have a good use for Refdesk archives. If there is a way that you can point Refdesk users toward what would be the most useful curation to do to make a digest of these records for you to use, it might benefit both initiatives. Wnt (talk) 20:59, 14 February 2016 (UTC)
 * Yes, the prefix search we currently use is rather limiting for pages not in the main namespace. In the example conversation you gave on the Reference desk, is the problem that the individual did not search first? I'm not very familiar with that corner of the English Wikipedia and would appreciate clarification. CKoerner (WMF) (talk) 19:29, 15 February 2016 (UTC)
 * The link I give is just if you hit "search archives" from a main Refdesk page like this one. People should search before asking questions, yes... but that's not really the whole story.  The problem is, if you go to the talk page from the Science desk I just linked, you'll see a lot of ...... the kind of people who have too much influence on these projects nowadays, complaining the Refdesk is worthless or at least not an encyclopedia because it only helps one person.  But, I don't think it should help only one person.  I think we've accumulated this huge database of answered questions, which we could improve further with a lot of work by editors.  But a big obstacle to motivating that is that the search is so poor.  It's just too hard to pick out a relevant preceding question, which means that the same questions get asked and re-asked, while others who don't want to wait around for an answer are just bouncing and we never know about it. Wnt (talk) 01:39, 18 February 2016 (UTC)
 * Also, we are working on improving the way searching (and the results within) work. There was a short blog post about it a few months back. That was before I joined the team, so I'm not super familiar with it, but I'm happy to reach out to other team members for more information if you'd like. CKoerner (WMF) (talk) 19:31, 15 February 2016 (UTC)
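
To make the "same section only" request above concrete, here is a rough sketch of the behaviour being asked for. It is only an illustration under stated assumptions, not an existing search option: `split_h2_sections`, `sections_with_all` and the `ARCHIVE` sample are hypothetical, sections are cut at level-2 headings (`== ... ==`) with deeper headings staying inside their parent, and matching is plain case-insensitive substring search rather than anything CirrusSearch does.

```python
import re

# Matches level-2 headings only: "== Title ==" but not "=== Title ===".
H2 = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def split_h2_sections(wikitext):
    """Split wikitext into (title, body) pairs at level-2 headings."""
    matches = list(H2.finditer(wikitext))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections.append((m.group(1).strip(), wikitext[m.end():end]))
    return sections


def sections_with_all(wikitext, terms):
    """Return titles of level-2 sections that contain every term."""
    hits = []
    for title, body in split_h2_sections(wikitext):
        haystack = (title + "\n" + body).lower()
        if all(term.lower() in haystack for term in terms):
            hits.append(title)
    return hits


# Hypothetical archive page for one day of questions.
ARCHIVE = """== Jupiter's moons ==
How many moons does Jupiter have?
=== Follow-up ===
Thanks!
== Earth's atmosphere ==
Why is the sky blue? The atmosphere scatters light.
== Jupiter storms ==
What drives storms in Jupiter's atmosphere?
"""

matching = sections_with_all(ARCHIVE, ["Jupiter", "atmosphere"])
print(matching)
```

A page-level search would match this whole archive page because both words occur somewhere on it; the section-level check only reports the one question where they occur together.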

Knowledge Engine by Wikipedia
It seems that this is the working title of the project. It would be helpful to know the precise relationship between this and the various actions planned and discussed here. One important question which needs an answer somewhere is what Curation means in this context. The proposal uses the phrases "openly curated", "public curation mechanisms", "curation of that data". May we know who you envisage undertaking this curation? Are you by any chance assuming that the Knowledge Engine by Wikipedia will be curated by the current Wikipedia volunteer community? Rogol Domedonfors (talk) 22:24, 13 February 2016 (UTC)
 * Rogol Domedonfors, I'm sorry for any confusion over the phrase "knowledge engine". The FAQ clarifies a bit. There isn't anything being built by that name. It's an old term used mainly (if not only) for the grant. I'll do my best to help with the intent of the word "curation". It's pretty simple, even if our past explanations have been lacking. It means exactly what it means today to the movement. If we want to improve the quality of our search, say by improving the ranking of articles in search results, we'll do so with the communities. "Openly curated", "public curation mechanisms" and the like refer to the already impressive work we've done together. This is a continuation of that tenet into new areas - like improving search across Wikimedia projects. So yes, your assumption is correct. Would you like to help? CKoerner (WMF) (talk) 19:19, 15 February 2016 (UTC)
 * Thanks for the prompt response. As I understand it then, the WMF put forward a grant proposal for a "Knowledge Engine", and received funding for it, but actually nothing like that is being built.  The proposal was made without any kind of consultation with the community, and the WMF assumed, and assured the grant-giving body, on the basis of no consultation whatsoever, that the community would be ready, willing and able to take on this extra curation work as an addition to the work they already do voluntarily.  The possibility that the community might decline was not identified as any kind of risk: it seems to have been taken for granted, and money was actually asked for and accepted on that basis.  This attitude to the community is unsatisfactory to say the least.  You say that you will do this work with the communities.  How can you possibly be so sure?  Can you please now explain in more detail what new work you are expecting that the communities will be willing to do and give them some reasons to think that they might be willing to take it on?  It would be much appreciated if you were to take an attitude that better suggested that you understood that this was a request, and a request that might well be turned down in the current poor state of relations between the WMF and its volunteers, volunteers who are most unlikely to appreciate being taken for granted in this way.   Rogol Domedonfors (talk) 20:26, 15 February 2016 (UTC)
 * I'm afraid Rogol Domedonfors that things aren't so easily black and white in this case. Many people have been involved in the creation of the grant and the related work around search, maps, portals and more. The language has changed, but the intent has not. We want to improve search within and across Wikimedia projects. Our beloved MediaWiki.org included. I do not know if you are aware of the FAQ regarding the grant. It has a section about the element of human curation. What specifically within the grant, Discovery's list of work, and the FAQ do you believe the community would not welcome? These ideas would be helpful in our request for comment and I'd be happy to illustrate concerns for the developers involved as things move forward. CKoerner (WMF) (talk) 21:53, 15 February 2016 (UTC)
 * The links you give make it clear that nobody is currently able to state, specifically, what that extra work will be (although it is clear that you do expect extra work to be done in "new areas") so you can hardly ask me to say, specifically, what the community would or would not welcome, when you yourself do not know, specifically, what you are asking the community to undertake. You do admit that extra work will be required, and yet you are content to assume that the community will be happy to provide extra effort, and indeed to accept grant money on the basis that extra effort will be forthcoming simply for the asking.  Do you really not understand how arrogant that appears?  Did you ever test those assumptions -- at what stage did you engage the community with your plans and gain some kind of consensus that the project would be worth the extra work and that there was a broad willingness to deliver that extra effort?  Where was that engagement and when did it take place?  Was it before or after committing the WMF to the expenditure of a large amount of donor money?  Rogol Domedonfors (talk) 22:28, 15 February 2016 (UTC)
 * You're right, I shouldn't attempt to speak for all volunteers of the Wikimedia movement. No one should. You're also right that at this point in time we do not have a list of every imaginable way we might require curation in regards to search and other Discovery projects. Heck, I might even be wrong in that we need curation from contributors. I also don't intend to assume volunteers will be upset if asked for further curation and involvement. If I understand your concerns, they are in the same vein: that the WMF should work with the communities. We are trying. It's something I take very seriously, as a volunteer and now a WMF staff member. A small example is the work we're starting on trying to update the weight of pages in search results. One of our first tasks, and one that blocks (or prevents) us from rolling out the technological solution, is a task asking for community input.
 * While I can't speak to the decisions around the grant (it's in the past, before I joined) I can say that my intent with responding to you is to help moving forward. I know it might be perceived as trite, but I hope you believe me.
 * The entire movement is founded on the idea that everyone pitching in a little can make quite a bit of change. I'll ask you again, what are your specific concerns, how can we address them, and how are we falling short? To further that, if you feel affronted by some past work of the team, has Discovery produced anything without involving the community that you'd like to discuss? CKoerner (WMF) (talk) 23:08, 15 February 2016 (UTC)
 * My concern, as I have expressed twice already, is that the WMF has embarked on a programme of action which will require extra work from the volunteer community with no engagement with that community and no evidence for the belief that the work required will be forthcoming. That should not have been done, and the WMF needs to acknowledge that mistake and work to repair the damage to the relationship caused by that failure.  The way to address that failure is to engage the community as soon as possible, that is now, and not just in small-scale tactical questions, but in deciding whether the programme is broadly speaking workable or not.  I think you need to ask the community whether it is willing to support your programme with its efforts before proceeding.  Rogol Domedonfors (talk) 07:33, 16 February 2016 (UTC)
 * There was a presentation of a concept (which was leaked, and which never even became an actual plan), and then there is the grant, which is binding. The "Knowledge Engine" concept that was presented in June had already evolved substantially by the time the grant was awarded (September?), and has continued to evolve since. Many of us believe it was a mistake to even give that presentation without vetting the ideas with community members, so I understand your frustrations about that.


 * As far as I can tell, the actual grant only refers to curation in one context: "Test results from exploring relevancy through a federation of open data sources, including structured data via Wikidata, and curation of that data with human and machine learning." Humans are already curating Wikidata, so I think that's covered. I am not aware of any actual plans for Discovery work that would rely on any human curation beyond what is already being done.


 * You are correct that we as an organization need to do much better at addressing all of this. Just recently, those of us who work in (or with) Discovery were specifically encouraged to be as open as possible, which should help. It helped encourage me to write this response. Lila is working in other channels to try to clear up confusion and respond directly to any questions that haven't already been answered. Thank you for your patience here. --KSmith (WMF) (talk) 01:30, 18 February 2016 (UTC)


 * Thank you for those assurances. Rogol Domedonfors (talk) 21:55, 18 February 2016 (UTC)


 * I too am concerned about the concept of "curation" in relationship to "enhanced search". Surely y'all are aware of the concept of search engine optimization (SEO), probably more so than I am, and the risk that an army of low-paid editors in "global south" locations motivated to elevate their clients' pages in wiki search results would likely overwhelm western developed country volunteers, unless adequate controls were implemented to prevent that. Also, that unless this "curation" work was really fun and interesting, it would likely take only twelve months to build up an eleven-month "curation backlog". Recall how past requests, such as to curate Article Feedback Tool comments, went. Wbm1058 (talk) 16:43, 26 February 2016 (UTC)


 * I've posted a series of questions on the discussion page of the FAQ, here. I hope they will be answered and incorporated into the FAQ.  I understand that there are questions there that you may see as going beyond the scope of the Discovery team's focus; if there are, please do get help from the people who can answer them.   Thanks. Jytdog (talk) 16:45, 17 February 2016 (UTC)

Images in search results
Ping Deskana_(WMF) or anyone else.

Having software blindly grab images is a problem. I suggest you pull them off of the search results. Wikipedia is extremely not-censored. We have everything from explicit porn to images of Muhammad to stomach-curdling medical images.

For example typing pearl n in the search box instantly shows list with exactly one image, that image is the first search result, and it's an image of a cumshot across a woman's neck. I couldn't even begin to guess how many short simple character-strings will bring up penises and vaginas and assholes and various sex acts. Our lead image on the articles Muhammad and Depictions_of_Muhammad are "innocuous", but other articles very well could lead with offending images. Having them pop up on unrelated searches could get ugly. Alsee (talk) 23:20, 14 March 2016 (UTC)
 * Thanks for your suggestion (and for pinging me so that I saw it). I agree that some users could find the page images that are chosen objectionable. Unfortunately, what you're suggesting is not going to happen, for several reasons. Firstly, last year I worked with the Design Research Team to perform usability tests on the mobile app when we added these images to search results, which showed that users typically found it much easier to use search when they had additional context from images. Secondly, in an A/B test on the Wikipedia portal, adding these images significantly increased the rate at which users clicked through to search results, which validates the outcome of the usability testing. Thirdly, the whole point of the relevant policy is to prevent exactly the kind of thing you're requesting, namely censoring content to try to make it acceptable to the masses. I quote the policy: "Wikipedia may contain content that some readers consider objectionable or offensive—even exceedingly so. Attempting to ensure that articles and images will be acceptable to all readers, or will adhere to general social or religious norms, is incompatible with the purposes of an encyclopedia."


 * So, in summary, I think your intent is laudable and I appreciate it, but sorry, your suggestion goes against both the qualitative and quantitative data that shows these images are useful to users and the relevant Wikipedia policies. --Dan Garry, Wikimedia Foundation (talk) 00:00, 15 March 2016 (UTC)