User:CKoerner (WMF)/Discernatron

Intro
To iterate faster on changes to search, we need to be able to test those changes before we push them out to users. Discernatron is a platform for collecting human judgements of search result relevance. When evaluating a potential change to search, we will rate it by how much closer it gets to putting the most relevant articles, as judged by humans in Discernatron, at the top of the page.

What queries am I rating?
Every month the Discovery department loads approximately 500 randomly selected search queries from the English Wikipedia into Discernatron for grading. These queries represent around 0.0001% of the total full-text queries issued. This sample is very small, but it still represents a wide swath of the types of queries received. Before the queries are released to Discernatron, two WMF employees review the sampled set and remove anything that could be considered personally identifiable information (PII). Initially only queries from English Wikipedia are being used, but we will expand to other languages, such as French, Spanish, and Russian, over time.

So someone is looking at all my searches?
No. The queries under review carry no additional metadata, such as user name, location, or IP address. Additionally, because the sample is so small, it is very unlikely that it contains more than one query from any single user. See also our Data access guidelines.

What kinds of queries are removed?
Anything potentially personally identifiable. This means any kind of phone number, serial number, or non-notable address. We remove searches for specific URLs and non-notable companies. We also remove names of non-notable people (those who don't have wiki articles and aren't mentioned prominently in any other article). For the benefit of graders, most non-English searches are also removed, as it would be hard to judge the quality of their results. Finally, "junk" queries such as "Ikofjfgbgtbrtlbluirytytrohooygyugc", which make up one to two percent of total query volume, are removed.

Instructions

How do I score queries?
You will be presented with a page containing the query at the very top and a list of results that could be relevant to the query. Each tap or click on a result cycles its relevance rating from None to Irrelevant, Maybe Relevant, Probably Relevant, and Relevant. Tapping once more after Relevant (green) will bring the result back to unrated. You must rate at least 80% of the results for a query for your ratings to be saved. If you aren't sure, select 'Skip this query' and you will be taken to a new query to rate. Skipped queries will not be shown to you again.
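The rating cycle and the 80% rule described above can be sketched in a few lines of Python. This is only an illustration of the behaviour as this page describes it, not Discernatron's actual code; the names `RATING_CYCLE`, `next_rating`, and `can_save` are hypothetical.

```python
from typing import List, Optional

# Rating states in the order described above; None stands for "unrated".
RATING_CYCLE: List[Optional[str]] = [
    None,
    "Irrelevant",
    "Maybe Relevant",
    "Probably Relevant",
    "Relevant",
]

MIN_RATED_PERCENT = 80  # "at least 80% of the results"


def next_rating(current: Optional[str]) -> Optional[str]:
    """Return the rating that follows `current` after one tap or click."""
    index = RATING_CYCLE.index(current)
    # Wrap around so that tapping after "Relevant" returns to unrated.
    return RATING_CYCLE[(index + 1) % len(RATING_CYCLE)]


def can_save(ratings: List[Optional[str]]) -> bool:
    """True when enough results are rated for the query to be saved."""
    if not ratings:
        return False
    rated = sum(1 for r in ratings if r is not None)
    # Integer comparison avoids floating-point rounding at exactly 80%.
    return rated * 100 >= MIN_RATED_PERCENT * len(ratings)
```

With ten results on the page, for example, `can_save` would accept eight rated results and reject seven.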

Snippets
Along with each potential search result there is a snippet available. Clicking on the down arrow will expand the snippet for a given result.

What differentiates Relevant from Probably Relevant?
A result is relevant if you would expect to find it in the top 5 results for a query. If something is related and possibly the answer to the query, but not certainly, use probably relevant. When grading, please keep in mind that space at the top of the results page is limited; it is impossible to show 10 results that are all the best answer to a query. Try to pick the best results as relevant, and set the others to probably relevant. Probably relevant results are ones you would expect to find in the bottom two thirds of the first results page.

Maybe Relevant?
The maybe relevant rating is reserved for items that aren't completely irrelevant, but also aren't great answers to the query. Maybe relevant results could show up on the results page, but wouldn't be particularly desirable. The main difference between maybe relevant and irrelevant is that irrelevant results have no relationship to the query.
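To make concrete how the four grades could feed into rating a search change by how close it puts the most relevant articles to the top, here is a minimal sketch using normalized discounted cumulative gain (nDCG), a common ranking metric. Whether Discernatron's evaluation uses exactly this metric is an assumption, and the numeric gains 0 to 3 assigned to the grades below are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical numeric gains for the four grades; the real mapping is not given here.
GAIN: Dict[str, int] = {
    "Irrelevant": 0,
    "Maybe Relevant": 1,
    "Probably Relevant": 2,
    "Relevant": 3,
}


def dcg(grades: List[str]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(GAIN[g] / math.log2(pos + 1) for pos, g in enumerate(grades, start=1))


def ndcg(grades: List[str]) -> float:
    """DCG divided by the DCG of the best possible ordering of the same grades."""
    ideal = sorted(grades, key=GAIN.__getitem__, reverse=True)
    best = dcg(ideal)
    return dcg(grades) / best if best > 0 else 0.0
```

Here `grades` is the list of human grades for one query's results, in the order a given search configuration returned them. A configuration that places the highest-graded results first scores 1.0; one that buries them scores lower, so two configurations can be compared on the same graded queries.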

What about disambiguation pages, lists, talk pages, categories, etc.?
We are not sure yet whether these are good results or not. Use your best judgement as to the quality of a result with respect to the given query, and we will compare ratings across judges to try to decide what people expect here.
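One simple way to compare ratings across judges is to measure, for a single result, how often two judges picked the same grade. The sketch below is only an assumption about what such a comparison might look like; the function name `pairwise_agreement` is hypothetical and this page does not say which agreement measure is actually used.

```python
from itertools import combinations
from typing import List


def pairwise_agreement(grades: List[str]) -> float:
    """Share of judge pairs that gave the same grade to one result."""
    pairs = list(combinations(grades, 2))
    if not pairs:
        raise ValueError("need grades from at least two judges")
    return sum(a == b for a, b in pairs) / len(pairs)
```

A value near 1.0 for, say, disambiguation pages would suggest judges share an expectation about them, while a low value would suggest they do not.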