Discovery/Status updates/2018-10-01

This is the update for the weeks from 2018-09-17 through 2018-10-05.

Search

 * Implemented indexing of statement values as part of the main data in Wikidata, so statement values are now searchable without special syntax
 * Reindexed Wikidata, which also enabled qualifier indexing
 * Mathew worked on resolving an Elasticsearch shard size alert by doing an in-place reindex
 * There was a lot of work done to investigate a brief outage of CirrusSearch (a spike in MediaWiki exceptions from api.php); the issue is sufficiently resolved for now.
 * Gehel and others worked on refactoring Puppet to support multiple Elasticsearch instances on the same node
 * Erik worked on an issue where the text content of a wiki page in the search index can merge words, making them unfindable
 * Stas updated Wikidata's search engine to enable searching by author name string
 * David and Erik worked together on evaluating adding an image quality score to media search result ranking
 * Stas added X-Search-Id to WikidataCompletionSearchClicks events
 * David added a way to configure timeouts of autocomplete queries
 * Erik upgraded the Saneitizer to continuously re-index documents
 * David investigated why interwiki cache hits and misses were no longer reported (since 2017) and decided to drop support for caching interwiki queries
 * Mathew and Gehel worked on raising the disk space alert level for old Elasticsearch servers
 * Erik worked to correct issues where the Cirrus MLT ("more like this") cache had a 0% hit rate on switchover
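The word-merging item above does not state a cause. One common way words run together in extracted page text, assumed here purely for illustration, is stripping markup without leaving whitespace where elements meet. A minimal Python sketch (the function names are hypothetical, not CirrusSearch code):

```python
import re

# Matches any HTML/XML-style tag; a simplification that ignores comments, CDATA, etc.
TAG = re.compile(r"<[^>]+>")

def naive_strip(html: str) -> str:
    # Deletes tags outright, so text in adjacent elements runs together.
    return TAG.sub("", html)

def spaced_strip(html: str) -> str:
    # Replaces each tag with a space, then collapses whitespace runs to single spaces.
    return " ".join(TAG.sub(" ", html).split())

html = "<ul><li>alpha</li><li>beta</li></ul>"
merged = naive_strip(html)       # "alphabeta": neither word is findable on its own
separated = spaced_strip(html)   # "alpha beta"
```

Treating every tag as a word boundary is too blunt for inline markup (`un<b>usual</b>` would become `un usual`), so a real fix would more likely insert whitespace only around block-level elements.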

WDQS

 * Added a new N-Triples RDF dump, which makes per-line processing easier
 * The internal cluster switched to Kafka events as its change source; the public cluster is next
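Because N-Triples puts exactly one triple on each line, a dump can be streamed and processed line by line without a full RDF parser. As a minimal Python sketch (the function name and the sample lines are illustrative, not taken from the actual dump), this counts how often each predicate occurs:

```python
from collections import Counter
from typing import Iterable

def count_predicates(lines: Iterable[str]) -> Counter:
    """Count predicate occurrences in N-Triples input, one triple per line.

    Subjects (IRIs or blank nodes) and predicates (IRIs) cannot contain spaces,
    so splitting on the first two whitespace runs isolates them without having
    to parse the object, which may be a literal containing spaces.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        _subject, predicate, _rest = line.split(maxsplit=2)
        counts[predicate] += 1
    return counts

sample = [
    "# illustrative lines, not copied from a real dump",
    '<http://www.wikidata.org/entity/Q42> <http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
    '<http://www.wikidata.org/entity/Q42> <http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@de .',
    "<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .",
]
counts = count_predicates(sample)
```

For a real dump, the same function could be given a file object such as `gzip.open(path, "rt", encoding="utf-8")`, where `path` is a placeholder and gzip compression is an assumption; memory use stays flat because only one line is held at a time.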

Did you know?

 * Languages differ in how many sounds they use; the set of sounds used in a particular language is called its “phonemic inventory”. Inventories range from 11 to over 140 sounds! Having more sounds than letters, or sounds that differ from the ones usually associated with a letter, can lead to unusual orthographies and transliteration schemes, such as "q" formerly being used as a vowel in Natqgu (now Natügu), a language of the Solomon Islands.

--
 * View all open tickets related to Discovery.
 * Looking to get involved? See tasks marked as Easy or volunteer needed.