Discovery/Status updates/2019-01-07

This is the weekly update for the week starting 2019-01-07.

Discussions

David discovered an issue with the click-through rate on one of the Search dashboards for mobile apps and enlisted Chelsy's help in fixing it quickly (done!) [1]
Mathew worked on increasing the number of shards for enwiki_general [2]
David helped augment the list of known clusters using cluster conf for Mjolnir [3]
David updated the completion suggester: TP50 was increased from 9ms to 24ms [4]
The Search team worked on supporting searching multiple filetypes at once, based on input from the Multimedia team [5]
David and Mathew worked on allowing Elasticsearch machines to communicate with each other on ports 9500 and 9700 [6]
We found that most of the dashboards in Grafana are designed to have one cluster per DC, so we needed to refactor them to allow selecting a specific cluster (by adding chi, psi and omega selectors) [7]
The multi-instance support code added for ExternalIndex was designed without the group+replica concepts in mind, so we fixed ExternalIndex to support the group and replica topology [8]
There was a recent spike of fatal timeouts from API search suggestions (prefixsearch) that caused a number of user queries to stall for 60 seconds and then receive a generic error page without any results. We fixed this by merging a patch so that language detection is not run when rewriting is not enabled [9]

We have added new keyboard shortcuts to the WDQS UI for systems where Ctrl-Space is already taken: Ctrl-Alt-Space and Alt-Enter [10]
Stas found an issue where the WDQS puppet/hiera configs were too distributed; Mathew and Gehel worked on it with assistance from SRE (thanks!) [11]
Our database in WDQS seems to be hitting Blazegraph internal limits, which requires some careful work rearranging the data to stay away from the limit. This work has now started [12]
Stas fixed an issue where a large update could crash Updater [13]
Stas fixed an issue where, due to database replication delay, Updater could read an old version of the data from Wikidata [14]
Stas fixed an issue where the SERVICE SILENT construct was producing errors despite the standard saying it should not [15]
