Discovery/Status updates/2019-03-25

This is the weekly update for the week starting 2019-03-25.

Search

 * Elasticsearch upgrade to v6:
 * incident report
 * Trey finished a deep dive into the performance of language identification for cross-wiki searching and into punctuation-related problems. He found that things are working pretty well overall, but the Chinese language model is a bit off.
 * Erik noticed that the inlabel / incaption keywords should highlight the label/caption but did not
 * David worked on fixing Elasticsearch 6 deprecation errors: nested_path and nested_filter are deprecated, as is _retry_on_conflict
 * We worked on migrating mjolnir to stdout/syslog/cee logging output
 * The team worked on the upgrade to Elasticsearch 6.5.4 for cirrus, specifically in codfw, and for eqiad
 * Erik worked on the implementation and testing of glent m0 integration with WMF infrastructure
 * David did a lot of work to update mw-config to use the psi and omega elastic clusters
 * David found that auto_generate_phrase_queries is deprecated and ineffective
 * The team fixed an old bug where we were getting fatal errors ("cannot perform this operation with arrays") from CirrusSearch/ElasticaWrite (using JobQueueDB)
 * Gehel worked to make spicerack more robust when unfreezing writes to elasticsearch / cirrus, and created a cookbook to reset the frozen write state on elasticsearch / cirrus
 * Stas moved the WikibaseLexeme search code to the WikibaseLexemeCirrusSearch extension
 * We noticed that Elasticsearch indices went read-only, causing a huge lag
 * We also saw that search exception handling was printing response information on the screen
 * The team fixed an issue where mwgrep was not working
 * We also silenced Elasticsearch 6 deprecation warnings to avoid logspam
 * We needed to create extra Elasticsearch clusters in the beta cluster
 * We also needed some alerts so we know if mjolnir starts misbehaving
 * We also converted the check_elasticsearch.py Icinga plugin to Python 3
 * We needed to start using a local nginx reverse proxy for connection reuse
 * The version of curator that we were using (5.2.0) isn't compatible with Elasticsearch 6, which caused issues in a few cron jobs on the logstash servers. Version 5.6.0 supports both Elasticsearch 5 and 6, so we updated to it
 * We also did some cleanup of the reprepro configuration for elasticsearch-curator
 * A centralized way to inspect the content of the search profiles might be helpful when investigating search behaviors. In the same vein as the other debug dump APIs (mapping/settings/cirrusdoc), David suggested adding a new simple API to dump the profiles (cirrus-profiles-dump)
 * David also found and fixed a "call to a member function toArray on a non-object (null)" error in vendor/ruflin/elastica/lib/Elastica/Client.php:736

--
 * View all open tickets related to Discovery.
 * Looking to get involved? See tasks marked as Easy or volunteer needed