Discovery/Status updates/2017-12-04

This is the weekly update for the week starting 2017-12-04.

==Discussions==

Search

 * Upgrading to ElasticSearch 5.5.x required completing several smaller pieces of work:
 * Completing a Kibana security release (https://phabricator.wikimedia.org/T173685)
 * Upgrading the Logstash cluster to Elastic 5.5.x
 * Upgrading all log producers to use the Logstash LVS endpoint
 * There was a HP RAID Battery issue on elastic2004 that has been resolved
 * forceSearchIndex.php was hanging at the end of the process when run on large wikis, so we disabled statsd collection rather than replacing statsd

Portal

 * Fixed an issue where the logo on www.wikipedia.org was misaligned for RTL languages on very small devices
 * Automation is nearly done for the Wikipedia.org portal site

FY 2017-18 Q2 (Oct-Dec) goals
This status was last updated 2017-12-08.

Tech: Search Platform
1. Implement advanced methodologies such as “learning to rank” machine learning techniques and signals to improve search result relevance across language Wikipedias.
2. Improve support for multiple languages by researching and deploying new language analyzers as they make sense to individual language wikis.
3. Investigate how to expand and scale Wikidata Query Service to improve its ability to power features on-wiki for readers.
4. Address technical debt:
 * Begin to automate the machine learning pipeline, starting by targeting eight to ten languages, other than English, that match (at a minimum) current performance and then deploy those models. (IN PROGRESS)
 * Investigate open source language software that is available and see if it can be converted into ElasticSearch plugins. (IN PROGRESS)
 * Investigate usage of fall-back languages (DONE)
 * Investigate fuzzy (phonetic) matching. (POSTPONED UNTIL Q3)
 * Continue general language support. (ONGOING)
 * Work on sub-category filtering and searching within the Wikidata Query Service. (IN PROGRESS)
 * Convert existing Selenium tests to Node.js (IN PROGRESS)
 * Investigate ownership and maintenance of Logstash (IN PROGRESS)

Structured Data on Commons
1. Commons search will be extended via CirrusSearch, ElasticSearch, and Wikidata Query Service to support searching based on structured data elements describing media.
2. Advanced search capabilities (e.g., Wikidata Query Service, SPARQL queries) will be updated to support the more specific media search filters and the relationships to the topics they represent.
 * Determine advanced search requirements and measures for Structured Data on Commons. (NOT STARTED YET)
 * Begin work on prefix- and full-text search in ElasticSearch on Wikidata in preparation for the Structured Data on Commons project. (IN PROGRESS)

WDQS
The Wikidata Query Service goal for this quarter is to work on sub-category filtering and searching within the service. Stas and Guillaume will maintain the service to support its continued growth and use, and the Analysis team will help with statistics.

Portal
Update the Wikipedia.org portal codebase to be completely automated for ease of ongoing maintenance.
 * Automate portal project updates: statistics and translations (IN PROGRESS)

Maps
Support the move to be more operationally centralized and roll out a new map style that has numerous updates and enhancements.
 * Finalize and deploy new map style; replicate maps test cluster in Wikimedia Cloud Service; monitor for critical bugs (IN PROGRESS)

Analysis
The team will continue to work closely with the Search Platform team to analyze A/B tests and other assorted data; they will also begin working on determining a baseline set of metrics for Structured Data on Commons. (IN PROGRESS)

--
 * View all open tickets related to Discovery.
 * Looking to get involved? See tasks marked as Easy or volunteer needed.