Discovery/Status updates/2018-01-08

This is the weekly update for the week starting 2018-01-08.

Highlights

 * Latvian and Arabic Wikipedias enabled  on their respective wikis (completed by community volunteers)

Search

 * David tuned the Wikidata fulltext search similarity parameters
 * Stas fixed an issue where the IDs option in forceSearchIndex.php was broken
 * Trey finished his initial analysis of phonetic algorithms available in Elasticsearch.

Portal

 * The portal's stats and translations were updated on January 8, 2018 via our mostly automated process
 * Jan updated "liguru" to "lìgure" on the Wikipedia portal

Did you know?

 * Last October, the President of Kazakhstan announced that the country would switch from the Cyrillic to the Latin alphabet. As a result, over the course of about 100 years, the writing system for Kazakh will have changed from Arabic to Latin to Cyrillic and back to Latin. Several Turkic languages spoken in former Soviet Republics have gone through similar shifts, including Azerbaijani, Turkmen, and Uzbek, with the most recent shift to Latin for some beginning in the 1990s. Thus some speakers of those languages lived through all three changes to their official writing system.

FY 2017-18 Q3 (Jan-Mar) goals
This status was last updated 2018-01-08.

Search
Current Goals (FY 2017-18 Q3)
 * Objective 1: Implement advanced methodologies such as “learning to rank” machine learning techniques and signals to improve search result relevance across language Wikipedias.
 * Create and test advanced parser features
 * Evaluate and build new features for machine learning pipeline (T162279)
 * Begin to build relationships with external information retrieval researchers
 * Category search (keywords for sub-category searching)


 * Objective 2: Improve support for multiple languages by researching and deploying new language analyzers as they make sense to individual language wikis.
 * Continue to investigate morphological libraries for Elasticsearch plugins.
 * Implement Serbian, investigate Slovak
 * Improve search by using fuzzy (phonetic) language matching.
 * Continue general language support.
 * Investigate language analyzer config options


 * Objective 3: Investigate how to expand and scale Wikidata Query Service to improve its ability to power features on-wiki for readers
 * Acquire and productionize six new servers for WDQS (see T178548)
 * Set up individual internal and external service endpoints with enhanced features for expert users


 * Address technical debt:
 * Elasticsearch 5.6/Logstash 5.6/Kibana 5.6 (ELK stack)
 * Maintain APIs
 * Translation extension

Structured Data on Commons
Current Goals (FY 2017-18 Q3)
 * Objective 1: Commons search will be extended via CirrusSearch, Elasticsearch, and Wikidata Query Service to support searching based on structured data elements describing media.
 * Search for file captions, including multilinguality (there will be multilingual file captions, there might be file summaries, and there might be additional related functionality implied by designs when received); (also general design for search on FE)


 * Objective 2: Advanced search capabilities (e.g., Wikidata Query Service, SPARQL queries) will be updated to support the more specific media search filters and the relationships to the topics they represent
 * Upgrade and re-implement full-text search on Elasticsearch on Wikidata
 * Investigate using MCR with Wikidata

--
 * View all open tickets related to Discovery.
 * Looking to get involved? See tasks marked as Easy or volunteer needed