Discovery/Status updates/2018-01-29


This is the weekly update for the week starting 2018-01-29.


  • Several members of the Search Platform team participated in the annual Developer Summit held January 22-23, 2018 in Berkeley, California. [1]
  • During the WMF's two-day All Hands event, the Search Platform team met with the WMDE team and the Multimedia team to talk about the future of search and how the teams can work together on Structured Data on Commons.
  • The Search Platform team held its team offsite during the two days after the All Hands meetings and had many productive conversations about the future of search.
  • Q3 Goals can be found here: [2]



  • Chris Schilling opened a Phab ticket (T185721 [3]) on the difficulties of searching in the Khmer script.[4] The Unicode encoding for the script uses many diacritics and (with proper font support) the same glyph can be properly written with the underlying Unicode characters in any of several different orders, which complicates searching. If you are interested in learning more—or if you have any experience with computing in Khmer—please check out the Phab ticket.
  • Erik wrote a new utility script that reads in a Spark DataFrame and writes binary XGBoost datasets to HDFS, so that Mjolnir can switch to file-based training. [5]
  • Gehel cleaned up multiple definitions of the Logstash endpoint in Puppet/Hiera, so that almost all references to the Logstash host are now consolidated in a single variable. [6]
  • Stas added hidden status to category dumps, to be deployed on Feb 7 [7]
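
The ordering problem described in the Khmer item above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the approach taken in the Phab ticket: the two sample strings are chosen for demonstration (no claim is made here about how a particular font renders them), and `order_insensitive_key` is a hypothetical helper that naively sorts each run of combining marks after a base character. It shows that standard NFC normalization does not make the two orderings compare equal, because the marks involved have a canonical combining class of 0 and are therefore never reordered.

```python
import unicodedata


def order_insensitive_key(text: str) -> str:
    """Hypothetical helper: sort each run of combining marks by code point.

    This is a naive sketch. A real solution for Khmer would need
    script-specific rules (for example around the coeng sign U+17D2).
    """
    out = []
    marks = []
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch).startswith("M"):
            # Collect marks (categories Mn, Mc, Me) until the next base character.
            marks.append(ch)
        else:
            out.extend(sorted(marks))
            marks = []
            out.append(ch)
    out.extend(sorted(marks))
    return "".join(out)


# Illustrative pair: KHMER LETTER KA followed by the same two marks
# (VOWEL SIGN AA, SIGN NIKAHIT) in two different orders.
a = "\u1780\u17B6\u17C6"
b = "\u1780\u17C6\u17B6"

same_raw = a == b
same_nfc = unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
same_key = order_insensitive_key(a) == order_insensitive_key(b)
print(same_raw, same_nfc, same_key)
```

A search index that stored something like `order_insensitive_key(text)` instead of the raw text would match both orderings, at the cost of also conflating any sequences where mark order is actually meaningful.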


  • Chelsy finished a draft analysis of an MLR test on Hebrew wiki; the Search Platform team is now reviewing it. [8][9]


  • Gehel took on the large-ish task of defining the constraints of the new WDQS cluster and getting that information on wiki. [10][11]
  • Stas added continuation support for WDQS queries to the MediaWiki API. [12]

Did you know?

English has a very large vocabulary—possibly larger than any other language.[13] In part this is because English likes to "[pursue] other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary,"[14] and in part because of its history[15]—particularly the Norman conquest, which brought in a ruling class that spoke a dialect of Old French.

An interesting consequence of this history is that English has distinct words for the animal, "cow," and for the meat of that animal, "beef." Surprisingly, both come from the same Proto-Indo-European[16] root: "gʷṓws".[17] The word "cow" derives from Proto-Germanic "kūz," while "beef" was borrowed from Old French "boef", derived in turn from Latin "bōs". Both "kūz" and "bōs" come from PIE "gʷṓws," though they obviously followed very different sound change[18] paths along the way.