Discovery/Status updates/2019-05-20

This is the weekly update for the week starting 2019-05-20.

Highlights

Most of the team attended a three-day offsite in Prague last week, and Deb, Erik, Stas, and Trey also attended the Wikimedia Hackathon.

Discussions

Search

At the Hackathon, we hosted a session on "Advanced search syntax for newbies" [1]. We also had a few in-depth discussions with volunteers about search, our APIs, etc., and talked in more depth about Arabic and Slovak.
- As a result of our discussion, Trey opened a ticket to investigate the effects of searching without diacritics in Slovak. [2]
Trey completed a change to the Arabic-language completion suggester (upper left search box) to make Eastern Arabic Numerals and Western Arabic Numerals equivalent. [3] It will still take a little while for the change to be seen on-wiki.
Stas made a set of preliminary patches to convert the CirrusSearch extension to extension.json registration (merged); the final conversion patch is still in review [4].
David worked on several tasks: creating a fallback method based on a generic index [5]; making fallback methods configurable [6]; and allowing each FallbackMethod to create its own SearchQuery [7].
We noticed that multiple Elasticsearch nodes in eqiad were getting overloaded in April. Erik patched the problem and found a few things that might have caused the issues [8].
When enabling cross-cluster search to support multi-instance, we had to run custom scripts to update cluster settings, and we discovered that the puppet repo was not aware of this; it's fixed now [9].
Erik did a smorgasbord of fixes: he fixed "missing replica" error messages in production logs by uniquely identifying connections in the connection pool [10]; created archive indices, deleted archive docs from general indices, and started ignoring ancient logging rows with log_page = null [11]; fixed a condition where we received a cirrusSearchElasticaWrite job for an unwritable cluster, cloudelastic [12]; and documented the CirrusSearch schema [13].
During the Hackathon, Erik also exposed CloudElastic to the WMF Cloud [14].
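To illustrate the two text-folding ideas mentioned above (treating Eastern and Western Arabic numerals as equivalent, and searching Slovak without diacritics), here is a minimal Python sketch. It is not the CirrusSearch or Elasticsearch implementation; the function names `fold_arabic_digits` and `strip_diacritics` are hypothetical and exist only for this example.

```python
import unicodedata

# Eastern Arabic digits occupy U+0660..U+0669; map each to ASCII 0-9.
EASTERN_TO_WESTERN = {0x0660 + i: str(i) for i in range(10)}


def fold_arabic_digits(text: str) -> str:
    """Replace Eastern Arabic digits with Western Arabic digits (hypothetical helper)."""
    return text.translate(EASTERN_TO_WESTERN)


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Both spellings of a number fold to the same string, so they can match each other.
    print(fold_arabic_digits("\u0661\u0662\u0663") == fold_arabic_digits("123"))
    # A Slovak word with and without diacritics folds to the same string.
    print(strip_diacritics("\u017elt\u00fd") == "zlty")
```

Folding both the indexed text and the query in the same way is what would make the two forms match; whether that helps or hurts Slovak search is exactly what the ticket [2] sets out to investigate.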

Wikidata Query Service

At the Hackathon, with the help of Krinkle, the bug that made the URL shortener widget hard to use was fixed [15].
A WDQS bug in which label service clauses nested in subqueries were processed incorrectly was fixed [16].
Stas fixed breakage in the LDF server's JSON-LD format [17].

Did you know?

Naming Things is Hard, Volume 187: The Phab ticket mentioned above to equate different numeral systems for Arabic-language wikis uses the names Eastern Arabic Numerals (١٢٣...) and Western Arabic Numerals (123…). In English, the numerals we usually use (123...) are often called “Arabic numerals” [18] because they came to Europe from Arabic sources. In Arabic, the Eastern Arabic Numerals are called “Indian numerals” [19] because they came from Indian sources. In English, “Indian numerals” refer to the numerals used in India (१२३...) but they are just called “Devanagari numerals” in Hindi, for example. [20] Some have tried to make the subtle distinction in English that “arabic numerals” are the numerals that came from Arabic sources (123...), while “Arabic numerals” are the ones that are used by Arabic speakers (١٢٣...).

It’s also interesting to look at a table of the various related numeral systems [21] and see the similarities and “false friends” (note that your fonts may vary): Devanagari 7 looks like a 6 (“७”), Arabic 6 looks like a 7 (“٦”), Gujarati 5 looks like a 4 (“૫”), Bengali 4 looks like an 8 (“৪”), Gurmukhi 1 looks like a 9 (“੧”), etc. But any of those systems is MMMDCCXXIV times better than Roman numerals! [22]
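For the curious, the “false friends” can be checked against the Unicode character database, and the Roman numeral can be decoded, with a short Python sketch. The `LOOKALIKES` table and the `roman_to_int` helper are hypothetical names made up for this illustration; the parser assumes a well-formed numeral and does no validation.

```python
import unicodedata

# Each digit is written as a code point escape; the glyph is shown in the comment.
LOOKALIKES = {
    "Devanagari": "\u096d",  # ७ resembles a Western 6
    "Arabic": "\u0666",      # ٦ resembles a Western 7
    "Gujarati": "\u0aeb",    # ૫ resembles a Western 4
    "Bengali": "\u09ea",     # ৪ resembles a Western 8
    "Gurmukhi": "\u0a67",    # ੧ resembles a Western 9
}

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Decode a well-formed Roman numeral using the subtractive rule."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller value placed before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


if __name__ == "__main__":
    for script, ch in LOOKALIKES.items():
        # unicodedata.digit returns the numeric value Unicode assigns to the character.
        print(script, ch, unicodedata.digit(ch))
    print(roman_to_int("MMMDCCXXIV"))
```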

--

View all open tickets related to Discovery.
Looking to get involved? See tasks marked as Easy or volunteer needed