Wikimedia Discovery/Meetings/Search retrospective 2016-07-06

Format
This retrospective was conducted using the "Five finger retrospective" format.

Action Items from last month

 * Chris: post the link to his "what technical collaboration team does" presentation
 * Chris: Chris and Erik should talk about implications of interwiki search indices
 * Trey & Deb: Chris needs to be aware of the ? at the end of queries
 * ✅, and material has been posted on several village pumps
 * https://meta.wikimedia.org/wiki/Discovery/Handling_question_marks_in_search_queries & talk page
 * Erik: Figure out a plan to reliably monitor github (David and Guillaume have started to watch it)
 * ✅, decided that volume is low enough that watching projects and getting emails from github is sufficient
 * Got Discovery members admin rights on github projects

What happened since the last retro (June 1)

 * new elasticsearch servers
 * Change in product ownership for Q1: Dan -> Deb
 * Wikimania and chatting with community and others about what the Discovery team is doing with search
 * Job offer extended and accepted for new analyst, starting around the end of July \o/ ^_^ +1
 * Dan's vacation / Deb did a great job filling in

Thumb: Thumbs up--something that went well

 * Chris, Deb, and Trey chatting about question-mark handling +1+1
 * Fixes & improvements & relaunch of the TextCat A/B Test
 * Elasticsearch servers are SOO easy to install (if you don't count the required cluster restarts) +1
 * Deb did a great job filling in while Dan was out +1
 * phan is proving a useful addition to CI testing +1+1
 * phan is a static PHP code analyzer - https://github.com/etsy/phan
 * This is a Discovery initiative; would like to spread to other groups over time

Index finger: The ONE thing you want people to know (about how this team has functioned over the last month)

 * somehow we partially own the production logging infrastructure (by being elasticsearch "experts") +1 (Guillaume gets quite a few questions on logstash, where I have no idea...)
 * was this "somehow ownership" transferred from Bryan Davis's "somehow ownership" of it previously? :-)
 * Questions for the future: Who will be responsible for new hardware? Should we become the official owners?

Middle finger: Something that did not go well

 * Issue that affects the elasticsearch cluster (being discussed here: https://github.com/elastic/elasticsearch/issues/19187 )
 * Generates a ton of logs; fills the disk
 * Might be fallout from upgrading the clusters
 * Maintaining the swift repo plugin is hard because we don't use it (https://github.com/wikimedia/search-repository-swift )
 * David has spent days trying to fix it when broken
 * We should look for a new maintainer--maybe add a disclaimer in the README
 * Initial run of the TextCat A/B Test +1 (alas)
 * After a strong analysis, we found that the data we were collecting were unreliable ("visit pages" were completely wrong)
 * Contributing factor: No automated tests for the logging code
 * Contributing factor: No front-end engineer, so no expertise in browser-specific issues
 * Contributing factor: More than 20 ways to perform a search; complex code
 * Related factor: We already knew there was a mismatch in counts--this forced us to diagnose and fix it
 * Maps has a tendency to absorb a lot of my (Guillaume's) time. Prioritization needs to happen between different sub teams. Not sure how to make that happen.
 * If you need more of Guillaume, let him know, and he can try to reallocate his time.
 * Could shift more coding to developers, and leave Guillaume to review/finalize
 * If you see something is stuck, let him know. If he doesn't hear anything, he'll assume things are ok.
 * cindy (automated tests) has started acting up again, after the last round of fixes had it working well for a month or so +1
 * Mysterious errors; very common on local vagrant instance
 * Is integration testing worthwhile, from a cost/benefit basis?
 * "I can't live without Cindy now" +1
 * Some cindy errors are now being caught by phan
 * Other team runs tests as part of jenkins; we don't, partly because of the elastic dependency

Ring finger: Something about relationships--within the team, between teams, other

 * working with mobile team to implement geo features
 * seems to be going well so far
 * Thanks to Erik for answering all of Deb's 'newbie' search and overall team work questions
 * Weekly video chats with David, Erik, and Trey have been both productive and good "almost" face time
 * Thanks to Guillaume for answering "newbie" questions about web requests and caching (great help during the whole thing with Legal) - It was fun, I learned a lot as well!
 * Markus Krötzsch (with Technical University of Dresden) is about to run research on WDQS usage
 * http://korrekt.org/
 * Stas will get help from Mikhail to anonymize query data before we hand it over
 * https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries
 * Had a very interesting talk with Fabian Suchanek from YAGO (Yet Another Great Ontology) http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
 * Potential future collaborations
 * Guillaume working hard on getting integrated in the Ops team. Thanks for your support in that.
 * Seems to be going well
 * I am trying to spend more time on "real" Ops stuff (clinic duty, looking at mediawiki servers, ...)

Pinky: A little thing that would be easy to overlook

 * The "wrong keyboard" analysis turned up a *lot* of "Latin Russian" in ruwiki. There's a lot there (maybe 1% of queries) that could be improved.
 * Amir Aharoni (from Language Team in Editing) mentioned a gadget on the Hebrew Wikipedia that attempts to automatically correct for "Latin Hebrew" -> https://he.wikipedia.org/wiki/%D7%9E%D7%93%D7%99%D7%94_%D7%95%D7%99%D7%A7%D7%99:Gadget-Dwim.js
 * To test it out, go to https://he.wikipedia.org and type "trnhev" (without the quotes) into the search box; that's "Latin Hebrew" for America. You'll see it corrects what you've written into America in Hebrew!
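
The "wrong keyboard" correction described above can be sketched as a key-for-key substitution. This is only an illustration, not the code of the Dwim gadget: the function name `fix_wrong_keyboard` and the table `LATIN_TO_HEBREW` are hypothetical, and the table assumes the standard Israeli Hebrew layout typed over a US QWERTY keyboard (letter keys only).

```python
# Hypothetical sketch of "wrong keyboard" correction; not the gadget's actual code.
# Assumption: standard Israeli Hebrew layout over US QWERTY, letter keys only.
LATIN_TO_HEBREW = {
    "e": "\u05e7", "r": "\u05e8", "t": "\u05d0", "y": "\u05d8",
    "u": "\u05d5", "i": "\u05df", "o": "\u05dd", "p": "\u05e4",
    "a": "\u05e9", "s": "\u05d3", "d": "\u05d2", "f": "\u05db",
    "g": "\u05e2", "h": "\u05d9", "j": "\u05d7", "k": "\u05dc",
    "l": "\u05da", "z": "\u05d6", "x": "\u05e1", "c": "\u05d1",
    "v": "\u05d4", "b": "\u05e0", "n": "\u05de", "m": "\u05e6",
}


def fix_wrong_keyboard(query: str) -> str:
    """Replace each Latin letter with the Hebrew letter on the same physical key.

    Characters without an entry in the table (digits, spaces, punctuation)
    are passed through unchanged.
    """
    return "".join(LATIN_TO_HEBREW.get(ch.lower(), ch) for ch in query)


if __name__ == "__main__":
    # "akuo" is what you get when typing the Hebrew word "shalom" on a Latin layout.
    print(fix_wrong_keyboard("akuo"))
```

A real feature would also need to decide when to apply the mapping (for example, only when the original query returns no results), which this sketch does not attempt.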

Action items

 * David: Look into getting out of maintaining the swift plugin
 * Deb: Look at prioritising/defining the "Latin Russian/Latin Hebrew" problem - https://phabricator.wikimedia.org/T138958
 * Resolved: put in the "This Quarter" column on the Discovery Search backlog
 * Kevin: Send reminder one day before next retro... except not to Guillaume? ;-) [He prefers to respond "in the moment"]