Wikimedia Discovery/Meetings/Search retrospective 2016-08-19

Format
This retrospective was conducted using the "Five finger retrospective" format: https://www.mediawiki.org/wiki/Team_Practices_Group/Five_finger_retrospective

Action Items from last month

 * Chris: post the link to his "what technical collaboration team does" presentation
 * Chris: Chris and Erik should talk about implications of interwiki search indices
 * Trey & Deb: Chris needs to be aware of the ? at the end of queries
 * ✅, and material has been posted on several village pumps
 * https://meta.wikimedia.org/wiki/Discovery/Handling_question_marks_in_search_queries & talk page
 * Erik: Figure out a plan to reliably monitor github (David and Guillaume have started to watch it)
 * ✅, decided that volume is low enough that watching projects and getting emails from github is sufficient
 * Got Discovery members admin rights on github projects

What happened since the last retro (July 6)

 * Deployed "? stripping" in queries
 * Set up RelForge cluster
 * Deployed textcat
 * Elasticsearch upgrade to 2.3.4 in progress
 * Discovery got a new analyst, who will help a lot going forward, especially if we start building that probabilistic bot classifier thingy :P
 * Deployed logstash/kibana upgrade
 * Completed refactoring of the search fields
 * WDQS servers for codfw approved and on their way
 * Searcher class in cirrus is now < 1000 lines +1+1:D
 * Did research on the top 100-ish unsuccessful queries and decided not to go further with it due to lack of interesting data
 * Analysis of ascii folding and stemming

Thumb: Thumbs up--something that went well

 * We seem to have proven to many people's satisfaction that zero-results queries are not a good place to mine articles and redirects.
 * Trey's "?" blog post!
 * Blog post on textcat deployment+1
 * Addressed some technical debt and made code look saner+1
 * Trey's analysis of providing a list of search queries and the communication of the results
 * RelForge cluster has come into existence! And David has been able to index lots of data (enwiki & frwiki in two different ways!)

Index finger: The ONE thing you want people to know (about how this team has functioned over the last month)

 * (Things have generally been running smoothly. It doesn't feel like any ONE thing stands out.)+1+1

Middle finger: Something that did not go well

 * elasticsearch not stable
 * 2 major issues in the last month and a half. One is a mystery; one we understand and have some ideas for fixes
 * logstash upgrade delayed multiple times due to lack of preparation / thoroughness
 * seemingly not enough time in the day +1+1
 * Just a lot going on; everything takes time. Last couple weeks have been atypically busy.
 * KH: Try not to put in extra hours, generally. Time-sensitive occasional things are understandable.
 * internet connections have been a bit weird / dropping at inconvinient times ( I spellz gud ) +1 (hangouts have been dodgy)
 * cindy (automated browser testing) was acting up again; we're still not sure why and don't have a final fix
 * Recurring item. Do we want to think about shifting testing to unit test level? Or to php level?
 * We really do need this to work; feel much more comfortable merging when automated tests are working
 * Devs should work w/PO to make sure some time gets allocated to work on this

Ring finger: Something about relationships--within the team, between teams, other

 * Good communication about "search across projects and across languages"
 * Trey says: working with Deb & Chris to get blog posts out about developments has been great. Thanks to Deb for driving the process! +1(yay, thanks!)
 * Doing some good work with Graphs team to make visualizations easier (e.g. integrating w/WDQS)
 * Guillaume still split between multiple sub teams, no one is complaining...(I feel ya!)+1 - he's doing great!
 * Seems to be improving since the last retrospective
 * We have multiple sub-teams that are fairly independent

Pinky: A little thing that would be easy to overlook (or was overlooked)

 * Elasticsearch garden is not cultivated as much as it should be (T109089). For example: the problem of multiple alerts firing when the cluster is failing has been around for a fairly long time, and we had that spam again today
 * Similar issue with maps: Small issues that are not critical; only get attention when they break. Could do better with that.
 * There are a lot of little issues, so it makes sense to prioritize them
 * Do you (GL) have knowledge/support to be able to prioritize your work?
 * GL: Would be interested in participating in a planning session
 * Some work ongoing with a new recommendation system that may need some help from cirrus developers (https://phabricator.wikimedia.org/T143197 )
 * Offline article recommendation system (similar to "more like")
 * Some help needed to catch obvious problems with bm25
 * DC: Have created a place to test enwiki on BM25. (http://en-wp-bm25-relforge.wmflabs.org/wiki/Special:Search )
 * Stas working on upcoming lectures / demos
 * Internal talk about SPARQL and WDQS, mostly technical, partly aimed at analysis folks
 * Wiki conference San Diego: Less technical audience

Action items

 * Kevin: Invite GL to search planning meeting(s)
 * David: Will send email to private list requesting BM25 testing; later to public list (and Discovery weekly status)
 * Erik: work w/PO to make sure some time gets allocated to work on cindy problems