Wikimedia Platform Engineering/MediaWiki Core Team/Check-ins/20140324

Who: Brad, Antoine, Bryan, Nik, Chad, Dan, Chris, Greg, RobLa, Tim
Regrets: Ori

Scrum of Scrums

 * Please review: https://gerrit.wikimedia.org/r/#/c/113525/ (for https://bugzilla.wikimedia.org/26122 )
 * Chad will review

CI server access/debugging
Faidon suggests that we designate a backup for Antoine for general CI server maintenance, even before we hire a Release Engineer. Thoughts?

Status on debugging the current problem? https://bugzilla.wikimedia.org/show_bug.cgi?id=62623 Faidon helped out with core file generation. The core file is available at tin.eqiad.wmnet:/home/hashar/bug62623.core and was produced by https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-databaseless/22671/console

BFD: Warning: /home/hashar/bug62623.core is truncated: expected core file size >= 651972608, found: 65609728.
[New LWP 19784]
Cannot access memory at address 0x7fc3bbec72a8
Cannot access memory at address 0x7fc3bbec72a0
(gdb) bt
#0  0x00000000006b9e9f in zend_mm_remove_from_free_list (heap=0x157eb20, mm_block=0x15e35ad0)
    at /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c:889
Cannot access memory at address 0x7fffff9ae118

Second backtrace with a full core file:

Program terminated with signal 11, Segmentation fault.
#0  0x00000000006b9e9f in zend_mm_remove_from_free_list (heap=0x12e0b20,
    mm_block=0x15beb618) at /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c:889
889   /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c: No such file or directory.
#0  0x00000000006b9e9f in zend_mm_remove_from_free_list (heap=0x12e0b20,
    mm_block=0x15beb618) at /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c:889
#1  0x00000000006ba985 in _zend_mm_free_canary_int (heap=0x12e0b20,
    p=0x15beb5a8) at /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c:2122
#2  0x00000000006a83c6 in zend_hash_destroy (ht=0x15beb2e0) at
    /tmp/buildd/php5-5.3.10/Zend/zend_hash.c:734
#3  0x00000000006bbb79 in zend_object_std_dtor (object=0x15bb2d78) at
    /tmp/buildd/php5-5.3.10/Zend/zend_objects.c:45
#4  0x00000000006bbb99 in zend_objects_free_object_storage
    (object=0x15bb2d78) at /tmp/buildd/php5-5.3.10/Zend/zend_objects.c:126
#5  0x00000000006bf6ef in zend_objects_store_free_object_storage
    (objects=0xdd52d8) at /tmp/buildd/php5-5.3.10/Zend/zend_objects_API.c:92
#6  0x000000000068c6a3 in shutdown_executor () at
    /tmp/buildd/php5-5.3.10/Zend/zend_execute_API.c:304
#7  0x000000000069a775 in zend_deactivate () at
    /tmp/buildd/php5-5.3.10/Zend/zend.c:963
#8  0x00000000006474c0 in php_request_shutdown (dummy=0x12e0b20) at
    /tmp/buildd/php5-5.3.10/main/main.c:1664
#9  0x000000000042baa5 in main (argc=32767, argv=0x7fff166f1533) at
    /tmp/buildd/php5-5.3.10/sapi/cli/php_cli.c:1367

Core file is available at tin.eqiad.wmnet:/home/hashar/bug62623-2.core

Quarterly Review
Spreadsheet of DOOM! https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Agte_lJNpi-OdEZzMm5EMmVaZTVwZl9Hb1RKQnlZN3c&usp=drive_web#gid=0


 * Config management
 * Revision store
 * Identity for SOA (Kerberos?)
 * Image scalers

Performance

 * Bug 62768: CirrusSearch should embed performance data in the page output
 * Implemented by Chad. Ori to set up data collection. WIP patch.
 * Multimedia team collecting network performance data for image loading in production.
 * Brandon intends to deploy GeoIP cookie to production this afternoon.
 * Ori meeting w/Timo and Roan this Friday to write out frontend performance workshop for Zurich.

HHVM

 * Patches to review:
 * HHVM support for FastStringSearch
 * HHVM handler for fatals (replacing wmerrors extension)
 * HHVM support for wikidiff2
 * Tim’s first patch is merged internally at Facebook. Now waiting on review of the second patch. HNI work almost ready to submit (~2 days).
 * New task: need to research garbage-collection for long-running scripts (bug 62768)
 * Faidon will try to join Wednesday meetings (we should continue to have them) and/or find an additional ops person to act as contact point for HHVM work.
 * If we stick with HHVM (or before picking it up again), setting up a meeting with Facebook folks (Sara, Paul, etc.) could be useful

Deployment tooling / RelEng

 * Lots of progress on beta migration to eqiad
 * 26 instances built; all using local puppet and salt masters
 * Still waiting on Sean to build db cluster
 * deployment-elastic04 not joining cluster for some currently unknown reason

Search

 * Kinda quiet - Nik working on highlighter. Chad fixing interwiki search results
 * Window next week: Wednesday, April 2. Group 1 wikis go primary
 * Wikimania 2014 talk submitted: https://wikimania2014.wikimedia.org/wiki/Submissions/CirrusSearch:_How_we%27ve_replaced_a_great_search_engine_with_an_awesome_search_engine

SecurePoll
On hold. Reviewed TitleValue

ContactForm for trademark
Brad will set up a test environment for testing

Security

 * training, yay!
 * no new bugs

Postmortem review

 * Bug 62615 - Easy DOS vectors in the API

Poolcounter:
 * Improve poolcounter extension error messages. Some context would be helpful, such as the poolcounter server contacted, the pool context, and the URL. Perhaps error messages too, even if only in English (as opposed to what's displayed to the user)
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=63027

Search:
 * We're going to monitor the slow query log and have icinga start complaining if it grows very quickly. We normally get a couple of slow queries per day, so this shouldn't be too noisy. We will also have to monitor error counts, especially once we get more timeouts.
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=62077
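
As a rough illustration of the slow query log monitoring idea above, here is a minimal Python sketch of a check in the usual Nagios/Icinga plugin style (exit status 0 = OK, 1 = WARNING, 2 = CRITICAL). It is not the check that was actually deployed. The function names, the state file, and the thresholds are all hypothetical, and it assumes one log line per slow query.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios/Icinga plugin statuses


def count_lines(path):
    """Return the number of lines in the log, or 0 if it does not exist yet."""
    try:
        with open(path, "rb") as fh:
            return sum(1 for _ in fh)
    except FileNotFoundError:
        return 0


def check_growth(log_path, state_path, warn=10, crit=50):
    """Compare the current line count with the count saved by the previous run.

    warn and crit are hypothetical thresholds: new slow queries since the
    last check. Returns a (status, message) tuple.
    """
    current = count_lines(log_path)
    try:
        with open(state_path) as fh:
            previous = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        previous = current  # first run: nothing to compare against yet
    # If the log shrank it was probably rotated; count the whole new file.
    growth = current - previous if current >= previous else current
    with open(state_path, "w") as fh:
        fh.write(str(current))
    if growth >= crit:
        return CRITICAL, "CRITICAL: %d new slow queries" % growth
    if growth >= warn:
        return WARNING, "WARNING: %d new slow queries" % growth
    return OK, "OK: %d new slow queries" % growth
```

A wrapper script would print the message and exit with the returned status so that icinga can pick it up. Comparing against the previous run, rather than alerting on absolute size, keeps the check quiet at the normal rate of a couple of slow queries per day.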

https://tendril.wikimedia.org/ for databases is nice. Might integrate ES logs into it.

Deploy:
 * investigate potential scap bug with the change of mw versions
 * Why didn't puppet pull in the latest versions of deployed mw?
 * see also: the eventual consistency requirement for deployment tooling
 * NEEDS BUG
 * make scap report rsync errors
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=62862
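
To make the "report rsync errors" item concrete, here is a minimal Python sketch of the general idea: run each per-host sync command, check its exit status, and collect a readable failure report instead of discarding it. This is not scap's actual code. The function names and the `build_cmd` callback are hypothetical.

```python
import subprocess


def run_and_report(cmd, host):
    """Run one sync command; return (ok, report) rather than failing silently."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    if proc.returncode == 0:
        return True, "%s: sync ok" % host
    # Keep only the last stderr line so the report stays readable.
    stderr_lines = proc.stderr.strip().splitlines()
    detail = stderr_lines[-1] if stderr_lines else "no stderr output"
    return False, "%s: sync failed (exit %d): %s" % (host, proc.returncode, detail)


def sync_all(hosts, build_cmd):
    """Run build_cmd(host) for every host and return the failure reports."""
    failures = []
    for host in hosts:
        ok, report = run_and_report(build_cmd(host), host)
        if not ok:
            failures.append(report)
    return failures
```

In a real deployment `build_cmd` would return an rsync argument list for the given host, and the caller would print or log the returned failures at the end of the run.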

Hiring
[stuff]

Liaisons

 * Flow (Chad)
 * Wikidata (Nik)
 * Access requests for wmde

Review needed

 * Structured logging RFC implementation
 * Add Composer managed libraries: Ie667944
 * Add a PSR-3 based logging interface: I5c82299
 * Enable MWLogger logging for legacy logging methods: I1e5596d
 * Enable MWLogger logging for wfLogProfilingData: Iae11e1e