Wikimedia Platform Engineering/MediaWiki Core Team/Check-ins/20140324

From mediawiki.org

Who: Brad, Antoine, Bryan, Nik, Chad, Dan, Chris, Greg, RobLa, Tim

Regrets: Ori

Scrum of Scrums

CI server access/debugging

Faidon suggests that we designate a backup for Antoine for general CI server maintenance, even before a Release Engineer is hired. Thoughts?

Status on debugging the current problem? https://bugzilla.wikimedia.org/show_bug.cgi?id=62623

Faidon helped out with core file generation. The core file is available at tin.eqiad.wmnet:/home/hashar/bug62623.core and was produced by https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-databaseless/22671/console


First backtrace, from the truncated core file:

BFD: Warning: /home/hashar/bug62623.core is truncated: expected core file size >= 651972608, found: 65609728.
[New LWP 19784]
Cannot access memory at address 0x7fc3bbec72a8
Cannot access memory at address 0x7fc3bbec72a0
(gdb) bt
#0  0x00000000006b9e9f in zend_mm_remove_from_free_list (heap=0x157eb20, mm_block=0x15e35ad0)
    at /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c:889
Cannot access memory at address 0x7fffff9ae118


Second backtrace with a full core file:

Program terminated with signal 11, Segmentation fault.
#0  0x00000000006b9e9f in zend_mm_remove_from_free_list (heap=0x12e0b20, mm_block=0x15beb618)
    at /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c:889
889     /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c: No such file or directory.

#0  0x00000000006b9e9f in zend_mm_remove_from_free_list (heap=0x12e0b20, mm_block=0x15beb618)
    at /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c:889
#1  0x00000000006ba985 in _zend_mm_free_canary_int (heap=0x12e0b20, p=0x15beb5a8)
    at /tmp/buildd/php5-5.3.10/Zend/zend_alloc_canary.c:2122
#2  0x00000000006a83c6 in zend_hash_destroy (ht=0x15beb2e0)
    at /tmp/buildd/php5-5.3.10/Zend/zend_hash.c:734
#3  0x00000000006bbb79 in zend_object_std_dtor (object=0x15bb2d78)
    at /tmp/buildd/php5-5.3.10/Zend/zend_objects.c:45
#4  0x00000000006bbb99 in zend_objects_free_object_storage (object=0x15bb2d78)
    at /tmp/buildd/php5-5.3.10/Zend/zend_objects.c:126
#5  0x00000000006bf6ef in zend_objects_store_free_object_storage (objects=0xdd52d8)
    at /tmp/buildd/php5-5.3.10/Zend/zend_objects_API.c:92
#6  0x000000000068c6a3 in shutdown_executor ()
    at /tmp/buildd/php5-5.3.10/Zend/zend_execute_API.c:304
#7  0x000000000069a775 in zend_deactivate ()
    at /tmp/buildd/php5-5.3.10/Zend/zend.c:963
#8  0x00000000006474c0 in php_request_shutdown (dummy=0x12e0b20)
    at /tmp/buildd/php5-5.3.10/main/main.c:1664
#9  0x000000000042baa5 in main (argc=32767, argv=0x7fff166f1533)
    at /tmp/buildd/php5-5.3.10/sapi/cli/php_cli.c:1367

Core file is available at tin.eqiad.wmnet:/home/hashar/bug62623-2.core


Quarterly Review

Spreadsheet of DOOM! https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Agte_lJNpi-OdEZzMm5EMmVaZTVwZl9Hb1RKQnlZN3c&usp=drive_web#gid=0

  • Config management
  • Revision store
  • Identity for SOA (Kerberos?)
  • Image scalers

Performance

  • Bug 62768: CirrusSearch should embed performance data in the page output
    • Implemented by Chad. Ori to set up data collection. WIP patch.
  • Multimedia team collecting network performance data for image loading in production.
  • Brandon intends to deploy GeoIP cookie to production this afternoon.
  • Ori is meeting with Timo and Roan this Friday to write up the frontend performance workshop for Zurich.

HHVM

  • Patches to review:
    • HHVM support for FastStringSearch
    • HHVM handler for fatals (replacing wmerrors extension)
    • HHVM support for wikidiff2
  • Tim’s first patch is merged internally at Facebook. Now waiting on review of the second patch. HNI work almost ready to submit (~2 days).
  • New task: research garbage collection for long-running scripts (bug 62768)
  • Faidon will try to join Wednesday meetings (we should continue to have them) and/or find an additional ops person to act as contact point for HHVM work.
  • If we stick with HHVM (or before picking it up again), setting up a meeting with the Facebook folks (Sara, Paul, etc.) could be useful.

Deployment tooling / RelEng

  • Lots of progress on beta migration to eqiad
    • 26 instances built; all using local puppet and salt masters
    • Still waiting on Sean to build db cluster
    • deployment-elastic04 is not joining the cluster; the reason is currently unknown


Search

SecurePoll

On hold. Reviewed TitleValue

ContactForm for trademark

Brad will set up a test environment for testing.

Security

  • training, yay!
  • no new bugs

Postmortem review

  • Bug 62615 - Easy DOS vectors in the API

Poolcounter:

  • Improve poolcounter extension error messages. Some context would be helpful, such as the poolcounter server contacted, the pool context, and the URL. Perhaps also record error messages, even if only in English (as opposed to what is displayed to the user).
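As a rough illustration of the context suggested above, here is a minimal Python sketch. It is not the extension's code: the class name, field names, and output format are all hypothetical, and the English-only log line is just one reading of the suggestion.

```python
from dataclasses import dataclass


@dataclass
class PoolCounterError:
    """Hypothetical container for the context proposed in the notes."""
    message: str   # what went wrong, in plain English
    server: str    # poolcounter server that was contacted
    pool_key: str  # pool context (which pool/key was requested)
    url: str       # request URL that triggered the lookup

    def log_line(self) -> str:
        # English-only line meant for operators, kept separate from
        # whatever localized message the user sees.
        return (
            f"PoolCounter error: {self.message} "
            f"[server={self.server} pool={self.pool_key} url={self.url}]"
        )
```

With all three pieces of context in one line, an operator can tell which server and pool were involved without correlating several log sources.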

Search:

  • We're going to monitor the slow query log and have Icinga start complaining if it grows very quickly. We normally get a couple of slow queries per day, so this shouldn't be too noisy. We will also have to monitor error counts, especially once we get more timeouts.
  • https://tendril.wikimedia.org/ is nice for databases. We might integrate ES logs into it.
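The growth check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the log is assumed to hold one slow query per line, the thresholds are invented, and the function names are hypothetical. The exit codes follow the standard Nagios/Icinga plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
# Nagios/Icinga plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL = 0, 1, 2


def count_lines(path):
    """Count entries in the log, assuming one slow query per line."""
    with open(path, "rb") as fh:
        return sum(1 for _ in fh)


def classify_growth(previous, current, warn, crit):
    """Compare two line counts taken one check interval apart.

    Returns (status, growth). A shrinking file is treated as a rotated
    log, so everything in the new file counts as growth.
    """
    growth = current - previous if current >= previous else current
    if growth >= crit:
        return CRITICAL, growth
    if growth >= warn:
        return WARNING, growth
    return OK, growth
```

A real check would persist the previous count between runs and exit with the returned status. Since only a couple of slow queries per day are normal, fairly low thresholds should still be quiet.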

Deploy:

  • Investigate a potential scap bug with the change of MW versions
    • Why didn't puppet pull in the latest versions of deployed mw?
    • see also: the eventual consistency requirement for deployment tooling
    • NEEDS BUG
  • make scap report rsync errors
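One way to surface sync failures instead of dropping them is sketched below. This is not scap's actual code: the helper name and report format are hypothetical, and it only shows the general idea of checking the exit status and stderr of an external command such as rsync.

```python
import subprocess


def run_and_report(cmd):
    """Run a sync command and return (ok, report).

    `cmd` is an argument list, e.g. an rsync invocation. A non-zero
    exit status is turned into a readable report rather than ignored.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return True, "ok"
    stderr = proc.stderr.strip() or "(no stderr output)"
    return False, f"command {cmd[0]!r} exited with {proc.returncode}: {stderr}"
```

The caller can then collect the failing reports per host and print them at the end of the deploy, so a partial sync is visible to whoever ran it.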


Hiring

[stuff]

Liaisons

  • Flow (Chad)
  • Wikidata (Nik)
    • Access requests for wmde


Review needed

  • Structured logging RFC implementation
    • Add Composer managed libraries: Ie667944
    • Add a PSR-3 based logging interface: I5c82299
    • Enable MWLogger logging for legacy logging methods: I1e5596d
    • Enable MWLogger logging for wfLogProfilingData: Iae11e1e