Deployment tooling/Notes/What does scap do

This documentation describes scap prior to it being ported to Python.

Scap ("sync-common-all-php") is a collection of shell scripts used to publish code and configuration to the WMF production web servers.

scap
scap is the driver script for syncing the MW versions and configuration files currently staged on tin.eqiad.wmnet to the rest of the MW servers in the production cluster.


 * Usage
 * scap [--versions= ] [ ]


 * 1) Acquire lock on
 * 2) Record start timestamp
 * 3) Ensure that   is available (needed for   to remote hosts)
 * 4) Check for command line flag to limit activities to a particular MW version
 * 5) Export   variable describing software versions to push with sync scripts. Either:
 * 6) * A specific version given with the  command line argument (eg 1.23wmf12)
 * 7) * The output of
 * 8) Lint files in $MW_COMMON_SOURCE/wmf-config and $MW_COMMON_SOURCE/multiversion
 * 9) Runs
 * 10) * copies files from tin.eqiad.wmnet:/usr/local/apache/common-local to tin.eqiad.wmnet:/a/common via rsync
 * 11) Runs
 * 12) Runs   to announce that scap is starting
 * 13) Runs   via   on scap-proxies group
 * 14) Randomizes list of hosts to update (All hosts listed in  )
 * 15) Runs   via
 * 16) Runs   via
 * 17) Runs
 * 18) Compute elapsed runtime
 * 19) Runs   to log runtime
 * 20) Runs   to log scap run completion
 * 21) Deletes temp files
 * 22) Releases lock on
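The lock, timing, and host-randomization steps above can be sketched as follows. This is a minimal illustration, not the original shell code: the `flock`-based locking, the function names `exclusive_lock` and `run_sync`, and the `sync_host` callback are all assumptions introduced here, and the lock file path is left to the caller because the notes do not preserve it.

```python
import fcntl
import os
import random
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive, non-blocking lock on lock_path (hypothetical helper)."""
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Raises BlockingIOError if another holder already has the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def run_sync(lock_path, hosts, sync_host):
    """Lock, visit hosts in random order, and return the elapsed seconds."""
    with exclusive_lock(lock_path):
        start = time.monotonic()
        shuffled = list(hosts)
        # Randomize so the same hosts are not always updated first.
        random.shuffle(shuffled)
        for host in shuffled:
            sync_host(host)
        return time.monotonic() - start
```

Holding the lock for the whole run means a second invocation fails immediately instead of interleaving with the first.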

sync-common
sync-common is a thin shell script wrapper around scap-1.


 * 1) Runs

scap-1
scap-1 sets up the local host to receive files via rsync, chooses an rsync server to fetch files from and delegates to  to actually fetch the files.


 * 1) Sources
 * 2) If   directory is not found:
 * 3) * Creates  via
 * 4) If   directory is not found:
 * 5) * Creates  via
 * 6) Initialize   variable to first command line argument (could be empty string)
 * 7) Initialize   as an empty variable
 * 8) If   is not an empty string:
 * 9) * Set  via
 * 10) If   is still empty:
 * 11) * Set  to
 * 12) Run   as the user
 * 13) *  invocation
 * 14) Echo "Done"
 * 15) Exit 0
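The server-selection logic in steps 6 to 11 follows a "try a lookup, then fall back to a default" pattern. The sketch below illustrates only that pattern; `choose_rsync_server`, `pick`, and `default_server` are hypothetical names, and the actual lookup command and default value are not preserved in these notes.

```python
def choose_rsync_server(candidates, pick, default_server):
    """Return a server chosen by `pick`, or `default_server` if none is found.

    `candidates` mirrors the first command line argument (may be empty).
    `pick` stands in for whatever lookup the real script performs.
    """
    server = ""
    if candidates:
        # The lookup may itself come back empty.
        server = pick(candidates) or ""
    if not server:
        server = default_server
    return server
```

Because the default is applied after the lookup, a failed lookup and a missing argument end up on the same fallback path.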

scap-2
scap-2 copies files from the  module of an rsync server to the   directory on the local host


 * Usage
 * scap-2 [ ]


 * 1) Sources
 * 2) Initialize   as
 * 3) If   is still empty:
 * 4) * Set  to
 * 5) Initialize   as an array containing
 * 6) If   is not an empty string:
 * 7) * Add  to   for each $v in
 * 8) * Add  to
 * 9) Echo that   is copying from
 * 10) Run

mw-update-l10n
mw-update-l10n generates l10n CDB files and exports their contents as a series of JSON files, which compress better under rsync when transferred to cluster hosts.


 * Usage:
 * mw-update-l10n [--verbose]


 * 1) Sources
 * 2) Asserts that the local host is running some variant of Linux
 * 3) Checks for a   command line argument and toggles off the   setting if present
 * 4) Sets   to the number of cores on the local host (includes hyperthreading cores)
 * 5) Sets   to   - 2
 * 6) Sets   to the output of
 * 7) * (eg :
 * 8) Split version string into   (eg 1.23wmf11) and   (eg aawikibooks)
 * 9) If   is set and   isn't a version being synced: continue
 * 10) Make a new temp file and track as
 * 11) Run   for the wiki   outputting to
 * 12) Copy   to
 * 13) Copy   to   unless they are the same location
 * 14) Run   using   threads
 * 15) Run   using   threads
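Two pieces of the list above lend themselves to a short sketch: deriving a thread count from the core count, and filtering version/wiki pairs by the versions being synced. The code below is an illustration under assumptions: the `=` separator between version and wiki, the clamp to at least one thread, and the function names are not taken from the original script.

```python
import os


def l10n_thread_count(cores=None):
    """Core count minus two, clamped to at least 1 (the clamp is an assumption)."""
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, cores - 2)


def select_versions(entries, only_versions=None, sep="="):
    """Split 'version<sep>wiki' strings and keep those being synced.

    `sep` is a hypothetical separator; the real output format is not
    preserved in these notes.
    """
    selected = []
    for entry in entries:
        version, wiki = entry.split(sep, 1)
        # Skip versions outside the requested set, if one was given.
        if only_versions and version not in only_versions:
            continue
        selected.append((version, wiki))
    return selected
```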

refreshCdbJsonFiles
refreshCdbJsonFiles generates JSON data files and MD5 checksums from CDB databases.


 * Usage
 * refreshCdbJsonFiles --directory  [--threads ]


 * 1) Validate command line arguments
 * 2) Create list of   files in target directory
 * 3) Split list in N parts (N == number of parallel threads requested)
 * 4) For each sublist of CDB files:
 * 5) * Fork a child process
 * 6) * For each file:
 * 7) * * Compute md5 checksum of file
 * 8) * * If md5(file) === last md5 recorded: continue
 * 9) * * Generate JSON file of key:value pairs found in CDB file to temporary file
 * 10) * * Write md5(file) to $file.MD5
 * 11) * * Move JSON temp file to $file.json
 * 12) Wait for children to finish
 * 13) Echo status message if any files were updated
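The per-file logic above (checksum, skip if unchanged, write to a temporary file, then move into place) can be sketched as below. This is not the original implementation: forking is omitted, `read_pairs` is a hypothetical callable standing in for a CDB reader, and the `.json.tmp` suffix is an assumption. Only the `$file.MD5` and `$file.json` names come from the steps listed.

```python
import hashlib
import json
import os


def split_list(items, parts):
    """Split items into `parts` sublists, one per worker (round-robin)."""
    return [items[i::parts] for i in range(parts)]


def refresh_json(cdb_path, read_pairs):
    """Regenerate cdb_path + '.json' if the CDB changed; return True if updated."""
    with open(cdb_path, "rb") as fh:
        digest = hashlib.md5(fh.read()).hexdigest()

    md5_path = cdb_path + ".MD5"
    if os.path.exists(md5_path):
        with open(md5_path) as fh:
            if fh.read().strip() == digest:
                return False  # unchanged since the last run

    # Write to a temporary file first so readers never see partial JSON.
    tmp_path = cdb_path + ".json.tmp"
    with open(tmp_path, "w") as fh:
        json.dump(read_pairs(cdb_path), fh)
    with open(md5_path, "w") as fh:
        fh.write(digest)
    os.replace(tmp_path, cdb_path + ".json")
    return True
```

Skipping unchanged files keeps the JSON output stable between runs, so a later rsync has less to transfer.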

scap-rebuild-cdbs
scap-rebuild-cdbs rebuilds the l10n cache CDB databases from JSON files.


 * 1) Sources
 * 2) Sets   to the number of cores on the local host (includes hyperthreading cores)
 * 3) Sets   to   / 2
 * 4) Sets   to either   or the output of
 * 5) For each version in  :
 * 6) Run

mergeCdbFileUpdates
mergeCdbFileUpdates updates l10n CDB files from JSON data files.


 * Usage
 * mergeCdbFileUpdates --directory  [--threads ] [--trustmtime]


 * 1) Validate command line arguments
 * 2) Create list of   files in target directory
 * 3) Split list in N parts (N == number of parallel threads requested)
 * 4) For each sublist of JSON files:
 * 5) * Fork a child process
 * 6) * For each file:
 * 7) * * Skip the file unless the JSON file is newer than the CDB file or the md5 checksums don't match
 * 8) * * Load JSON data from file
 * 9) * * Create a new temporary CDB file with JSON key:value data
 * 10) * * Rename temporary CDB file over .cdb file
 * 11) Wait for children to finish
 * 12) Echo status message if any files were updated
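The rebuild step above relies on writing a temporary file and renaming it over the target, so a reader never sees a half-written CDB. The sketch below shows only the modification-time variant of the staleness check; the md5 comparison is left out because the notes do not say which checksums are compared. `write_cdb` is a hypothetical callable standing in for a CDB writer, and the `.tmp` suffix is an assumption.

```python
import json
import os


def needs_rebuild(json_path, cdb_path):
    """True if the CDB is missing or older than its JSON source."""
    if not os.path.exists(cdb_path):
        return True
    return os.path.getmtime(json_path) > os.path.getmtime(cdb_path)


def merge_update(json_path, cdb_path, write_cdb):
    """Rebuild cdb_path from json_path if stale; return True if rebuilt."""
    if not needs_rebuild(json_path, cdb_path):
        return False
    with open(json_path) as fh:
        data = json.load(fh)
    tmp_path = cdb_path + ".tmp"
    write_cdb(tmp_path, data)
    # Atomic rename on the same filesystem: readers see old or new, never partial.
    os.replace(tmp_path, cdb_path)
    return True
```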

sync-wikiversions
sync-wikiversions copies wikiversions files to hosts in the mediawiki-installation dsh group.


 * 1) Sources
 * 2) Ensure that   is available (needed for   to remote hosts)
 * 3) Run
 * 4) Ensure that   is available locally
 * 5) Run   via   on mediawiki-installation hosts
 * 6) Runs   to log completion
 * 7) Runs   to log sync-wikiversions completion

mw-deployment-vars.sh
mw-deployment-vars.sh is a Puppet-generated shell script that sets several MW-related environment variables.

The values of these variables change based on the deployment system in use and the realm of the server. For the sake of this analysis we are only concerned with the values configured for the scap deployment system in the production realm.


 * MW_COMMON
 * varies by deployment system
 * scap:


 * MW_COMMON_SOURCE
 * varies by deployment system
 * scap:


 * MW_DBLISTS
 * varies by deployment system
 * scap:


 * MW_DBLISTS_SOURCE
 * varies by deployment system
 * scap:


 * MW_CRON_LOGS


 * MW_RSYNC_HOST
 * varies by realm
 * production:


 * MW_DSH_ARGS


 * MW_RSYNC_ARGS


 * MW_CARBON_HOST
 * varies by realm
 * production:


 * MW_CARBON_PORT

find-nearest-rsync
find-nearest-rsync is a Perl script that attempts to determine the host with the lowest ICMP ping round-trip time (rtt) from a given list of hosts.


 * Usage
 * find-nearest-rsync [--verbose] [ ...]

The host with the lowest rtt will be printed to stdout.
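The selection itself reduces to taking the minimum over measured round-trip times. The sketch below is in Python rather than Perl and does not send ICMP packets: `measure_rtt` is a hypothetical callable that returns a time for a host, or `None` if the host did not answer. How the original script treats unreachable hosts is not stated here, so skipping them is an assumption.

```python
def nearest_host(hosts, measure_rtt):
    """Return the host with the lowest measured rtt, or None if none responded."""
    best_host = None
    best_rtt = None
    for host in hosts:
        rtt = measure_rtt(host)
        if rtt is None:
            continue  # treat as unreachable (assumption)
        if best_rtt is None or rtt < best_rtt:
            best_host, best_rtt = host, rtt
    return best_host
```

For example, with measurements `{"a": 3.0, "b": 1.5}` passed as `measurements.get`, the function returns `"b"`.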

mwversionsinuse
mwversionsinuse is a shell script that calls the local version of multiversion/activeMWVersions.


 * 1) Sources
 * 2) Runs

dologmsg
dologmsg appends a message to an IRC buffer.


 * Usage
 * dologmsg [MESSAGE]