User:SBailey (WMF)

Role at Wikimedia Foundation:
 * Parsoid developer

Hopes for Wikimedia platform futures:
 * help gather requirements, design, and create a way for editors to author interactive models and simulations on the MediaWiki platform

Areas of professional interest:
 * empowering non-programmers to create programs
 * programming language design
 * electronics and embedded systems
 * mechatronics, robotics, and automation
 * agriculture automation systems for aquaponics and other farming techniques

Experience:
 * cross-platform, distributed-computing, functional array programming language and platform
 * spreadsheet cell engine internals developer (Lotus 1-2-3 and Borland Quattro Pro)
 * math education platform and geometry manipulation learning environment for IBM
 * x86 CPU test-generation and cosimulation control C interpreter language for Transmeta
 * graphics primitive libraries
 * embedded systems development of hardware, firmware, and cloud control processes

Hobbies:
 * building and flying drones
 * fabricating geodesic domes (built a 3V 5/8, 26-foot dome and cover for Burning Man)
 * machining and welding custom electric-assisted pedal vehicles (for Burning Man)
 * building custom PA systems for outdoor performances (for Burning Man)
 * various PC and PlayStation video games

Pets:
 * 19-year-old grey cat (Greybie - Grey Beast)
 * new chocolate lab puppy (Bodhi)

Residence:
 * modest home on 40 acres in Fort Jones, California, 26 acres of which is currently a wheat field

-- Important considerations for Wikimedia development:

For Wikimedia maintenance scripts that need to operate on whole tables, batching is essential, and the recommended method is keyset pagination, not LIMIT OFFSET.
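For illustration, the discouraged OFFSET pattern looks like this, sketched against the linter table used in the examples in this section (the page size and offset values are illustrative):

```sql
-- Discouraged: each later page forces the database to read and
-- discard every row that precedes the offset.
SELECT linter_id, linter_params
FROM linter
ORDER BY linter_id ASC
LIMIT 1000 OFFSET 1000;  -- page 2; the OFFSET grows with every page
```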

The LIMIT OFFSET approach slows down as the data set grows large, because the database must read through all the records preceding the offset before returning any rows; it cannot use an index to simply skip ahead a fixed number of records, since, for example, deleted records do not count toward the offset. The approach that can use an index to batch efficiently is called keyset pagination. It requires a unique, indexed id in your query, which serves as a bookmark pointing to the last row of the previous page. Each subsequent page is fetched starting from that bookmark, using the indexed key column as the starting point. For instance:

$lastID = 0;
$batchSize = 1000;

SELECT linter_id, linter_params
FROM linter
WHERE linter_id > $lastID
ORDER BY linter_id ASC
LIMIT $batchSize;

Rendered with the MediaWiki database abstraction layer, this becomes:

// $dbr is a replica database connection obtained from the load balancer
$queryLinterTable = $dbr->newSelectQueryBuilder()
	->table( 'linter' )
	->fields( [ 'linter_id', 'linter_params' ] )
	->where( [ 'linter_id > ' . $lastID ] )
	->orderBy( 'linter_id', SelectQueryBuilder::SORT_ASC )
	->limit( $batchSize )
	->caller( __METHOD__ );
$results = $queryLinterTable->fetchResultSet();

foreach ( $results as $row ) {
	$linter_id = intval( $row->linter_id );
	$lastID = $linter_id;
	// do the maintenance logic and update on each row
}

If the result set above returns 12345 as the last linter_id, the next page is fetched using that $lastID value of 12345:

SELECT linter_id, linter_params
FROM linter
WHERE linter_id > 12345
ORDER BY linter_id ASC
LIMIT 1000;

This code section lives inside an enclosing loop that processes each batch; between batches it should typically sleep for a second and call $lbFactory->waitForReplication() so the table updates can propagate to the database replicas.
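A minimal sketch of that enclosing loop, assuming a maintenance-script context where $dbr is a replica connection and $lbFactory is the load-balancer factory service; the variable names and the one-second sleep are illustrative, not prescribed:

```php
// Sketch only: keyset-paginated batching loop for a maintenance script.
$lastID = 0;
$batchSize = 1000;
do {
	$results = $dbr->newSelectQueryBuilder()
		->table( 'linter' )
		->fields( [ 'linter_id', 'linter_params' ] )
		->where( [ 'linter_id > ' . $lastID ] )
		->orderBy( 'linter_id', SelectQueryBuilder::SORT_ASC )
		->limit( $batchSize )
		->caller( __METHOD__ )
		->fetchResultSet();

	foreach ( $results as $row ) {
		// Advance the bookmark as each row is processed.
		$lastID = intval( $row->linter_id );
		// per-row maintenance logic and update here
	}

	// Let the batch's writes reach the replicas before continuing.
	$lbFactory->waitForReplication();
	sleep( 1 );
} while ( $results->numRows() === $batchSize );
```

A short final page (fewer rows than $batchSize) signals that the table has been fully traversed, so the loop exits without an extra empty query in the common case.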