Topic on Manual talk:DeleteOldRevisions.php

Option to keep certain number of revisions

DikkieDick (talkcontribs)

Hi,

I've been doing some digging: our wiki has pages with a huge number of revisions. I don't want to remove all old revisions, but there is no need to keep everything either. What I would like is an option to keep a certain number of revisions, given as a parameter, e.g. 5, so that the number of revisions of each page is taken into account when deleting from the revision table. If a page has 5 or fewer revisions, none are removed. If a page has more than 5 revisions, all older revisions are removed except the most recent 5. I've copied DeleteOldRevisions.php to DeleteOldRevisions_Keep.php and am working on modifying it, but it seems to be a tough job.

I'm making progress with these queries:

mysql> select rev_id from revision where rev_page=5591 order by rev_id desc limit 5;
+--------+
| rev_id |
+--------+
|  37402 |
|  37401 |
|  37400 |
|  37399 |
|  37398 |
+--------+
5 rows in set (0.00 sec)

mysql> select rev_id from revision where rev_page=5592 order by rev_id desc limit 5;
+--------+
| rev_id |
+--------+
|  37295 |
|  37294 |
|  37293 |
+--------+
3 rows in set (0.00 sec)

rev_page 5591 has 27 revisions and rev_page 5592 has 3 revisions.

Next I wondered what would happen if I undid the latest revision of page 5591 to revert it to 37401. Fortunately this creates a new revision, 37406, instead of removing an existing one. That tells me the highest rev_ids of a page are always its most recent revisions, so I can use the query above to clean up everything except the latest 5 revisions.
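The selection rule described above (keep the N highest rev_ids per page, delete the rest) can be sketched in plain PHP, independent of MediaWiki. This is only an illustration under assumptions: the function name `splitRevisions` and the revision ids for page 9999 are made up; only the ids for page 5592 come from the query output above.

```php
<?php
/**
 * Split each page's revision ids into "keep" and "delete" lists.
 * Hypothetical helper for illustration; not part of MediaWiki.
 *
 * @param array $revsByPage page id => list of rev ids
 * @param int $keepLimit number of most recent revisions to keep per page
 * @return array [ 'keep' => int[], 'delete' => int[] ]
 */
function splitRevisions( array $revsByPage, int $keepLimit ): array {
	$keep = [];
	$delete = [];
	foreach ( $revsByPage as $revIds ) {
		// Highest rev_id first, mirroring "order by rev_id desc"
		rsort( $revIds );
		// The first $keepLimit ids are kept, mirroring "limit N"
		$keep = array_merge( $keep, array_slice( $revIds, 0, $keepLimit ) );
		// Everything after that is a candidate for deletion
		$delete = array_merge( $delete, array_slice( $revIds, $keepLimit ) );
	}
	return [ 'keep' => $keep, 'delete' => $delete ];
}

$revsByPage = [
	5592 => [ 37293, 37294, 37295 ],               // ids from the query above
	9999 => [ 101, 102, 103, 104, 105, 106, 107 ], // made-up page and ids
];
$result = splitRevisions( $revsByPage, 5 );
print_r( $result['delete'] );
```

With a limit of 5, the page with three revisions loses nothing and the made-up page with seven revisions loses only its two oldest ones.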

DikkieDick (talkcontribs)

After some testing it's finished:

Code:

<?php

/**

 * Delete old revisions from the database and keep the latest 'N' revisions (default 10)

 *

 * This program is free software; you can redistribute it and/or modify

 * it under the terms of the GNU General Public License as published by

 * the Free Software Foundation; either version 2 of the License, or

 * (at your option) any later version.

 *

 * This program is distributed in the hope that it will be useful,

 * but WITHOUT ANY WARRANTY; without even the implied warranty of

 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

 * GNU General Public License for more details.

 *

 * You should have received a copy of the GNU General Public License along

 * with this program; if not, write to the Free Software Foundation, Inc.,

 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

 * http://www.gnu.org/copyleft/gpl.html

 *

 * @file

 * @ingroup Maintenance

 * @author Dick Pluim <dick.pluim@gmail.com>

 * (Based on deleteOldRevisions.php by Rob Church)

 */

require_once __DIR__ . '/Maintenance.php';

/**

 * Maintenance script that deletes old revisions from the database and keeps the latest 'N' revisions (default 10).

 *

 * @ingroup Maintenance

 */

class DeleteOldRevisions extends Maintenance {

        public function __construct() {

                parent::__construct();

                $this->addDescription( 'Delete old revisions from the database and keep the latest N revisions (default 10)' );

                $this->addOption( 'delete', 'Actually perform the deletion' );

                $this->addOption( 'page_id', 'List of page ids to work on', false );

        }

        public function execute() {

                $this->output( "Delete old revisions\n\n" );

                $this->doDelete( $this->hasOption( 'delete' ), $this->mArgs );

        }

        function doDelete( $delete = false, $args = [] ) {

                # Data should come off the master, wrapped in a transaction

                $dbw = $this->getDB( DB_MASTER );

                $this->beginTransaction( $dbw, __METHOD__ );

                $revConds = "";

                $keepRevs = [];

                $keepLimit = 10; # default

                # If a parameter is given, we assume that this is the number of revisions to keep.

                # only first argument is being used.

                if ( count( $args ) > 0 ) {

                        # cast to int, because the value ends up inside the SQL text below
                        $keepLimit = (int)$args[0];

                        $this->output( "Keeping " . $keepLimit . " revisions\n" );

                }

                # make the pagelist

                $res = $dbw->select( 'page', 'page_id', 'page_id>0', array( 'ORDER BY' => 'page_id ASC' ));

                foreach ( $res as $row ) {

                        $revConds = "rev_page = $row->page_id order by rev_id desc limit $keepLimit";

                        # make the list of revisions we want to keep for this page

                        $res2 = $dbw->select( 'revision', 'rev_id', $revConds, __METHOD__ );

                        foreach ( $res2 as $row2 ) {

                                $keepRevs[] = $row2->rev_id;

                        }

                }

                # Make the list of revisions which will be deleted

                $revConds = 'rev_id NOT IN (' . $dbw->makeList( $keepRevs ) . ')';

                $res = $dbw->select( 'revision', 'rev_id', $revConds, __METHOD__ );

                $oldRevs = [];

                foreach ( $res as $row ) {

                        $oldRevs[] = $row->rev_id;

                }

                $this->output( "done.\n" );

                # Inform the user of what we're going to do

                $count = count( $oldRevs );

                $this->output( "$count old revisions found.\n" );

                # Delete as appropriate

                if ( $delete && $count>0 ) {

                        $this->output( "Deleting..." );

                        $dbw->delete( 'revision', [ 'rev_id' => $oldRevs ], __METHOD__ );

                        $this->output( "done.\n" );

                }

                # This bit's done

                # Purge redundant text records

                $this->commitTransaction( $dbw, __METHOD__ );

                if ( $delete ) {

                        $this->purgeRedundantText( true );

                }

        }

}

$maintClass = "DeleteOldRevisions";

require_once RUN_MAINTENANCE_IF_MAIN;

--------------

Output:

[root@server maintenance]# php deleteOldRevisions_keep.php

Delete old revisions

Keeping 10 revisions

PHP Notice:  Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808

done.

2534 old revisions found.

[root@server maintenance]# php deleteOldRevisions_keep.php 5

Delete old revisions

Keeping 5 revisions

PHP Notice:  Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808

done.

6103 old revisions found.

[root@server maintenance]# php deleteOldRevisions_keep.php --delete 15

Delete old revisions

Keeping 15 revisions

PHP Notice:  Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808

done.

2026 old revisions found.

Deleting...done.

Searching for active text records in revisions table...done.

Searching for active text records in archive table...done.

Searching for inactive text records...done.

2024 inactive items found.

Deleting...done.

Tested first with 50 and then with gradually lower values... ;-)

I can't figure out why I get the PHP Notice above. There is also sometimes a mismatch between the number of old revisions found and the number of inactive items found, but it works in my test environment.
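A likely cause of the PHP Notice, assuming the `Database::select()` parameter order is `( $table, $vars, $conds, $fname, $options )`: in the page-list query of the script above, the `ORDER BY` array is passed as the fourth argument, which is the caller name (`$fname`), not the options. The database layer then uses that array as a string, which would explain "Array to string conversion", and the ordering is never applied. The sketch below uses a hypothetical stand-in function, `selectStub()`, with the same assumed parameter order to show where each argument lands. It does not talk to a database.

```php
<?php
/**
 * Hypothetical stand-in mirroring the assumed parameter order of
 * Database::select(); it only reports where each argument ended up.
 */
function selectStub( $table, $vars, $conds = '', $fname = 'unknown', $options = [] ): array {
	return [
		// The caller name is expected to be a string
		'fname_is_string' => is_string( $fname ),
		// ORDER BY is only honoured when it arrives in $options
		'order_by' => $options['ORDER BY'] ?? null,
	];
}

// As in the script: the options array sits in the $fname slot.
$asPosted = selectStub( 'page', 'page_id', 'page_id>0', [ 'ORDER BY' => 'page_id ASC' ] );

// Caller name fourth, options fifth.
$reordered = selectStub( 'page', 'page_id', 'page_id>0', 'doDelete', [ 'ORDER BY' => 'page_id ASC' ] );

var_dump( $asPosted, $reordered );
```

If this is indeed the cause, passing `__METHOD__` as the fourth argument and the `ORDER BY` array as the fifth should make the notice disappear. The per-page revision query could pass `ORDER BY` and `LIMIT` as options in the same way instead of appending them to the condition string.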

Running it a second time:

[root@server maintenance]# php deleteOldRevisions_keep.php --delete 15

Delete old revisions

Keeping 15 revisions

PHP Notice:  Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808

done.

0 old revisions found.

Searching for active text records in revisions table...done.

Searching for active text records in archive table...done.

Searching for inactive text records...done.

0 inactive items found.
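One possible explanation for the occasional mismatch between "old revisions found" and "inactive items found" in the first run, assuming a schema in which each revision points to a text row through rev_text_id: a text row only becomes inactive when no remaining revision or archive row references it. If a deleted revision shares its text row with a kept one, which as far as I know can happen with null revisions such as those created by page moves, the text row stays. The sketch below uses made-up ids and a hypothetical helper, `inactiveTextIds()`, purely to illustrate the counting.

```php
<?php
/**
 * Hypothetical helper: return the text ids that are no longer referenced
 * by any remaining revision. Illustration only; not MediaWiki code.
 *
 * @param array $revToText rev id => text id
 * @param array $deletedRevs rev ids that were deleted
 * @return array list of unreferenced text ids
 */
function inactiveTextIds( array $revToText, array $deletedRevs ): array {
	// Revisions that survive the deletion
	$remaining = array_diff_key( $revToText, array_flip( $deletedRevs ) );
	// Text rows still referenced by a surviving revision
	$stillUsed = array_unique( array_values( $remaining ) );
	$allText = array_unique( array_values( $revToText ) );
	return array_values( array_diff( $allText, $stillUsed ) );
}

// Made-up data: revisions 2 and 3 share text row 11.
$revToText = [ 1 => 10, 2 => 11, 3 => 11, 4 => 12 ];
$deleted = [ 1, 2 ];
$inactive = inactiveTextIds( $revToText, $deleted );
echo count( $deleted ) . " revisions deleted, " . count( $inactive ) . " text rows inactive\n";
```

Here two revisions are deleted but only one text row becomes inactive, because text row 11 is still used by revision 3.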

2001:16B8:10D2:A900:497:48D0:EC41:2B7B (talkcontribs)

Hi Dick,

Your option is a great addition to the script! It would be great if you could create a task in Phabricator and submit the change for review, so that it can be included in the MediaWiki tarball and everyone can benefit from it.
