
Manual talk:DeleteOldRevisions.php

From mediawiki.org

Option to keep certain number of revisions


Hi,

I've been digging a bit, and our wiki has pages with a huge number of revisions. I don't want to remove all old revisions, but there is no need to keep everything either. What I would like is an option to keep a certain number of revisions, given as a parameter, e.g. 5. When deleting from the revision table, the number of revisions per page should be taken into account: if a page has 5 or fewer revisions, none are removed; if it has more than 5, all older revisions are removed except the most recent 5. I've copied DeleteOldRevisions.php to DeleteOldRevisions_Keep.php and am working on modifying it, but it seems to be a tough job.

I'm progressing: the query

mysql> select rev_id from revision where rev_page=5591 order by rev_id desc limit 5;
+--------+
| rev_id |
+--------+
|  37402 |
|  37401 |
|  37400 |
|  37399 |
|  37398 |
+--------+
5 rows in set (0.00 sec)

mysql> select rev_id from revision where rev_page=5592 order by rev_id desc limit 5;
+--------+
| rev_id |
+--------+
|  37295 |
|  37294 |
|  37293 |
+--------+
3 rows in set (0.00 sec)

rev_page 5591 has 27 revisions and rev_page 5592 has 3 revisions.

Now I was wondering what would happen if I undid the latest revision of page 5591 to revert it to 37401. Fortunately this creates a new revision, 37406, which tells me that I can use the query above to clean up everything except the latest 5 revisions. DikkieDick (talk) 12:45, 21 March 2017 (UTC)
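
The selection rule described above (keep the newest N revisions of every page, delete the rest) can be sketched in plain PHP without a database. This is only an illustration: the helper name revisionsToDelete is made up, two of the revision ids for page 5591 are invented, and it assumes, as the queries above do, that a higher rev_id means a newer revision.

```php
<?php
/**
 * Return the revision ids that would be deleted if only the newest $keep
 * revisions of every page were kept. Hypothetical helper, not part of MediaWiki.
 *
 * @param array $revsByPage Map of page id => list of rev_id values
 * @param int $keep Number of newest revisions to keep per page (at least 1)
 * @return int[] Sorted list of revision ids to delete
 */
function revisionsToDelete( array $revsByPage, int $keep ): array {
    if ( $keep < 1 ) {
        throw new InvalidArgumentException( 'Keep at least one revision per page' );
    }
    $delete = [];
    foreach ( $revsByPage as $revIds ) {
        rsort( $revIds ); // newest (highest rev_id) first
        // Everything after the first $keep entries is older and may go.
        foreach ( array_slice( $revIds, $keep ) as $revId ) {
            $delete[] = $revId;
        }
    }
    sort( $delete );
    return $delete;
}

$revsByPage = [
    5591 => range( 37396, 37402 ),   // 7 ids; 37396 and 37397 are invented for the example
    5592 => [ 37293, 37294, 37295 ], // fewer than 5 revisions: nothing is deleted
];
print_r( revisionsToDelete( $revsByPage, 5 ) ); // lists 37396 and 37397
```

A page with N or fewer revisions contributes nothing to the result, which matches the behaviour asked for in the first post.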

After some testing it's finished:
Code:
<?php
/**
 * Delete old revisions from the database and keep the latest 'N' revisions (default 10)
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 * @ingroup Maintenance
 * @author Dick Pluim <dick.pluim@gmail.com>
 * (Based on deleteOldRevisions.php by Rob Church)
 */
require_once __DIR__ . '/Maintenance.php';
/**
 * Maintenance script that deletes old revisions from the database and keeps the latest 'N' revisions (default 10).
 *
 * @ingroup Maintenance
 */
class DeleteOldRevisions extends Maintenance {
        public function __construct() {
                parent::__construct();
                $this->addDescription( 'Delete old revisions from the database and keep the latest N revisions (default 10)' );
                $this->addOption( 'delete', 'Actually perform the deletion' );
                $this->addOption( 'page_id', 'List of page ids to work on', false );
        }
        public function execute() {
                $this->output( "Delete old revisions\n\n" );
                $this->doDelete( $this->hasOption( 'delete' ), $this->mArgs );
        }
        function doDelete( $delete = false, $args = [] ) {
                # Data should come off the master, wrapped in a transaction
                $dbw = $this->getDB( DB_MASTER );
                $this->beginTransaction( $dbw, __METHOD__ );
                $revConds = "";
                $keepRevs = [];
                $keepLimit = 10; # default
                # If a parameter is given, we assume that this is the number of revisions to keep.
                # only first argument is being used.
                if ( count( $args ) > 0 ) {
                        $keepLimit=$args[0];
                        $this->output( "Keeping " . $keepLimit . " revisions\n" );
                }
                # make the pagelist
                $res = $dbw->select( 'page', 'page_id', 'page_id>0', array( 'ORDER BY' => 'page_id ASC' ));
                foreach ( $res as $row ) {
                          $revConds = "rev_page = $row->page_id order by rev_id desc limit $keepLimit" ;
                          # make the list of revisions we want to keep for this page
                          $res2 = $dbw->select ( 'revision', 'rev_id' , $revConds, __METHOD__);
                          foreach ( $res2 as $row2 ) {
                                  $keepRevs[] = $row2->rev_id ;
                          }
                }
                # Make the list of revisions which will be deleted
                $revConds = 'rev_id NOT IN (' . $dbw->makeList( $keepRevs ) . ')';
                $res = $dbw->select( 'revision', 'rev_id', $revConds, __METHOD__ );
                $oldRevs = [];
                foreach ( $res as $row ) {
                        $oldRevs[] = $row->rev_id;
                }
                $this->output( "done.\n" );
                # Inform the user of what we're going to do
                $count = count( $oldRevs );
                $this->output( "$count old revisions found.\n" );
                # Delete as appropriate
                if ( $delete && $count>0 ) {
                        $this->output( "Deleting..." );
                        $dbw->delete( 'revision', [ 'rev_id' => $oldRevs ], __METHOD__ );
                        $this->output( "done.\n" );
                }
                # This bit's done
                # Purge redundant text records
                $this->commitTransaction( $dbw, __METHOD__ );
                if ( $delete ) {
                        $this->purgeRedundantText( true );
                }
        }
}
$maintClass = "DeleteOldRevisions";
require_once RUN_MAINTENANCE_IF_MAIN;
--------------
Output:
[root@server maintenance]# php deleteOldRevisions_keep.php
Delete old revisions
Keeping 10 revisions
PHP Notice:  Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808
done.
2534 old revisions found.
[root@server maintenance]# php deleteOldRevisions_keep.php 5
Delete old revisions
Keeping 5 revisions
PHP Notice:  Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808
done.
6103 old revisions found.
[root@server maintenance]# php deleteOldRevisions_keep.php --delete 15
Delete old revisions
Keeping 15 revisions
PHP Notice:  Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808
done.
2026 old revisions found.
Deleting...done.
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
2024 inactive items found.
Deleting...done.
Tested with first 50 and then going slightly further down... ;-)
I can't figure out why I get the PHP Notice above. There is also sometimes a mismatch between the number of old revisions found and the number of inactive items found, but it works in my test environment.
Running it a second time:
[root@server maintenance]# php deleteOldRevisions_keep.php --delete 15
Delete old revisions
Keeping 15 revisions
PHP Notice:  Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808
done.
0 old revisions found.
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
0 inactive items found.
DikkieDick (talk) 07:18, 23 March 2017 (UTC)
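
A likely explanation for the PHP Notice (an assumption, since it was never confirmed in this thread): in the page query of the script, the array containing 'ORDER BY' is passed as the fourth argument of select(). That position is the caller name ($fname); query options belong in the fifth argument. The database layer would then try to use the array as a string, which produces "Array to string conversion", and the ORDER BY is silently ignored. The stub below is hypothetical: it copies only the parameter order of select() and runs no query.

```php
<?php
// Hypothetical stand-in that mirrors only the parameter ORDER of
// select( $table, $vars, $conds, $fname, $options ). It runs no query.
function selectStub( $table, $vars, $conds = '', $fname = 'unknown', $options = [] ) {
    return [ 'fname' => $fname, 'options' => $options ];
}

// As posted: the options array lands in the $fname slot.
$asPosted = selectStub( 'page', 'page_id', 'page_id>0',
    [ 'ORDER BY' => 'page_id ASC' ] );

// Suggested: pass the caller name fourth and the options fifth.
$fixed = selectStub( 'page', 'page_id', 'page_id>0',
    'DeleteOldRevisions::doDelete', [ 'ORDER BY' => 'page_id ASC' ] );

var_dump( is_array( $asPosted['fname'] ) ); // bool(true): an array where a string is expected
var_dump( $asPosted['options'] );           // empty array: the ORDER BY never arrives
var_dump( $fixed['options'] );              // the ORDER BY option, in the right slot
```

Under the same assumption, the per-page revision query could pass [ 'ORDER BY' => 'rev_id DESC', 'LIMIT' => $keepLimit ] as options instead of appending "order by ... limit ..." to the condition string.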
Hi Dick,
Your option is a great addition to the script! It would be great if you could create an issue in Phabricator and put the change into review, so that it can be added to the MediaWiki tarball and everyone can benefit from it! 2001:16B8:10D2:A900:497:48D0:EC41:2B7B (talk) 20:14, 6 January 2018 (UTC)

Option to remove only "minor edits" ?


Hi,

I'm also trying to find a good compromise between radically removing the history and storing lots of useless information. In my opinion, the best solution would be to be able to remove all the old "minor edits" from the history. Unfortunately, my coding skills are not sufficient for that... If someone has an idea...

Thanks, Pseudomino (talk) 19:36, 18 November 2017 (UTC)

An option to remove only the edits that are marked as "minor" does not currently exist. Integrating such an option would cause problems.
First of all, it would break things like the calculation of size differences between revisions if the referenced revision is suddenly no longer there. While this is only a technical issue, which can perhaps be solved, there is another, much bigger problem.
An option to delete only minor edits would remove some edits from the history, but not others. Features like the history function of MediaWiki rely on the fact that all revisions stay in place. They compare revisions with each other and display the difference. However, if a revision in between has been removed, the difference will also include the changes made in that removed revision. That means that changes will be attributed to a user although it is not clear whether it was really that user who made them.
This is a very bad situation, which might even cause legal trouble, e.g. if part of an edit contains insults and, with the corresponding revision removed, it looks like these insults come from user A, while they were in fact added in a removed revision by user B. 2001:16B8:10D2:A900:497:48D0:EC41:2B7B (talk) 20:05, 6 January 2018 (UTC)
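
The attribution problem described above can be shown with a small toy model. This is not how MediaWiki computes diffs: the helper attributeAddedLines, the three-revision history and the user names are all invented for the illustration, and a revision is simplified to a list of lines.

```php
<?php
// Hypothetical helper: credit every newly appearing line to the author of
// the first surviving revision that contains it.
function attributeAddedLines( array $history ): array {
    $attribution = [];
    $previousLines = [];
    foreach ( $history as $rev ) {
        // Lines present in this revision but not in the previous surviving one.
        foreach ( array_diff( $rev['lines'], $previousLines ) as $line ) {
            $attribution[$line] = $rev['user'];
        }
        $previousLines = $rev['lines'];
    }
    return $attribution;
}

$full = [
    [ 'user' => 'C', 'minor' => false, 'lines' => [ 'Intro' ] ],
    [ 'user' => 'B', 'minor' => true,  'lines' => [ 'Intro', 'Rude remark' ] ],
    [ 'user' => 'A', 'minor' => false, 'lines' => [ 'Intro', 'Rude remark', 'New section' ] ],
];

// Drop every minor edit, as the requested option would do.
$pruned = array_values( array_filter( $full, function ( $rev ) {
    return !$rev['minor'];
} ) );

echo attributeAddedLines( $full )['Rude remark'], "\n";   // B
echo attributeAddedLines( $pruned )['Rude remark'], "\n"; // A
```

With the full history the remark is credited to B; once B's minor edit is removed, the same line appears to have been added by A.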

What does "old" mean?


The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I don't understand when a revision is defined as old.

Which revisions get deleted exactly? Maybe all but the current one? Or can "old" be defined somewhere as, for example, "30 days"? Berot3 (talk) 13:05, 18 January 2021 (UTC)

The text states "to delete all old (non-current) revisions", so I'd say all but the latest revision of a page, no matter when it was made. [[kgh]] (talk) 15:20, 18 January 2021 (UTC)
Thanks, I will simply test it then on a throw-away page :D Berot3 (talk) 16:22, 18 January 2021 (UTC)
Good. However, since you are a new MediaWiki user, I am not sure why you need to reduce the size of the database. Personally I would only do it if I really had an issue. [[kgh]] (talk) 16:48, 18 January 2021 (UTC)
Yeah, thanks. I think I simply confused "shrinking the database" with "getting rid of the history that is displayed for a page".
From what I saw, it is only possible to delete history entries of a page, but then they still appear greyed out and crossed out. I thought that it might be possible to simply remove them entirely. Berot3 (talk) 14:17, 21 January 2021 (UTC)
Not sure if a wiki is the best thing for you to choose. Having a version history is one of the core features of a wiki. Not having it is like cutting off the arms and legs of the software, I believe. However, things could get philosophical discussing this further. [[kgh]] (talk) 16:46, 21 January 2021 (UTC)
No, you are absolutely right. I'm used to having god-like rights as an admin, but having a wiki with such a strong commitment to history makes MediaWiki even more powerful and beautiful as a wiki!
I understand now, thank you. Berot3 (talk) 20:58, 21 January 2021 (UTC)
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

how to get page ID (add info/link)


The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


It might be good for beginners like me to add a link to Page information, to show readers where to find the page ID. Berot3 (talk) 16:30, 18 January 2021 (UTC)

I would do it myself, but I'm not sure where and how to put it, with translation and stuff... Berot3 (talk) 16:33, 18 January 2021 (UTC)
Done. [[kgh]] (talk) 16:47, 18 January 2021 (UTC)
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Parse error when running deleteOldRevisions.php


MediaWiki 1.43.1

Parse error: syntax error, unexpected '=' in /home/clients/xxxxxxxxxxxxxxxxx/web/includes/BootstrapHelperFunctions.php on line 35

Line 35 uses the ??= operator.

Any clue?

Thx, Alex1859 (talk) 12:20, 27 August 2025 (UTC)

Solved. The new way is:
php maintenance/run.php deleteOldRevisions --delete
~2025-46767-2 (talk) 17:43, 27 August 2025 (UTC)
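
Independently of how the script is invoked, a parse error on ??= usually means that the interpreter reading the file predates PHP 7.4, the release that introduced this operator. The thread does not say which PHP version the command line used, so the following is only a sketch for checking it. The helper name is made up, and the script deliberately avoids newer syntax so that it also runs on an old interpreter.

```php
<?php
// Hypothetical helper: true if the given PHP version understands `??=`,
// which was introduced in PHP 7.4.
function supportsNullCoalescingAssignment( $version ) {
    return version_compare( $version, '7.4.0', '>=' );
}

// Report which interpreter is actually running this file.
echo 'Running PHP ', PHP_VERSION, ' from ', PHP_BINARY, "\n";
echo supportsNullCoalescingAssignment( PHP_VERSION )
    ? "This interpreter can parse ??=\n"
    : "This interpreter is too old to parse ??=\n";
```

Running it with the same php command that is used for the maintenance script shows whether the command line picks up an older PHP than the web server does.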

Doesn't work


This script doesn't work as it should. It only removes rows from the revision table, but leaves all the old text in place. Reading its source code, it looks like it has not been updated to work with Multi-Content Revisions. ~2025-29300-34 (talk) 06:40, 19 October 2025 (UTC)

Any warning when used on a wiki whose database was previously concatenated by compressOld?


The maintenance script compressOld.php, when run with "-t concat", saves all gzipped texts in the text record of the oldest revision.

I'm curious whether DeleteOldRevisions.php ever issues a warning when it is executed on a database, or on records, that compressOld.php has previously been run on. Currently I'm working with a wiki that is plagued with RevisionAccessExceptions: "Failed to load data blob from Bad data in text row xxx. Use findBadBlobs.php to remedy."

Looking at the raw data, the cause is evidently that compressOld.php was run with -t concat (and maybe $wgCompressRevisions was also enabled for some time; I couldn't tell), and DeleteOldRevisions.php was then executed on these revisions. As a result, the non-deleted, most recent revision points at the oldest revision, which contained the concatenated texts, except that the oldest revision is gone.--WhichBrain (talk) 04:39, 1 April 2026 (UTC)
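
The failure described above can be reproduced in a toy model. Everything here is an assumption made for illustration: the array layout, the row ids and both helper functions are invented and do not mirror MediaWiki's real storage classes. The model only captures the idea that a newer text row is a pointer into the concatenated row of the oldest revision, and that a purge looks at direct references only.

```php
<?php
// Toy text table: row 100 holds the concatenated texts (oldest revision),
// row 101 is only a pointer into row 100 (current revision).
$textTable = [
    100 => [ 'type' => 'concat', 'key' => 'old',
        'items' => [ 'old' => 'Old text', 'new' => 'Current text' ] ],
    101 => [ 'type' => 'stub', 'ref' => 100, 'key' => 'new' ],
];
// Revision id => text row id.
$revisions = [ 1 => 100, 2 => 101 ];

// Hypothetical loader: follows a stub to the concatenated row it points at.
function loadText( array $textTable, int $textId ): string {
    $row = $textTable[$textId] ?? null;
    $blob = ( $row !== null && $row['type'] === 'stub' )
        ? ( $textTable[$row['ref']] ?? null )
        : $row;
    if ( $blob === null ) {
        throw new RuntimeException( "Bad data in text row $textId" );
    }
    return $blob['items'][$row['key']];
}

// Hypothetical purge: keeps only text rows that a revision references directly.
function purgeUnreferencedText( array $textTable, array $revisions ): array {
    return array_intersect_key( $textTable, array_flip( $revisions ) );
}

echo loadText( $textTable, $revisions[2] ), "\n"; // Current text

unset( $revisions[1] ); // delete the old revision ...
$textTable = purgeUnreferencedText( $textTable, $revisions ); // ... and purge its text row

try {
    echo loadText( $textTable, $revisions[2] ), "\n";
} catch ( RuntimeException $e ) {
    echo 'Failed: ', $e->getMessage(), "\n"; // the stub now points at a missing row
}
```

In this model a safe purge would have to treat row 100 as still in use for as long as any remaining stub refers to it.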