Topic on Project:Support desk


Missing pages, but data appear to be intact

Jonahgreenthal (talkcontribs)

My wiki is missing a bunch of pages. I think it's specifically the pages that only had one revision. Possibly only if that revision was very old.

Here's an example page: https://www.qbwiki.com/wiki/St._Anne%27s . It says there's no text in this page, but the page, revision, and text tables all seem to have the right data.

Other observations:

The revision #0 of the page named "St. Anne's" does not exist.
This is usually caused by following an outdated history link to a page that has been deleted. Details can be found in the deletion log.

(but the deletion log doesn't show anything)

  • When I try to save an edit, I am told there is an edit conflict ("Someone else has changed this page since you started editing it.")
  • I can delete the page and then re-create it
  • I fiddled around with the PHP code to see where the error text was coming from. It's Article.php line 436, in the fetchRevisionRecord() method:
// $this->mRevision might already be fetched by getOldIDFromRequest()
if ( !$this->mRevision ) {
	if ( !$oldid ) {
		$this->mRevision = $this->mPage->getRevision();

		if ( !$this->mRevision ) {
			wfDebug( __METHOD__ . " failed to find page data for title " .
				$this->getTitle()->getPrefixedText() . "\n" );

			// Just for sanity, output for this case is done by showMissingArticle().
			$this->fetchResult = Status::newFatal( 'noarticletext' );
			$this->applyContentOverride( $this->makeFetchErrorContent() );
			return null;
		}
	…

But I don't know enough about how MediaWiki works (or PHP in general, really) to figure out what's going wrong.

This problem likely started after an upgrade that I struggled with. Unfortunately, I didn't notice this problem until long after the upgrade, and when I finished the upgrade I had thought everything turned out okay, so I don't remember exactly what went wrong.

The upshot is that there are a bunch of pages (this was just one example) whose contents exist in the database but can't be accessed through the website. I'm not even sure how to systematically identify such pages. I suspect some rows or values are missing from one or more tables, but I have no clue which ones or how to find out.

Thoughts?

Nikerabbit (talkcontribs)

Do you have backups from before the upgrade?

Most likely either the actor migration or the comment migration has gone wrong. MediaWiki joins the revision table against those tables, so a revision with no matching row there drops out of the query result, and the revision or page appears to be missing.

If it is about comments, see https://phabricator.wikimedia.org/T249904.

If it is about actors, I had the following trick:

  1. identify the user names for those revisions
  2. Create proper users for them, e.g. `User::newSystemUser( '...', [ 'steal' => true ] );`
  3. Run database queries such as `UPDATE revision SET rev_user = 0 WHERE rev_user_text = '...';` (and similar for all affected tables, mostly logging, archive and recentchanges)
  4. Run `php maintenance/cleanupUsersWithNoId.php`


But you need backups to do this in case the rev_user and equivalent fields are already dropped. I guess it's possible to do it afterwards by updating the rev_actor and equivalent fields too, but I have not done that myself.
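
Step 3 repeats the same UPDATE across several tables. A minimal sketch of that loop, run here against a toy in-memory SQLite database rather than a real wiki: the column pairs (`rev_user`/`rev_user_text`, `log_user`/`log_user_text`, `ar_user`/`ar_user_text`, `rc_user`/`rc_user_text`) are assumed from the pre-actor MediaWiki schema, `reset_user_ids` is a hypothetical helper that is not part of MediaWiki, and the sample rows are made up. The tables that actually need the update may differ per wiki.

```python
import sqlite3

# (table, user-id column, user-name column), assumed from the old schema.
USER_COLUMNS = [
    ("revision", "rev_user", "rev_user_text"),
    ("logging", "log_user", "log_user_text"),
    ("archive", "ar_user", "ar_user_text"),
    ("recentchanges", "rc_user", "rc_user_text"),
]


def reset_user_ids(conn, user_name):
    """Set the user id to 0 wherever a row is attributed to user_name.

    Returns a dict mapping table name to the number of rows updated.
    """
    changed = {}
    for table, id_col, name_col in USER_COLUMNS:
        # Identifiers come from the fixed list above; only the name is bound.
        cur = conn.execute(
            f"UPDATE {table} SET {id_col} = 0 WHERE {name_col} = ?",
            (user_name,),
        )
        changed[table] = cur.rowcount
    conn.commit()
    return changed


# Toy data: 'ExampleUser' points at user id 42, which no longer exists.
conn = sqlite3.connect(":memory:")
for table, id_col, name_col in USER_COLUMNS:
    conn.execute(f"CREATE TABLE {table} ({id_col} INTEGER, {name_col} TEXT)")
conn.execute("INSERT INTO revision VALUES (42, 'ExampleUser'), (1, 'Admin')")
conn.execute("INSERT INTO logging VALUES (42, 'ExampleUser')")

result = reset_user_ids(conn, "ExampleUser")
print(result)
```

On a real wiki the same statements would go through the MySQL/MariaDB client (where the placeholder syntax differs), and only after taking a backup.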

Jonahgreenthal (talkcontribs)

I do have backups from before the upgrade, but the upgrade was about a year ago so restoring from the backup isn't viable.

Thanks for pointing me at revision_comment_temp and revision_actor_temp. It looks like the problem is the latter—this query returns 164 rows:

SELECT * FROM revision WHERE rev_id NOT IN (SELECT revactor_rev FROM revision_actor_temp)

Do you agree with that reasoning? (Some of the corresponding pages do exist, but the revisions seem to be missing when I view the history through the web interface.)
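
One way to turn that check into a list of affected pages is to join the orphaned revisions back to the page table. The sketch below runs against a toy in-memory SQLite database, not a real wiki; the table and column names (`page_id`, `page_namespace`, `page_title`, `rev_page`, `rev_user_text`, `revactor_rev`) are assumed to match the schema discussed in this thread, and the sample rows are invented. It uses NOT EXISTS rather than NOT IN, since NOT IN matches nothing if the subquery ever returns a NULL.

```python
import sqlite3

# Toy stand-in for the wiki database; only the columns used below are modeled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user_text TEXT);
CREATE TABLE revision_actor_temp (revactor_rev INTEGER, revactor_actor INTEGER);

INSERT INTO page VALUES (1, 0, 'St._Anne''s'), (2, 0, 'Main_Page');
INSERT INTO revision VALUES (10, 1, 'ExampleUser'), (20, 2, 'Admin'), (21, 2, 'Admin');
-- Revision 10 has no actor row, mimicking the broken migration.
INSERT INTO revision_actor_temp VALUES (20, 5), (21, 5);
""")

# Revisions with no matching actor row, together with the page they belong to.
ORPHANED_REVISIONS = """
SELECT p.page_namespace, p.page_title, r.rev_id, r.rev_user_text
FROM revision AS r
JOIN page AS p ON p.page_id = r.rev_page
WHERE NOT EXISTS (
    SELECT 1 FROM revision_actor_temp AS t WHERE t.revactor_rev = r.rev_id
)
ORDER BY p.page_title, r.rev_id
"""

orphans = conn.execute(ORPHANED_REVISIONS).fetchall()
for namespace, title, rev_id, user_text in orphans:
    print(f"ns={namespace} title={title} rev_id={rev_id} user={user_text}")
```

The SELECT itself should carry over to the wiki's own database client more or less unchanged, and the `rev_user_text` column in the result doubles as the list of user names to fix.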

The rev_user_text column contains the username, so that should address your step 1, right? Are you able to elaborate on step 2 (how do I do that? what's the steal thing?) and 3 (which tables are affected?)? Thanks so much!

Nikerabbit (talkcontribs)

I'd suggest running `php maintenance/migrateActors.php --force` to see whether it reports errors. If it does, you should get a list of usernames that match the rev_user_text of those rows. You could try running cleanupUsersWithNoId.php first, or maybe even findMissingActors.php (if you have it), to see if that is sufficient.

But if they don't work, my step 2 basically creates a user and an actor for the name. The issue may be that there is no user account for the name, so an actor cannot be created. Step 3 removes broken references to user ids which do not exist, so that cleanupUsersWithNoId can process those rows. The relevant tables and names should be printed out by the migrateActors script.

Jonahgreenthal (talkcontribs)

Thanks. migrateActors produced a bunch of messages like this:

User name "X" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

cleanupUsersWithNoId produced a bunch of output but didn't seem to actually do anything.

I don't know how to actually do your step 2. It looks like PHP code I'm supposed to run, but I don't know how to run custom code within the MediaWiki environment.

I don't have findMissingActors.

Nikerabbit (talkcontribs)

There is shell.php and eval.php under maintenance, both allow you to run that code interactively.

Jonahgreenthal (talkcontribs)

Thanks. I had to fight with shell.php pretty hard to get PsySH to work, but I think everything works now, including the solution of my original problem. I appreciate your help.
