Topic on Talk:Quarry

No index for revision table?

9
RoySmith (talkcontribs)

I'm running what looks like a very simple query:

select count(*) from revision where rev_user = 130326;

It's taking forever to complete. According to Manual:Revision table, rev_user should be the first part of the user_timestamp key, so this should be a fast indexed query. But, if I describe the table

describe revision;

I don't see any keys listed.

Zhuyifei1999 (talkcontribs)
Framawiki (talkcontribs)

Even quicker (the count process is a bit different, takes care for things like log events)

MariaDB [enwiki_p]> select user_editcount from user where user_id = 130326;
+----------------+
| user_editcount |
+----------------+
|          24197 |
+----------------+
1 row in set (0.01 sec)
RoySmith (talkcontribs)

Interesting. There's no table "revision_userindex" listed in Category:MediaWiki database tables. Is that just not up to date? I do see it when I run "show tables", and the query you suggest does indeed work.

So, the next obvious question is, why do we have two copies of the table, one with the index and another that's missing it?

Zhuyifei1999 (talkcontribs)

Because Category:MediaWiki database tables lists the tables that built-in to MediaWiki. These tables can contain highly sensitive data, such as hashes of passwords, tokens, IP addresses, revision-deleted or oversighted revision comments, etc. Therefore, when these databases are replicated to wiki replicas, which Quarry queries from, some tables like revision must redact some cells from a raw when a certain criteria is met.

We implement some of the redaction by not replication some tables at all, and some other redaction by using views. As for your query, the relevant redaction in view (not table) revision is if((`enwiki`.`revision`.`rev_deleted` & 4),NULL,`enwiki`.`revision`.`rev_user`) AS `rev_user` , which makes rev_user NULL when rev_deleted has bit 2^2 (for revdel-ed username) set. As a side effect of using this if, indexes in the table revision for rev_user cannot be used.

Since unusable indexes is a serious performance problem, the DBAs made alternative views that redact the rows entirely. This is the case of view revision_userindex, where the definition includes and `enwiki`.`revision`.`rev_user` AS `rev_user` and where ((`enwiki`.`revision`.`rev_deleted` & 4) = 0) . rev_user is now a simple passthrough, allowing use of indexes on that column; the bit condition in rev_deleted is now an implicit but enforced where condition to the underlying revision table that we cannot query directly.

So no, we don't have two copies of the table, but two views of the same table.

RoySmith (talkcontribs)

Ah, cool. Thanks. That all makes perfect sense.

Maybe the "Database tables" tab in Quarry could include this information? I imagine I'm not the first person to stub my toe on this.

Zhuyifei1999 (talkcontribs)

Good point. Though, what link text and target do you suggest? cc @Framawiki:

Framawiki (talkcontribs)

It would be great to have a wikitech page that includes the map of replicas table like the mediawiki one Currently a thing like this doesn't exist as I know.

See also phab:T139289

JPxG (talkcontribs)

I realize this is a comment from 2018, but I've had the same issue today, and I'll second the request for a wikitech page that lists these tables.

Reply to "No index for revision table?"