Topic on Talk:Quarry

Jump to navigation Jump to search

Query using LAG() takes a long time

3
RoySmith (talkcontribs)

I want to look at the intervals between successive revisions. I tried (against enwiki_p):

select rev_id, rev_timestamp, lag(rev_timestamp) over (order by rev_id)
from revision
limit 5;

but that's running for a long time; I assume it'll time out eventually. It's as if it were doing a full table scan. What am I doing wrong?

BDavis (WMF) (talkcontribs)

The explain I got by using https://sql-optimizer.toolforge.org/ looks like index would be used, but your query has no WHERE clause to limit it's scope. The LIMIT clause truncates the returned results, but having an ORDER BY involved requires the database engine to produce and order the data set before truncating to the 5 desired results. The sheer size of the enwiki revision table (913,325,481 rows) makes operations on it without WHERE filtering very time and memory expensive, especially when ORDER BY is also involved.

Results

id select_type table type possible_keys key key_len ref rows Extra
1.1 SIMPLE revision index page_timestamp 20 913325481 Using index; Using temporary
RoySmith (talkcontribs)

OK, that makes sense. I assumed that since rev_timestamp was indexed, it would be smart enough to do the right thing, but I guess not.

Reply to "Query using LAG() takes a long time"