Topic on Talk:Quarry

Is the query actually running or not?

Mmaarrkkooss (talkcontribs)

I gave a query to quarry about all the articles and their categories - it's logical it will need some time. The query status is running, but when I press explain, it says "Error: Hmm... Is the SQL actually running?! ". Which is it then?

Zhuyifei1999 (talkcontribs)

Which query? Could you give a link?

Mmaarrkkooss (talkcontribs)

I might have been doing this wrong, however.

Zhuyifei1999 (talkcontribs)

From the logs, the query most likely generated too many results for Quarry to store. You hit phab:T172086.

Mmaarrkkooss (talkcontribs)

Where can I see the logs? How could I run a query that big?

Zhuyifei1999 (talkcontribs)
MariaDB [enwiki_p]> SELECT COUNT(1)
    -> FROM categorylinks
    -> INNER JOIN page p1 ON (p1.page_id = cl_from AND p1.page_namespace = 0)
    -> LEFT JOIN page p2 ON (p2.page_namespace = '14' AND p2.page_title = cl_to)
    -> LEFT JOIN page_props ON (pp_page = p2.page_id AND pp_propname = 'hiddencat')
    -> WHERE pp_propname IS NULL;
+----------+
| COUNT(1) |
+----------+
| 28917820 |
+----------+
1 row in set (9 min 7.29 sec)

MariaDB [enwiki_p]> Select COUNT(1) from page;
+----------+
| COUNT(1) |
+----------+
| 43592182 |
+----------+
1 row in set (2 min 1.31 sec)

Quarry will not and should not be able to store result sets this large. As for the logs, they exist as files on the runner instances, and I do not know of an easy way to expose them.
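If you want to gauge the result size from within Quarry itself before running the full query, you can wrap it in a COUNT first, the same way the shell session above does. A minimal sketch (the join is abbreviated here for illustration, not the full original query):

SELECT COUNT(1)
FROM categorylinks
INNER JOIN page p1 ON (p1.page_id = cl_from AND p1.page_namespace = 0);

If the count comes back in the tens of millions, the full query will hit the same storage limit.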

Mmaarrkkooss (talkcontribs)

Thanks for the insights. Do you know how I could obtain such a large data set from that query?

Mmaarrkkooss (talkcontribs)

Also, is the above shell Quarry? Because it doesn't look like it.

Zhuyifei1999 (talkcontribs)

For large amounts of offline data, you can use https://dumps.wikimedia.org/enwiki/. The above shell is directly querying the Wiki Replicas servers on Toolforge, but Quarry will produce the same results if the same query is run at the same time, since it connects to the exact same servers.

Mmaarrkkooss (talkcontribs)

I have in fact downloaded those SQL tables from the dumps and am importing them into a database (a slow process), but I fear this query won't work either.

Mmaarrkkooss (talkcontribs)

Hello again.

I am thinking of using pagination and limiting the results. How many records do you think I could get away with? A million? More? Less?

Mmaarrkkooss (talkcontribs)

500k is doable.
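One way to do the pagination discussed above is a sketch like the following, assuming keyset pagination on (cl_from, cl_to), the primary key of categorylinks; the cursor values 12345 and 'Example_category' are placeholders for the last row of the previous batch:

SELECT cl_from, cl_to
FROM categorylinks
INNER JOIN page p1 ON (p1.page_id = cl_from AND p1.page_namespace = 0)
WHERE (cl_from, cl_to) > (12345, 'Example_category')
ORDER BY cl_from, cl_to
LIMIT 500000;

Seeking on the key like this avoids the slowdown that LIMIT ... OFFSET suffers at deep offsets on multi-million-row scans.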
