Talk:Quarry

Jump to navigation Jump to search

About this board

Discussion area for discussion about Quarry itself and help about individual queries.

Categories with exactly 1 category in them

2
Bennylin (talkcontribs)

Hi, can anybody help me create a query for a list of "Categories with exactly 1 category in them"? Thanks

Bennylin (talkcontribs)
Mike Peel (talkcontribs)

I've been trying to run https://quarry.wmflabs.org/query/28096 for the last few days, but it keeps timing out. It normally works (or works after timing out once or twice). Are things running slow on the back-end? If so, would it be possible to increase the time-out from 30 minutes to 60 minutes or so? Thanks!

Zhuyifei1999 (talkcontribs)

It says it's executed in 894.39 seconds. Is this still an issue?

Mike Peel (talkcontribs)

I just reran it again today, and it worked first time with an execution time of 735.61 - so maybe this is resolved. However, let's see how it goes over the next week...

Framawiki (talkcontribs)

Note that Quarry is "simply" an interface for Wiki replicas databases. A timeout certainly comes from these DB.

Mike Peel (talkcontribs)

It's back to timing out again today. :-( It must somehow be linked to server loads, as I'm not changing the query between runs, and the amount of data to analyse and the number of results are roughly similar.

Mike Peel (talkcontribs)

It's still timing out. :-(

Mike Peel (talkcontribs)

... and today it completed in 484.18 seconds (~8 minutes). Very weird - the run time seems to be very random.

Mike Peel (talkcontribs)

It's been back to timing out for most of this last week.

Frietjes (talkcontribs)

I have been having the same problem with query/36685 :(

Mike Peel (talkcontribs)

The query has been timing out for the last fortnight now. :-(

Mike Peel (talkcontribs)
Ahecht (talkcontribs)

Same issue with quarry:quarry/36666 which had been taking about 15 minutes but is now timing out after 30.

Dvorapa (talkcontribs)

I experience the same thing. It seems analytics cluster is overloaded or there may be some other issue. All my simple queries now take 20+ minutes or timeout, a year ago they run under 2 minutes.

For Toolforge users the easy solution is to try the web cluster (sql --cluster=web enwiki), but the issue should be solved anyway.

Dvorapa (talkcontribs)

Per IRC thread everything should be back to normal soon

Mike Peel (talkcontribs)

IRC thread?

Frietjes (talkcontribs)

I was able to run quarry:query/36685 about 20 minutes ago, and it didn't timeout this time!

Mike Peel (talkcontribs)
Dvorapa (talkcontribs)

Error 500 should be fixed now based on what guys have said on IRC (#wikimedia-cloud)

Mike Peel (talkcontribs)

It seems to be working now, thanks! :-)

Reply to "Queries running slow?"
GeoffreyT2000 (talkcontribs)
Zhuyifei1999 (talkcontribs)

Which part is smaller?

Matěj Suchánek (talkcontribs)

I can confirm that. The black input field, "Fork of...", "This query has been published...", query names under My profile. Firefox.

Zhuyifei1999 (talkcontribs)
Framawiki (talkcontribs)
Framawiki (talkcontribs)

It should be better now, can you confirm?

Matěj Suchánek (talkcontribs)

Seem correct, thanks!

Reply to "Font size smaller than before"

Number of people who put specific page into Watchlist

2
Japanologie (talkcontribs)
Matěj Suchánek (talkcontribs)

No, the relevant tables are not public for privacy reasons.

Reply to "Number of people who put specific page into Watchlist"
MisterSynergy (talkcontribs)

Due to the ongoing migration to the actor table, quite a number of queries need modifications. Currently I have 75 queries in my Quarry account, and it would be nice if there was a "Search in my queries" functionality that would allow me to look for e.g. "_user". Would it be possible to add such a search field to Quarry?

Edgars2007 (talkcontribs)

I currently simply use scraping for such purposes (also it acts like backup). But yeah, this would be very, very nice.

There should be thread or phab ticket for this. Something like "meta-quarry" (that is, query the quarry data).

BDavis (WMF) (talkcontribs)

phab:T90509 seems related, but it sounds like you might be hoping for searching of the SQL query body itself too.

Reply to "Search in my SQL queries"
Summary by BDavis (WMF)

Full article text is not available in the Wiki Replica databases. Use the Action API or Dumps instead to get article content.

202.63.103.210 (talkcontribs)

Can someone help me to write a query which gives the article content from Wikipedia articles in Telugu language

Matěj Suchánek (talkcontribs)

It is not possible to extract content from articles via Quarry (relevant tables are not publicly available).

Framawiki (talkcontribs)

But you can use the Special:Export page on your local wiki to get an w:XML version of the content of the pages.

Missing columns in revision table?

6
Summary by BDavis (WMF)

The Wiki Replica views have changed to remove *_user and *_user_text fields from all tables. See wikitech:News/Actor storage changes on the Wiki Replicas for more information.

Le Deluge (talkcontribs)

I've just tried to run a query that uses rev_user_text, one of the columns in the revision table, and get a failure with :

Unknown column 'rev_user_text' in 'field list'

It runs fine when I remove SELECT rev_user_text from the query. Is this a temporary glitch, or has rev_user_text been permanently removed?

Edgars2007 (talkcontribs)
BDavis (WMF) (talkcontribs)
This post was hidden by Barkeep49 (history)
TheDJ (talkcontribs)
BDavis (WMF) (talkcontribs)

I created phab:T225007 today to track the need updating the Manual. Part of the issue that has happened here is explained (but maybe not well) on wikitech:News/Actor storage changes on the Wiki Replicas is that the Wiki Replica views have changed in advance of MediaWiki changes. I'm not 100% sure which patches have landed in the MediaWiki code base and which are still pending in Gerrit. There definitely is some documentation lag however because even if the legacy user fields have not been remove from MediaWiki core yet, the actor table and related fields have landed.

Reply to "Missing columns in revision table?"

Subquery succeeds but Join times out

3
PaigePhault (talkcontribs)

I'm trying to get some estimates of the prevalence of WP policy violations, but am seeing timeouts for queries that do joins.

SELECT page.page_title, revision.rev_id, mycomment.comment_text FROM (SELECT comment.comment_text, comment.comment_id FROM comment WHERE comment.comment_text RLIKE '[[:<:]]WP:(OUTING|PRIVACY|DOX)[[:>:]]' LIMIT 10) AS mycomment INNER JOIN revision ON revision.rev_comment_id = mycomment.comment_id INNER JOIN page on page.page_id = revision.rev_page;

The internal select statement finishes in a <cough> reasonable amount of time, but trying to use it in a join fails every time. Any suggestions on how to materialize the subselect perhaps?

Matěj Suchánek (talkcontribs)

What about...?

SELECT page_title, rev_id, comment_text
FROM page
JOIN revision ON rev_page = page_id
JOIN `comment` ON rev_comment_id = comment_id
WHERE rev_comment_id IN (SELECT comment_id FROM `comment` AS c1 WHERE c1.comment_text RLIKE '[[:<:]]WP:(OUTING|PRIVACY|DOX)[[:>:]]')
LIMIT 10;
PaigePhault (talkcontribs)
Reply to "Subquery succeeds but Join times out"
جار الله (talkcontribs)

Hello, I am looking for articles with flaggedRevs boxes, is there any way to find them by http://quarry.wmflabs.org/?

Reply to "mw-fr-revisiontag"
Ineuw (talkcontribs)

Am I correct to assume that the text layer of a Wikisource page is encoded as UTF-8? Does this apply to all Wikipedia sites that contain text?

Achim55 (talkcontribs)

Yes and yes as far as I know. Citing Manual:$wgDBTableOptions: "MediaWiki internally always produces data in UTF-8 encoding to be saved in the database."


Reply to "Wikisource text layer encoding"