About this board
Discussion area for Quarry itself and for help with individual queries.
Categories with exactly 1 category in them
Hi, can anybody help me create a query for a list of "Categories with exactly 1 category in them"? Thanks
Nvm, I got it quarry:query/37399
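For anyone landing here later, here is a sketch of one way to do it, assuming the standard `category` table, whose `cat_subcats` column counts the subcategories of each category:

```sql
-- Categories that contain exactly one subcategory
SELECT cat_title
FROM category
WHERE cat_subcats = 1;
```

Note that `cat_subcats` is a cached count and can occasionally be stale; counting rows in `categorylinks` with `cl_type = 'subcat'` is the slower but exact alternative.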
Queries running slow?
I've been trying to run https://quarry.wmflabs.org/query/28096 for the last few days, but it keeps timing out. It normally works (or works after timing out once or twice). Are things running slow on the back-end? If so, would it be possible to increase the time-out from 30 minutes to 60 minutes or so? Thanks!
It says it's executed in 894.39 seconds. Is this still an issue?
I just reran it again today, and it worked first time with an execution time of 735.61 - so maybe this is resolved. However, let's see how it goes over the next week...
Note that Quarry is "simply" an interface to the Wiki Replicas databases. Any timeout almost certainly comes from those databases, not from Quarry itself.
It's back to timing out again today. :-( It must somehow be linked to server loads, as I'm not changing the query between runs, and the amount of data to analyse and the number of results are roughly similar.
It's still timing out. :-(
... and today it completed in 484.18 seconds (~8 minutes). Very weird - the run time seems to be very random.
It's been back to timing out for most of this last week.
I have been having the same problem with query/36685 :(
The query has been timing out for the last fortnight now. :-(
The query has been timing out for the last fortnight now. :-( I've filed a bug report at https://phabricator.wikimedia.org/T226050
Same issue with quarry:query/36666, which had been taking about 15 minutes but is now timing out after 30.
I'm experiencing the same thing. It seems the analytics cluster is overloaded, or there may be some other issue. All my simple queries now take 20+ minutes or time out; a year ago they ran in under 2 minutes.
For Toolforge users the easy workaround is to try the web cluster (sql --cluster=web enwiki), but the underlying issue should still be fixed.
Per the IRC thread, everything should be back to normal soon.
I was able to run quarry:query/36685 about 20 minutes ago, and it didn't time out this time!
The error 500 should be fixed now, based on what people have said on IRC (#wikimedia-cloud).
It seems to be working now, thanks! :-)
Font size smaller than before
The font size on https://quarry.wmflabs.org/ is now much smaller than before. Was this intentional?
Which part is smaller?
I can confirm that. The black input field, "Fork of...", "This query has been published...", query names under My profile. Firefox.
It should be better now, can you confirm?
Seems correct, thanks!
Number of people who put specific page into Watchlist
We can get the number of people who have put a specific page into their Watchlist through the [https://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Main%20Page&inprop=watchers API]. Could we do this through Quarry as well?
I'd like to create a list of the most-watchlisted pages. Thank you.
No, the relevant tables are not public for privacy reasons.
Search in my SQL queries
Due to the ongoing migration to the actor table, quite a number of queries need modifications. Currently I have 75 queries in my Quarry account, and it would be nice if there was a "Search in my queries" functionality that would allow me to look for e.g. "_user". Would it be possible to add such a search field to Quarry?
I currently simply scrape my queries for such purposes (it also acts as a backup). But yeah, this would be very, very nice.
There should be a thread or Phabricator ticket for this. Something like "meta-quarry" (that is, querying the Quarry data itself).
phab:T90509 seems related, but it sounds like you might also be hoping to search the SQL query body itself.
Can someone help me write a query that returns the article content of Wikipedia articles in the Telugu language?
It is not possible to extract content from articles via Quarry (relevant tables are not publicly available).
But you can use the Special:Export page on your local wiki to get an XML version of the content of the pages.
Missing columns in revision table?
I've just tried to run a query that uses rev_user_text, one of the columns in the revision table, and it fails with:
Unknown column 'rev_user_text' in 'field list'
It runs fine when I remove rev_user_text from the SELECT list. Is this a temporary glitch, or has rev_user_text been permanently removed?
Yes, permanently removed.
There is some additional information on this change at wikitech:News/Actor storage changes on the Wiki Replicas.
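For reference, a sketch of the replacement pattern, assuming the post-migration schema in which revision carries rev_actor instead of the old user columns:

```sql
-- Old:  SELECT rev_user_text FROM revision WHERE rev_id = 12345;
-- New:  join through the actor table instead
SELECT actor_name
FROM revision
JOIN actor ON actor_id = rev_actor
WHERE rev_id = 12345;
```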
Also, it seems we have out-of-date documentation: https://www.mediawiki.org/wiki/Manual:Revision_table#rev_user_text
That's probably not helpful...
I created phab:T225007 today to track the need to update the Manual. Part of what has happened here, explained (but maybe not well) at wikitech:News/Actor storage changes on the Wiki Replicas, is that the Wiki Replica views have changed in advance of the MediaWiki changes. I'm not 100% sure which patches have landed in the MediaWiki code base and which are still pending in Gerrit. There is definitely some documentation lag, however, because even if the legacy user fields have not been removed from MediaWiki core yet, the actor table and related fields have landed.
Subquery succeeds but Join times out
I'm trying to get some estimates of the prevalence of WP policy violations, but am seeing timeouts for queries that do joins.
SELECT page.page_title, revision.rev_id, mycomment.comment_text
FROM (
    SELECT comment.comment_text, comment.comment_id
    FROM comment
    WHERE comment.comment_text RLIKE '[[:<:]]WP:(OUTING|PRIVACY|DOX)[[:>:]]'
    LIMIT 10
) AS mycomment
INNER JOIN revision ON revision.rev_comment_id = mycomment.comment_id
INNER JOIN page ON page.page_id = revision.rev_page;
The inner SELECT finishes in a <cough> reasonable amount of time, but trying to use it in a join fails every time. Any suggestions on how to materialize the subselect, perhaps?
SELECT page_title, rev_id, comment_text
FROM page
JOIN revision ON rev_page = page_id
JOIN `comment` ON rev_comment_id = comment_id
WHERE rev_comment_id IN (
    SELECT comment_id
    FROM `comment` AS c1
    WHERE c1.comment_text RLIKE '[[:<:]]WP:(OUTING|PRIVACY|DOX)[[:>:]]'
)
LIMIT 10;
Nope, that also times out - see https://quarry.wmflabs.org/query/36349
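One thing that might be worth trying (an untested sketch): keep the LIMITed scan in a derived table and add STRAIGHT_JOIN, which tells MariaDB to join in the written order rather than letting the optimizer start from the much larger page or revision tables:

```sql
SELECT STRAIGHT_JOIN page.page_title, revision.rev_id, mycomment.comment_text
FROM (
    -- Materialize the expensive RLIKE scan first
    SELECT comment_id, comment_text
    FROM comment
    WHERE comment_text RLIKE '[[:<:]]WP:(OUTING|PRIVACY|DOX)[[:>:]]'
    LIMIT 10
) AS mycomment
JOIN revision ON revision.rev_comment_id = mycomment.comment_id
JOIN page ON page.page_id = revision.rev_page;
```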
Hello, I am looking for articles with FlaggedRevs boxes; is there any way to find them via http://quarry.wmflabs.org/?
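If the wiki's replica exposes the FlaggedRevs extension tables (it should for wikis with the extension enabled), something along these lines might be a starting point; the `flaggedpages` table and its `fp_page_id` column are assumptions based on the extension's schema:

```sql
-- Pages that have at least one reviewed (flagged) revision
SELECT page_namespace, page_title
FROM flaggedpages
JOIN page ON page_id = fp_page_id
LIMIT 100;
```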
Wikisource text layer encoding
Am I correct to assume that the text layer of a Wikisource page is encoded as UTF-8? Does this apply to all Wikipedia sites that contain text?
Yes and yes as far as I know. Citing Manual:$wgDBTableOptions: "MediaWiki internally always produces data in UTF-8 encoding to be saved in the database."