Talk:Quarry

About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. This is the discussion area for Quarry itself and for help with individual queries.

Dnshitobu (talkcontribs)

I think we should add a search button to the Recent Queries page to make it easier to search among old queries.

BDavis (WMF) (talkcontribs)
Reply to "Search button"

from wikiquote how to get wikidata_id

Djm-leighpark (talkcontribs)

I've got a query on the English Wikiquote and I'd like to add an additional column for the wikidata item page/QID if possible (The associated wikidata page may or may not exist).

Another way of looking at this, for a specific case: the page information can be got from enwikiquote_p, and the associated Wikidata item page (Q68584846) exists and is properly linked back to Wikiquote.

The key question is which pages in the set I am considering do not have an associated Wikidata item, so this can be investigated.

I've had a glance round the schema tables but have not spotted how to do this.

Has anyone got any ideas? Thank you. Djm-leighpark (talk) 00:39, 20 November 2022 (UTC)

Matěj Suchánek (talkcontribs)

If you add

LEFT JOIN page_props ON pp_page = page_id AND pp_propname = 'wikibase_item'

the pp_value field will contain the Wikidata item ID (or NULL if the page isn't connected). If you want to filter for unconnected pages, add pp_value IS NULL.
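
For example, a minimal sketch of the complete query (the namespace filter and LIMIT are illustrative assumptions, not part of the suggestion above):

SELECT page_title, pp_value AS wikidata_item
FROM page
LEFT JOIN page_props
  ON pp_page = page_id
  AND pp_propname = 'wikibase_item'
WHERE page_namespace = 0
  -- AND pp_value IS NULL  -- uncomment to list only unconnected pages
LIMIT 100;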

Djm-leighpark (talkcontribs)

@Matěj Suchánek Sweet. Just what I was looking for, and I have put it in my query. I might have spent days looking for that and not found it, or given up and gone off to do something else. Thank you very much. -- Djm-leighpark (talk) 11:30, 20 November 2022 (UTC)

Reply to "from wikiquote how to get wikidata_id"

How to search for range of dates?

RoySmith (talkcontribs)

I want to search ipblocks for a range of expiry dates. I'm doing:


select ipb_address, ipb_expiry from ipblocks
where ipb_user = 0
  and ipb_expiry != 'infinity'
  and ipb_expiry > '20221001000000'
  and ipb_expiry < '20221008000000'
limit 10;


which returns 0 rows. I'm assuming it's got something to do with ipb_expiry being a varbinary?

This is https://quarry.wmcloud.org/query/68543

RoySmith (talkcontribs)

It turned out the query was working perfectly. There just weren't any blocks expiring in that date range. I widened the range and it works fine. D'oh!
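
For reference, a sketch with a wider, hypothetical range. ipb_expiry stores timestamps as fixed-width YYYYMMDDHHMMSS binary strings, so plain string comparison orders them chronologically:

select ipb_address, ipb_expiry
from ipblocks
where ipb_user = 0
  and ipb_expiry != 'infinity'
  and ipb_expiry > '20221001000000'
  and ipb_expiry < '20221101000000'  -- widened to a full month
limit 10;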

My query gets stopped for some reason

Bunyk (talkcontribs)

It was working fine before, providing results in half an hour, but the last 3 versions just stop: quarry.wmcloud.org/query/68565


Is it because it hits some timeout and needs to be optimized somehow? Or am I running it too often? Or is there some bug in Quarry?

Matěj Suchánek (talkcontribs)

Toolforge SQL Optimizer says:

id select_type table type possible_keys key key_len ref rows Extra
1.1 PRIMARY ALL distinct_key 1367979 Using temporary; Using filesort
1.2 PRIMARY pagelinks ref pl_namespace pl_namespace 261 ukwiki.archive.ar_namespace,ukwiki.archive.ar_title 21 Using where; Using index
2.3 MATERIALIZED archive index ar_name_title_timestamp ar_name_title_timestamp 275 1367979 Using where; Using index
3.4 MATERIALIZED page index PRIMARY, page_name_title, page_random, page_len, page_redirect_namespace_len page_name_title 261 3640238 Using index
3.5 MATERIALIZED revision ref rev_page_actor_timestamp, rev_page_timestamp rev_page_timestamp 4 ukwiki.page.page_id 4 Using index

Basically, the query scans too many rows and does filesort, which is slow.

Also, you don't need to query the revision table: when the page does not exist, there will be no revisions. I'd probably rewrite the query using a join like:

SELECT ... FROM pagelinks
LEFT JOIN page ON page_namespace = pl_namespace AND page_title = pl_title
WHERE EXISTS (SELECT 1 FROM archive WHERE ...)
AND page_id IS NULL
[...]

etc. and try it again.
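
For illustration, a fuller sketch along those lines, assuming the goal is red-linked mainspace pages that have deleted revisions (the namespace filter and the grouping by link count are my assumptions, not part of the skeleton above):

SELECT pl_title, COUNT(*) AS incoming_links
FROM pagelinks
LEFT JOIN page
  ON page_namespace = pl_namespace
  AND page_title = pl_title
WHERE pl_namespace = 0
  AND page_id IS NULL          -- the link is currently red
  AND EXISTS (SELECT 1 FROM archive
              WHERE ar_namespace = pl_namespace
                AND ar_title = pl_title)  -- the page has deleted revisions
GROUP BY pl_title;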

Bunyk (talkcontribs)

Oh, thank you for your explanation.


So yes, at first I also thought that I don't need to look into the revision table, but when I don't check that there are no revisions for the page, it also includes pages that have deleted revisions (rows in archive) but are not currently deleted.


Maybe I'll try to detect deleted pages by checking whether they have an entry in the Manual:Logging table. But what is the fastest way to check whether a link is currently red or not?


What I'm trying to do here is create a list of pages that were deleted but are still very much wanted, to consider undeleting them if they were notable, or just removing links to them if they were not.

Matěj Suchánek (talkcontribs)

"when I do not check that there are no revisions for the page it also includes pages that have deleted revisions (rows in archive), but are not currently deleted." Page and revision existence are equivalent: there is no existing page with no revisions, and a revision cannot exist without an existing page. What I meant is that you don't have to include revision in the query; checking just page is enough for you.

"Maybe I'll try to detect deleted pages by checking whether they have an entry in the Manual:Logging table." That's possible, but there is no index on (log_type, log_action, log_namespace, log_title), so it's potentially very slow, too.

"But what is the fastest way to check whether a link is currently red or not?" It's the query I provided in my reply, or an equivalent query with NOT EXISTS.
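
For illustration, a sketch of that NOT EXISTS variant (the mainspace filter is assumed):

SELECT pl_namespace, pl_title
FROM pagelinks
WHERE pl_namespace = 0
  AND NOT EXISTS (SELECT 1 FROM page
                  WHERE page_namespace = pl_namespace
                    AND page_title = pl_title);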

Bunyk (talkcontribs)

And because you mentioned that filesort is slow, I tried removing ORDER BY. And you know what? It executed in 419.00 seconds, returning 333 rows, which I can now sort in the browser. So thank you for the hint.


Crazy, though: how can sorting three hundred rows make it that slow?

Bunyk (talkcontribs)

And after applying the advice to check only that the page is absent, rather than its revisions, the same query executed in 68.86 seconds. A 6x speedup, thank you again.

Reply to "My query gets stopped for some reason"

How to overcome stale query results?

Djm-leighpark (talkcontribs)

The #SheSaid main page Wikiquote:Wikiquote:SheSaid#New_articles currently has a Query link to track the hashtag, which unfortunately gives a stale query result from 18 December 2022, while up-to-date information is really required. Pressing Fork & Submit will give current query results, but is there a way of getting the query re-executed at the current time by adding parameters to the URL? Alternatively, could the result screen indicate to novice Quarry users who have been pointed at a stale result set that Fork & Submit will give a refreshed result? Thank you. -- ~~~~

Djm-leighpark (talkcontribs)

I have since noticed that replacing "/query/" with "/fork/" in the URL goes directly to a screen where an updated query may be executed to get a fresh snapshot (quarry.wmcloud.org/fork/67962, for example). Obviously it would not be advisable to do that for heavy queries, and it depends on whether the underlying data remains fairly static; in the case here I would argue it does not. Any alternative tips remain welcome. Thank you. -- Djm-leighpark (talk) 14:45, 31 October 2022 (UTC)

Reply to "How to overcome stale query results?"

Query all companies

2405:201:D021:7F:6113:A239:C147:6538 (talkcontribs)

Is there an easy way to query all company (organisation) pages without having to go through various categories and their subcategories to filter only companies?

Reply to "Query all companies"

Retrieving a list of pages and the hidden/tracking categories they are in

Invisigoth67 (talkcontribs)

Hello, I'd like to be able to retrieve a list of pages created by a specific user and the hidden/tracking categories these pages are in. Using the pagelinks or categorylinks tables didn't work for me. If it's a matter of query run time, both input lists (pages by a specific user and a selection of hidden categories) could be provided manually.

BDavis (WMF) (talkcontribs)

Can you be more specific on how categorylinks failed to answer your question? A query to find the hidden categories of a main namespace page could look like:

SELECT p.page_title, cl_to AS hidden_category
FROM page AS p
INNER JOIN categorylinks ON p.page_id = cl_from
INNER JOIN page AS catpage ON catpage.page_namespace = 14
  AND catpage.page_title = cl_to
INNER JOIN page_props ON pp_page = catpage.page_id AND pp_propname = 'hiddencat'
WHERE p.page_namespace = 0
  AND p.page_title = 'NOFX';
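
A possible extension toward the original ask, i.e. all mainspace pages created by a given user together with their hidden categories. The user name is a placeholder, and the revision_userindex join with rev_parent_id = 0 to find page-creating edits is my assumption, not part of the reply above:

SELECT p.page_title, cl_to AS hidden_category
FROM page AS p
INNER JOIN revision_userindex AS r ON r.rev_page = p.page_id
  AND r.rev_parent_id = 0  -- the creating revision has no parent
INNER JOIN actor AS a ON a.actor_id = r.rev_actor
INNER JOIN categorylinks ON p.page_id = cl_from
INNER JOIN page AS catpage ON catpage.page_namespace = 14
  AND catpage.page_title = cl_to
INNER JOIN page_props ON pp_page = catpage.page_id AND pp_propname = 'hiddencat'
WHERE p.page_namespace = 0
  AND a.actor_name = 'ExampleUser';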

Invisigoth67 (talkcontribs)

Thank you very much, this is exactly what I was looking for! It seems I was close but used a wrong column name at one point.

Reply to "Retrieving a list of pages and the hidden/tracking categories they are in"

Retrieving a list of pages and the users who created them

Mike Christie (talkcontribs)

I'd like to be able to retrieve a list of users who created pages in the Talk namespace that end in "/GAn". For GA5, for example, I wrote this:

select p.page_title, p.page_id, r.rev_timestamp, u.user_name
from page p
inner join revision r
  on p.page_id = r.rev_page
inner join actor a
  on r.rev_actor = a.actor_id
inner join user u
  on a.actor_user = u.user_id
where p.page_title LIKE '%a/GA5'
  and r.rev_parent_id = 0
  and p.page_namespace = 1

but it executes extremely slowly. Eliminating the joins to actor and user makes it run in a few seconds, so why is this slow? I would have thought it would pick up the indexes on actor and user on those fields.

BDavis (WMF) (talkcontribs)

You can use the tool at https://sql-optimizer.toolforge.org/ to get an explain plan for any particular query. When I put this query into that system, it does look like the database engine is trying to use indexes, but I also see that joining with the user table makes the database consider 43 million indexed rows as part of its analysis.

One optimization that might help is using the special "revision_userindex" table rather than directly using the revision table.
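
A sketch of the same query with only that substitution (revision_userindex is the Wiki Replicas view of revision that keeps the user/actor indexes; everything else is unchanged from the query above):

select p.page_title, p.page_id, r.rev_timestamp, u.user_name
from page p
inner join revision_userindex r
  on p.page_id = r.rev_page
inner join actor a
  on r.rev_actor = a.actor_id
inner join user u
  on a.actor_user = u.user_id
where p.page_title LIKE '%a/GA5'
  and r.rev_parent_id = 0
  and p.page_namespace = 1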

The p.page_title LIKE '%a/GA5' clause will always result in a linear scan of some collection of rows; MySQL/MariaDB indexes simply don't work with leading-wildcard searches. There are techniques that can work around this, but they are not in use in the Wiki Replicas databases, as they are not needed for MediaWiki's normal operations.

Mike Christie (talkcontribs)

Thanks; that's very helpful, and I didn't know about the explain plan tool. I'll have a look at the revision_userindex table.

Reply to "Retrieving a list of pages and the users who created them"

Fetching from Flow database

Summary by Msz2001

There's no flowdb available in Quarry

Msz2001 (talkcontribs)

I am interested in querying the flowdb database (Flow / Structured Discussions). Is it possible to do with Quarry? Currently I'm getting "access denied" after submitting something like this: SELECT * FROM flowdb.flow_revision.

BDavis (WMF) (talkcontribs)

Flowdb lives in something called the x1 cluster for the Wikimedia production wikis. This cluster is not included in the Wiki Replicas service and thus is not exposed to Quarry.

How to fetch revision text content

Summary by EpicPupper

Quarry does not have access to revision text. There are XML dumps which may have the information, and generally using the Action API to fetch revision text is a reasonable thing to do as well.

Ywats0ns (talkcontribs)

Hello, I'd like to do some analysis of all recent revisions that contain a certain keyword / match a regex. I have been able to fetch the recent revisions up to 30 days back with the Action API, but I guess this requires a lot of resources, as I have to call the API for every revision. Is there a way to access this through Quarry or through a dump of the revisions?

Thanks a lot

BDavis (WMF) (talkcontribs)

Quarry does not have access to revision text, as this part of the production database is not included in the wikitech:Wiki Replicas service. There are XML dumps which may have the information you are interested in; see meta:Data dumps for more information about that service. Generally, using the Action API to fetch revision text is a reasonable thing to do as well. API:Etiquette has some advice on making your Action API requests friendlier to the servers, such as setting a custom user-agent that gives contact information and using the maxlag parameter to avoid piling on when the database is under greater than normal replication pressure.