Talk:Quarry

About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. A discussion area for Quarry itself and for help with individual queries.

SQL query in Quarry for getting the current statements on a Wikidata item given its QID

5
Rdrg109 (talkcontribs)

I am learning how to use Quarry to query Wikidata data. I have written these two queries:

  1. Get the number of statements on a Wikidata item given its QID
  2. Get all revisions on a Wikidata item given its QID

I now want to write a SQL query that gets the current statements on a Wikidata item given its QID (the returned rows should be the same as those returned by this SPARQL query). This is my initial idea on how to do it: I know that the table revision contains all the revisions of a given page and the table comment contains information on the introduced changes, so one way to do what I want is to parse the column comment_text (see an example in the results of query 2 above), which describes the changes, and determine the latest changes. I think this method is complex to implement in SQL, since I would need to determine which changes were not later modified by any other changes. I wonder if there's a simpler approach, or a table that already contains the current statements on a Wikidata item.

I know that the current statements on a Wikidata item can be easily obtained in SPARQL (this query already does that), but since I'm learning how to query data in Quarry (i.e. using SQL), reading a SQL query that does that would help me to understand more about how data is stored and should be queried in Quarry.
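
As a starting point for the approach described above, a minimal sketch of listing an item's revisions together with their edit summaries (run against wikidatawiki_p; 'Q42' is only an example QID, and the sketch assumes the replicas expose the comment table joined through rev_comment_id, as in query 2 above):

  SELECT rev_id,
         rev_timestamp,
         comment_text
  FROM page
  JOIN revision ON rev_page = page_id
  JOIN comment  ON comment_id = rev_comment_id
  WHERE page_namespace = 0          -- Wikidata items live in the main namespace
    AND page_title = 'Q42'          -- example QID
  ORDER BY rev_timestamp DESC
  LIMIT 20;

Reconstructing the actual current statements from those comments is the hard part, as the replies below explain.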

Matěj Suchánek (talkcontribs)

"I wonder if there's a simpler approach or a table that already contains this information." SQL queries are not suitable for the Wikidata data model; that's why the Wikidata Query Service exists. Some SQL queries are possible, but they are rather management-oriented, not data-oriented.

Rdrg109 (talkcontribs)

For the record, I asked a similar question in Libera Chat's room #wikimedia-cloud and some user replied the following:

18:50 <rdrg109> For the record, I have created a topic with that question in Talk:Quarry here: https://www.mediawiki.org/wiki/Topic:Y3u6dz3ci6eqlura
19:08 <+wm-bb> <<hidden user>> rdrg109: it’s basically not doable. it’s best to use SPARQL instead
19:30 <rdrg109> <<hidden user>>: Ok, thanks for the help!
BDavis (WMF) (talkcontribs)
Dipsacus fullonum (talkcontribs)

It is impossible to reconstruct all claims from comment_text. There are several Wikibase API commands that create or edit entities without specifying in the comment text which claims are created, removed or modified. See for example d:Q125692383, created today with 5 claims at once without that being visible in the comment.

Reply to "SQL query in Quarry for getting the current statements on a Wikidata item given its QID"

Internal server error when trying to stop a query

1
Tom.Reding (talkcontribs)

I tried using 2 different browsers, Chrome & Firefox, but got the same result. Queries 79955 & 82289 have been in queued status for ~18 hours.

task T363644

Reply to "Internal server error when trying to stop a query"

Get values for a parameter of a template

4
Ameisenigel (talkcontribs)

Is there any way to query the values of a specific parameter of a template? What I would like to have is something like a list of the values that are used for parameter "X" in template "Y".

Matěj Suchánek (talkcontribs)
BDavis (WMF) (talkcontribs)

MediaWiki does not natively maintain any data that tracks this kind of semantic information. There once was a community-maintained tool called Templatetiger that kept a database of this type of information. Templatetiger's database was built by reading the monthly database dumps and parsing the wikitext content they contain. Technically the tool still exists, but the data it contains has not been updated in many years.

Ameisenigel (talkcontribs)

Thanks to both of you for your answers!

Get the latest page assessment

2
2601:40F:4002:4DD0:E567:3C2B:9BAC:7198 (talkcontribs)

What is the best way to get the latest assessment value per page per project? I only care about pages "assigned" to projects.

I tried to fetch the whole "page_assessments_projects" table but waited for hours.

thank you

2601:40F:4002:4DD0:E567:3C2B:9BAC:7198 (talkcontribs)

* I tried to fetch the whole "page_assessments" table....
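
For reference, a minimal sketch that pulls current assessments for a single project instead of the whole table, assuming the PageAssessments tables on the enwiki replica (the project title is only a placeholder); as far as I know these tables hold only the current assessment per page per project:

  SELECT pap.pap_project_title,
         p.page_title,
         pa.pa_class,
         pa.pa_importance
  FROM page_assessments AS pa
  JOIN page_assessments_projects AS pap
    ON pap.pap_project_id = pa.pa_project_id
  JOIN page AS p
    ON p.page_id = pa.pa_page_id
  WHERE pap.pap_project_title = 'Military history'   -- placeholder project name
  LIMIT 100;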

Reply to "Get the latest page assessment"

Where are non-local file pages stored?

4
StefenTower (talkcontribs)

On enwiki, I noticed that when I query the Page table, it gives me only local File: pages but not Commons File: pages used on enwiki. In what table do I find info on those? Sometimes they are in WikiProjects by way of their talk pages, so I'd like to connect them to see which of these files are in a WikiProject.

BDavis (WMF) (talkcontribs)

Non-local File pages are not stored in the wiki's local database. Instead they are exposed via remote access to the foreign wiki's database either directly (true for most Wikimedia hosted wikis -> Commons) or via API (InstantCommons). Pages like Manual:$wgForeignFileRepos give some clues about how this works on the backend.

If I'm understanding your use case, I think it is similar to the problems described by T268240: Provide a mechanism for detecting duplicate files in commons and a local wiki and/or T267992: Provide mechanism to detect name clashed media between Commons and a Local project, without needing to join tables across wiki-db's. The TL;DR there is that it is not possible to perform an SQL join across the backing database partitions for enwiki and commonswiki. This type of computation currently needs to be done in custom software that makes separate requests to both sides of the partition instead.

StefenTower (talkcontribs)

Thank you for your quick response. I have created a workaround that assumes the matching File: page is there, since, oddly enough, the associated File talk: pages are tracked in the page table.
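
A rough sketch of that workaround, using only the local page table (namespace 6 = File, 7 = File talk): it lists File talk: pages that have no matching local File: page, i.e. files presumably hosted on Commons:

  SELECT pt.page_title
  FROM page AS pt                        -- File talk: pages
  LEFT JOIN page AS pf
         ON pf.page_namespace = 6        -- File:
        AND pf.page_title = pt.page_title
  WHERE pt.page_namespace = 7            -- File talk:
    AND pf.page_id IS NULL               -- no local File: page exists
  LIMIT 100;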

BDavis (WMF) (talkcontribs)

File talk: pages would be local. Nice find! :)

Reply to "Where are non-local file pages stored?"

pagelinks query is too slow

2
71.183.74.70 (talkcontribs)

I'm trying to display the list of articles with the most links on English Wikipedia: https://quarry.wmcloud.org/query/80666. I've run this query on other, smaller wikis, and results look good to me. However, on enwiki_p, it eventually times out and stops. How can I optimize it? Thank you!

TheDJ (talkcontribs)

You are asking the database to create, in memory, a new calculated column based on every single link, for each of the 7 million articles. That won't / can't scale. The solution is to download a local copy of the database, run and store the calculation in batches of x articles, and then run your top-x query against those stored results.
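
A hedged sketch of that batching idea, assuming the normalized pagelinks schema that joins through the linktarget table (older schemas use pl_namespace/pl_title instead); each run covers one page_id slice, and the per-slice results would be stored and combined locally:

  SELECT p.page_id,
         p.page_title,
         COUNT(*) AS inbound_links
  FROM page AS p
  JOIN linktarget AS lt
    ON lt.lt_namespace = p.page_namespace
   AND lt.lt_title = p.page_title
  JOIN pagelinks AS pl
    ON pl.pl_target_id = lt.lt_id
  WHERE p.page_namespace = 0
    AND p.page_id BETWEEN 1 AND 100000   -- slide this window for each batch
  GROUP BY p.page_id, p.page_title;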

Reply to "pagelinks query is too slow"

pagelinks doesn't work on commonswiki_p

4
Summary by BDavis (WMF)

Error was caused by database table maintenance.

Achim55 (talkcontribs)

Even 'DESCRIBE pagelinks' throws 'Error: View 'commonswiki_p.pagelinks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them'. Other wikis' DBs are fine.

Achim55 (talkcontribs)

What a miracle, now it works again...

BDavis (WMF) (talkcontribs)

Less of a miracle than things working as intended. phab:T352010#9551548 was the bug report, filed in a location where folks would be looking for it; 4 minutes later, phab:T352010#9551559 was the realization of what was needed to correct the issue; and finally, 6 minutes after the initial report, phab:T352010#9551564 was the DBA asking for verification that the fix worked.

Matěj Suchánek (talkcontribs)

Format article titles as links to such

2
Summary by BDavis (WMF)

See phab:T74874 for feature request

StefenTower (talkcontribs)

Is there a way to show a Quarry result of wiki article titles and have those titles be clickable links in the result?
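
Pending native support (see the linked feature request), a possible workaround is to build a full URL in the query itself; whether Quarry renders it as a clickable link or just as copyable text depends on the interface. A sketch for enwiki (adjust the domain for other wikis):

  SELECT page_title,
         CONCAT('https://en.wikipedia.org/wiki/', page_title) AS url
  FROM page
  WHERE page_namespace = 0
  LIMIT 10;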

BDavis (WMF) (talkcontribs)

Count files until a specific upload time

3
PantheraLeo1359531 (talkcontribs)

Hi!


I have a query that counts all uploaded files and their total size (https://quarry.wmcloud.org/query/65557). How do I change the code so that the query only counts files uploaded up to a specific time and date (like "count uploaded files until upload date 2023-12-31")?


Thank you! ~~~~
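
A minimal sketch of one way to add such a cutoff, assuming the existing query reads from the image table and that img_timestamp uses MediaWiki's YYYYMMDDHHMMSS timestamp format:

  SELECT COUNT(*)      AS files,
         SUM(img_size) AS total_bytes
  FROM image
  WHERE img_timestamp <= '20231231235959';   -- files uploaded up to 2023-12-31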

TheDJ (talkcontribs)
PantheraLeo1359531 (talkcontribs)

Thank you very much, this is what I was looking for :) --~~~~

Reply to "Count files until a specific upload time"

Need Help, for Missing Infobox items

4
IJohnKennady (talkcontribs)

Hi support team, how do I find missing or empty Infobox parameters? For example, the Infobox Indian constituency template has multiple parameters; I want to find where "| constituency_no" is missing and where "| constituency_no = " is blank/empty. - IJohnKennady (talk) 17:22, 20 January 2024 (UTC)

Matěj Suchánek (talkcontribs)

Infobox parameters are not indexed by the database.

IJohnKennady (talkcontribs)

Are there any wiki tools available to find them? -

Matěj Suchánek (talkcontribs)

CirrusSearch, a.k.a. the native search functionality, comes to mind. In particular, its hastemplate: and insource: features.
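
For example, a search along these lines (a rough starting point; the template and parameter names are taken from the question above) lists pages that transclude the infobox but never mention the parameter at all; catching present-but-empty values would additionally need an insource:/regex/ pattern:

  hastemplate:"Infobox Indian constituency" -insource:"constituency_no"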

Reply to "Need Help, for Missing Infobox items"