Talk:Quarry

About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. Discussion area for Quarry itself and for help with individual queries.

Where are non-local file pages stored?

4
StefenTower (talkcontribs)

On enwiki, I noticed that when I query the Page table, it gives me only local File: pages but not Commons File: pages used on enwiki. In what table do I find info on those? Sometimes they are in WikiProjects by way of their talk pages, so I'd like to connect them to see which of these files are in a WikiProject.

BDavis (WMF) (talkcontribs)

Non-local File pages are not stored in the wiki's local database. Instead, they are exposed via remote access to the foreign wiki's database, either directly (true for most Wikimedia-hosted wikis -> Commons) or via API (InstantCommons). Pages like Manual:$wgForeignFileRepos give some clues about how this works on the backend.

If I'm understanding your use case, I think it is similar to the problems described by T268240: Provide a mechanism for detecting duplicate files in commons and a local wiki and/or T267992: Provide mechanism to detect name clashed media between Commons and a Local project, without needing to join tables across wiki-db's. The TL;DR there is that it is not possible to perform an SQL join across the backing database partitions for enwiki and commonswiki. This type of computation currently needs to be done in custom software that makes separate requests to both sides of the partition instead.
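As a rough sketch of that two-step approach (assuming the older imagelinks schema that still has an il_to column, and illustrative file titles), each partition is queried separately and the results are intersected in your own code:

```sql
-- Step 1, run against enwiki_p: file titles used locally that have no
-- local File: page (imagelinks records usages even when the file lives
-- on Commons).
SELECT DISTINCT il_to
FROM imagelinks
LEFT JOIN page ON page_namespace = 6 AND page_title = il_to
WHERE page_id IS NULL;

-- Step 2, run against commonswiki_p: check which of those titles exist
-- on Commons. Substitute the titles collected from step 1 here; the
-- join itself cannot cross the two databases.
SELECT page_title
FROM page
WHERE page_namespace = 6
  AND page_title IN ('Example.jpg', 'Example2.png');
```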

StefenTower (talkcontribs)

Thank you for your quick response. I have created a workaround that assumes that the matching File: page is there, as oddly enough, their associated File talk: pages are tracked in the Page table.
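A minimal sketch of that kind of workaround, assuming a hypothetical WikiProject tracking category: since the File talk: pages are local, they can be joined against categorylinks even when the File: page itself lives only on Commons.

```sql
-- File talk: pages (namespace 7) tagged with a hypothetical WikiProject
-- category; the matching File: page may exist only on Commons.
SELECT p.page_title
FROM page p
JOIN categorylinks cl ON cl.cl_from = p.page_id
WHERE p.page_namespace = 7
  AND cl.cl_to = 'WikiProject_Example_articles';
```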

BDavis (WMF) (talkcontribs)

File talk: pages would be local. Nice find! :)

Reply to "Where are non-local file pages stored?"

pagelinks query is too slow

2
71.183.74.70 (talkcontribs)

I'm trying to display the list of articles with the most links on English Wikipedia: https://quarry.wmcloud.org/query/80666. I've run this query on other, smaller wikis, and results look good to me. However, on enwiki_p, it eventually times out and stops. How can I optimize it? Thank you!

TheDJ (talkcontribs)

You are asking the database to create, in memory, a new calculated column based on every single link, for each of the 7 million articles. That won't / can't scale. The solution is to download a local copy of the database, run and store the calculation in batches of x articles, and then run your top-x query against those stored results.
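The batching approach described above can be sketched as follows (the results table, the ID window, and the per-page counting are all illustrative; this assumes a local copy where you can create tables):

```sql
-- Hypothetical results table for the precomputed counts.
CREATE TABLE IF NOT EXISTS link_counts (page_id INT PRIMARY KEY, links INT);

-- One batch: count links for a slice of page IDs and store the result.
-- Advance the BETWEEN window on each run until all pages are covered.
INSERT INTO link_counts
SELECT pl_from, COUNT(*)
FROM pagelinks
WHERE pl_from BETWEEN 1 AND 100000
GROUP BY pl_from;

-- After all batches, rank the stored results cheaply.
SELECT page_id, links
FROM link_counts
ORDER BY links DESC
LIMIT 100;
```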

Reply to "pagelinks query is too slow"

pagelinks doesn't work on commonswiki_p

4
Summary by BDavis (WMF)

Error was caused by database table maintenance.

Achim55 (talkcontribs)

Even 'DESCRIBE pagelinks' throws 'Error: View 'commonswiki_p.pagelinks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them'. Other wikis' DBs are fine.

Achim55 (talkcontribs)

What a miracle, now it works again...

BDavis (WMF) (talkcontribs)

Less of a miracle than things working as intended. phab:T352010#9551548 was the bug report, filed in a location where folks would be looking for it; 4 minutes later, phab:T352010#9551559 was the realization of what was needed to correct the issue; and finally, 6 minutes after the initial report, phab:T352010#9551564 was the DBA asking for verification that the fix worked.

Matěj Suchánek (talkcontribs)

Format article titles as links to such

2
Summary by BDavis (WMF)

See phab:T74874 for feature request

StefenTower (talkcontribs)

Is there a way to show a Quarry result of wiki article titles and have those titles link to the articles in the result?

BDavis (WMF) (talkcontribs)

Count files until a specific upload time

3
PantheraLeo1359531 (talkcontribs)

Hi!


I have a query that counts all uploaded files and their total size (https://quarry.wmcloud.org/query/65557). How do I change the code so that the query only counts files uploaded up to a specific date and time (like "count uploaded files up to 2023-12-31")?


Thank you!
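One common way to write such a restriction (a sketch; this assumes the query counts rows in the image table, whose img_timestamp column stores times as YYYYMMDDHHMMSS strings):

```sql
-- Count files and total size for uploads up to the end of 2023-12-31.
SELECT COUNT(*) AS files, SUM(img_size) AS total_bytes
FROM image
WHERE img_timestamp <= '20231231235959';
```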

TheDJ (talkcontribs)
PantheraLeo1359531 (talkcontribs)

Thank you very much, this is what I was looking for :)

Reply to "Count files until a specific upload time"

Need Help, for Missing Infobox items

4
IJohnKennady (talkcontribs)

Hi supporters, how do I find missing or empty infobox parameters? For example, the template Infobox Indian constituency has multiple parameters; I want to find where "| constituency_no" is missing or where "| constituency_no = " is blank/empty. - IJohnKennady (talk) 17:22, 20 January 2024 (UTC)

Matěj Suchánek (talkcontribs)

Infobox parameters are not indexed by the database.

IJohnKennady (talkcontribs)

Are there any wiki tools available to find this?

Matěj Suchánek (talkcontribs)

CirrusSearch, a.k.a. the native search functionality, comes to mind. In particular, its hastemplate: and insource: features.
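For example, a search along these lines could find pages using the template where the parameter is empty (the regex is illustrative, and insource:// regex queries are expensive, so keep them narrow):

```
hastemplate:"Infobox Indian constituency" insource:/constituency_no *= *[|}]/
```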

Reply to "Need Help, for Missing Infobox items"

how to get articles with images of a category

2
Info-farmer (talkcontribs)

I want to get articles with images from an English Wikipedia category. Please help.

https://en.wikipedia.org/wiki/Category:Critically_endangered_plants

I tried this code:

SELECT p.page_title
FROM page p, page p1
WHERE p.page_id IN (SELECT cl_from FROM categorylinks WHERE cl_to = "Critically endangered plants")
  AND p.page_namespace = 0
  AND p1.page_namespace = 1
  AND p1.page_title = p.page_title;
Matěj Suchánek (talkcontribs)

Would joining the page_props table satisfy your needs?

SELECT ..., pp_value
[...]
LEFT JOIN page_props ON pp_page = p.page_id AND pp_propname = 'page_image_free'
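Filled out against the category from the question, one possible complete version of that join (a sketch; page_image_free holds the name of a page's lead image, and is NULL when there is none):

```sql
SELECT p.page_title, pp.pp_value AS lead_image
FROM page p
JOIN categorylinks ON cl_from = p.page_id
  AND cl_to = 'Critically_endangered_plants'
LEFT JOIN page_props pp ON pp.pp_page = p.page_id
  AND pp.pp_propname = 'page_image_free'
WHERE p.page_namespace = 0;
```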
Reply to "how to get articles with images of a category"

Unknown column 'el_to' in 'field list'

3
Ameisenigel (talkcontribs)

Does anyone have an idea what I can do to resolve the error "Unknown column 'el_to' in 'field list'" for the following query:

use commonswiki_p;
SELECT concat("M", page_id) AS MID, "P6305",
  concat('"', REPLACE(el_to, 'https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom&TicketNumber=', ''), '"') AS OTRS,
  count(el_to) AS num
FROM page
JOIN categorylinks c1 ON page_id = c1.cl_from AND c1.cl_to = 'Files_with_PermissionTicket_template_but_without_P6305_SDC_statement'
JOIN templatelinks ON page_id = tl_from AND tl_target_id = 242
JOIN externallinks ON page_id = el_from AND el_to LIKE 'https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom&TicketNumber=%'
GROUP BY MID
HAVING num = 1
Matěj Suchánek (talkcontribs)

el_to has been split into el_to_domain_index and el_to_path. Note that el_to_domain_index is an inverted index. Your query should now look like WHERE el_to_domain_index = 'https://org.wikimedia.ticket.' AND el_to_path LIKE '/otrs/index.pl?Action=AgentTicketZoom&TicketNumber=%'.
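Applied to the query above, the corrected join might look like this (a sketch; the rest of the query is unchanged):

```sql
-- el_to is gone; the URL is now split across two columns, with the
-- domain part stored in reversed form.
JOIN externallinks ON page_id = el_from
  AND el_to_domain_index = 'https://org.wikimedia.ticket.'
  AND el_to_path LIKE '/otrs/index.pl?Action=AgentTicketZoom&TicketNumber=%'
```

The REPLACE(el_to, …) and count(el_to) in the select list would likewise need to operate on el_to_path.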

Ameisenigel (talkcontribs)

Thank you! That worked.

Miraburu (talkcontribs)
Matěj Suchánek (talkcontribs)

I think you should add AND rev_parent_id = 0. Then you will have a 1:1 relationship and won't need GROUP BY.

Miraburu (talkcontribs)

Thanks!

ZI Jony (talkcontribs)

I'm trying to run this query; I also tried this and this to get results, but both executed with no data! According to the external links search there is some data to be replaced.

Matěj Suchánek (talkcontribs)

el_to_domain_index is an inverted index that allows searching within a parent domain. el_to_domain_index = 'http://org.banglapedia.en.' will select all links to http://en.banglapedia.org.

ZI Jony (talkcontribs)
Matěj Suchánek (talkcontribs)

It seems that WHERE ... OR ... AND page_namespace = 0 is interpreted as WHERE ... OR (... AND page_namespace = 0). You need to add parentheses like WHERE (... OR ...) AND page_namespace = 0.
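A minimal illustration of that precedence fix (the conditions are placeholders):

```sql
-- AND binds tighter than OR, so the namespace filter only applies to
-- the second branch here:
WHERE cond_a OR cond_b AND page_namespace = 0
-- Parenthesizing the OR applies the namespace filter to both branches:
WHERE (cond_a OR cond_b) AND page_namespace = 0
```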

ZI Jony (talkcontribs)

Thanks, it’s working well.