On enwiki, I noticed that when I query the Page table, it gives me only local File: pages but not Commons File: pages used on enwiki. In what table do I find info on those? Sometimes they are in WikiProjects by way of their talk pages, so I'd like to connect them to see which of these files are in a WikiProject.
Talk:Quarry
Non-local File pages are not stored in the wiki's local database. Instead they are exposed via remote access to the foreign wiki's database either directly (true for most Wikimedia hosted wikis -> Commons) or via API (InstantCommons). Pages like Manual:$wgForeignFileRepos give some clues about how this works on the backend.
If I'm understanding your use case, I think it is similar to the problems described by T268240: Provide a mechanism for detecting duplicate files in commons and a local wiki and/or T267992: Provide mechanism to detect name clashed media between Commons and a Local project, without needing to join tables across wiki-db's. The TL;DR there is that it is not possible to perform an SQL join across the backing database partitions for enwiki and commonswiki. This type of computation needs to currently be done in custom software that makes separate requests to both sides of the partition instead.
Thank you for your quick response. I have created a workaround that assumes that the matching File: page is there, as oddly enough, their associated File talk: pages are tracked in the Page table.
File talk: pages would be local. Nice find! :)
I'm trying to display the list of articles with the most links on English Wikipedia: https://quarry.wmcloud.org/query/80666. I've run this query on other, smaller wikis, and results look good to me. However, on enwiki_p, it eventually times out and stops. How can I optimize it? Thank you!
You are asking the database to create in memory, a new calculated column, based on every single link, for each article (7 million). That won’t / can’t scale. The solution is to download a local copy of the database, and in batches of x articles, run and store the calculation, and then run your top x query against those calculated results.
Even 'DESCRIBE pagelinks' throws 'Error: View 'commonswiki_p.pagelinks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them'. Other wiki's DBs are fine.
What a miracle, now it works again...
Less of a miracle than things working as intended. phab:T352010#9551548 was the bug report in a location that folks would be looking for it, 4 minutes later phab:T352010#9551559 was the realization of what was needed to correct the issue, and finally phab:T352010#9551564 6 minutes after the initial report was the DBA asking for verification that the fix worked.
See phab:T352010. Maintenance probably.
Is there a way to show a Quarry result of wiki article titles and have those articles linkable from the result?
Hi!
I have a query that counts all uploaded files and total size (https://quarry.wmcloud.org/query/65557). How do I change the code, so that the query only counts the files until a specific upload time and date (like "count uploaded files until upload date 2023-12-31").
Thank you! ~~~~
https://quarry.wmcloud.org/query/80983
Technically, this is: those files for which the 'last' uploaded version was uploaded before that time.
Thank you very much, this is what I looked for :) --~~~~
Hi Supports, How to find the missing or empty Infobox parameters, for e-g, Infobox Indian constituency this template have multiple parameters, find this "| constituency_no" and "| constituency_no = " is blank/empty. - IJohnKennady (talk) 17:22, 20 January 2024 (UTC)
Infobox parameters are not indexed by the database.
Is there any wiki tools available to find??? -
CirrusSearch, a.k.a. the native search functionality, comes to my mind. In particular, its hastemplate:
and insource:
features.
I want to get articles with images of a the English Wikipedia category. Please help.
https://en.wikipedia.org/wiki/Category:Critically_endangered_plants
i tried by the code
select p.page_title from page p,page p1 where p.page_id in (select cl_from from categorylinks where cl_to = "Critically endangered plants") and p.page_namespace=0 and p1.page_namespace=1 and p1.page_title=p.page_title;
Would joining the page_props
table satisfy your needs?
SELECT ..., pp_value [...] LEFT JOIN page_props ON pp_page = p.page_id AND pp_propname = 'page_image_free'
Has anyone an idea what I can do to resolve the error "Unknown column 'el_to' in 'field list'" for the following query:
use commonswiki_p; SELECT concat("M",page_id) AS MID, "P6305", concat('"',REPLACE(el_to,'https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom&TicketNumber=',''),'"') AS OTRS, count(el_to) as num FROM page JOIN categorylinks c1 ON page_id=c1.cl_from AND c1.cl_to='Files_with_PermissionTicket_template_but_without_P6305_SDC_statement' JOIN templatelinks ON page_id=tl_from AND tl_target_id = 242 JOIN externallinks ON page_id=el_from AND el_to LIKE 'https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom&TicketNumber=%' group by MID having num=1
el_to
has been split into el_to_domain_index
and el_to_path
. Note that el_to_domain_index
is an inverted index. Your query should now look like WHERE el_to_domain_index = 'https://org.wikimedia.ticket.' AND el_to_path LIKE '/otrs/index.pl?Action=AgentTicketZoom&TicketNumber=%'
.
Thank you! That worked.
How I can sort by submited timestamp? [https://quarry.wmcloud.org/query/76871]
I think you should add AND rev_parent_id = 0
. Then, you will have 1:1 relationship and won't need GROUP BY
.
Thanks!
I'm trying to run this query, I also tried this and this to get results, but both executed with no data! According to the external links search there is some data to be replaced.
el_to_domain_index
is an inverted index to allow search within parent domain. el_to_domain_index = 'http://org.banglapedia.en.'
will select all links to http://en.banglapedia.org.
Matěj Suchánek thanks, now this one working, but AND page_namespace=0
didn’t filter only main namespace pages!
It seems that WHERE ... OR ... AND page_namespace = 0
is interpreted as WHERE ... OR (... AND page_namespace = 0)
. You need to add parentheses like WHERE (... OR ...) AND page_namespace = 0
.
Thanks, it’s working well.