Talk:Quarry

About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. Discussion area for Quarry itself and for help with individual queries.

List of categories which belong to only hidden categories

1
Tjmj (talkcontribs)

I'm trying to get a list of categories which belong to only hidden categories. In other words, when I go to the category page, all the categories listed at the bottom are hidden categories, and none are normal categories. I managed to work out the logic, but I think the query is not efficient and cannot finish before being stopped. I'm wondering if anyone could help with optimizing the query? Thank you!


(zhwiki_p)

SELECT p.page_title
FROM page p
INNER JOIN categorylinks cl
    ON p.page_id = cl.cl_from
LEFT JOIN (
    -- 11k list of hidden categories
    SELECT p2.page_title, pp.pp_propname
    FROM page_props pp
    INNER JOIN page p2 ON (p2.page_id = pp.pp_page)
    WHERE pp.pp_propname = 'hiddencat'
) hc
    ON cl.cl_to = hc.page_title
WHERE p.page_namespace = 14 -- 400k
GROUP BY p.page_title
HAVING COUNT(*) = SUM(CASE WHEN hc.pp_propname IS NULL THEN 0 ELSE 1 END)
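One possible rewrite (a sketch, not a tested answer for zhwiki_p) is to replace the LEFT JOIN plus GROUP BY/HAVING with a double NOT EXISTS anti-join: keep a category only if it has at least one parent category and no non-hidden parent. The sketch below demonstrates the logic on an in-memory SQLite mock of the relevant MediaWiki columns; the table and column names match the real schema, but the data is made up, and whether this is actually faster on the replica would need testing.

```python
# Demonstrate the anti-join rewrite on a tiny SQLite mock of the schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT);
-- Cat_A is hidden; Cat_B is not.
INSERT INTO page VALUES (1, 14, 'Cat_A'), (2, 14, 'Cat_B'),
                        (3, 14, 'Only_hidden_parents'), (4, 14, 'Mixed_parents');
INSERT INTO categorylinks VALUES (3, 'Cat_A'), (4, 'Cat_A'), (4, 'Cat_B');
INSERT INTO page_props VALUES (1, 'hiddencat');
""")

# Keep categories that have at least one parent and no non-hidden parent.
rows = con.execute("""
SELECT p.page_title
FROM page p
WHERE p.page_namespace = 14
  AND EXISTS (SELECT 1 FROM categorylinks cl WHERE cl.cl_from = p.page_id)
  AND NOT EXISTS (
      SELECT 1
      FROM categorylinks cl
      JOIN page pc ON pc.page_title = cl.cl_to AND pc.page_namespace = 14
      WHERE cl.cl_from = p.page_id
        AND NOT EXISTS (SELECT 1 FROM page_props pp
                        WHERE pp.pp_page = pc.page_id
                          AND pp.pp_propname = 'hiddencat')
  )
""").fetchall()
print(rows)  # [('Only_hidden_parents',)]
```

The inner NOT EXISTS looks for any parent that is not hidden; the outer one rejects the category as soon as such a parent is found, so the optimizer can stop scanning a category's parents early instead of aggregating the whole join.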

Reply to "List of categories which belong to only hidden categories"

Take quarry results and use them in WikiBot

1
PerryPerryD (talkcontribs)

I'm trying to take the results of my quarry and utilize them in a WikiBot. The bot would take a random item from the list. How would I do this?
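One approach (a sketch, not an official answer from this thread): Quarry can serve a query's latest result set as JSON, and a bot can fetch that and pick a random row. The URL pattern below is an assumption; copy the real link from the "Download data" options on your own query page.

```python
# Sketch: pull a Quarry result set into a bot and pick a random row.
import json
import random
import urllib.request

def random_row(resultset: dict):
    """Pick a random row from a Quarry-style result set
    ({"headers": [...], "rows": [[...], ...]})."""
    return random.choice(resultset["rows"])

def fetch_quarry_json(query_id: int) -> dict:
    # Hypothetical URL pattern; verify it against your query's download links.
    url = f"https://quarry.wmcloud.org/query/{query_id}/result/latest/0/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Offline demonstration with a made-up result set:
sample = {"headers": ["page_title"], "rows": [["Foo"], ["Bar"], ["Baz"]]}
print(random_row(sample))
```

In a real bot you would call `random_row(fetch_quarry_json(your_query_id))` after the query has finished running.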

Reply to "Take quarry results and use them in WikiBot"

Research paid contributors (cash bounty)

1
Fulldecent (talkcontribs)

Hello! I am researching paid contributors on English Wikipedia and need help with writing and executing a data query. I am offering to pay USD 400 to someone who can complete this project (write and document the query, execute the query).

You may be able to do this on Quarry, but please also be prepared to download the full dataset (30TB) to make this happen.


The purpose

I'm looking to find all contributors on English Wikipedia who have contributed to articles about living people and actually followed the rules. As a shortcut, we can also find people who just used the recommended way of disclosing.

  1. Policy Wikipedia:Paid-contribution_disclosure summarizes: "... very strongly advises [paid] editors to both place the {{connected contributor (paid)}} template at the top of the talk page accompanying any paid contributions (and to fill in the parameters), and to supply a clearly visible list of their paid contributions on their main user page. The template {{paid}} can be used for this."


The query

Specifically:

  1. Select (page title, contributor name, most recent contribution date from this contributor to this page)...
  2. for all pages on English Wikipedia that have a "Living people" category link...
  3. that have a talk page with a "Connected_contributor_(paid)" template...
  4. for all contributors on that page with a "paid" template on their user page.


Progress

I have made minor progress on this, please see:

  1. Quarry 63059
  2. Quarry 63063

This narrows it down to just ~6000 articles. You may be able to just access those pages from the archives to accomplish the task.


The offer

This offer of payment is valid for 30 days, until 2022-06-11, to the first person who completes the DELIVERY below and claims payment in this chat thread (also ping me at wikipedia.org at phor.net in case I don't see it). This offer is valid only for people who are eligible to receive payment from the United States using PayPal.


Delivery

Complete, reproducible instructions on how to execute this data project. If necessary, you can upload any scripts, etc. on GitHub. But please assume my mom is executing your instructions on a new MacBook (Xcode is not installed), and she can follow clear instructions but can barely use Excel.


In addition to the instructions, the actual data result.

Reply to "Research paid contributors (cash bounty)"

Check if an IP is globally blocked

2
ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talkcontribs)
BDavis (WMF) (talkcontribs)

The globalblocks table for Wikimedia's wiki farm is kept in the centralauth_p database.
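A minimal sketch of what the lookup could look like, mocked here in SQLite (on Quarry you would run the SELECT against the centralauth_p database); the gb_address and gb_expiry column names are assumptions based on the CentralAuth schema, so verify them with DESCRIBE globalblocks first:

```python
# Mock check of whether an IP appears in the globalblocks table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE globalblocks (gb_address TEXT, gb_expiry TEXT)")
con.execute("INSERT INTO globalblocks VALUES ('192.0.2.1', 'infinity')")

ip = "192.0.2.1"
hit = con.execute(
    "SELECT gb_address, gb_expiry FROM globalblocks WHERE gb_address = ?", (ip,)
).fetchone()
print(hit is not None)  # True
```

Note that this exact-match lookup would miss range blocks; those would need a comparison against the range columns instead.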

Reply to "Check if an IP is globally blocked"

Stuck queries cannot be stopped

2
GeoffreyT2000 (talkcontribs)

Many queries at https://quarry.wmcloud.org/ are stuck in the "running" state. This likely means that the query is somehow "slow" and takes forever to run, or simply refuses to run. When one tries to stop such a query, a "500 Internal Server Error" is displayed, which suggests there is a problem with cancelling it.

Dompi4602 (talkcontribs)

I too have queries that keep on running, but I can't stop them. Does anyone have any idea?

Reply to "Stuck queries cannot be stopped"

Getting "[Errno -2] Name or service not known" on execution of query

3
SuperSonicSpy1 (talkcontribs)

Can't connect to MySQL server on 'enwikipedia.analytics.db.svc.wikimedia.cloud' ([Errno -2] Name or service not known)

Query is on enwikipedia:

SELECT
    pagelinks.pl_from,
    pagelinks.pl_namespace,
    pagelinks.pl_title,
    pagelinks.pl_from_namespace,
    page.page_title
FROM pagelinks
INNER JOIN page ON pagelinks.pl_from = page.page_id;

TheDJ (talkcontribs)
SuperSonicSpy1 (talkcontribs)

Thanks for the help! Sorry I took so long to respond, got distracted by other stuff.

Reply to "Getting "[Errno -2] Name or service not known" on execution of query"

Show and tell: living people with paid contributors

2
151.197.235.186 (talkcontribs)

Showing off queries:

  • Pages with paid contributors: #63063
  • Pages of living persons: #63167
  • Joining those failed: #63167, but it's okay; you can just run the two above and join offline

What I would really like to do is generate a table of (a) living persons and (b) the paid contributors for each person. I may need to download each page to do that; I'm having some issues with that part.

Matěj Suchánek (talkcontribs)

The templates are on the respective talk pages, so joining on the page id will fail. And yes, you won't get the contributors from the database, downloading each page using some script will be necessary.
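Since an article and its talk page share a title (only the namespace differs), the offline join can match on title after stripping the "Talk:" prefix. A small sketch with made-up titles standing in for the two downloaded result sets:

```python
# Offline join of two result sets: living-person articles and talk pages
# carrying the paid-contributor template. The titles below are invented.
living = ["Alice_Example", "Bob_Example", "Carol_Example"]   # articles query
paid_talk = ["Talk:Bob_Example", "Talk:Dave_Example"]        # talk-pages query

def subject_title(talk_title: str) -> str:
    """Strip the 'Talk:' prefix to recover the subject page title."""
    prefix = "Talk:"
    return talk_title[len(prefix):] if talk_title.startswith(prefix) else talk_title

paid_subjects = {subject_title(t) for t in paid_talk}
both = sorted(t for t in living if t in paid_subjects)
print(both)  # ['Bob_Example']
```

The same join could also be done inside a single SQL query by matching page_title across namespaces 0 and 1 rather than by page id.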

Reply to "Show and tell: living people with paid contributors"

I would like to be able to search in my queries

1
Wurgl (talkcontribs)

I know that at some point I wrote a really tricky query using "WITH RECURSIVE", but sadly I now have to open every single query from the last year to find it. A search function would be great.

BTW: Being able to delete some queries to clean up the list would be nice too. Thanks

Reply to "I would like to be able to search in my queries"

Number of pages in DjVu files

6
Jarekt (talkcontribs)

To count the number of one-page PDF files, one can use select count(*) from image where img_minor_mime='pdf' and img_metadata like '%\"Pages\";s:1:\"1\";%'. I am looking for something similar for DjVu files. Are there better ways of counting pages in multi-page files?

TheDJ (talkcontribs)

It seems that the DjVu handler retrieves the page count from img_metadata as well. It does so by parsing the XML, building the metadata tree, and then counting the number of OBJECT elements contained within it.

$count = count( $metatree->xpath( '//OBJECT' ) );

https://github.com/wikimedia/mediawiki/blob/master/includes/media/DjVuHandler.php#L292
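That counting step can be illustrated offline; the sketch below parses a hand-made stand-in for the img_metadata XML tree (the real tree has more structure, but the OBJECT-per-page convention is what the handler counts):

```python
# Mirror of the handler's page counting: count OBJECT elements in the XML.
import xml.etree.ElementTree as ET

sample_metadata = """
<mw-djvu>
  <DjVuXML>
    <BODY>
      <OBJECT width="2550" height="3300"/>
      <OBJECT width="2550" height="3300"/>
      <OBJECT width="2550" height="3300"/>
    </BODY>
  </DjVuXML>
</mw-djvu>
"""

tree = ET.fromstring(sample_metadata)
# Equivalent of $metatree->xpath('//OBJECT') in the PHP handler:
page_count = len(tree.findall(".//OBJECT"))
print(page_count)  # 3
```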

A better way might be to use the api here, which has defined accessors for this (it's in the dimensions properties of images/media):

https://commons.wikimedia.org/w/api.php?action=query&titles=File:H.M.S.%20Pinafore.djvu&prop=imageinfo&iiprop=timestamp%7Cuser%7Curl%7Cdimensions


Jarekt (talkcontribs)

I know I can access number of pages from lua and I see I can also use API, but I was looking for a way to build SQL query to detect single page DjVu files. I think I got it: quarry:query/32028 seems to work.

Mitar (talkcontribs)

This does not seem to work for all DjVu files. Many seem to have only this as metadata:


{
  "blobs": {
    "data": "tt:609532023",
    "text": "tt:609532024"
  },
  "data": []
}


These are pointers into the text table, but is it not really possible to access the text table? At least not for me, as I am working from the SQL dumps and it seems it is not available there.

TheDJ (talkcontribs)
Mitar (talkcontribs)

Yes, I have noticed this. As I said, the metadata has now been moved to blob storage. But there are no dumps of blob storage? So how does one access the metadata offline now?

Reply to "Number of pages in DjVu files"

How would I check if the specific page has been previously deleted?

2
Gone Postal (talkcontribs)

I am trying to find categories which have been deleted but still have files in them. The latter part is trivial, but how do I check whether the page existed before?

Matěj Suchánek (talkcontribs)

I see two possible ways:

1) check Special:Log (using EXISTS or JOIN) like

WHERE EXISTS (SELECT 1 FROM logging WHERE log_type = 'delete' AND log_action = 'delete' AND log_namespace = 14 AND log_title = cl_to)

2) check page archive

WHERE EXISTS (SELECT 1 FROM archive WHERE ar_title = cl_to AND ar_namespace = 14)
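Putting the second check together with the "still has files in it" condition could look like the sketch below, demonstrated on an SQLite mock of the relevant columns (the mock data is invented; on Commons you would also restrict cl_from to the File namespace):

```python
# Categories that still have members but whose category page was deleted
# (present in archive, absent from page).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
CREATE TABLE page (page_namespace INTEGER, page_title TEXT);
CREATE TABLE archive (ar_namespace INTEGER, ar_title TEXT);
INSERT INTO categorylinks VALUES (10, 'Deleted_cat'), (11, 'Live_cat');
INSERT INTO page VALUES (14, 'Live_cat');        -- Live_cat still exists
INSERT INTO archive VALUES (14, 'Deleted_cat');  -- Deleted_cat was deleted
""")

rows = con.execute("""
SELECT DISTINCT cl_to
FROM categorylinks
WHERE NOT EXISTS (SELECT 1 FROM page
                  WHERE page_namespace = 14 AND page_title = cl_to)
  AND EXISTS (SELECT 1 FROM archive
              WHERE ar_namespace = 14 AND ar_title = cl_to)
""").fetchall()
print(rows)  # [('Deleted_cat',)]
```

Using archive rather than logging avoids counting categories that were deleted and later recreated, since a recreated category has a live page row again.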
Reply to "How would I check if the specific page has been previously deleted?"