Talk:Quarry

Jump to navigation Jump to search

About this board

Template:Wikitext Talk Page Converted To Flow Discussion area for discussion about Quarry itself and help about individual queries.

Землеройкин (talkcontribs)
Zhuyifei1999 (talkcontribs)

It'd probably dead already. No status being updated is a consequence of quarry running out of memory while trying to store results.

RhinosF1 (talkcontribs)
Milimetric (WMF) (talkcontribs)

Just to clarify, do you mean:

Let the list of people on that page be L. You want to find this set:

{user | user <- L and user has not made any edits on this wiki in 5 years}

What about other wikis? Also, what do you mean by "remove them", like remove them from the list?

To find users that haven't made any edits in one wiki you could do:

 select user_id
   from user
  where user_id in (... your list ...) 
    and user_id not in
        (select distinct rev_user
           from revision
          where rev_timestamp > '20140301000000'
        )
;

But that subquery would be impossibly slow on enwiki, so this is a heavy query. Not sure how to optimize it off the top of my head. Might just be better to make a temporary table somewhere out of the users that did edit in the past 5 years and query that.

RhinosF1 (talkcontribs)

Editors in last 5 years would also work. That query failed (https://quarry.wmflabs.org/query/34487). Just enwiki is what we need. We need to end up with a list of users who have/have not edited in the past 5 years and are on that page.

Milimetric (WMF) (talkcontribs)

you substituted "https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Apple_Inc./Subscribe" where I have (... your list ...) in the query, but that's not how Quarry works. Quarry just executes SQL, it doesn't now how to read the wiki and parse the user ids out of it. You'd have to do that manually, or build a query to get a list of user_ids based on user names. You can also make a list of user names that looks like this:

('user name 1', 'user name 2', ...) and pass that in the query above as `user_name in ('user name 1', 'user name 2')`.

RhinosF1 (talkcontribs)

Sorry for the late reply, The query was done in the end via production for us

Saeidpourbabak (talkcontribs)

Hi,

This simple quarry should return some pages in enwiki which have no fa interwiki. It works for articles, but not for templates (the results for templates have fa interwiki). Can somebody help?

Matěj Suchánek (talkcontribs)

I couldn't find an example in the results. Note that there may be a problem in the database, like a deferred update.

Saeidpourbabak (talkcontribs)
Matěj Suchánek (talkcontribs)

Looks like Wikidata doesn't do database updates properly. I did a null edit on that page and now the entry is in the database. Unfortunately, you will need to do this on every page if you want accurate results.

Saeidpourbabak (talkcontribs)

Thank you, but isn't there any more efficient and standard way to fix the problem? The quarry I referred to was just an example I made to show the issue. The quarry that I used and encountered problem with has more than 13k results, which performing null edit on that number of pages is impractical.

Matěj Suchánek (talkcontribs)

Well, to "fix the problem" you need to somehow trigger a new database update. And as a casual user, the only way (if you're lucky) is really performing a null edit. I will make a bug report on this shortly.

Matěj Suchánek (talkcontribs)
RhinosF1 (talkcontribs)
Milimetric (WMF) (talkcontribs)

I see you already have a few examples like https://quarry.wmflabs.org/query/35057, and I'm wondering what else you'd like help with. One thing I can suggest is that you try to format your queries so they can be more easily read. For example, try this style (which I'm a little too obsessive about, I admit, but it's good)

SELECT COUNT(DISTINCT rc_user)
  FROM recentchanges_userindex
           JOIN
       user                       ON (rc_user = user_id)
           LEFT JOIN
       ipblocks                   ON (rc_user = ipb_user)
 WHERE rc_user != 0
   AND rc_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 30 DAY), '%Y%m%d%H%i%S')
   AND rc_type = 0
   AND rc_bot = 0
   AND ipb_id IS NULL
   AND (user_registration IS NULL OR user_registration < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%S'))
   AND user_editcount >= 1

And btw, you can write that user registration clause as:

(coalesce(user_registration, '2000') < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%S'))
RhinosF1 (talkcontribs)

Ah, It wasn't me that made them! Can't remember who! - I was looking for one that just did a number of editors with a logged edit or action in last 30 days @Milimetric (WMF)

RoySmith (talkcontribs)

I want to find all revisions where a given user added a specific word to a page. Is this possible?

The best I can see is to find all revisions a user has made, then grovel over them to generate the diffs for each one and test if the word is in the diff.

Matěj Suchánek (talkcontribs)
Ineuw (talkcontribs)

Am I correct to assume that the text layer of a Wikisource page is encoded as UTF-8? Does this apply to all Wikipedia sites that contain text?

Achim55 (talkcontribs)

Yes and yes as far as I know. Citing Manual:$wgDBTableOptions: "MediaWiki internally always produces data in UTF-8 encoding to be saved in the database."


Subquery succeeds but Join times out

3
PaigePhault (talkcontribs)

I'm trying to get some estimates of the prevalence of WP policy violations, but am seeing timeouts for queries that do joins.

SELECT page.page_title, revision.rev_id, mycomment.comment_text FROM (SELECT comment.comment_text, comment.comment_id FROM comment WHERE comment.comment_text RLIKE '[[:<:]]WP:(OUTING|PRIVACY|DOX)[[:>:]]' LIMIT 10) AS mycomment INNER JOIN revision ON revision.rev_comment_id = mycomment.comment_id INNER JOIN page on page.page_id = revision.rev_page;

The internal select statement finishes in a <cough> reasonable amount of time, but trying to use it in a join fails every time. Any suggestions on how to materialize the subselect perhaps?

Matěj Suchánek (talkcontribs)

What about...?

SELECT page_title, rev_id, comment_text
FROM page
JOIN revision ON rev_page = page_id
JOIN `comment` ON rev_comment_id = comment_id
WHERE rev_comment_id IN (SELECT comment_id FROM `comment` AS c1 WHERE c1.comment_text RLIKE '[[:<:]]WP:(OUTING|PRIVACY|DOX)[[:>:]]')
LIMIT 10;
PaigePhault (talkcontribs)

Missing columns in revision table?

6
Summary by BDavis (WMF)

The Wiki Replica views have changed to remove *_user and *_user_text fields from all tables. See wikitech:News/Actor storage changes on the Wiki Replicas for more information.

Le Deluge (talkcontribs)

I've just tried to run a query that uses rev_user_text, one of the columns in the revision table, and get a failure with :

Unknown column 'rev_user_text' in 'field list'

It runs fine when I remove SELECT rev_user_text from the query. Is this a temporary glitch, or has rev_user_text been permanently removed?

Edgars2007 (talkcontribs)
BDavis (WMF) (talkcontribs)
This post was hidden by Barkeep49 (history)
TheDJ (talkcontribs)
BDavis (WMF) (talkcontribs)

I created phab:T225007 today to track the need updating the Manual. Part of the issue that has happened here is explained (but maybe not well) on wikitech:News/Actor storage changes on the Wiki Replicas is that the Wiki Replica views have changed in advance of MediaWiki changes. I'm not 100% sure which patches have landed in the MediaWiki code base and which are still pending in Gerrit. There definitely is some documentation lag however because even if the legacy user fields have not been remove from MediaWiki core yet, the actor table and related fields have landed.

MisterSynergy (talkcontribs)

Due to the ongoing migration to the actor table, quite a number of queries need modifications. Currently I have 75 queries in my Quarry account, and it would be nice if there was a "Search in my queries" functionality that would allow me to look for e.g. "_user". Would it be possible to add such a search field to Quarry?

Edgars2007 (talkcontribs)

I currently simply use scraping for such purposes (also it acts like backup). But yeah, this would be very, very nice.

There should be thread or phab ticket for this. Something like "meta-quarry" (that is, query the quarry data).

BDavis (WMF) (talkcontribs)

phab:T90509 seems related, but it sounds like you might be hoping for searching of the SQL query body itself too.

Number of people who put specific page into Watchlist

2
Japanologie (talkcontribs)
Matěj Suchánek (talkcontribs)

No, the relevant tables are not public for privacy reasons.