I'm trying to find all pages created by a bot many years ago, and am limited to only those pages that happen to have been edited recently (~3 months from what I gather, like the recent changes table). [https://tools.wmflabs.org/sigma/created.py Pages created search] might do the trick, but there are so many pages that it would have to be a series of automated grabs, which I haven't been able to do in AWB via regular GetHTML'ing. (please ping)
About this board
Discussion area for questions about Quarry itself and for help with individual queries.
How to find old pages created beyond the time horizon of the revision table?
The revision table is not limited by time (unless the pages have been deleted since).
Manual:revision table is ambiguous about that ("Items in the recentchanges table are periodically purged; those in the revision table are typically kept longer or indefinitely"). The evidence I have for this is that I ran a query sometime before 14 March 2018, and again sometime after 14 March 2018, and en:Diplothrix showed up in the new query, but not in the old, because the last time it was edited prior to 14 March 2018 was 19 June 2017.
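One direct way to check how far back the revision table reaches is to ask for its oldest timestamp. A minimal sketch, assuming the enwiki_p replica:

```sql
-- Sketch: if revisions were purged by age, the oldest timestamp would be
-- recent; on the replicas it goes back to the wiki's earliest edits.
USE enwiki_p;
SELECT MIN(rev_timestamp) AS oldest_revision
FROM revision;
```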
"Creating new genus with Diplothrix legatus in it." is matched by neither '%list of judges%'. (quarry:query/26168)
I never said it did. I ran a different query, https://quarry.wmflabs.org/query/25510.
Silly attempt, I admit. Anyway, everything looks correct to me: quarry:query/26169.
Once the page is edited it will show up in the revision table for some period (< 9 months in the case of en:Diplothrix). Showing that it exists in the table now is meaningless. If that's the best input you can provide, I suggest letting someone more knowledgeable & willing to help respond.
I'm sorry if it looks as if I'm not willing to help you (but I am). Is there any solid evidence that I could actually see? I'm sure <code>revision</code> rows are only purged when the page is deleted. (You can run <code>SELECT * FROM revision WHERE rev_timestamp < 20170705154900 LIMIT 25;</code> to see that even older rows are in the database.)
I can't be any more clear than in my original post and my first response.
@Tom.Reding, I'm a little late here, but just in case others see this:
- When a page is deleted, all of its revisions are moved from the revision table into the archive table.
- When a page is restored, its revisions are moved back from the archive table into the revision table.
- The revision_userindex view is faster to query than the revision table (better indices).
- So it's definitely possible for you to find revisions in the revision table, have them disappear on delete, and reappear on restore. It's rarer, but it's possible. Nevertheless, records are never deleted from the revision table without good reason. If you want the complete picture, take a look at the archive table. By the way, you can also use archive_userindex for speed.
Here's a query that shows early creations for that user and uses revision_userindex as an example:
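The query referred to above is not reproduced in this thread; the following is only a sketch of what it could look like, assuming the enwiki_p replica and a placeholder bot name (a revision with no parent is a page creation):

```sql
USE enwiki_p;
SELECT rev_timestamp, page_namespace, page_title
FROM revision_userindex             -- indexed for lookups by user
JOIN page ON page_id = rev_page
WHERE rev_user_text = 'ExampleBot'  -- placeholder: substitute the bot's name
  AND rev_parent_id = 0             -- rev_parent_id = 0 marks a page creation
ORDER BY rev_timestamp ASC
LIMIT 100;
```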
And by the way, I agree that the Manual help text is ambiguous; we should really update that.
Get Wikidata Page's Related changes
Hello, please help me. I want the related changes of User:Titodutta/Events/2018 Republic Day of India from 2018-01-26 00:00 to 2018-01-27 23:59, like https://www.wikidata.org/wiki/Special:RecentChangesLinked?hidebots=1&hidecategorization=1&target=User%3ATitodutta%2FEvents%2F2018_Republic_Day_of_India&limit=500&days=30&urlversion=2 but restricted to that time span.
Remind me, does "RelatedChanges" get edits to all pages that link to "User:Titodutta/Events/2018 Republic Day of India"?
I think this gets it: https://quarry.wmflabs.org/query/26400 It looks like there were no edits during that timespan.
Hello, User:Halfak (WMF)
a) Yes, "RelatedChanges" gets edits to all pages that link to a page. For example, kindly see Special:RecentChangesLinked/MediaWiki.
b) Yes, edits were made on those 2 days. There was a small edit-a-thon as well.
<code>BETWEEN "2018012600" AND "2018012824"</code> seems to be incorrect. I'd say you need to provide all values, including minutes and seconds.
It turns out you don't need to provide all of the values. You just need the prefixes. This is due to how string comparison works. I've updated my query to show that it works for a wider date range. See: https://quarry.wmflabs.org/query/26400
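To illustrate the prefix point, these comparisons need no tables; they only show how MariaDB compares timestamp strings:

```sql
-- "20180126" is a prefix of every timestamp on that day, and a shorter
-- string sorts before its extensions, so prefixes work as lower bounds.
SELECT "20180126123456" BETWEEN "20180126" AND "20180128" AS in_range;  -- 1
SELECT "20180129000000" BETWEEN "20180126" AND "20180128" AS in_range;  -- 0
-- Caveat: a bare-prefix upper bound excludes nearly all of the boundary
-- day, since e.g. "20180128120000" > "20180128".
```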
This query was giving correct results before, and now it is showing items that already have labels in Arabic as well as those that don't. What changed, and can anyone help fix it? Thanks.
You shouldn't query for <code>term_entity_id</code> anymore. Use
@Matěj Suchánek any idea of how to fix it? :)
Thank you @Matěj Suchánek. :)
Issue with displaying results
This query finds parent and child categories of a particular article. It works properly when displaying either the child or the parent categories in separate runs (by editing the 3rd line), but when asked for both in one run it gets stuck. Any idea what is wrong? Thanks.
P.S.: I noticed that the header reads "cl_to" (and not "chld.cl_to"). Does that give a hint?
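One way to get both directions in a single run is a UNION with an explicit label column, which also gives the output an unambiguous header instead of a bare "cl_to". A sketch, assuming the enwiki_p replica and a placeholder category name:

```sql
USE enwiki_p;
-- Parent categories: what the category itself is filed under.
SELECT 'parent' AS relation, cl_to AS category
FROM categorylinks
JOIN page ON page_id = cl_from
WHERE page_namespace = 14
  AND page_title = 'Example_category'   -- placeholder name
UNION ALL
-- Child categories: category pages filed under it.
SELECT 'child' AS relation, page_title AS category
FROM categorylinks
JOIN page ON page_id = cl_from
WHERE cl_to = 'Example_category'        -- placeholder name
  AND page_namespace = 14;
```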
Re-running a query automatically
Is there an accepted programmatic way to keep a query updated? I have a query that I'd like to be able to keep updated automatically on an hourly or daily basis, but it doesn't seem like there's an easy way to ask Quarry to re-run a query programmatically. I could call the <code>/api/query/run</code> endpoint from my script, but that seems really hacky—that would require that my script a) re-upload the entire SQL statement every time, and b) be authenticated. Is there a better way I could be doing this? Thanks.
I would suggest getting a Toolforge account.
It's been almost three years, but we still don't know how to implement it in Quarry.
Tracking Wikidata redirects
I have a list of 250 Wikidata item IDs which I suspect of being redirects. What would be the easiest way to check whether they are, and to which items they redirect? For example, for Q817614 I would like to get Q15123417.
Something like this:
USE wikidatawiki_p;
SELECT page_title, rd_title
FROM page
LEFT JOIN redirect ON rd_from = page_id
WHERE page_namespace = 0
AND page_title IN (
  "Q1",
  "Q2",
  # ...
);
Excellent, that worked great. Thanks!
Help with Query
I was wondering if anyone can help me with a query I am trying to run on Quarry. I would like to get a dataset of all oil & gas companies and their (name, location, total assets, total revenues, total reserves). Eventually I would also like to get a list of oil and gas service providers with the same parameters/columns.
Is there any way anyone can help me build this query, or guide me on how to create it? I am new to SQL and I don't understand the database schema for Wikipedia very well.
Thanks again for your help.
Thanks Matěj. I have actually tried using the Wikidata Query Service and the results were not consistent. I will post my question there along with the query I executed and see if anyone can help me refine it.
Thanks again for your help.
I didn't check the first query, but the second has 11298269 rows in set, 3816 warnings (11 min 55.68 sec). Quarry simply can't store that many results.
Thanks! I'll fix the number of records problem by adding a HAVING clause. What about the warnings? Are they related to any error in the query?
Never mind. There were multiple errors in the query. These are fixed now.
Any way to get the 'current time' as part of a result output?
I would like to get the time when a query was last run as part of the result output (either as a column, in the header row, or just as some element somewhere in the result set). Is there any way to accomplish this? I tried the GETDATE() and CURRENTTIME() SQL functions, but the server instantly refused to execute them. :(
Try <code>SELECT NOW();</code>, or did I misunderstand?
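For example, the run time can be included as a column alongside the other results. A minimal sketch, assuming the enwiki_p replica:

```sql
USE enwiki_p;
SELECT NOW() AS run_at,   -- timestamp of when the query executed
       page_title
FROM page
WHERE page_namespace = 0
LIMIT 5;
```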
Great! That's the one!
One more: please see this: [https://quarry.wmflabs.org/query/8677]. Is there any way I can make this query shorter? Thanks in advance.
Deleting my old SQL statements
I have a number of old unpublished, and most likely non-working practice SQL statements. Is it possible for me to delete them?
You can blank them.
Unfortunately, blanking them does nothing. On reopening the page, the SQL statements are still there.
Did you click Submit Query after blanking?
Now, that makes sense! Thank you. Happy holidays.
Returning again to mention that I cleared the SQL and submitted the queries as told, but the empty queries remain on the list.
You cannot delete completed queries. You can only blank them to hide the code they used.
Can I request deletion of the empty queries? There are at least 25 practice queries from which I removed the SQL statements.
You can't really delete any queries - even if you blank them they stay in the database (and we might expose the history at some point)
A better practice is to reuse those old, blanked queries instead of creating new ones whenever required. This saves space on one's list of queries.