Talk:Quarry

Jump to: navigation, search

About this board

Discussion area for discussion about Quarry itself and help about individual queries.

How to find old pages created beyond the time horizon of the revision table?

11
Tom.Reding (talkcontribs)

I'm trying to find all pages created by a bot many years ago, and am limited to only those pages that happen to have been edited recently (~3 months from what I gather, like the recent changes table). [https://tools.wmflabs.org/sigma/created.py Pages created search] might do the trick, but there are so many pages that it would have to be a series of automated grabs, which I haven't been able to do in AWB via regular GetHTML'ing. (please ping)

Matěj Suchánek (talkcontribs)

revision table is not limited by time (unless the pages were deleted since).

Tom.Reding (talkcontribs)

Manual:revision table is ambiguous about that ("Items in the recentchanges table are periodically purged; those in the revision table are typically kept longer or indefinitely"). The evidence I have for this is that I ran a query sometime before 14 March 2018, and again sometime after 14 March 2018, and en:Diplothrix showed up in the new query, but not in the old, because the last time it was edited prior to 14 March 2018 was 19 June 2017.

Matěj Suchánek (talkcontribs)

Creating new genus with Diplothrix legatus in it. is matched by neither '%congress%', '%fjc.gov%' nor '%list of judges%'. (quarry:query/26168)

Tom.Reding (talkcontribs)
Matěj Suchánek (talkcontribs)
Tom.Reding (talkcontribs)

Once the page is edited it will show up in the revision table for some period (< 9 months in the case of en:Diplothrix). Showing that it exists in the table now is meaningless. If that's the best input you can provide, I suggest letting someone more knowledgeable & willing to help respond.

Matěj Suchánek (talkcontribs)

I'm sorry if it looks as if I'm not willing to help you (but I am). Is there any solid evidence that I could actually see? I'm sure revision rows are only purged when the page is deleted. (You can run SELECT * FROM revision WHERE rev_timestamp < 20170705154900 LIMIT 25 to see that even older rows are in the database.)

Tom.Reding (talkcontribs)

I can't be any more clear than in my original post and my first response.

Milimetric (WMF) (talkcontribs)

@Tom.Reding, I'm a little late here, but just in case others see this:

  • When a page is deleted, all of its revisions are moved from the revision table into the archive table.
  • When a page is restored, its revisions are moved back from the archive table into the revision table
  • The revision_userindex view is faster to query than the revision table (better indices).
  • So it's definitely possible for you to find revisions in the revision table, have them disappear on delete, and reappear on restore. It's more rare, but it's possible. Nevertheless, records are never deleted from the revision table for no good reason. If you want to get a complete picture, take a look at the archive table. By the way, you can also use archive_userindex for speed.

Here's a query that shows early creations for that user and uses revision_userindex as an example:

https://quarry.wmflabs.org/query/26524

Milimetric (WMF) (talkcontribs)

And by the way I agree that the Manual help text is ambiguous, we should really update that.

Reply to "How to find old pages created beyond the time horizon of the revision table?"
Jayprakash12345 (talkcontribs)
Halfak (WMF) (talkcontribs)

Remind me, does "RelatedChanges" get edits to all pages that link to "User:Titodutta/Events/2018 Republic Day of India"?

Halfak (WMF) (talkcontribs)
Titodutta (talkcontribs)
Matěj Suchánek (talkcontribs)

BETWEEN "2018012600" AND "2018012824" seems to be incorrect. I'd say you need to provide all values, including minutes and seconds.

Halfak (WMF) (talkcontribs)

It turns out you don't need to provide all of the values. You just need the prefixes. This is due to how string comparison works. I've updated my query to show that it works for a wider date range. See: https://quarry.wmflabs.org/query/26400

Reply to "Get Wikidata Page's Related changes"
FShbib (talkcontribs)

Hi.

This query was giving correct results before, and now it is showing items that already have labels in Arabic as well as those who doesn't. What changed? and can any one help fixing it? Thanks.

Matěj Suchánek (talkcontribs)

You shouldn't query for term_entity_id anymore. Use term_full_entity_id instead.

FShbib (talkcontribs)
Matěj Suchánek (talkcontribs)
FShbib (talkcontribs)
Saeidpourbabak (talkcontribs)

Hello,

This query finds parent and child categories of a particular article. It works properly when displaying either child or parent in different runs (by editing the 3rd line), but fails to display both in one run and gets stuck. Any idea what is wrong? Thanks.

P.S: I noticed that as the header it writes "cl_to" (and not "chld.cl_to"). Does it give a hint?

Reply to "Issue with displaying results"
IagoQnsi (talkcontribs)

Is there an accepted programmatic way to keep a query updated? I have a query that I'd like to be able to keep updated automatically on an hourly or daily basis, but it doesn't seem like there's an easy way to ask Quarry to re-run a query programmatically. I could call the <code>/api/query/run</code> endpoint from my script, but that seems really hacky—that would require that my script a) re-upload the entire SQL statement every time, and b) be authenticated. Is there a better way I could be doing this? Thanks.

Edgars2007 (talkcontribs)
Zhuyifei1999 (talkcontribs)
Reply to "Re-running a query automatically"
Jarekt (talkcontribs)

I have a list of 250 Wikidata Item ID which I suspect of being redirects. What would be the easiest way to check if they are and to what page they are redirecting. For example for Q817614 I would like to get Q15123417

Matěj Suchánek (talkcontribs)

Something like this:

USE wikidatawiki_p;
SELECT page_title, rd_title FROM page
LEFT JOIN redirect ON rd_from = page_id
WHERE page_namespace = 0
AND page_title IN (
 "Q1", "Q2", # ...
)
Jarekt (talkcontribs)

Excellent, That worked great. Thanks

Nickshel81 (talkcontribs)

Hi Everyone,

I was wondering if anyone can help me with a query i am trying to run on Quarry. I would like to get a dataset of all Oil & Gas companies and their (name, location, total assets, total revenues, total reserves). Eventually i would also like to get a list of oil and gas service providers with the same parameters/columns.

Anyway anyone can help me build this query or guide me on how to create it? I am new to sql and i don't understand the database schema very well for wikipedia.

thanks again for your help

Matěj Suchánek (talkcontribs)
Nickshel81 (talkcontribs)

Thanks Matej. I have actually tried using Wikidata query service and the results were not consistent. I will post my question their along with the query i executed and see if anyone can help me refine it.

thanks again for your help.

MarioGom (talkcontribs)

I've got two queries stuck for 20 hours. They show up as "running", but clicking "explain" shows "Hmm... Is the SQL actually running?!". These are: query:26180 and query:25311.

Zhuyifei1999 (talkcontribs)

I didn't check the fist query, but the second has 11298269 rows in set, 3816 warnings (11 min 55.68 sec). Quarry simply can't store so many results.

MarioGom (talkcontribs)

Thanks! I'll fix the number of records problem by adding a HAVING clause. What about the warnings? Are they related to any error in the query?

MarioGom (talkcontribs)

Nevermind. There were multiple errors in the query. These are fixed now.

Any way to get the 'current time' as part of a result output?

3
Viswaprabha (talkcontribs)

I would like to get the time when a quarry was ran last as a part of the result output (either as a column or at header row or just as some element somewhere in the result set). Is there any way to accomplish this? I tried the GETDATE() or CURRENTTIME() SQL functions, but the server would instantly deny executing such functions. :(

Achim55 (talkcontribs)

Try SELECT now(); or did I misunderstand?

Viswaprabha (talkcontribs)
Reply to "Any way to get the 'current time' as part of a result output?"
Ineuw (talkcontribs)

I have a number of old unpublished, and most likely non-working practice SQL statements. Is it possible for me to delete them?

Achim55 (talkcontribs)

You can blank them.

Ineuw (talkcontribs)

Unfortunately, blanking them does nothing. On reopening the page, the SQL statements are still there

Achim55 (talkcontribs)

Did you click Submit Query after blanking?

Ineuw (talkcontribs)

Now, that makes sense! Thank you. Happy holidays.

Ineuw (talkcontribs)

Again, Returning to mention that I cleared the SQL and submitted the query as told, but the empty queries remained on the list.

Achim55 (talkcontribs)

You cannot delete done queries. One can only blank them for to "hide" the used code.

Ineuw (talkcontribs)

Can I request deletion of the empty queries? There is at least 25 practice queries from which which I removed the SQL statements.

YuviPanda (talkcontribs)

You can't really delete any queries - even if you blank them they stay in the database (and we might expose the history at some point)

Viswaprabha (talkcontribs)

A better way that I practice is to reuse those old / blank queries instead of creating new ones whenever required. This saves list space on one's history watch list.

Reply to "Deleting my old SQL statements"