Talk:Quarry

Jump to navigation Jump to search

About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. Discussion area for discussion about Quarry itself and help about individual queries.

GeoffreyT2000 (talkcontribs)
BDavis (WMF) (talkcontribs)

We consider wmflabs.org hostnames to be "legacy" these days. Wmcloud.org hostnames are the common replacement. The Cloud VPS instances running Quarry were recently replaced with newer versions and updating the hostname for the service was a convenient side effect. As you noticed, the legacy hostname and URLs still work, but they will be redirected to the new hostname. There are more updates for Quarry planned for the coming months, see https://techblog.wikimedia.org/2021/09/02/digging-deeper-into-quarry/.

Matěj Suchánek (talkcontribs)
Reply to "New URL"
Hogü-456 (talkcontribs)

At the Wikimania Hackathon there was a discussion about Quarry and PAWS and how they could be improved. One suggestion was as I understand the Etherpad https://etherpad.wikimedia.org/p/Wikimania-2021-PAWS-and-Quarry-discussion to create something like VizQuery for querying Quarry. Has someone of you expierence with creating an user interface for an specific query. So that it is possible to enter values into a form instead of editing the SQL-Text of the Query directly. Such an Interface should include possible values that can be selected through a drop down menue or with a search function. I think a drop down menue is enough. From my point of view this is helpful if it exists.

TheDJ (talkcontribs)

I too think that having autocomplete for tables and columns, as well as explanations about what each of them are, is probably already a very significant improvement. Can probably be built quite easily with CodeMirror. Not visual, but it's a step forward.

Reply to "User Interface for Queries"

New way to enter a database name

11
GeoffreyT2000 (talkcontribs)

Goodbye USE databasename; and hello typing databasename in the box above the word "SQL".

So, if you have any old queries, then before running them again, you should remove the "USE" statement at the top and instead type in the database name in the box above the word "SQL".

Bdijkstra (talkcontribs)

Is the box going to have a text next to it saying that it's for the database name? Also, nothing currently happens when I click 'Submit'.

CommanderWaterford (talkcontribs)

Let me say that this is absolute bullsh** now, no matter which DB Name I enter ...nothing happens..query never gets executed.

Matěj Suchánek (talkcontribs)

There are reports of some problems, please be patient or share yours there.

GregorB (talkcontribs)

This is OK I guess, although I don't see any real benefit. Maybe if database prompt was a drop down list.

Matěj Suchánek (talkcontribs)

Such a drop down list would be around 900 items long...

Wbm1058 (talkcontribs)

I don't have a USE databasemame; in my query, which used to work before and now is broken without explanation.

Where can I find a list of these 900 darabases?

What is the name of the English Wikipedia database?

Wbm1058 (talkcontribs)

That's "databases". How can I edit my post to correct a typo?

Wbm1058 (talkcontribs)

I see it worked with "enwiki". You could make that the default to give ppl a clue

KylieTastic (talkcontribs)

I had an old query that cross referenced commonswiki_p and enwiki_p to find redirects on enwiki hiding commons files. Is there a new way to reference multiple databases, or have we just lost this ability?

Matěj Suchánek (talkcontribs)

It's lost (forever).

Reply to "New way to enter a database name"

Query using LAG() takes a long time

3
RoySmith (talkcontribs)

I want to look at the intervals between successive revisions. I tried (against enwiki_p):

select rev_id, rev_timestamp, lag(rev_timestamp) over (order by rev_id)
from revision
limit 5;

but that's running for a long time; I assume it'll time out eventually. It's as if it were doing a full table scan. What am I doing wrong?

BDavis (WMF) (talkcontribs)

The explain I got by using https://sql-optimizer.toolforge.org/ looks like index would be used, but your query has no WHERE clause to limit it's scope. The LIMIT clause truncates the returned results, but having an ORDER BY involved requires the database engine to produce and order the data set before truncating to the 5 desired results. The sheer size of the enwiki revision table (913,325,481 rows) makes operations on it without WHERE filtering very time and memory expensive, especially when ORDER BY is also involved.

Results

id select_type table type possible_keys key key_len ref rows Extra
1.1 SIMPLE revision index page_timestamp 20 913325481 Using index; Using temporary
RoySmith (talkcontribs)

OK, that makes sense. I assumed that since rev_timestamp was indexed, it would be smart enough to do the right thing, but I guess not.

Reply to "Query using LAG() takes a long time"
Whatamidoing (WMF) (talkcontribs)

Is there a way to delete one of my queries? Or if I don't want an endless collection of broken/useless/duplicate/test queries, is the only solution to continuously re-write one sandbox query?

Achim55 (talkcontribs)

AFAIK there is no way to delete them. But you can blank the query, as it's done by many users.

BDavis (WMF) (talkcontribs)
Reply to "Deleting queries"

wikidatawiki_p.* not working

5
Summary by BDavis (WMF)

"cross wiki" joins are no longer possible due to technical changes that we have been forced to make on the backing wiki replica database servers. See wikitech:News/Wiki Replicas 2020 Redesign and phab:T260389 for more detailed information.

ZI Jony (talkcontribs)
BDavis (WMF) (talkcontribs)

wikidatawiki_p is a database name. This query is attempting a "cross wiki join" which is no longer possible due to technical changes that we have been forced to make on the backing wiki replica database servers. See wikitech:News/Wiki Replicas 2020 Redesign and phab:T260389 for more detailed information.

ZI Jony (talkcontribs)

Thanks for your info. Could someone rewrite the quarry, so it can be queried.

Matěj Suchánek (talkcontribs)

No, it isn't possible to do this kind of query using Quarry anymore.

ZI Jony (talkcontribs)

Thank you for response.

Queries not being executed for several hours now

1
Summary by CommanderWaterford

Solved

CommanderWaterford (talkcontribs)

Anyone working on this?

Kanzat (talkcontribs)
Kanzat (talkcontribs)
Kanzat (talkcontribs)

But if someone knows a faster query - please tell me.

Matěj Suchánek (talkcontribs)

You can probably optimize using actor_revision view.

Reply to "Searching unreviewed pages"

Searching all public quarries

2
Bennylin (talkcontribs)

Hi, is it possible to search for all public quarries? Right now we can only browse the most recent ones. Is it not available by design or for some other reasons? TY.

Bennylin (talkcontribs)
Reply to "Searching all public quarries"
91.160.58.193 (talkcontribs)

Hello,

I wonder if my research is possible.

I'm just trying to get a list of the names of all the 'new' persons that have been entered in wiki in the last two months (so I should find there Stella Morris, George Floyd, Derek Chauvin etc... and a lot of people who were unknown 2 months ago).

I suppose I should adapt this query but I don't know how.

Thanks for your help if you can

USE wikidatawiki_p;

SELECT rev_id AS first_edit FROM revision WHERE rev_timestamp BETWEEN "20200501" AND "20200624" ORDER BY RAND() LIMIT 5000;

Quarry User (talkcontribs)

User account creation log is stored on logging table. Try this query!

quarry.wmflabs.org/query/53090

Reply to "New People in Wiki"