Talk:Quarry

Jump to navigation Jump to search

About this board

Discussion area for discussion about Quarry itself and help about individual queries.

Results not changing for several days?

2
Mike Peel (talkcontribs)

I've noticed that my query at https://quarry.wmflabs.org/query/40539 is only returning a few new results on top of the usual set, while it normally has around 300 new results per day. Is there currently a lag in the database of a few days?

BDavis (WMF) (talkcontribs)

Current replication lag for the Wiki Replica servers can be found at toolforge:replag/. As I write this it appears that the "s4" slice which hosts the commonswiki data is lagged by ~60 hours. That much lag usually means some maintenance job is running on the lead server for that slice. I this case I would guess that phab:T232446 is the reason.

Reply to "Results not changing for several days?"

Get revision by revision comment id

2
Matanel11111 (talkcontribs)

Hello.

How to get revision details by revision comment id in fast way?


I tried the following query, but it's very slow:


USE enwiki_p;

SELECT * from revision WHERE rev_comment_id = 60565 LIMIT 1;


Zhuyifei1999 (talkcontribs)
Reply to "Get revision by revision comment id"
RoySmith (talkcontribs)

How do I get Quarry to generate an HTML table of results? I tried the obvious, "Download Data" and selected "HTML" as the format, but I ended up with all the HTML markup escaped:


<td>&lt;a href=&#34;http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Du_Kang&#34;&gt;Articles_for_deletion/Du_Kang&lt;/a&gt;</td>


I'm running https://quarry.wmflabs.org/query/41231

Zhuyifei1999 (talkcontribs)

I don't think we can allow unescaped HTML for XSS reasons. However, the tabular formats CSV and TSV are mostly unescaped so you might want to try that.

Reply to "Generating HTML from a query?"

How to query the titles of articles of specific category??

1
Madhurimukund (talkcontribs)

Can anyone suggest please??

Reply to "How to query the titles of articles of specific category??"

what is the query for retrieving articles of 8000 bytes in simple english wikipedia?

2
Ramu ummadishetty (talkcontribs)

can i get the query?

Quarryist (talkcontribs)

Is this what you wish?

quarry.wmflabs.org/query/41198

Reply to "what is the query for retrieving articles of 8000 bytes in simple english wikipedia?"

Pages that have been deleted and restored?

5
RoySmith (talkcontribs)

I'm trying to find pages which a given user has deleted, and were subsequently restored (by anybody). This seems like it should be the right query:


select count(*)

from logging_userindex lu

join logging_logindex ll on lu.log_page = ll.log_page

join actor on lu.log_actor = actor_id

where actor_name = 'RoySmith'

and lu.log_type = 'delete'

and lu.log_action = 'delete'

and ll.log_type = 'delete'

and ll.log_action = 'restore'

;


but it's taking forever to run. I'm assuming it's the self-join on logging that's the problem. Is there a better way to do this?

Zhuyifei1999 (talkcontribs)

EXPLAIN:

+------+-------------+---------------------+-------+----------------------------------------------------------------------------+---------------------+---------+------------------------+-----------+------------------------------------+
| id   | select_type | table               | type  | possible_keys                                                              | key                 | key_len | ref                    | rows      | Extra                              |
+------+-------------+---------------------+-------+----------------------------------------------------------------------------+---------------------+---------+------------------------+-----------+------------------------------------+
|    1 | PRIMARY     | actor               | const | PRIMARY,actor_name                                                         | actor_name          | 257     | const                  |         1 |                                    |
|    1 | PRIMARY     | logging             | ref   | type_time,actor_time,log_actor_type_time,log_actor_deleted,log_type_action | log_actor_type_time | 42      | const,const            |     12316 | Using index condition; Using where |
|    1 | PRIMARY     | logging             | ref   | type_time,log_page_id_time,log_type_action                                 | log_page_id_time    | 5       | func                   |         2 | Using index condition; Using where |
|   13 | SUBQUERY    | revision            | ALL   | PRIMARY                                                                    | NULL                | NULL    | NULL                   | 778735536 | Using where                        |
|   13 | SUBQUERY    | revision_actor_temp | ref   | PRIMARY,revactor_rev                                                       | PRIMARY             | 4       | enwiki.revision.rev_id |         1 | Using index                        |
|   12 | SUBQUERY    | logging             | range | type_time,actor_time,log_actor_type_time,log_actor_deleted,log_type_action | log_actor_type_time | 42      | NULL                   |     15784 | Using index condition; Using where |
|   11 | SUBQUERY    | recentchanges       | ref   | rc_actor,rc_actor_deleted                                                  | rc_actor_deleted    | 8       | const                  |       878 | Using where; Using index           |
|   10 | SUBQUERY    | filearchive         | ref   | fa_actor_timestamp,fa_actor_deleted                                        | fa_actor_deleted    | 8       | const                  |        15 | Using where; Using index           |
|    9 | SUBQUERY    | oldimage            | ref   | oi_actor_timestamp,oi_actor_deleted                                        | oi_actor_deleted    | 8       | const                  |         1 | Using where; Using index           |
|    8 | SUBQUERY    | image               | ref   | img_actor_timestamp                                                        | img_actor_timestamp | 8       | const                  |        14 | Using index                        |
|    7 | SUBQUERY    | ipblocks            | ref   | ipb_actor_deleted                                                          | ipb_actor_deleted   | 9       | const,const            |       317 | Using index                        |
|    6 | SUBQUERY    | archive             | ref   | ar_actor_timestamp,ar_actor_deleted                                        | ar_actor_deleted    | 8       | const                  |      3268 | Using where; Using index           |
|    5 | SUBQUERY    | user                | const | PRIMARY                                                                    | PRIMARY             | 4       | const                  |         1 | Using index                        |
+------+-------------+---------------------+-------+----------------------------------------------------------------------------+---------------------+---------+------------------------+-----------+------------------------------------+

The most expensive subquery is the ALL on revision, which is a result from the view actor.

A simple fix is with https://quarry.wmflabs.org/query/41110:

+------+-------------+---------+-------+----------------------------------------------------------------------------+---------------------+---------+-------------+-------+------------------------------------+
| id   | select_type | table   | type  | possible_keys                                                              | key                 | key_len | ref         | rows  | Extra                              |
+------+-------------+---------+-------+----------------------------------------------------------------------------+---------------------+---------+-------------+-------+------------------------------------+
|    1 | PRIMARY     | actor   | const | PRIMARY,actor_name                                                         | actor_name          | 257     | const       |     1 | Using index                        |
|    1 | PRIMARY     | logging | ref   | type_time,actor_time,log_actor_type_time,log_actor_deleted,log_type_action | log_actor_type_time | 42      | const,const | 12316 | Using index condition; Using where |
|    1 | PRIMARY     | logging | ref   | type_time,log_page_id_time,log_type_action                                 | log_page_id_time    | 5       | func        |     2 | Using index condition; Using where |
|    5 | SUBQUERY    | logging | range | type_time,actor_time,log_actor_type_time,log_actor_deleted,log_type_action | log_actor_type_time | 42      | NULL        | 15784 | Using index condition; Using where |
+------+-------------+---------+-------+----------------------------------------------------------------------------+---------------------+---------+-------------+-------+------------------------------------+
Zhuyifei1999 (talkcontribs)

Looks like it is still too slow :(

Zhuyifei1999 (talkcontribs)
Zhuyifei1999 (talkcontribs)
Reply to "Pages that have been deleted and restored?"

How to get random quotes from en.wikiquote.org

2
151.238.177.184 (talkcontribs)

How can I extract specific number of random quotes from en.wikiquote.org? Also The final list should contain the name of the person that the quote belongs to.

BDavis (WMF) (talkcontribs)

The Wiki Replica databases contain metadata about pages and edits, but they do not contain the content of articles. The Action API can be used to fetch page content. action=query's list=random can probably be used to do what you are interested in. Try it out in en.wikiquote's API Sandbox.

Reply to "How to get random quotes from en.wikiquote.org"

How do you find the original uploader of a file?

4
Jarekt (talkcontribs)
use commonswiki_p;
SELECT page_title, actor_name
FROM page
JOIN image ON img_name = page_title 
join actor on img_actor=actor_id # get the last uploader
WHERE page_namespace = 6
AND   page_id = 15730

lists the username of the last uploader of File:Mta_station_wall.jpg. How do you look up the name of the original uploader?

Tohaomg (talkcontribs)

Join it with 'revision' table and add the condition 'rev_parent_id = 0' which means that it is a page creation (no previous version). Then find the original uploader using 'rev_actor' field

Edgars2007 (talkcontribs)

Seems to be that only revision table has that information:

use commonswiki_p;
SELECT *
from revision
where rev_page=15730 and rev_parent_id=0
Edgars2007 (talkcontribs)

(that kind of was "edit conflict")

Reply to "How do you find the original uploader of a file?"

how do you find the author of a file?

4
Jarekt (talkcontribs)

Lately I was adding SDC statements P7482=Q66458942 to files that have {{Own}} template on Commons, using https://quarry.wmflabs.org/query/40759 query. User:Schlurcher pointed out cases where people are incorrectly using {{Own}} template for files where person listed as author is not the uploader. I would like to exclude such cases in the future query and find files where I added P7482, but where uploader is not the same as the author listed in {{Information}} template.

https://quarry.wmflabs.org/query/40838 query can find my edits from this month to files and can find the name of the original uploader. Anybody knows how to find the name listed as "author"? Extension:Media Viewer displays "author" so I assume it is stored somewhere. Any ideas?

Edgars2007 (talkcontribs)

You will have to use API (or parsing of desciption pages). Database doesn't have that info. Also Mediaviewer gets author with API: today's POTD example

Jarekt (talkcontribs)

Edgars2007, Thank you for reply. API is not a very useful solution as I was hoping to write SQL query that compares the uploader to the author (listed in {{information}} template). Do SQL queries have access to the wikitext of the files? I could just get the author through regular expressions.

Edgars2007 (talkcontribs)

Yes, I understand that it's not useful, creates a headache etc.  :)

And no, database doesn't have the wikitext of pages.

Reply to "how do you find the author of a file?"
This post was hidden by Achim55 (history)
Reply to "Interior renovations"