Talk:Quarry

Jump to navigation Jump to search

About this board

Discussion area for discussion about Quarry itself and help about individual queries.

Invisigoth67 (talkcontribs)

Hi, I am looking for a substitute for the talk page tool by DrTrigon (not working since a couple of month).It should list the names of all talk pages of the corresponding Wikipedia articles (e.g. in dewiki) that belong to a specified article category (including a depth of subcategorys that can also be scepicified, e.g. 10) and where the talk pages have been changend in the last e.g. 14 days. Is this possible with a query?

Matěj Suchánek (talkcontribs)

Deep category search with SQL is very difficult unless the depth of the category is one or two.

Invisigoth67 (talkcontribs)

So if I use Petscan to give me a list of all sub categorys and the list is below the 65K query length limit and I paste it into the query (e.g. "...where category in ('American 3D films','Australian 3D films','Brazilian 3D films‎'...", will there be a way?

Matěj Suchánek (talkcontribs)

Probably yes:

USE enwiki_p;
SELECT rc_title FROM recentchanges JOIN categorylinks ON cl_from = rc_cur_id
WHERE rc_namespace = 1
AND cl_to IN ('American_3D_films', 'Australian_3D_films', 'Brazilian_3D_films‎', ...)
AND rc_timestamp >= DATE_ADD(NOW(), INTERVAL -14 DAY)
ORDER BY rc_timestamp;
Invisigoth67 (talkcontribs)

Thanks for the query, but I get null results where there should be some. Query 27673, dewiki, article "S-Bahn Wien" in category "Bahnstrecke in Wien" had talk page contributions one week ago.

Matěj Suchánek (talkcontribs)

I forgot to highlight (and wanted to do so) that you need to include underscores (_) instead of spaces in category names.

Invisigoth67 (talkcontribs)

I tried it both with underscores and spaces, but still no results, please see Query 27673.

Matěj Suchánek (talkcontribs)

Uh, I misunderstood your wish and assumed that the talk pages should be in the category, not the articles. This query should work:

USE dewiki_p;
SELECT rc_title, rc_timestamp FROM recentchanges
JOIN page ON rc_namespace = 1 AND rc_title = page_title AND page_namespace = 0
JOIN categorylinks ON cl_from = page_id
WHERE cl_to IN ('Bahnstrecke_in_Wien', 'Musiker_(Wien)')
AND rc_timestamp >= DATE_ADD(NOW(), INTERVAL -14 DAY)
ORDER BY rc_timestamp;
Invisigoth67 (talkcontribs)

Just tested the new query, this is exactly what I was looking for. Thank you very, very much!

Certes (talkcontribs)

https://quarry.wmflabs.org/query/24254 claims to be still running after several days. Should we do something to get rid of it, or was it actually killed after 30 minutes?

Also, please can anyone suggest how to improve the performance? It looks as if it should be efficient, but it doesn't seem to access the page table by the obvious index of namespace+title.

Zhuyifei1999 (talkcontribs)

The query could not store the results because of phab:T170464. The query might not be running slow.

Reply to "Query still running?"
Estopedist1 (talkcontribs)
Edgars2007 (talkcontribs)

INNER JOIN page doesn't have ON statement (the same with templatelinks).

Estopedist1 (talkcontribs)
Edgars2007 (talkcontribs)
use commonswiki_p;
select CONCAT ("# [[:File:",gil_to,"]] > [[:et:",gil_page_title,"]]")  
from globalimagelinks gil
INNER JOIN page 
ON gil_to=page_title
INNER JOIN templatelinks tl
ON tl_from = page_id
where tl_title = 'Uncategorized' AND gil_wiki = 'etwiki' 
limit 100
Estopedist1 (talkcontribs)

This code is identical to my example. This quarry doesn't work. Error:

You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ' ' at line 9
Edgars2007 (talkcontribs)
Estopedist1 (talkcontribs)

Yeah, maybe the space at the end was the problem. Now it is working. Big thanks to Latvia. By the way I live in Estonia (Igaunija) :)

Edgars2007 (talkcontribs)

Yes, I understood that from query. We Baltics have to stick together :)

XXN (talkcontribs)

Is it possible to find in Quarry pages in a wiki containing HTML comments? As table with page text is unreachable, I suppose this is not possible...

YuviPanda (talkcontribs)

Yes, there is no way to search through text with quarry, unfortunately.

Edgars2007 (talkcontribs)

You can do a wikisearch: insource:/\<!--/i Will find those pages, which has the opening tag.

XXN (talkcontribs)

is an alternative. Thank you.

Ineuw (talkcontribs)

I have a number of old unpublished, and most likely non-working practice SQL statements. Is it possible for me to delete them?

Achim55 (talkcontribs)

You can blank them.

Ineuw (talkcontribs)

Unfortunately, blanking them does nothing. On reopening the page, the SQL statements are still there

Achim55 (talkcontribs)

Did you click Submit Query after blanking?

Ineuw (talkcontribs)

Now, that makes sense! Thank you. Happy holidays.

Ineuw (talkcontribs)

Again, Returning to mention that I cleared the SQL and submitted the query as told, but the empty queries remained on the list.

Achim55 (talkcontribs)

You cannot delete done queries. One can only blank them for to "hide" the used code.

Ineuw (talkcontribs)

Can I request deletion of the empty queries? There is at least 25 practice queries from which which I removed the SQL statements.

YuviPanda (talkcontribs)

You can't really delete any queries - even if you blank them they stay in the database (and we might expose the history at some point)

Viswaprabha (talkcontribs)

A better way that I practice is to reuse those old / blank queries instead of creating new ones whenever required. This saves list space on one's history watch list.

Smartse (talkcontribs)

Can anyone suggest a way to improve the performance of https://quarry.wmflabs.org/query/24717 ?

It seems like a relatively simple query - listing the revision lengths and times of a specific article - but while it ran yesterday (very slowly), today it failed completely.

Zhuyifei1999 (talkcontribs)
Smartse (talkcontribs)

I don't know what that means but I can see how it's an improvement! It also helps if I use an article that I didn't delete yesterday!

Smartse (talkcontribs)

Sorry for being an SQL noob, but does this mean that in general it is faster if you use more 'and' statements? I can see how searching for the title in this case might be slow, so does setting "page_namespace=0" mean that the table is quickly shortened and then only that is searched for the title, rather than all 7 million rows? The simpleton in me thought that more statements would be slower.

Zhuyifei1999 (talkcontribs)

It depends on what you are searching on. In this case, the name_title UNIQUE INDEX means that there can be at most one row that match both the page title and namespace, and finishes the search once a single row is found. There are also special areas / code / storage optimized for searching when both namespace and title are specified.

Smartse (talkcontribs)

That makes more sense. Thanks for your help.

Recentchanges table getting cleared?

3
Abbe98 (talkcontribs)

I'm not able to get items from the recentchanges table that have a timestamp value prior to January 2018. Is this table getting cleared? How would I retrieve changes prior to January 2018?

Example COUNT query of recentchanges(last run in 2017): https://quarry.wmflabs.org/query/23105

Returned 136380 rows.

The same query that I ran today: https://quarry.wmflabs.org/query/25408

Returned 1720 rows

Thanks!

Edgars2007 (talkcontribs)
Abbe98 (talkcontribs)

Thanks! that explains the difference between "revisions" and "recent changes" ;-)

132.73.198.144 (talkcontribs)

Hey guys,

Is there a way to know the number of entries/rows of a table stright away without running a count(*) query?

Right now I'm trying to understand the size of templatelinks but could be relevant for any other table.

Thanks in advance!

Bsdla (talkcontribs)

Can anybody please help? any clue? I'm sure the table size is mentioned somewhere.

Thanks

Milimetric (WMF) (talkcontribs)

The database engine keeps track of estimated table sizes but you can't access those through Quarry because the information is in the system database. Giving access to the system database is not a great idea, so select count(*) is about the only way you can see the size of the table.

Halfak (WMF) (talkcontribs)

You can do something to get the count more quickly. If you run a query like this: https://quarry.wmflabs.org/query/23874

SELECT COUNT(*) FROM revision;

And then click on "Explain" while the query executes, you'll see that the query engine predicts that it will need to scan 716678936 rows -- which is a good approximation of how tall the table is.

Bsdla (talkcontribs)

Thank you both.

Halfak, I'll try that.

List of users with at least 10 edits

4
Jon Harald Søby (WMNO) (talkcontribs)

Hi! We're planning on running a survey on the Northern Sami Wikipedia, and I think the most efficient way to get a list of who to get in touch with would be to run a query on Quarry. But I don't know how to construct such a query (never used SQL…), so I'm wondering if someone could help me. What I want is basically just a list of all registered users who have made more than 10 edits ever. Can someone help me with that?

Edgars2007 (talkcontribs)
Edgars2007 (talkcontribs)

Note that user_editcount let's say isn't very precise, it could be some(?) edits off the real numbers. But for this purpose it probably will be fine.

Jon Harald Søby (WMNO) (talkcontribs)
IKhitron (talkcontribs)

Hi. Is there a reason that page_id is defined as varchar and not unsigned, so order by does not work? Thank you.

IKhitron (talkcontribs)

Or, maybe, quarry casts it. Because I added casts everywhere, and it still does not work.

Zhuyifei1999 (talkcontribs)

Example of a broken query is welcome.

IKhitron (talkcontribs)

Sure. See the first commented query in 3338. Thank you.

Zhuyifei1999 (talkcontribs)
IKhitron (talkcontribs)

I expected the minimum id1 to be 7, not 5724.

Zhuyifei1999 (talkcontribs)
MariaDB [ruwiki_p]>       select
    ->         page_title,
    ->         cast(page_id as unsigned) as id
    ->       from page
    ->       where page_namespace = 0
    ->         and not page_is_redirect
    ->       order by cast(page_id as unsigned) asc
    -> limit 10;
+---------------------------+----+
| page_title                | id |
+---------------------------+----+
| Литва                     |  7 |
| Россия                    |  9 |
| Слоновые                  | 10 |
| Мамонты                   | 11 |
| Красная_книга             | 15 |
| Соционика                 | 16 |
| Школа                     | 18 |
| Лингвистика               | 20 |
| Социология                | 21 |
| Киевская_Русь             | 27 |
+---------------------------+----+
10 rows in set (14.25 sec)

MariaDB [ruwiki_p]>     select 
    ->       page_title as article, 
    ->       cast(id as unsigned) as id0
    ->     from (
    ->       select
    ->         page_title,
    ->         cast(page_id as unsigned) as id
    ->       from page
    ->       where page_namespace = 0
    ->         and not page_is_redirect
    ->       order by cast(page_id as unsigned) asc
    ->     )f
    -> limit 10;
+------------------------------------------------------------------------------+---------+
| article                                                                      | id0     |
+------------------------------------------------------------------------------+---------+
| !                                                                            |  452225 |
| !!!                                                                          | 1301215 |
| !!_(значения)                                                                | 6062007 |
| !Action_Pact!                                                                |  892368 |
| !T.O.O.H.!                                                                   | 1340518 |
| !_(альбом)                                                                   | 6005448 |
| !ФЕСТ                                                                        | 3682212 |
| "Нурлы"                                                                      | 7419278 |
| $                                                                            | 3607625 |
| $pringfield_(Or,_How_I_Learned_to_Stop_Worrying_and_Love_Legalized_Gambling) | 3321107 |
+------------------------------------------------------------------------------+---------+
10 rows in set (0.00 sec)

Don't expect the ordering in a subquery (especially without limit) to affect the parent query.

IKhitron (talkcontribs)

You mean it's a bug in MySql?

Zhuyifei1999 (talkcontribs)

Whether it's a bug or intended behavior I cannot say, but yes that is the bahavior of MariaDB.

IKhitron (talkcontribs)

Weird. By the way, I succeeded to do this, still using subqueries. Thank you.

Reply to "Varchar"