Talk:Quarry

Jump to navigation Jump to search

About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. Discussion area for discussion about Quarry itself and help about individual queries.

How to get ORES article quality predictions for an article?

1
Joe Roe (talkcontribs)

I'm having trouble understand how ORES predictions are represented in the ores_classification table.

For example, using the ORES API to get "articlequality" (57) classifications for enwiki revision 982883941 (https://ores.wikimedia.org/scores/enwiki/?models=articlequality&revids=982883941) returns six scores and a prediction.

But running what should be an identical query (as far as I understand) using quarry returns only one row, with none of the values seeming to match up: https://quarry.wmflabs.org/query/49243.

Can anyone help? What I would ultimately like to do is get the predicted article quality for a set of revision IDs.

Reply to "How to get ORES article quality predictions for an article?"
Summary by 1234qwer1234qwer4

SQL functions are not allowed on Quarry.

1234qwer1234qwer4 (talkcontribs)

My query at https://quarry.wmflabs.org/query/48686 keeps failing with Access denied for user 's52788'@'%' to database 'commonswiki_p'. I'm new to SQL, so I might have an error in my code, but two other queries to the same database did work for me.

Edit:

Matěj Suchánek (talkcontribs)

I am pretty sure SQL functions are not allowed.

1234qwer1234qwer4 (talkcontribs)

Hm, that may be the issue. Do you have any suggestions how to make the query without declaring a function?

Matěj Suchánek (talkcontribs)

Something like:

USE commonswiki_p;
SELECT page_title, COUNT(cl_to) AS panoramiocatcount
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE page_namespace = 6
AND cl_to LIKE 'Photos\_from\_Panoramio\_ID\_%'
GROUP BY page_id
HAVING panoramiocatcount > 1;
1234qwer1234qwer4 (talkcontribs)

Thanks (it works)!

execute command denied for DATE_TRUNC?

2
Ospachi (talkcontribs)

why is this error showing up? execute command denied to user 's52788'@'%' for routine 'enwiki_p.DATE_TRUNC'


I am trying to get a query of monthly active contributors.

BDavis (WMF) (talkcontribs)

DATE_TRUNC is a standard function in Postgres, but it is not available in MySQL. You might want to try a web search for "mysql DATE_TRUNC" to see some possible alternatives.

Reply to "execute command denied for DATE_TRUNC?"

Articles written by a single editor

3
Syced (talkcontribs)

I want to find all enwiki articles whose revisions have all been written by a unique editor (excluding editors with case-insensitive "bot" in their name).

How would you write such a query?

I get Unknown column 'rev_user' in 'where clause' when I try USE enwiki_p; SELECT rev_user FROM revision LIMIT 1

Achim55 (talkcontribs)
Syced (talkcontribs)

Wonderful, thanks a lot :-)

Looking for info about table links

7
Ineuw (talkcontribs)

Is info available on the database table links to other tables? I saw the new schema (very nice) here on MW, and studied the field names but couldn't figure out the links as I have been able. Was there a redesign of the table links in the past couple of years?

Matěj Suchánek (talkcontribs)

What do you mean with "links"? On Quarry, you can use SHOW TABLES; to display all available tables and DESCRIBE some_table; to see the list of columns.

Ineuw (talkcontribs)

You are right. I was used to the old schema where the related fields were graphically linked.

P.S: I found what I was looking for but it was for version 1.20

Matěj Suchánek (talkcontribs)

While I'm more used to the "command" interface, there is also "Database tables" tab on the top the pages on Quarry which provides a link to the page with "Explore database schema" and also to https://sql-optimizer.toolforge.org/ which includes "Schema browser" feature.

Ineuw (talkcontribs)

Is there anyone here can tell me which table to use to link page and user tables? Thanks in advance.

If it's not known, perhaps someone can point me to a list of users' published queries, to study them?

Matěj Suchánek (talkcontribs)

The trick is to normalize spaces/underscores.

Either:

SELECT * FROM page JOIN user ON page_namespace = 2 AND user_name = REPLACE(page_title, '_', ' ')

or

SELECT * FROM user JOIN page ON page_namespace = 2 AND page_title = REPLACE(user_name, ' ', '_')
Ineuw (talkcontribs)

I am just late, but did not forget. Thanks for the great help.

Articles written by a single editor: How to improve query speed?

3
Syced (talkcontribs)

In my quest to find all articles that have only been edited by one human editor, I wrote this query:

USE enwiki_p;
SELECT page_title FROM (
  SELECT p.page_title, r.rev_actor, a.actor_name
  FROM (
    SELECT page_title, page_len, page_id
    FROM page
    WHERE page_namespace = 0 # Mainspace
      AND NOT page_is_redirect
  ) AS p # All mainspace pages
  LEFT JOIN revision_userindex r ON r.rev_page = p.page_id
  LEFT JOIN actor a ON r.rev_actor = a.actor_id
  WHERE NOT IS_IPV4(a.actor_name) # Ignore IP editors and bots
    AND NOT IS_IPV6(a.actor_name)
    AND LOWER(a.actor_name) NOT LIKE '%bot%'
    AND LOWER(a.actor_name) NOT LIKE '%script%'
) AS pra
GROUP BY (page_title)
HAVING COUNT(rev_actor) < 2 # Only 1 editor

Problem: It times out.
Question: How to make it run faster?

For instance, to improve speed I have thought about skipping pages that have more than 50 revisions, but I am not sure how to implement it.

Saeidpourbabak (talkcontribs)
Syced (talkcontribs)

Thanks, your is faster indeed :-)

Reply to "Articles written by a single editor: How to improve query speed?"

Something wrong again with Excel XLSX download format

3
Jarekt (talkcontribs)

A while ago there was an issue with Excel XLSX download format, which was fixed by cleaning some temp files from somewhere. It seems to be happening again and an attempt to open generated spreadsheets gives invalid file format or extension error.

Framawiki (talkcontribs)

Hello Jarekt, can you confirm the problem was solved in the meantime ? Thanks!

Jarekt (talkcontribs)

yes it is solved

Reply to "Something wrong again with Excel XLSX download format"

Case-insensitive search of usernames

3
AlanM1 (talkcontribs)

1. I did select user_name from user where user_name like '%chandandolui%'; and got two results: Iamchandandolui and Mr.chandandolui.

2. I then did select user_name from user where UPPER(user_name) like '%CHANDANDOLUI%'; which ran for about a minute and returned no rows. Shouldn't I get at least the same two rows? I know of at least two more (Chandandolui and Chandandolui1) that should match the case-insens search.

3. If I run select user_name from user where UPPER(user_name) like '%chandandolui%';, which shouldn't match anything I think, it returns the same two rows as the first case.

I'm a little rusty but what am I missing? —[AlanM1(talk)]— 08:35, 5 September 2020 (UTC)

Matěj Suchánek (talkcontribs)

user_name is varbinary. According to , something like UPPER(CONVERT(user_name USING utf8mb4)) could do the trick.

AlanM1 (talkcontribs)

Thanks! The varbinary type didn't occur to me, but I can guess at the reason. en:WP:VPT came up with using RLIKE '(?i)chandandolui', which seem like an easier solution for me to remember (regex comparisons were always on my SQL wishlist when I was working with it :) ). —[AlanM1(talk)]— 11:26, 5 September 2020 (UTC)

Reply to "Case-insensitive search of usernames"

Which tables are publicly accessible?

3
Suriname0 (talkcontribs)

Hi, I really appreciate the great database schema, but it seems to me that some of the tables are not public. For example, it looks like watchlist information is not public, I assume to prevent vandalism. Is there a version of the schema that includes info about what tables are public-facing, or am I confused about the tables in some way? Thank you! Otherwise, looks like I can use any table returned by SHOW tables;

BDavis (WMF) (talkcontribs)

You should indeed be able to use anything that show tables; lists.

The best documentation of how the Wiki Replicas differ from the raw operational schema of the production wikis is found at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Replica_database_schema_(tables_and_indexes).

The watchlist table specifically is hidden from the Wiki Replica databases because details of who is watching which page are not public on the wikis. A logged in user can only see their own watchlist and anonymous users cannot see any watchlists at all.

There is an open ticket about restoring a derived table that once existed on the Wiki Replicas that would expose aggregate counts how many people are watching a given page. This information is currently available via the Action API using api.php?action=query&prop=info&inprop=watchers&titles=.... That ticket is currently stalled on the DBA and Cloud Services teams finding a reliable way to manage derived tables across the Wiki Replica fleet.

Suriname0 (talkcontribs)

Thank you, Bryan! Will be interesting to see if that derived table ever goes live. In addition to the technical challenges, I wonder about the risks of exposing low-count pages in namespace 0 to potential vandals.

Need help for a querry.

5
Summary by CreativeC

Achim55 helped me

CreativeC (talkcontribs)

Hi, Xaoflux gently made me a quarry a year ago (quarry:query/33757), but now I would like to find all french translation pages on Meta with a "·" in its content. And where could I find an index of the different databases replicas of commons, outreach, species, mv, fr.wikt... to fork the quary for different wikis ?

Thank you for the help

Achim55 (talkcontribs)
CreativeC (talkcontribs)

Thank you but how would I do such thing ?

Achim55 (talkcontribs)

Can you give an example of a page you want to be found? Did a simple search for intitle:/fr insource:/·/ fit your needs?

CreativeC (talkcontribs)

I didn't know the "insource" parameter in the research ! It perfectly resolve my problem ! Thank you !