Talk:Quarry


About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. This is the discussion area for Quarry itself and for help with individual queries.

Wikidata queries

Jarekt (talkcontribs)

Some Wikidata queries cannot be expressed in SPARQL; for example, I would like to query the creation time of a specific property on a specific item (let's say P31 of Q1). I could use help with the query itself and with documentation of the Wikidata SQL schema. Also, is there an easy way to search Quarry for Wikidata queries?
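One possible approach on the wikidatawiki replica, sketched here only as an illustration, is to scan the item's revision history for the edit summary that created the claim. The table and column names below assume wikidatawiki_p, the MediaWiki comment store (rev_comment_id joined to the comment table) being exposed on the replica, and the usual wbsetclaim-create edit-summary pattern; none of this comes from the thread itself.

-- Hedged sketch: revisions of Q1 whose edit summary looks like the creation of a P31 claim.
SELECT rev_id, rev_timestamp, comment_text
FROM revision
JOIN comment ON rev_comment_id = comment_id
JOIN page ON rev_page = page_id
WHERE page_namespace = 0
  AND page_title = 'Q1'
  AND comment_text LIKE '%wbsetclaim-create%'
  AND comment_text LIKE '%Property:P31]]%'
ORDER BY rev_timestamp
LIMIT 10;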

Uziel302 (talkcontribs)
Reply to "Wikidata queries"

Query results not updated

Justanothersgwikieditor (talkcontribs)

I use a few queries for wiki gnoming, so I usually just submit the same query again to get an updated list of results. For the past few days I have noticed that the results are not being updated at all: the query still shows the same results even though there should be fewer by now.

Is there a caching issue or something else?

Achim55 (talkcontribs)
Reply to "Query results not updated"

Request for deletion of bad queries

Ineuw (talkcontribs)
Framawiki (talkcontribs)

Hi, perhaps they were already fixed by another admin, but they are no longer shown as Running.

Note that a query shown as Running for a long time likely just means its status was not updated because of an error. The query owner can simply click the Run button again.

Ineuw (talkcontribs)

Thanks. That's what I thought, but it remained running, and when I copied the SQL statement into another query, the same thing happened.

Reply to "Request bad queries deletion"

Database schema does not match database fields

Summary by BDavis (WMF)

See Manual:Database layout and Manual:Database layout/diagram for MediaWiki core schema documentation.

Ineuw (talkcontribs)
BDavis (WMF) (talkcontribs)
Ineuw (talkcontribs)

Thanks for the info; it's much appreciated. My interest is limited to the English Wikisource, and I am flying blind trying to extract data. Perhaps there is a current list of the table structures used by Wikisource?
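If it helps, here is a minimal sketch for exploring the schema yourself. It assumes the English Wikisource replica is named enwikisource_p and that your Quarry version accepts a USE statement (newer versions use a database field in the form instead); extension tables, if any, also show up in the listing.

-- List the tables available in the English Wikisource replica.
USE enwikisource_p;
SHOW TABLES;

-- Inspect the columns of a particular table, e.g. page:
DESCRIBE page;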

BDavis (WMF) (talkcontribs)
Ineuw (talkcontribs)

Does the pagelinks table not get purged of links that no longer exist?

MSUGRA (talkcontribs)
Matěj Suchánek (talkcontribs)

The redirect page contains some navboxes; the links seem to come from those...
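A hedged way to double-check that, assuming enwiki_p and the older pagelinks layout with pl_namespace and pl_title columns; the page title below is just a placeholder:

-- List the outgoing links recorded for a given redirect page, to see whether
-- navbox transclusions explain the extra rows. 'Example_redirect' is a placeholder.
SELECT pl_namespace, pl_title
FROM pagelinks
JOIN page ON pl_from = page_id
WHERE page_namespace = 0
  AND page_title = 'Example_redirect';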

MSUGRA (talkcontribs)

Ah right, I missed those. Thanks for clearing that up!

MSUGRA (talkcontribs)

On a related note: the redirect table is said to track only redirects that have been changed since 2007. Without deeper knowledge of the subject, I wouldn't necessarily assume that old redirects on Wikipedia have been touched since then, which is why I opted to reconstruct them from the pagelinks + page tables instead. Someone on IRC said they think every page will have experienced an automated update of some sort since then, which would have caused the redirects to be tracked.


Do you know more about this? Can I rely on the redirect table tracking all (relevant) redirects the way the pagelinks table does?

Matěj Suchánek (talkcontribs)

This is news to me. But we can quickly check whether all redirects have an entry:

-- Redirect pages with no matching row in the redirect table
SELECT page.* FROM page
LEFT JOIN redirect ON rd_from = page_id
WHERE page_is_redirect = 1 AND rd_title IS NULL;
Reply to "Does the pagelinks table not get purged of links that no longer exist?"

Wikidata query returning unfiltered results

Summary by BDavis (WMF)

The <code>wb_terms</code> table has been deprecated. See d:Wikidata:PC#Important for tool maintainers: last steps of wb terms table and phab:T221767 for more details.

FShbib (talkcontribs)

Hi. I use this query to find pages that don't have an Arabic label at Wikidata, but it has started showing them all and no longer filters out items that already have labels. Any idea why, and how to fix it? Thanks in advance. @Edgars2007 :)

Matěj Suchánek (talkcontribs)
FShbib (talkcontribs)

Thank you @Matěj Suchánek. Is there a replacement for this query that can do the same task?

BDavis (WMF) (talkcontribs)

@FShbib phab:T221767 may have some ideas for you on how to adjust your queries. It would also be a reasonable place to ask the Wikidata folks for help if you don't find what you need.
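For what it's worth, the replacement for wb_terms is the normalized wbt_* term store. A hedged sketch of checking whether a single item has an Arabic label there might look like the following; the table and column names are taken from the new store's documentation and should be double-checked against the replica, and Q42 (numeric id 42) is only an example.

-- Sketch: the Arabic label of item Q42 from the wbt_* term store, assuming wikidatawiki_p;
-- an empty result means the item has no Arabic label.
SELECT wbx_text AS arabic_label
FROM wbt_item_terms
JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
JOIN wbt_type         ON wbtl_type_id = wby_id AND wby_name = 'label'
JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id AND wbxl_language = 'ar'
JOIN wbt_text         ON wbxl_text_id = wbx_id
WHERE wbit_item_id = 42;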

Query help needed to get page name, industry and founded.

2601:600:8780:66D0:A517:F6A7:D074:4C5D (talkcontribs)

Can someone please help me with a query that gets the name (page name), industry, and founding date from wiki pages for companies? It can be limited to the US if that improves performance.

Ex: https://en.wikipedia.org/wiki/Microsoft

Company Name/Page Name: Microsoft

Industry: Software development (can list all or one industry)

Founded: April 4, 1975 (year is fine)

I need this list for all US-based companies.

Matěj Suchánek (talkcontribs)

Data inside wiki pages cannot be queried with SQL (except for categories). For querying Wikidata, d:Wikidata:Request a query is the place to ask.
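To illustrate the category exception: the closest SQL-only approximation to the request is category membership. A hedged sketch, assuming enwiki_p, the categorylinks layout with cl_to, and an example category name that may not match what you actually need:

-- Articles in an example category of US software companies; the category name is a placeholder.
SELECT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE page_namespace = 0
  AND cl_to = 'Software_companies_of_the_United_States'
LIMIT 100;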

Padi2020 (talkcontribs)

Thanks a lot for quick response and appropriate link.

Землеройкин (talkcontribs)
Zhuyifei1999 (talkcontribs)

It's probably dead already. The status not being updated is a consequence of Quarry running out of memory while trying to store the results.

RhinosF1 (talkcontribs)
Milimetric (WMF) (talkcontribs)

Just to clarify, do you mean:

Let the list of people on that page be L. You want to find this set:

{user | user <- L and user has not made any edits on this wiki in 5 years}

What about other wikis? Also, what do you mean by "remove them", like remove them from the list?

To find users that haven't made any edits in one wiki you could do:

 select user_id
   from user
  where user_id in (... your list ...) 
    and user_id not in
        (select distinct rev_user
           from revision
          where rev_timestamp > '20140301000000'
        )
;

But that subquery would be impossibly slow on enwiki, so this is a heavy query. Not sure how to optimize it off the top of my head. Might just be better to make a temporary table somewhere out of the users that did edit in the past 5 years and query that.
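As one possible variant, sketched under the same assumptions as the query above (the pre-actor schema where revision.rev_user holds the user id, and the list left as a placeholder), the NOT IN subquery can be rewritten as a LEFT JOIN anti-join, which may or may not be faster on enwiki:

-- Users from the list with no revisions after the cutoff timestamp.
-- (... your list ...) stays a placeholder, as in the query above.
SELECT u.user_id, u.user_name
  FROM user u
  LEFT JOIN revision r
    ON r.rev_user = u.user_id
   AND r.rev_timestamp > '20140301000000'
 WHERE u.user_id IN (... your list ...)
   AND r.rev_id IS NULL;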

RhinosF1 (talkcontribs)

Editors in the last 5 years would also work. That query failed (https://quarry.wmflabs.org/query/34487). Just enwiki is what we need. We need to end up with a list of the users on that page who have (or have not) edited in the past 5 years.

Milimetric (WMF) (talkcontribs)

You substituted "https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Apple_Inc./Subscribe" where I have (... your list ...) in the query, but that's not how Quarry works. Quarry just executes SQL; it doesn't know how to read the wiki and parse the user ids out of it. You'd have to do that manually, or build a query to get a list of user_ids based on user names. You can also make a list of user names that looks like this:

('user name 1', 'user name 2', ...) and pass that in the query above as `user_name in ('user name 1', 'user name 2')`.
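A hedged sketch of that name-based variant, with placeholder names (user_name in the user table is stored with spaces, not underscores):

-- Look up user ids from a hand-made list of user names; the names are placeholders.
SELECT user_id, user_name
  FROM user
 WHERE user_name IN ('Example user 1', 'Example user 2');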

RhinosF1 (talkcontribs)

Sorry for the late reply. The query was done for us in the end via production.

Saeidpourbabak (talkcontribs)

Hi,

This simple query should return some pages on enwiki which have no fa interwiki. It works for articles but not for templates (the template results do have an fa interwiki). Can somebody help?
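For context, here is a sketch of the kind of query being discussed, assuming enwiki_p and that the fa links are materialized in langlinks; the exact filters are guesses, not the original query:

-- Template pages with no fa: interlanguage link recorded in langlinks.
SELECT page_title
FROM page
LEFT JOIN langlinks ON ll_from = page_id AND ll_lang = 'fa'
WHERE page_namespace = 10
  AND page_is_redirect = 0
  AND ll_lang IS NULL
LIMIT 100;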

Matěj Suchánek (talkcontribs)

I couldn't find an example in the results. Note that there may be a problem in the database, like a deferred update.

Saeidpourbabak (talkcontribs)
Matěj Suchánek (talkcontribs)

Looks like Wikidata doesn't do database updates properly. I did a null edit on that page and now the entry is in the database. Unfortunately, you will need to do this on every page if you want accurate results.

Saeidpourbabak (talkcontribs)

Thank you, but isn't there a more efficient and standard way to fix the problem? The query I referred to was just an example I made to show the issue. The query I actually use and had the problem with has more than 13k results, and performing a null edit on that many pages is impractical.

Matěj Suchánek (talkcontribs)

Well, to "fix the problem" you need to somehow trigger a new database update. And as a casual user, the only way (if you're lucky) is really performing a null edit. I will make a bug report on this shortly.

Matěj Suchánek (talkcontribs)