Talk:Wikidata Query Service/User Manual/MWAPI

MWAPI source code or configuration

2
Zache (talkcontribs)

How is MWAPI technically implemented? I.e., is it a Blazegraph extension, or is there external code on GitHub or elsewhere?

Abbe98 (talkcontribs)

EntitySearch returns empty result

3
77.183.100.132 (talkcontribs)

Hi,

I tried to query Wikidata with EntitySearch and got no results. A few weeks ago everything was working.

Also the first example in this article does not return any result.

Has anything changed or is this a temporary issue?

2607:FEA8:91E0:1170:C0A9:6CF7:3B40:995D (talkcontribs)

Same here: EntitySearch returns empty results, even when one of the examples from the article is used.

Kdutia (talkcontribs)

using MWAPI with wcqs-beta

Jarekt (talkcontribs)

Hi, now that https://wcqs-beta.wmflabs.org is up and running, I have been experimenting with how to combine SDC SPARQL queries with information stored in the SQL database, such as category membership, presence of specific templates, etc. I could not find any way to do this other than the wikibase:mwapi service. I tried

SELECT  ?file ?wd ?fileStr {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmtype "page" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam mwapi:gcmsort "timestamp" .
     ?pageid wikibase:apiOutputItem mwapi:pageid.
     ?ns     wikibase:apiOutput "@ns".
  }
  #?file schema:contentUrl ?url .
  FILTER (?ns = "6") # files only
  BIND (replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M')  as ?fileStr)
  BIND (str(?file)  as ?fileStr)
  ?file wdt:P6243 ?wd .
}

Try it!

but so far I have not managed to get it to work. I was thinking that since

SELECT  ?file ?wd ?fileStr {
  BIND (str(?file)  as ?fileStr)
  ?file wdt:P6243 ?wd .
} limit 10

Try it!

and

SELECT  ?fileStr {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmtype "page" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam mwapi:gcmsort "timestamp" .
     ?pageid wikibase:apiOutputItem mwapi:pageid.
     ?ns     wikibase:apiOutput "@ns".
  }
  #?file schema:contentUrl ?url .
  FILTER (?ns = "6") # files only
  BIND (replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M')  as ?fileStr)
} limit 10

Try it!

both produce ?fileStr values like "https://commons.wikimedia.org/entity/M9094174", I could combine them in order to query SDC statements within a category. Any idea how to get this to work?

Zache (talkcontribs)

I think that just converting ?fileStr to a URI should make it a proper M-item for SDC. However, my example query below is pretty slow, so I think it may need to be split into two (like here).

SELECT  ?file ?p6243 {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmtype "page" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam mwapi:gcmsort "timestamp" .
     ?pageid wikibase:apiOutputItem mwapi:pageid.
     ?ns     wikibase:apiOutput "@ns".
  }
  #?file schema:contentUrl ?url .
  FILTER (?ns = "6") # files only
  BIND (URI(replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M'))  as ?file)
  ?file wdt:P6243 ?p6243 
} limit 10

Try it!
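
A note on the conversion in the query above: the BIND builds the MediaInfo entity IRI by putting "M" in front of the Commons page ID. If the category listing is fetched outside SPARQL (one possible way to split the slow query in two), the same mapping can be sketched in Python. The helper names here are hypothetical; the IRI pattern is the one quoted earlier in this thread.

```python
# Hypothetical helpers mirroring the BIND(URI(replace(...))) step of the query.
ENTITY_PREFIX = "https://commons.wikimedia.org/entity/M"


def mediainfo_iri(pageid: int) -> str:
    """Return the MediaInfo entity IRI for a Commons page ID."""
    return f"{ENTITY_PREFIX}{int(pageid)}"


def pageid_from_iri(iri: str) -> int:
    """Inverse mapping: recover the page ID from a MediaInfo entity IRI."""
    if not iri.startswith(ENTITY_PREFIX):
        raise ValueError(f"not a Commons MediaInfo IRI: {iri!r}")
    return int(iri[len(ENTITY_PREFIX):])


# Page ID taken from the example IRI mentioned in the thread.
example = mediainfo_iri(9094174)
print(example)
```

The resulting IRIs could then be wrapped in angle brackets and used as ?file values in a second query against the SDC endpoint.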

Jarekt (talkcontribs)
Reply to "using MWAPI with wcqs-beta"

How can I read the content of a revision (wikitext) using MWAPI?

3
Summary by Zache

Solved; the correct answer has been incorporated into the opening message.

Zache (talkcontribs)

Hi, I tried to fetch a revision like this, but I could not figure out how to access the actual content, which should be under the key "*". Do you know how I should do that?

SOLVED: The example is now fixed based on the answer below.

SELECT * WHERE {
  BIND(wd:Q42 AS ?item)
  ?item wdt:P18 ?image.
  BIND(STRAFTER(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/") AS ?fileTitle)

  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
                    wikibase:api "Generator";
                    wikibase:limit "once";
                    mwapi:generator "allpages";
                    mwapi:gapfrom ?fileTitle;
                    mwapi:gapnamespace 6; # NS_FILE
                    mwapi:gaplimit 1;
                    mwapi:prop "revisions";
                    mwapi:rvprop "content".
    ?contentmodel wikibase:apiOutput 'revisions/rev/@contentmodel'.
    ?contentformat wikibase:apiOutput 'revisions/rev/@contentformat'.
    ?content wikibase:apiOutput 'revisions/rev/text()' .
  }
}

Try it!

Dipsacus fullonum (talkcontribs)

There is no key "*". MWAPI requests output in XML format from the API and uses the XPath query language to find the wanted elements in the XML output. The XML has the content as the text of a "rev" element whose parent element is "revisions", so you have to add the triple


  ?content wikibase:apiOutput 'revisions/rev/text()' .


to the "SERVICE wikibase:mwapi" call in your SPARQL query.
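
To illustrate how such a path lines up with the XML, here is a small Python sketch. The XML is a hand-written, heavily shortened stand-in for an API response (an assumption, not real output), and the paths appear to be evaluated relative to each result item, here a "page" element. Python's ElementTree supports only a subset of XPath, so `text()` and `@contentmodel` are read through `.text` and `.get()` instead.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like a shortened API XML response (assumption).
SAMPLE = """<api><query><pages>
  <page pageid="1" ns="6" title="File:Example.jpg">
    <revisions>
      <rev contentmodel="wikitext" contentformat="text/x-wiki">== Summary ==</rev>
    </revisions>
  </page>
</pages></query></api>"""

root = ET.fromstring(SAMPLE)
page = root.find("query/pages/page")        # one result item
rev = page.find("revisions/rev")            # same path prefix as in the triple

content = rev.text                          # counterpart of revisions/rev/text()
contentmodel = rev.get("contentmodel")      # counterpart of revisions/rev/@contentmodel
contentformat = rev.get("contentformat")    # counterpart of revisions/rev/@contentformat
print(contentmodel, contentformat, content)
```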

Zache (talkcontribs)

It worked! Thank you very much.

recursive category members?

SCIdude (talkcontribs)

Is there an example of how to get the WD items of all members of a WP category recursively? Other tools maybe?

Zache (talkcontribs)

You can do it with PetScan if you just need a list of Wikidata IDs:

1.) Select the target categories and wiki in the "Categories" tab.
2.) Set the "use wiki" value to "Wikidata" in the "Other sources" tab so that it fetches the Wikidata IDs.
3.) Select the preferred format in the "Output" tab.

Example query - https://petscan.wmflabs.org/?psid=17439495

Reply to "recursive category members?"

read imageinfo metadata?

11
Jura1 (talkcontribs)

Can this be done? e.g. with

      bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam mwapi:generator "imageinfo" .
      bd:serviceParam mwapi:gcmprop "metadata" .      
      bd:serviceParam mwapi:gcmtitle "File:Iphone 3GS grass.jpg" .
Smalyshev (WMF) (talkcontribs)

If the API returns it as a generator, then the MWAPI service should support it.

Jura1 (talkcontribs)

Does that mean it already does (if queried correctly)

or it should do so in the future (once developed)?

Smalyshev (WMF) (talkcontribs)

Check out https://w.wiki/3p7 - is this something you've been looking for?


Smalyshev (WMF) (talkcontribs)

Unfortunately, it looks like it's a bit tricky to extract the metadata itself, as it returns multiple values and the current MWAPI syntax allows only a single value per row (since SPARQL doesn't have arrays or any other structures).

Jura1 (talkcontribs)
Jura1 (talkcontribs)
Smalyshev (WMF) (talkcontribs)

Try this one: https://w.wiki/3sq

Smalyshev (WMF) (talkcontribs)
Jura1 (talkcontribs)

It seems to work, at least to get the two fields.

I was trying to get just 1-5 files per category, but that part didn't quite work out.

Jura1 (talkcontribs)

Is there a way to get only 1-5 results for each category from the categorymembers-generator?


Document more Input params

Vladimir Alexiev (talkcontribs)

This page and Wikidata Query Service/User Manual#MediaWiki API give examples using the following params:

gsrsearch, gsrlimit, gcmprop, gcmlimit

Where are they documented?

The page says "It is permissible to add input parameters not specified in the configuration, they will be passed to the service query. Please refer to the API documentation for the lists of parameters each service has". I searched in API:Query#Generators and can't find them there.

It would be very useful for SPARQL devs to have a full list of params listed on this page, maybe with links to their definitions in the MW API page.

Smalyshev (WMF) (talkcontribs)

These are the same parameters you put in the actual API request, e.g. when using the API sandbox. There's no full list of parameters, because each API has its own parameters and those can be anything. So what I would suggest is to use an API tool such as the API sandbox first to assemble the API call and ensure it works properly, and then copy the parameter names and values from there to the MWAPI call in WDQS.
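
The copy step can be sketched in Python. One relevant MediaWiki API convention: when a list module is used as a generator, its parameters take a "g" prefix, which is where names like gcmtitle (categorymembers) and gsrsearch (search) come from. The function name is hypothetical, and skipping "action" and "format" is an assumption that the service sets those itself; wikibase:api and wikibase:endpoint still have to be written by hand.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumption: these two are supplied by the service, not by the query author.
SKIPPED = {"action", "format"}


def sandbox_url_to_service_params(url: str) -> list:
    """Turn the query string of an API request URL into bd:serviceParam lines."""
    lines = []
    for name, value in parse_qsl(urlsplit(url).query):
        if name in SKIPPED:
            continue
        # Escape backslashes and double quotes for a SPARQL string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'bd:serviceParam mwapi:{name} "{escaped}" .')
    return lines


# Illustrative request URL with a placeholder category name.
url = ("https://commons.wikimedia.org/w/api.php?action=query&format=xml"
       "&generator=categorymembers&gcmtitle=Category%3AExample&gcmlimit=max")
service_lines = sandbox_url_to_service_params(url)
print("\n".join(service_lines))
```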

Vladimir Alexiev (talkcontribs)

It would be very useful if you could illustrate how to find this information in the API sandbox. E.g., I wanted to see the params for "Generator", but the sandbox field "action" offers no such choice. API:Query#Generators doesn't mention "gsrsearch".

Please make it easier for folk who know SPARQL but not MWAPI to use this extension. Thanks in advance!

Reply to "Document more Input params"
Papuass (talkcontribs)

3 of 4 examples timed out for me, the fourth returned 0 rows. Is that expected?

Smalyshev (WMF) (talkcontribs)

No, that's not what is supposed to happen. I'll check. I think some queries need limits now that MWAPI supports continuation; that may be the reason for some of the timeouts.

Smalyshev (WMF) (talkcontribs)

Examples now all work for me.

Increase maximum number of elements returned

6
Jarekt (talkcontribs)

As it was discussed earlier on Wikidata, the query

SELECT (IRI(concat("https://commons.wikimedia.org/wiki/", ?creatorTemplate)) as ?creatorLink) ?creatorName ?categoryName ?commonsCatItem ?commonsCatItemLabel {
  SERVICE wikibase:mwapi { # list of all creator templates without Wikidata link
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:gcmtitle "Category:Creator templates without Wikidata link" .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmtype "page" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam mwapi:gcmsort "timestamp" .
     bd:serviceParam mwapi:gcmdir "descending" .
     ?creatorTemplate wikibase:apiOutput mwapi:title  .
  }
  hint:Prior hint:runFirst 1 . 
  SERVICE wikibase:mwapi { # get home category
     bd:serviceParam wikibase:api "Categories" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:titles ?creatorTemplate .
     bd:serviceParam mwapi:clshow "!hidden" .
     ?category wikibase:apiOutput mwapi:category  .
  }
  BIND(substr(?creatorTemplate,9) as ?creatorName ) .
  BIND(substr(?category,10)       as ?categoryName) .
  OPTIONAL { 
    ?commonsCatItem wdt:P373 ?categoryName . # category is linked from Wikidata
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  }
  FILTER ( BOUND(?commonsCatItem) ) .
  FILTER ( ?commonsCatItem!=wd:Q24731821 ) .
}

Try it!

does not return all the results because the generator does not show all the pages from c:Category:Creator templates without Wikidata link. Can we fix it somehow? At the moment I get different results if I remove gcmsort or gcmdir. I would like to catch all the pages in that category whose home category is linked from Wikidata through P373, so I can add a link to Wikidata. The number of pages currently returned is 500; I would need it to be at least 1300 for the query to work.

Smalyshev (WMF) (talkcontribs)

I think the largest number of results the API can return per request is 500; this is the API limit. There's a limit of 5000 for bots, but I assume this requires login. Unfortunately, the service does not support continuations yet (doing it in a generic way, especially with generators, is kinda complex). I'll add it to the todo list. In the meantime, I assume the best workaround would be to fetch the list directly in the application and then issue a series of queries using a VALUES clause.
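
Fetching the full list in the application means following the API's own continuation: each JSON response that has more data carries a "continue" object whose entries are merged into the next request. A minimal sketch in Python, assuming formatversion=2 (so "pages" is a list) and an injected `fetch` function; `fake_fetch` returns canned data in place of a real HTTP call, and all helper names are hypothetical.

```python
from typing import Callable, Dict, Iterator


def iterate_pages(fetch: Callable[[Dict], Dict], params: Dict) -> Iterator[Dict]:
    """Yield pages from every batch, following MediaWiki 'continue' tokens."""
    request = dict(params)
    while True:
        response = fetch(request)
        yield from response.get("query", {}).get("pages", [])
        if "continue" not in response:
            return
        # Merge the continuation values into the original parameters.
        request = {**params, **response["continue"]}


def fake_fetch(request: Dict) -> Dict:
    """Stand-in for an HTTP call; returns two canned batches."""
    if "gcmcontinue" not in request:
        return {
            "continue": {"gcmcontinue": "page|2", "continue": "gcmcontinue||"},
            "query": {"pages": [{"title": "Creator:A"}]},
        }
    return {"query": {"pages": [{"title": "Creator:B"}]}}


base_params = {
    "action": "query", "format": "json", "formatversion": "2",
    "generator": "categorymembers",
    "gcmtitle": "Category:Creator templates without Wikidata link",
    "gcmlimit": "max",
}
titles = [page["title"] for page in iterate_pages(fake_fetch, base_params)]
print(titles)
```

In real use, `fetch` would issue the request against the wiki's api.php endpoint and return the decoded JSON.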

Smalyshev (WMF) (talkcontribs)
Jarekt (talkcontribs)

I did not try it in this context, but with other queries I tried there seems to be a limit on the size of the query text (number of characters?) that prevents long lists in a VALUES clause. For example, when I have an externally generated list of Q-codes and want to look up some property for them, I have to do it in batches of fewer than 200 Q-codes for the query to finish. I doubt I would be able to build a query with 1300 page names.

Smalyshev (WMF) (talkcontribs)

You don't have to put everything in one query. You can run several queries and combine the results.
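
A rough sketch of that batching in Python: split the list into chunks and build one VALUES query per chunk, then run them one by one and concatenate the rows. The batch size of 150 is an assumption picked to stay under the roughly 200 entries reported above, the category names are placeholders, and the helper names are hypothetical.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sparql_string(text: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def values_query(names: List[str]) -> str:
    """Build one query that looks up P373 for a batch of category names."""
    literals = " ".join(sparql_string(name) for name in names)
    return (
        "SELECT ?commonsCatItem ?categoryName WHERE {\n"
        f"  VALUES ?categoryName {{ {literals} }}\n"
        "  ?commonsCatItem wdt:P373 ?categoryName .\n"
        "}"
    )


# Placeholder input standing in for an externally fetched list of 1300 names.
names = [f"Example category {i}" for i in range(1300)]
queries = [values_query(batch) for batch in chunked(names, 150)]
print(len(queries), "queries")
```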

Smalyshev (WMF) (talkcontribs)

Continuations are now supported for most API calls.

Preserve result order from MediaWiki API?

8
Summary by Smalyshev (WMF)

Implemented as: ?ordinal wikibase:apiOrdinal true .

Eloquence (talkcontribs)

When using wbsearchentities, like so:

https://www.wikidata.org/w/api.php?search=las&language=en&uselang=en&format=jsonfm&limit=25&action=wbsearchentities

the API returns the results ordered by relevance. When obtaining these results via SPARQL and modifying them, the order is lost (example). This is understandable, but is there a way to preserve the original order, e.g., by transforming it into an ordinal for use by ORDER BY? If not, should the wbsearchentities API be modified to make it possible to obtain the score for each result?

The practical application here is to modify autocomplete results on-the-fly with a single query, which seems like a great use case for the MWAPI integration into the query service.

Smalyshev (WMF) (talkcontribs)

I'll look into it. Generally, SPARQL results are not ordered, but if they come from an ordered source (e.g. MWAPI), it might be possible to preserve the order. I'll check.

Adding the score should not be hard if it is present in the result's XML.

Eloquence (talkcontribs)

Thank you for taking a look! Unfortunately, the wbsearchentities API's XML output does not include a score that could be used for ordering. I think we'd either have to infer an ordinal from the sequence of results somehow, or perhaps optionally add the score to the output on the MediaWiki side.

Smalyshev (WMF) (talkcontribs)

Yes, the entity search API does not currently expose a score :( And extracting an ordinal number from XML seems non-trivial... I am not sure why the results appear out of order: the service delivers them in order, but somewhere inside Blazegraph the order is lost. I'll look into why that happens.

Smalyshev (WMF) (talkcontribs)

It looks like the order breaks only when a join (?item wdt:P31 ?instance) is applied... If you just call the service, the order is preserved. That makes sense, since joins are parallelized and do not guarantee preserving order. It may then be possible to create a simulated variable that returns an ordinal for each result, like "?position wikibase:apiOutput mwapi:ordinal" or something like that. That would probably allow re-sorting them after joins.

Eloquence (talkcontribs)

Something like that would be excellent, yes, and might help with other queries as well :)
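
With the ordinal exposed as described in the summary of this topic (?ordinal wikibase:apiOrdinal true .), the original order can be restored in the query itself with ORDER BY ASC(?ordinal). If re-sorting happens on the client instead, the values should be compared as numbers. A small Python sketch, assuming rows in the standard SPARQL JSON results shape; the item IDs are placeholders and the function name is hypothetical.

```python
from typing import Dict, List


def sort_by_ordinal(bindings: List[Dict], var: str = "ordinal") -> List[Dict]:
    """Return result rows sorted by the numeric value of the ordinal variable."""
    return sorted(bindings, key=lambda row: int(row[var]["value"]))


# Rows as they might arrive after a join, i.e. no longer in API order.
rows = [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q3"},
     "ordinal": {"type": "literal", "value": "10"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"},
     "ordinal": {"type": "literal", "value": "2"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"},
     "ordinal": {"type": "literal", "value": "1"}},
]

ordered_ids = [row["item"]["value"].rsplit("/", 1)[-1] for row in sort_by_ordinal(rows)]
print(ordered_ids)
```

Sorting the raw strings would place "10" before "2", which is why the sketch converts each value with int() first.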

Smalyshev (WMF) (talkcontribs)
Smalyshev (WMF) (talkcontribs)