How is the MWAPI technically implemented? I.e., is it a Blazegraph extension, or is there some external code on GitHub, etc.?
Talk:Wikidata Query Service/User Manual/MWAPI
Reply to "MWAPI source code or configuration"
Reply to "EntitySearch returns empty result"
Reply to "using MWAPI with wcqs-beta"
Reply to "recursive category members?"
Reply to "read imageinfo metadata?"
Reply to "Document more Input params"
It's a Blazegraph extension that you can find here: https://github.com/wikimedia/wikidata-query-rdf/tree/master/blazegraph/src/main/java/org/wikidata/query/rdf/blazegraph/mwapi
Hi,
I tried to query Wikidata with EntitySearch and I get no results. A few weeks ago everything was working.
Also the first example in this article does not return any result.
Has anything changed or is this a temporary issue?
Same here, entity search returns empty results, even when one of the examples from the article is used.
The WDQS team have identified the issue and are working on it - see the task here: https://phabricator.wikimedia.org/T263952.
A better place to contact the development team is on Wikidata here - they keep track of this page: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search#Mwapi_service_not_working_for_the_last_couple_of_days_(September_30)
Hi, now that https://wcqs-beta.wmflabs.org is up and running, I was experimenting with how to combine SDC SPARQL queries with information stored in the SQL database, like category membership, presence of specific templates, etc. I could not find any way to do this other than the wikibase:mwapi service. I tried
SELECT ?file ?wd ?fileStr {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
bd:serviceParam mwapi:generator "categorymembers" .
bd:serviceParam mwapi:gcmtype "page" .
bd:serviceParam mwapi:gcmlimit "max" .
bd:serviceParam mwapi:gcmsort "timestamp" .
?pageid wikibase:apiOutputItem mwapi:pageid.
?ns wikibase:apiOutput "@ns".
}
#?file schema:contentUrl ?url .
FILTER (?ns = "6") # files only
BIND (replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M') as ?fileStr)
BIND (str(?file) as ?fileStr)
?file wdt:P6243 ?wd .
}
but so far I have not managed to get it to work. I was thinking that since
SELECT ?file ?wd ?fileStr {
BIND (str(?file) as ?fileStr)
?file wdt:P6243 ?wd .
} limit 10
and
SELECT ?fileStr {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
bd:serviceParam mwapi:generator "categorymembers" .
bd:serviceParam mwapi:gcmtype "page" .
bd:serviceParam mwapi:gcmlimit "max" .
bd:serviceParam mwapi:gcmsort "timestamp" .
?pageid wikibase:apiOutputItem mwapi:pageid.
?ns wikibase:apiOutput "@ns".
}
#?file schema:contentUrl ?url .
FILTER (?ns = "6") # files only
BIND (replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M') as ?fileStr)
} limit 10
both create ?fileStr values like "https://commons.wikimedia.org/entity/M9094174", I could combine them in order to query SDC statements within a category. Any idea how to get this to work?
I think that just converting the ?fileStr string to a URI should make it a proper M-item for SDC. However, my example query below is pretty slow, so I think it may need to be split into two (like here).
SELECT ?file ?p6243 {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
bd:serviceParam mwapi:generator "categorymembers" .
bd:serviceParam mwapi:gcmtype "page" .
bd:serviceParam mwapi:gcmlimit "max" .
bd:serviceParam mwapi:gcmsort "timestamp" .
?pageid wikibase:apiOutputItem mwapi:pageid.
?ns wikibase:apiOutput "@ns".
}
#?file schema:contentUrl ?url .
FILTER (?ns = "6") # files only
BIND (URI(replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M')) as ?file)
?file wdt:P6243 ?p6243
} limit 10
With help from other forums I managed to get the query to work. See c:Commons:SPARQL_query_service/queries/examples#Wikidata_items_of_files_in_Category:Artworks_with_structured_data_with_redirected_P6243_property .
Hi, I tried to fetch a revision like this, but I could not figure out how to access the actual content, which should be under the key "*". Do you know how I should do that?
SOLVED: Example is now fixed based on answer below
SELECT * WHERE {
BIND(wd:Q42 AS ?item)
?item wdt:P18 ?image.
BIND(STRAFTER(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/") AS ?fileTitle)
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
wikibase:api "Generator";
wikibase:limit "once";
mwapi:generator "allpages";
mwapi:gapfrom ?fileTitle;
mwapi:gapnamespace 6; # NS_FILE
mwapi:gaplimit 1;
mwapi:prop "revisions";
mwapi:rvprop "content".
?contentmodel wikibase:apiOutput 'revisions/rev/@contentmodel'.
?contentformat wikibase:apiOutput 'revisions/rev/@contentformat'.
?content wikibase:apiOutput 'revisions/rev/text()' .
}
}
There is no key "*". MWAPI requests output in XML format from the API and uses the XPath query language to find the wanted elements in that XML. In the XML, the content is the text of a "rev" element whose parent element is "revisions", so you have to add the triple
?content wikibase:apiOutput 'revisions/rev/text()' .
to the "SERVICE wikibase:mwapi" call in your SPARQL query.
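To make the path lookup in the reply above concrete, here is a minimal Python sketch of how such paths pick values out of the XML. The sample XML is a hypothetical, trimmed-down response, the `extract` helper is made up for illustration, and the sketch assumes the paths are evaluated relative to each `<page>` element; it is not the service's actual implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down API response; real responses carry more attributes.
SAMPLE_XML = """
<api>
  <query>
    <pages>
      <page pageid="1" ns="6" title="File:Example.jpg">
        <revisions>
          <rev contentmodel="wikitext" contentformat="text/x-wiki">== Summary ==</rev>
        </revisions>
      </page>
    </pages>
  </query>
</api>
"""


def extract(page, path):
    """Resolve the path shapes used in the thread, relative to one <page> element."""
    if path.endswith("/text()"):
        # 'a/b/text()' -> text content of element a/b
        node = page.find(path[: -len("/text()")])
        return None if node is None else node.text
    if "/@" in path:
        # 'a/b/@attr' -> attribute attr of element a/b
        element_path, attribute = path.rsplit("/@", 1)
        node = page.find(element_path)
        return None if node is None else node.get(attribute)
    if path.startswith("@"):
        # '@attr' -> attribute of the <page> element itself
        return page.get(path[1:])
    node = page.find(path)
    return None if node is None else node.text


root = ET.fromstring(SAMPLE_XML)
page = root.find("query/pages/page")

content = extract(page, "revisions/rev/text()")
contentmodel = extract(page, "revisions/rev/@contentmodel")
namespace = extract(page, "@ns")
print(content, contentmodel, namespace)
```

The point is only that there is no "*" key to address in the XML form: the wikitext is the text node of `rev`, while `contentmodel` and `contentformat` are attributes of the same element.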
It worked! Thank you very much.
Is there an example of how to get the WD items of all members of a WP category recursively? Other tools maybe?
You can do it with PetScan if you just need a list of Wikidata IDs:
1.) Select the target categories and wiki in the "Categories" tab
2.) Set the "use wiki" value to "Wikidata" in the "Other sources" tab so it will fetch the Wikidata IDs
3.) Select the preferred format in the "Output" tab
Example query - https://petscan.wmflabs.org/?psid=17439495
Can this be done? e.g. with
bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam mwapi:generator "imageinfo" .
bd:serviceParam mwapi:gcmprop "metadata" .
bd:serviceParam mwapi:gcmtitle "File:Iphone 3GS grass.jpg" .
If API returns it as generator, then MWAPI service should support it.
Does that mean it already does (if queried correctly), or that it should do so in the future (once developed)?
Check out https://w.wiki/3p7 - is this something you've been looking for?
Unfortunately, it looks like it's a bit tricky to extract the metadata itself, as it returns multiple values and the current MWAPI syntax allows only a single value per row (since SPARQL doesn't have arrays or any other structures).
I'm trying to fetch "model" (and "make") from . This is to populate d:Property:P2009/d:Property:P2010 from categories at d:Property:P2033.
I tried a couple of ways at https://w.wiki/3rR
None worked.
Try this one: https://w.wiki/3sq
You can see the structure here: https://commons.wikimedia.org/w/api.php?action=query&format=xml&prop=imageinfo&generator=allpages&iiprop=metadata&gapprefix=Iphone%203GS%20grass.jpg&gapnamespace=6 and write the XPath query based on it. The only condition is that the XPath should return a single node; the service currently cannot process multiple nodes in one variable.
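For anyone who wants to inspect the XML for other parameter combinations, a small Python sketch that rebuilds a preview URL of the same shape as the one above. The helper name `api_preview_url` is made up for illustration; it only builds the URL string and does not send any request.

```python
from urllib.parse import quote, urlencode


def api_preview_url(endpoint, params):
    """Build an action=query URL with XML output for viewing the raw structure."""
    query = {"action": "query", "format": "xml"}
    query.update(params)
    # quote_via=quote encodes spaces as %20 rather than '+'
    return "https://" + endpoint + "/w/api.php?" + urlencode(query, quote_via=quote)


url = api_preview_url(
    "commons.wikimedia.org",
    {
        "prop": "imageinfo",
        "generator": "allpages",
        "iiprop": "metadata",
        "gapprefix": "Iphone 3GS grass.jpg",
        "gapnamespace": 6,
    },
)
print(url)
```

Opening the printed URL in a browser shows the element and attribute names that an XPath expression in `wikibase:apiOutput` has to match.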
It seems to work, at least to get the two fields.
I was trying to get just 1-5 files per category, but that part didn't quite work out.
Is there a way to get only 1-5 results for each category from the categorymembers-generator?
This page and Wikidata Query Service/User Manual#MediaWiki API give examples using the following params:
gsrsearch, gsrlimit, gcmprop, gcmlimit
Where are they documented?
The page says "It is permissible to add input parameters not specified in the configuration, they will be passed to the service query. Please refer to the API documentation for the lists of parameters each service has". I searched in API:Query#Generators and can't find them there.
It would be very useful for SPARQL devs to have a full list of params listed on this page, maybe with links to their definitions in the MW API page.
These are the same parameters you would put in an actual API request, e.g. when using the API sandbox. There is no full list of parameters, because each API has its own parameters and those can be anything. So what I would suggest is to use an API tool, like the API sandbox, to assemble the API call first and make sure it works properly, and then copy the parameter names and values from there into the MWAPI call in WDQS.
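As a rough illustration of the copy step described above, here is a Python sketch that turns a set of sandbox parameters into `bd:serviceParam` lines. The helper names and the "Category:Example" title are made up for illustration. It relies on the MediaWiki API convention that a module's parameters get a "g" prefix when the module is used as a generator (so `srsearch` of list=search becomes `gsrsearch`, and `cmtitle` of list=categorymembers becomes `gcmtitle`); the parameter docs are therefore found under the module itself rather than under "Generator".

```python
def sparql_string(value):
    """Quote a value as a SPARQL string literal (escapes backslashes and double quotes)."""
    text = str(value)
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def mwapi_service_block(endpoint, api, params, outputs):
    """Render a SERVICE wikibase:mwapi block from sandbox-style parameters.

    params  maps API parameter names to values, exactly as used in the sandbox.
    outputs maps SPARQL variable names to XPath expressions.
    """
    lines = ["SERVICE wikibase:mwapi {"]
    lines.append("  bd:serviceParam wikibase:endpoint " + sparql_string(endpoint) + " .")
    lines.append("  bd:serviceParam wikibase:api " + sparql_string(api) + " .")
    for name, value in params.items():
        lines.append("  bd:serviceParam mwapi:" + name + " " + sparql_string(value) + " .")
    for variable, path in outputs.items():
        lines.append("  ?" + variable + " wikibase:apiOutput " + sparql_string(path) + " .")
    lines.append("}")
    return "\n".join(lines)


block = mwapi_service_block(
    "commons.wikimedia.org",
    "Generator",
    # "Category:Example" is a placeholder title
    {"generator": "categorymembers", "gcmtitle": "Category:Example", "gcmlimit": "max"},
    {"ns": "@ns"},
)
print(block)
```

The printed block can be pasted into a query; whether a given parameter is accepted still depends on the API module, so testing the call in the sandbox first remains the safer order.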
It would be very useful if you could illustrate how to find this information in the API sandbox. E.g., I wanted to see the params for "Generator", but the sandbox field "action" doesn't offer such a choice. API:Query#Generators doesn't mention "gsrsearch".
Please make it easier for folks who know SPARQL but not MWAPI to use this extension. Thanks in advance!
3 of 4 examples timed out for me, the fourth returned 0 rows. Is that expected?
No, that's not what is supposed to happen. I'll check. I think some queries need limits now that MWAPI supports continuation; that may be the reason for some of the timeouts.
Examples now all work for me.
As it was discussed earlier on Wikidata, the query
SELECT (IRI(concat("https://commons.wikimedia.org/wiki/", ?creatorTemplate)) as ?creatorLink) ?creatorName ?categoryName ?commonsCatItem ?commonsCatItemLabel {
SERVICE wikibase:mwapi { # list of all creator templates without Wikidata link
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
bd:serviceParam mwapi:gcmtitle "Category:Creator templates without Wikidata link" .
bd:serviceParam mwapi:generator "categorymembers" .
bd:serviceParam mwapi:gcmtype "page" .
bd:serviceParam mwapi:gcmlimit "max" .
bd:serviceParam mwapi:gcmsort "timestamp" .
bd:serviceParam mwapi:gcmdir "descending" .
?creatorTemplate wikibase:apiOutput mwapi:title .
}
hint:Prior hint:runFirst 1 .
SERVICE wikibase:mwapi { # get home category
bd:serviceParam wikibase:api "Categories" .
bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
bd:serviceParam mwapi:titles ?creatorTemplate .
bd:serviceParam mwapi:clshow "!hidden" .
?category wikibase:apiOutput mwapi:category .
}
BIND(substr(?creatorTemplate,9) as ?creatorName ) .
BIND(substr(?category,10) as ?categoryName) .
OPTIONAL {
?commonsCatItem wdt:P373 ?categoryName . # category is linked from Wikidata
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
FILTER ( BOUND(?commonsCatItem) ) .
FILTER ( ?commonsCatItem!=wd:Q24731821 ) .
}
does not return all the results, because the generator does not show all the pages from c:Category:Creator templates without Wikidata link. Can we fix it somehow? At the moment I get different results if I remove gcmsort or gcmdir. I would like to catch all the pages in that category whose home category is linked from Wikidata through P373, so I can add a link to Wikidata. The current number of pages returned is 500. I would need it to be at least 1300 for the query to work.
I think the largest number of results a single API request can return is 500; this is the API limit. There is a limit of 5000 for bots, but I assume this requires login. Unfortunately, the service does not support continuations yet (doing it in a generic way, especially with generators, is fairly complex). I'll add it to the to-do list. In the meantime, I assume the best workaround is to fetch the list directly in the application and then issue a series of queries using a VALUES clause.
Created https://phabricator.wikimedia.org/T178712 to track it.
I did not try it in this context, but with other queries I tried there seems to be a limit on the size of the query text (number of characters?) that prevents long lists in a VALUES clause. For example, when I have an externally generated list of Q-codes and I want to look up some property for them, I have to do it in batches of fewer than 200 Q-codes for the query to finish. I doubt I would be able to build a query with 1300 page names.
You don't have to put everything in one query. You can run several queries and combine the results.
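A minimal Python sketch of that batching idea, assuming the list of Q-codes has already been fetched by the application. The helper names, the batch size of 150 (chosen to stay under the roughly 200 mentioned above), and the P373 lookup are only illustrative; the sketch builds the query strings and leaves sending them and merging the results to the caller.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def values_queries(qids, batch_size=150):
    """Build one SPARQL query per batch, each with its own VALUES clause."""
    queries = []
    for batch in chunked(qids, batch_size):
        values = " ".join("wd:" + qid for qid in batch)
        queries.append(
            "SELECT ?item ?commonsCategory WHERE {\n"
            "  VALUES ?item { " + values + " }\n"
            "  ?item wdt:P373 ?commonsCategory .\n"
            "}"
        )
    return queries


# Placeholder IDs standing in for an externally generated list of 1300 items
qids = ["Q%d" % n for n in range(1, 1301)]
queries = values_queries(qids)
print(len(queries))
print(queries[0][:80])
```

Each query can then be run separately and the result rows concatenated on the client side.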
Continuations are now supported for most API calls.
When using wbsearchentities, like so:
the API returns the results ordered by relevance. When obtaining these results via SPARQL and modifying them, the order is lost (example). This is understandable, but is there a way to preserve the original order, e.g., by transforming it into an ordinal for use by ORDER BY? If not, should the wbsearchentities API be modified to make it possible to obtain the score for each result?
The practical application here is to modify autocomplete results on-the-fly with a single query, which seems like a great use case for the MWAPI integration into the query service.
I'll look into it. Generally, SPARQL results are not ordered, but if they come from an ordered source (e.g. MWAPI) it might be possible to preserve the order. I'll check.
Adding score should not be hard if the score is present in result's XML.
Thank you for taking a look! Unfortunately, the wbsearchentities API's XML output does not include a score that could be used for ordering. I think we'd either have to infer an ordinal from the sequence of results somehow, or perhaps optionally add the score to the output on the MediaWiki side.
Yes, the API of entity search does not allow for score currently :( And extracting ordinal number from XML seems non-trivial... I am not sure why results appear out of order - the service delivers them in order, but somewhere inside Blazegraph the order is lost. I'll look into why that happens.
It looks like the order breaks only when a join (?item wdt:P31 ?instance) is applied... If you just call the service, the order is preserved. That makes sense, since joins are parallelized and do not guarantee preserving order. It may then be possible to create a simulated variable that returns the ordinal for each result, like "?position wikibase:apiOutput mwapi:ordinal" or something similar. That would probably allow re-sorting the results after joins.
Something like that would be excellent, yes, and might help with other queries as well :)
Further tracking in https://phabricator.wikimedia.org/T177275
This is now implemented, see https://phabricator.wikimedia.org/T177275 and Wikidata query service/User Manual/MWAPI#Output
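To illustrate the re-sorting idea from this thread on the client side, here is a small Python sketch that restores the original order of rows that carry an ordinal column. The rows follow the standard SPARQL JSON results layout; the variable name `ordinal` and the sample rows are hypothetical, and inside a query an ORDER BY on the ordinal variable would serve the same purpose.

```python
def sort_by_ordinal(bindings, variable="ordinal"):
    """Return result rows sorted by an integer-valued ordinal column."""
    return sorted(bindings, key=lambda row: int(row[variable]["value"]))


# Hypothetical rows as they might arrive after a join shuffled them
sample = [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q3"},
     "ordinal": {"type": "literal", "value": "2"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"},
     "ordinal": {"type": "literal", "value": "0"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"},
     "ordinal": {"type": "literal", "value": "10"}},
]

ordered = sort_by_ordinal(sample)
ordered_items = [row["item"]["value"].rsplit("/", 1)[-1] for row in ordered]
print(ordered_items)
```

Converting the value with `int` matters, because the values arrive as strings and a plain string sort would place "10" before "2".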