Talk:Wikidata Query Service/User Manual/MWAPI

Jura1 (talkcontribs)

Can this be done? e.g. with

      bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam mwapi:generator "imageinfo" .
      bd:serviceParam mwapi:gcmprop "metadata" .      
      bd:serviceParam mwapi:gcmtitle "File:Iphone 3GS grass.jpg" .
Smalyshev (WMF) (talkcontribs)

If the API returns it as a generator, then the MWAPI service should support it.

Jura1 (talkcontribs)

Does that mean it already does (if queried correctly), or that it should do so in the future (once developed)?

Smalyshev (WMF) (talkcontribs)

Check out https://w.wiki/3p7 - is this something you've been looking for?


Smalyshev (WMF) (talkcontribs)

Unfortunately, it looks like extracting the metadata itself is a bit tricky: the API returns multiple values, and the current MWAPI syntax allows only a single value per row (since SPARQL doesn't have arrays or any other structures).

Smalyshev (WMF) (talkcontribs)

Try this one: https://w.wiki/3sq

Jura1 (talkcontribs)

It seems to work, at least for getting the two fields.

I was trying to get just 1-5 files per category, but that part didn't quite work out.

Jura1 (talkcontribs)

Is there a way to get only 1-5 results for each category from the categorymembers generator?

Reply to "read imageinfo metadata?"
Vladimir Alexiev (talkcontribs)

This page and Wikidata Query Service/User Manual#MediaWiki API give examples using the following params:

gsrsearch, gsrlimit, gcmprop, gcmlimit

Where are they documented?

The page says "It is permissible to add input parameters not specified in the configuration, they will be passed to the service query. Please refer to the API documentation for the lists of parameters each service has". I searched in API:Query#Generators and can't find them there.

It would be very useful for SPARQL devs to have a full list of params on this page, maybe with links to their definitions in the MW API documentation.

Smalyshev (WMF) (talkcontribs)

These are the same parameters you put in an actual API request, e.g. when using the API sandbox. There's no full list of parameters, because each API has its own parameters and those can be anything. So what I would suggest is to first use an API tool, such as the API sandbox, to assemble the API call and ensure it works properly, and then copy the parameter names and values from there into the MWAPI call in WDQS.
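
The copy step described above can be sketched in a few lines of Python. This is only an illustration, not part of the service: the function name is made up, and skipping `action` and `format` rests on the assumption that the service sets those itself. Note that generator parameters such as gcmtitle or gsrsearch are the underlying module's parameters (cmtitle for categorymembers, srsearch for search) with a g prefix, which is why they are documented under those modules rather than under the generator itself.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumption: the MWAPI service supplies these itself, so they are not copied.
SKIPPED = {"action", "format"}


def sandbox_url_to_service_params(url: str) -> list[str]:
    """Turn the query string of a working API request into bd:serviceParam lines."""
    lines = []
    for name, value in parse_qsl(urlsplit(url).query):
        if name in SKIPPED:
            continue
        # Escape backslashes and double quotes for a SPARQL string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'bd:serviceParam mwapi:{name} "{escaped}" .')
    return lines


# A request URL as it might be copied from the API sandbox.
example = (
    "https://commons.wikimedia.org/w/api.php?action=query&format=json"
    "&generator=categorymembers"
    "&gcmtitle=Category%3ACreator+templates+without+Wikidata+link"
    "&gcmtype=page&gcmlimit=max"
)

for line in sandbox_url_to_service_params(example):
    print(line)
```

The printed lines still need the `wikibase:api` and `wikibase:endpoint` parameters and at least one `wikibase:apiOutput` line before they form a usable SERVICE block.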

Vladimir Alexiev (talkcontribs)

It would be very useful if you could illustrate how to find this info in the API sandbox. E.g., I wanted to see the params for "Generator", but the sandbox field "action" doesn't offer such a choice. API:Query#Generators doesn't mention "gsrsearch".

Please make it easier for folk who know SPARQL but not MWAPI to use this extension. Thanks in advance!

Reply to "Document more Input params"
Papuass (talkcontribs)

3 of 4 examples timed out for me, the fourth returned 0 rows. Is that expected?

Smalyshev (WMF) (talkcontribs)

No, that's not what is supposed to happen. I'll check. I think some queries need limits now that MWAPI supports continuation, which may be the reason for some of the timeouts.

Smalyshev (WMF) (talkcontribs)

Examples now all work for me.

Increase maximum number of elements returned

Jarekt (talkcontribs)

As it was discussed earlier on Wikidata, the query

SELECT (IRI(concat("https://commons.wikimedia.org/wiki/", ?creatorTemplate)) as ?creatorLink) ?creatorName ?categoryName ?commonsCatItem ?commonsCatItemLabel {
  SERVICE wikibase:mwapi { # list of all creator templates without Wikidata link
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:gcmtitle "Category:Creator templates without Wikidata link" .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmtype "page" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam mwapi:gcmsort "timestamp" .
     bd:serviceParam mwapi:gcmdir "descending" .
     ?creatorTemplate wikibase:apiOutput mwapi:title  .
  }
  hint:Prior hint:runFirst 1 . 
  SERVICE wikibase:mwapi { # get home category
     bd:serviceParam wikibase:api "Categories" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:titles ?creatorTemplate .
     bd:serviceParam mwapi:clshow "!hidden" .
     ?category wikibase:apiOutput mwapi:category  .
  }
  BIND(substr(?creatorTemplate,9) as ?creatorName ) .
  BIND(substr(?category,10)       as ?categoryName) .
  OPTIONAL { 
    ?commonsCatItem wdt:P373 ?categoryName . # category is linked from Wikidata
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  }
  FILTER ( BOUND(?commonsCatItem) ) .
  FILTER ( ?commonsCatItem!=wd:Q24731821 ) .
}

does not return all the results because the generator does not show all the pages from c:Category:Creator templates without Wikidata link. Can we fix it somehow? At the moment I get different results if I remove gcmsort or gcmdir. I would like to catch all the pages in that category whose home category is linked from Wikidata through P373, so I can add a link to Wikidata. The current number of pages returned is 500; I would need it to be at least 1300 for the query to work.

Smalyshev (WMF) (talkcontribs)

I think the largest number of results a single API request can return is 500; this is the API limit. There's a limit of 5000 for bots, but I assume this requires login. Unfortunately, the service does not support continuations yet (doing it in a generic way, especially with generators, is kind of complex). I'll add it to the to-do list. In the meantime, I assume the best workaround would be to fetch the list directly in the application and then issue a series of queries using a VALUES clause.
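
Fetching the full list in the application means following the API's `continue` values across requests. A minimal sketch of that loop, assuming a `fetch` callable that sends the parameters to the Commons API and returns the decoded JSON; here a fake fetcher with made-up page titles stands in for the network call so the example runs offline:

```python
from typing import Callable, Iterator


def all_category_members(fetch: Callable[[dict], dict], title: str) -> Iterator[str]:
    """Yield every page title in a category, following API continuation."""
    base = {
        "action": "query", "format": "json", "list": "categorymembers",
        "cmtitle": title, "cmtype": "page", "cmlimit": "max",
    }
    request = dict(base)
    while True:
        response = fetch(request)
        for member in response["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in response:
            return
        # Merge the continuation values into the original parameters.
        request = {**base, **response["continue"]}


def fake_fetch(request: dict) -> dict:
    """Hypothetical stand-in for an HTTP call: serves two small 'pages'."""
    pages = {None: (["Creator:A", "Creator:B"], "page2"),
             "page2": (["Creator:C"], None)}
    titles, next_token = pages[request.get("cmcontinue")]
    response = {"query": {"categorymembers": [{"title": t} for t in titles]}}
    if next_token:
        response["continue"] = {"cmcontinue": next_token, "continue": "-||"}
    return response


titles = list(all_category_members(
    fake_fetch, "Category:Creator templates without Wikidata link"))
print(titles)  # ['Creator:A', 'Creator:B', 'Creator:C']
```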

Jarekt (talkcontribs)

I did not try it in this context, but with other queries I tried there seems to be a limit on the size of the query text (number of characters?) that prevents long lists in a VALUES clause. For example, when I have an externally generated list of q-codes and I want to look up some property for them, I have to do it in batches of fewer than 200 q-codes for the query to finish. I doubt I would be able to build a query with 1300 page names.

Smalyshev (WMF) (talkcontribs)

You don't have to put everything in one query. You can run several queries and combine the results.
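
Splitting the list across several queries could look roughly like this. It is a sketch under assumptions: the titles are placeholders, the variable name is taken from the query earlier in this topic, and the batch size of 150 is simply a value under the roughly 200-item ceiling reported above.

```python
from typing import Iterable, Iterator


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def values_clause(variable: str, titles: Iterable[str]) -> str:
    """Build a SPARQL VALUES clause of string literals for one batch."""
    quoted = " ".join(
        '"' + t.replace("\\", "\\\\").replace('"', '\\"') + '"' for t in titles
    )
    return f"VALUES ?{variable} {{ {quoted} }}"


# Placeholder titles standing in for the 1300 pages fetched by the application.
titles = [f"Creator:Example {n}" for n in range(1300)]

clauses = [values_clause("creatorTemplate", batch) for batch in batches(titles, 150)]
print(len(clauses))  # 9
```

Each clause replaces the generator SERVICE block in its own copy of the query; the application then runs the copies one after another and concatenates the result rows.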

Smalyshev (WMF) (talkcontribs)

Continuations are now supported for most API calls.

Preserve result order from MediaWiki API?

Summary by Smalyshev (WMF)

Implemented as: ?ordinal wikibase:apiOrdinal true .

Eloquence (talkcontribs)

When using wbsearchentities, like so:

https://www.wikidata.org/w/api.php?search=las&language=en&uselang=en&format=jsonfm&limit=25&action=wbsearchentities

the API returns the results ordered by relevance. When obtaining these results via SPARQL and modifying them, the order is lost (example). This is understandable, but is there a way to preserve the original order, e.g., by transforming it into an ordinal for use by ORDER BY? If not, should the wbsearchentities API be modified to make it possible to obtain the score for each result?

The practical application here is to modify autocomplete results on-the-fly with a single query, which seems like a great use case for the MWAPI integration into the query service.

Smalyshev (WMF) (talkcontribs)

I'll look into it. Generally, SPARQL results are not ordered, but if they come from an ordered source (e.g. MWAPI), it might be possible to preserve the order. I'll check.

Adding the score should not be hard if the score is present in the result's XML.

Eloquence (talkcontribs)

Thank you for taking a look! Unfortunately, the wbsearchentities API's XML output does not include a score that could be used for ordering. I think we'd either have to infer an ordinal from the sequence of results somehow, or perhaps optionally add the score to the output on the MediaWiki side.

Smalyshev (WMF) (talkcontribs)

Yes, the entity search API does not currently expose a score :( And extracting an ordinal number from the XML seems non-trivial... I am not sure why results appear out of order: the service delivers them in order, but somewhere inside Blazegraph the order is lost. I'll look into why that happens.

Smalyshev (WMF) (talkcontribs)

It looks like the order breaks only when a join (?item wdt:P31 ?instance) is applied... If you just call the service, the order is preserved. That makes sense, since joins are parallelized and do not guarantee preserving order. It may then be possible to just create a simulated variable that returns an ordinal for each result, like "?position wikibase:apiOutput mwapi:ordinal" or something like that. That would probably allow re-sorting them after joins.
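
The re-sorting idea can be illustrated outside SPARQL. A minimal sketch, assuming each API result is tagged with its position before an order-scrambling step; a shuffle stands in for the parallelized join, and the labels and 1-based numbering are illustrative assumptions rather than the service's actual behavior:

```python
import random

# Hypothetical search hits, in the relevance order returned by the API.
api_results = ["first hit", "second hit", "third hit", "fourth hit"]

# Tag every result with its position in the API response (1-based by assumption).
rows = [
    {"label": label, "ordinal": position}
    for position, label in enumerate(api_results, start=1)
]

# Stand-in for a join that does not preserve row order.
random.Random(42).shuffle(rows)

# Sorting on the ordinal restores the API's order,
# which is what an ORDER BY on the ordinal variable would do in a query.
restored = [row["label"] for row in sorted(rows, key=lambda row: row["ordinal"])]
print(restored == api_results)  # True
```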

Eloquence (talkcontribs)

Something like that would be excellent, yes, and might help with other queries as well :)
