Topic on Help talk:CirrusSearch

How to export search result

Bennylin

Hi, I have a search result on my wiki for "articles without ref tags", and I want to dump/export the list of all the titles from that search. I tried the API, but each call is limited to 500 results.

Can anyone help? Thanks in advance.

DCausse (WMF)

Hi, sadly it is not possible to export the full list of results in one go.

You can try making multiple calls to the API, paginating with the continue parameter (see API:Continue), to gather more than 500 results.

But there is a limit there too: you can't paginate past the 10,000th result.

Such limits are in place to protect the service: even when the continue parameter is used, Elasticsearch (the underlying search engine used by CirrusSearch) has to keep all the results from the start in memory.
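The pagination described above could be sketched roughly as follows. This is only an illustration, not an official client: the endpoint URL (taken from the id.wikipedia.org link in this thread), the User-Agent string, and the helper names fetch_json and collect_titles are assumptions, while action=query, list=search, srsearch, srlimit and the continue block are the standard search API parameters.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; replace with the api.php URL of your own wiki.
API_URL = "https://id.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the API and return the decoded JSON."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; use one that identifies your own tool.
    request = urllib.request.Request(
        url, headers={"User-Agent": "search-export-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_titles(query, fetch=fetch_json, max_results=10000):
    """Follow the API's continue values and gather page titles.

    Stops when the API returns no further continue block, or when
    max_results titles have been gathered (search results cannot be
    paginated past the 10,000th result anyway).
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": "max",  # as many results per call as your account allows
        "format": "json",
    }
    titles = []
    while len(titles) < max_results:
        data = fetch(params)
        titles.extend(hit["title"] for hit in data["query"]["search"])
        if "continue" not in data:
            break  # no more pages
        # Merge the returned continue values (e.g. sroffset) into the next call.
        params = {**params, **data["continue"]}
    return titles[:max_results]
```

Calling collect_titles with your search string and writing the returned list to a file, one title per line, gives a plain-text export of up to 10,000 titles. The fetch argument is there only so the loop can be tried with a stub instead of live requests.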

A quick note regarding your query:

-insource:<ref>

The characters < and > will be ignored and what is actually run is

-insource:ref

and thus you might exclude pages that use the word ref outside a <ref> tag, e.g. https://id.wikipedia.org/wiki/Sumber_primer.

If you want to actually search for the < and > characters, you have to use the regular expression syntax by wrapping your search text between a pair of / and escaping the < and > characters with a \:

-insource:/\<ref\>/

But beware that the query above might not filter out pages with named references such as <ref name="named ref">, or pages where the reference tag is added via a template.
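To illustrate the caveat about named references, here is a small check using Python's re module. Python's regex dialect is not the same as the one CirrusSearch uses, so treat this only as an analogy for how a literal <ref> pattern behaves; the helper name has_plain_ref_tag and the sample strings are made up for the example.

```python
import re

# Same literal pattern as in the query above: an exact "<ref>" tag.
PLAIN_REF = re.compile(r"\<ref\>")


def has_plain_ref_tag(wikitext):
    """Return True if the text contains a literal, unnamed <ref> tag."""
    return PLAIN_REF.search(wikitext) is not None


# Hypothetical snippets of wikitext.
plain = "Some claim.<ref>A source</ref>"
named = 'Some claim.<ref name="named ref">A source</ref>'
word_only = "See the ref desk for details."

print(has_plain_ref_tag(plain))
print(has_plain_ref_tag(named))
print(has_plain_ref_tag(word_only))
```

The named reference does not contain the exact text <ref>, so the pattern does not match it, and a page using only named references would still show up in the "without ref tags" results.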

Bennylin

Thank you for the answer and the correction!
