Topic on Help talk:CirrusSearch

cirrussearch vs database backup dumps

69.191.241.48 (talk | contribs)

Hi - there are two types of dumps available for enwiki pages: a monthly database dump in XML, which you can subscribe to, and weekly cirrussearch dumps in JSON, formatted for bulk upload to Elasticsearch. We're trying to diff the two dumps to see if they're comparable, but we've noticed that some articles are in the monthly XML dump and not in the weekly cirrussearch dump. I'm having trouble finding an explanation on the main Wikimedia site that clearly states the difference between these two enwiki dumps. Any additional information would be much appreciated.

I would post links, but I'm getting an error when I try to post them, so please go to dumps.wikimedia.org and look for these paths:

cirrussearch dump: /other/cirrussearch/

xml dump: /enwiki/latest/
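For anyone attempting the same comparison, here is a minimal Python sketch of diffing the two dumps by page ID. It assumes the cirrussearch file is in Elasticsearch bulk format (an action line such as `{"index": {"_id": "12"}}` followed by a document line) and that the XML dump uses the usual `<page>` / `<title>` / `<id>` export layout. The function names are made up for illustration, and the field names should be checked against the actual files.

```python
import json
import xml.etree.ElementTree as ET


def cirrus_page_ids(lines):
    """Collect page IDs from bulk-format lines (assumed layout: action line, then document line)."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        action = record.get("index")
        # Only action lines carry {"index": {"_id": ...}}; document lines are skipped.
        if isinstance(action, dict) and "_id" in action:
            ids.add(str(action["_id"]))
    return ids


def local_name(tag):
    """Strip the XML namespace prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def xml_pages(stream):
    """Map page ID -> title for every <page> element in an XML dump stream."""
    pages = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) == "page":
            # Direct children only, so the revision's own <id> is not picked up.
            fields = {local_name(child.tag): child.text for child in elem}
            pages[fields["id"]] = fields["title"]
            elem.clear()  # release the finished page's children
    return pages


def titles_only_in_xml(pages, cirrus_ids):
    """Titles whose page ID appears in the XML dump but not in the cirrussearch dump."""
    return sorted(title for page_id, title in pages.items() if page_id not in cirrus_ids)
```

The real files are compressed, so in practice the inputs would be opened with something like `gzip.open(path, "rt")` for the cirrussearch file and `bz2.open(path, "rb")` for the XML file before being passed to these functions.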

Ciencia Al Poder (talk | contribs)

Check whether the articles "missing" from the cirrussearch dump still exist on the live wiki. If they don't, those articles were deleted after the monthly XML dump was generated but before the weekly cirrussearch dump.
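The check against the live wiki can be scripted. The sketch below is one possible way to do it, using the MediaWiki Action API (`action=query` with `formatversion=2`), where pages that do not exist are returned with a `missing` flag. The function names and the User-Agent string are placeholders, and the English Wikipedia endpoint is assumed.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the English Wikipedia Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def missing_from_response(response):
    """Return titles flagged as missing in a formatversion=2 query response."""
    pages = response.get("query", {}).get("pages", [])
    return [page["title"] for page in pages if page.get("missing")]


def fetch_missing(titles):
    """Ask the live wiki which of the given titles do not exist.

    The API accepts a limited number of titles per request (50 for
    ordinary clients), so larger lists need to be sent in batches.
    """
    params = urllib.parse.urlencode({
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "titles": "|".join(titles),
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        headers={"User-Agent": "dump-diff-check/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as reply:
        return missing_from_response(json.load(reply))
```

Titles that come back as missing were most likely deleted between the two dump runs. Titles that still exist on the live wiki need a different explanation.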
