Topic on Talk:Quarry

How to fetch revision text content

Summary by Frostly

Quarry does not have access to revision text. There are XML dumps which may have the information, and generally using the Action API to fetch revision text is a reasonable thing to do as well.

Ywats0ns (talk | contribs)

Hello, I'd like to do some analysis on all recent revisions that contain a certain keyword or match a regex. I have been able to fetch the recent revisions (up to 30 days back) with the Action API, but I guess this requires a lot of resources, as I have to call the API for every revision. Is there a way to access this through Quarry or through a dump of the revisions?
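If the revisions are available in an XML dump that includes revision text, the keyword/regex check can run locally over the file instead of making one API call per revision. The sketch below streams the XML with the standard library's `xml.etree.ElementTree.iterparse` and yields the page title and revision ID of every revision whose text matches a pattern. It assumes the usual MediaWiki export layout (`<page>` containing `<title>` and `<revision>` elements, each revision having direct `<id>` and `<text>` children). Check this against the schema version of the dump you actually download. The function name `matching_revisions` and the `SAMPLE` document are hypothetical, for illustration only.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple


def _local(tag: str) -> str:
    """Strip the export namespace, e.g. '{http://www.mediawiki.org/xml/export-0.10/}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def matching_revisions(
    stream: IO[bytes], pattern: "re.Pattern[str]"
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (page title, revision id) for every revision whose text matches `pattern`.

    Assumes the MediaWiki export layout described above (hypothetical helper).
    """
    title: Optional[str] = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id: Optional[str] = None
            text = ""
            # Direct children only, so a nested <contributor><id> is not mistaken for the revision ID.
            for child in elem:
                child_name = _local(child.tag)
                if child_name == "id":
                    rev_id = child.text
                elif child_name == "text":
                    text = child.text or ""
            if pattern.search(text):
                yield title, rev_id
            elem.clear()  # drop the revision text once it has been checked
        elif name == "page":
            elem.clear()


# Tiny hypothetical dump fragment, just to show the shape of the input.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><id>1</id><text>hello world</text></revision>
    <revision><id>2</id><text>foo bar</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, revision_id in matching_revisions(io.BytesIO(SAMPLE), re.compile(r"\bfoo\b")):
        print(page_title, revision_id)
```

For a real compressed dump, passing `bz2.open(path, "rb")` as the stream avoids decompressing the whole file to disk first. Clearing each `<revision>` and `<page>` after it has been processed keeps memory use far lower than building the full tree.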

Thanks a lot

BDavis (WMF) (talk | contribs)

Quarry does not have access to revision text as this part of the production database is not included in the wikitech:Wiki Replicas service. There are XML dumps which may have the information you are interested in. See meta:Data dumps for more information about that service. Generally using the Action API to fetch revision text is a reasonable thing to do as well. API:Etiquette has some advice on how to make your Action API requests more friendly to the servers by doing things like setting a custom user-agent that gives contact information and using the maxlag parameter to avoid piling on when the database is under greater than normal replication pressure.
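The Action API advice above can be sketched as follows. Instead of one request per revision, several revision IDs are joined into a single `prop=revisions` query. Each request carries a descriptive `User-Agent` and a `maxlag` value, and the client sleeps and retries when the server reports lag. Several details are assumptions to verify against the API documentation: the English Wikipedia endpoint, the batch size of 50 (the usual `revids` limit for non-bot accounts), the placeholder contact string, the 5-second `maxlag`, and the handling of a lagged response (an error with code `maxlag` plus a `Retry-After` header, with a plain HTTP 503 treated the same way). The helper names `batches`, `build_params`, `extract_texts` and `fetch_texts` are hypothetical.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Iterator, List

# Assumed endpoint; other MediaWiki api.php URLs should work the same way.
API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder contact details: replace with your own, as API:Etiquette asks.
USER_AGENT = "RevisionKeywordScan/0.1 (https://example.org/contact; you@example.org)"


def batches(rev_ids: Iterable[int], size: int = 50) -> Iterator[List[int]]:
    """Group revision IDs so that one request covers up to `size` revisions (assumed limit)."""
    batch: List[int] = []
    for rid in rev_ids:
        batch.append(rid)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_params(rev_ids: List[int], maxlag: int = 5) -> Dict[str, str]:
    """Query parameters asking for the main-slot content of several revisions at once."""
    return {
        "action": "query",
        "prop": "revisions",
        "revids": "|".join(str(r) for r in rev_ids),
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
        # Ask the server to refuse the request when replication lag exceeds this many seconds.
        "maxlag": str(maxlag),
    }


def extract_texts(data: dict) -> Dict[int, str]:
    """Map revision ID -> wikitext from a formatversion=2 response."""
    texts: Dict[int, str] = {}
    for page in data.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            # Hidden or suppressed revisions come back without a "content" key.
            texts[rev["revid"]] = rev.get("slots", {}).get("main", {}).get("content", "")
    return texts


def fetch_texts(rev_ids: List[int], retries: int = 5) -> Dict[int, str]:
    """Fetch one batch, sleeping and retrying while the server reports lag."""
    url = API_URL + "?" + urllib.parse.urlencode(build_params(rev_ids))
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    for _ in range(retries):
        try:
            with urllib.request.urlopen(req) as resp:
                data = json.load(resp)
                retry_after = resp.headers.get("Retry-After")
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise
            # Simplification: treat any 503 as a lag/overload signal and back off.
            data, retry_after = {"error": {"code": "maxlag"}}, err.headers.get("Retry-After")
        if data.get("error", {}).get("code") == "maxlag":
            time.sleep(int(retry_after or 5))
            continue
        if "error" in data:
            raise RuntimeError(data["error"])
        return extract_texts(data)
    raise RuntimeError("replication lag stayed above maxlag after all retries")
```

To run the keyword search, loop `for batch in batches(recent_ids)` and apply a compiled regex's `search` to each value returned by `fetch_texts(batch)`. Requests are deliberately made one after another rather than in parallel, which keeps the load on the servers modest.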