Manual talk:Parameters to Special:Export
From MediaWiki.org
The links do not seem to actually work; they always return only the latest revision of the page in question.
- I took a look at the source to Special:Export. It forces anything done with a GET to only get the most current version. I've added a couple of boxes to my own version of Special:Export so that the user can set the "limit" and "offset" parameters; I don't know how to change special pages, though. --En.jpgordon 00:20, 19 January 2007 (UTC)
Hi from Saintrain 19:39, 26 January 2007 (UTC)
- RFC: when limit=-n, dump the previous n edits, with the default being from the current edit.
- Where is the source? How do I use "my own version"?
Thanks Doug
Heya. The source I'm talking about is the MediaWiki source (I got it from the SourceForge project link over there in the resource box.) There's a comment in includes/SpecialExport.php saying // Default to current-only for GET requests, which is where the damage occurs. I imagine it's trying to throttle requests. So I instead made my version by saving the Special:Export page, tweaking it, and running it on my local machine; I only had to adjust one URL to get it to work right.
More fun, though: I wrote a little Python script to loop through and fetch entire article histories, a block of 100 revisions at a time (that being the hardwired limit), concatenate them into one long XML file, run it through another filter, and then look at them with the History Flow Visualization Application from IBM.[1] Pretty.
- Hi, I am trying to do the same thing, 100 at a time and then concatenating them for a full history - any way you could share the fixed export and python script? Thanks. Mloubser 11:52, 13 November 2007 (UTC)
We shouldn't need limit=-n, should we? Isn't that what dir and limit should provide? My only problem, though, has been figuring what offset to start with for a backward scan. --En.jpgordon 07:43, 27 January 2007 (UTC)
Thanks for responding.
- Mea culpa! I didn't even see "dir". Thanks.
- The reason I wanted to look at recent history was to find at which edit a particular vandalism happened to see what got vandalized.
- Is there a more straightforward way of looking for a particular word in the history?
Thanks, Doug. Saintrain 04:46, 29 January 2007 (UTC)
- Y'know, we almost have the tools to do that. The aforementioned historyflow tool knows that information; I just don't think there's a way to glean it from it. --En.jpgordon 00:08, 2 February 2007 (UTC)
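The block-at-a-time fetch described earlier in this thread might look roughly like the sketch below. It is not the script mentioned above, only an illustration of the paging idea. The `fetch` argument is a hypothetical caller-supplied function that POSTs to Special:Export and returns the XML text. The sketch also assumes that `offset` accepts a revision timestamp, that the next block starts after that timestamp, and that revisions appear as plain `<revision>` elements containing a `<timestamp>`.

```python
import re

# Assumption: each revision in the export XML is wrapped in a plain
# <revision>...</revision> element that contains a <timestamp> element.
REVISION_RE = re.compile(r"<revision>.*?</revision>", re.DOTALL)
TIMESTAMP_RE = re.compile(r"<timestamp>(.*?)</timestamp>")


def fetch_history(page, fetch, limit=100, start="1"):
    """Collect every <revision> block for `page`, `limit` revisions per request.

    `fetch(page, offset, limit)` is a hypothetical caller-supplied function
    that POSTs to Special:Export and returns the response XML as a string.
    """
    revisions = []
    offset = start
    while True:
        block = REVISION_RE.findall(fetch(page, offset, limit))
        revisions.extend(block)
        if len(block) < limit:
            break  # a short (or empty) block means the history is exhausted
        match = TIMESTAMP_RE.search(block[-1])
        if match is None:
            break  # cannot page further without a timestamp
        # Assumption: offset is a timestamp and the next block starts after it.
        offset = match.group(1)
    return revisions
```

The returned list can then be joined and wrapped in a single root element to get one long XML document. If two revisions share a timestamp at a block boundary, this simple paging could skip one of them.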
Move to Manual: namespace?
My feeling is that this page should be at Manual:Parameters to Special:Export rather than here. Tizio 16:04, 26 April 2007 (UTC)
Discussion
Hi, is there a way to get just the total number of edits an article has had over time? Thanks! —The preceding unsigned comment was added by 87.196.51.250 (talk • contribs) 20:55, 20 September 2007. Please sign your posts with ~~~~!
- As far as I can remember there is no way to get only this number (but I might be wrong). Anyway, this number can probably be easily calculated using the appropriate parameters to the API. Tizio 10:02, 21 September 2007 (UTC)
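One way to get the count without the API is to tally the revisions in an export file that already holds the full history. This is a swapped-in approach, not the API route suggested above. The sketch assumes the export XML marks each edit with a `revision` element (possibly inside an XML namespace), and `count_revisions` is an illustrative name.

```python
import xml.etree.ElementTree as ET


def count_revisions(export_xml):
    """Return the number of <revision> elements in an export XML string."""
    root = ET.fromstring(export_xml)
    # Strip any "{namespace}" prefix before comparing the tag name.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "revision")
```

The count is only as complete as the file: if the export was cut off by a revision limit, the number is a lower bound.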
Parameters no longer in use?
Using either the links provided in the article or my own parameters does not yield the desired results. I can only get the most recent version of the article, regardless of how I set the parameters. I've tried it on several computers running Linux or Windows, and from different IPs. Same problem: the parameters seem to be ignored. --Falcorian 06:59, 14 January 2008 (UTC)
- I've had it suggested to use curonly=0, but this also has no effect. --Falcorian
- I also found that the links given did not work, nor did any experiments creating my own URLs to get the history. However, submitting the parameters via a Ruby script did work. I don't know enough yet (about HTTP, HTML forms) to understand why this approach worked and the URL approach did not, but anyway here is some code that successfully retrieved the last 5 changes to the page on Patrick Donner and wrote the output to a file:
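require 'net/http'
require 'uri'

# POST the export form fields; limit and dir ask for the last 5 revisions.
res = Net::HTTP.post_form(
  URI.parse("http://en.wikipedia.org/w/index.php?"),
  {:title => "Special:Export", :pages => 'Patrick_Donner', :action => "submit", :limit => 5, :dir => "desc"}
)
# Write the returned XML to a file.
f = File.new("donner_output_last_5.txt", "w")
f << res.body
f.close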
Hope this helps. I wish I knew enough to provide a more general solution. Andropod 00:44, 17 January 2008 (UTC)
- When you use the URL in a browser, you are submitting via GET. In the above Ruby script, you are using POST. This seems to be the solution; for example:
curl -d "" 'http://en.wikipedia.org/w/index.php?title=Special:Export&pages=Main_Page&offset=1&limit=5&action=submit'
- worked for me. Before updating this page, I'd like to check this with the source code. Tizio 12:46, 21 January 2008 (UTC)
- Works for me as well, which is great! Now to crack open the python... --Falcorian 03:29, 26 January 2008 (UTC)
- For future reference, I get an access denied error when I try to use urllib alone in python to request the page. However, if I use urllib2 (which allows you to set a custom header), then we can trick Wikipedia into thinking we're Firefox and it will return the page as expected. --Falcorian 06:57, 26 January 2008 (UTC)
import urllib
import urllib2

# Needs to fool Wikipedia so it will give us the file
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4'}
params = urllib.urlencode({'title': 'Special:Export', 'pages': 'User:Falcorian', 'action': 'submit', 'limit': 2})
req = urllib2.Request(url='http://en.wikipedia.org/w/index.php', data=params, headers=headers)
f = urllib2.urlopen(req)
print f.read()
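The `urllib2` module only exists in Python 2. A rough Python 3 equivalent of the same POST request is sketched below, using only the standard library. The function names and the User-Agent string are illustrative assumptions, and the URL uses `https` rather than the `http` of the original. Whether the server accepts this particular User-Agent has not been checked here.

```python
import urllib.parse
import urllib.request

EXPORT_URL = "https://en.wikipedia.org/w/index.php"


def build_export_request(page, limit,
                         user_agent="ExportExample/0.1 (hypothetical; replace with your own)"):
    """Build a POST request for Special:Export without sending it."""
    params = urllib.parse.urlencode({
        "title": "Special:Export",
        "pages": page,
        "action": "submit",
        "limit": limit,
    }).encode("ascii")
    # Supplying `data` makes urllib send the request as POST instead of GET.
    return urllib.request.Request(EXPORT_URL, data=params,
                                  headers={"User-Agent": user_agent})


def fetch_export(page, limit):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_export_request(page, limit)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_export("User:Falcorian", 2)` would correspond to the request made by the script above.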
Other parameters
I found these parameters in the source code:
- curonly
- appears to override the other parameters so that only the current version is exported
- listauthors
- export list of contributors (?) if $wgExportAllowListContributors is true
- wpDownload
- returns result as a file attachment: http://en.wikipedia.org/w/index.php?title=Special:Export&pages=XXXX&wpDownload
- templates
- images
- (currently commented out in the source code)
I don't know what listauthors does exactly, maybe it's disabled on wikien. Tizio 15:40, 21 January 2008 (UTC)
Also, variable $wgExportMaxHistory is relevant here. Tizio 15:43, 21 January 2008 (UTC)
Also missing: "history" has a different meaning when used in a POST request (use default values for dir and offset, $wgExportMaxHistory for limit). Tizio 15:49, 21 January 2008 (UTC)
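For experimenting with the parameters noted in this section, a small helper that builds the form body might be handy. The sketch below uses only the flag names mentioned above. Sending `1` as the value of a set flag is an assumption, and how the server interprets each flag has not been verified. `export_query` and `FLAG_NAMES` are illustrative names.

```python
from urllib.parse import urlencode

# Flag names as noted in this section; sending "1" for a set flag is an assumption.
FLAG_NAMES = ("curonly", "listauthors", "wpDownload", "templates", "history")


def export_query(page, **flags):
    """Return a URL-encoded Special:Export form body with the chosen flags set."""
    unknown = set(flags) - set(FLAG_NAMES)
    if unknown:
        raise ValueError("unknown export flag(s): " + ", ".join(sorted(unknown)))
    fields = {"title": "Special:Export", "pages": page, "action": "submit"}
    for name in FLAG_NAMES:
        if flags.get(name):
            fields[name] = "1"
    return urlencode(fields)
```

For example, `export_query("Main_Page", wpDownload=True)` yields a body that can be passed to `curl -d` or sent as POST data.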
Recursive downloading
Hi, is there some way (without writing my own script) to recursively download the subcategories inside of the categories? I don't want to download the whole Wikipedia database dump to get the 10,000 or so pages I want. Thanks, JDowning 17:32, 13 March 2008 (UTC)
Export has changed
All the examples (and my script which has worked for months) return only the newest version now. Anyone have ideas? --Falcorian 05:58, 24 September 2008 (UTC)

