Manual talk:Parameters to Special:Export

The addcat parameter does not work!

I tried sending a POST request with cURL, adding the addcat and catname parameters so that Wikipedia would export a whole category of pages in one XML file. But all I got back was the Special:Export page itself, with all the page names listed in the text box, which was weird. I thought it might want me to download from that page; however, I got a "file does not exist" page after clicking download... Any advice/help is welcome. Thanks in advance.

Links do not seem to actually work

The links do not seem to actually work, instead returning always and only the latest revision of the page in question?

  • I took a look at the source to Special:Export. It forces anything done with a GET to only get the most current version. I've added a couple of boxes to my own version of Special:Export so that the user can set the "limit" and "offset" parameters; I don't know how to change special pages, though. --En.jpgordon 00:20, 19 January 2007 (UTC)
    • Why was it programmed to not allow GET to get anything but the most current version? Tisane 08:19, 23 February 2010 (UTC)

Where is the source? How do I use "my own version"?

  • RFC: when limit=-n, dump the previous n edits, with the default being to start from the current edit.
  • Where is the source? How do I use "my own version"?

Thanks Doug Saintrain 19:39, 26 January 2007 (UTC)

Heya. The source I'm talking about is the MediaWiki source (I got it from the SourceForge project link over there in the resource box.) There's a comment in includes/SpecialExport.php saying // Default to current-only for GET requests, which is where the damage occurs. I imagine it's trying to throttle requests. So I instead made my version by saving the Special:Export page, tweaking it, and running it on my local machine; I only had to adjust one URL to get it to work right.

More fun, though; I wrote a little Python script to loop through and fetch entire article histories, a block of 100 revisions at a time (that being the hardwired limit), catenate them into one long XML, run it through another filter, and then look at them with the History Flow Visualization Application from IBM.[1] Pretty.

Hi, I am trying to do the same thing, 100 at a time and then concatenating them for a full history. Any way you could share the fixed export page and the Python script? Thanks. Mloubser 11:52, 13 November 2007 (UTC)
We shouldn't need limit=-n, should we? Isn't that what dir and limit should provide? My only problem, though, has been figuring what offset to start with for a backward scan. --En.jpgordon 07:43, 27 January 2007 (UTC)
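The script itself was never posted in this thread, so the following is only a minimal sketch of the loop described above, not the original code. It assumes that offset is a timestamp and is non-inclusive, that revisions come back oldest first by default, and that the caller supplies a hypothetical fetch(params) function which POSTs the form fields and returns the XML text. The names revision_timestamps and fetch_history are made up for illustration.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace from a tag name such as '{ns}revision'."""
    return tag.rsplit("}", 1)[-1]


def revision_timestamps(export_xml):
    """Return the <timestamp> of every <revision> in an export document, in order."""
    stamps = []
    for element in ET.fromstring(export_xml).iter():
        if _local(element.tag) != "revision":
            continue
        for child in element:
            if _local(child.tag) == "timestamp":
                stamps.append(child.text)
    return stamps


def fetch_history(fetch, page, limit=100):
    """Fetch a page history in blocks of `limit` revisions.

    `fetch` is a caller-supplied (hypothetical) function that POSTs the
    given form fields to Special:Export and returns the response body.
    Returns the list of XML chunks; merging them is left to the caller.
    """
    chunks = []
    offset = "1"  # assumption: an offset of 1 sorts before every real timestamp
    while True:
        xml_text = fetch({
            "title": "Special:Export",
            "pages": page,
            "action": "submit",
            "offset": offset,
            "limit": str(limit),
        })
        stamps = revision_timestamps(xml_text)
        # Stop when nothing new arrives; the second test guards against an
        # endless loop if the wiki treats offset as inclusive.
        if not stamps or stamps[-1] == offset:
            break
        chunks.append(xml_text)
        offset = stamps[-1]  # the next block starts after the last revision seen
    return chunks
```

For a backward scan, the same loop should work with dir=desc added to the form fields and a starting offset later than the newest revision, though that variant is untested here.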


Thanks for responding.

Mea culpa! I didn't even see "dir". Thanks.

The reason I wanted to look at recent history was to find at which edit a particular vandalism happened to see what got vandalized.

Is there a more straightforward way of looking for a particular word in the history? Thanks, Doug. Saintrain 04:46, 29 January 2007 (UTC)

Y'know, we almost have the tools to do that. The aforementioned history flow tool knows that information; I just don't think there's a way to glean it from it. --En.jpgordon 00:08, 2 February 2007 (UTC)

Discussion

Hi, is there a way to get just the total number of edits an article has had over time? Thanks! —The preceding unsigned comment was added by 87.196.51.250 (talkcontribs) 20:55, 20 September 2007


As far as I can remember there is no way to get only this number (but I might be wrong). Anyway, this number can probably be easily calculated using the appropriate parameters to the API. Tizio 10:02, 21 September 2007 (UTC)
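One possible way to do that calculation is to page through prop=revisions and count the entries. The sketch below assumes the current JSON continuation format (a "continue" object in each response) and a hypothetical fetch_json(params) helper that performs the HTTP request and decodes the JSON; neither is mentioned in the reply above.

```python
def count_revisions(fetch_json, title):
    """Count all revisions of `title` by paging through prop=revisions.

    `fetch_json` is a caller-supplied (hypothetical) function that sends the
    parameters to api.php and returns the decoded JSON response as a dict.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids",     # ask for revision IDs only to keep responses small
        "rvlimit": "max",
        "format": "json",
    }
    total = 0
    cont = {}
    while True:
        data = fetch_json({**params, **cont})
        for page in data.get("query", {}).get("pages", {}).values():
            total += len(page.get("revisions", []))
        cont = data.get("continue")
        if not cont:  # no continuation block means this was the last batch
            return total
```

For long histories this makes one request per batch, so it is slow but avoids downloading any revision text.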

Parameters no longer in use?

Neither the links provided in the article nor my own parameters yield the desired results. I can only get the most recent version of the article, regardless of how I set the parameters. I've tried it on several computers running Linux or Windows, and from different IPs. Same problem: the parameters seem to be ignored. --Falcorian 06:59, 14 January 2008 (UTC)

I've had it suggested to use curonly=0, but this also has no effect. --Falcorian
I also found that the links given did not work, nor did any experiments creating my own URLs to get the history. However, submitting the parameters via a Ruby script did work. I don't know enough yet (about HTTP, HTML forms) to understand why this approach worked and the URL approach did not, but anyway here is some code that successfully retrieved the last 5 changes to the page on Patrick Donner and wrote the output to a file:
require 'net/http'
require 'uri'

# POST the form fields; a plain GET returns only the current revision.
res = Net::HTTP.post_form(URI.parse("http://en.wikipedia.org/w/index.php?"),
  {:title => "Special:Export", :pages => 'Patrick_Donner', :action => "submit", :limit => 5, :dir => "desc"})
# Write the returned XML to a file; the block form closes the file automatically.
File.open("donner_output_last_5.txt", "w") { |f| f << res.body }

Hope this helps. I wish I knew enough to provide a more general solution. Andropod 00:44, 17 January 2008 (UTC)

When you use the URL in a browser, you are submitting via GET. In the above Ruby script, you are using POST. This seems to be the solution, as for example:
curl -d "" 'http://en.wikipedia.org/w/index.php?title=Special:Export&pages=Main_Page&offset=1&limit=5&action=submit'

worked for me. Before updating this page, I'd like to check this with the source code. Tizio 12:46, 21 January 2008 (UTC)
Works for me as well, which is great! Now to crack open the Python... --Falcorian 03:29, 26 January 2008 (UTC)
For future reference, I get an access denied error when I try to use urllib alone in Python to request the page. However, if I use urllib2 (which allows you to set a custom header), then we can trick Wikipedia into thinking we're Firefox and it will return the page as expected. --Falcorian 06:57, 26 January 2008 (UTC)
# Python 2 only: urllib2 is used because it accepts custom request headers.
import urllib
import urllib2

# Browser-like User-Agent; without it the request is refused with "access denied".
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4'}
# Passing data makes urllib2 send a POST rather than a GET.
params = urllib.urlencode({'title': 'Special:Export', 'pages': 'User:Falcorian', 'action': 'submit', 'limit': 2})
req = urllib2.Request(url='http://en.wikipedia.org/w/index.php', data=params, headers=headers)
f = urllib2.urlopen(req)
print f.read()
This doesn't work for me. It doesn't stop at 2 versions. Neither does dir=desc work. --89.138.43.146 15:15, 25 July 2009 (UTC)
I have tried all of the above with urllib2 and with getwiki.py but it seems to me that the limit parameter has stopped working? Is this the case? --EpicEditor 1:03, 30 September 2009 (UTC)
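The urllib2 example above only runs on Python 2. A rough Python 3 equivalent might look like the sketch below; it only builds the request, so it can be inspected without touching the network. The function name build_export_request and the default User-Agent string are placeholders invented here, and whether limit and dir are still honoured by the server is exactly what the two replies above call into question.

```python
from urllib.parse import urlencode
from urllib.request import Request

EXPORT_URL = "http://en.wikipedia.org/w/index.php"  # URL used in the examples above


def build_export_request(pages, limit, direction=None,
                         user_agent="ExportSketch/0.1 (hypothetical)"):
    """Build a POST request for Special:Export without sending it."""
    fields = {
        "title": "Special:Export",
        "pages": pages,
        "action": "submit",
        "limit": limit,
    }
    if direction is not None:
        fields["dir"] = direction  # "desc" asks for newest revisions first
    body = urlencode(fields).encode("ascii")
    # Supplying data is what turns this into a POST rather than a GET.
    return Request(EXPORT_URL, data=body, headers={"User-Agent": user_agent})
```

To send it, pass the result to urllib.request.urlopen and read the response body. A descriptive User-Agent that identifies the script is probably a better choice than imitating a browser.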

Other parameters

I found these parameters in the source code:

curonly 
appears to override the other parameters, so that only the current version is exported
listauthors 
export list of contributors (?) if $wgExportAllowListContributors is true
wpDownload 
returns result as a file attachment: http://en.wikipedia.org/w/index.php?title=Special:Export&pages=XXXX&wpDownload
templates 
images 
(currently commented out in the source code)

I don't know what listauthors does exactly, maybe it's disabled on wikien. Tizio 15:40, 21 January 2008 (UTC)

Also, variable $wgExportMaxHistory is relevant here. Tizio 15:43, 21 January 2008 (UTC)

Also missing: "history" has a different meaning when used in a POST request (use default values for dir and offset, $wgExportMaxHistory for limit). Tizio 15:49, 21 January 2008 (UTC)

Recursive downloading

Hi, is there some way (without writing my own script) to recursively download the subcategories inside of the categories? I don't want to download the whole Wikipedia database dump to get the 10,000 or so pages I want. Thanks, JDowning 17:32, 13 March 2008 (UTC)
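Nobody in this thread points to a built-in way to do this, so if a script turns out to be unavoidable, the walk could be sketched roughly as follows. It assumes the API's list=categorymembers module, where each member carries "ns" and "title" and subcategories have namespace 14. The fetch_members(category) helper is hypothetical and left to the caller, and max_depth is an arbitrary safety limit.

```python
from collections import deque

CATEGORY_NS = 14  # namespace number of Category: pages


def categorymembers_params(category):
    """API parameters for listing the pages and subcategories of one category."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "max",
        "format": "json",
    }


def collect_pages(fetch_members, category, max_depth=3):
    """Walk a category tree breadth-first and return the sorted page titles.

    `fetch_members` is a caller-supplied (hypothetical) function returning
    the list of member dicts ({"ns": ..., "title": ...}) for one category.
    """
    pages = set()
    seen = set()
    queue = deque([(category, 0)])
    while queue:
        current, depth = queue.popleft()
        if current in seen:  # categories can contain each other
            continue
        seen.add(current)
        for member in fetch_members(current):
            if member["ns"] == CATEGORY_NS:
                if depth < max_depth:
                    queue.append((member["title"], depth + 1))
            else:
                pages.add(member["title"])
    return pages and sorted(pages) or []
```

The resulting titles could then be submitted to Special:Export in the pages field, one title per line.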

Export has changed

All the examples (and my script which has worked for months) return only the newest version now. Anyone have ideas? --Falcorian 05:58, 24 September 2008 (UTC)

Disable Special:Export for non-sysop users

Hello!

What's the best way to disable Special:Export for some user rights? Thanks --almaghi 14:40, 27 April 2009 (UTC)

See the main page; you have to change LocalSettings.php. Rumpsenate 16:32, 15 July 2009 (UTC)

Bug report on special export

[2]

This might be a misunderstanding. The description says "if the history parameter is true, then all versions of each page are returned." Rumpsenate 16:13, 15 July 2009 (UTC)

How to

I am trying to export en:Train to the Navajo Wikipedia (nv:Special:Import) as a test, but I am not having any luck. I don’t know much about commands or encoding, and I’m not certain that nv:Special:Import is properly enabled. I typed Train in the Export textbox and pressed export, and it opened a complex-looking page in my Firefox, but I can’t figure out what to do next. Nothing appears in nv:Special:Import. What am I missing? Stephen G. Brown 14:38, 17 September 2009 (UTC)

I want to limit how much history I get, but the link in the article doesn't work. 24 megabyte history too long for Wikia to upload

It says it imports the last 1000 edit histories. That results in a 24-megabyte file for Total Annihilation! The Wikia cannot import a file that large, so I need something smaller it can handle. I tried the link examples in the article, but they don't work. Click on either of them, and you'll see neither produces the results it is supposed to. Anyone have any ideas how I can do this? I became the administrator of the Total Annihilation Wikia and am trying to import the edit histories over. Dream Focus 11:11, 25 November 2009 (UTC)

The article does not mention if any request throttling should be implemented.

What kind of request throttling should be used to avoid a block? --Dc987 21:59, 22 March 2010 (UTC)
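No official limit is given anywhere on this page, so the following does not answer the question of how much throttling is enough. It is only a minimal sketch of how a script could space out its requests; the Throttle class is made up for illustration, and the interval passed to it is an arbitrary placeholder that the user has to choose.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None    # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A script would create one instance, for example Throttle(1.0) for one request per second, and call wait() immediately before each export request.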

Revisionid as a parameter?

I guess there is presently no way to use revisionid as a parameter? Tisane 01:54, 16 May 2010 (UTC)

It should be bundled, I think. --Diego Grez return fire 01:56, 16 May 2010 (UTC)
What do you mean by "bundled"? Tisane 03:00, 16 May 2010 (UTC)

Using History Flow with Special Export page

Hi, I am using History Flow (IBM) to see the editing history of an article. It is designed to get data from Special:Export. However, the Special:Export page now makes "only current revision" the default choice.

How can I get the entire history of an article at this stage? Can I change the settings of Special:Export?

Thanks for help! Zeyi 15:44, 24 May 2010 (UTC)

Does anyone have an answer to this old question? 193.190.244.18 16:01, 8 December 2010 (UTC)

New parameters

I did not include either pagelink-depth or images in my recent updates to the article because pagelink-depth appears to be broken (I know little about PHP, but $listdepth appears to be uninitialized in any file), and as stated in the code itself, images is deliberately disabled for the time being. I had typed up a description for pagelink-depth before realizing it didn't work anywhere, so I've included it below in case it's wanted in future.

pagelink-depth
Includes any linked pages to the depth specified. Hard-coded to a maximum depth of 5 unless overridden by $wgExportMaxLinkDepth.

RobinHood70 10:22, 22 November 2010 (UTC)

Actually, I've confirmed that this does work, and the wonderful people at bugzilla corrected the misperception of $listdepth not being initialized (it's initialized in the If statement - changed in a later revision to be more clear). Given that, I'm copying this back onto the page. – RobinHood70 talk 20:12, 22 November 2010 (UTC)