Help:Export/zh

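Wiki pages can be exported in a special XML format. The exported XML can be imported into another MediaWiki instance (provided the import function is enabled on that instance and the user is a system administrator there), or used for other purposes, such as analysing the content. For exporting information about pages rather than the pages themselves, see m:Syndication feeds; for importing pages, see.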

How to export
There are at least four ways to export pages:


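 * Paste the names of the articles into the box on Special:Export, or use $ page.
 * The backup script dumps all wiki pages to an XML file. It only works on MediaWiki 1.5 or later. You need direct access to the server to run this script. Dumps of Wikimedia projects are regularly made available at https://dumps.wikimedia.org/.
 * Note: you may need to configure AdminSettings.php to run dumpBackup.php successfully. See MediaWiki for more information.
 * There is an OAI-PMH interface for regularly fetching pages that have been modified since a specific time. For Wikimedia projects this interface is not publicly available; see. OAI-PMH wraps the actual exported articles in its own wrapper format.
 * Use the Python Wikipedia Robot Framework. This is not explained here.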

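By default, only the current version of a page is included. Optionally, you can get all versions, each with its date, time, user name and edit summary. Optionally, the latest versions of all templates called directly or indirectly are also exported.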
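The options above correspond to fields on the Special:Export form. The following is a minimal sketch of building the form data in Python. The field names "pages", "curonly" and "templates" are assumptions about the form, not something this page documents, so check the form on your own wiki before relying on them.

```python
from urllib.parse import urlencode


def export_form_data(titles, current_only=True, include_templates=False):
    """Build URL-encoded form data for a Special:Export request.

    The field names ("pages", "curonly", "templates") are assumptions
    about the export form; verify them against your own wiki.
    """
    # One page name per line, exactly as in the Special:Export text box.
    fields = {"pages": "\n".join(titles)}
    if current_only:
        fields["curonly"] = "1"    # current version only
    if include_templates:
        fields["templates"] = "1"  # also export the templates that are used
    return urlencode(fields)


if __name__ == "__main__":
    print(export_form_data(["Main Page", "Help:Contents"], include_templates=True))
```

The resulting string can be sent as the body of a POST request to the wiki's Special:Export page with any HTTP client.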

Additionally, you can copy the SQL database. This is how dumps of the database were made available before MediaWiki 1.5; it is not explained further here.

Using "Special:Export"
For example, to export all pages of a namespace:

1. Get the names of the pages to export
I feel an example is better because the description below feels quite unclear.


 * 1) Go to Special:Allpages and choose the desired article/file.
 * 2) Copy the list of page names to a text editor
 * 3) Put all page names on separate lines
 * 4) You can achieve that relatively quickly if you copy the part of the rendered page with the desired names, and paste this into say MS Word - use paste special as unformatted text - then open the replace function (CTRL+h), entering ^t in Find what, entering ^p in Replace with and then hitting Replace All button. (This relies on tabs between the page names; these are typically the result of the fact that the page names are inside td-tags in the html-source.)
 * 5) The text editor Vim also allows for a quick way to fix line breaks: after pasting the whole list, run the command :1,$s/\t/\r/g to replace all tabs by carriage returns and then :1,$s/^\n//g to remove every line containing only a newline character.
 * 6) Another approach is to copy the formatted text into any editor exposing the html. Remove all <tr> and </tr> tags, then replace all <td> tags with <tr><td> and all </td> tags with </td></tr>; the html will then be parsed into the needed format.
 * 7) If you have shell and mysql access to your server, you can use this script:

mysql -umike -pmikespassword -hlocalhost wikidbname <<EOF
select page_title from wiki_page where page_namespace=0
EOF

''Note, replace mike and mikespassword with your own. Also, this example shows tables with the prefix wiki_''


 * 1) Prefix the namespace to the page names (e.g. 'Help:Contents'), unless the selected namespace is the main namespace.
 * 2) Repeat the steps above for other namespaces (e.g. Category:, Template:, etc.)

A similar script for PostgreSQL databases looks like this:

$ psql -At -U wikiuser -h localhost wikidb -c "select page_title from mediawiki.page"

''Note, replace wikiuser with your own wiki user; the database will prompt you for a password. This example shows tables without the prefix wiki_ and with the namespace specified as part of the table name.''
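The manual clean-up described above (one page name per line, no empty lines, namespace prefix added) can also be sketched in Python. The function name clean_title_list is hypothetical, and the sketch assumes the pasted names are separated by tabs or line breaks.

```python
def clean_title_list(pasted, namespace=""):
    """Split pasted text on tabs and newlines into one title per line."""
    titles = []
    for chunk in pasted.replace("\t", "\n").splitlines():
        title = chunk.strip()
        if not title:
            continue  # the Special:Export box should contain no empty lines
        if namespace:
            # Leave namespace empty for pages in the main namespace.
            title = f"{namespace}:{title}"
        titles.append(title)
    return "\n".join(titles)


if __name__ == "__main__":
    print(clean_title_list("Contents\tEditing\n\nExport\t", namespace="Help"))
```

The output can be pasted directly into the Special:Export text box.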

Alternatively, a quick approach for those with access to a machine with Python installed:


 * 1) Go to Special:Allpages and choose the desired namespace.
 * 2) Save the entire webpage as index.php.htm. Some wikis may have more pages than will fit on one screen of AllPages; you will need to save each of those pages.
 * 3) Run export_all_helper.py in the same directory as the saved file. You may wish to pipe the output to a file; e.g.   to send it to a file named "main".
 * 4) Save the page names output by the script.
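The helper script itself is not reproduced on this page. As an illustration only, here is a sketch of what such a helper might do: read the saved HTML and collect the page names from the links in the page list. It is not the actual export_all_helper.py. The container class name "mw-allpages-chunk" is an assumption about the Special:Allpages markup and may differ between MediaWiki versions.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they are skipped when tracking nesting.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class AllPagesTitles(HTMLParser):
    """Collect link titles found inside the page-list container."""

    def __init__(self, container_class="mw-allpages-chunk"):
        super().__init__()
        self.container_class = container_class  # assumed class name
        self.depth = 0  # greater than 0 while inside the container
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            if tag == "a" and attrs.get("title"):
                self.titles.append(attrs["title"])
        elif self.container_class in (attrs.get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def titles_from_saved_page(html_text):
    """Return the page names listed in one saved Special:Allpages page."""
    parser = AllPagesTitles()
    parser.feed(html_text)
    parser.close()
    return parser.titles
```

Calling titles_from_saved_page on the contents of index.php.htm and printing one name per line gives a list suitable for the next step.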

2. Perform the export
and finally...
 * Go to Special:Export and paste all your page names into the textbox, making sure there are no empty lines.
 * Click 'Submit query'
 * Save the resulting XML to a file using your browser's save facility.
 * Open the XML file in a text editor. Scroll to the bottom to check for error messages.

Now you can use this XML file to perform an import.
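Part of the check in the last step can be automated. The sketch below assumes that an export interrupted by an error is no longer well-formed XML, which is not guaranteed in every case; the function name exported_titles is hypothetical. It parses the saved file contents and lists the page titles found, so you can compare them with the names you submitted.

```python
import xml.etree.ElementTree as ET


def exported_titles(xml_text):
    """Return the page titles in an export; raise ValueError if it does not parse."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise ValueError(f"export is not well-formed XML: {err}") from err
    titles = []
    for element in root.iter():
        # Tags look like "{namespace-uri}title"; compare the local name only,
        # so the check does not depend on the schema version.
        if element.tag.rsplit("}", 1)[-1] == "title":
            titles.append(element.text)
    return titles
```

This loads the whole file into memory, so it is only suitable for small exports.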

Exporting the full history
A checkbox in the Special:Export interface selects whether to export the full history (all versions of an article) or the most recent version of articles. A maximum of 100 revisions are returned; other revisions can be requested as detailed in.

Export format
The format of the XML file you receive is the same whichever export method you use. It is codified in XML Schema at https://www.mediawiki.org/xml/export-0.10.xsd. This format is not intended for viewing in a web browser. Some browsers show you pretty-printed XML with "+" and "-" links to view or hide selected parts. Alternatively, the XML source can be viewed using the "view source" feature of the browser or, after saving the XML file locally, with a program of your choice. If you read the XML source directly, it won't be difficult to find the actual wikitext. If you don't use a special XML editor, "<" and ">" appear as &lt; and &gt; to avoid a conflict with XML tags; to avoid ambiguity, "&" is coded as "&amp;".
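The escaping described above can be reproduced with Python's standard library, which is a convenient way to convert wikitext copied out of the raw XML source back into its original form. The sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

wikitext = "<b>R&D</b>"        # made-up sample wikitext
encoded = escape(wikitext)     # the form it takes inside the XML source
decoded = unescape(encoded)    # back to the original wikitext

print(encoded)
print(decoded)
```

An XML parser performs the decoding step automatically, so this is only needed when working on the raw text.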

In the current version the export format does not contain an XML replacement of wiki markup (see Wikipedia DTD for an older proposal). You only get the wikitext, as you would when editing the page.

DTD
Here is an unofficial, short Document Type Definition version of the format. If you don't know what a DTD is just ignore it.

Processing XML export
Many tools can process the exported XML. If you process a large number of pages (for instance a whole dump), you probably won't be able to fit the document in main memory, so you will need a parser based on SAX or another event-driven method.

You can also use regular expressions to process parts of the XML code directly. This may be faster than other methods, but it is not recommended because the result is difficult to maintain.
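As an illustration of the event-driven approach, here is a minimal sketch using Python's xml.sax module. It records each page's title and the length of each revision's text, keeping only one element's text in memory at a time. The element names "page", "title" and "text" are taken to be those of the export schema; the class and function names are hypothetical.

```python
import xml.sax


class PageSizes(xml.sax.ContentHandler):
    """Record, per page, the title and the text length of each revision."""

    def __init__(self):
        super().__init__()
        self.pages = []      # list of (title, [text lengths])
        self._buffer = None  # character data of the element being read
        self._title = None
        self._sizes = []

    def startElement(self, name, attrs):
        if name == "page":
            self._title, self._sizes = None, []
        elif name in ("title", "text"):
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one element's text in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self._title = "".join(self._buffer)
            self._buffer = None
        elif name == "text":
            self._sizes.append(len("".join(self._buffer)))
            self._buffer = None
        elif name == "page":
            self.pages.append((self._title, self._sizes))


def page_sizes(source):
    """Parse a file name or binary file object and return the recorded pages."""
    handler = PageSizes()
    xml.sax.parse(source, handler)
    return handler.pages
```

For a whole dump you would normally act on each page inside endElement rather than collecting results in a list.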

Please list methods and tools for processing XML export here:


 * Parse::MediaWikiDump is a Perl module for processing the XML dump file.
 * m:Processing MediaWiki XML with STX - Stream based XML transformation
 * The m:IBM History flow project can read it after applying a small Python program, export-historyflow-expand.py.

Details and practical advice

 * To determine the namespace of a page you have to match its title to the prefixes defined in
 * Possible restrictions
 * (protected pages)
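The prefix matching mentioned in the first point can be sketched as follows. The prefix set used here is a made-up example, not the list defined for any particular wiki, and the sketch ignores case differences and underscores in prefixes.

```python
def split_namespace(full_title, prefixes):
    """Return (namespace, page name); namespace is "" for the main namespace."""
    prefix, separator, rest = full_title.partition(":")
    if separator and prefix in prefixes:
        return prefix, rest
    # No colon, or the text before the colon is not a known prefix.
    return "", full_title


if __name__ == "__main__":
    # Hypothetical prefixes; use the ones defined for your wiki.
    known = {"Help", "Category", "Template", "Talk"}
    print(split_namespace("Help:Export", known))
    print(split_namespace("Star Wars: A New Hope", known))
```

A title that merely contains a colon, such as the second example, stays in the main namespace because its leading part is not a known prefix.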

Why to export
Why not just use a dynamic database download?

Suppose you are building a piece of software that at certain points displays information that came from Wikipedia. If you want your program to display the information in a different way than can be seen in the live version, you'll probably need the wikicode that is used to enter it, instead of the finished html.

Also if you want to get all of the data, you'll probably want to transfer it in the most efficient way that's possible. The Wikimedia servers need to do quite a bit of work to convert the wikicode into html. That's time consuming both for you and for the Wikimedia servers, so simply spidering all pages is not the way to go.

To access any article in XML, one at a time, link to:

Special:Export/Title_of_the_article
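A link of this form can be built programmatically. The sketch below assumes the wiki serves pages under a base path such as https://example.org/wiki, which is a placeholder and not a real endpoint, and that spaces in titles are written as underscores.

```python
from urllib.parse import quote


def export_url(article_base, title):
    """Build a Special:Export link for a single page title."""
    # Spaces become underscores; other special characters are percent-encoded.
    page = quote(title.replace(" ", "_"), safe=":/")
    return f"{article_base}/Special:Export/{page}"


if __name__ == "__main__":
    # "https://example.org/wiki" is a placeholder base path.
    print(export_url("https://example.org/wiki", "Main Page"))
```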