Extension:TextExtracts

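extracts
	Returns plain-text or limited HTML extracts of the given pages; this module cannot be used as a generator.
Prefix	ex
Required rights	None
POST only?	No
Generated help	Current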

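**MediaWiki extensions manual**
TextExtracts; Release status: stable
Implementation	API
Description	Provides an API to retrieve plain-text or limited HTML extracts of page content
Author	Max Semenik (MaxSem)
Compatibility policy	Snapshots are released along with MediaWiki. The master branch is not backward compatible.
MediaWiki	>= 1.42
Database changes	No
License	GNU General Public License 2.0 or later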
Parameters	$wgExtractsExtendOpenSearchXml; $wgExtractsRemoveClasses
Hooks used	ApiOpenSearchSuggest
Quarterly downloads	130 (ranked 48th)
Public wikis using	2,046 (ranked 189th)


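This extension comes with MediaWiki 1.34 and later, so you do not need to download it separately. You still need to follow the other instructions on this page.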

This extension is under code stewardship review and not actively maintained (task T256505). No new feature requests will be considered during this period.

For obtaining summaries in production, the Wikimedia Foundation recommends and uses the Page Content Service.

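The TextExtracts extension provides an API that returns extracts of page content as plain text or as limited HTML (HTML from which elements matching certain CSS classes have been removed).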

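Installation

Download the files and place them in a directory named TextExtracts inside your extensions/ folder.
Developers and code contributors should install the extension from Git instead, by running `cd extensions/` followed by `git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/TextExtracts`.
Add the following code at the bottom of your LocalSettings.php: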
```
wfLoadExtension( 'TextExtracts' );
```
Done. Navigate to Special:Version on your wiki to verify that the extension is successfully installed.

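Configuration settings

$wgExtractsRemoveClasses is an array of selectors of the form <tag>, <tag>.class, .<class>, or #<id>; matching elements are excluded from extracts.
For example, $wgExtractsRemoveClasses[] = 'dl'; removes indented text, which is often used for non-templated hatnotes that are not wanted in summaries.

extension.json defines the default values. One of them is the "noexcerpt" class, which can be added to any template to exclude that template from extracts.
$wgExtractsExtendOpenSearchXml defines whether TextExtracts should provide its extracts to the OpenSearch API module. The default is false.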

API

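The following documentation is the output of Special:ApiHelp/query+extracts, automatically generated by the pre-release version of MediaWiki that is running on this site (MediaWiki.org).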

prop=extracts (ex)


This module requires read rights.
Source: TextExtracts
License: GPL-2.0-or-later

Returns plain-text or limited HTML extracts of the given pages.

https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#API

Specific parameters:

Other general parameters are available.

exchars

How many characters to return. Actual text returned might be slightly longer.

Type: integer

The value must be between 1 and 1,200.

exsentences

How many sentences to return.

Type: integer

The value must be between 1 and 10.

exlimit

How many extracts to return. (Multiple extracts can only be returned if exintro is set to true.)

Type: integer or max

The value must be between 1 and 20.

Default: 20

exintro

Return only content before the first section.

Type: boolean

explaintext

Return extracts as plain text instead of limited HTML.

Type: boolean

exsectionformat

How to format sections in plaintext mode:

plain: No formatting.
wiki: Wikitext-style formatting (== like this ==).
raw: This module's internal representation (section titles prefixed with <ASCII 1><ASCII 2><section level><ASCII 2><ASCII 1>).

One of the following values: plain, raw, wiki

Default: wiki
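
As a rough illustration of the raw format described above, the sketch below rewrites raw section markers as wikitext-style headings. It assumes that each heading sits on its own line and that the section level is a single digit; neither detail is confirmed by the help text, and `raw_to_wiki` is a hypothetical helper, not part of TextExtracts.

```python
import re

# Assumed shape of a raw heading line: \x01\x02<level>\x02\x01<title>
RAW_HEADING = re.compile(r"^\x01\x02(\d)\x02\x01(.*)$", re.MULTILINE)


def raw_to_wiki(text: str) -> str:
    """Rewrite raw section markers as wikitext-style headings."""

    def replace(match: re.Match) -> str:
        level = int(match.group(1))
        title = match.group(2).strip()
        marks = "=" * level
        return f"{marks} {title} {marks}"

    return RAW_HEADING.sub(replace, text)
```

Client code that wants its own heading style could request exsectionformat=raw and adapt the replacement function accordingly.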

excontinue

When more results are available, use this to continue. More detailed information on how to continue queries can be found on mediawiki.org.

Type: integer
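
A minimal sketch of a continuation loop follows. It assumes the usual Action API behaviour in which a response with more results carries a `continue` object whose values are merged into the next request. The `fetch` callable is hypothetical: it stands in for whatever HTTP client sends the parameters to api.php and decodes the JSON reply.

```python
from typing import Callable, Dict, Iterator

Params = Dict[str, str]


def iter_batches(params: Params, fetch: Callable[[Params], dict]) -> Iterator[dict]:
    """Yield each API response, following continuation until none is left.

    `fetch` is a hypothetical callable that sends `params` to api.php
    and returns the decoded JSON response.
    """
    request = dict(params)
    while True:
        response = fetch(request)
        yield response
        if "continue" not in response:
            break
        # Merge the continuation values (such as excontinue) into the next request.
        continuation = {key: str(value) for key, value in response["continue"].items()}
        request = {**params, **continuation}
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP library.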

Example:

Get a 175-character extract: api.php?action=query&prop=extracts&exchars=175&titles=Therion
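
The request above can also be assembled programmatically. The sketch below builds such a URL with the standard library; `API_URL` is a placeholder endpoint, `build_extract_url` is a hypothetical helper, and `format=json` is added on the assumption that the caller wants a JSON response.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute your own wiki's api.php URL.
API_URL = "https://example.org/w/api.php"


def build_extract_url(title: str, chars: int, plaintext: bool = False) -> str:
    """Build a query URL for prop=extracts with an exchars limit."""
    if not 1 <= chars <= 1200:
        raise ValueError("exchars must be between 1 and 1,200")
    params = {
        "action": "query",
        "prop": "extracts",
        "exchars": str(chars),
        "titles": title,
        "format": "json",  # assumption: JSON output is wanted
    }
    if plaintext:
        # Boolean API parameters are enabled by their presence.
        params["explaintext"] = "1"
    return f"{API_URL}?{urlencode(params)}"
```

Checking the exchars range on the client side mirrors the limit documented above and avoids a round trip for an invalid value.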

Another example

api.php?action=query&prop=extracts&exchars=100&explaintext&titles=Earth

Result

{
    "query": {
        "pages": {
            "9228": {
                "pageid": 9228,
                "ns": 0,
                "title": "Earth",
                "extract": "Earth, also called the world and, less frequently, Gaia, (or Terra in some works of science fiction)..."
            }
        }
    }
}
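
A response shaped like the one above can be reduced to a title-to-extract mapping. This is a minimal sketch that assumes the JSON layout shown in the result; `extracts_by_title` is a hypothetical helper name.

```python
import json
from typing import Dict


def extracts_by_title(response_text: str) -> Dict[str, str]:
    """Map each page title in a prop=extracts response to its extract."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    # Pages without an extract are mapped to an empty string.
    return {page["title"]: page.get("extract", "") for page in pages.values()}
```

Falling back to an empty string keeps the caller from failing on pages for which no extract was generated.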

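Caveats

There are a few things to be aware of when using the API or software that calls it:

We do not recommend using `exsentences`. It does not work for HTML extracts, and there are many edge cases that it does not handle. For example, "Arm. gen. Ing. John Smith was a soldier." is treated as 4 sentences. We do not plan to fix this.
Inline images are stripped from the response, even in HTML mode. This means that if you use the Math extension and have formulas in the lead paragraph, they may not appear in the extract output.
In HTML mode, we cannot guarantee well-formed HTML. The resulting HTML may be invalid or malformed.
In plain-text mode:
- Citations may not be stripped (see phab:T197266).
- If a paragraph ends with an HTML tag (for example, a ref tag), the new line is dropped (see phab:T201946).
- New lines may be dropped after lists (see phab:T208132).
- Articles must begin with the lead paragraph for an extract to be generated. The use of any template, or of an unclosed or empty HTML element, may result in no preview for the article. For example, "<div></div>hello" will give an empty extract.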
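
To see why sentence counting is fragile, consider the naive splitter sketched below. It is not the algorithm TextExtracts uses; it is only an assumed, simplified stand-in that breaks on every period followed by whitespace, which is enough to reproduce the abbreviation problem described for `exsentences`.

```python
import re


def naive_sentences(text: str) -> list:
    """Split on a period followed by whitespace; abbreviations are not handled."""
    parts = re.split(r"(?<=\.)\s+", text.strip())
    return [part for part in parts if part]
```

Applied to "Arm. gen. Ing. John Smith was a soldier.", this splitter returns four pieces, because each abbreviation ends in a period. Requesting a character limit with exchars avoids depending on sentence detection.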

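Frequently asked questions

How do I remove content from page previews?

TextExtracts removes all elements marked with the .noexcerpt class. This behaviour is provided by the $wgExtractsRemoveClasses configuration variable, which also defines the other elements that are excluded.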

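See also

This extension is being used on one or more Wikimedia projects. This probably means that the extension is stable and works well enough to be used by such high-traffic websites. Look for this extension's name in Wikimedia's CommonSettings.php and InitialiseSettings.php configuration files to see where it is installed. A full list of the extensions installed on a particular wiki can be seen on the wiki's Special:Version page.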
