Extension:TextExtracts/zh

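The TextExtracts extension provides an API for retrieving extracts of page content as plain text or as limited HTML (HTML from which some CSS-styled content has been removed).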
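As a rough illustration of how such an extract might be requested, the sketch below only builds the request URL and does not contact any server. It assumes the extension is reached through the MediaWiki Action API as `action=query` with `prop=extracts`, and that the `explaintext` and `exintro` parameters select plain-text output and the lead section. None of these parameter names appear in the text above, and the endpoint `https://example.org/w/api.php` and the function name `build_extract_url` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_extract_url(api_url, title, plain_text=True, intro_only=True):
    """Build a query URL asking for an extract of one page.

    Assumption: the wiki exposes the extension as prop=extracts on the
    Action API; the parameter names are not taken from this page.
    """
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "format": "json",
    }
    if plain_text:
        params["explaintext"] = 1  # ask for plain text instead of limited HTML
    if intro_only:
        params["exintro"] = 1  # restrict the extract to the lead section
    return api_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint, used only to show the shape of the URL.
    url = build_extract_url("https://example.org/w/api.php", "Main Page")
    print(url)
    print(parse_qs(urlsplit(url).query))
```

The resulting URL can be passed to any HTTP client; the response format depends on the wiki and is not shown here.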

Configuration settings

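 * $wgExtractsRemoveClasses is an array of <tag>, <tag>.class, .<class> and #<id> selectors; matching elements are excluded from extracts.
 * For example, an entry can remove indented text, which is often used for non-templated hatnotes that are not wanted in summaries.
 * extension.json defines the default values, which include the class "noexcerpt"; this class can be added to any template to exclude it from extracts.
 * A separate setting defines whether TextExtracts should provide its extracts to the OpenSearch API module. Its default value is "false".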
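To make the four selector forms above concrete, here is a minimal sketch that classifies an entry as a tag, a tag with a class, a bare class, or an id. It is an illustration only and not the extension's own parsing code; the names `Selector` and `parse_selector` are hypothetical, and the sample entries other than "noexcerpt" are invented for the example.

```python
from typing import NamedTuple, Optional


class Selector(NamedTuple):
    tag: Optional[str]
    css_class: Optional[str]
    element_id: Optional[str]


def parse_selector(entry: str) -> Selector:
    """Classify one entry of the form <tag>, <tag>.class, .<class> or #<id>."""
    entry = entry.strip()
    if not entry:
        raise ValueError("empty selector")
    if entry.startswith("#"):
        # '#id' matches a single element by its id
        return Selector(None, None, entry[1:])
    if entry.startswith("."):
        # '.class' matches any element carrying that class
        return Selector(None, entry[1:], None)
    # 'tag' or 'tag.class'
    tag, dot, css_class = entry.partition(".")
    return Selector(tag, css_class if dot else None, None)


if __name__ == "__main__":
    # Only '.noexcerpt' comes from the text above; the rest are made up.
    for entry in ["dl", "span.example", ".noexcerpt", "#example-id"]:
        print(entry, "->", parse_selector(entry))
```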

Caveats
There are various things to be aware of when using the API:


 * We do not recommend using `exsentences`. It does not work for HTML extracts, and there are many edge cases in which it does not work correctly. For example, "Arm. gen. Ing. John Smith was a soldier." will be treated as 4 sentences. We do not plan to fix this.


 * Inline images are stripped from the response (even in HTML mode). This means that if you use the Math extension and have formulae in your lead section, they may not appear in the summary output.


 * In HTML mode we cannot guarantee well-formed HTML. The resulting HTML may be invalid or malformed.


 * In plaintext mode:
    * citations may not be stripped (see T197266)
    * if a paragraph ends with an HTML tag, e.g. a ref tag, new lines may be dropped (see T201946)
    * new lines may be dropped after lists (see T208132)
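The sentence-counting caveat above can be reproduced with a deliberately naive splitter. The sketch below is not the extension's algorithm; it is a hypothetical `naive_sentences` helper that breaks after ".", "!" or "?" followed by whitespace, which is enough to show why abbreviations inflate the count.

```python
import re


def naive_sentences(text: str) -> list:
    """Split text after '.', '!' or '?' when followed by whitespace.

    A deliberately simple rule: it cannot tell an abbreviation such as
    'gen.' from the end of a sentence.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


if __name__ == "__main__":
    sample = "Arm. gen. Ing. John Smith was a soldier."
    pieces = naive_sentences(sample)
    print(len(pieces))  # 4: each abbreviation is counted as its own sentence
    print(pieces)
```

A character-based limit avoids this particular problem, at the cost of possibly cutting the text mid-sentence.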

How can I remove content from page previews?
TextExtracts will strip any element that is marked with the class noexcerpt. This behaviour is controlled by the global $wgExtractsRemoveClasses.
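The effect of marking an element with noexcerpt can be sketched with Python's standard `html.parser`. This is an approximation written for illustration, not the extension's implementation: the class `NoExcerptStripper` and the function `strip_noexcerpt` are hypothetical, the sketch returns only the remaining text rather than HTML, and it assumes reasonably well-nested input.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NoExcerptStripper(HTMLParser):
    """Collect text, skipping everything inside elements with a given class."""

    def __init__(self, removed_class="noexcerpt"):
        super().__init__(convert_charrefs=True)
        self.removed_class = removed_class
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or self.removed_class in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def strip_noexcerpt(html: str) -> str:
    parser = NoExcerptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    page = ('<p>Kept.</p><div class="hatnote noexcerpt">Dropped '
            '<b>too</b>.</div><p>Also kept.</p>')
    print(strip_noexcerpt(page))
```

In practice, adding the noexcerpt class to a template's outer element is usually all that is needed; the sketch only shows what "stripped" means for nested content.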

See also

 * Page Content Service