Extension talk:TextExtracts


2003:C2:3F21:FD00:A1BE:8BA5:2092:924F (talkcontribs)

MediaWiki 1.39.6, PHP 7.4.3, MySQL 8.0.36


Prior to 1.39.6, the PagePreview/Popup/TextExtracts preview either showed some text from the target article or it showed "...". The ellipsis always occurred when there was _no text before the first heading_. But if there was an associated image, the preview showed "..." on the left-hand side and the image on the right-hand side.

Now, after the upgrade, the preview shows "Es gab ein Problem bei der Anzeige dieser Vorschau" ("There was a problem displaying this preview"), and no image is displayed.

How can we regain the previous behaviour?

2003:C2:3F21:FD00:134:BD68:4409:6542 (talkcontribs)

prop=extracts not working and send back Error 500

DAVY2018 (talkcontribs)

As the title says, when I try to use Popups with TextExtracts, Popups often shows "There was an issue displaying this preview".

When I inspect the page with Chrome's developer tools, the console shows a "500 Internal Server Error".

I tested each part of the API request in the API Sandbox and found that as soon as prop=extracts is included, the API returns error code 500. When it is removed, no error occurs and the output is normal.

Is there a reason why this happens, and is there any way to solve it?

P.S. I have set up short URLs with apache2 according to the tutorial on MediaWiki, and api.php is accessible without any problem.
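
To reproduce the failing request outside of Popups, the query can be built by hand. A minimal sketch, assuming a hypothetical helper buildExtractsUrl() and a placeholder endpoint; the parameters action=query, prop=extracts, exchars and titles are the ones used in the queries quoted on this page, and format=json is the standard API output parameter:

```php
<?php
// Hypothetical helper: builds a TextExtracts API query URL for one page title.
function buildExtractsUrl( string $apiEndpoint, string $title, int $chars = 1000 ): string {
    $params = [
        'action' => 'query',
        'prop' => 'extracts',
        'exchars' => $chars,
        'titles' => $title,
        'format' => 'json',
    ];
    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return $apiEndpoint . '?' . http_build_query( $params, '', '&' );
}

// Placeholder endpoint; replace with your own wiki's api.php URL.
echo buildExtractsUrl( 'https://wiki.example.org/w/api.php', 'Main Page' ), "\n";
```

Opening the resulting URL in a browser shows whether the 500 error comes from the API request itself rather than from Popups.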

Thiemo Kreuz (WMDE) (talkcontribs)

Is this question about a self-hosted wiki? An error 500 could be anything. You would need to find the responsible error message in your server's log files. Manual:How to debug might help.
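
A minimal sketch of the kind of temporary LocalSettings.php debugging settings that Manual:How to debug describes. The log file path is a placeholder, and these settings should be removed again on a public wiki:

```php
// Temporary debugging settings for LocalSettings.php (remove after use).
error_reporting( -1 );
ini_set( 'display_errors', 1 );

// Show details of uncaught exceptions instead of a generic error page.
$wgShowExceptionDetails = true;

// Placeholder path; the file must be writable by the web server.
$wgDebugLogFile = '/var/log/mediawiki/debug.log';
```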

DAVY2018 (talkcontribs)

Yes, the wiki is a self-hosted wiki.

Thank you for your advice. I will try to figure it out from the log files.

DAVY2018 (talkcontribs)

After debugging, the output shows the following:

Fatal error: Declaration of TextExtracts\ExtractFormatter::onHtmlReady(string $html): string must be compatible with HtmlFormatter\HtmlFormatter::onHtmlReady($html) in /var/www/<my wiki name>/w/extensions/TextExtracts/includes/ExtractFormatter.php on line 66

Does this mean that the extension's PHP code has an error?

DAVY2018 (talkcontribs)

Problem solved after running "composer require wikimedia/html-formatter".


No text extraction in SMW 4.02 + MW 1.39

Lotusccong (talkcontribs)

When I run this query: https://www.tbpedia.org/api.php?action=query&prop=extracts&exchars=1000&titles=%E9%A6%96%E9%A0%81

it shows the following extraction message:

{ "batchcomplete": "", "warnings": { "extracts": { "*": "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are listed at https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#Caveats." } }, "query": { "pages": { "1": { "pageid": 1, "ns": 0, "title": "\u9996\u9801", "extract": "\n" } } } }

It seems that no text has been extracted.

When I use the Popups extension, it shows "There was an issue displaying this preview".

Joe Beaudoin Jr. Redux (talkcontribs)
  1. Did you ever solve this?
  2. The above link only shows the NewPP limit report commented-out text as an extract, which would explain the "Issues displaying this preview" error:

{
    "batchcomplete": "",
    "warnings": {
        "extracts": {
            "*": "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are listed at https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#Caveats."
        }
    },
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "ns": 0,
                "title": "\u9996\u9801",
                "extract": "<!-- \nNewPP limit report\nCached time: 20230324145510\nCache expiry: 3600\nReduced expiry: true\nComplications: []\n[SMW] In\u2010text annotation parser time: 0.002 seconds\nCPU time usage: 0.031 seconds\nReal time usage: 0.031 seconds\nPreprocessor visited node count: 9/1000000\nPost\u2010expand include size: 10/2097152 bytes\nTemplate argument size: 0/2097152 bytes\nHighest expansion depth: 2/100\nExpensive parser function count: 0/100\nUnstrip recursion depth: 0/20\nUnstrip post\u2010expand size: 0/5000000 bytes\n-->\n<!--\nTransclusion expansion time report (%,ms,calls,template)\n100.00%    0.000      1 -total\n-->"
            }
        }
    }
}
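
The extract in the response above contains nothing but HTML comments, so no visible text is left for a preview. A minimal sketch of how such a response could be checked, assuming a hypothetical helper extractHasVisibleText() that is not part of TextExtracts or Popups:

```php
<?php
// Hypothetical helper: reports whether an extract contains any visible text
// once HTML comments and tags have been removed.
function extractHasVisibleText( string $extract ): bool {
    // Remove HTML comments such as the "NewPP limit report" block.
    $withoutComments = preg_replace( '/<!--.*?-->/s', '', $extract ) ?? $extract;
    // Remove remaining tags, then check whether anything but whitespace is left.
    return trim( strip_tags( $withoutComments ) ) !== '';
}

var_dump( extractHasVisibleText( "<!-- \nNewPP limit report\n-->\n<!--\nreport\n-->" ) );
var_dump( extractHasVisibleText( "<p>Some intro text.</p><!-- report -->" ) );
```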

Lotusccong (talkcontribs)

Hi Joe,

The issue is not resolved. What does "NewPP limit report commented-out text as an extract" mean?

Lotusccong (talkcontribs)

Hi Joe,

If you open this link: https://www.tbpedia.org/w/api.php?action=query&prop=extracts&exchars=1000&titles=%E9%A6%96%E9%A0%81

you will notice that the extract only shows the NewPP limit report comment. This causes the Popups extension to say "There was an issue displaying this preview". See here: https://www.tbpedia.org/wiki/%E7%9B%A7%E5%8B%9D%E5%BD%A5%E6%96%87%E9%9B%86%E7%BF%BB%E8%AD%AF%E7%B6%AD%E5%9F%BA%E9%A4%A8

I reinstalled MW with 1.39.3, PHP 8.0.28, SMW 4.1.1, TextExtracts – (74baaa7) 17:23, 20 March 2023, Previews – (010237d) 15:23, 21 March 2023, PageImages – (78537e6) 15:23, 21 March 2023.

I am using the Short URL as well.

Initially, the previews were working fine. But after I installed more extensions, one of them (I can't figure out which one) caused this error. I removed the installed extensions, but the error still persists.

I suspect that one of the extensions I installed with Composer may have messed up the libraries. Or is there a JavaScript conflict?

I thought that if the issue was caused by a conflict between extensions, I could just remove the installed extensions one by one, but it doesn't work even after I removed them (no longer loading them from LocalSettings.php).

If the preview issue were caused by the Popups extension, then the TextExtracts API should still be working.

I have enabled the debug toolbar for easier troubleshooting.

I would really appreciate it if anyone could help troubleshoot this issue.

Thanks in advance.

Lotusccong (talkcontribs)

Today, when I checked the page 盧勝彥文集翻譯維基館 - 真佛百科 True Buddha Pedia (tbpedia.org), items 3 and 5 showed the preview, but items 2 and 4 did not.

It really puzzles me why a day ago none of the 4 links could show the preview, and now two out of four can.

Any clues as to what went wrong? Is it due to the cache? Due to the page content?

5 minutes later, none of the links could show the preview. This confused me even more. What is the root cause of the preview not being displayed? I didn't make any changes to the configuration.

DAVY2018 (talkcontribs)

Sorry to reopen this old thread, but I want to ask whether there is any solution to this question.

Lotusccong (talkcontribs)

Based on my case, it seems that there is nothing wrong with TextExtracts or Popups. I noticed that TextExtracts will not extract anything from an article that begins with a heading. You need to have some text before the heading.

DAVY2018 (talkcontribs)

Yes, TextExtracts did not extract anything from articles that begin with a heading. But for me, despite having text before the headings, TextExtracts still outputs nothing, and Popups keeps showing "There was an issue displaying this preview".

That's why I would like to ask about your past experience and see if it is useful.

Lotusccong (talkcontribs)

If you don't mind, please share the link and I can test it on my wiki site.

DAVY2018 (talkcontribs)

May I know which link you want me to share?

Lotusccong (talkcontribs)

The page where Popups showed "There was an issue displaying this preview".


How to remove thumb caption from extracts?

Summary by Thiemo Kreuz (WMDE)

Unclear question.

Wess (talkcontribs)

We saw that thumb captions are shown in the extract if an image is in the first paragraph. Is there a way to remove them? We tried adding "figure" + "figcaption" (MW 1.40) to wgExtractsRemoveClasses with no success. Manually adding the "noexcerpt" class to the image did work.

Thiemo Kreuz (WMDE) (talkcontribs)

On which wiki does this happen? What version of MediaWiki are you using?

<figure> is already part of the list of elements to remove. Since <figcaption> is inside of <figure> it will be removed as well. Maybe your wiki's configuration modifies $wgExtractsRemoveClasses in an unexpected way? Maybe your $wgParserEnableLegacyMediaDOM configuration changed, but TextExtracts wasn't updated?

How to remove one of default values in $ExtractsRemoveClasses?

Radouch (talkcontribs)

I use some kind of layout for pages of my wiki. It means that almost every page begins with the div tag.

Unfortunately, div is among default items in $ExtractsRemoveClasses array (defined in extension.json of this extension). So no text is displayed by Extension:Popups for those pages as content inside div element is ignored by TextExtracts.

I would like to remove div item from $ExtractsRemoveClasses in my LocalSettings.php, but I cannot find the right way to do it. Some ideas, please?

As a workaround, I removed div from extension.json, but I am sure it is a bad practice.

Joe Beaudoin Jr. Redux (talkcontribs)

Unfortunately, this is the only way to do this at this point... and you need to do it if you use the Citizen skin, as of this writing anyway.


badtoken error

Nardog (talkcontribs)

I used to use this API to get excerpts from Wiktionary on Wikipedia in JavaScript, but now (since a few weeks ago perhaps) it returns a "badtoken" error. I can use other APIs on Wiktionary from Wikipedia alright, including Parse, so this is odd.

$wgExtractsIncludeClasses needed

Krabina (talkcontribs)

It would be great if there was a parameter $wgExtractsIncludeClasses where classes could be defined that should be included in the text extracts. I often use some kind of div with styling information for the first paragraph as well, which is then not included in the extracts. If I want this to work, I always have to start with some plain text, which is quite inflexible.


Return value of template parameter as summary

Summary by Jonathan3

No need - it uses parsed wikitext (i.e. HTML of page) - just needed to fix $ExtractsRemoveClasses to get it all to work for my pages.

Jonathan3 (talkcontribs)

How would I go about this? Most pages on my site are created from template calls (using Extension:Cargo) without any other text or headings. So no summary is extracted.

I see that it's possible to create a new API but to be able to do that I'd mostly need to copy an existing one :-)

Thiemo Kreuz (WMDE) (talkcontribs)

Is this about Popups, or about other usages of the TextExtracts API? It might be possible to customize the existing TextExtracts code so it supports your use-case better. Unfortunately, staff (like me) is probably not able to give a lot of support for customizations like this. If you are able to submit patches that make the TextExtracts extension work better with other extensions like Cargo, making it better for everyone, we can have a look at these patches.

Jonathan3 (talkcontribs)

Thanks again. It's about TextExtracts, which Popups on my site would use (I understand that WMF sites use something else). Cargo is only part of the background information, as my reason for having template-only pages (though it may be that most Cargo websites use it for infoboxes after introductory text, so mine may be a minority interest). If I can work it out I'll submit patches!

Jonathan3 (talkcontribs)

It turned out to be fairly easy. It works fine for me now after I got rid of "div" within "ExtractsRemoveClasses" in TextExtracts's extension.json file. Is there any way of making that change in LocalSettings.php instead?

(Initially I had wrongly assumed the extension looked at the raw wikitext but once I saw it used api.php?action=parse it became clearer...)

Thiemo Kreuz (WMDE) (talkcontribs)

No easy way, but it's possible:

$wgHooks['MediaWikiServices'][] = function () {
   global $wgExtractsRemoveClasses;
   $wgExtractsRemoveClasses = array_diff( $wgExtractsRemoveClasses, [ 'div' ] );
};
Jonathan3 (talkcontribs)

That seems to work - thanks!

STEM (talkcontribs)

I added this to my wiki (MW 1.35.2 on IIS 10), but it does not seem to strip out the headings. The latest extension version for 1.35 is installed.

The typical wiki markup for the first few lines of a page has __NOTITLE__ __NOTOC__, then a = Page Title Text =, then the "body" text. It just does not pick up the body text for some reason unless there is no = Page Title Text = at the start of the article. Also, why can't this work on category pages?

After trying every fix below that could be applied to LocalSettings.php, I threw in everything I could find, but it's still not working except as above.

$wgExtractsRemoveClasses[] = array( 'dl', 'h1', 'h2', 'h3', 'div', '.mw-editsection', 'table', 'sup.reference', '.error', '.nomobile' ) ;


Here's a link to a page on the wiki in question. It has a mix of links that work and don't work as per my explanation above.

https://dardpi.ca/wiki/index.php?title=The_Last_Train#The_Last_Train

Sorry about posting below in a thread that was already basically finished.

Thiemo Kreuz (WMDE) (talkcontribs)

The problem is that your articles don't have an intro text before the first headline. But this is what the Popups extension expects. You need to disable this. The following combination kind of works for me:

$wgPopupsTextExtractsIntroOnly = false;
$wgExtractsRemoveClasses = [ 'h1', 'h2' ];

Note that most of the "classes" in your list are already in the configuration by default and don't need to be added another time.
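
One detail of the configuration line quoted above: in PHP, writing "[] =" with an array on the right-hand side appends the whole list as a single nested element rather than adding each entry. A minimal sketch of the difference, using a hypothetical stand-in array instead of the real $wgExtractsRemoveClasses defaults:

```php
<?php
// Hypothetical stand-in for a list of default selectors (not the real defaults).
$defaults = [ 'table', 'div' ];

// "[] =" with an array on the right appends ONE element, which is itself an array.
$nested = $defaults;
$nested[] = [ 'h1', 'h2' ];

// array_merge() (or a plain assignment of a flat list) keeps every entry a string.
$flat = array_merge( $defaults, [ 'h1', 'h2' ] );

var_dump( count( $nested ) ); // 3 elements, the last one is an array
var_dump( count( $flat ) );   // 4 elements, all strings
```

Whether a nested entry like this is ignored or causes problems in TextExtracts is not confirmed here; the sketch only shows what PHP stores in the array.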

STEM (talkcontribs)

Thank you for the reply. Unfortunately it didn't work any better, although setting ExtractIntroOnly=False makes a lot of sense. It must be those pesky =Title= or ==Title== headlines. Maybe __NOTITLE__ or __NOTOC__ is causing some issues?


Actually I set up a number of test pages and discovered that if the =Headline=, ==Headline== was first on the page, it would generate the "..." extract.

https://nardpi.ca/wiki/index.php?hidebots=1&limit=50&days=7&enhanced=1&title=Special:RecentChanges&urlversion=2


Why does this extension not work on a Category page?

What I'm having a hard time understanding is the documentation that is aimed at programmers. For us configure-type guys it is not clear whatsoever. I would like to know if these items listed under:

prop=extracts (ex)

Can these be used in LocalSettings.php, and if so, how? (An example would be nice.)

Thiemo Kreuz (WMDE) (talkcontribs)

I can't tell what's wrong, sorry. I tested your examples locally (esp. TextExtractTest and TextExtractTest4), and they work for me when I use the two configuration changes mentioned above.

Popups uses the TextExtracts API internally with some of the parameters described at Extension:TextExtracts#API. I guess you don't want to use the TextExtracts API manually, so this will probably not help. The only parameter that can be changed is the "intro" one, see "…IntroOnly" above.

Category pages usually don't contain anything that could be shown in a popup and are excluded by default. See Topic:W9vaqx3tknn5mp6v#flow-post-w9z1ow1y645g2giy.

STEM (talkcontribs)

You have been extremely helpful. Did you test my examples on an IIS 10 or Apache server? I'm on IIS, on Windows Server 2019. What PHP version and wiki version? I'm on PHP 7.4.19 and MW 1.35.2. If you check your Special:Version page, what do you see for a version of TextExtracts? Mine doesn't show a version at all.

https://nardpi.ca/wiki/index.php?title=Special:Version

Is there a critical PHP extension or configuration item I might be missing?

Just so we're on the same page, this is my LocalSettings.php configuration:

//Popups Extension
wfLoadExtensions( [
    'TextExtracts',
    'PageImages',
    'Popups'
] );
$wgPopupsHideOptInOnPreferencesPage = true;
$wgPopupsReferencePreviewsBetaFeature = false;

//PageImages Extension - works with Popups
wfLoadExtension( 'PageImages' );

//Text Extracts Extension - works with Popups
wfLoadExtension( 'TextExtracts' );
$wgPopupsTextExtractsIntroOnly = false;
$wgExtractsRemoveClasses = [ 'h1', 'h2', 'h3' ];


Thanks!

Reply to ".. preview"

Hide equal characters and headings

Summary by Thiemo Kreuz (WMDE)
Spiros71 (talkcontribs)

I am using the Popups extension on MW 1.31. Headings (level 2/3) are displayed with equal signs on the left and right. Is there a way not to show them?

Here is an image:

I tried adding this in TextExtracts\includes\ExtractFormatter.php but it did not seem to work.

$text = preg_replace( "={2,}(.*?)={2,}", "", $text );

Surrounding the headings (found in the template) with <div class="noexcerpt"> did partially solve the problem, but resulted in the mobile version (MinervaNeue) headings not being expandable/collapsible.
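
One likely reason the preg_replace() call above had no effect is that the pattern has no delimiters, which PCRE patterns in PHP require. A minimal sketch with delimiters, assuming headings appear on their own line as "== Heading =="; stripWikiHeadings() is a hypothetical standalone helper, not part of ExtractFormatter.php:

```php
<?php
// Hypothetical helper: removes lines of the form "== Heading ==" from plain text.
function stripWikiHeadings( string $text ): string {
    // "/" delimiters are required; the "m" flag makes ^ and $ match at line boundaries.
    $result = preg_replace( '/^={2,}[^=\n]+={2,}[ \t]*$/m', '', $text );
    // preg_replace() returns null on error; fall back to the input in that case.
    return $result ?? $text;
}

echo stripWikiHeadings( "Intro text\n== History ==\nMore text" ), "\n";
```

Changes made directly in extension files are overwritten on upgrade, so a helper like this is mainly useful for experimenting.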

Thiemo Kreuz (WMDE) (talkcontribs)

Do you have a link to the actual page where this happens?

Spiros71 (talkcontribs)

I added the following in ExtractsRemoveClasses in TextExtracts\extension.json and it seemed to fix it.

"h2",

"h3",

Thiemo Kreuz (WMDE) (talkcontribs)

I would love to know more about your setup. See, every Wikipedia article starts with a level 2 headline. Why don't we have this issue, but you have it? What's the difference?

Utilizing the configuration is a nice idea. However, you should not edit extension.json but add $wgExtractsRemoveClasses[] = 'h2'; $wgExtractsRemoveClasses[] = 'h3'; to your LocalSettings.php.

Spiros71 (talkcontribs)

Thanks so much for the feedback. The difference may be that the level 2 headings are declared in the templates. Level 3 are in page text.


Here is the wiki http://lsj.gr/

Setup:

MediaWiki 1.31.12

PHP 7.3.25 (fpm-fcgi)

MariaDB 10.3.27-MariaDB-log

ICU 50.2

Elasticsearch 5.6.13


Extensions:

API, CirrusSearch, Cite, Data Transfer, Elastica, Extension, Gadgets, LabeledSectionTransclusion, Lockdown, MobileFrontend, Normalizer, Nuke, Other, Page Forms, PageImages, Parser hooks, ParserFunctions, ParserHooks, Popups, Replace Text, Semantic Drilldown, Semantic MediaWiki, Semantic Result Formats, Special pages, TextExtracts,

Thiemo Kreuz (WMDE) (talkcontribs)

The template is indeed the reason. Thanks a lot for the follow-up! I was now able to replay the issue locally. I created phab:T271439 to discuss possible fixes.

Spiros71 (talkcontribs)

Good job, Thiemo!
