Extension talk:TextExtracts

About this board

Previous page history was archived for backup purposes at Extension talk:TextExtracts/LQT Archive 1 on 2015-06-25.

Popups/TextExtracts 1.39

2 comments • 11:10, 23 April 2024

2003:C2:3F21:FD00:A1BE:8BA5:2092:924F

MediaWiki 1.39.6, PHP 7.4.3, MySQL 8.0.36

Prior to 1.39.6, the PagePreview/Popup/TextExtracts either showed some text from the target article or showed "...". The ellipsis always occurred when there was _no text before the first heading_. But if there was an associated image, the preview showed "..." on the left-hand side and the image on the right-hand side.

Since the upgrade, the preview only shows "Es gab ein Problem bei der Anzeige dieser Vorschau" ("There was a problem displaying this preview"), and no image is displayed.

How can we regain the previous behaviour?

09:15, 18 April 2024

2003:C2:3F21:FD00:134:BD68:4409:6542

This topic can be closed; it was posted in the wrong place. See Topic:Y3eq158cl5otcgdt instead.

10:06, 23 April 2024

prop=extracts not working and returns error 500

5 comments • 06:33, 6 April 2024

DAVY2018

As the title says, when I use Popups with TextExtracts, the popup often shows "There was an issue displaying this preview".

When I inspect the requests and the console in Chrome's developer tools, they show a "500 Internal Server Error".

I have used the API sandbox to test each part of the API request and found that as soon as prop=extracts is included, the request returns error 500. When it is removed, no error is returned and the output is normal.
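One way to narrow this down further is to send the failing request outside the sandbox and compare the responses with and without prop=extracts while watching the server's error log. A minimal PHP sketch that only builds the request URL; the base URL https://example.org/w/api.php and the helper name buildExtractsUrl are placeholders, while action, format, prop, titles, exchars and exintro are standard query/TextExtracts API parameters:

```php
<?php
// Sketch: build a prop=extracts request like the one the API sandbox sends.
// The base URL passed in below is a placeholder for your wiki's api.php.
function buildExtractsUrl( string $apiBase, string $title, bool $introOnly ): string {
    $params = [
        'action' => 'query',
        'format' => 'json',
        'prop' => 'extracts',
        'titles' => $title,
        'exchars' => 1000,
    ];
    if ( $introOnly ) {
        // Only return the content before the first section heading
        $params['exintro'] = 1;
    }
    // http_build_query() percent-encodes the values (spaces become "+")
    return $apiBase . '?' . http_build_query( $params );
}

echo buildExtractsUrl( 'https://example.org/w/api.php', 'Main Page', true ), "\n";
```

The printed URL can be opened in a browser or fetched with curl; removing the prop=extracts part gives the comparison request.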

Is there a reason why this happens, and is there any way to solve it?

P.S. I have set up short URLs with Apache 2 following the MediaWiki tutorial, and api.php itself is accessible without any problems.

Edited 14:45, 5 April 2024

Thiemo Kreuz (WMDE)

Is this question about a self-hosted wiki? An error 500 could have many causes. You need to find the corresponding error message in your server's log files. Manual:How to debug might help.
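For a self-hosted wiki, a sketch of temporary LocalSettings.php settings along the lines of those described on Manual:How to debug, so that the real error replaces the bare 500 response. The log file path is only an example and must be writable by the web server:

```php
// LocalSettings.php: temporary debugging settings, remove them once the error is found.
// Report all PHP errors and print them instead of a bare "500 Internal Server Error".
error_reporting( -1 );
ini_set( 'display_errors', 1 );
// Include exception messages and backtraces in MediaWiki's error output.
$wgShowExceptionDetails = true;
// Write MediaWiki's debug log to a file (example path, must be writable).
$wgDebugLogFile = "/var/log/mediawiki/debug-{$wgDBname}.log";
```

These settings expose internal details, so they belong on a test wiki or should be reverted afterwards.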

16:03, 5 April 2024

DAVY2018

Yes, it is a self-hosted wiki.

Thank you for the advice; I will try to figure it out from the log files.

04:57, 6 April 2024

DAVY2018

After debugging, I got the following error:

Fatal error: Declaration of TextExtracts\ExtractFormatter::onHtmlReady(string $html): string must be compatible with HtmlFormatter\HtmlFormatter::onHtmlReady($html) in /var/www/<my wiki name>/w/extensions/TextExtracts/includes/ExtractFormatter.php on line 66

Does this mean that there is an error in the extension's PHP code?

06:01, 6 April 2024

DAVY2018

Problem solved after running "composer require wikimedia/html-formatter".
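The fatal error above is a PHP signature-compatibility error: TextExtracts' ExtractFormatter declares onHtmlReady(string $html): string, while the installed wikimedia/html-formatter still had an untyped onHtmlReady($html), and a subclass may not narrow a parameter's type. Updating the library with Composer presumably brought in a version whose signature matches. BaseFormatter and ExtractFormatterSketch below are hypothetical stand-ins, not the real library code, and only show the compatible combination:

```php
<?php
// Hypothetical stand-ins, NOT the real wikimedia/html-formatter or TextExtracts classes.

// Parent class as in a library version whose hook method is typed.
class BaseFormatter {
    public function onHtmlReady( string $html ): string {
        return $html;
    }

    public function getText( string $html ): string {
        // The parent calls the overridable hook; subclasses customise it.
        return $this->onHtmlReady( $html );
    }
}

// Subclass with exactly the parent's signature: PHP accepts this declaration.
// Against an untyped parent method onHtmlReady( $html ), the same declaration
// would narrow the parameter from "any type" to string, which PHP rejects with
// "Declaration of ... must be compatible with ...".
class ExtractFormatterSketch extends BaseFormatter {
    public function onHtmlReady( string $html ): string {
        return trim( $html );
    }
}

echo ( new ExtractFormatterSketch() )->getText( "  <p>Intro</p>  " ), "\n";
```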

06:33, 6 April 2024

No text extraction in SMW 4.02 + MW 1.39

11 comments • 00:00, 6 April 2024

Lotusccong

When I run this query https://www.tbpedia.org/api.php?action=query&prop=extracts&exchars=1000&titles=%E9%A6%96%E9%A0%81

It returns the following response:

{ "batchcomplete": "", "warnings": { "extracts": { "*": "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are listed at https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#Caveats." } }, "query": { "pages": { "1": { "pageid": 1, "ns": 0, "title": "\u9996\u9801", "extract": "\n" } } } }
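A quick way to spot this case programmatically: the sketch below decodes the response above (warnings omitted) and reports whether any page has a non-blank extract. hasUsableExtract is a hypothetical helper name; the loop works whether query.pages is keyed by page ID (the default output) or is a list (formatversion=2):

```php
<?php
// Response quoted above, with the "warnings" part left out for brevity.
$json = '{"batchcomplete":"","query":{"pages":{"1":{"pageid":1,"ns":0,"title":"\u9996\u9801","extract":"\n"}}}}';

// Returns true if at least one page has an extract that is not just whitespace.
function hasUsableExtract( string $json ): bool {
    $data = json_decode( $json, true );
    foreach ( $data['query']['pages'] ?? [] as $page ) {
        if ( trim( $page['extract'] ?? '' ) !== '' ) {
            return true;
        }
    }
    return false;
}

var_dump( hasUsableExtract( $json ) ); // bool(false): the extract is only "\n"
```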

It seems that no text has been extracted.

When I use the Popups extension, it shows "There was an issue displaying this preview".

10:41, 29 December 2022

Joe Beaudoin Jr. Redux

Did you ever solve this?
The above link only shows the NewPP limit report commented-out text as an extract, which would explain the "Issues displaying this preview" error:

{
  "batchcomplete": "",
  "warnings": {
    "extracts": {
      "*": "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are listed at https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#Caveats."
    }
  },
  "query": {
    "pages": {
      "1": {
        "pageid": 1,
        "ns": 0,
        "title": "\u9996\u9801",
        "extract": "\n"
      }
    }
  }
}

Edited 14:58, 24 March 2023

Lotusccong

Hi Joe,

The issue is not resolved. What does "NewPP limit report commented-out text as an extract" mean?

15:44, 9 April 2023

Lotusccong

Hi Joe,

If you open this link https://www.tbpedia.org/w/api.php?action=query&prop=extracts&exchars=1000&titles=%E9%A6%96%E9%A0%81

you will notice that the extract only contains the commented-out NewPP limit report. This causes the Popups extension to show "There was an issue displaying this preview". See, for example, https://www.tbpedia.org/wiki/%E7%9B%A7%E5%8B%9D%E5%BD%A5%E6%96%87%E9%9B%86%E7%BF%BB%E8%AD%AF%E7%B6%AD%E5%9F%BA%E9%A4%A8

I reinstalled MediaWiki 1.39.3 with PHP 8.0.28, SMW 4.1.1, TextExtracts (74baaa7, 17:23, 20 March 2023), Previews (010237d, 15:23, 21 March 2023) and PageImages (78537e6, 15:23, 21 March 2023).

I am using short URLs as well.

Initially, the previews were working fine. But after I installed more extensions, one of them (I can't figure out which one) caused this error. I removed the installed extensions, but the error still persists.

I suspect that one of the extensions I installed with Composer may have broken a library. Or is there a JavaScript conflict?

Thinking the issue might be caused by a conflict between extensions, I removed the installed extensions one by one (no longer loading them from LocalSettings.php), but the error persisted.

If the preview issue were caused by the Popups extension, the TextExtracts API itself should still be working.

I have enabled the debug toolbar to make troubleshooting easier.

I would really appreciate it if anyone could help troubleshoot this issue.

Thanks in advance.

Edited 14:38, 11 April 2023

Lotusccong

Today, when I checked the page 盧勝彥文集翻譯維基館 - 真佛百科 True Buddha Pedia (tbpedia.org), items 3 and 5 showed a preview, but items 2 and 4 did not.

It really puzzles me why a day ago none of the four links showed a preview, while now two out of four do.

Any clues what went wrong? Is it due to caching? Or to the page content?

Five minutes later, none of the links showed a preview. This confused me even more. What is the root cause of the missing previews? I didn't make any changes to the configuration.

Edited 04:05, 12 April 2023

DAVY2018

Sorry to reopen this old thread, but is there any solution to this problem?

07:06, 5 April 2024

Lotusccong

Based on my case, there seems to be nothing wrong with TextExtracts or Popups. I noticed that TextExtracts will not extract anything from an article that begins with a heading. You need to have some text before the first heading.
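A simplified illustration of why a leading heading yields an empty intro-only extract. This is not the actual TextExtracts implementation; introText is a hypothetical function that keeps only the HTML before the first h1 to h6 tag and strips the markup:

```php
<?php
// Simplified illustration, NOT the real TextExtracts algorithm:
// an intro-only extract keeps just the content before the first section heading.
function introText( string $html ): string {
    // Split once, at the first <h1> ... <h6> opening tag.
    $parts = preg_split( '/<h[1-6][\s>]/i', $html, 2 );
    // Whatever precedes that heading, without tags and surrounding whitespace.
    return trim( strip_tags( $parts[0] ) );
}

var_dump( introText( '<h2>Title</h2><p>Body text</p>' ) );          // string(0) ""
var_dump( introText( '<p>Intro.</p><h2>Title</h2><p>Body</p>' ) ); // string(6) "Intro."
```

For wikis whose pages start with a heading, the ".. preview" topic on this page suggests setting $wgPopupsTextExtractsIntroOnly = false instead of rewriting every page.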

12:23, 5 April 2024

DAVY2018

Yes, TextExtracts does not extract anything from an article that begins with a heading. But in my case, even though there is text before the headings, TextExtracts still returns nothing, and Popups keeps showing "There was an issue displaying this preview".

That's why I hoped your past experience might help.

13:22, 5 April 2024

Lotusccong

If you don't mind, please share the link and I can test it on my wiki.

14:26, 5 April 2024

DAVY2018

Which link would you like me to share?

Edited 14:38, 5 April 2024

Lotusccong

The page on which Popups shows "There was an issue displaying this preview".

00:00, 6 April 2024

How to remove thumb caption from extracts?

2 comments • 16:04, 5 April 2024

Summary by Thiemo Kreuz (WMDE)

Unclear question.

Wess

We noticed that thumbnail captions are shown in the extract if an image is in the first paragraph. Is there a way to remove them? We tried adding "figure" and "figcaption" (MW 1.40) to $wgExtractsRemoveClasses, with no success. Manually adding the "noexcerpt" class to the image did work.

08:28, 4 August 2023

Thiemo Kreuz (WMDE)

On which wiki does this happen? What version of MediaWiki are you using?

<figure> is already part of the list of elements to remove. Since <figcaption> is inside of <figure> it will be removed as well. Maybe your wiki's configuration modifies $wgExtractsRemoveClasses in an unexpected way? Maybe your $wgParserEnableLegacyMediaDOM configuration changed, but TextExtracts wasn't updated?

12:47, 21 August 2023

How to remove one of the default values in $wgExtractsRemoveClasses?

2 comments • 15:06, 24 March 2023

Radouch

I use a layout for the pages of my wiki, which means that almost every page begins with a div tag.

Unfortunately, div is among the default items of the $wgExtractsRemoveClasses array (defined as ExtractsRemoveClasses in this extension's extension.json). So Extension:Popups displays no text for those pages, because content inside a div element is ignored by TextExtracts.

I would like to remove the div item from $wgExtractsRemoveClasses in my LocalSettings.php, but I cannot find the right way to do it. Any ideas?
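A plain assignment in LocalSettings.php presumably does not help because array settings declared in extension.json are, by default, merged with the LocalSettings.php value (merge strategy array_merge) instead of being replaced by it. A standalone PHP simulation of that merge, using an abbreviated, illustrative default list rather than the real one:

```php
<?php
// Illustrative subset of extension.json defaults (see the extension for the full list).
$extensionDefault = [ 'table', 'div', 'figure', '.mw-editsection', '.noexcerpt' ];

// What an admin might set in LocalSettings.php, hoping to drop 'div':
$fromLocalSettings = [ 'table', 'figure', '.mw-editsection', '.noexcerpt' ];

// Assuming the default array_merge strategy, the lists are combined, not replaced,
// so 'div' from the defaults is still part of the effective setting.
$effective = array_merge( $extensionDefault, $fromLocalSettings );

var_dump( in_array( 'div', $effective, true ) ); // bool(true)
```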

As a workaround, I removed div from extension.json, but I am sure this is bad practice.

Edited 13:39, 8 February 2022

Joe Beaudoin Jr. Redux

Unfortunately, this is the only way to do this at this point... and you need to do it if you use the Citizen skin, as of this writing anyway.

15:06, 24 March 2023

badtoken error

One comment • 11:14, 28 November 2022

Nardog

I used to use this API to get excerpts from Wiktionary on Wikipedia in JavaScript, but now (since a few weeks ago, perhaps) it returns a "badtoken" error. Other Wiktionary APIs, including Parse, still work fine from Wikipedia, so this is odd.

11:14, 28 November 2022

$wgExtractsIncludeClasses needed

One comment • 09:31, 20 September 2021

Krabina

It would be great if there were a parameter $wgExtractsIncludeClasses for defining classes that should be included in the text extracts. I often wrap the first paragraph in a div with styling information as well, and that div is then not included in the extracts. To make it work, I always have to start with some plain text, which is quite inflexible.

09:31, 20 September 2021

Return value of template parameter as summary

6 comments • 20:16, 1 June 2021

Summary by Jonathan3

No need: it uses the parsed wikitext (i.e. the page's HTML). I just needed to fix $wgExtractsRemoveClasses to get it all to work for my pages.

Jonathan3

How would I go about this? Most pages on my site are created from template calls (using Extension:Cargo) without any other text or headings, so no summary is extracted.

I see that it's possible to create a new API, but to do that I'd mostly need to copy an existing one :-)

Edited 14:10, 31 May 2021

Thiemo Kreuz (WMDE)

Is this about Popups, or about other uses of the TextExtracts API? It might be possible to customize the existing TextExtracts code so it supports your use case better. Unfortunately, staff (like me) are probably not able to give a lot of support for customizations like this. If you are able to submit patches that make the TextExtracts extension work better with other extensions like Cargo, making it better for everyone, we can have a look at them.

06:50, 1 June 2021

Jonathan3

Thanks again. It's about TextExtracts, which Popups on my site would use (I understand that WMF sites use something else). Cargo is only part of the background information, as my reason for having template-only pages (though it may be that most Cargo websites use it for infoboxes after introductory text, so mine may be a minority interest). If I can work it out I'll submit patches!

07:21, 1 June 2021

Jonathan3

It turned out to be fairly easy. It works fine for me now after I removed "div" from "ExtractsRemoveClasses" in TextExtracts' extension.json file. Is there any way of making that change in LocalSettings.php instead?

(Initially I had wrongly assumed the extension looked at the raw wikitext, but once I saw it used api.php?action=parse it became clearer...)

Edited 15:32, 1 June 2021

Thiemo Kreuz (WMDE)

No easy way, but it's possible:

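// Registered as a hook so that it runs later than LocalSettings.php itself,
// once the extension.json defaults are part of $wgExtractsRemoveClasses.
$wgHooks['MediaWikiServices'][] = function () {
   global $wgExtractsRemoveClasses;
   // Drop 'div' from the list of elements TextExtracts strips from extracts
   $wgExtractsRemoveClasses = array_diff( $wgExtractsRemoveClasses, [ 'div' ] );
};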
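What the hook body does to the list can be tried outside MediaWiki. A standalone sketch with an illustrative list (not the real defaults), where the closure is called by hand instead of by the hook system:

```php
<?php
// Illustrative, already-merged list; not the real TextExtracts defaults.
$wgExtractsRemoveClasses = [ 'table', 'div', 'figure', '.noexcerpt' ];

$removeDiv = function () {
    global $wgExtractsRemoveClasses;
    // array_diff() keeps every entry that does not appear in the second array...
    $wgExtractsRemoveClasses = array_diff( $wgExtractsRemoveClasses, [ 'div' ] );
};
$removeDiv(); // In MediaWiki, the hook system would make this call.

print_r( $wgExtractsRemoveClasses );
// ...but it preserves the original keys (0, 2 and 3 here).
// Wrap the result in array_values() if a 0-based list is needed.
```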

16:02, 1 June 2021

Jonathan3

That seems to work - thanks!

18:54, 1 June 2021

.. preview

5 comments • 08:37, 1 June 2021

STEM

I added this to my wiki (MediaWiki 1.35.2 on IIS 10), but it does not seem to strip out the headings. The latest 1.35 version of the extension is installed.

A typical page's markup starts with __NOTITLE__ and __NOTOC__, then a = Page Title Text = heading, then the "body" text. For some reason the body text is not picked up unless there is no = Page Title Text = at the start of the article. Also, why can't this work on category pages?

After trying every fix below that could be applied in LocalSettings.php, I threw in everything I could find, but it still only works as described above.

$wgExtractsRemoveClasses[] = array( 'dl', 'h1', 'h2', 'h3', 'div', '.mw-editsection', 'table', 'sup.reference', '.error', '.nomobile' ) ;
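Separately from the heading question, the line above does not add six selectors: appending with [] = array( ... ) adds the whole array as one nested element. To add several entries in LocalSettings.php, either write one $wgExtractsRemoveClasses[] = '...'; line per selector or assign a plain array, as in the reply below. A standalone PHP illustration with an example list:

```php
<?php
// Example starting list; illustrative only.
$classes = [ 'table', 'div' ];

// "$array[] = array( ... )" appends ONE element, which is itself an array.
$nested = $classes;
$nested[] = array( 'dl', 'h1', 'h2' );
var_dump( count( $nested ) ); // int(3)

// To add several separate entries, merge the lists instead.
$flat = array_merge( $classes, [ 'dl', 'h1', 'h2' ] );
var_dump( count( $flat ) ); // int(5)
```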

Here's a link to a page on the wiki in question. It has a mix of links that work and don't work as per my explanation above.

https://dardpi.ca/wiki/index.php?title=The_Last_Train#The_Last_Train

Sorry about posting below in a thread that was already basically finished.

20:09, 30 May 2021

Thiemo Kreuz (WMDE)

The problem is that your articles don't have any intro text before the first headline, which is what the Popups extension expects by default. You need to turn this off. The following combination more or less works for me:

$wgPopupsTextExtractsIntroOnly = false;
$wgExtractsRemoveClasses = [ 'h1', 'h2' ];

Note that most of the "classes" in your list are already in the configuration by default and don't need to be added another time.

08:23, 31 May 2021

STEM

Thank you for the reply. Unfortunately, it didn't work any better, although setting $wgPopupsTextExtractsIntroOnly = false seems to make a lot of sense. It must be those pesky =Title= or ==Title== headlines. Maybe __NOTITLE__ or __NOTOC__ is causing some issues?

Actually, I set up a number of test pages and discovered that if a =Headline= or ==Headline== came first on the page, it would generate the "..." extract.

https://nardpi.ca/wiki/index.php?hidebots=1&limit=50&days=7&enhanced=1&title=Special:RecentChanges&urlversion=2

Why does this extension not work on category pages?

What I'm having a hard time with is the documentation, which is aimed at programmers. For those of us who only configure wikis, it is not clear at all. I would like to know whether the items listed under:

prop=extracts (ex)

can be used in LocalSettings.php, and if so, how? (An example would be nice.)

Edited 22:07, 31 May 2021

Thiemo Kreuz (WMDE)

I can't tell what's wrong, sorry. I tested your examples locally (esp. TextExtractTest and TextExtractTest4), and they work for me when I use the two configuration changes mentioned above.

Popups uses the TextExtracts API internally with some of the parameters described at Extension:TextExtracts#API. I guess you don't want to use the TextExtracts API manually, so this will probably not help. The only parameter that can be changed is the "intro" one, see "…IntroOnly" above.

Category pages usually don't contain anything that could be shown in a popup and are excluded by default. See Topic:W9vaqx3tknn5mp6v#flow-post-w9z1ow1y645g2giy.

06:43, 1 June 2021

STEM

You have been extremely helpful. Did you test my examples on an IIS 10 or an Apache server? I'm on IIS, on Windows Server 2019. What PHP and MediaWiki versions? I'm on PHP 7.4.19 and MW 1.35.2. If you check your Special:Version page, what version of TextExtracts do you see? Mine doesn't show a version at all.

https://nardpi.ca/wiki/index.php?title=Special:Version

Is there a critical PHP extension or configuration item I might be missing?

Just so we're on the same page, this is my LocalSettings.php configuration:

//Popups Extension
wfLoadExtensions( [
'TextExtracts',
'PageImages',
'Popups'
] );
$wgPopupsHideOptInOnPreferencesPage = true;
$wgPopupsReferencePreviewsBetaFeature = false;

//PageImages Extension - works with Popups
wfLoadExtension( 'PageImages' );

//Text Extracts Extension - works with Popups
wfLoadExtension( 'TextExtracts' );
$wgPopupsTextExtractsIntroOnly = false;
$wgExtractsRemoveClasses = [ 'h1', 'h2', 'h3' ];
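For comparison, a tidied sketch of the same settings. The separate wfLoadExtension() calls for PageImages and TextExtracts repeat what the wfLoadExtensions() call already loads, so they can presumably be dropped; all values are unchanged, so this is a layout suggestion only and not a fix for the heading problem:

```php
// Popups and the extensions it relies on, each loaded once
wfLoadExtensions( [
    'TextExtracts',
    'PageImages',
    'Popups',
] );

// Popups settings
$wgPopupsHideOptInOnPreferencesPage = true;
$wgPopupsReferencePreviewsBetaFeature = false;
// Also build previews for pages without intro text before the first heading
$wgPopupsTextExtractsIntroOnly = false;

// TextExtracts: further elements to strip from extracts
$wgExtractsRemoveClasses = [ 'h1', 'h2', 'h3' ];
```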

Thanks!

Edited 08:37, 1 June 2021

Hide equals signs and headings

8 comments • 08:27, 31 May 2021

Summary by Thiemo Kreuz (WMDE)

Tracked in Phabricator: Task T271439

Spiros71

I am using the Popups extension on MW 1.31. Headings (levels 2 and 3) are displayed with equals signs on the left and right. Is there a way to hide them?

Here is an image: [screenshot not preserved]

I tried adding this to TextExtracts\includes\ExtractFormatter.php, but it did not seem to work.

$text = preg_replace( "={2,}(.*?)={2,}", "", $text );
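The preg_replace() call above cannot work as written: PHP's PCRE functions require delimiters, so the leading = is taken as the delimiter, everything after the next = is read as modifiers, and the call emits an "Unknown modifier" warning and returns null. A sketch of a delimited pattern that removes whole heading lines such as "== History ==" from plain text; stripWikiHeadings is a hypothetical helper, and patching the extension's own files like this is lost on every upgrade:

```php
<?php
// Hypothetical post-processing helper, not part of TextExtracts.
// '/' delimits the pattern; the 'm' flag makes ^ and $ match at every line.
function stripWikiHeadings( string $text ): string {
    // A line that starts and ends with two or more '=' (plus its newline) is removed.
    return preg_replace( '/^={2,}[^=\n]*={2,}[ \t]*$\n?/m', '', $text );
}

echo stripWikiHeadings( "Intro text.\n== History ==\nMore text.\n=== Details ===\nEnd." ), "\n";
```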

Surrounding the headings (found in the template) with <div class="noexcerpt"> partially solved the problem, but resulted in the mobile version (MinervaNeue) headings no longer being expandable/collapsible.

Edited 18:18, 3 January 2021

Thiemo Kreuz (WMDE)

Do you have a link to the actual page where this happens?

10:44, 7 January 2021

Spiros71

I added the following to ExtractsRemoveClasses in TextExtracts\extension.json, and it seemed to fix it:

"h2",
"h3",

12:27, 7 January 2021

Thiemo Kreuz (WMDE)

I would love to know more about your setup. See, every Wikipedia article starts with a level 2 headline. Why don't we have this issue, but you have it? What's the difference?

Using the configuration is a nice idea. However, you should not edit extension.json; instead, add the following to your LocalSettings.php:

$wgExtractsRemoveClasses[] = 'h2';
$wgExtractsRemoveClasses[] = 'h3';

13:37, 7 January 2021

This post was hidden by Thiemo Kreuz (WMDE).

Spiros71

Thanks so much for the feedback. The difference may be that the level 2 headings are declared in the templates. Level 3 headings are in the page text.

Here is the wiki http://lsj.gr/

Setup:
MediaWiki 1.31.12
PHP 7.3.25 (fpm-fcgi)
MariaDB 10.3.27-MariaDB-log
ICU 50.2
Elasticsearch 5.6.13

Extensions:

CirrusSearch, Cite, Data Transfer, Elastica, Gadgets, LabeledSectionTransclusion, Lockdown, MobileFrontend, Normalizer, Nuke, Page Forms, PageImages, ParserFunctions, ParserHooks, Popups, Replace Text, Semantic Drilldown, Semantic MediaWiki, Semantic Result Formats, TextExtracts

Edited 15:39, 7 January 2021

Thiemo Kreuz (WMDE)

The template is indeed the reason. Thanks a lot for the follow-up! I was now able to replay the issue locally. I created phab:T271439 to discuss possible fixes.