Can Parsoid fully work without Internet connection?

DungLe94 (talkcontribs)

Assume that I have set up MediaWiki and Parsoid properly, have not imported the `.xml.bz2` dump into MediaWiki, and have no Internet connection. Can Parsoid convert the wikitext in the dump into complete HTML, just as the standard HTML REST API call on the wiki website does?


How to use thenets/parsoid in Docker in Windows 10?

DungLe94 (talkcontribs)

I've installed thenets/parsoid on Docker on Windows 10. I want to convert the text file F:\zim\pomme.txt to html. I tried

docker run --name myparsoid -d -t -i -v /f/zim:/zim thenets/parsoid:latest sh

type /zim/pomme.txt | docker exec myparsoid php bin/parse.php --wt2html --offline

but it returns an error

Microsoft Windows [Version 10.0.19042.928]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Akira>docker run --name myparsoid -d -t -i -v /f/zim:/zim thenets/parsoid:latest sh
7912b0cef8fba4244b2519f4f9603ec8e278b67bcc4fe08f4658721b98f941f3

C:\Users\Akira>type /zim/pomme.txt | docker exec myparsoid node bin/parse.js --wt2html --offline
The syntax of the command is incorrect.
internal/modules/cjs/loader.js:638

   throw err;

Error: Cannot find module '/bin/parse.js'

   at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
   at Function.Module._load (internal/modules/cjs/loader.js:562:25)
   at Function.Module.runMain (internal/modules/cjs/loader.js:831:12)
   at startup (internal/bootstrap/node.js:283:19)
   at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)

Could you please shed some light on how to fix the error?


Whitespace in headings?

Summary by Arlolra
RoySmith (talkcontribs)

What is the intended behavior when parsing:

== Foo ==

I would have expected the whitespace around Foo to be preserved, but it's not. The example at Parsoid/API#POST 2 implies that it is, but when I try it, the whitespace is gone:

wget -q -O -  ''

<!DOCTYPE html>

<html prefix="dc: mw:" about=""><head prefix="mwr:"><meta property="mw:TimeUuid" content="b56a5f60-69b0-11eb-876b-49aa12313550"/><meta charset="utf-8"/><meta property="mw:pageId" content="66664679"/><meta property="mw:pageNamespace" content="2"/><link rel="dc:replaces" resource="mwr:revision/0"/><meta property="mw:revisionSHA1" content="9fa2ea02674418d1bab8d09bd0c639bcf220a57b"/><meta property="dc:modified" content="2021-02-08T01:54:45.000Z"/><meta property="mw:html:version" content="2.2.0"/><link rel="dc:isVersionOf" href="//"/><title>User:RoySmith/sandbox/parsoid-whitespace-example</title><base href="//"/><link rel="stylesheet" href="/w/load.php?lang=en&amp;modules=mediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Csite.styles&amp;only=styles&amp;skin=vector"/><meta http-equiv="content-language" content="en"/><meta http-equiv="vary" content="Accept"/></head><body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><section data-mw-section-id="0" id="mwAQ"></section><section data-mw-section-id="1" id="mwAg"><h2 id="Foo">Foo</h2></section></body></html>
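For anyone who wants to check this without reading the raw markup, here is a minimal sketch in Node.js. It pulls the heading contents out of an HTML string with a regular expression and reports whether any of them carry leading or trailing whitespace. The `headingTexts` helper is made up for this illustration and is not part of Parsoid; the fragment is copied from the response above. A regex is only adequate for a quick check like this, not for general HTML parsing.

```javascript
// Fragment copied from the Parsoid response quoted above.
const fragment =
  '<section data-mw-section-id="1" id="mwAg"><h2 id="Foo">Foo</h2></section>';

// Hypothetical helper: return the inner content of every <h1>..<h6> element.
function headingTexts(html) {
  const texts = [];
  const re = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/g;
  let match;
  while ((match = re.exec(html)) !== null) {
    texts.push(match[2]);
  }
  return texts;
}

const texts = headingTexts(fragment);
console.log(JSON.stringify(texts)); // ["Foo"]
// true means no heading has surrounding whitespace left in the output.
console.log(texts.every((t) => t === t.trim())); // true
```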

SSastry (WMF) (talkcontribs)

Parsoid starts but fails to connect with curl

Summary by Arlolra

User disappeared

Johnjin216326 (talkcontribs)

OS is Fedora 31

I downloaded parsoid from bluespice wiki

ii. create service under /etc/system/system/parsoid.service


Description=Mediawiki Parsoid web service on node.js









    ExecStart=/usr/bin/nodejs /opt/parsoid /bin/server.js





iii. Under /opt/parsoid/config.yaml

worker_heartbeat_timeout: 300000


        level: info


      - module: lib/index.js

        entrypoint: apiServiceWorker


            localsettings: ./localsettings.js

iv. Under /opt/parsoid/localsettings.js


* This is an example configuration for a BlueSpiceWikiFarm setup

* In this case 'httpd' is used as wiki webserver machine name as it is in our

* docker environment.


'use strict';

    exports.setup = function(parsoidConfig) {

        parsoidConfig.dynamicConfig = function(domain) {

   var baseUrl = Buffer.from( domain, 'base64').toString();


        uri: baseUrl + '/api.php',

        domain: domain,

        strictSSL: false




The nodejs is at version 10 and parsoid is v0.10

Here's the output of curl

[root@wiki-server BlueSpice3]# curl

<!DOCTYPE html>

<#html lang="en">


<#meta charset="utf-8">




<#pre>Internal Server Error<#/pre>



(I've added a # in the bracket to show more info)

SELinux is disabled and the firewall is open on port 8000, although netstat doesn't show the Parsoid service using the port.

[root@wiki-server BlueSpice3]# netstat -aon | grep 8000

tcp 0 0* LISTEN off (0.00/0/0)

httpd is configured with SSL domain certificate and https enabled.

Why does this fail?

Arlolra (talkcontribs)

Try restarting the Parsoid service and see what information is logged to syslog?

Parsoid is not working in non-english language

Summary by Arlolra

User disappeared

(talkcontribs)

I tried several times to set up my wiki in Spanish with MediaWiki 1.35.1, and it always shows a Parsoid/REST error (curl error 7). However, when I choose English as the wiki language, everything works fine.

I hope this can be fixed, because my users don't speak English.

Arlolra (talkcontribs)

How did you install Parsoid?

Parsoid - Memory exhaustion on a big page

Summary by Arlolra

Something in the user's setup is enforcing the limit.

(talkcontribs)

Hello, I'm having problems with memory exhaustion in MediaWiki's Parsoid. Can someone help me?

On a very big page (I can't tell the exact size, but the original, written in MS Word, has more than 70 pages), I get the following error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 135168 bytes) in /var/www/html/mediawiki-1.35.1/vendor/wikimedia/parsoid/src/Html2Wt/WikitextSerializer.php on line 1683

That line is a call to explode.

As you can see, the error reports a memory limit of 128M, but my phpinfo says 750M, which I configured in several places to make sure (php.ini, php-fpm.conf).

from my phpinfo

memory_limit 750M 750M

here's a grep -r memory_limit on my /etc

php-fpm.d/www.conf:php_admin_value[memory_limit] = 750M

php.ini:memory_limit = 750M

so, both php.ini and fpm are configured with 750M
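As a sanity check on the numbers, PHP's shorthand sizes use powers of 1024, so the limit in the fatal error and the configured limit can be compared in bytes. A minimal sketch in Node.js; `phpShorthandToBytes` is a hypothetical helper written for this post and handles only plain K, M and G suffixes:

```javascript
// Hypothetical helper: convert a PHP shorthand size such as "750M" to bytes.
// Only handles an integer followed by an optional K, M or G suffix.
function phpShorthandToBytes(value) {
  const match = /^(\d+)\s*([KMG]?)$/i.exec(value.trim());
  if (!match) {
    throw new Error(`Unrecognized size: ${value}`);
  }
  const units = { '': 1, K: 1024, M: 1024 ** 2, G: 1024 ** 3 };
  return Number(match[1]) * units[match[2].toUpperCase()];
}

const limitFromError = 134217728; // byte count quoted in the fatal error
const configured = phpShorthandToBytes('750M');

console.log(configured); // 786432000
// The error's limit matches 128M, not the configured 750M.
console.log(limitFromError === phpShorthandToBytes('128M')); // true
console.log(limitFromError === configured); // false
```

So the process that hit the error was running with a 128M limit, whatever phpinfo reports for the page it was called from.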

I already tried setting memory_limit in LocalSettings.php, but that didn't help either.

PHP 7.4.14 (fpm-fcgi)

MediaWiki 1.35.1

Lua 5.1.5

ICU 65.1

MySQL 5.6.35-80.0-log

wikimedia/parsoid 0.12.1

Can someone help me? This is preventing me and my team from creating long and important documents.

Thank you!

(talkcontribs)

Oh, and I did stop and restart PHP, PHP-FPM, and httpd. I even restarted the OS.

(talkcontribs)

I also tried calling ini_set( 'memory_limit', '750M' ); in wikitext2html and in WikitextSerializer.php, inside serializeDOM, but it raises the same error.

Arlolra (talkcontribs)

This isn't an inherent problem with Parsoid. On the WMF cluster, Parsoid runs with an ~1.4G memory limit, which it occasionally hits, but is certainly not limited to 128M

Something in your setup is enforcing that limit. Maybe it's the OS, maybe it's the HTTP server, or PHP configurations you're mentioning.

Try isolating it. You can run Parsoid on the command line with bin/parse.php

Pass it your large page and see if you run up against the memory limit there.

(talkcontribs)

OK, I'll try that,

thank you

How to convert Wiktionary dump to HTML?

DungLe94 (talkcontribs)

I've just found from this link that Parsoid is a perfect tool to convert a Wiktionary dump to HTML. I've downloaded the latest dump from here. However, I could not find any instructions for using Parsoid on this offline dump. Could you please elaborate on this?

Thank you so much for your help!

Dung Le.

Arlolra (talkcontribs)

There hasn't been any effort to make Parsoid usable with those dumps.

There are often questions about how to use Parsoid offline. See past discussions,

But so far Parsoid is mostly useful when it has access to a MediaWiki API to fetch configuration and resolve templates.

If you wanted to use the source from the dump, you could do something like cat "text from source" | php bin/parse.php --domain --wt2html and that will output some html. Alternatively, you can use the titles from the dump and fetch the html from the REST API,
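The piping approach above can also be scripted. A rough sketch in Node.js, assuming `php` is on the PATH and `parsoidDir` points at a Parsoid checkout containing bin/parse.php. The helper names are made up for this illustration, and the `--domain` value is left as a parameter because it depends on your setup:

```javascript
const { spawnSync } = require('child_process');

// Build the argument list for the command shown above.
// `domain` is an assumption: pass whatever domain your Parsoid setup expects.
function buildParseArgs(domain) {
  return ['bin/parse.php', '--domain', domain, '--wt2html'];
}

// Pipe one page of wikitext through bin/parse.php and return the HTML.
// Parsoid still needs to reach a MediaWiki API for config and templates.
function wikitextToHtml(wikitext, domain, parsoidDir) {
  const result = spawnSync('php', buildParseArgs(domain), {
    cwd: parsoidDir, // run from the Parsoid checkout so bin/parse.php resolves
    input: wikitext, // sent to the child's stdin, like `cat ... |`
    encoding: 'utf8',
  });
  if (result.error) {
    throw result.error; // e.g. php not found
  }
  if (result.status !== 0) {
    throw new Error(`parse.php exited with ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

Calling `wikitextToHtml(text, domain, parsoidDir)` once per page from the dump would then give one HTML document per page, at the cost of starting a PHP process each time.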


Parsoid with Kerberos and Auth_Remoteuser

Wikweng (talkcontribs)

Hello all,

I'm struggling with some problems with Parsoid and Remoteuser authentication with Kerberos.

First my setup:

Ubuntu 20.04.

MediaWiki 1.31.12

PHP 7.4.3

BlueSpice 3.2.0

Parsoid 0.10.0

Kerberos SSO is working fine. The problem is that when I edit an article with VisualEditor, the page turns white, and when I try to save, an "HTTP 500" error appears. In the syslog I see a "401 Unauthorized" error. My configs are below. Parsoid runs on the same server as my Apache web server and is accessible on port 8000 (via CLI and curl, and also via browser). Also, when creating a new section or page, VisualEditor works.


Apache vhost:

<VirtualHost *:443>


  ServerAlias mywiki

   DocumentRoot /path/to/mediawiki

   <Directory /path/to/mediawiki>

       Options Indexes FollowSymLinks MultiViews

       AllowOverride None



   AuthType Kerberos

   AuthName "Kerberos Login"

   KrbAuthRealms mydomain.COM

   KrbServiceName HTTP/

   Krb5KeyTab /etc/apache2/kerberos/mykeytab.keytab

   KrbLocalUserMapping On #Strips @REALM

   KrbMethodNegotiate on

   KrbMethodK5Passwd off

   Require valid-user

   Require ip






$wgAuthRemoteuserUserName = function() {

   global $wgDBname;

   $user = '';

   if( isset( $_SERVER[ 'REMOTE_USER' ] ) ) {

       $user = $_SERVER[ 'REMOTE_USER' ] . '';

   }

   if( isset( $_SERVER[ 'REMOTE_ADDR' ] ) && substr( $_SERVER[ 'REMOTE_ADDR' ], 0, 4 ) == '127.' ) {

       if( empty( $user ) ) {

           $user = $_COOKIE[$wgDBname.'304f3058RemoteToken'] . '';

       }

   }

   return $user;

};


// Creating base64 encoded path

$fullPath = $GLOBALS['wgServer'] . $GLOBALS['wgScriptPath'];

$encFullPath = base64_encode( $fullPath );

// Linking with Parsoid

$wgVirtualRestConfig['modules']['parsoid'] = array(

   // URL to the Parsoid instance

   // Use port 8142 if you use the Debian package

   'url' => 'http...', // I wasn't allowed to post it with "://"

   'domain' => $encFullPath,

   'forwardCookies' => true

);

$wgVisualEditorEnableWikitext = true;


Parsoid config.yaml:

worker_heartbeat_timeout: 300000


  level: info


  - module: lib/index.js

  entrypoint: apiServiceWorker


       localsettings: ./localsettings.js


Parsoid localsettings.js:

'use strict';

exports.setup = function(parsoidConfig) {

   parsoidConfig.dynamicConfig = function(domain) {

       var baseUrl = Buffer.from( domain, 'base64').toString();


           uri: baseUrl + '/api.php',

           domain: mydomain,

           strictSSL: false





Maybe I'm missing the obvious here, but I've been struggling with this issue for a few days now, and I think it is time to ask the community for help ;)
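For readers following the configs above: the `domain` passed to Parsoid is just the base64 of the wiki's base URL, and localsettings.js decodes it again to build the API URI. A minimal round-trip sketch in Node.js, using a hypothetical base URL (wiki.example.org is a placeholder, not the poster's server):

```javascript
// Hypothetical base URL, standing in for $wgServer . $wgScriptPath.
const fullPath = 'https://wiki.example.org/w';

// MediaWiki side: the same standard base64 that base64_encode( $fullPath ) produces.
const encFullPath = Buffer.from(fullPath, 'utf8').toString('base64');

// Parsoid side: the decode step used in localsettings.js.
const baseUrl = Buffer.from(encFullPath, 'base64').toString();
const apiUri = baseUrl + '/api.php';

console.log(encFullPath); // the value Parsoid receives as `domain`
console.log(apiUri); // https://wiki.example.org/w/api.php
```

Decoding the `domain` from a failing request this way shows which api.php URL Parsoid is actually calling, which helps when checking that URL against the Kerberos rules in the vhost.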

S0ring (talkcontribs)

I used to install Parsoid for MW 1.31 (supported until June 2021) with:

# git clone --branch v0.10.0 /usr/lib/parsoid

but the URL is no longer valid; it returns 404. How do I install it now?

Arlolra (talkcontribs)
S0ring (talkcontribs)
S0ring (talkcontribs)

This works # git clone --branch v0.10.0

Arlolra (talkcontribs)
S0ring (talkcontribs)

This URL returns the following error:

# git clone --branch v0.10.0 /usr/lib/parsoid

Cloning into '/usr/lib/parsoid'...

fatal: not valid: is this a git repository?

Arlolra (talkcontribs)

TagWhiteList in PHP Parsoid

Summary by Arlolra

See AllowedLiteralTags

Dueni.f (talkcontribs)

Is there a TagWhiteList in PHP parsoid? I used it in JS parsoid for the Extension (added 'A' and 'IMG' to WikitextConstants.js) to show the images in VisualEditor.

Arlolra (talkcontribs)
Dueni.f (talkcontribs)

Exactly right. Thank you very much.