Topic on Talk:Parsoid

Eamercer (talkcontribs)

I have a MediaWiki installation with VisualEditor and a Parsoid server.


This was working well until fairly recently. Something changed, but unfortunately I can't identify what it was. Now, clicking the edit link on any page brings up this error:


Error loading data from server: apierror-visualeditor-docserver-http-error: (curl error: 52) Server returned nothing (no headers, no data). Would you like to retry?


MediaWiki: 1.30.0


Parsoid: 0.8.0


I get the same curl error if I try to use curl to test the parsoid server from the command line:

$ curl -L http://localhost:8000/version

curl: (52) Empty reply from server


If I telnet to the port and type the GET command, I get a response:

$ telnet localhost 8000
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /version

HTTP/1.1 200 OK
X-Powered-By: Express
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=utf-8
Content-Length: 36
ETag: W/"24-4cl/tnkz8/d5N8Iw6BlbZIYKEhs"
Vary: Accept-Encoding
Date: Tue, 14 Apr 2020 16:55:33 GMT
Connection: close

{"name":"parsoid","version":"0.8.1"}Connection closed by foreign host.

The host is CentOS release 7.7.1908.


Arlolra (talkcontribs)

Seems like there's something wrong with curl, since both PHP and the curl binary are stumbling.

Maybe try seeing what it's sending?

nc -l 8002
# in another terminal
curl localhost:8002
Eamercer (talkcontribs)
$ nc -l 8002
GET / HTTP/1.1
User-Agent: curl/7.29.0
Host: localhost:8002
Accept: */*
Arlolra (talkcontribs)
Eamercer (talkcontribs)
$ curl -v http://localhost:8000/version
* About to connect() to localhost port 8000 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 8000 (#0)
> GET /version HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8000
> Accept: */*
> 
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server

Arlolra (talkcontribs)

Can you check the logs for anything of interest?

Eamercer (talkcontribs)

I'm not seeing anything in the system logs.

I also haven't been able to determine where the parsoid logs are or how to configure them.

I haven't increased the mediawiki logging, but since the problem can be demonstrated apart from the wiki, it didn't seem necessary at this point.

Arlolra (talkcontribs)

How did you install Parsoid?

Eamercer (talkcontribs)


git clone https://gerrit.wikimedia.org/r/p/mediawiki/services/parsoid
ln -s /usr/local/parsoid/ /opt/parsoid
npm install
cd ..
chown -Rv root:root parsoid
chmod -Rv u+rw,g+r,o+r parsoid
firewall-cmd --permanent --zone=public --add-port=8000/tcp
firewall-cmd --reload
cd parsoid/
cp config.example.yaml config.yaml
vim config.yaml
semanage port -m -t http_port_t -p tcp 8000
setsebool httpd_can_network_connect 0
setsebool -P httpd_can_network_connect 0
vi /etc/systemd/system/parsoid.service
systemctl start parsoid.service

config.yaml =

# This is a sample configuration file
#
# Copy this file to config.yaml and edit that file to fit your needs.
#
# Also see:
# - `npm start -- -h` for more information about passing config files via
#   the commandline.
# - lib/config/ParsoidConfig.js for all the properties that you can configure
#   here. Not all properties are documented here.

# The number of http workers (as opposed to `cpu_workers` below)
#num_workers: 1

worker_heartbeat_timeout: 300000

logging:
    level: info

#metrics:
#    type: log

services:
  - module: lib/index.js
    entrypoint: apiServiceWorker
    conf:
        # For backwards compatibility, and to continue to support non-static
        # configs for the time being, optionally provide a path to a
        # localsettings.js file.  See localsettings.example.js
        #localsettings: ./localsettings.js

        # Set your own user-agent string
        # Otherwise, defaults to:
        #   'Parsoid/<current-version-defined-in-package.json>'
        #userAgent: 'My-User-Agent-String'

        # Configure Parsoid to point to your MediaWiki instances.
        mwApis:
        - # This is the only required parameter,
          # the URL of you MediaWiki API endpoint.
          uri: 'http://localhost/migwiki/api.php'
          # The "domain" is used for communication with Visual Editor
          # and RESTBase.  It defaults to the hostname portion of
          # the `uri` property above, but you can manually set it
          # to an arbitrary string. It must match the "domain" set
          # in $wgVirtualRestConfig.
          domain: 'localhost'  # optional
          # To specify a proxy (or proxy headers) specific to this prefix
          # (which overrides defaultAPIProxyURI). Alternatively, set `proxy`
          # to `null` to override and force no proxying when a default proxy
          # has been set.
          #proxy:
          #    uri: 'http://my.proxy:1234/'
          #    headers:  # optional
          #        'X-Forwarded-Proto': 'https'
          # See below, defaults to true.
          #strictSSL: false

        # Enable using compute workers to parse requests.
        #useWorker: true
        # The number of workers in the pool spawned by each http worker to
        # call out for parsing.  Defaults to:
        #   ceil(number of cpus / `num_workers`) + 1
        #cpu_workers: 1

        # We pre-define wikipedias as 'enwiki', 'dewiki' etc. Similarly
        # for other projects: 'enwiktionary', 'enwikiquote', 'enwikibooks',
        # 'enwikivoyage' etc.
        # The default for this is false. Uncomment the line below if you want
        # to load WMF's config for wikipedias, etc.
        #loadWMF: true

        # A default proxy to connect to the API endpoints.
        # Default: undefined (no proxying).
        # Overridden by per-wiki proxy config in setMwApi.
        #defaultAPIProxyURI: 'http://proxy.example.org:8080'

        # Enable debug mode (prints extra debugging messages)
        #debug: true

        # Use the PHP preprocessor to expand templates via the MW API (default true)
        #usePHPPreProcessor: false

        # Use selective serialization (default false)
        #useSelser: true

        # Allow cross-domain requests to the API (default '*')
        # Sets Access-Control-Allow-Origin header
        # disable:
        #allowCORS: false
        # restrict:
        #allowCORS: 'some.domain.org'

        # Allow override of port/interface:
        #serverPort: 8000
        #serverInterface: '127.0.0.1'

        # Enable linting of some wikitext errors to the log
        #linting: true
        #linter:
        #  sendAPI: false # Send lint errors to MW API instead of to the log
        #  apiSampling: 10 # Sampling rate (1 / 10)

        # Require SSL certificates to be valid (default true)
        # Set to false when using self-signed SSL certificates
        # Note that this can also be applied per wiki in the mwApis above
        #strictSSL: false

        # Use a different server for CSS style modules.
        # Leaving it undefined (the default) will use the same URI as the MW API,
        # changing api.php for load.php.
        #modulesLoadURI: 'http://example.org/load.php'

Arlolra (talkcontribs)

What's in /etc/systemd/system/parsoid.service?

Eamercer (talkcontribs)
[Unit]
Description=Mediawiki Parsoid web service on node.js
Documentation=http://www.mediawiki.org/wiki/Parsoid
Wants=local-fs.target network.target
After=local-fs.target network.target

[Install]
WantedBy=multi-user.target

[Service]
Type=simple
User=root
Group=root
WorkingDirectory=/usr/local/parsoid
# EnvironmentFile=-/etc/parsoid/parsoid.env
ExecStart=/usr/bin/node /usr/local/parsoid/bin/server.js
KillMode=process
Restart=on-success
PrivateTmp=true
StandardOutput=syslog

Arlolra (talkcontribs)

Your config.yaml doesn't define any logging streams, so output goes to stdout:

logging:
    level: info

which systemd should then forward to syslog because of:

StandardOutput=syslog

Since you didn't see anything there, the curl request probably isn't even reaching the parsoid server.

Above, you demonstrated that telnet worked. Can you try with another tool like wget just to confirm it's only curl that's having the issue?

Arlolra (talkcontribs)

Also, you should confirm that nothing else is listening on port 8000.

Eamercer (talkcontribs)

wget also connects but receives nothing:

$ wget http://localhost:8000/version
--2020-04-16 16:31:21--  http://localhost:8000/version
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:8000... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2020-04-16 16:31:22--  (try: 2)  http://localhost:8000/version
Connecting to localhost (localhost)|::1|:8000... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

etc.

If I stop the parsoid service, all connections to 8000 just fail immediately.


Could I change the config.yaml to force logging to a specific file?

Arlolra (talkcontribs)

Something like the following should work:

logging:
  level: info
  name: parsoid
  streams:
    - level: info
      path: /tmp/parsoid.log
      type: file
Eamercer (talkcontribs)

I just ran


systemctl stop parsoid


strace -f -s80 npm start

Then I tried to get the version once via curl and once via telnet. Among the tons of system calls listed, the reads of the GET from curl and from telnet both show up:

$ grep 'GET \/version' /home/UPHS.PENNHEALTH.PRV/mercerea1/parsoid.script 
[pid 16579] read(14, "GET /version HTTP/1.1\r\nUser-Agent: curl/7.29.0\r\nHost: localhost:8000\r\nAccept: */"..., 65536) = 85
[pid 16590] read(14, "GET /version\r\n", 65536) = 14

Arlolra (talkcontribs)

Try running on a different port with `PARSOID_PORT=4444 npm start`. Maybe something is up with your firewall?

Also try passing the exact same string that curl sends in telnet.

Eamercer (talkcontribs)

I can send:

GET /version HTTP/1.1

and it will respond. It does not close the connection immediately after responding as it does if I just send:

GET /version


If I send exactly what curl sends:

GET /version HTTP/1.1
User-Agent: curl/7.29.0
Host: localhost:8000
Accept: */*


the server immediately closes the connection. I also tried sending the GET with the User-Agent, Host, and Accept separately, and the behavior is the same if any of those lines are sent with the GET.
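
For anyone who wants to repeat this without typing into telnet, here is a minimal Node.js sketch that builds the same bytes curl sent and writes them to a socket. The helper names `buildRawRequest` and `sendRaw` are made up for illustration; only the `net` module calls are real Node.js API, and the two-second timeout is an arbitrary choice.

```javascript
const net = require('net');

// Build a raw HTTP/1.1 GET exactly as it goes on the wire: request line,
// one line per header, then an empty line, all terminated with CRLF.
function buildRawRequest(path, headers) {
  const lines = ['GET ' + path + ' HTTP/1.1'];
  Object.keys(headers).forEach((name) => {
    lines.push(name + ': ' + headers[name]);
  });
  return lines.join('\r\n') + '\r\n\r\n';
}

// The headers curl 7.29.0 sent in the captures above. The result is
// 85 bytes long, matching the read(...) = 85 line in the strace output.
const curlRequest = buildRawRequest('/version', {
  'User-Agent': 'curl/7.29.0',
  'Host': 'localhost:8000',
  'Accept': '*/*',
});

// Send raw bytes and collect whatever comes back before the connection ends.
// An empty reply corresponds to curl's "(52) Empty reply from server".
function sendRaw(host, port, payload, done) {
  let reply = '';
  const socket = net.connect(port, host, () => socket.write(payload));
  socket.setEncoding('utf8');
  // A keep-alive server may hold the connection open after replying,
  // so stop waiting once the socket has been idle for two seconds.
  socket.setTimeout(2000, () => socket.destroy());
  socket.on('data', (chunk) => { reply += chunk; });
  socket.on('error', (err) => done(err, reply));
  socket.on('close', (hadError) => { if (!hadError) done(null, reply); });
}
```

Calling `sendRaw('localhost', 8000, curlRequest, callback)` against the running service should hand the callback either the full HTTP response or an empty string. Dropping headers from the object one at a time is a quick way to repeat the header-by-header test described above.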

Arlolra (talkcontribs)

Did you try running on a different port, as above?

What version of nodejs are you running? And since you installed from git, which parsoid commit are you on?

Older versions of nodejs close the connection without sending a 400 error when they hit an HTTP parse error; see https://github.com/nodejs/node/commit/f2f391e575fc8072d10e1ad1601ef3f67f13a4db

You can try to catch that around here, https://github.com/wikimedia/parsoid/blob/master/lib/api/ParsoidService.js#L235

Add something like,

server.on('clientError', (err, conn) => {
  console.error(err);
});
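
A slightly fuller variant, as a sketch only: log the parser's error code and answer with a 400 so the client sees something rather than an empty reply. Replying with 400 Bad Request is what newer Node.js versions do by default when no 'clientError' listener is attached. The function name `onClientError` and the stand-in server are assumptions for illustration, not Parsoid code.

```javascript
const http = require('http');

// Log the parse error and reply with a bare 400 instead of dropping the
// connection silently. `log` is injectable; it defaults to console.error.
function onClientError(err, socket, log) {
  const parsed = typeof err.bytesParsed === 'number'
    ? ' after ' + err.bytesParsed + ' bytes'
    : '';
  (log || console.error)('clientError: ' + (err.code || err.message) + parsed);
  // The socket may already be unusable (e.g. after a reset), so only
  // write the response when it is still writable.
  if (socket.writable) {
    socket.end('HTTP/1.1 400 Bad Request\r\nConnection: close\r\n\r\n');
  } else {
    socket.destroy();
  }
}

// Stand-in server for the sketch. In Parsoid the listener would attach to
// the existing `server` object near the line linked above.
const server = http.createServer((req, res) => {
  res.end('ok\n');
});
server.on('clientError', (err, socket) => onClientError(err, socket));
```

Note that attaching any 'clientError' listener replaces the built-in handling, so the listener becomes responsible for closing the socket.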
Eamercer (talkcontribs)

Sorry. Yes, same behavior running on port 4444.


I've got nodejs version 6.12.3


I'm not sure how to get the parsoid commit. It's version 0.8.1 according to the package.json


Oddly, when I made the change to ParsoidService.js, it started responding to curl.


Now if I do

curl -L http://localhost:8000/version

It responds with:

{"name":"parsoid","version":"0.8.1"}

The console where parsoid is running outputs the following:

{ Error: Parse Error
    at Error (native) bytesParsed: 84, code: 'HPE_INVALID_TRANSFER_ENCODING' }

Of course, I immediately tried to edit a page in our wiki, which didn't work, but seemed to get further along before failing with "Error loading data from server: Could not connect to the server." During this, the console output the following:

{ Error: Parse Error
    at Error (native) bytesParsed: 911, code: 'HPE_UNEXPECTED_CONTENT_LENGTH' }
Arlolra (talkcontribs)

That's an obscure error that points here, https://github.com/nodejs/http-parser/blob/master/http_parser.c#L1902

My guess is that the http-parser package in CentOS was recently updated for the issue tracked here: https://www.tenable.com/plugins/nessus/134238 https://bugzilla.redhat.com/show_bug.cgi?id=1800364 https://github.com/nodejs/http-parser/commit/7d5c99d09f6743b055d53fc3f642746d9801479b

Maybe it got backported incorrectly or something.

Try a newer version of nodejs, either from your distribution or downloaded directly from https://nodejs.org/en/download/, or contact the packager at CentOS.

Eamercer (talkcontribs)

There was an update to http-parser installed on March 25th.


I installed node v0.10.46. Now when I enter npm start, it fails:

$ sudo npm start

> parsoid@0.8.1 start /usr/local/parsoid-0.8.1
> service-runner


/usr/local/parsoid-0.8.1/node_modules/service-runner/service-runner.js:12
const cluster = require('cluster');
^^^^^
SyntaxError: Use of const in strict mode.
    at Module._compile (module.js:439:25)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:945:3

npm ERR! Linux 3.10.0-1062.18.1.el7.x86_64
npm ERR! argv "/usr/bin/node" "/bin/npm" "start"
npm ERR! node v0.10.46
npm ERR! npm  v2.15.1
npm ERR! code ELIFECYCLE
npm ERR! parsoid@0.8.1 start: `service-runner`
npm ERR! Exit status 8
npm ERR! 
npm ERR! Failed at the parsoid@0.8.1 start script 'service-runner'.
npm ERR! This is most likely a problem with the parsoid package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     service-runner
npm ERR! You can get information on how to open an issue for this project with:
npm ERR!     npm bugs parsoid
npm ERR! Or if that isn't available, you can get their info via:
npm ERR! 
npm ERR!     npm owner ls parsoid
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR!     /usr/local/parsoid-0.8.1/npm-debug.log

/usr/local/parsoid-0.8.1/npm-debug.log:

0 info it worked if it ends with ok
1 verbose cli [ '/usr/bin/node', '/bin/npm', 'start' ]
2 info using npm@2.15.1
3 info using node@v0.10.46
4 verbose run-script [ 'prestart', 'start', 'poststart' ]
5 info prestart parsoid@0.8.1
6 info start parsoid@0.8.1
7 verbose unsafe-perm in lifecycle true
8 info parsoid@0.8.1 Failed to exec start script
9 verbose stack Error: parsoid@0.8.1 start: `service-runner`
9 verbose stack Exit status 8
9 verbose stack     at EventEmitter.<anonymous> (/usr/lib/node_modules/npm/lib/utils/lifecycle.js:217:16)
9 verbose stack     at EventEmitter.emit (events.js:98:17)
9 verbose stack     at ChildProcess.<anonymous> (/usr/lib/node_modules/npm/lib/utils/spawn.js:24:14)
9 verbose stack     at ChildProcess.emit (events.js:98:17)
9 verbose stack     at maybeClose (child_process.js:766:16)
9 verbose stack     at Process.ChildProcess._handle.onexit (child_process.js:833:5)
10 verbose pkgid parsoid@0.8.1
11 verbose cwd /usr/local/parsoid-0.8.1
12 error Linux 3.10.0-1062.18.1.el7.x86_64
13 error argv "/usr/bin/node" "/bin/npm" "start"
14 error node v0.10.46
15 error npm  v2.15.1
16 error code ELIFECYCLE
17 error parsoid@0.8.1 start: `service-runner`
17 error Exit status 8
18 error Failed at the parsoid@0.8.1 start script 'service-runner'.
18 error This is most likely a problem with the parsoid package,
18 error not with npm itself.
18 error Tell the author that this fails on your system:
18 error     service-runner
18 error You can get information on how to open an issue for this project with:
18 error     npm bugs parsoid
18 error Or if that isn't available, you can get their info via:
18 error
18 error     npm owner ls parsoid
18 error There is likely additional logging output above.
19 verbose exit [ 1, true ]

Arlolra (talkcontribs)

nodejs v0.10 is quite old. You're coming from v6.x, so I think you meant to download v10.
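
A small guard at startup can make this kind of mix-up obvious before anything else fails. The sketch below parses a version string such as `process.version` and compares its major number against a minimum. The function names and the minimum of 6 (the major version this install was on before) are illustrative assumptions, not something Parsoid ships.

```javascript
// Written in ES5 style (var, no arrow functions) so that the guard itself
// can still be parsed by a very old Node.js.

// Extract the major number from strings like 'v0.10.46', 'v6.12.3', '10.20.1'.
function nodeMajor(version) {
  var match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(String(version));
  if (!match) {
    throw new Error('Unrecognized Node.js version: ' + version);
  }
  return parseInt(match[1], 10);
}

// True when `version` has a major number of at least `minMajor`.
function isNodeAtLeast(version, minMajor) {
  return nodeMajor(version) >= minMajor;
}

// Assumed minimum, for illustration only; the real requirement should come
// from the package itself (e.g. an "engines" field in package.json, if any).
var ASSUMED_MIN_MAJOR = 6;

if (!isNodeAtLeast(process.version, ASSUMED_MIN_MAJOR)) {
  console.error(
    'Node.js ' + process.version + ' is older than v' + ASSUMED_MIN_MAJOR
  );
}
```

With the versions from this thread, `nodeMajor` gives 0 for v0.10.46, 6 for v6.12.3 and 10 for v10.20.1, so only the first would trip the warning.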

Eamercer (talkcontribs)

You are correct, of course!


I just installed nodejs v10.20.1 and everything is working again.


Thank you so much for your assistance and patience!


Arlolra (talkcontribs)

No problem, I was curious.