Topic on Talk:Parsoid

help with Command line testing of the Parsoid node

Summary by Revansx

Discovered that my CLI troubleshooting was going through a second web server that was not configured properly: it needed PHP enabled to handle the localhost calls. This thread is probably only helpful to folks in an enterprise server environment with an Apache policy agent and overbearing enforcement of HTTP-to-HTTPS redirects on all incoming requests.

Revansx

Please help: I am trying to troubleshoot my Parsoid node from the command line.

From the CLI of my CentOS 7 host:

1) When I run netstat -plntu, I can confirm that Parsoid is listening on port 8000:

tcp6      0      0 :::8000                :::*                   LISTEN      27853/node


2) When I run curl http://127.0.0.1/mywiki/api.php, the response is the expected HTML of the api.php page.

3) When I run curl http://127.0.0.1:8000/version, it responds with the expected Parsoid version info:

{"name":"parsoid","version":"0.7.1+git","sha":"a47a89845a93b4cd1fa961494d156f2555ce2531"}


4) But when I run curl -L http://127.0.0.1:8000/localhost/v3/page/html/Main_Page/, it fails with:

error: Failed to parse the JSON response for Config Request path: /localhost/v3/page/html/Main_Page/
lib/index.js: Failed to parse the JSON response for Config Request
   at ConfigRequest.ApiRequest._handleBody (/opt/parsoid/lib/mw/ApiRequest.js:470:12)
   at ConfigRequest.ApiRequest._requestCB (/opt/parsoid/lib/mw/ApiRequest.js:421:8)
   at Request.self.callback (/opt/parsoid/node_modules/request/request.js:186:22)
   at emitTwo (events.js:106:13)
   at Request.emit (events.js:191:7)
   at Request.<anonymous> (/opt/parsoid/node_modules/request/request.js:1163:10)
   at emitOne (events.js:96:13)
   at Request.emit (events.js:188:7)
   at IncomingMessage.<anonymous> (/opt/parsoid/node_modules/request/request.js:1085:12)
   at IncomingMessage.g (events.js:292:16)
   at emitNone (events.js:91:20)
   at IncomingMessage.emit (events.js:185:7)
   at endReadableNT (_stream_readable.js:974:12)
   at _combinedTickCallback (internal/process/next_tick.js:80:11)
   at process._tickCallback (internal/process/next_tick.js:104:9)
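The first checks above (something listening on 8000, /version answering with JSON) can be scripted so they are easy to rerun. A minimal Python sketch, assuming Parsoid at 127.0.0.1:8000 as in this thread; is_listening and parse_version are hypothetical helper names, not part of Parsoid:

```python
import json
import socket
import urllib.request


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections on host:port (the netstat LISTEN check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def parse_version(body: str) -> tuple:
    """Return (name, version) from the JSON that Parsoid's /version endpoint sends."""
    info = json.loads(body)  # raises ValueError if the body is not JSON
    return info["name"], info["version"]


if __name__ == "__main__":
    host, port = "127.0.0.1", 8000  # assumed Parsoid address from this thread
    if not is_listening(host, port):
        print(f"nothing is listening on {host}:{port}")
    else:
        try:
            with urllib.request.urlopen(f"http://{host}:{port}/version", timeout=5) as resp:
                print(parse_version(resp.read().decode("utf-8")))
        except (OSError, ValueError, KeyError) as exc:
            print(f"/version check failed: {exc!r}")
```

If both checks pass but step 4 still fails, the problem is more likely on the MediaWiki API side than in the Parsoid service itself.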

My config.yaml contains:

uri: 'http://127.0.0.1/mywiki/api.php'
domain: 'localhost'  

And my LocalSettings.php contains:

if ( $_SERVER['REMOTE_ADDR'] == '127.0.0.1' ) {
    $wgGroupPermissions['*']['read'] = true;
    $wgGroupPermissions['*']['edit'] = true;
}
wfLoadExtension( 'VisualEditor' );
$wgDefaultUserOptions['visualeditor-enable'] = 1;
$wgDefaultUserOptions['visualeditor-editor'] = "visualeditor";
$wgHiddenPrefs[] = 'visualeditor-enable';
$wgDefaultUserOptions['visualeditor-enable-experimental'] = 1;
$wgVirtualRestConfig['modules']['parsoid'] = array(
'url' => 'http://localhost:8000',
'domain' => 'localhost',
'prefix' => 'localhost'
);
$wgVisualEditorAvailableNamespaces = [
    NS_MAIN  => true,
    NS_TALK  => true,
    NS_USER  => true,
    "_merge_strategy" => "array_plus"
];
$wgSessionsInObjectCache = true;
$wgVirtualRestConfig['modules']['parsoid']['forwardCookies'] = true;
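As I read this setup, VisualEditor reaches Parsoid through $wgVirtualRestConfig, so the 'domain' there needs to match a domain in config.yaml. A Python sketch that checks that and builds a v3 URL of the same shape as the one tested in step 4; parsoid_page_url is a hypothetical helper, and encoding spaces as underscores and everything else (including "/") as percent escapes is an assumption about title normalization:

```python
from urllib.parse import quote


def parsoid_page_url(base_url: str, domain: str, title: str) -> str:
    """Build a Parsoid v3 wikitext-to-HTML URL: <base>/<domain>/v3/page/html/<title>."""
    # Assumption: spaces become underscores; other reserved characters,
    # including '/', are percent-encoded.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"{base_url.rstrip('/')}/{domain}/v3/page/html/{encoded}"


# Values copied from the config.yaml and LocalSettings.php shown above.
parsoid_yaml_domain = "localhost"
ve_config = {"url": "http://localhost:8000", "domain": "localhost"}

if ve_config["domain"] != parsoid_yaml_domain:
    print("domain mismatch: VisualEditor and Parsoid will not agree on the wiki")
print(parsoid_page_url(ve_config["url"], ve_config["domain"], "Main Page"))
```

Requesting the printed URL with curl exercises roughly the same path VisualEditor would, without the browser in the loop.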

Here is the npm version output:

/opt/parsoid/ #npm version
{ parsoid: '0.8.0+git',
  npm: '3.10.10',
  ares: '1.10.1-DEV',
  http_parser: '2.7.1',
  icu: '50.1.2',
  modules: '48',
  node: '6.12.3',
  openssl: '1.0.2k-fips',
  uv: '1.10.2',
  v8: '5.1.281.111',
  zlib: '1.2.7' }

Questions:

  1. What could cause the "Failed to parse the JSON response for Config Request path" error?
  2. What other information could be relevant to command-line testing of the Parsoid service?
  3. What other tests can I run?
  4. Where are the Parsoid errors being logged?
Revansx
Arlolra

1) The request is getting a response that is something other than JSON.

The config request is similar to http://127.0.0.1/pbswiki/api.php?action=query&meta=siteinfo&format=json

Does that work? Does it return parseable JSON? Are there any warnings or additional characters leaking in from your LocalSettings.php?

2) Above, the path you tested in 2. was mywiki, but your config.yaml says pbswiki.

3) Try disabling all your extensions so that it's just Parsoid talking to the MediaWiki API, and if that works, slowly re-enable them.

4) It depends on how you installed and are running the service. The default is just stderr, though.
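Point 1 can be checked without Parsoid in the loop: fetch the siteinfo query and see whether the body parses as JSON. A minimal Python sketch; the query is the simplified one above, not necessarily the exact parameters Parsoid sends, and siteinfo_url and check_json_body are hypothetical names:

```python
import json
from urllib.parse import urlencode


def siteinfo_url(api_php: str) -> str:
    """Append the simplified siteinfo query to an api.php URL."""
    params = {"action": "query", "meta": "siteinfo", "format": "json"}
    return f"{api_php}?{urlencode(params)}"


def check_json_body(body: str) -> tuple:
    """Return (ok, message) describing whether an API response body is clean JSON."""
    try:
        data = json.loads(body)
    except ValueError:
        if body.lstrip().startswith("<"):
            return False, "got HTML instead of JSON (query string dropped, or PHP not run?)"
        return False, "not JSON: look for PHP warnings or stray output before the JSON"
    if not isinstance(data, dict):
        return False, "JSON, but not an object"
    if "error" in data:
        return False, f"API returned an error: {data['error']}"
    return True, "parsed OK"
```

For example, pass it urllib.request.urlopen(siteinfo_url("http://127.0.0.1/mywiki/api.php")).read().decode("utf-8"); an HTML body points at the web server or PHP setup rather than at Parsoid.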

Revansx

1a. Yes, "http://127.0.0.1/mywiki/api.php?action=query&meta=siteinfo&format=json" works, but

1b. No, the response is not JSON; it appears to be the HTML of the API page.

... Any idea why that happens?

2. *derp* ... please ignore that. pbswiki is indeed the actual install name; I'm trying to avoid sharing the actual site name when I post error messages and configuration settings, and I'm not always as thorough as I wish I was.

3. Will do.

4. Hmm, I haven't set up anything specific for a log file; it's whatever the defaults are. There isn't anything in config.yaml that sets a log file name and path. Should there be?

Revansx

So... somewhat unrelated: I tried to get the raw wikitext of an article with cURL using:

curl 'http://127.0.0.1/mywiki/index.php?title=Test&action=raw'

and it returned just the "index.php" HTML, which I thought was strange because (regardless of Parsoid) curl should be able to do that easily, right? So I looked up how to send POST data with a cURL command and tried this:

curl --data "title=Test&action=raw" http://127.0.0.1/mywiki/index.php

to which the response was:

<html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.12.2</center>
</body>
</html>

... Could this "405 Not Allowed" be happening to the JSON request, or is it nothing?
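In a POSIX shell, an unquoted & ends the command: curl then only sees ...?title=Test, and action=raw is treated as a separate variable assignment, so MediaWiki returns the normal page HTML. Quoting the URL avoids this. A Python sketch that builds the action=raw URL and a safely quoted curl command line with shlex; raw_url and curl_command are hypothetical helpers:

```python
import shlex
from urllib.parse import urlencode


def raw_url(index_php: str, title: str) -> str:
    """URL asking MediaWiki's index.php for a page's raw wikitext."""
    return f"{index_php}?{urlencode({'title': title, 'action': 'raw'})}"


def curl_command(url: str) -> str:
    """A copy-pasteable curl command with the URL quoted for POSIX shells."""
    return shlex.join(["curl", url])  # shlex.join needs Python 3.8+


url = raw_url("http://127.0.0.1/mywiki/index.php", "Test")
print(curl_command(url))  # curl 'http://127.0.0.1/mywiki/index.php?title=Test&action=raw'
```

As for the 405: nginx typically answers a POST to a file it serves statically with "405 Not Allowed", which would be consistent with .php files not being handed to PHP on that server.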

Revansx

To help with debugging, I ran

[/opt/parsoid] #npm install log

per https://www.npmjs.com/package/log. Now I need to learn how to get it to make log entries and how/where to read them.

Arlolra

From 1b above, it seems like your MediaWiki instance isn't installed correctly. Maybe whatever web server is sitting in front of it (Apache?) isn't passing the query string parameters?

Whatever the case, you need to resolve that first before trying to get Parsoid to work. That config request should return JSON.

Revansx

OK, that gives me something new to figure out. Do you by chance know how to perform a "wikitext to HTML" conversion with Node.js at the command line?

Arlolra

With Parsoid, yes: echo "my '''wikitext'''" | node bin/parse --wt2html --offline
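To script that one-liner, a minimal Python sketch that pipes text through bin/parse with subprocess. It assumes the /opt/parsoid checkout used in this thread and only uses the --wt2html, --html2wt and --offline flags shown here; parse_command and run_parse are hypothetical helpers:

```python
import subprocess


def parse_command(direction: str, offline: bool = True) -> list:
    """Argument list for Parsoid's CLI parser; direction is 'wt2html' or 'html2wt'."""
    if direction not in ("wt2html", "html2wt"):
        raise ValueError(f"unsupported direction: {direction}")
    cmd = ["node", "bin/parse", f"--{direction}"]
    if offline:
        # No MediaWiki API lookups, so templates and images won't resolve.
        cmd.append("--offline")
    return cmd


def run_parse(text: str, direction: str, parsoid_dir: str = "/opt/parsoid") -> str:
    """Pipe text through bin/parse and return stdout (raises CalledProcessError on failure)."""
    result = subprocess.run(
        parse_command(direction),
        input=text,
        capture_output=True,
        text=True,
        cwd=parsoid_dir,
        check=True,
    )
    return result.stdout
```

For example, run_parse("my '''wikitext'''\n", "wt2html") mirrors the echo pipeline, including the trailing newline echo adds.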

Revansx

That worked!!! :-)

[/opt/parsoid] #echo "my '''wikitext'''" | node bin/parse --wt2html --offline

produced

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: http://en.wikipedia.org/wiki/Special:Redirect/"><meta charset="utf-8"/><meta property="mw:pageNamespace" content="0"/><meta property="isMainPage" content="true"/><meta property="mw:html:version" content="1.6.0"/><link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/Main%20Page"/><title></title><base href="//en.wikipedia.org/wiki/"/><link rel="stylesheet" href="//en.wikipedia.org/w/load.php?modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Csite.styles%7Cext.cite.style%7Cext.cite.styles%7Cmediawiki.page.gallery.styles&only=styles&skin=vector"/><!--[if lt IE 9]><script src="//en.wikipedia.org/w/load.php?modules=html5shiv&only=scripts&skin=vector&sync=1"></script><script>html5.addElements('figure-inline');</script><![endif]--></head><body data-parsoid='{"dsr":[0,18,0,0]}' lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><p data-parsoid='{"dsr":[0,17,0,0]}'>my <b data-parsoid='{"dsr":[3,17,3,3]}'>wikitext</b></p>
</body></html>

Thank you!!

Then I tried:

[/opt/parsoid] #echo "<html><head><title>Test</title></head><body><table><tr><td>A</td><td>B</td></tr></table></body></html>" | node bin/parse --html2wt --offline

which produced

{|
|A
|B
|}

Yay!
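The data-parsoid attributes in the wt2html output map elements back to the wikitext. As far as I understand Parsoid's internal data-parsoid format (not a stable public API), "dsr" holds [start offset, end offset, opening-markup width, closing-markup width] into the source, so the <b> element's [3,17,3,3] spans '''wikitext''' and the body's [0,18,...] includes the newline echo appended. A Python sketch that reads those ranges with the standard html.parser module; DsrCollector is a hypothetical name and the HTML is trimmed from the output above:

```python
import json
from html.parser import HTMLParser


class DsrCollector(HTMLParser):
    """Collect (tag, dsr) pairs from data-parsoid attributes."""

    def __init__(self):
        super().__init__()
        self.ranges = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-parsoid" and value:
                dsr = json.loads(value).get("dsr")
                if dsr:
                    self.ranges.append((tag, dsr))


source = "my '''wikitext'''\n"  # what echo piped into bin/parse
html = (
    "<body data-parsoid='{\"dsr\":[0,18,0,0]}'>"
    "<p data-parsoid='{\"dsr\":[0,17,0,0]}'>my "
    "<b data-parsoid='{\"dsr\":[3,17,3,3]}'>wikitext</b></p>\n</body>"
)

collector = DsrCollector()
collector.feed(html)
for tag, dsr in collector.ranges:
    start, end, open_w, close_w = dsr
    # Full source span, then the inner content with the wikitext markup stripped.
    inner = source[start + (open_w or 0):end - (close_w or 0)]
    print(tag, repr(source[start:end]), repr(inner))
```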

Revansx

OK, now I'm convinced the original problem is an issue with my server's configuration. Stay tuned!

Arlolra

But note that --offline prevents fetching any info from the MediaWiki API, so you'll want to get the above fixed to use it in the context of your wiki.

Revansx

Gotcha! It's still a great test. I'm hoping to capture here all of the command-line tests one can run to systematically troubleshoot a Node.js / Parsoid / VisualEditor issue and quickly show where the problem lies. If I understand right, you're saying I can remove the "--offline" switch, replace it with wiki-specific info, and do the same thing, right?

Arlolra

I'm saying templates, images, etc. won't resolve in offline mode. The parser is more useful when it has a MediaWiki API to fetch info from.

Revansx

OK, so I used your example to build this API URL for Wikipedia:

https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json

and it produced this result:

{"batchcomplete":"","query":{"general":{"mainpage":"Main Page","base":"https://en.wikipedia.org/wiki/Main_Page","sitename":"Wikipedia","logo":"//en.wikipedia.org/static/images/project-logos/enwiki.png","generator":"MediaWiki 1.31.0-wmf.22","phpversion":"5.6.99-hhvm","phpsapi":"srv","hhvmversion":"3.18.6-dev","dbtype":"mysql","dbversion":"10.0.34-MariaDB","imagewhitelistenabled":"","langconversion":"","titleconversion":"","linkprefixcharset":"","linkprefix":"","linktrail":"/^([a-z]+)(.*)$/sD","legaltitlechars":" %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\\x80-\\xFF+","invalidusernamechars":"@:","fixarabicunicode":"","fixmalayalamunicode":"","git-hash":"fd29ac30b061c6e0c6590eed98f700a4e829a424","git-branch":"wmf/1.31.0-wmf.22","case":"first-letter","lang":"en","fallback":[],"fallback8bitEncoding":"windows-1252","writeapi":"","maxarticlesize":2097152,"timezone":"UTC","timeoffset":0,"articlepath":"/wiki/$1","scriptpath":"/w","script":"/w/index.php","variantarticlepath":false,"server":"//en.wikipedia.org","servername":"en.wikipedia.org","wikiid":"enwiki","time":"2018-02-26T02:08:54Z","misermode":"","uploadsenabled":"","maxuploadsize":4294967296,"minuploadchunksize":1024,"galleryoptions":{"imagesPerRow":0,"imageWidth":120,"imageHeight":120,"captionLength":"","showBytes":"","mode":"traditional","showDimensions":""},"thumblimits":[120,150,180,200,220,250,300,400],"imagelimits":[{"width":320,"height":240},{"width":640,"height":480},{"width":800,"height":600},{"width":1024,"height":768},{"width":1280,"height":1024}],"favicon":"//en.wikipedia.org/static/favicon/wikipedia.ico","centralidlookupprovider":"CentralAuth","allcentralidlookupproviders":["CentralAuth","local"],"interwikimagic":"","magiclinks":{"ISBN":"","PMID":"","RFC":""},"categorycollation":"uca-default-u-kn","wmf-config":{"wmfMasterDatacenter":"eqiad"},"citeresponsivereferences":"","linter":{"high":["deletable-table-tag","html5-misnesting","misc-tidy-replacement-issues","multiline-html-table-in-list","multiple-unclosed-formatting-tags","pwrap-bug-workaround","self-closed-tag","tidy-font-bug","tidy-whitespace-bug","unclosed-quotes-in-heading"],"medium":["bogus-image-options","fostered","misnested-tag","multi-colon-escape"],"low":["missing-end-tag","obsolete-tag","stripped-tag"]},"mobileserver":"https://en.m.wikipedia.org","pageviewservice-supported-metrics":{"pageviews":{"pageviews":""},"siteviews":{"pageviews":"","uniques":""},"mostviewed":{"pageviews":""}},"readinglists-config":{"maxListsPerUser":100,"maxEntriesPerList":1000,"deletedRetentionDays":30}}}}

Would you expect this to be similar to the result of the cURL command against my 127.0.0.1 wiki?
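A working api.php returns the same kind of JSON object, with the wiki's own values under query.general. A Python sketch that reads a few of those fields and derives the api.php URL from server plus scriptpath; the sample is trimmed from the Wikipedia response above, and api_php_from_siteinfo is a hypothetical helper:

```python
import json


def api_php_from_siteinfo(body: str) -> str:
    """Derive the api.php URL from siteinfo's 'server' and 'scriptpath' fields."""
    general = json.loads(body)["query"]["general"]
    return f"{general['server']}{general['scriptpath']}/api.php"


sample = (
    '{"batchcomplete":"","query":{"general":{"mainpage":"Main Page",'
    '"generator":"MediaWiki 1.31.0-wmf.22","server":"//en.wikipedia.org",'
    '"scriptpath":"/w","articlepath":"/wiki/$1"}}}'
)
general = json.loads(sample)["query"]["general"]
print(general["generator"], general["articlepath"])
print(api_php_from_siteinfo(sample))  # protocol-relative, because "server" is
```

On a local wiki, comparing the derived URL with the uri in config.yaml is a quick consistency check.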

Revansx

It just dawned on me that I need to enable PHP on nginx :-) As strange as this sounds, I run two web servers:

  1. nginx for loopback/localhost API calls
  2. Apache for all external client user access.

I do this because my server has a security policy that requires every incoming call to the server to have a session validated by a remote authentication server. This means any server-side activity (like Parsoid) going through Apache on port 80 or 443 gets caught up in the redirect that tries to force the user to obtain a validated session. I have no control over this, so I'm trying to solve the problem by running two web servers: external visitors with validated sessions use Apache with SSL on port 443, and localhost calls use nginx on port 8080. If that sounds crazy, all I can say is that I'm all ears if you know a better way to get Parsoid running on a private enterprise wiki with a security policy that enforces remotely secured sessions. Please let me know! :-)
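With two web servers in play, it helps to confirm which one actually returns API JSON for localhost calls before pointing config.yaml at it. A Python sketch that probes candidate api.php URLs in order; first_json_endpoint and default_fetch are hypothetical helpers, and the fetch function is injectable so the logic can be exercised without a server:

```python
import http.client
import json
import urllib.request
from typing import Callable, Iterable, Optional

SITEINFO_QUERY = "?action=query&meta=siteinfo&format=json"


def default_fetch(url: str) -> str:
    """Plain GET with a short timeout; urllib follows redirects by default."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", errors="replace")


def first_json_endpoint(candidates: Iterable[str],
                        fetch: Callable[[str], str] = default_fetch) -> Optional[str]:
    """Return the first api.php URL whose siteinfo query parses as JSON, else None."""
    for api_php in candidates:
        try:
            json.loads(fetch(api_php + SITEINFO_QUERY))
            return api_php
        except (OSError, ValueError, http.client.HTTPException) as exc:
            # Connection refused, HTTP errors, login/redirect HTML pages, etc.
            print(f"{api_php}: {type(exc).__name__}")
    return None
```

For example, first_json_endpoint(["http://127.0.0.1/mywiki/api.php", "http://127.0.0.1:8080/mywiki/api.php"]) would try the Apache URL first and fall back to nginx on 8080; those two candidates are assumptions based on the setup described above.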

Arlolra
Revansx

Yes, I solved the problem. My enterprise server environment is tricky, to say the least. I have a Policy Agent running that intercepts all traffic and forwards it to a remote identity provider, which enforces the session header and then routes the user back to the site for auto-login, consuming the session attributes via the Remote_Auth extension. This has been a very tricky obstacle for me in developing good command-line troubleshooting techniques. I slogged through getting the Parsoid service working for a single site, but then I tried to install multiple sites and alter my config.yaml to host multiple wikis. That's when everything broke, and I wanted to troubleshoot the Parsoid service from the CLI independently of the wiki site(s). That's why I set up the nginx server: so I'd have a web server I could talk to from the CLI without the hassles of the Policy Agent.

Revansx

the command

[/opt/parsoid]# curl -L http://127.0.0.1:8000/localhost/v3/page/html/Test

now produces the expected response:

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/" about="https://mywiki.mycompany.com/smw/Special:Redirect/revision/20526"><head prefix="mwr: https://mywiki.mycompany.com/smw/Special:Redirect/"><meta charset="utf-8"/><meta property="mw:pageNamespace" content="0"/><meta property="mw:pageId" content="7458"/><link rel="dc:replaces" resource="mwr:revision/20496"/><meta property="dc:modified" content="2018-02-22T19:40:20.000Z"/><meta property="mw:revisionSHA1" content="1e0bc0a9623dd93bb05ca2e0d4b3862b6859e406"/><meta property="mw:html:version" content="1.6.0"/><link rel="dc:isVersionOf" href="https://mywiki.mycompany.com/smw/Test"/><title>Test</title><base href="https://mywiki.mycompany.com/smw/"/><link rel="stylesheet" href="//mywiki.mycompany.com/mywiki/load.php?modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Csite.styles%7Cext.cite.style%7Cext.cite.styles%7Cmediawiki.page.gallery.styles&amp;only=styles&amp;skin=vector"/><!--[if lt IE 9]><script src="//mywiki.mycompany.com/mywiki/load.php?modules=html5shiv&only=scripts&skin=vector&sync=1"></script><script>html5.addElements('figure-inline');</script><![endif]--></head><body data-parsoid='{"dsr":[0,10,0,0]}' lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><section data-mw-section-id="0" data-parsoid="{}"><p data-parsoid='{"dsr":[0,10,0,0]}'>test 1 2 3</p></section></body></html>
  • My Node.js Parsoid service now works at the command line (per the command above).
  • VisualEditor now works in both wikis hosted on the same server, with a single Parsoid node.
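To turn that curl check into a pass/fail test, you can read a few metadata elements out of the returned HTML. A Python sketch using the standard html.parser module; the property names (mw:pageId, mw:pageNamespace) are the ones visible in the output above, and page_metadata and _MetaReader are hypothetical names:

```python
from html.parser import HTMLParser


class _MetaReader(HTMLParser):
    """Collect <meta property=...> values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and "property" in a:
            self.meta[a["property"]] = a.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_metadata(html: str) -> dict:
    """Return the page title plus every <meta property> value found in Parsoid HTML."""
    reader = _MetaReader()
    reader.feed(html)
    reader.close()
    return {"title": reader.title, **reader.meta}
```

Feed it the body returned by curl -L http://127.0.0.1:8000/localhost/v3/page/html/Test and check that the title and mw:pageId match the page you asked for.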

... thanks!