Topic on Talk:Parsoid

Using Parsoid independently

Summary by Arlolra

Try parsing with --mock

AB1908 (talkcontribs)

Hi folks. I recently got Parsoid running and ran into a bit of trouble. I want to generate (only) HTML from wikitext while using two extensions, Cite and Spoiler, with a few adjustments for Cite. However, I'm unable to find a way to reliably get the output I want. Pandoc and the alternative parsers listed on MediaWiki have incomplete support at best and incorrect output at worst, and other markup languages like Markdown or AsciiDoc aren't versatile enough. It seems that, to modify Cite's settings, I need to use system messages, for which I'd need MediaWiki set up. I'd like to avoid that, as all I want is a parser. Is my goal of using only Parsoid to generate HTML impossible? If not, how am I supposed to go about it? Am I misunderstanding things? I've jumped through a lot of hoops, tested a bunch of different parsers, and exhausted my options before asking here. I really wish to avoid writing HTML as much as I can, and wikitext is the best lightweight markup language I've found, so I'd appreciate any help.

SSastry (WMF) (talkcontribs)

As long as you don't have templates and images, you can make this work. However, Parsoid does not know about the Spoiler extension and so you would need a Parsoid-native implementation of it. But, once that is in place, you can use the bin/parse.php script with the --mock option to parse your wikitext and generate HTML. It operates without MediaWiki.

If you do have templates and images, you will need to update the mocks to provide the expected output for those template and media requests (which is workable if you have a small number of them).

AB1908 (talkcontribs)

This is excellent. Thank you very much for the help and for the work on the parser in general (I honestly didn't expect a reply from the team lead). I have a few more questions if you could bear with me:

1. Where would I find a Parsoid-native implementation of an extension? How do I "enable" it, so to speak, for Parsoid? Is it the source code of the extension?

2. What are "mocks"? I couldn't find much information on them. The --help for parse.js doesn't list them and I couldn't find documentation on them either. I am using an up-to-date copy of the source.

3. I would like to customise the output of the Cite extension which can be done via system messages. How do I do that without using MediaWiki? Do I have to change configurations elsewhere? (non essential, can be ignored)

Arlolra (talkcontribs)

For now, native extensions are in the Parsoid repo, in src/Ext/ and lib/ext/. They will eventually move to the respective extension repos.

Mocks weren't a concept in Parsoid/JS. You'd want to use the --offline flag for the same purpose. But keep in mind that Parsoid/JS is only getting security fixes and is rapidly approaching EOL. You should consider working with Parsoid/PHP.

AB1908 (talkcontribs)

> For now, native extensions are in the Parsoid repo, in src/Ext/ and lib/ext/. They will eventually move to the respective extension repos.


I assume this means using the Spoiler extension is off the table. Regardless, thanks again for taking the time to help.

Arlolra (talkcontribs)

No, if you wrote a native implementation of it, there's an extension registration mechanism. The code can live wherever you want.

SSastry (WMF) (talkcontribs)

If https://github.com/Telshin/Spoilers/blob/master/SpoilersHooks.php is the spoiler extension, that should be fairly simple to implement. Take a look at Parsoid/Extension API#Examples (this requires the master branch of Parsoid). If you are working with the JS version, it will still be simple, but you will have to take a look at Cite's ref.js file and the toDOM method there.

But anyway, yes, you will need to write that code before you can use the spoiler extension with Parsoid.

SSastry (WMF) (talkcontribs)

For example, see below:

[subbu@earth:~/work/wmf/parsoid] echo -e "a <ref>boo</ref>\n\n*a\n*b" | php bin/parse.php --mock --body_only --pageName 'Testing' --normalize=parsoid

<p>a <sup class="mw-ref" id="cite_ref-1" rel="dc:references" typeof="mw:Extension/ref" data-mw='{"name":"ref","attrs":{},"body":{"id":"mw-reference-text-cite_note-1"}}'><a href="./Testing#cite_note-1"><span class="mw-reflink-text">[1]</span></a></sup></p>
<ul>
<li>a</li>
<li>b</li>
</ul>
<div class="mw-references-wrap" typeof="mw:Extension/references" data-mw='{"name":"references","attrs":{},"autoGenerated":true}'>
<ol class="mw-references references">
<li id="cite_note-1"><a href="./Testing#cite_ref-1" rel="mw:referencedBy"><span class="mw-linkback-text">↑ </span></a> <span id="mw-reference-text-cite_note-1" class="mw-reference-text">boo</span></li>
</ol>
</div>

AB1908 (talkcontribs)

The PHP script fails for me, stating that ../vendor/autoload.php was not found. This is a total non-issue for me though, as I have the JS version working.

Arlolra (talkcontribs)

You need to run composer install before running the PHP script.
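
To tie the steps in this thread together, here is a minimal sketch of a shell helper. It assumes it is run from the root of a Parsoid checkout with Composer and PHP installed. The function name parse_wikitext is hypothetical and not part of Parsoid; only the bin/parse.php script and the --mock, --body_only and --pageName options come from the example earlier in the thread.

```shell
# Hypothetical helper (not part of Parsoid): run from the root of a Parsoid checkout.
# Reads wikitext on stdin and forwards any extra options to bin/parse.php.
parse_wikitext() {
  # bin/parse.php needs Composer's autoloader; fail early with a hint if it is missing.
  if [ ! -f vendor/autoload.php ]; then
    echo "vendor/autoload.php is missing; run 'composer install' first" >&2
    return 1
  fi
  # --mock parses without a MediaWiki installation; --body_only emits only the body HTML.
  php bin/parse.php --mock --body_only "$@"
}
```

After running composer install once, something like echo "a <ref>boo</ref>" | parse_wikitext --pageName 'Testing' should behave like the command shown above, minus the --normalize option.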