Extension:ChessBrowser/PGN schema

Description of standard

PHP parser

 * The proof-of-concept parser should be converted to its own class.
 * Existing "else if" blocks can be made into sub-parsers.
 * The import format for the parser deviates from spec 8.2.1. The standard implies that moves can be directly adjacent, e.g. Re5Qd6Kc3. The import format for this parser requires at least one space between adjacent symbol tokens.
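
The practical effect of the whitespace requirement can be shown with a small sketch. It is written in JavaScript purely for illustration (the parser itself is PHP), and `splitImportTokens` is a hypothetical name, not a function in the extension.

```javascript
// Hypothetical helper: split movetext on whitespace, as the relaxed import
// format described above requires. Moves written without a separating space
// are not split apart.
function splitImportTokens(movetext) {
  return movetext.trim().split(/\s+/).filter((token) => token.length > 0);
}

console.log(splitImportTokens('Re5 Qd6 Kc3')); // three separate tokens
console.log(splitImportTokens('Re5Qd6Kc3'));   // one fused token
```

A parser following the standard strictly would have to split the fused form as well, which is the extra work this import format avoids.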

Developer notes

 * PHP should take import format and deliver export format.
 * JavaScript can be simplified by only accepting export format PGN. As the JavaScript is delivered by the extension, it should only take input that the PHP places on the page.
 * Parsing output can probably be done token by token without regexes.
 * If the extension is later expanded to resolve T239438, an additional PHP function can take the export format and translate the data for JavaScript.

HTML
The HTML for the display is currently built by the front-end (JavaScript) and is partly controlled by the "config"; e.g., if the config stipulates "delay", the front-end does not display the "faster/slower" buttons.

Discuss:


 * Should the back-end (PHP) provide the HTML?
 * Is the current HTML produced by the script good UX?
 * Is it possible to get UX feedback?

CSS
Ideally, the CSS should be provided by the extension and be overridable by each project.

JavaScript parser
Written by Kipod, but it will need to be modified as heavy parsing is moved to PHP. Parsing in JS may eventually be made obsolete (see T239438).

RFC: Proposal for export format from server to front-end script
This is loosely based on the artifacts generated by calling "analyzePgn" in the JavaScript viewer.

Data will be passed from PHP to JavaScript by way of a JSON object. Some config values may be taken as user input by way of XML attributes in the wikitext. The PHP parser will output an element with an attribute whose value is the JSON object. The JavaScript module can then retrieve this JSON object and perform the necessary client-side operations.

The JavaScript gadget on which the extension's module is based supports multiple games on the same page. In these cases, the JSON could contain an array of "games" instead of a single game, with a single game being a special case where the array has a length of 1. The content of this JSON object is as follows:

 * Metadata: The metadata field contains the tag pairs from the PGN input. They must include the seven tag roster (spec 8.1.1) and may include arbitrarily many additional tag pairs.

 * Pieces: The pieces field provides enough data to draw each piece, i.e., type and color.

 * Boards: The boards field contains a sequence of board states. One entry of the array, a board or superFEN, would be like a FEN string. Unlike FEN, which does not keep track of where a piece came from, a superFEN would associate a unique piece (specified in the pieces field) with its position on the board. SuperFEN would still maintain other features of FEN, such as condensing sequences of empty squares to an integer and representing each piece by a character. It could be enhanced by not using line breaks at all or describing a single line of 64 entries. Of course, this is but one option. The important thing is that each "board" will carry enough information to know which pieces (indexes in the "pieces" array) are in the game, and which square each of them occupies.

 * SAN: The SAN field describes the game in Standard Algebraic Notation as an array of PGN integer and symbol tokens (spec 7). As an alternative, the PHP could generate the SAN as HTML rather than passing the information to the JavaScript to be drawn.
 * Comments: The comments field contains a dictionary of comments (spec 5) where the key is the ply and the value is the comment.
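
To make the proposal concrete, here is a minimal sketch of what such a payload and its retrieval could look like. Every field name, the letter-based board encoding, and the helper `readGames` are assumptions made for illustration; none of them is fixed by this proposal.

```javascript
// Illustrative payload only: all field names and values are assumptions.
const examplePayload = JSON.stringify({
  games: [
    {
      // Seven tag roster with placeholder values, plus any extra tag pairs.
      metadata: {
        Event: '?', Site: '?', Date: '????.??.??', Round: '?',
        White: '?', Black: '?', Result: '*'
      },
      pieces: [
        { type: 'k', color: 'w' },
        { type: 'k', color: 'b' },
        { type: 'p', color: 'w' }
      ],
      // One string per ply; letters a, b, c stand for pieces 0, 1, 2.
      boards: ['4b3/8/8/8/8/8/4c3/4a3', '4b3/8/8/8/4c3/8/8/4a3'],
      san: ['1', 'e4'],
      comments: { 1: 'Example comment attached to ply 1.' }
    }
  ]
});

// Hypothetical helper: always return an array of games, treating a bare game
// object as the special case of an array of length 1.
function readGames(jsonText) {
  const data = JSON.parse(jsonText);
  return Array.isArray(data.games) ? data.games : [data];
}

console.log(readGames(examplePayload).length);
```

In the browser, the JSON text would be read from whichever attribute the PHP emits before being handed to a function like this.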

Rationale: By using consistent individual pieces, very little logic is required to draw the board: tell all the pieces in this board to move to their intended places, and all those which are not in this board to hide. This logic is identical or similar to the existing script.
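
The drawing logic described here can be sketched as follows. The encoding is an assumption made only for this example: the n-th entry of the pieces array is written as the n-th lowercase letter, which covers just 26 pieces, so a real encoding would need a larger symbol set. The names `decodeBoard` and `layoutPieces` are hypothetical.

```javascript
const FILES = 'abcdefgh';

// Decode a FEN-like board string into a map from piece index to square.
// Assumed encoding: 'a' is piece 0, 'b' is piece 1, and so on; digits 1-8
// condense runs of empty squares, and '/' separates ranks from 8 down to 1.
function decodeBoard(board) {
  const squareByIndex = new Map();
  board.split('/').forEach((rankText, rankIndex) => {
    let file = 0;
    for (const ch of rankText) {
      if (ch >= '1' && ch <= '8') {
        file += Number(ch);
      } else {
        const pieceIndex = ch.charCodeAt(0) - 'a'.charCodeAt(0);
        squareByIndex.set(pieceIndex, FILES[file] + (8 - rankIndex));
        file += 1;
      }
    }
  });
  return squareByIndex;
}

// For every known piece, report its square on this board, or mark it hidden
// when the board does not contain it (for example after a capture).
function layoutPieces(pieces, board) {
  const squareByIndex = decodeBoard(board);
  return pieces.map((piece, index) => (
    squareByIndex.has(index)
      ? { ...piece, visible: true, square: squareByIndex.get(index) }
      : { ...piece, visible: false }
  ));
}
```

Because each piece keeps the same index across all boards, moving between plies only requires repositioning or hiding existing elements, never recreating them.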

Comments

 * Note: The full PGN standard allows for "comments" interspersed among the notation. The current JavaScript does not support nesting, and does not support alternatives (i.e., the user cannot see the actual positions for algebraic notation in comments). This limitation is mainly because of the parsing. If the PHP-based PGN parser supports nesting and alternatives, it should be fairly straightforward to augment the data structure, and write a contract which will make it fairly easy for the front-end to show alternatives too. -- kipod
 * In general this seems reasonable. We shouldn't pass user input straight into the output config object for security reasons. Since PGN operates on the level of tokens, it's better to specify the SAN as a token sequence rather than as chopped-up SAN. This means that the JavaScript only needs to figure out what to do with a token of a given type, since it's guaranteed to receive a token. If it were given something like "1.e4", that's technically 3 tokens that need to be parsed, whereas in "1", "e4" the "1" is an integer (move number) token and "e4" is a symbol (SAN) token. I'm interested in the superFEN idea, but wonder what it buys us that FEN doesn't. If we have a sequence of FEN positions, could we not reverse engineer which piece needs to be moved? It seems that our job is made slightly easier because of the SAN data: we already know what movement gets us from one FEN state to another. Plus we get the data for the FEN tab for free. If we did go the superFEN route, we'd need to come up with a 36-character set to specify each individual piece in the superFEN and associate it with the pieces array. Not hard, but it would differ from FEN's, and we'd need to convert it to FEN or drop support for giving FEN for any board position. Wugapodes (talk) 04:06, 1 December 2019 (UTC)
 * i played a bit with this idea (i.e., back-end passes the "parsing" result as array of FENs). it _is_ doable - consecutive array of FEN indeed contains enough information, but one has to ask, what is the value? admittedly, converting raw FENs to the actual data needed is somewhat lighter work than simply parsing the PGN, but the diff is not as impressive as one might think. maybe going from ~300 lines of code to 100+ lines. is it worth it?
 * i was thinking that one of the advantages of splitting the work is that this will basically allow the same "front-end" to be used with quite a few other games similar to chess, such as, say, shogi, checkers, and other turn-based board games where the board is a matrix of "squares", and position is governed by a "file/row" combo. however, if we take the "give me a series of fen" approach, i am not at all sure this has any advantage over the "give me algebraic notation" approach, i.e., what we actually do now.
 * maybe the best approach is something like "give me augmented PGN" (or rather augmented ASN): the main challenge with working with ASN is that "e4" does not tell you which piece moved to e4: you know it's a pawn (no [RNBQK] prefix), but you have to work out which pawn it is. if the backend passes "e2e4" instead, i.e. states the origin square explicitly, the "analyze" part becomes a breeze - actually, easier than working with consecutive FENs, and less data.
 * if we want to also output the FEN (TBH, this is not a hard requirement, and we can simply drop the "fen" tab, which has dubious value anyway), the "board-to-fen" routine, boardToFen in the current script, is short and sweet, so saving us the need to generate FEN is no big deal. of course, this is not a real FEN - it only contains the part that describes the board, not the remainder, which tells you whose turn it is, who can still castle, etc., but the current script does not give this information, and nobody has complained so far, and if we go with "series of FEN", the script will have to strip this remainder anyway... peace - קיפודנחש (talk) 23:16, 3 December 2019 (UTC)
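
As a rough illustration of the last two points, the sketch below applies a move given with an explicit origin square and then serializes the board to the piece-placement part of FEN. It is not the existing script's boardToFen; the board representation (an object mapping square names to FEN piece letters) and the function bodies are assumptions, and castling, en passant and promotion are deliberately ignored.

```javascript
const FILES = 'abcdefgh';

// Apply a move written with an explicit origin square, e.g. "e2e4".
// Returns a new board; a capture simply overwrites the target square.
function applyMove(board, move) {
  const from = move.slice(0, 2);
  const to = move.slice(2, 4);
  const next = { ...board };
  next[to] = next[from];
  delete next[from];
  return next;
}

// Produce only the board part of FEN (no side to move, castling rights, etc.).
function boardToFen(board) {
  const ranks = [];
  for (let rank = 8; rank >= 1; rank--) {
    let text = '';
    let empty = 0;
    for (const file of FILES) {
      const piece = board[file + rank];
      if (piece === undefined) {
        empty += 1;
      } else {
        if (empty > 0) {
          text += empty; // flush the run of empty squares before the piece
          empty = 0;
        }
        text += piece;
      }
    }
    if (empty > 0) {
      text += empty;
    }
    ranks.push(text);
  }
  return ranks.join('/');
}
```

With the origin square supplied, no search for "which pawn moved" is needed, which is the simplification the augmented notation is meant to buy.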

i18n
The server can preload the different strings used by the front-end as mw.messages entries.

This will allow projects to stuff the translations "manually" for languages whose translations are not yet integrated with the extension, or to override specific strings.

Strings that can be translated:


 * hints ("title=") for all the buttons
 * UI components. At the moment, the script displays 3 textual titles, for the tabs (FEN, Notation, Metadata)
 * (optional) transformation for file and row legends
 * (optional) transformation for piece designation (RNBQK), possibly for move numbering (e.g., for ar, ۱۲۳۴۵۶۷۸۹۰)
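
The optional transformations could be driven by simple lookup tables, as in this sketch. The helpers `buildDigitTable` and `transformToken` are hypothetical, and in the extension the tables would presumably come from translated messages rather than from hard-coded strings.

```javascript
// Build a digit lookup table from a string that lists the replacements for
// 1 through 9 followed by 0, matching the order used in the example above.
function buildDigitTable(digits) {
  const table = {};
  Array.from(digits).forEach((digit, i) => {
    table[String((i + 1) % 10)] = digit;
  });
  return table;
}

// Replace every character that has an entry in the table; leave the rest.
function transformToken(token, table) {
  return Array.from(token, (ch) => (table[ch] !== undefined ? table[ch] : ch)).join('');
}

const digitTable = buildDigitTable('۱۲۳۴۵۶۷۸۹۰');
console.log(transformToken('10', digitTable));

// The same mechanism covers piece letters, e.g. German "S" (Springer) for "N".
console.log(transformToken('Nf3', { N: 'S' }));
```

Applying the digit table only to move-number tokens, and not to symbol tokens such as "e4", keeps square names intact unless the legend transformation is also wanted.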