User:OrenBochman/ParserNG/Preprocessor

Quick Preprocessor
Here is a wikitext preprocessor written in ANTLR. It is aimed at parsing for search rather than parsing for rendering. It supports recursive templates, math expressions inside templates, some parser functions, and some magic words.

based on
 * http://en.wikipedia.org/wiki/Help:Calculation
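
The recursive template support mentioned above can be illustrated with a small Python sketch. This is not the ANTLR grammar itself, only a rough model of the idea: the innermost call is expanded first, positional parameters are substituted into the template body, and the result is scanned again. The template store, the function names and the decision to drop unknown templates are all assumptions made for illustration.

```python
import re

# Hypothetical in-memory template store; a real preprocessor would load
# template bodies from the wiki.
TEMPLATES = {
    "greet": "Hello, {{{1}}}!",
    "shout": "{{greet|{{{1}}}}} {{greet|{{{1}}}}}",
}

# An innermost call: {{ ... }} with no braces inside it.
INNERMOST = re.compile(r"\{\{([^{}]*)\}\}")
# A positional parameter reference inside a template body: {{{1}}}
PARAM = re.compile(r"\{\{\{([^{}|]+)\}\}\}")


def substitute_params(body, args):
    """Replace {{{n}}} with the n-th argument; missing ones become empty."""
    def repl(match):
        key = match.group(1).strip()
        if key.isdigit() and 1 <= int(key) <= len(args):
            return args[int(key) - 1]
        return ""
    return PARAM.sub(repl, body)


def expand(text, templates, max_steps=1000):
    """Expand template calls innermost-first until none are left."""
    for _ in range(max_steps):
        match = INNERMOST.search(text)
        if match is None:
            return text
        name, *args = [part.strip() for part in match.group(1).split("|")]
        body = templates.get(name)
        # Assumption: unknown templates are dropped, which suits search indexing.
        replacement = "" if body is None else substitute_params(body, args)
        text = text[:match.start()] + replacement + text[match.end():]
    # Guard against templates that call themselves forever.
    raise RecursionError("template expansion did not terminate")


print(expand("{{shout|hi}}", TEMPLATES))
```

The step limit is a crude stand-in for proper loop detection; a template that includes itself would otherwise expand forever.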

test input
/* basic */ aa cc

/* parametrised */

/* nested */

/* core parser functions */ HEAVENS TO BETSY! heavens to betsy!

/* ext parser functions */

__NEWSECTIONLINK__ __NONEWSECTIONLINK__ __NOGALLERY__ __HIDDENCAT__ __INDEX__ __NOINDEX__

untested input
 //////////////////////////////////////////////////untested ///////////////////////////////////////////////////////////////

/* magic words - behavior switches*/

__NEWSECTIONLINK__ __NONEWSECTIONLINK__ __NOGALLERY__ __HIDDENCAT__ __INDEX__ __NOINDEX__

/* magic words - page variables */

/*Variables: Date and time*/ 2024 August August August 27 Tuesday     {LOCALYEAR}} /*Variables: Stats */ /*Variables: Stats no commas */

/* Parser Functions: Metadata */

/* Parser Functions: Formatting*/ string string STRING String 3,333 00xyz xyz00 1 is 2 iss

Untested on

Behavior switches
done now
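
Because behavior switches are plain double-underscore tokens, a search-oriented preprocessor can remove them from the text and keep them as page flags. The following Python sketch shows one way this might look; it is an illustration only, not the ANTLR rule used here, and the function name and the restriction to the six switches from the test input are assumptions.

```python
import re

# Only the switches that appear in the test input on this page.
KNOWN_SWITCHES = {
    "NEWSECTIONLINK", "NONEWSECTIONLINK", "NOGALLERY",
    "HIDDENCAT", "INDEX", "NOINDEX",
}

SWITCH = re.compile(r"__([A-Z]+)__")


def extract_switches(text):
    """Return (text without known switches, list of switches found)."""
    found = []

    def repl(match):
        name = match.group(1)
        if name in KNOWN_SWITCHES:
            found.append(name)
            return ""
        # Unknown double-underscore words are left in the text untouched.
        return match.group(0)

    cleaned = SWITCH.sub(repl, text)
    # Collapse the whitespace left behind by removed switches.
    return " ".join(cleaned.split()), found


print(extract_switches("intro __NOINDEX__ body __HIDDENCAT__"))
```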

Variables
For documentation, refer to the Variables section of the MediaWiki page.
 *   (page title including namespace)
 *   (page title excluding namespace)
 *   (page title excluding current subpage and namespace - effectively the parent page without the namespace.)
 *   (subpage part of title)
 *   (associated non-talk page)
 *   (associated talk page)
 *   (namespace of current page)
 * <tt> , </tt> (associated non-talk namespace)
 * <tt> </tt> (associated talk namespace)
 * <tt> , </tt> etc. (equivalents encoded for use in MediaWiki URLs)

The above can all take a parameter, to operate on a page other than the current page.
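
As a rough illustration of the title-based variables listed above, the Python sketch below splits a title into the parts those descriptions name. It assumes that any text before the first colon is a namespace and that subpages are separated by slashes; a real implementation would check the prefix against the wiki's configured namespaces. The function name and dictionary keys are hypothetical.

```python
def page_name_parts(title):
    """Split a page title into namespace, page, base page and subpage."""
    # Assumption: whatever precedes the first colon is a namespace.
    namespace, sep, rest = title.partition(":")
    if not sep:
        namespace, rest = "", title

    # The base page drops the last subpage component.
    base, slash, subpage = rest.rpartition("/")
    if not slash:
        # No subpage: both values fall back to the page name itself.
        base, subpage = rest, rest

    return {
        "full": title,           # page title including namespace
        "page": rest,            # page title excluding namespace
        "base": base,            # parent page without the namespace
        "subpage": subpage,      # subpage part of title
        "namespace": namespace,  # namespace of the page
    }


print(page_name_parts("User:OrenBochman/ParserNG/Preprocessor"))
```

Passing the title as an argument mirrors the note that each variable can operate on a page other than the current one.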


 * <tt> </tt>
 * <tt> </tt>
 * <tt> </tt>
 * <tt> </tt>
 * <tt> </tt> (current MediaWiki version)
 * <tt> </tt> (latest revision to current page)
 * <tt> , , , , , </tt> (date, time, editor at last edit)


 * <tt> 2024, August, August, August, 27,, , Tuesday, , , , </tt> (current date/time variables)
 * <tt> 2024 </tt> etc. (as above, based on site's local time)


 * <tt> , , , , , , , </tt> (statistics on English Wikipedia; add <tt>:R</tt> to return numbers without commas)

Parser functions
These are documented at the main documentation page unless otherwise stated.

Metadata

 * <tt> </tt> (size of page in bytes)
 * <tt> </tt> (protection level for given action on the current page)
 * <tt> </tt> (number of pages in the given category)
 * <tt> </tt> (number of users in a specific group)

Add <tt>|R</tt> to return numbers without commas.

Formatting

 * <tt> string </tt> (convert to lower case)
 * <tt> string </tt> (convert first character to lower case)
 * <tt> STRING </tt> (convert to upper case)
 * <tt> String </tt> (convert first character to upper case)
 * <tt> NaN </tt> (format a number with comma separators; add <tt> | R</tt> to unformat a number)
 * <tt> </tt> (formats a date according to user preferences; a default can be given as an optional case-sensitive second parameter for users without date preference; can convert a date from an existing format to any of <tt>dmy</tt>, <tt>mdy</tt>, <tt>ymd</tt> or <tt>ISO 8601</tt> formats, with the user's preference overriding the specified format)
 * <tt> xyz </tt>, <tt> xyz </tt> (pad with zeros to the right or left; an alternative padding string can be given as a third parameter; the alternative padding string may be truncated if its length does not evenly divide the required number of characters)
 * <tt> NaN iss </tt> (produces alternative text according to whether n is greater than 1)
 * <tt> </tt> (for date/time formatting; also <tt>#timel</tt> for local time. Covered at the extension documentation page.)
 * <tt> </tt> (produces alternative text according to the gender specified by the given user in his/her preferences)
 * <tt> </tt> (equivalent to an HTML tag or pair of tags; can be used for nesting references)
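
Several of the formatting functions above are simple string operations, and a Python sketch makes their expected behavior easier to check against the test input. The function names below are descriptive placeholders rather than the parser function names, and the behavior follows only the descriptions in the list, including the truncation of an alternative padding string.

```python
def to_lower(text):
    return text.lower()


def lower_first(text):
    return text[:1].lower() + text[1:]


def to_upper(text):
    return text.upper()


def upper_first(text):
    return text[:1].upper() + text[1:]


def format_number(number):
    """Format an integer with comma separators."""
    return f"{number:,}"


def _padding(text, width, fill):
    """Build the fill needed to reach width, truncating the last repeat."""
    need = width - len(text)
    if need <= 0 or not fill:
        return ""
    return (fill * (need // len(fill) + 1))[:need]


def pad_left(text, width, fill="0"):
    return _padding(text, width, fill) + text


def pad_right(text, width, fill="0"):
    return text + _padding(text, width, fill)


def plural(n, singular, plural_form):
    """Pick the plural form only when n is greater than 1."""
    return plural_form if n > 1 else singular


print(pad_left("xyz", 5), pad_right("xyz", 5), format_number(3333))
```

With a two-character fill such as `"ab"` and three characters still needed, the padding comes out as `"aba"`, which is the truncation case described for the padding functions.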

Paths

 * <tt> </tt>, <tt>  </tt> (relative path to the title)
 * <tt> </tt>, <tt>  </tt> (absolute path to the title, without a protocol prefix)
 * <tt> </tt>, <tt>  </tt> (absolute path to the title, with a protocol prefix)
 * <tt> </tt> (absolute URL to a media file)
 * <tt> </tt> (input encoded for use in URLs)
 * <tt> </tt> (input encoded for use in URL section anchors)
 * <tt>     </tt> (name for the namespace with index n; use <tt>  </tt> for the equivalent encoded for MediaWiki URLs)
 * <tt> </tt> (converts a relative file path to absolute; see the extension documentation)
 * <tt> </tt> (splits title into parts; see the extension documentation)

Conditional expressions

 * These are covered at the extension documentation page. Some parameters are optional.

Caveats:

 * 1) It is based on identifiers rather than general strings.
 * 2) Tested input is limited.
 * 3) It should support type tags.
 * 4) It should support comments and <nowiki ></nowiki>.
 * 5) Math support can be improved to take operator precedence into account.
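
For the last caveat, one common way to handle operator precedence is precedence climbing. The Python sketch below illustrates the technique for the four basic operators, parentheses and unary minus; it is not the current ANTLR grammar, and the function names and the choice of operators are assumptions.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")

# Binary operators with (precedence, function); higher binds tighter.
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr):
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        token = advance()
        if token == "(":
            value = expression(1)
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        if token == "-":  # unary minus
            return -atom()
        return float(token)

    def expression(min_prec):
        left = atom()
        # Consume operators that bind at least as tightly as min_prec.
        while peek() in OPS and OPS[peek()][0] >= min_prec:
            prec, fn = OPS[advance()]
            # prec + 1 makes the operators left-associative.
            right = expression(prec + 1)
            left = fn(left, right)
        return left

    result = expression(1)
    if peek() is not None:
        raise ValueError("unexpected trailing input")
    return result


print(evaluate("1 + 2 * 3"))
```

In an ANTLR grammar the same effect is usually achieved by layering rules, one per precedence level, rather than by an explicit loop.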