Extension:StackFunctions

What the StackFunctions Extension Is
The StackFunctions extension implements a programming language which is basically PostScript without graphics.

What It Is Useful For
This extension can be considered as an alternative to the combination of ParserFunctions with StringFunctions (and maybe other extensions) or to Winter. Advantages are:
 * When using conditional expressions with ParserFunctions or Winter, wikitext in the false-branch is parsed before the condition is evaluated. If that text contains complex templates, such superfluous parsing can take a long time. With StackFunctions, wikitext is parsed only if it is needed for the output.
 * To execute loops, you would need either inefficient auxiliary templates with a limited number of runs or LoopFunctions which also have limits. StackFunctions is able to execute loops with any number of iterations and any depth of nesting in a relatively efficient way.
 * If you have text which on the one hand you'd like to display between ..  and on the other hand to use as an argument to ParserFunctions, you need to build more or less complex structures using ExpandAfter. StackFunctions can handle this in a much easier way.
 * StackFunctions offer a Turing-complete programming language which can be used to implement any algorithm (with more or less effort in a more or less efficient fashion) without need for additional extensions.
 * StackFunctions easily handle complex data structures like arrays and dictionaries, which may also be nested to any level.
 * My personal experience is that StackFunctions generally tend to execute faster than a combination of other existing extensions, especially when the latter solution would need a large number of auxiliary templates which are evaluated many times.
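
The first advantage (only the selected branch is ever parsed) can be illustrated with a toy model. The following Python sketch is not the extension's code; <tt>expensive_parse</tt> is a hypothetical stand-in for a costly wikitext parse, and the <tt>ifelse</tt> function only mimics the operand order of the PostScript operator of the same name.

```python
# Toy model (not the extension's code): both branches are stored as
# unevaluated procedures, so only the selected one is ever run.
parse_calls = []  # records every simulated parse, for illustration


def expensive_parse(wikitext):
    """Hypothetical stand-in for a costly wikitext parse."""
    parse_calls.append(wikitext)
    return "<parsed:%s>" % wikitext


def ifelse(stack):
    """Mimic PostScript's operand order: bool proc_true proc_false ifelse."""
    proc_false = stack.pop()
    proc_true = stack.pop()
    condition = stack.pop()
    chosen = proc_true if condition else proc_false
    chosen(stack)  # the other procedure is never executed


stack = [
    True,
    lambda s: s.append(expensive_parse("{{cheap}}")),
    lambda s: s.append(expensive_parse("{{very complex template}}")),
]
ifelse(stack)
```

After the call, <tt>parse_calls</tt> contains only the text of the branch that was taken; the complex template in the other branch was never touched.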

How To Install It

 * Ensure your PHP has multibyte functions enabled. If you cannot do that, you may decide to do without multibyte support. In that case, just delete all occurrences of "mb_" from the extension source code.


 * Save the extension source code as a file extensions/StackFunctions/StackFunctions.php.


 * Apply the patch to Parser.php.

 * Add the following line to your LocalSettings.php:

require_once( "extensions/StackFunctions/StackFunctions.php" );

 * If you want to enable database querying, also add the following line to your LocalSettings.php:

$wfStackFunctionsEnableQuery = true;

How To Use It
Preliminary Remark: programming with a stack processor like this is a matter of taste. You might alternatively find it extremely cool or totally unusable. You have been warned.

Syntax
You can use this extension either as a parser extension or as a parser function extension.

Parser Extension
As a parser extension, the syntax is:

 <sf> your_stackfunctions_code </sf>

This syntax is the preferable one because:


 * You don't have to bother with the problems listed below for the parser function syntax.
 * In theory, this should execute faster than the parser function syntax because the code is not parsed before being passed to StackFunctions. (However, I have not yet written enough code to confirm this from observation.)
 * This syntax may contain precompiled code instead of source code.

If you want to evaluate magic words, templates or parser functions, you can do this within the code via the members of statusdict and the parsetemplate/showtemplate operators.

The only things that cannot be done with this syntax are:
 * Using template parameters {{{1}}}, {{{2}}}, ... in StackFunctions code in a template. This might be made available in the future.
 * Using the output of StackFunctions code as a template parameter or parser function parameter.
 * Printing section headings. They will be printed, but won't be counted correctly for the TOC.

Parser Function Extension
As a parser function extension, the syntax is:

 {{#sf: your_stackfunctions_code }}

When using this syntax, you must pay attention to some issues. The basic rule is that any {{ .. }} structures are parsed by the MediaWiki parser before any StackFunctions code is executed. This has a number of consequences:


 * Take care not to have any {{ or }} in your code. If you have consecutive braces, put a space in between.


 * You cannot use a literal | character inside <tt>{{#sf: }}</tt> because the parser would interpret it as a separator for an additional parameter passed to <tt>#sf:</tt>. Within a string constant, you can write its octal representation <tt>\174</tt> instead.


 * If you put a template, magic constant or parser function in a string literal, it will first be evaluated and then the result passed to the StackFunctions code. Use parsetemplate/showtemplate if you want to evaluate it only while executing the StackFunctions code. In particular, you should do so within conditional branches so that the parser spends time only on those templates which are actually needed.


 * When using HTML tags within string literals, take into account that the MediaWiki parser will interpret them before invoking StackFunctions, which is probably not what you want. Use <tt>\074, \076</tt> instead of <tt>&lt;, &gt;</tt> to avoid this.
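
The octal escapes mentioned above are simply a backslash followed by the three-digit octal character code. As a rough aid, the following Python sketch rewrites the problematic characters of a string literal before you paste it into the parser function syntax. The helper name <tt>octal_escape</tt> is hypothetical and not part of the extension, and the sketch only makes sense for single-byte (ASCII) characters.

```python
def octal_escape(text, chars="|<>"):
    """Replace each character listed in `chars` by a backslash
    followed by its three-digit octal code; leave the rest alone."""
    return "".join("\\%03o" % ord(c) if c in chars else c for c in text)


escaped = octal_escape("<b>a|b</b>")
# escaped is now: \074b\076a\174b\074/b\076
```

Passing <tt>chars="()"</tt> yields <tt>\050</tt> and <tt>\051</tt> in the same way.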

Important Note
Code in the two syntax forms is executed at different times. In the MediaWiki 1.9.2 parser, all code in the XML-like syntax is executed first, then all code in the function syntax. This implies that, for instance, an object put on the stack by code in function syntax will never be seen by code in XML-like syntax. As this behaviour depends on MediaWiki internals, you should not rely on it remaining the same in the future. Therefore I strongly discourage mixing the two syntaxes on the same page (including all templates used on it) if the pieces of code need to see each other's activity (leaving objects on the stack, opening dictionaries etc.).

About PostScript
StackFunctions are basically an implementation of PostScript without graphical operators, with a few modifications and with some MediaWiki-specific extensions as explained in detail below. I'm not going to explain PostScript here; you might refer to the following:
 * The PostScript Wikipedia article for a very basic introduction.
 * The PostScript Language Tutorial and Cookbook for a good tutorial.
 * The PostScript Language Reference, third edition for a complete reference.

Implemented PostScript Operators
Chapter 8.1 of the PostScript Language Reference, third edition gives a summary of PostScript operators by category. The following are implemented in StackFunctions:


 * Operand Stack Manipulation Operators: all.
 * Arithmetic and Math Operators: all except <tt>rrand</tt>.
 * Array Operators: all.
 * Dictionary Operators: all except <tt>maxlength, errordict, $error, globaldict</tt>.
 * String Operators: all except <tt>token</tt>.
 * Relational, Boolean, and Bitwise Operators: all.
 * Control Operators: all except <tt>stop, stopped, countexecstack, execstack, quit, start</tt>.
 * Type, Attribute, and Conversion Operators: all except <tt>executeonly, noaccess, readonly, rcheck, wcheck, cvrs</tt>.
 * Miscellaneous Operators: all except <tt>executive, echo, prompt</tt>.

In addition, the <tt>show</tt> operator has been implemented: it simply outputs its argument to the MediaWiki parser. In other words, the argument of show must be wikitext (not HTML), which may contain any kind of wiki features, including templates etc.

Differences to PostScript
There are a few things I implemented differently from PostScript because I believe that this way they better fit the needs of the MediaWiki developer:


 * The whole implementation supports multibyte character sets.
 * The show operator accepts any kind of argument (even though only strings and numbers provide useful results).
 * The string versions of get and put accept one-character strings instead of ASCII numbers. Otherwise it would be difficult to cope with multibyte characters.

Other things are not (yet) implemented because it would require some effort to implement them while I consider them less useful for the MediaWiki developer:


 * Radix numbers, such as 8#1777 16#FFFE 2#1000, are currently not supported.
 * Literal string objects can be specified as (..) only. Hexadecimal data, enclosed in < and >, and ASCII base-85 data, enclosed in <~ and ~>, are currently not supported.
 * Single escaped parentheses within strings are currently not supported (because that would make the parser more complex and probably slower). Use <tt>\050, \051</tt> if you need unbalanced parentheses within strings.
 * The executable attribute has been implemented for arrays only, and the readable/writable attributes have not been implemented at all. I think there is little point in such attributes for MediaWiki programming, and the only way to implement them would have been to represent any kind of data (including numbers and strings) as arrays in PHP. This would have made the StackFunctions code larger and slower.

Finally, some differences are due to the nature of PHP which is different from what a PostScript engine needs:
 * The test for equality of composite objects checks whether the elements contain the same values, not whether they refer to the same object. I wouldn't know how to implement the latter in PHP.
 * Substrings are not part of string objects, but independent objects. This implies, for instance, that the copy operator for strings leaves on the stack a new string rather than a substring of the original string.
 * The operator <tt>serialnumber</tt> returns the IP address of the webserver as a string. If you can think of a more useful purpose for this operator, please let me know.
 * The loop operators <tt>for, forall, loop, repeat</tt> perform a <tt>bind</tt> on their procedure argument. If other copies of the procedure exist on the stack or in some dictionary, they will reflect this. I cannot imagine a reasonable application where this behaviour would cause a problem.

For all these differences, comments and suggestions are welcome.

Almost all typechecks have been implemented as in PostScript. Furthermore, the concept that composite objects on the stack are references has been implemented just as in PostScript. For instance, operators like dup create a new reference to the same composite object rather than copying a value.
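
The combination of reference semantics for dup and value-based equality for composite objects can be pictured with Python lists, which behave similarly. This is only an analogy under that assumption, not the PHP implementation; the <tt>dup</tt> function below is a hypothetical model of the operator.

```python
stack = [[1, 2, 3]]  # one array object on the operand stack


def dup(stack):
    """Push a second reference to the top object; no copy is made."""
    stack.append(stack[-1])


dup(stack)
stack[-1][0] = 99                 # modify the array through the top reference
shared = stack[0] is stack[1]     # both stack entries are the same object

# Equality as described for StackFunctions: contents are compared,
# so two distinct arrays holding the same values count as equal.
a, b = [1, 2], [1, 2]
equal_by_value = (a == b)
same_object = (a is b)
```

Here the change made through the duplicate is visible through the original entry as well, while <tt>a</tt> and <tt>b</tt> are equal by value although they are different objects. In standard PostScript, eq applied to two distinct arrays would yield false.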

Strings
String handling in PostScript is cumbersome, so some additional operators are needed. However, there is a danger of ending up with a large collection of partially redundant operators, which would not make programming easier either. Therefore, my current strategy is to add an operator only when I'm very sure I really need it.


 * concat : string string concat string
 * Concatenate two strings.


 * explode : separator string explode array
 * Wrapper for PHP's explode function.


 * id2namespace : int id2namespace string
 * Convert namespace id to canonical name.


 * implode : separator array implode string
 * Wrapper for PHP's implode function.


 * namespace2id : string namespace2id int
 * Convert canonical name to namespace id. This uses <tt>Namespace::getCanonicalIndex</tt>, hence the input string must be lowercase.


 * tolower : string tolower string
 * Convert string to lowercase.


 * toupper : string toupper string
 * Convert string to uppercase.

Serialization

 * serialize : any serialize string
 * Provide a string representation of the argument by applying the PHP functions <tt>serialize, gzdeflate, base64_encode</tt>. The result is hence a compressed image of the argument consisting entirely of printable characters. This is useful mainly to generate precompiled code.


 * unserialize : string unserialize any
 * Convert a result of the serialize operator back to its original object.
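
To make the three-step pipeline more tangible, here is a minimal Python sketch of the same idea: encode, compress with raw DEFLATE (which is what PHP's gzdeflate produces), then Base64-encode. JSON is swapped in for PHP's serialize format, so the output is not compatible with the extension; the function names <tt>sketch_serialize</tt> and <tt>sketch_unserialize</tt> are hypothetical.

```python
import base64
import json
import zlib


def sketch_serialize(obj):
    """JSON-encode, raw-DEFLATE, then Base64-encode `obj`."""
    raw = json.dumps(obj).encode("utf-8")
    # wbits=-15 selects a raw DEFLATE stream without zlib header or checksum
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    deflated = compressor.compress(raw) + compressor.flush()
    return base64.b64encode(deflated).decode("ascii")


def sketch_unserialize(text):
    """Reverse the three steps of sketch_serialize."""
    deflated = base64.b64decode(text)
    raw = zlib.decompress(deflated, -15)
    return json.loads(raw.decode("utf-8"))


original = {"t": "array", "v": [1, "two", 3.0]}
encoded = sketch_serialize(original)
restored = sketch_unserialize(encoded)
```

The encoded form contains only letters, digits and the characters +, / and =, so it can be pasted into a wiki page without being damaged.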

Prologs

 * prolog : string prolog –
 * Execute StackFunctions code stored in the page indicated in string. The page should contain exactly one <tt>&lt;sf&gt; .. &lt;/sf&gt;</tt> pair containing the StackFunctions code; anything outside these tags is ignored. If prolog is executed several times with the same argument, only the first one is evaluated. This saves parsing time when a template containing StackFunctions code is used many times on the same page: then you can store definitions (for instance, macros and dictionaries) on a separate page which is parsed only once. Note that for reasons of performance, "same" argument means literal identity; two different arguments which refer to the same page are recognized as different.


 * You can set the parameter <tt>$wfStackFunctionsPrologNamespace</tt> to define a default namespace in which prolog pages are searched (if no explicit namespace is specified). It defaults to the project namespace. I recommend creating a custom namespace for prologs.


 * Prolog pages are searched in the project namespace by default if no explicit namespace is specified.


 * Prolog pages may contain precompiled code instead of source code.
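
The "evaluate only once per literal argument" rule can be modelled as a set of strings already seen. The following Python sketch is an assumption about the behaviour described above, not the extension's code; the page names and the <tt>executed</tt> list are made up for illustration.

```python
loaded_prologs = set()  # literal arguments that have already been evaluated
executed = []           # records actual executions, for illustration only


def prolog(name):
    """Run the prolog page `name` only the first time this exact string is seen."""
    if name in loaded_prologs:
        return False      # skipped: this literal was already evaluated
    loaded_prologs.add(name)
    executed.append(name)  # stand-in for fetching and executing the page
    return True


prolog("Project:My macros")
prolog("Project:My macros")  # skipped: same literal string
prolog("Project:My_macros")  # runs again: same page, but a different literal
```

The third call shows the caveat: MediaWiki treats spaces and underscores in titles as equivalent, but a purely literal comparison does not, so it pays to spell the prolog name identically everywhere.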

Template Evaluation

 * parsetemplate : simple string parsetemplate string; array string parsetemplate string; dict string parsetemplate string
 * In the first form, simple denotes an argument of any type which is neither an array nor a dictionary. The string <tt>{{string|simple}}</tt> is passed to the parser for template substitution and the result pushed on the stack.
 * In the second form, the same is done with <tt>{{string|any0|..|anyn}}</tt> where any0 .. anyn are the elements of the array.
 * In the third form, the same is done with <tt>{{string|key0=val0|..|keyn=valn}}</tt> where key0 .. keyn are the keys and val0 .. valn the corresponding values in the dictionary.
 * In all three forms, the result is evaluated by the parser before execution of StackFunctions code continues. This means that you can examine the result to see what was substituted. Note that parsetemplate also works with parser functions.


 * showtemplate : simple string showtemplate –; array string showtemplate –; dict string showtemplate –
 * This works the same way as parsetemplate; the difference is that the result, instead of being pushed onto the stack, is written to the output.

Database Querying
Database querying allows you to directly access the database MediaWiki runs on. This might not be particularly useful on a basic MediaWiki installation. It becomes interesting when you want to display other data stored in the same database and accessible to the MediaWiki database user, or in connection with MediaWiki extensions which store data in additional tables (like DataTable).

Note that this might be a security issue. Most data in the database are accessible anyway, but for instance, the email addresses of registered users might not be meant to be accessible to anybody.


 * query : dict query array true, or dict query false
 * Query the MediaWiki Database with a SQL statement. The dictionary may contain the following keys:


 * Hence, the only required key is /from. The result is returned as an array of rows, where each row is either an array, a dictionary (where keys are column names) or a simple type, depending on the value of /return. Note that in the latter case only the first column is considered, and it is converted to the requested type.

The System Dictionary
As in PostScript, the system dictionary contains the definitions of all built-in operators.

The Status Dictionary

 * For each magic word in MediaWiki, there is a key in the status dictionary containing the current value of this magic word. For instance,

statusdict /pagename get


 * supplies the current page name.


 * There are additional entries pageid and namespaceid containing the numeric IDs of the current page and its namespace. This is useful when querying the database for something related to the current page. However, note that the database query might supply IDs as integers or strings; as StackFunctions are strictly typed, a string will not be recognized as equal to the statusdict items pageid or namespaceid which are integers. The safest solution is to always convert IDs queried from the database to integers before doing any comparison.
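
The typing pitfall can be shown with a few lines of Python, whose equality test likewise never treats the string "42" as equal to the integer 42. The values and the <tt>rows</tt> list are made up; they stand in for a statusdict entry and a hypothetical query result that delivers IDs as strings.

```python
page_id = 42          # models the statusdict entry pageid, an integer
rows = ["42", "7"]    # hypothetical query result with IDs delivered as strings

# Comparing a string with an integer never matches ...
naive = [row for row in rows if row == page_id]

# ... so convert to integer first, as recommended above.
safe = [row for row in rows if int(row) == page_id]
```

Only the second comparison finds the row belonging to the current page.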

Precompiled Code
You can use the serialize operator to convert a procedure to a string representation. Converting the string back to the object is much faster than creating the object by parsing the source code. Therefore, within <tt>&lt;sf&gt;..&lt;/sf&gt;</tt> and within prologs, the following syntax is accepted as an alternative to source code:

<tt>
 * %Z
 * output of the serialize operator

</tt>

For instance, you can store the actual source code for a prolog on a talk page for the actual prolog page, transform the code into a procedure by putting braces around it, apply the serialize operator and display the result using the pstack operator. Then you copy the displayed result into the actual prolog page.

While the serialize operator works with any kind of argument, this syntax works only when the argument is a procedure.

Note that the result depends on the internal representation of data within StackFunctions. You might need to recreate the serialized code when updating to future versions of StackFunctions.

Design Considerations

 * Remain as close as possible to the PostScript programming language. : As that language has been developed and used for many years, it has achieved a high degree of conceptual soundness and completeness. My personal experience until now confirms that anything you need can be expressed with the given operators and underlying concepts.


 * A drawback is that some parts, first of all string handling, are rather cumbersome, while it would be easy to create something much simpler with PHP. But it is less easy to do this in a sound and complete way that is easy to understand, to document and to use. As a compromise, future versions of StackFunctions are likely to contain additional string functions for purposes like concatenation, replacement and such.


 * Create an extension that executes fast. : As StackFunctions is an interpreter that itself runs in an interpreted language, it is slow a priori. As my personal experience is that the usefulness of a MediaWiki installation heavily depends on the time it takes to display pages, I tried to write StackFunctions so that it executes as fast as possible; suggestions for further enhancement are particularly welcome. This implies relatively poor debugging support (for instance, when an error occurs, the stack is shown after the error has already occurred, so the arguments which caused the error are no longer shown).

Representation of Data
Booleans, numbers, strings and null are implemented with the corresponding PHP types. Arrays, dictionaries, marks, names and built-in operators are represented by arrays where the component "t" shows the type, the component "v" the value (an array or string), and the component "a" in some cases additional arguments.
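
As a rough picture of this layout, the following Python sketch uses dictionaries in place of PHP arrays. Only the component names "t", "v" and "a" are taken from the description above; the concrete type tags ("array", "name") and the content of "a" are assumptions made for illustration.

```python
def make_array(items, executable=False):
    """Tagged record for an array; the tag and the 'a' content are assumed."""
    return {"t": "array", "v": list(items), "a": {"executable": executable}}


def make_name(text):
    """Tagged record for a name object; the tag value is assumed."""
    return {"t": "name", "v": text}


def type_of(obj):
    """Return a type label for a native value or a tagged record."""
    if isinstance(obj, dict):
        return obj["t"]
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # test bool before int: bool is an int subclass
        return "boolean"
    if isinstance(obj, (int, float)):
        return "number"
    return "string"


procedure = make_array([1, 2, make_name("add")], executable=True)
labels = [type_of(x) for x in (True, 3, "text", None, procedure)]
```

Simple values stay native and therefore cheap to handle, while composite and special objects carry an explicit type tag.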