Extension:StackFunctions/Reference

Implemented PostScript operators

Chapter 8.1 of the PostScript Language Reference, third edition gives a summary of PostScript operators by category. The following are implemented in StackFunctions:

  • Operand Stack Manipulation Operators: all.
  • Arithmetic and Math Operators: all except rrand.
  • Array Operators: all.
  • Dictionary Operators: all except maxlength, errordict, $error, globaldict.
  • String Operators: all.
  • Relational, Boolean, and Bitwise Operators: all.
  • Control Operators: all except stop, stopped, countexecstack, execstack, quit, start.
  • Type, Attribute, and Conversion Operators: all except executeonly, noaccess, readonly, rcheck, wcheck, cvrs.
  • Miscellaneous Operators: all except executive, echo, prompt.

In addition, the following operators are implemented:

  • show simply passes its argument to the MediaWiki parser. The argument of show must therefore be wikitext (not HTML); it may use any wiki feature, including templates.

Differences to PostScript

The implementation basically follows PostScript concepts. In particular, almost all typechecks have been implemented as in PostScript. Furthermore, the concept that composite objects on the stack are references has been implemented just as in PostScript. For instance, operators like dup create a new reference to the same composite object rather than copying a value.
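
The reference behaviour of dup can be pictured with a toy operand stack. This is only a sketch in Python, not the extension's PHP code; the names stack, push and dup are made up for the illustration, and a Python list stands in for a composite object such as an array.

```python
# Toy model of an operand stack (hypothetical, for illustration only).
stack = []

def push(obj):
    stack.append(obj)

def dup():
    # Push a second reference to the top object; nothing is copied.
    stack.append(stack[-1])

push([1, 2, 3])       # a composite object (array)
dup()
top = stack.pop()
top[0] = 99           # modify the array through one reference
below = stack.pop()   # the other reference sees the change
shared = top is below
```

After running this, below holds [99, 2, 3] and shared is True, because both stack entries referred to the same array.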

The following is a complete list of differences I'm currently aware of.

Unsupported features

Some things are not (yet) supported because implementing them would require some effort, and I consider them less useful for the MediaWiki developer:

  • Literal string objects can be specified as (..) or <..> only. ASCII base-85 data, enclosed in <~ and ~> is currently not supported.
  • Radix numbers, such as 8#1777 16#FFFE 2#1000, are not supported.
  • The executable attribute has been implemented for arrays, names and strings only, and the readable/writable attributes have not been implemented at all. I think there is little point in such attributes for MediaWiki programming, and the only way to implement them would have been to represent any kind of data (including numbers) as arrays in PHP. This would have made the StackFunctions code significantly larger and slower.

Differently implemented features

There are a few things I implemented differently from PostScript because I believe they fit the needs of the MediaWiki developer much better this way:

  • The whole implementation supports multibyte character sets.
  • The show operator accepts any kind of argument (even though only strings and numbers provide useful results).
  • The string versions of get and put accept one-character strings instead of ASCII numbers. Otherwise it would be difficult to cope with multibyte characters.
  • The realtime and usertime operators return floats instead of integers, thus providing higher precision. As most operators don't distinguish between integers and floats, the deviation from the PostScript standard is minimal.

Some differences are due to the nature of PHP which is different from what a PostScript engine needs:

  • The executable attribute is part of the object itself, except for strings. For instance, if you make an array executable using the cvx operator, every other reference to the array becomes executable as well. Implementing this differently would require all PHP code that copies objects to distinguish between types, making the whole code much larger and slower. This does not apply to strings because strings and executable strings are stored differently (see Internals).
  • Test for equality of composite objects checks whether the elements contain the same values, not whether they refer to the same object. I wouldn't know how to implement the latter in PHP.
  • Substrings are not part of string objects, but independent objects. This implies, for instance, that the copy operator for strings leaves on the stack a new string rather than a substring of the original string. I don't know any way to efficiently implement the PostScript behaviour.
  • The operator serialnumber returns the static member ExtStackFunctions::$mSerialNumber which defaults to an empty string and can be set to anything useful in LocalSettings.php.
  • For efficiency, the loop operators for, forall, loop, repeat perform a bind on their procedure argument. If other copies of the procedure exist on the stack or in some dictionary, they will reflect this. I cannot imagine a reasonable application where this behaviour would cause a problem.

Additional exceptions

recursionoverflow 
Thrown in case of infinite recursions like
/x { x } def x

Additional operators

Strings

String handling in PostScript is cumbersome, so there is a need to add some operators. However, there is a danger of adding a large collection of partially redundant operators which wouldn't ease programming, either. Therefore, my current strategy is to add an operator only when I'm very sure I really need it.

concat 
string string concat string
Concatenate two strings.
dbkey2text 
string dbkey2text string
Convert the DB key form of a title (with underscores) to its text representation (with spaces).
explode 
separator string explode array
Wrapper for PHP's explode function.
getpagecontent 
string getpagecontent string
Get the raw content of the page indicated in string. Throw an exception if the page does not exist.
id2namespace 
int id2namespace string
Convert namespace id to canonical name.
implode 
separator array implode string
Wrapper for PHP's implode function.
namespace2id 
string namespace2id int
Convert canonical name to namespace id. This uses Namespace::getCanonicalIndex, hence the input string must be lowercase.
pcrematch 
subject_string pattern_string pcrematch post match pre true (if found)
subject_string pattern_string pcrematch string false (if not found)
Search pattern_string in subject_string, interpreting pattern_string as a Perl Compatible Regular Expression. The return values are the same as for the PostScript operator search.
pcrereplace 
subject_string pattern_string to_string pcrereplace
Replace all occurrences of pattern_string with to_string in subject_string, interpreting pattern_string as a Perl Compatible Regular Expression.
replace 
subject_string from_string to_string replace
Replace all occurrences of from_string with to_string in subject_string.
text2dbkey 
string text2dbkey string
Convert the text form of a title (with spaces) to its DB key representation (with underscores).
tolower 
string tolower string
Convert string to lowercase.
toupper 
string toupper string
Convert string to uppercase.
vprintf 
string array vprintf
Format the data in array according to the format string and show the result.
vsprintf 
string format_string array vsprintf string
Format the data in array according to format_string and store the result in string.
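
To make the return values of pcrematch concrete, here is a minimal sketch in Python. It assumes that Python's re module is close enough to PCRE for the example pattern and that the pattern is given without delimiters; whether StackFunctions expects PHP-style delimiters is not stated above. The function name is reused only for readability, and the returned list represents the operand stack with the bottom element first.

```python
import re

def pcrematch(subject, pattern):
    """Return what pcrematch would leave on the stack, bottom first."""
    m = re.search(pattern, subject)
    if m is None:
        # Not found: the original string and false.
        return [subject, False]
    pre = subject[:m.start()]    # text before the match
    post = subject[m.end():]     # text after the match
    # Found: post, match, pre and true, as for the PostScript search operator.
    return [post, m.group(0), pre, True]
```

For example, pcrematch("foo123bar", r"\d+") gives ["bar", "123", "foo", True], so the boolean is on top of the stack and can be tested directly with if or ifelse.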

Serialization

serialize 
any serialize string
Provide a string representation of the argument. The argument is first serialized with the PHP function serialize and compressed with a configurable function, then an HMAC is prepended, and the result is converted to a printable representation using base64_encode. The result is hence an authenticated, compressed image of the argument consisting entirely of printable characters. This is useful mainly to generate precompiled code.
unserialize 
string unserialize any
Convert a result of the serialize operator back to its original object. If the HMAC is not valid, an invalidaccess exception occurs. This ensures that no PHP code from extraneous sources can be executed in your MediaWiki instance.

The following parameters for serialization can be customized in LocalSettings.php:

Parameter                           Default        Purpose
ExtStackFunctions::$mAuthKey        $wgSecretKey   Key used for HMAC generation
ExtStackFunctions::$mCompress       gzdeflate      Compression function
ExtStackFunctions::$mCompressArg    9              Additional argument for compression
ExtStackFunctions::$mDecompress     gzinflate      Decompression function

gzinflate has been chosen as a default because it seems to be slightly faster than gzuncompress or bzdecompress, but this might be different in your case.
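
The serialize/unserialize pipeline can be sketched as follows. This is a Python analogue under several assumptions, not the extension's code: pickle stands in for PHP's serialize, zlib for gzdeflate/gzinflate, SHA-256 is an assumed hash (the hash function is not named above), the HMAC is assumed to cover the compressed data, and AUTH_KEY is a made-up placeholder for ExtStackFunctions::$mAuthKey.

```python
import base64
import hashlib
import hmac
import pickle
import zlib

AUTH_KEY = b"hypothetical-secret"   # placeholder for ExtStackFunctions::$mAuthKey
MAC = hashlib.sha256                # assumption: the real hash is not documented here
MAC_LEN = MAC().digest_size

def serialize(obj):
    # serialize -> compress -> prepend HMAC -> base64
    payload = zlib.compress(pickle.dumps(obj), 9)
    tag = hmac.new(AUTH_KEY, payload, MAC).digest()
    return base64.b64encode(tag + payload).decode("ascii")

def unserialize(text):
    raw = base64.b64decode(text)
    tag, payload = raw[:MAC_LEN], raw[MAC_LEN:]
    expected = hmac.new(AUTH_KEY, payload, MAC).digest()
    # Verify the HMAC before any data is unpacked.
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("invalidaccess")
    return pickle.loads(zlib.decompress(payload))
```

The important design point is the order in unserialize: the authentication check happens before decompression and unpacking, so data that was not produced with the same key is rejected without ever being interpreted.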

Prologs

prolog 
string prolog
Execute StackFunctions code stored in the page indicated in string. The page should contain exactly one <pre>..</pre> pair containing the StackFunctions code; anything outside these tags is ignored. If prolog is executed several times with the same argument, only the first one is evaluated. This saves parsing time when a template containing StackFunctions code is used many times on the same page: then you can store definitions (for instance, macros and dictionaries) on a separate page which is executed only once. Note that for reasons of performance, "same" argument means literal identity; two different arguments which refer to the same page are recognized as different. Due to its nature, a prolog should not create any output; therefore any output created by a prolog is silently discarded.
You can set the parameter ExtStackFunctions::$mPrologNamespace to define a default namespace in which prolog pages are searched (if no explicit namespace is specified). It defaults to the project namespace. I recommend creating a custom namespace for prologs; see Extension:StackFunctions/Install.
Prolog pages may contain precompiled code instead of source code.
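
The "evaluate only once" rule and the literal comparison of arguments can be illustrated with a small Python sketch. Everything in it is hypothetical: PAGES is a stand-in for the wiki, fetch imitates title normalization by treating underscores as spaces, and executed merely records the code that would be handed to the interpreter. Discarding of output is not modelled.

```python
import re

# Hypothetical page store standing in for the wiki.
PAGES = {"Prolog:My macros": "Notes.\n<pre>/twice { dup add } def</pre>\nMore notes."}
_seen = set()    # arguments already processed, compared literally
executed = []    # code that would be passed to the interpreter

def fetch(title):
    # The wiki resolves both spellings to the same page.
    return PAGES[title.replace("_", " ")]

def prolog(title):
    if title in _seen:          # same literal argument: skip
        return
    _seen.add(title)
    m = re.search(r"<pre>(.*?)</pre>", fetch(title), re.DOTALL)
    if m:                       # text outside <pre>..</pre> is ignored
        executed.append(m.group(1))

prolog("Prolog:My macros")
prolog("Prolog:My macros")      # skipped
prolog("Prolog:My_macros")      # same page, different literal: evaluated again
```

In this sketch executed ends up with two entries, which mirrors the note above: to benefit from the shortcut, always spell the prolog argument the same way.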

Template evaluation

parsetemplate 
simple string parsetemplate string
array string parsetemplate string
dict string parsetemplate string
In the first form, simple denotes an argument of any type which is neither an array nor a dictionary. As a result, the string {{string|simple}} is passed to the parser for template substitution and the result pushed on the stack.
In the second form, the same is done with {{string|any0|..|anyn}} where any0 .. anyn are the elements of the array.
In the third form, the same is done with {{string|key0=val0|..|keyn=valn}} where key0 .. keyn are the keys and val0 .. valn the corresponding values in the dictionary.
In all three forms, the result is evaluated by the parser before execution of StackFunctions code continues. This means that you can examine the result to see what is substituted. Note that parsetemplate also works with parser functions instead of templates.
showtemplate 
simple string showtemplate
array string showtemplate
dict string showtemplate
This works the same way as parsetemplate; the difference is that the result, instead of being pushed onto the stack, is written to the output.
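
The three argument forms of parsetemplate and showtemplate differ only in the wikitext that is handed to the parser. The following Python sketch builds that text; the function name template_call is made up, and the sketch does not call any parser or handle escaping of special characters such as | or =.

```python
def template_call(name, arg):
    """Build the wikitext for the simple, array and dict forms."""
    if isinstance(arg, dict):
        # Third form: named parameters key=value.
        parts = [f"{key}={value}" for key, value in arg.items()]
    elif isinstance(arg, (list, tuple)):
        # Second form: one positional parameter per array element.
        parts = [str(value) for value in arg]
    else:
        # First form: a single positional parameter.
        parts = [str(arg)]
    return "{{" + "|".join([name] + parts) + "}}"
```

For instance, template_call("T", [1, 2]) yields {{T|1|2}} and template_call("T", {"a": 1, "b": "z"}) yields {{T|a=1|b=z}}.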

Note that parsing of templates is a very complex process and therefore rather slow, especially when the templates contain other templates. One of the motivations for developing StackFunctions was to provide a faster alternative. Therefore, rather than evaluating a template, you might consider replacing it with StackFunctions code wherever feasible. For instance, magic words can be read directly from the status dictionary.

Cache

disablecache 
disablecache
Disables the cache for this page, which means that the page will be recalculated on every access. Useful for pages whose contents are meant to depend on rapidly changing data such as random numbers or the time of day.

Database querying

Database querying allows you to directly access the database MediaWiki runs on. This might not be particularly useful on a basic MediaWiki installation. It becomes interesting when you want to display other data stored in the same database and accessible to the MediaWiki database user, or in connection with MediaWiki extensions which store data in additional tables (like DataTable).

Note that this is likely to raise security concerns. Most data in the database are accessible anyway, but for instance, the email addresses of registered users might not be meant to be accessible to anybody. Therefore, this feature is disabled by default. To enable it, set ExtStackFunctions::$mEnableQuery = true; in your LocalSettings.php.

query 
dict query array true
dict query false
Query the MediaWiki Database with a SQL statement. The dictionary may contain the following keys:
/select columns to select, default *
/from tables to select from
/where where clause (optional)
/groupby group by clause (optional)
/having having clause (optional)
/orderby order by clause (optional)
/return return type (optional, default /dicttype)
Hence, the only required key is /from. The result is returned as an array of rows, where each row is either an array, a dictionary (where keys are column names) or a simple type, depending on the value of /return. Note that in the latter case only the first column is considered, and it is converted to the requested type.
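
How the dictionary keys correspond to the parts of a SELECT statement, and how /return shapes the rows, can be sketched in Python. This is an illustration at string level only: the extension may well go through MediaWiki's database layer rather than concatenating SQL, no escaping is shown, and the /return values other than dicttype (arraytype, integertype, realtype, stringtype) are assumed here to follow the PostScript type names.

```python
def build_sql(q):
    """Map the query dictionary onto the clauses of a SELECT statement."""
    if "from" not in q:
        raise KeyError("from")              # the only required key
    sql = f"SELECT {q.get('select', '*')} FROM {q['from']}"
    for key, clause in (("where", "WHERE"), ("groupby", "GROUP BY"),
                        ("having", "HAVING"), ("orderby", "ORDER BY")):
        if key in q:                        # optional clauses, in SQL order
            sql += f" {clause} {q[key]}"
    return sql

# Assumed names for simple return types.
CONVERT = {"integertype": int, "realtype": float, "stringtype": str}

def shape(rows, columns, return_type="dicttype"):
    """Shape raw result rows according to the /return key."""
    if return_type == "dicttype":
        return [dict(zip(columns, row)) for row in rows]
    if return_type == "arraytype":
        return [list(row) for row in rows]
    # Simple type: only the first column is used and converted.
    return [CONVERT[return_type](row[0]) for row in rows]
```

With rows [("7", "Main")] and columns ["page_id", "page_title"], the default shape gives one dictionary per row, while "integertype" reduces the result to [7].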

The system dictionary

As in PostScript, the system dictionary contains the definitions of all built-in operators.

The status dictionary

  • For each magic word in MediaWiki, there is a key in the status dictionary containing the current value of this magic word. For instance,
statusdict /pagename get
supplies the current page name.
  • There are additional entries pageid and namespaceid containing the numeric IDs of the current page and its namespace. This is useful when querying the database for something related to the current page. However, note that the database query might supply IDs as integers or strings; as StackFunctions are strictly typed, a string will not be recognized as equal to the statusdict items pageid or namespaceid which are integers. The safest solution is to always convert IDs queried from the database to integers before doing any comparison.
  • Furthermore, there is a key args. In the parser function syntax, it refers to an array containing the additional parameters. In the tag syntax, it refers to a dictionary containing the parameters given as parameter=value in the opening tag.
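
The typing pitfall with pageid and namespaceid can be shown with a tiny Python analogue, since Python likewise does not treat a string and an integer as equal. The values are made up; in StackFunctions itself the conversion would presumably be done with the standard cvi operator before comparing.

```python
status = {"pageid": 42}     # hypothetical value; an integer, as in statusdict
row_id = "42"               # the same ID as a database query might return it

naive = (row_id == status["pageid"])        # string vs. integer: not equal
safe = (int(row_id) == status["pageid"])    # convert first, then compare
```

Here naive is False and safe is True, which is why converting queried IDs to integers first is the safer habit.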

Precompiled code

You can use the serialize operator to convert a procedure to a string representation. My tests show that converting the string back to the object is about three times faster than creating the object by parsing the source code (and it seems that compression contributes to this gain). Therefore, the following syntax is accepted as an alternative to source code in prologs:

%Z
output of the serialize operator

For instance, you can store the actual source code for a prolog on a subpage of the actual prolog page, transform the code into a procedure by putting braces around it, apply the serialize operator and display the result using the pstack operator. Then you copy the displayed result into the actual prolog page.

While the serialize operator works with any kind of argument, this syntax works only when the argument is a procedure.

Note that the result depends on the internal representation of data within StackFunctions as well as on your compression algorithm and ExtStackFunctions::$mAuthKey. This implies that the code is portable between MediaWiki installations only if they have the same PHP, MediaWiki and StackFunctions versions and the same ExtStackFunctions::$mAuthKey. You should recreate the serialized code when updating to future versions of MediaWiki or StackFunctions.

Exceptions

When an exception occurs, debug information vaguely similar to that of Ghostscript is shown, for instance:

StackFunctions error:  typecheck  in  pop_num

Operand(s):  (a)

Operand Stack:

(objects)
(test)
(any)

Backtrace:  sf_tag  execute  op_add  pop_num

Dictionary Stack:  -dict:119-  -dict:0-

The meaning of the elements is as follows.

error 
The error name as in PostScript.
in 
The PHP function where the error occurred. This is not necessarily a function directly related to an operator; it can also be an auxiliary function. For instance, many arithmetic operators call pop_num to pop a number from the stack or raise an exception if there is no number.
Operand(s) 
The operand(s) that immediately triggered the exception.
Operand Stack 
The operand(s) still on the stack when the exception occurred. This never includes the operand(s) just mentioned that triggered the exception. Note that there can be operands involved which are not displayed at all; for instance, in the above example, the string (a) was the second argument to the add operator. The first argument is not in the list of operand(s) that immediately triggered the exception because it was OK, and it is not on the stack either because it has already been consumed.
Backtrace 
The backtrace of PHP function calls, starting from the first function belonging to StackFunctions. This backtrace definitely contains the operator.
Dictionary Stack 
The dictionary stack at the time when the exception occurred.