User:9cfilorux/syntaxhighlight

From mediawiki.org

Introduction

Syntax-highlighting works by detecting various types of strings and characters and assigning spans with different classes to them; the span classes call for sets of colours that vary depending on the language. There are several subsets of classes beginning with the same two characters; these subsets often appear to be categories of similar aspects of syntax, and the characters in their names appear to be derived from their functions. Some classes have seemingly universal meaning, such as br0 and coMULTI; one, ln-xtra, is always used in the same way--for specific highlighted lines--and always has the same background colour (#ffc): it will not be discussed further.
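
As a rough picture of the span-and-class idea described above, here is a minimal Python sketch that wraps already-classified pieces of text in spans. The function name wrap_tokens and the token list are hypothetical and are not taken from the extension; the sketch only illustrates the kind of markup the highlighter emits, not how it decides on the classes.

```python
from html import escape


def wrap_tokens(tokens):
    """Wrap (text, css_class) pairs in spans; a class of None means plain text."""
    parts = []
    for text, css_class in tokens:
        safe = escape(text)  # escape <, > and & so the text is valid inside HTML
        if css_class is None:
            parts.append(safe)
        else:
            parts.append(f'<span class="{css_class}">{safe}</span>')
    return "".join(parts)


# Hypothetical, hand-classified tokens for the PHP fragment $foo[0]
print(wrap_tokens([("$foo", "re0"), ("[", "br0"), ("0", "nu0"), ("]", "br0")]))
```

The colours themselves would then come from a stylesheet that targets each class for the language in question.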

This page attempts to document the mechanics of Extension:SyntaxHighlight GeSHi - that is, the exact code involved in highlighting different parts of the syntax. I'm sure this is already documented somewhere, but I doubt it's on mediawiki.org and I have no idea where else I would look for it.

It's a thoroughly incomplete work in progress (you'll notice the vast majority of the languages aren't covered), as well as a squirrelly mess. I may finish it someday, but I probably won't. The original point was just to have a bit of fun, anyway; any resulting usefulness is an accident.

PHP

(Maybe this should be a table...)

  • co1 co2 co3 co4 coMULTI
    Comments (co): detects strings beginning with /* (coMULTI -- at least semi-universal comment syntax, hence the name), # (co2), // (co1)
  • re0
    Variable names: detects strings beginning with $
  • br0
    Bracket characters: [] (brackets), () (parentheses), {} (braces)
    The meaning of br0, when it is present, appears to be universal. As such, its usage will not be discussed further where it appears.
  • st0 st_h
    String literals: st_h detects strings enclosed by '' (single quotes), st0 detects strings enclosed by "" (double quotes)
  • sy0 sy1
    sy0 seems to highlight all punctuation characters other than ones that take different classes. The usage of sy1 is unclear.
  • kw1 kw2 kw3 kw4
    Various words with different meanings -- haven't yet figured out how to categorise them
  • es0 es1 es2 es3 es4
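    Stuff in double quotes: es0 to es3 are for certain things that begin with backslashes (in the example below, es1 is used for backslashes followed by letters and some other characters, es3 for backslashes followed by digits); es4 is for variables inside double quotes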
  • nu0 nu8 nu12 nu19
    Numbers: nu0 is for ordinary number strings, nu8 is for numbers that require commas (detects strings of digits that contain a comma for every three digits; commas are marked with sy0), nu19 is for number strings containing decimal points (must have only one point)
  • me1 me2
    Text following two colons
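
The prefix-based observations in the list above (comments, variables and brackets) can be sketched as a small lookup in Python. The function name php_prefix_class is made up for illustration, and the rules are only my reading of the observed behaviour, not the extension's actual logic.

```python
def php_prefix_class(text):
    """Guess a class from how the text starts, following the observations above."""
    stripped = text.lstrip()
    if stripped.startswith("/*"):
        return "coMULTI"  # slash-asterisk comment
    if stripped.startswith("//"):
        return "co1"      # double-slash comment
    if stripped.startswith("#"):
        return "co2"      # sharp-sign comment
    if stripped.startswith("$"):
        return "re0"      # variable name
    if stripped[:1] in tuple("[](){}"):
        return "br0"      # bracket, parenthesis or brace
    return None           # not covered by this sketch


for sample in ["// note", "# note", "/* note */", "$foo", "{"]:
    print(sample, "->", php_prefix_class(sample))
```

Strings, keywords and numbers are left out here because they depend on more than the first character.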

Example usage (PHP)

//This line is highlighted so it takes class ln-xtra
<?php #This string takes class kw2, and if you put it partway through the text the part above it won't get highlighted
#This is a comment with a sharp sign so it takes class co2
/* This comment is enclosed by slashes and asterisks so it takes class coMULTI */
$foo #This takes re0
'This string is in single quotes and takes class st_h'
"This string is in double quotes and takes class st0"
"$foo" #This takes es4
7878 #These numbers take class nu0
0.9 #This takes nu19
000,000,000 #This takes nu8
?> #This string also takes kw2 but if you put text after it it won't be highlighted
<? #so we need another thingy
require_once if do else elseif echo #These strings take kw1
function var global #These strings take kw2
array die isset #These use kw3
true false null __FILE__ #These use kw4
{} [] () #These take br0
!@%&*+=- #These take sy0
"\t takes class es1 when it is inside double quotes"
"So do \\, \f, \n, \r, \v, \" and \$"
"\1, \2, \3, \4, \5, \6, \7 and \0 take es3"
::This takes me2

JavaScript

  • nu0
    Numbers: used for all numbers, unlike in PHP where different classes are used when decimal points or commas are involved
  • st0
    Anything inside single or double quotes
  • br0
    Same as in PHP
  • sy0
    Ditto, except that JavaScript also uses it for ^ whereas PHP doesn't
  • kw1 kw2 kw3
    kw1 is used for words such as var, function, do and if; kw2 is used for true and false; kw3 is unknown
  • co1 co2 coMULTI
    Comments, much as in PHP -- co1 is slashes, coMULTI is slashes and asterisks
  • me1
    Words following a full stop
  • es0
    Used for all characters that have a meaning in PHP when preceded by a backslash and contained within double quotes. No distinction is made for numbers.

Example usage (JavaScript)

//This comment uses co1
/* This comment uses coMULTI */
23928 2.1 222,111 // These use nu0
var function do if // These use kw1
() [] {} //These use br0
;=^:%! //These use sy0
'This uses st0'
"So does this"
.This.uses.me1
true false //These use kw2
"\t takes class es0 when it is inside double quotes"
"So do \\, \f, \n, \r, \v, \", \$, \1, \2, \3, \4, \5, \6, \7 and \0"

Bash

  • co0 co1 co2 co3 co4
    co0 is for sharp signs
  • re0 re1 re2 re4 re5
    re1 is strings beginning with $; re5 is strings beginning with -
  • br0
    Universal class that will not be discussed further (except in languages such as Brainfuck).
  • st0 st_h
    Same as in PHP
  • sy0
    Several punctuation characters -- / ! @ * % &
  • kw1 kw2 kw3
    Various words: kw1 covers function, do and if; kw2 covers git, clone, bash and mv; kw3 covers fg
  • es1 es2 es3 es4
    es1 appears to be used as in PHP; es2 appears to be used like es4 in PHP (i.e. dollar-sign strings (variables?) inside double quotes)

Example usage (Bash)

#This takes co0
git clone bash mv #These take kw2
/!@*%& #These take sy0
fg #This takes kw3
function do if #These take kw1
'This takes st_h'
"This takes st0"
$foo #This takes re1
{}[]() #These take br0
"\t takes class es1 when it is inside double quotes"
"So do \f, \n, \r, \v, \" and \$"
"$This takes es2"
-foo #This takes re5

CSS

  • co1 co2 coMULTI
    co1 is for at-rules (strings beginning with @); coMULTI has its usual meaning
  • re0 re1 re2 re3
    re0 is for ids (strings beginning with #); re1 is for classes (beginning with .); re2 is for pseudo-elements (beginning with :); the latter two must be followed by a { character for the classes to be assigned
  • st0
    Strings inside single-quotes and double-quotes (they have the same meaning)
  • sy0
    Highlights punctuation characters that have meaning, as always -- ^ * + : ; > (note: not <)
  • kw1 kw2
    kw1 is for properties that can be set, such as content (highlighted with kw1 and not re0 even when followed by a #), color, background-image, etc. (though not all properties are recognised as such; the subset is smaller than it should be); kw2 is for various standard values, such as url and block
  • es0 es2
    es2 is for certain characters preceded by backslashes inside quotes, similarly to PHP and Bash though more limited
  • nu0
    Numbers
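
The selector behaviour noted above (ids are picked up on their own, while classes and pseudo-elements need a following brace) can be sketched with a few regular expressions in Python. The patterns and the name css_selector_class are guesses made for illustration; they are not taken from the extension.

```python
import re

# Each pattern is an assumption based on the observations above.
CSS_SELECTOR_RULES = [
    ("re0", re.compile(r"^#[\w-]+")),            # ids: no brace needed
    ("re1", re.compile(r"^\.[\w-]+(?=\s*\{)")),  # classes: only when followed by {
    ("re2", re.compile(r"^:[\w-]+(?=\s*\{)")),   # pseudo-elements: only when followed by {
]


def css_selector_class(text):
    """Return the first class whose pattern matches the start of text, else None."""
    for name, pattern in CSS_SELECTOR_RULES:
        if pattern.match(text):
            return name
    return None


for sample in ["#foo", ".foo {", ".foo", ":foo {", ":foo"]:
    print(repr(sample), "->", css_selector_class(sample))
```

The lookahead (?=\s*\{) checks for the brace without making it part of the match, which fits the observation that the brace itself takes br0.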

Example usage (CSS)

/* This takes coMULTI */
@This takes co1
1234 /* This takes nu0 */
#foo /* This takes re0 */
'This takes st0'
"This takes st0 too"
^*+:> /* These take sy0 */
.foo { /* This takes re1 */
:foo { /* This takes re2 */
color background background-image content display /* These take kw1 */
url block /* These take kw2 */
()[]{} /* These take br0 */
"\f, \1, \2, \3, \4, \5, \6, \7 and \0 take es2 when they are inside double quotes"

Lua

  • co1 co2 coMULTI
    co1 is for comments introduced by two hyphens (--), which run to the end of the line; the closing -- used in the example below is not required. Lua also has block comments of the form --[[ ... ]], which is presumably what coMULTI is for. What co2 could be used for is unclear.
  • br0
  • st0
    Text inside single and double quotes
  • sy0
  • kw1 kw2 kw3 kw4
    kw1 is for keywords such as do, function, if, elseif; kw3 is for functions such as print; kw4 is unknown
  • es0 es1 es2
  • nu0

Example usage (Lua)

-- This is a comment and takes co1 --
print -- This takes kw3 --
do function if elseif -- These take kw1 --
%^*%#.,/<> -- These take sy0 --
"This takes st0"
'This takes st0 too'
12345 -- This takes nu0 --
"\t takes class es1 when it is inside double quotes"
"So do \\, \f, \n, \r, \v and \""
"\1, \2, \3, \4, \5, \6, \7 and \0 take es2"

Brainfuck

  • co1
    Comments: any character is a comment except + - < > . , [ ] (there really ought to be others, though, seeing as how there're so many classes. innit?)
  • br0
    This is used only for [], unlike the vast majority of other languages
  • st0
  • sy0 sy1 sy2 sy3 sy4
    sy0 is + -; sy2 is < >; sy3 is . ,
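
Because Brainfuck has so few meaningful characters, the observations above fit into one small table. The Python sketch below uses hypothetical names (BRAINFUCK_CLASSES, brainfuck_runs) and groups consecutive characters that share a class, treating everything else as a co1 comment; it is an illustration of the mapping, not the extension's code.

```python
from itertools import groupby

# Character-to-class table, as observed above; anything else counts as a comment.
BRAINFUCK_CLASSES = {
    "+": "sy0", "-": "sy0",
    "<": "sy2", ">": "sy2",
    ".": "sy3", ",": "sy3",
    "[": "br0", "]": "br0",
}


def brainfuck_class(char):
    """Return the class for a single character, defaulting to co1 (comment)."""
    return BRAINFUCK_CLASSES.get(char, "co1")


def brainfuck_runs(source):
    """Split source into (class, text) runs of consecutive same-class characters."""
    return [(cls, "".join(chars)) for cls, chars in groupby(source, key=brainfuck_class)]


print(brainfuck_runs("+-[>.]x"))
```

Note that sy1 and sy4 never appear in the table, since no character was observed to take them.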

Example usage (Brainfuck)

This is regular old text so it is a comment and takes co1
+- These take sy0
<> These take sy2
,. These take sy3
[] These take br0

C

  • co1 co2 coMULTI
    co1 is strings preceded by two slashes; co2 is strings preceded by a sharp sign (in C these are preprocessor directives rather than comments)
  • br0
  • st0
    Both single and double quotes
  • sy0
    A variety of punctuation characters
  • kw1 kw2 kw3 kw4
    kw1 and kw2 are for certain function words
  • es0 es1 es2 es3 es4 es5
    es1 is for certain letters and other characters preceded by backslashes inside quotes; es5 is for numbers in such a condition
  • nu0 nu6 nu8 nu12 nu16 nu17 nu18 nu19
    nu0 is for ordinary numbers (including those with commas); nu16 is for decimals
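
The escape-sequence split described above (es1 for letter-like escapes, es5 for digit escapes) can be sketched in Python as follows. The name c_escape_class and the exact character sets are assumptions drawn from the sample lines on this page, not from the extension itself.

```python
# Characters observed to take es1 after a backslash: t f n r v, backslash, double quote
C_LETTER_ESCAPES = set('tfnrv\\"')
# Digits observed to take es5 after a backslash
C_DIGIT_ESCAPES = set("01234567")


def c_escape_class(escape):
    """Classify a two-character escape such as backslash-t; None if not recognised."""
    if len(escape) != 2 or escape[0] != "\\":
        return None
    if escape[1] in C_LETTER_ESCAPES:
        return "es1"
    if escape[1] in C_DIGIT_ESCAPES:
        return "es5"
    return None


for sample in ["\\t", "\\\\", '\\"', "\\7", "\\q"]:
    print(sample, "->", c_escape_class(sample))
```

The same shape would work for the other languages on this page by swapping the class names, for example es3 instead of es5 for PHP digits.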

Example usage (C)

//This takes co1
#This takes co2
/* This takes coMULTI */
()[]{} //These take br0
!%^&*+|<>:?/ //These take sy0
'This takes st0'
"So does this"
if do //These take kw1
function //This takes kw2
12345 //This takes nu0
1.2 //This takes nu16
"\t takes class es1 when it is inside quotes"
"So do \\, \f, \n, \r, \v and \""
"\1, \2, \3, \4, \5, \6, \7 and \0 take es5"