User:9cfilorux/syntaxhighlight

Introduction
Syntax highlighting works by detecting various types of strings and characters and wrapping them in spans with different classes; the span classes call for sets of colours that vary depending on the language. Several groups of classes begin with the same two characters; these groups usually appear to correspond to categories of similar syntax, and the two characters appear to be derived from the category's function (co for comments, nu for numbers). Some classes, such as br0, seem to have a universal meaning. One class is always used in the same way (for specifically highlighted lines) and always has the same background colour, so it will not be discussed further.

This page attempts to document the mechanics of Extension:SyntaxHighlight GeSHi - that is, the exact code involved in highlighting different parts of the syntax. I'm sure this is already documented somewhere, but I doubt it's on mediawiki.org and I have no idea where else I would look for it.

It's a thoroughly incomplete work in progress (you'll notice the vast majority of the languages aren't covered), as well as a squirrelly mess. I may finish it someday, but I probably won't. The original point was just to have a bit of fun, anyway; any resulting usefulness is an accident.
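
The span-and-class mechanism described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not GeSHi's actual code: the class names are ones documented on this page, but the rule table, the patterns and the function name `highlight` are assumptions made up for the example.

```python
import html
import re

# Hypothetical rule table: (class name, pattern). The class names come from
# this page; the patterns are simplified guesses, not GeSHi's real rules.
RULES = [
    ("coMULTI", r"/\*.*?\*/"),   # slash-asterisk comments
    ("co1", r"//[^\n]*"),        # double-slash comments
    ("st0", r'"[^"\n]*"'),       # double-quoted strings
    ("nu0", r"\d+"),             # plain digit runs
    ("br0", r"[\[\](){}]"),      # bracket characters
]

# One alternation with a named group per class; earlier rules win.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES),
    re.DOTALL,
)

def highlight(source: str) -> str:
    """Wrap every recognised token in <span class="...">, escaping the rest."""
    out = []
    pos = 0
    for match in TOKEN_RE.finditer(source):
        out.append(html.escape(source[pos:match.start()]))
        out.append(
            f'<span class="{match.lastgroup}">{html.escape(match.group())}</span>'
        )
        pos = match.end()
    out.append(html.escape(source[pos:]))
    return "".join(out)
```

Calling `highlight('f(1) // hi')` wraps the parentheses in br0 spans, the digit in a nu0 span and the trailing comment in a co1 span; the colours then come from whatever stylesheet is attached to those classes.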

PHP
(Maybe this should be a table...)
 * co1 co2 co3 co4 coMULTI
 * Comments (co): detects strings beginning with /* (coMULTI -- at least semi-universal comment syntax, hence the name), # (co2), // (co1)
 * re0
 * Variable names: detects strings beginning with $
 * br0
 * Bracket characters: [] (brackets), () (parentheses), {} (braces). The meaning of br0, when it is present, appears to be universal. As such, its usage will not be discussed further where it appears.
 * st0 st_h
 * Values for variables: st_h detects strings enclosed by '', st0 detects strings enclosed by ""
 * sy0 sy1
 * sy0 seems to highlight all punctuation characters other than ones that take different classes. The usage of sy1 is unclear.
 * kw1 kw2 kw3 kw4
 * Various words with different meanings -- haven't yet figured out how to categorise them
 * es0 es1 es2 es3 es4
 * Stuff in double quotes - first four are for certain things that begin with backslashes (various letters and numbers, respectively); es4 is for variables inside quotes
 * nu0 nu8 nu12 nu19
 * Numbers: nu0 is for ordinary number strings; a second class is for numbers that require commas (it detects strings of digits that contain a comma for every three digits, and the commas themselves are marked with a separate class); a third is for number strings containing decimal points (there must be only one point)

JavaScript

 * nu0
 * Numbers: used for all numbers, unlike in PHP where different classes are used when decimal points or commas are involved
 * st0
 * Anything inside single quotes
 * br0
 * Same as in PHP
 * sy0
 * Ditto, except that JavaScript also uses it for ^ whereas PHP doesn't
 * kw1 kw2 kw3
 * kw1 is used for words such as var, function, do and if; kw2 is used for true and false; kw3 is unknown
 * co1 co2 coMULTI
 * Comments, much as in PHP -- co1 is slashes, coMULTI is slashes and asterisks
 * me1
 * Words following a full stop
 * es0
 * Used for all characters that have a meaning in PHP when preceded by a backslash and contained within double quotes. No distinction is made for numbers.
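
As a rough illustration of the word-level classes in the list above, here is a minimal Python sketch. The word lists contain only the examples given on this page; the regular expression, the function name `classify_words` and the rule that a leading full stop takes priority over a keyword are assumptions, and strings and comments are not handled at all.

```python
import re

KW1 = {"var", "function", "do", "if"}  # examples this page lists for kw1
KW2 = {"true", "false"}                # examples this page lists for kw2

# An optional leading full stop followed by an identifier.
WORD_RE = re.compile(r"(\.?)([A-Za-z_$][\w$]*)")

def classify_words(source):
    """Return (word, class) pairs for the words that would receive a class."""
    result = []
    for match in WORD_RE.finditer(source):
        dot, word = match.groups()
        if dot:
            result.append((word, "me1"))   # word following a full stop
        elif word in KW1:
            result.append((word, "kw1"))
        elif word in KW2:
            result.append((word, "kw2"))
    return result
```

For example, `classify_words("if (x.length) { var ok = true; }")` picks out if and var as kw1, length as me1 and true as kw2, and leaves x and ok unclassified.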

Bash

 * co0 co1 co2 co3 co4
 * co0 is for sharp signs
 * re0 re1 re2 re4 re5
 * re1 is strings beginning with $; re5 is strings beginning with -
 * br0
 * Universal class that will not be discussed further (except in languages such as Brainfuck).
 * st0 st_h
 * Same as in PHP
 * sy0
 * Several punctuation characters -- / ! @ * % &
 * kw1 kw2 kw3
 * Various words: kw1 covers function, do and if; kw2 covers git, clone and bash; kw3 covers fg
 * es1 es2 es3 es4
 * es1 appears to be used as in PHP; es2 appears to be used like es4 in PHP (i.e. dollar-sign strings (variables?) inside double quotes)
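
The three Bash rules that are stated most concretely above (co0, re1 and re5) can be sketched like this. It is a deliberately naive illustration: the patterns and the function names are assumptions, tokens are split on whitespace, and a sharp sign inside quotes or in something like $# would be misread as the start of a comment.

```python
import re

def classify_token(token):
    """Return the class for one whitespace-separated token, or None."""
    if re.fullmatch(r"\$\w+", token):
        return "re1"   # string beginning with $
    if re.fullmatch(r"-{1,2}[\w-]+", token):
        return "re5"   # string beginning with -
    return None

def classify_line(line):
    """Return (text, class) pairs for one line of a Bash script."""
    # Everything from the first sharp sign onwards is treated as a comment.
    code, sharp, comment = line.partition("#")
    pairs = [(tok, classify_token(tok)) for tok in code.split()]
    pairs = [(tok, cls) for tok, cls in pairs if cls is not None]
    if sharp:
        pairs.append((sharp + comment, "co0"))
    return pairs
```

With this sketch, `classify_line("ls -la $HOME # list home")` marks -la as re5, $HOME as re1 and the trailing comment as co0.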

CSS

 * co1 co2 coMULTI
 * co1 is for at-rules (strings beginning with @); coMULTI has its usual meaning
 * re0 re1 re2 re3
 * re0 is for ids (strings beginning with #); re1 is for classes (beginning with .); re2 is for pseudo-elements (beginning with :); both must be followed by a { character for the classes to be assigned
 * st0
 * Strings inside single-quotes and double-quotes (they have the same meaning)
 * sy0
 * Highlights punctuation characters that have meaning, as always -- ^ * + : ; > (note: not <)
 * kw1 kw2
 * kw1 is for attributes that can be set, such as content (highlighted with kw1 and not re0 even when followed by a #), color, background-image, etc. (though not all attributes are recognised as such; the subset is smaller than it should be); kw2 is for various standard values for attributes
 * es0 es2
 * es2 is for certain characters preceded by backslashes inside quotes, similarly to PHP and Bash though more limited
 * nu0
 * Numbers
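
The selector rules above (re0, re1, re2) lend themselves to a small sketch with a lookahead for the { character. The pattern and the function name `classify_selectors` are assumptions, and the sketch applies the "must be followed by {" condition to all three prefixes, which may be stricter than what the highlighter really does.

```python
import re

# A prefix character, a name, and a lookahead requiring "{" to come next
# (optionally after whitespace). The lookahead is an assumption based on
# the observation in the list above.
SELECTOR_RE = re.compile(r"([#.:])([A-Za-z_][\w-]*)(?=\s*\{)")

PREFIX_CLASS = {"#": "re0", ".": "re1", ":": "re2"}

def classify_selectors(css):
    """Return (selector, class) pairs for ids, classes and pseudo-elements."""
    return [
        (match.group(0), PREFIX_CLASS[match.group(1)])
        for match in SELECTOR_RE.finditer(css)
    ]
```

Because of the lookahead, the #fff in `color: #fff;` is not given re0, while the #main in `#main { ... }` is.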

Lua

 * co1 co2 coMULTI
 * co1 is for comments beginning with a pair of hyphens (--). Lua also has block comments delimited by --[[ and ]], which is presumably what coMULTI is for. This raises the question of what co2 could possibly be used for.
 * br0
 * st0
 * Text inside single and double quotes
 * sy0
 * kw1 kw2 kw3 kw4
 * kw1 is for keywords such as do, function, if, elseif; kw3 is for functions such as print; kw4 is unknown
 * es0 es1 es2
 * nu0

Brainfuck

 * co1
 * Comments: any character is a comment except + - < > . , [ ] (there really ought to be others, though, seeing as how there're so many classes. innit?)
 * br0
 * This is used only for [], unlike the vast majority of other languages
 * st0
 * sy0 sy1 sy2 sy3 sy4
 * sy0 is + -; sy2 is < >; sy3 is . ,
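
Since every Brainfuck character falls into exactly one of the classes described above, the whole mapping fits in a lookup table. The following Python sketch groups consecutive characters of the same class into runs; the function name `classify_bf` is made up for the example, and the classes sy1 and sy4, whose use is not documented here, are left out.

```python
from itertools import groupby

# Character-to-class mapping as described in the list above.
BF_CLASSES = {
    "+": "sy0", "-": "sy0",
    "<": "sy2", ">": "sy2",
    ".": "sy3", ",": "sy3",
    "[": "br0", "]": "br0",
}

def char_class(ch):
    """Anything that is not one of the eight commands counts as a comment."""
    return BF_CLASSES.get(ch, "co1")

def classify_bf(program):
    """Return (text, class) pairs, one per run of same-class characters."""
    return [("".join(run), cls) for cls, run in groupby(program, key=char_class)]
```

For instance, `classify_bf("++[>.<-] hi")` yields a sy0 run for ++, single-character runs for each of the following commands, and a final co1 run for the trailing text.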

C

 * co1 co2 coMULTI
 * co1 is strings preceded by two slashes; co2 is strings preceded by a sharp sign
 * br0
 * st0
 * Both single and double quotes
 * sy0
 * A variety of punctuation characters
 * kw1 kw2 kw3 kw4
 * kw1 and kw2 are for certain function words
 * es0 es1 es2 es3 es4 es5
 * es1 is for certain letters and other characters preceded by backslashes inside quotes; es5 is for numbers in such a condition
 * nu0 nu6 nu8 nu12 nu16 nu17 nu18 nu19
 * nu0 is for ordinary numbers (including those with commas); nu16 is for decimals