User:9cfilorux/syntaxhighlight

Introduction
Syntax highlighting works by detecting various types of strings and characters and wrapping them in spans with different classes; the span classes call for sets of colours that vary depending on the language. There are several subsets of classes beginning with the same two characters; these subsets often appear to be categories of similar aspects of syntax, and the characters in their names appear to be derived from their functions (co for comments, for example). Some classes, such as br0, have seemingly universal meaning. One class is always used in the same way, for specific highlighted lines, and always has the same background colour; it will not be discussed further.
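
To make the detect-and-wrap idea concrete, here is a minimal Python sketch. It is not the extension's actual code: the function name highlight, the RULES table and all three patterns are assumptions made for illustration, and only the class names (st0, nu0, br0) are taken from the lists on this page.

```python
import html
import re

# Hypothetical rules for illustration only: (class name, pattern).
# The patterns are assumptions, not the extension's real definitions.
RULES = [
    ("st0", r'"[^"]*"'),      # a double-quoted string
    ("nu0", r"\d+"),          # a run of digits
    ("br0", r"[\[\](){}]"),   # a single bracket character
]

# One combined pattern; each rule becomes a named group.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def highlight(source: str) -> str:
    """Wrap each recognised token in a span carrying its class."""
    out = []
    pos = 0
    for match in TOKEN_RE.finditer(source):
        # Text between tokens is passed through, HTML-escaped.
        out.append(html.escape(source[pos:match.start()]))
        # match.lastgroup is the name of the rule that matched.
        out.append(f'<span class="{match.lastgroup}">{html.escape(match.group())}</span>')
        pos = match.end()
    out.append(html.escape(source[pos:]))
    return "".join(out)
```

Calling highlight('f(12)') wraps each parenthesis in a br0 span and the digits in a nu0 span, leaving the f untouched. The colours themselves would come from a stylesheet keyed on those class names, which is why the same class can look different from one language to the next.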

This page attempts to document the mechanics of Extension:SyntaxHighlight GeSHi, that is, the exact code involved in highlighting different parts of the syntax. I'm sure this is already documented somewhere, but I doubt it's on mediawiki.org and I have no idea where else I would look for it.

It's a thoroughly incomplete work in progress (you'll notice the vast majority of the languages aren't covered), as well as a squirrelly mess. I may finish it someday, but I probably won't. The original point was just to have a bit of fun, anyway; any resulting usefulness is an accident.

PHP
(Maybe this should be a table...)
 * co1 co2 co3 co4 coMULTI
 * Comments (co): detects strings beginning with  (  -- at least semi-universal comment syntax, hence the name),   (co2),   (co1)
 * re0
 * Variable names: detects strings beginning with $
 * br0
 * Bracket characters: [] (brackets), () (parentheses), {} (braces). The meaning of br0, when it is present, appears to be universal. As such, its usage will not be discussed further where it appears.
 * st0 st_h
 * Values for variables: one class detects strings enclosed by '', the other detects strings enclosed by ""
 * sy0 sy1
 * sy0 seems to highlight all punctuation characters other than ones that take different classes. The usage of sy1 is unclear.
 * kw1 kw2 kw3 kw4
 * Keywords (kw): kw1 is for require_once, if, do and other such words; kw2 is for function, var and the like; kw3 is for array; kw4 is for true and false
 * es0 es1 es2 es3 es4
 * Escapes inside double quotes (es): the first four (es0 to es3) are for certain sequences that begin with backslashes (followed by various letters and numbers); es4 is for variables inside quotes
 * nu0 nu8 nu12 nu19
 * Numbers: one class is for ordinary number strings; another is for numbers that require commas (detects strings of digits that contain a comma for every three digits; the commas themselves are marked with another class); another is for number strings containing decimal points (must have only one point)
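
The three kinds of number described in the last item can be told apart with a few regular expressions. This is only a sketch under my own assumptions: the patterns, the function name number_kind and the descriptive labels are hypothetical, and the labels stand in for class names rather than guessing which nu class goes with which kind.

```python
import re

# Hypothetical patterns; the labels are descriptive, not real class names.
NUMBER_KINDS = [
    # One to three digits, then one or more groups of ",ddd".
    ("comma-grouped", re.compile(r"\d{1,3}(?:,\d{3})+")),
    # Digits, exactly one point, digits (assumes digits on both sides).
    ("decimal", re.compile(r"\d+\.\d+")),
    # Nothing but digits.
    ("ordinary", re.compile(r"\d+")),
]


def number_kind(text):
    """Return the label of the first pattern matching the whole string."""
    for kind, pattern in NUMBER_KINDS:
        if pattern.fullmatch(text):
            return kind
    return None
```

With these patterns, "1,234,567" counts as comma-grouped, "3.14" as decimal and "42" as ordinary, while "1.2.3" and "12,34" match nothing.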

JavaScript

 * nu0
 * Numbers: used for all numbers, unlike in PHP where different classes are used when decimal points or commas are involved
 * st0
 * Anything inside single quotes
 * br0
 * Same as in PHP
 * sy0
 * Same as in PHP, except that JavaScript also uses it for ^ whereas PHP doesn't
 * kw1 kw2 kw3
 * kw1 is used for words such as var, function, do and if; kw2 is used for true and false; kw3 is unknown
 * co1 co2 coMULTI
 * Comments, much as in PHP: co1 is for comments beginning with two slashes (//), coMULTI is for comments enclosed by slashes and asterisks (/* */)
 * me1
 * Words following a full stop
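
The me1 rule ("words following a full stop") is easy to approximate with a lookbehind. A minimal sketch, assuming that a "word" means a JavaScript-style identifier; METHOD_RE and method_names are hypothetical names, not anything taken from the extension.

```python
import re

# Assumption: a "word" is an identifier made of letters, digits, _ and $,
# not starting with a digit. The lookbehind requires a preceding full stop.
METHOD_RE = re.compile(r"(?<=\.)[A-Za-z_$][A-Za-z0-9_$]*")


def method_names(source):
    """Return every word that directly follows a full stop."""
    return METHOD_RE.findall(source)
```

For the input document.getElementById('x').value this picks out getElementById and value but not document, and the digits after the point in 3.14 are skipped because a word cannot start with a digit.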

Bash

 * co0 co1 co2 co3 co4
 * co0 is for sharp signs
 * re0 re1 re2 re4 re5
 * re1 is strings beginning with $
 * br0
 * Universal class that will not be discussed further (except in languages such as Brainfuck).
 * st0 st_h
 * Same as in PHP
 * sy0
 * Several punctuation characters -- / ! @ * % &
 * kw1 kw2 kw3
 * Various keywords and commands: kw1 covers function, do and if; kw2 covers git, clone and bash; kw3 covers fg
 * es1 es2 es3 es4
 * es1 appears to be used as in PHP; es2 appears to be used like es4 in PHP (i.e. dollar-sign strings (variables?) inside double quotes)
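
The difference between re1 and es2 seems to come down to whether a dollar-sign string sits inside double quotes. Here is a rough sketch of that distinction; the pattern and the function name classify_dollar_strings are assumptions, and the sketch deliberately ignores single quotes and backslash escapes.

```python
import re

# Assumption: a dollar-sign string is $ followed by letters, digits or _.
DOLLAR_RE = re.compile(r"\$\w+")


def classify_dollar_strings(line):
    """Return (text, class) pairs for each dollar-sign string in one line."""
    results = []
    in_quotes = False
    i = 0
    while i < len(line):
        if line[i] == '"':
            # Entering or leaving a double-quoted string.
            in_quotes = not in_quotes
            i += 1
            continue
        match = DOLLAR_RE.match(line, i)
        if match:
            results.append((match.group(), "es2" if in_quotes else "re1"))
            i = match.end()
        else:
            i += 1
    return results
```

Given echo $HOME "user: $USER", the sketch labels $HOME as re1 and $USER as es2.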

CSS

 * co1 co2 coMULTI
 * co1 is for at-rules (strings beginning with @); coMULTI has its usual meaning
 * re0 re1 re2 re3
 * re0 is for ids (strings beginning with #); re1 is for classes (beginning with .); re2 is for pseudo-elements (beginning with :); both must be followed by a { character for the classes to be assigned
 * st0
 * Strings inside single-quotes and double-quotes (they have the same meaning)
 * sy0
 * Highlights punctuation characters that have meaning, as always -- ^ * + : ; > (note: not <)
 * kw1 kw2
 * kw1 is for properties that can be set, such as content (highlighted with kw1 and not re0 even when followed by a #), color and background-image (though not all properties are recognised as such; the subset is smaller than it should be); kw2 is for various standard values for properties
 * es0 es2
 * Unknown
 * nu0
 * Numbers
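
The selector rules for re0, re1 and re2 can be sketched with a lookahead for the { character. This assumes the { requirement applies to all three prefixes, that whitespace may come between the name and the {, and that names consist of letters, digits, underscores and hyphens; SELECTOR_RE, PREFIX_CLASSES and selector_classes are hypothetical names.

```python
import re

# Prefix character, then a name, then (not consumed) optional space and "{".
SELECTOR_RE = re.compile(r"([#.:])([\w-]+)(?=\s*\{)")

# Mapping taken from the list above: ids, classes, pseudo-elements.
PREFIX_CLASSES = {"#": "re0", ".": "re1", ":": "re2"}


def selector_classes(css):
    """Return (selector text, span class) pairs found in a CSS string."""
    return [
        (match.group(0), PREFIX_CLASSES[match.group(1)])
        for match in SELECTOR_RE.finditer(css)
    ]
```

Under these assumptions, #main { color: #fff; } yields only #main as re0: the colour value #fff is followed by a semicolon rather than a {, so it is left for another rule.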

Lua

 * co1 co2 coMULTI
 * br0
 * st0
 * sy0
 * kw1 kw2 kw3 kw4
 * es0 es1 es2
 * nu0

Brainfuck

 * co1
 * br0
 * st0
 * sy0 sy1 sy2 sy3 sy4

C

 * co1 co2 coMULTI
 * br0
 * st0
 * sy0
 * kw1 kw2 kw3 kw4
 * es0 es1 es2 es3 es4
 * nu0