User:9cfilorux/syntaxhighlight

''This page attempts to document the mechanics of Extension:SyntaxHighlight GeSHi - that is, the exact code involved in highlighting different parts of the syntax. I'm sure this is already documented somewhere, but I doubt it's on mediawiki.org and I have no idea where else I would look for it.''

''It's a thoroughly incomplete work in progress (you'll notice the vast majority of the languages aren't covered), as well as a squirrelly mess. I may finish it someday, but I probably won't. The original point was just to have a bit of fun, anyway; any resulting usefulness is an accident.''

Introduction
Syntax highlighting works by detecting various types of strings and characters and wrapping them in spans with different classes; each class calls for a colour, and the set of colours varies depending on the language. There are several subsets of classes beginning with the same two characters; these subsets often appear to be categories of similar aspects of syntax, and the characters in their names appear to be derived from their functions (co for comments, for instance). Some classes seem to have a universal meaning, br0 for bracket characters being one. Another class is always used in the same way (for specifically highlighted lines) and always has the same background colour, so it will not be discussed further.
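
The "same two characters" idea is easy to play with. Here is a minimal sketch in Python that sorts class names into families by their two-character prefix. The function name `group_by_prefix` and the pattern are my own assumptions for illustration, not anything taken from the extension.

```python
import re
from collections import defaultdict

# Assumed shape of a class name: two lowercase letters, then a suffix
# such as "0", "_h" or "MULTI".
CLASS_NAME = re.compile(r"^([a-z]{2})(\w+)$")


def group_by_prefix(class_names):
    """Group class names by their two-character prefix."""
    groups = defaultdict(list)
    for name in class_names:
        match = CLASS_NAME.match(name)
        if match is None:
            continue  # skip anything that does not fit the assumed shape
        groups[match.group(1)].append(name)
    return dict(groups)


if __name__ == "__main__":
    php_classes = "co1 co2 co3 co4 re0 br0 st0 st_h sy0 sy1 nu0".split()
    for prefix, names in group_by_prefix(php_classes).items():
        print(prefix, names)
```

Each key of the result is one family (co, st, sy and so on), which is roughly how the lists below are laid out.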

PHP
(Maybe this should be a table...)
 * co1 co2 co3 co4 (comments, i.e. "co": each class detects strings beginning with a different comment marker)
 * re0 (variable names: detects strings beginning with $)
 * br0 (bracket characters: [] (brackets), () (parentheses), {} (braces). The meaning of br0, when it is present, appears to be universal, so its usage will not be discussed further where it appears.)
 * st0 st_h (values for variables: st_h detects strings enclosed by '', st0 detects strings enclosed by "")
 * sy0 sy1 (sy0 seems to highlight all punctuation characters other than ones that take different classes; the usage of sy1 is unclear)
 * kw1 kw2 kw3 kw4 (keywords: kw1 is for require_once, if, do and other such stuff, kw2 is for function, var and whatnot, kw3 is for array and kw4 is for true/false)
 * es0 es1 es2 es3 es4 (stuff in double quotes: the first four are for certain things that begin with backslashes, such as various letters and numbers; es4 is for variables inside quotes)
 * nu0 (numbers)
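
To make the PHP list a bit more concrete, here is a rough sketch in Python of how a few of these classes could be assigned. It is not GeSHi's actual code: the rule table, the patterns and the function name `highlight` are all assumptions of mine, and it only covers the classes whose behaviour is reasonably clear from the list above.

```python
import re
from html import escape

# Hypothetical rules, in priority order. These are NOT GeSHi's own patterns;
# they only imitate the behaviour described in the list above.
RULES = [
    ("st_h", r"'[^']*'"),            # strings enclosed by ''
    ("st0", r'"[^"]*"'),             # strings enclosed by ""
    ("re0", r"\$[A-Za-z_]\w*"),      # variable names beginning with $
    ("kw4", r"\b(?:true|false)\b"),  # true/false
    ("nu0", r"\b\d+\b"),             # numbers
    ("br0", r"[\[\](){}]"),          # bracket characters
]
TOKEN = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def highlight(source):
    """Wrap every recognised token in a span carrying its class name."""
    out = []
    pos = 0
    for m in TOKEN.finditer(source):
        # Text between tokens is passed through, HTML-escaped but unwrapped.
        out.append(escape(source[pos:m.start()], quote=False))
        out.append(f'<span class="{m.lastgroup}">{escape(m.group(), quote=False)}</span>')
        pos = m.end()
    out.append(escape(source[pos:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    print(highlight("$a = 1;"))
```

The order of the rules matters: strings come first so that a $ or a digit inside quotes is not picked up separately. The sketch ignores escape sequences and variables inside double quotes, which is what the es classes are for.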

JavaScript

 * nu0 (numbers?)
 * st0 (strings, i.e. the quoted values that get assigned to variables)
 * br0 (same as in PHP)
 * sy0 (same)
 * kw1 kw2 kw3 (various function things)
 * co1 co2 coMULTI (comments)
 * me1 (methods and properties, i.e. the names that come after a full stop)
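
The me1 behaviour (a name that comes straight after a full stop) can be imitated with a single lookbehind. This is only a sketch in Python under that assumption; the pattern and the name `mark_members` are hypothetical, and it does not try to skip full stops inside strings or comments.

```python
import re

# Hypothetical pattern: an identifier that directly follows a full stop.
AFTER_DOT = re.compile(r"(?<=\.)[A-Za-z_$][\w$]*")


def mark_members(source):
    """Wrap each identifier that follows a full stop in an me1 span."""
    return AFTER_DOT.sub(lambda m: f'<span class="me1">{m.group()}</span>', source)


if __name__ == "__main__":
    print(mark_members("document.getElementById(x)"))
```

Because the identifier has to start with a letter, an underscore or a dollar sign, the digits after the point in a number like 1.5 are left alone.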

Bash

 * co1 co2 co3 co4
 * re0
 * br0
 * st0 st_h
 * sy0 (slashes)
 * kw1 kw2 kw3
 * es1 es2 es3 es4

CSS

 * co1 co2 coMULTI (co1 is for @-rules, co2 is carbon dioxide and coMULTI is comments like usual)
 * re0 re1 re2 re3 (stuff with sharpsigns, i.e. ids and colours; classes; pseudo-classes; who even knows)
 * br0 (brackets)
 * st0 (quotey things)
 * sy0 (colons and intestines and stuff)
 * kw1 kw2 (thingy classes you can set; url and whatnot)
 * es0 es2 (bleh)
 * nu0 (numbers like always yeah)

Lua

 * co1 co2 coMULTI
 * br0
 * st0
 * sy0
 * kw1 kw2 kw3 kw4
 * es0 es1 es2
 * nu0

Brainfuck

 * co1
 * br0
 * st0
 * sy0 sy1 sy2 sy3 sy4

C

 * co1 co2 coMULTI
 * br0
 * st0
 * sy0
 * kw1 kw2 kw3 kw4
 * es0 es1 es2 es3 es4
 * nu0
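
With the lists above in hand, it is easy to check which classes turn up everywhere. The sketch below (Python) copies the class lists from this page into a table and intersects them; the names `CLASSES`, `shared_classes` and `languages_using` are mine, and the result only says something about the seven languages covered here.

```python
# Class lists copied from the sections above.
CLASSES = {
    "PHP": "co1 co2 co3 co4 re0 br0 st0 st_h sy0 sy1 kw1 kw2 kw3 kw4 es0 es1 es2 es3 es4 nu0",
    "JavaScript": "nu0 st0 br0 sy0 kw1 kw2 kw3 co1 co2 coMULTI me1",
    "Bash": "co1 co2 co3 co4 re0 br0 st0 st_h sy0 kw1 kw2 kw3 es1 es2 es3 es4",
    "CSS": "co1 co2 coMULTI re0 re1 re2 re3 br0 st0 sy0 kw1 kw2 es0 es2 nu0",
    "Lua": "co1 co2 coMULTI br0 st0 sy0 kw1 kw2 kw3 kw4 es0 es1 es2 nu0",
    "Brainfuck": "co1 br0 st0 sy0 sy1 sy2 sy3 sy4",
    "C": "co1 co2 coMULTI br0 st0 sy0 kw1 kw2 kw3 kw4 es0 es1 es2 es3 es4 nu0",
}


def shared_classes(table):
    """Return the classes that appear in every language of the table."""
    sets = [set(names.split()) for names in table.values()]
    return sorted(set.intersection(*sets))


def languages_using(table, class_name):
    """Return the languages whose list contains the given class."""
    return [lang for lang, names in table.items() if class_name in names.split()]


if __name__ == "__main__":
    print(shared_classes(CLASSES))           # ['br0', 'co1', 'st0', 'sy0']
    print(languages_using(CLASSES, "st_h"))  # ['PHP', 'Bash']
```

Appearing in every list does not prove that a class means the same thing everywhere (co1 is for @-rules in CSS, for one), so this is a starting point for spotting the seemingly universal classes rather than a conclusion.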