Extension Syntax

This page is used to discuss and vote on a syntax for extensions to the MediaWiki software. These include:
 * LaTeX (mathematics, formulas, chess..)
 * musical notes (Lilypond)
 * plots and timelines (gnuplot, ploticus, EasyTimeline)
 * SVG->PNG rendering
 * hieroglyphs (WikiHiero)
 * map rendering
 * 3d models rendering
 * anything else that's cool and useful

There will be a brainstorming phase of 7 days, after which voting will start on these proposals (on April 4). The voting phase will last seven days (until April 11).

= Issues =


 * Find an easy to use and intuitive syntax
 * Be consistent with already used syntax
 * Be able to use data placed between the tags or taken from another page
 * If possible, do not forbid special characters in the data (e.g. a syntax closed by }} cannot carry }} as data)

= Proposals =

Erik's proposal
We should use an XML-like syntax for extensions:
 * 1) &lt;math&gt;insert code here&lt;/math&gt;
 * 2) &lt;music&gt;insert code here&lt;/music&gt;
 * 3) &lt;hiero&gt;insert code here&lt;/hiero&gt;

Long code segments could be moved into the template namespace and transcluded in the standard manner, e.g. Template:Beethoven's 9th Symphony (sample) would contain the &lt;music&gt;...&lt;/music&gt; code and could be transcluded into any article that needs it.
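As a rough illustration of how software might pick out such tags, the Python sketch below scans wikitext for paired extension tags. The tag names math, music and hiero come from the list above; the function name find_extension_blocks and the regular expression are assumptions made for this example and are not the actual MediaWiki implementation.

```python
import re

# Tag names taken from the proposal; the tuple itself is illustrative.
EXTENSION_TAGS = ("math", "music", "hiero")

# Match <tag>...</tag> where the closing tag repeats the opening name.
# re.DOTALL lets the body span several lines.
_BLOCK_RE = re.compile(
    r"<(?P<tag>" + "|".join(EXTENSION_TAGS) + r")>(?P<body>.*?)</(?P=tag)>",
    re.DOTALL,
)


def find_extension_blocks(wikitext):
    """Return a list of (tag, body) pairs in document order."""
    return [(m.group("tag"), m.group("body")) for m in _BLOCK_RE.finditer(wikitext)]


sample = "Energy: <math>E = mc^2</math> and a tune: <music>c d e f</music>"
print(find_extension_blocks(sample))
```

Because the closing tag names the extension, the body can contain almost any character sequence except its own closing tag, which is the uniqueness argument made in the list below.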

Arguments for:
 * Many people are already familiar with XML-style syntax
 * Indeed, many people are already familiar with using &lt;math&gt; &lt;/math&gt;, and many articles already include this syntax.
 * It is immediately obvious over long segments of code what class a particular segment belongs to (i.e. you can look at the bottom of a music segment and know that it is music code, because there's a &lt;/music&gt; closing tag)
 * It is easy to remember
 * It is sufficiently unique to avoid parsing problems (as opposed to a short sequence of control characters, which might conflict with any present or future extensions)
 * It is easy to standardize the parser for it
 * It is wiki-like, as opposed to a more complex &lt;rend class="music"&gt; syntax, which is more suitable for programmers than users
 * It is consistent with existing uses of different kinds of brackets
 * [ is associated with links ([ext. link], [[int. link]]); { is associated with "transclusion" (e.g. the upcoming templates); < is already associated with HTML and HTML-style formatting tags. The extension syntax marks a special formatting rule for the enclosed text, and therefore fits with the other uses of <.

Arguments against:
 * In short segments of code the redundant closing tag can be annoying
 * Needs to be localized
 * Easy using Tim's MagicWord class, which also allows for synonyms, e.g. the English version could be valid in all translations to allow for easy copy & pasting, but this has the extra disadvantage of confusing users ("what's the difference between 'math' and 'qxyz'?") and making for inconsistency
 * Gives the false impression of being real HTML or XHTML
 * It is real XML (XML does not require a DTD, arbitrary tags can be defined).
 * I rather dislike the association that it's XML. We should think of it as an arbitrary syntax convention styled after XML, because otherwise we'll start thinking of it as a hierarchical element structure, which is not the case with wiki-markup in general. (In the same way, Lilypond's authors stress that it's not TeX, even though it looks similar) - IMSoP 16:38, 31 Mar 2004 (UTC)
 * In turn gives the false impression that Wikipedia allows all of HTML
 * Non-sequitur. That is true for HTML-elements, not for XML elements that aren't part of HTML.
 * Gives the false impression of nestability ( $$ x^2 =  $$ )
 * The nesting argument can be applied to any syntax that has different opening and closing tags. Nestings will be possible where they make sense, just like you used the &lt;nowiki&gt; tag to create the example above, a case of nesting. Nestings are not possible / have no effect in HTML where they do not make sense.
 * Rather hard to type
 * Easy to go wrong and mess up a whole page ( x^2 $$ )
 * We can auto-fix that.
 * Generally just as easy to spot and fix; besides, the same is true of or  or any of the other HTML/XML-like tags we already use.
 * Goes counter to the trend of instituting a wiki-like syntax for everything (like tables); if tags-that-look-like-HTML were what we are looking for, we wouldn't have needed the wiki table syntax.
 * The reason we created a table syntax is that tables consist of many, many tags (table, headings, rows, cells, all with their own closing tags). In this case there is always just a single pair of tags. There is no trend to replace "everything" with our own syntax (e.g. we do not replace HTML table parameters, we do not replace CSS and DIV tags), nor is this an argument for doing so in this instance.
 * Difficult to read, because the tags distract and don't give an intuitive visual appearance of encapsulation (parenthesisation, or whatever you call it)
 * The tags are the most widely recognized form of labeled encapsulation there is. Unlabeled encapsulation is confusing over large segments, and more difficult to learn because the brain recognizes and remembers new words more easily than new symbols.
 * New extensions could potentially clash with other kinds of markup which use the same syntax
 * There are not likely to be enough extensions and enough other formatting tags that this will become a problem

Ivo's Proposal
I think the XML-like syntax is not a good solution. You often have to write a long tag for a small formula. I would prefer something like


 * 1) [$ formula $] and
 * 2) [$$ formula $$].

The first expression stands for normal formulas appearing inside text, the other one for formulas that should appear centered on a separate line (but are written within that text because they belong to it). The $ sign is the normal symbol in TeX for formulas inside text, and $$ is the one for formulas that should appear on a separate line. Since we use TeX to write formulas, this would be a consistent solution. Of course we can use brackets other than "[" and "]".


 * How to differentiate the type of data (tex, score, hieroglyph, etc.)? A &#9774; ineko 17:15, 28 Mar 2004 (UTC)
 * That's just the proposal for math markup (TeX). I didn't say anything about score, hieroglyphs, etc.

Pros :

Cons :
 * And if we need ]] in markup ? Regular expression to match \[\[.*\]\] is not a "parser". Taw 18:50, 28 Mar 2004 (UTC)
 * Sorry, I did not really understand the problem. Could you explain it in more detail?

Oh, and there is another point. I would like to see a small change to the heading markup. Instead of "=== heading ===" we should use just "=== heading", since a heading always appears on a line of its own and does not need closing markers. From a formatting point of view it is the same as "#" for enumeration or "*" for itemizing (it describes a complete block, not text inside a block).

Magnus' proposal
I think we should stick to the existing syntax that has proven to be understandable by most users. An image link can produce a PNG or an SVG, depending on user settings or browser identification. That can utilize goodies like thumbnail generation etc. An image is an image is an image, after all.


 * I'd like to see the extension part go away in image links. The link should be [[image:xyz]] so that whatever format happens to be best can be used.  Currently one can't change the format of an image without uploading to a new name (with the new extension) and then changing all the links.  Replacement of an image could be done with a special 'replace image' link on each image page. Audin 04:07, 30 Mar 2004 (UTC)

For more complex structures (hiero, music), I suggest two variants. The first variant will use "stuff" directly as data, while the second one will use the data stored in stuff. That way, a complicated timeline can get its own "article" (I suggest a "data:" namespace), while a few hieroglyphs can be entered directly.


 * Good idea I think, but ':' and '::' are easy to confuse. A &#9774; ineko 02:55, 29 Mar 2004 (UTC)


 * Agree, what about for page references ? Or  ?


 * Good. Also see my "Magnus' proposal alternative" below. A &#9774; ineko

Pros :
 * Simple syntax
 * Consistent with existing syntax
 * Parser is already in place
 * Allows for easy handling of large amounts of raw data without cluttering the article source

Cons :
 * Needs to be localized (easy using Tim's MagicWord class, which also allows for synonyms, e.g. the English version could be valid in all translations to allow for easy copy & pasting)
 * And if we need }} in markup ? Regular expression to match is not a "parser". Taw 18:50, 28 Mar 2004 (UTC)
 * Why not? I fail to see the problem here. If you want to display two curly braces in the text, put them in &lt;nowiki&gt; tags. -- Stw 10:45, 29 Mar 2004 (UTC)
 * 2 curly braces in embedded markup, not in text: ( \frac{10^{15}}{\pi} ). That's a very common combination in TeX. One of the reasons why I chose &lt;math&gt; was because there's no chance in hell that &lt;/math&gt; would appear in math markup. Taw 01:56, 30 Mar 2004 (UTC)
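The collision described above can be reproduced with a small Python sketch. It compares a naive "stop at the first }}" pattern with a tag-delimited pattern on the TeX fragment quoted in the comment. The brace-delimited input form used here is a hypothetical stand-in for the proposed syntax, and both regular expressions are assumptions made for the demonstration, not real parser code.

```python
import re

# Naive pattern for a brace-delimited block: stop at the first "}}" seen.
NAIVE_BRACES = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)
# Tag-delimited alternative: stop at the literal closing tag.
MATH_TAG = re.compile(r"<math>(.*?)</math>", re.DOTALL)

# TeX fragment from the discussion; note the "}}" in the middle.
tex = r"\frac{10^{15}}{\pi}"

brace_form = "{{" + tex + "}}"          # hypothetical brace-delimited form
tag_form = "<math>" + tex + "</math>"   # tag-delimited form

naive_body = NAIVE_BRACES.search(brace_form).group(1)
tag_body = MATH_TAG.search(tag_form).group(1)

print("brace-delimited body:", naive_body)
print("tag-delimited body:  ", tag_body)
print("brace form truncated:", naive_body != tex)
```

The brace pattern ends the block at the "}}" that closes the exponent, so the extension would receive a truncated formula. A real parser could count nested braces instead, but that is more work than matching a closing tag that cannot occur in the data.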

Aoineko' proposals
In fact, I fully support, but since we are in a brainstorming phase, I am putting some ideas here.


 * I see no advantages in the 5 proposals that follow. Many people have some basic knowledge of HTML or have at least encountered it here every now and then, so why make confusing variations on it with different brackets or a different closing tag style? Erik Zachte 01:21, 31 Mar 2004 (UTC)


 * I tried to find alternatives that avoid the cons expressed about your proposal. But it's true that using <> may be confusing for people familiar with HTML. A &#9774; ineko

Forum like

 * [math]...[/math]
 * [hiero]...[/hiero]
 * [music]...[/music]

Forum like (with no /)

 * [math]...[math]
 * [hiero]...[hiero]
 * [music]...[music]

Forum like (with same end marker)

 * [math]...[end]
 * [hiero]...[end]
 * [music]...[end]

Erik's proposal alternative 1
no / end marker
 * &lt;math&gt;...&lt;math&gt;
 * &lt;music&gt;...&lt;music&gt;
 * &lt;hiero&gt;...&lt;hiero&gt;

Erik's proposal alternative 2
always the same end marker
 * &lt;math&gt;...&lt;end&gt;
 * &lt;music&gt;...&lt;end&gt;
 * &lt;hiero&gt;...&lt;end&gt;

Magnus' proposal alternative 1
directly as data
 * math:...
 * hiero:...
 * music:...

from data page

Magnus' proposal alternative 2


Here the software checks whether foo is a valid page (data:foo). If it is, the data page is parsed; if not, the text in the tags is parsed.
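A minimal sketch of that lookup rule, assuming the wiki database can be modelled as a dictionary from page titles to page text. The function name resolve_extension_data and the sample page are hypothetical and serve only to illustrate the fallback.

```python
def resolve_extension_data(text, data_pages):
    """Return the data an extension should render.

    data_pages stands in for the wiki database: a dict mapping titles
    such as "data:foo" to the stored page contents (an assumption).
    """
    title = "data:" + text.strip()
    if title in data_pages:
        # The text names an existing data page: use the stored data.
        return data_pages[title]
    # Otherwise treat the text itself as the data.
    return text


pages = {"data:Egypt timeline": "3100 BC: Early Dynastic Period"}

print(resolve_extension_data("Egypt timeline", pages))  # stored page
print(resolve_extension_data("x^2 + y^2", pages))       # inline data
```

One consequence of this design is that creating a data page whose title equals some inline data silently changes how existing articles render, which may be a reason to prefer an explicit marker for page references.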

Magnus' proposal alternative 3
directly as data

from data page (like #redirect)

Magnus' proposal alternative 4
To be able to use }} inside the code.

Uli's Proposal
I had suggested a thing like that some time ago in the discussion on navigation bars. As I understand, we have some issues, that should be covered:


 * Navigational data (like theme rings or article grouping, e.g. "History of Germany, part 1"), which should not be rendered at the position where it is placed in the text, but instead at a (probably skin-specific) position, and possibly only in certain situations (for example, theme rings should not be rendered in a print view).
 * Defining short non-textual data within articles (Hieroglyphs), to be rendered at the position where they are placed
 * Including long non-textual data, to be rendered at the position where they are placed
 * Possibly also including long non-textual data, to be rendered at a specific position (I'm thinking of those large information tables in the upper right corner of states, cities, elements and so on)
 * Probably we want to have some sort of parameters to pass to a transcluded item

So, suppose we get a namespace "Include:". We would separate that namespace again by convention into type-specific segments, so article fragments containing music data would be named "Include:Music:Beethovens 9th Symphony", tabular data to be included into an article would be named "Include:Table:Dollar rate since 1991", the big tables (upper right corner) possibly "Include:Infotable:Uranium", a navigation bar "Include:Navlist:10 largest cities of Island" and so on. It's important to have the type of the included data somehow coded into the article name, so that the fragment can be rendered stand-alone!

Those fragments would be included with the transclusion syntax already discussed. For data that is not included, I'd prefer the XML-type syntax (&lt;math&gt;&lt;/math&gt;) - transcluding is something different from switching the syntax within the article, so we should separate those two use cases.

Very important: depending on the type (Music, Infotable, Navigation and so on) of a transcluded fragment, the software should decide not only how to interpret the given data, but also when and where to render it.
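The naming convention and the type-dependent placement can be sketched in Python as follows. The example titles are the ones given above; the function names and the PLACEMENT table are hypothetical, and the placement values only paraphrase the examples in this proposal.

```python
# Hypothetical placement rules, loosely following the examples in the text.
PLACEMENT = {
    "Music": "inline",
    "Table": "inline",
    "Infotable": "upper right corner",
    "Navlist": "skin-specific position, omitted in print view",
}


def parse_include_title(title):
    """Split "Include:<Type>:<Name>" into (type, name).

    Raises ValueError when the title does not follow the convention.
    """
    parts = title.split(":", 2)
    if len(parts) != 3 or parts[0] != "Include" or not parts[1] or not parts[2]:
        raise ValueError("expected 'Include:<Type>:<Name>', got %r" % title)
    return parts[1], parts[2]


def placement_for(title):
    """Look up where a fragment should be rendered, defaulting to inline."""
    data_type, _name = parse_include_title(title)
    return PLACEMENT.get(data_type, "inline")


print(parse_include_title("Include:Music:Beethovens 9th Symphony"))
print(placement_for("Include:Infotable:Uranium"))
```

Since the type is part of the title, the same lookup works when a fragment is viewed on its own page, which is the stand-alone rendering requirement mentioned above.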

From wikitech-l (1)


pros:

cons:
 * complex syntax, which is more suitable for programmers than users

From wikitech-l (2)

 * [!math x^2 + y^2 = z^2  !]
 * [!hiero b-l:a-h  !]
 * [!music do re mi fa sol !]
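For comparison with the tag-based proposals, here is a Python sketch that reads the three examples above. It assumes the keyword directly after "[!" names the extension and that everything up to "!]" is the data; the function name find_bang_blocks and the regular expression are illustrative assumptions.

```python
import re

# "[!" + keyword + whitespace + body + optional whitespace + "!]"
BANG_RE = re.compile(r"\[!(?P<kind>\w+)\s+(?P<body>.*?)\s*!\]", re.DOTALL)


def find_bang_blocks(text):
    """Return a list of (kind, body) pairs in document order."""
    return [(m.group("kind"), m.group("body")) for m in BANG_RE.finditer(text)]


source = "[!math x^2 + y^2 = z^2  !] then [!hiero b-l:a-h  !] and [!music do re mi fa sol !]"
for kind, body in find_bang_blocks(source):
    print(kind, "->", body)
```

Under this reading the data may contain "]]" or "}}" freely, but not the sequence "!]".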

From wikitech-l (3)
pros:
 * no key word to translate
 * quick to type
 * not significantly more difficult, for some perhaps even easier, to remember than " ... "
 * easy to read in source text, provides visual encapsulation/parenthesisation
 * easy to get right (it is intuitive to think you have to close the brackets you open)

cons:
 * Looks like a link, but isn't
 * Not an argument &mdash; nor is [[Image:]] . &mdash; Timwi 13:44, 30 Mar 2004 (UTC)
 * Well, in a sense it is: it links to an image somewhere else. Formulae don't involve anything stored anywhere else. IMSoP 00:50, 31 Mar 2004 (UTC)
 * ... yes they do ;-) But I see your point. &mdash; Timwi 10:04, 31 Mar 2004 (UTC)
 * not obvious which markup refers to which feature
 * hard to memorize or even only to recognize - with more and more randomly assigned special characters we might end up with a syntax that is only comprehensible for Perl programmers Erik Zachte 00:52, 31 Mar 2004 (UTC)


 * difficult to add additional types of markup
 * not really. Admittedly, once we have hundreds of extensions, it might start to be difficult to remember which one is @# and which one is #$, but (1) I don't think we'll ever have that many extensions (HTML and Unicode do evolve too); (2) I don't think any single user uses more than a couple of them. &mdash; Timwi 10:04, 31 Mar 2004 (UTC)

IMSoP's proposal
<tt> ... </tt> <tt> ... </tt>

Advantages:
 * Retains most of the advantages of
 * Easier to add new extensions - i.e. <tt> </tt> cannot possibly clash with something other than an extension, unlike <tt> </tt>
 * Even more obvious to a reader that this is a block of special <tt>foo</tt> markup and not just gibberish

Disadvantages:
 * Harder/slower to type
 * Even more localisation needed, unless an international word for "special" can be found

= Discussion =

Should SVG be put in-line or linked to?
How does this one need any special syntax? You just use an ordinary image link and it should be converted to PNG unless the user chose otherwise in preferences. Taw 18:47, 28 Mar 2004 (UTC)
 * SVG should be editable to fix labels in diagrams, etc. How do we best accomplish that in your scheme?&mdash;Eloquence

By being able to edit the text at Image:foo.svg R3m0t 08:14, 29 Mar 2004 (UTC)
 * That text is the image description page. It should not be abused for source code.&mdash;Eloquence

SVG is not meant for editing by hand &mdash; just download it, open in some SVG editor, fix and reupload. Taw 01:56, 30 Mar 2004 (UTC)


 * SVGs in Wikipedia will frequently consist of both text and images. The images are certainly best edited with a graphics editing program only, but the texts can in many cases be edited as is, and we should provide an easy way to do so. SVGs are also typically rather small, and being able to quickly copy & paste them is an advantage, e.g. for translating SVGs between the Wikipedias.&mdash;Eloquence 02:35, 31 Mar 2004 (UTC)

There is text there describing the image. Also there are links to allow upload and downloads. Why not a link for editing? What problems would it cause? Leaving raw SVG in the page is nasty: very intimidating. MrJones 12:29, 30 Mar 2004 (UTC)

Keep it short
I think it's important to keep the markup short (as in 'just a few characters to type'), or at least to have a short alternative. If the markup is long, as in the XML-like proposals, people will tend to skip it because it becomes 'too much work'. This is especially true for 'inline' markup, where the markup itself can easily take more characters than the thing being marked up.


 * Markup usage time = time to find out which markup to use (tr) + time to actually use the markup (tu). XML-type names are highly mnemonic, thus, tr is very low, but take longer to type. It's the other way around with symbolic names like "[$", where it is difficult to remember which symbol represents which function. More importantly, it is very hard to learn this from reading the wikisource -- it just looks like line noise to the average reader.&mdash;Eloquence 16:15, 31 Mar 2004 (UTC)

Wider Discussion
This is two points really:
 * 1) If this decision is going to be final, and have such a wide-ranging effect, I (IMSoP) suggest that this page be publicised as widely as possible, so that users can give their opinions on which they would feel best using.
 * 2) Someone should try and incorporate as many as possible of the points already made on this topic at http://wikisophia.org/wiki/Rend if they aren't already covered.