Talk:Markup spec/DTD

The following discussion has been transferred from Meta-Wiki.
Any user names refer to users of that site, who are not necessarily users of MediaWiki.org (even if they share the same username).

There was a discussion on general aspects of what should be marked up in Wikipedia (names, places, dates/times...). Since the Wikipedia DTD is about an XML representation of the current syntax, I moved it to Talk:Simple ideology of Wikitax --Nichtich 01:36 Feb 7, 2003 (UTC)

CDATA

<![CDATA[do [not] parse <this>]]>

I've never been quite clear on how CDATA sections work. If my data includes a raw "]]>", how do I encode it? --Brion VIBBER 06:44 Jan 23, 2003 (UTC)

<![CDATA[just]]>]]><![CDATA[split]]>
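For what it's worth, the usual way to carry a literal "]]>" is to end the section between the two brackets and the ">", so the sequence never appears whole inside one section. A minimal Python sketch of that idea, assuming the standard library parser; the helper name `to_cdata` is made up for illustration:

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting each ']]>' across two adjacent sections."""
    # ']]>' becomes ']]' + end of section + start of a new section + '>'
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

raw = "do [not] parse <this> ]]> or that"
encoded = to_cdata(raw)
# The parser merges adjacent CDATA sections back into one text node.
decoded = ET.fromstring("<text>" + encoded + "</text>").text
```

Here `decoded` should come back identical to `raw`, which is a quick way to check the round trip.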

linking

linking to images / other media

<link system="wiki" href="image:Wiki.png"/>

Image links are functionally different from regular wiki links, as they embed images. It would be best to use a distinct tag. --Brion VIBBER 07:03 Jan 23, 2003 (UTC)

You're right. I suggest:

Link to the page of the image: [[:image:Wiki.png]] <link system="wiki" href="image:Wiki.png"/>

Embed image/media/file/...

[[image:Wiki.png]] <media href="image:Wiki.png"/>

I prefer media because we only embed media objects, and embed could mean something like "embed the content of another page". See also discussion on special pages below.

linking to other pages

<link system="wiki" href="Wikipedia FAQ"/> ... <url href="http://www.wikipedia.org"/> ... <mail to="webmaster@wikipedia.org"/>

These seem overcomplicated. Wouldn't it be simpler (in an XML way) to use the same tag for all links, and just have a wiki-specific URI? eg:

local wiki link: Main Page
- <link href="wiki:Main_Page">Main Page</link>
interwiki link: MeatBall:CommunityExpectation
- <link href="wiki://MeatBall/CommunityExpectation">MeatBall:CommunityExpectation</link>
interlanguage link: [[eo:DTD de Vikipedio]]
- <link href="wiki://EsperantoWikipedia/DTD_de_Vikipedio" rel="language" lang="eo" />
remote non-wiki link: Slashdot
- <link href="http://slashdot.org/">Slashdot</link>
ISBN: ISBN 0-201-89683-4
- <link href="isbn:0201896834">ISBN 0-201-89683-4</link>

Upon (possible) reconversion to wiki syntax, the parser could use the most efficient form of representation available in that particular wiki syntax for that type of link.
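The reconversion step could be sketched roughly like this in Python. It is only an illustration under stated assumptions: the function name `link_to_wikitext` is hypothetical, the `wiki:` and `isbn:` schemes are the ones proposed in the list above, and the choice of output form per scheme is a guess at what "most efficient" would mean.

```python
import xml.etree.ElementTree as ET

def link_to_wikitext(fragment: str) -> str:
    """Turn one <link> element back into wiki syntax, choosing the form by URI scheme."""
    link = ET.fromstring(fragment)
    href = link.get("href", "")
    label = (link.text or "").strip()
    scheme, _, rest = href.partition(":")

    if scheme == "wiki" and rest.startswith("//"):
        # wiki://Prefix/Page -> interwiki link Prefix:Page
        prefix, _, page = rest[2:].partition("/")
        if link.get("rel") == "language":
            # Interlanguage links use the language code as the prefix.
            prefix = link.get("lang", prefix)
        target = f"{prefix}:{page.replace('_', ' ')}"
    elif scheme == "wiki":
        # wiki:Page -> local link
        target = rest.replace("_", " ")
    elif scheme == "isbn":
        # ISBNs are written as plain text; keep the author's hyphenation if present.
        return label or f"ISBN {rest}"
    else:
        # Anything else is treated as an external URL.
        return f"[{href} {label}]" if label else href

    if label and label != target:
        return f"[[{target}|{label}]]"
    return f"[[{target}]]"
```

For example, `link_to_wikitext('<link href="wiki:Main_Page">Main Page</link>')` gives `[[Main Page]]`, since the label adds nothing to the target.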

No redundancy please
An XML syntax should code information in tags and attributes. Parsing strings is ugly and less efficient.
The difference between interwiki links and local wiki links depends on the application. Try to edit test:baz: right now it's a valid page name, but maybe there will be a "test" wiki in the future.
Interlanguage links are a special topic. We could use a special tag:
```
<interlanguage href="eo:DTD_de_Vikipedio"/>
```
How about link system="url" for external links instead of url and email?

--Nichtich 22:34 Feb 2, 2003 (UTC)

It could be useful to code

User:Foo => <link system="wiki" space="user" href="Foo"/>
Talk:Bar => <link system="wiki" space="talk" href="Bar"/>

and in other languages

Benutzer:Foo => <link system="wiki" space="user" href="Foo"/>
Diskussion:Bar => <link system="wiki" space="talk" href="Bar"/>

But how to handle a page like:

Talk:User:Foo

Also possible (for instance in the German Wikipedia):

Diskussion:Talk:User:Foo

--Nichtich 21:59 Feb 2, 2003 (UTC)
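To make the question concrete, here is a small Python sketch of splitting a title into space and href. The namespace table and the name "main" for the article namespace are assumptions made up for illustration; only the first recognized prefix is peeled off, which is exactly where Talk:User:Foo gets awkward.

```python
from typing import Tuple

# Hypothetical per-language table: localized prefix -> canonical space name.
NAMESPACES = {
    "en": {"User": "user", "Talk": "talk"},
    "de": {"Benutzer": "user", "Diskussion": "talk"},
}

def split_title(title: str, lang: str) -> Tuple[str, str]:
    """Return (space, href); only the first recognized prefix is peeled off."""
    prefix, sep, rest = title.partition(":")
    space = NAMESPACES.get(lang, {}).get(prefix)
    if sep and space:
        return space, rest
    # "main" is an assumed name for the article namespace.
    return "main", title
```

With this, `split_title("Talk:User:Foo", "en")` yields `("talk", "User:Foo")`, so the second prefix ends up inside href as a plain string again. A single space attribute does not seem to be enough for such pages.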

linking to other languages

IMHO the interlanguage-link would be better like

<interlanguage lang="eo" href="DTD_de_Vikipedio"/>

That way you give more information without the need of parsing the content of href. Lothar Kimmeringer 02:37, 15 Sep 2003 (UTC)

I think we should abstract a bit:

<alternate hreflang="eo">DTD de Vikipedia</alternate>

Yes, I know, this seems a little bit like overkill, but we should look at the future. -- Eckhart Wörner 20:20, 29 Aug 2004 (UTC)

notes on paragraphs

Manual paragraphs with the p tag are pretty ugly to handle. Try:

<p>Hi! This is a paragraph

with an empty line in it.</p>

You get:

<p>Hi! This is a paragraph
<p>
with an empty line in it.</p>

but the valid syntax is

<p>Hi! This is a paragraph</p>
<p>with an empty line in it.</p>

Why can't we just remove all invalid HTML-Tags? :-(

Currently,

<p>A paragraph with

a blank line.</p>

actually produces

<p>A paragraph with 
a blank line.</p>

Using a <p> tag essentially disables the special meaning of a blank line (but not that of a newline).
Typeterson 17:12, 22 Jun 2005 (UTC)
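For comparison, producing the valid form (one closed p element per block) from plain text is simple if the converter owns the paragraph splitting. A minimal Python sketch, assuming the input contains no manual tags; the function name `paragraphs_to_xml` is hypothetical:

```python
import re
from xml.sax.saxutils import escape

def paragraphs_to_xml(text: str) -> str:
    """Emit one closed <p> element per blank-line-separated block."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Escape &, < and > so stray markup cannot break the output.
    return "\n".join(f"<p>{escape(b.strip())}</p>" for b in blocks if b.strip())

result = paragraphs_to_xml("Hi! This is a paragraph\n\nwith an empty line in it.")
```

The hard part is not this step but deciding what to do when the author has already typed p tags by hand.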

DTD in general

How about a definition of the term DTD right up front as in this external link:
http://www.hyperdictionary.com/dictionary/Document+Type+Definition

Need a rationale for creating a custom DTD

Why would we create and support a fit-to-purpose DTD, when everything we use in MediaWiki is already available in DocBook XML or even Simplified DocBook?

There are many excellent tools for converting DocBook XML to various other document formats -- HTML, RTF, PostScript and PDF. It's a well-accepted standard for document markup, and it would thus be useful for readers and for downstream publishers.

Creating a custom DTD would mean that we'd have to create processors from scratch. If DocBook (or another existing XML document format) meets our needs, what's the point? --Evan 00:43, 16 Dec 2003 (UTC)

Hmm. Not a lot of follow-up on this. Is this a ridiculous question to ask? Are people not familiar with DocBook? Is this a dead proposal? --Evan 18:43, 27 Dec 2003 (UTC)

There's never been a lot of motion on this XML stuff. Unless one of the interested parties is going to write some code, it's likely to stay that way. --Brion VIBBER 22:39, 27 Dec 2003 (UTC)

DocBook for exporting sounds good (MediaWiki articles => DocBook => ...), but not every DocBook article will fit in MediaWiki. Up to now you cannot validate the syntax of Wikipedia articles. As I mentioned before, we lack a real (= syntax-tree-generating) parser for wiki markup, so evaluating complex requests needs complex, unreliable regular expressions. In XML you can navigate through articles using XPath. But Brion is right - shame on me, as I have not yet written a parser nor contributed to the MediaWiki source in any other way ;-) -- Nichtich 18:13, 12 Jan 2004 (UTC)
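As a rough illustration of the XPath point, here is a Python sketch using the standard library, which supports a limited XPath subset. The sample article is invented; its link and media elements follow the proposals earlier on this page, and the p wrapper is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented sample document using the element names discussed above.
ARTICLE = """
<article>
  <text>
    <p>See <link system="wiki" href="Wikipedia FAQ"/> or
       <link system="url" href="http://www.wikipedia.org"/>.</p>
    <p><media href="image:Wiki.png"/></p>
  </text>
</article>
"""

root = ET.fromstring(ARTICLE)
# Attribute predicates are part of the XPath subset ElementTree understands.
wiki_links = [e.get("href") for e in root.findall(".//link[@system='wiki']")]
media = [e.get("href") for e in root.findall(".//media")]
```

Collecting all wiki links or all embedded media is one expression each here, where the wikitext equivalent would need a regular expression that knows about nowiki, comments and nesting.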

IMHO you need the Wiki XML format. You will want it at the very least as an easier to parse intermediate format for converting into DocBook later. It is non trivial to map Wiki markup data into DocBook. The same thing is done in most compilers. You use an intermediate language before converting to the target language. It looks like the first step to take is to build a decent Wiki markup parser, then create a Wiki markup to Wiki XML compiler, then a Wiki XML to DocBook compiler. It may seem more complicated this way, but the problem gets easier to solve if broken into pieces. Note: I think this should all be written in PHP. Vasc 3:12, 9 May 2004 (UTC)

Why a DTD?

A DTD seems like a pretty clumsy way of formally specifying syntax. Why not use the standard (and, as an added bonus, more elegant) method of formally representing syntax and semantics and write a context-free grammar for wikitext? That could be used with parser generators and an abstract syntax definition to build an abstract syntax tree, which can then be written out as XML or whatever you'd like. --Delirium 03:14, 25 May 2004 (UTC)

A DTD is not a replacement for a context-free grammar. For example, it would be nice if we had a context-free grammar to transform wiki markup to this DTD, but wiki markup is intended to be understandable for humans, not for computers. Look at the MediaWiki function for quotes (') -- something like these quotes embedded in italic text is easy to understand for humans, but not for computers. -- Eckhart Wörner 15:14, 29 Aug 2004 (UTC)

Example article

I'm not sure if I understand the XML definitions. Would the following text be a valid and complete article? If not, please correct!

<?xml version="1.0" encoding="iso-8859-1"?>
<article>
    <meta>
        <!-- article title and which wiki -->
        <title interwiki="en" article="Jabberwocky" />
        <!-- interwiki links -->
        <interwiki language="de" article="Jabberwocky" />
        <interwiki language="es" article="Jabberwocky" />
    </meta>
    <text>
        `Twas brillig, and the slithy toves
        Did gyre and gimble in the wabe:
        All mimsy were the <link interwiki="en" article="Borogove">borogoves</link>,
        And the mome raths outgrabe.
    </text>
</article>

-- Stw 21:22, 19 Mar 2004 (UTC)

Namespaces, interwikis and article titles should be split -- Nichtich 16:14, 31 Mar 2004 (UTC)

Completely different approach

Well, I see some problems with this DTD. It is too fixed on the current MediaWiki software with its cur and old tables.

<article>
  <meta>
    <history>
      <edit>
        <text>
          <!-- This contains the text of a previous version -->
        </text>
      </edit>
    </history>
  </meta>
  <text>
    <!-- This contains the text of the current version -->
  </text>
</article>

This makes the previous version look less valuable than the current version, which is a wrong assumption in a wiki. It somehow presents the current version as the article, and the rest as, well, some odd FDL stuff... It can be even worse: imagine the <history /> of the article being filled with some previous versions. Now these previous versions also have a <meta /> element containing a <history />, which means that the history could be repeated many times - in fact, as often as there are versions in the history.

In my opinion the history should be the leading element in an article:

 <article>
   <head>
         <!-- This contains meta information that is version-independent,
                      such as the title and some status information               -->
   </head>
   <versions>
     <version>
       <meta>
         <!-- This contains meta information that is version-dependent,
                      such as the interwiki links, but also timestamp and author  -->
       </meta>
       <text>
         <!-- This contains the text of the version (or a redirect, or
                      WikiTax, or it is even left out)                            -->
       </text>
     </version>
   </versions>
 </article>

Advantages:

With the same DTD, you can...
- get the current version of an article (one version)
- get each other version of an article (one version)
- get each version of an article together with a history (all versions, but only one with text)
- get a history list of an article (all versions, but no text)
- get a complete dump of an article (all versions, all text)

Disadvantages:

The current article version is not as clear as before; there has to be a convention for which version comes first and which comes last - there are two possibilities.

Eckhart Wörner 16:17, 29 Aug 2004 (UTC)
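A small Python sketch of how a consumer might read such a dump. The sample data is invented, and the timestamp and author child elements inside meta are assumptions, since the proposal only says that this information lives there. Picking the current version by timestamp would sidestep the ordering convention mentioned under the disadvantages.

```python
import xml.etree.ElementTree as ET

# Invented dump; <timestamp> and <author> inside <meta> are assumed element names.
DUMP = """
<article>
  <head><title>Example</title></head>
  <versions>
    <version>
      <meta><timestamp>2004-01-01T10:00:00Z</timestamp><author>ExampleUserA</author></meta>
      <text>first draft</text>
    </version>
    <version>
      <meta><timestamp>2004-02-01T12:30:00Z</timestamp><author>ExampleUserB</author></meta>
      <text>revised text</text>
    </version>
  </versions>
</article>
"""

def history_list(article):
    """All versions, metadata only (no text)."""
    return [(v.findtext("meta/timestamp"), v.findtext("meta/author"))
            for v in article.findall("versions/version")]

def current_version(article):
    """Newest version by timestamp, independent of document order."""
    # ISO 8601 timestamps in the same time zone sort correctly as strings.
    return max(article.findall("versions/version"),
               key=lambda v: v.findtext("meta/timestamp"))

article = ET.fromstring(DUMP)
latest_text = current_version(article).findtext("text")
```

The same two functions work whether the dump holds one version or all of them, which is the advantage claimed above.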

Numbered headlines must die!

Don't copy HTML together with its ERRORS! The lesson has been learnt that numbered headings are a poor solution and that <section>+<h> should be used instead (this allows easier document inclusion and styling, and adds real sectioning). See the XHTML2 specs for details.
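Going from numbered headings to real sections is mechanical, which suggests nothing is lost by storing only the nested form. A Python sketch under stated assumptions: the input is a flat list of (level, title) pairs with levels starting at 1, the function name `nest_headings` is hypothetical, and body is just an assumed wrapper element.

```python
import xml.etree.ElementTree as ET

def nest_headings(headings):
    """Build nested <section><h>...</h></section> elements from (level, title) pairs."""
    root = ET.Element("body")
    stack = [(0, root)]  # (heading level, element that owns deeper sections)
    for level, title in headings:
        # Leave every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section")
        ET.SubElement(section, "h").text = title
        stack.append((level, section))
    return root

tree = nest_headings([(1, "Linking"), (2, "Images"), (2, "Other pages"), (1, "Paragraphs")])
serialized = ET.tostring(tree, encoding="unicode")
```

The heading level is then implied by nesting depth, so an included document keeps a consistent outline wherever it is inserted.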

End of content from meta.wikimedia.org.
Note that the above conversation may have been edited or added to since the transfer. If in doubt, check the edit history.