Machine-friendly wiki interface

For a client-side reader/editor and legitimate bots, it would be useful to be able to bypass some of the variability of the human-oriented web interface.


 * Retrieve raw wikicode source of a page without parsing the edit page
 * ex http://www.wikipedia.org/wiki/Foobar?action=raw
 * Should we be able to get some meta-data along with that -- revision date, name, etc.? Or all separately?
 * How best to deal with old revisions? The 'oldid' as at present, or something potentially more robust? A revision timestamp should be unique, but may not always be: timestamps have only one-second resolution, and a bug in February '02 wiped out some old timestamps, leaving multiple revisions with the same time.
 * At some future point, preferred URLs may change and UTF-8 may be used more widely; a client should be able to handle 301 & 302 redirects, and the charset specified in the Content-Type header. If your bot won't handle UTF-8, it should explicitly say so in an Accept-Charset header so the server can treat you like a broken web browser and work around it.
 * An auxiliary XML interface to pages for common reference combinations, so that server load is drastically reduced by fetching only the relevant pages, and so that indices can be prepared combining all references to:
 * a location in terrestrial spacetime
 * specific people, by any of their titles, names, or group affiliations
 * eco-regions, which are the basis of biology and climate information, and the most reasonable objective basis for geography
 * Fuller RDF-based Recentchanges
 * Also page history and incoming/outgoing links lists? Watchlist?
 * A cleaner save interface and login?
 * Look into wasabii (web application standard API [for] bi-directional information interchange). It's meant as a general API for CMSes, weblogs, etc.  The spec may be rich enough for it to work with Wikipedia.  The plus side of supporting wasabii is that any wasabii-compliant end-user application should be able to interface with Wikipedia.
 * In the blog world, at least, wasabii seems to be positioning itself as the next generation standard API (replacing bloggerAPI as the popular interface), which means lots of end-user applications will be created. All we'd have to do is support wasabii at some URL and we'd automatically inherit crap loads of functionality.
 * How compatible would this be with the simple ideology of Wikitax? Or Wikitax as presently constituted?
 * The specs at the site aren't very clear. Are there any implementations you can point to that would give a better idea of how it actually would operate? (The mailing list drops off in September, with people saying that it's too bad there are no implementations, so no one's really sure how it works, so there are no implementations...) Additionally, it's not clear how the recursive node model maps onto a wiki (is a title a parent node, and old versions subnodes? Or are new versions subnodes? Or...???) How would categories, schemes, and taxonomies map to languages/sections and namespaces? --Brion VIBBER
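As a sketch of what a client for the raw-source idea above might look like: the following Python snippet builds the `?action=raw` URL from the example and picks the charset out of a Content-Type header, per the note about honouring whatever charset the server declares. The URL pattern and fallback charset are assumptions from this page, not a documented API; nothing here contacts a server.

```python
from urllib.parse import quote

BASE = "http://www.wikipedia.org/wiki"  # base URL from the example above

def raw_url(title):
    """Build the hypothetical ?action=raw URL for a page title."""
    return "%s/%s?action=raw" % (BASE, quote(title.replace(" ", "_")))

def charset_of(content_type, default="iso-8859-1"):
    """Pick the charset out of a Content-Type header value, falling back
    to a default when the server doesn't declare one (the 'broken web
    browser' case mentioned above)."""
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset":
            return value.strip('"').lower()
    return default
```

A real client would hand `raw_url()` to any HTTP library that follows 301/302 redirects, then decode the body with `charset_of()` applied to the response's Content-Type header.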

Comments, suggestions?

I would like to see a simple HTTP function that allows me to retrieve JUST the article HTML, or the raw wiki edit format, using a URL such as /wiki/Article_name?agentType=bot&output=wiki|HTML

142.179.66.22 19:13 Feb 2, 2003 (UTC)
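A minimal sketch of assembling the query string suggested in that comment, using Python's standard library. The agentType/output parameters are hypothetical (nothing serves them); this only shows what such URLs would look like.

```python
from urllib.parse import urlencode

def bot_url(article, output):
    """Build the suggested bot URL; output would be 'wiki' or 'HTML'."""
    return "/wiki/%s?%s" % (article, urlencode({"agentType": "bot",
                                                "output": output}))
```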

OK, I just moved this here from the Wikipedia client page. Here's what Brion VIBBER and I said (and I see wasabii uses XML-RPC, so it still kinda fits):

Oh dear god, not XML-RPC! I agree that having to parse out the table of contents and extra links and so forth is bad, but there are ways to do what we need via plain old HTTP. For instance, to get the raw data for a page, you might use a URL like "http://www.wikipedia.org/wiki/some_article?outfmt=raw". It would also be nice to be able to get the page as HTML, but without the extra links, like so: "http://www.wikipedia.org/wiki/some_article?outfmt=html". - TOGoS


 * I agree that XML-RPC (or SOAP or WebDAV) feels like overkill. A simple XML return format for things like recentchanges, history, and metadata (author, revision date) might be useful, but plain old HTTP should provide all the calling semantics we need. --Brion VIBBER 01:34 Dec 11, 2002 (UTC)

You're right. I like to complain about XML (I think XML-RPC is especially over-hyped), but it has its place. Anyway... what I'm really thinking is that the interface between the server and the client should be as simple and intuitive as possible. For instance (to expand on what I already said), to update a page, you would just do a PUT or POST request to 'http://www.wikipedia.org/wiki/article_name' with the new raw wiki data as the body of the request. Now, with the web interface, you have to post data to some weird, non-intuitive URL. I know the client would take care of this (and I actually don't really mind it for the web interface), but having a simple, intuitive, not-bound-to-the-implementation API is a good thing in general. So I think it would be worth it in the long run to do a little extra work to get the server to understand a more intuitive API.

(And sorry I hate XML-RPC so much. I know it's probably the inevitable over-bloated protocol of the future. Everyone else seems to love it.) - TOGoS
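Under TOGoS's proposal, saving a page would be an ordinary PUT of the raw wikitext to the article URL itself. A sketch with Python's standard library; the endpoint is hypothetical (no server accepts this today), so the request is only constructed, never sent:

```python
import urllib.request

def save_request(title, wikitext, base="http://www.wikipedia.org/wiki"):
    """Build (but don't send) a PUT of raw wikitext to the article URL,
    as proposed above."""
    url = "%s/%s" % (base, title.replace(" ", "_"))
    return urllib.request.Request(
        url,
        data=wikitext.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
```

Passing the result to `urllib.request.urlopen()` would perform the actual save, if such an endpoint existed.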

I loaded a tarball into my MySQL server and wrote a few lines of PHP... Try these:


 * http://www.mac-kenzie.net/wikipedia/raw.php?article=Architecture
 * http://www.mac-kenzie.net/wikipedia/raw.php?article=Architecture&xml=true


 * these links seem to be dead :(

eco-regions which are the basis of biology and climate information, and the most reasonable objective basis for geography

Boy, do I understand little of all this, but I guess I will need some of it for this project:
 * w:Wikipedia:WikiProject Ecoregions -- anthere

XML-RPC is easily abstractable, and most programming languages support it. At least it is much less bloated than SOAP. However, I think it's more important to get a working implementation fast than to struggle to implement a half-finished architecture.
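For comparison, the wire format XML-RPC would add on top of such a call is easy to generate from most languages' standard libraries. A Python sketch; the method name wiki.getPage is made up for illustration, and no server is contacted:

```python
import xmlrpc.client

# Serialize a hypothetical 'wiki.getPage' call; this is just the XML
# payload an XML-RPC client would POST to the server's endpoint.
payload = xmlrpc.client.dumps(("Foobar",), methodname="wiki.getPage")
```

The resulting `<methodCall>` envelope around a single string argument gives a feel for the overhead being debated above: modest next to SOAP, but still more machinery than a plain `?action=raw` GET.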