User:Dantman/Abstract query language/old
From MediaWiki.org
- Return a list of user objects
- Query from the list of revisions
- ?page is an unassigned variable that must be externally defined, expected to be a page, ie Title (the mapping of Revision:page to a page/Title is built into MediaWiki).
- The page of a revision is defined by ?page
- ?rev:user is ?u, so the User that is returned is extracted from there
- Order by the largest timestamp of the revisions for the distinct user.
DISTINCT { USER ?u } FROM { Revision ?rev } WHERE { ?rev:page IS ?page, ?rev:USER IS ?u } ORDER { MAX(?rev:TIMESTAMP) }
Taking into account the database structure which would be described inside MediaWiki the query would translate about into this sql currently.
SELECT USER.*, rev_user_text AS user_name, MAX(rev_timestamp) AS TIMESTAMP FROM revision LEFT JOIN USER ON (rev_user = user_id) WHERE (rev_page = 1) GROUP BY rev_user, rev_user_text ORDER BY TIMESTAMP DESC
The query however when looped over would return objects where ->u would contain a User instance.
The key difference of course is:
- If the database structure changes, the generated sql will be different, the abstract query will not need to be updated to fit the new db structure as SQL would.
- This flexibility would actually flex out even to extreme database changes. That query could still be functional even if we dropped the entire user table, the rev_user_text field, and replaced them with a usermap table that maps anon ips and registered users to ids.
- Because the query is written more abstractly individual db engines could theoretically set rules for certain types of queries to optimize them for that specific DB engine.
- We should theoretically be able to keep our top MySQL performance.
- We might also get better performance in other dbs like pgsql.
PHP Api [edit]
There's a little undecided on the php api, there are two possibilities:
// Abstract query $q->query(" DISTINCT { User ?u } FROM { Revision ?rev } WHERE { ?rev:page IS ?page, ?rev:user IS ?u } ORDER { MAX(?rev:timestamp) }", array( 'page' => $this->getTitle() ) ); // Abstract query (php syntaxed?) $q->query(array( "distinct" => "User ?u", "from" => "Revision ?rev" "where" => array( "?rev:page" => "?page", "?rev:user" => "?u" ), "order" => "MAX(?rev:timestamp)" ), array( 'page' => $this->getTitle() ) );
The advantage of the PHP syntax is that for a portion of the syntax mistakes php will throw a syntax error, however the abstract string dsl is more flexible for things like != and other comparisons.
We're beginning to target HipHop as a primary target, and we have the option in cases where performance is key to use a php extension, so performance wise a string based dsl shouldn't be a prooblem.