User:Dantman/Abstract query language

From mediawiki.org

We have no high-level query language. As a result any time you want to do queries of any level you are exposed to database internals. This means that if we have a reason to change the database schema in a way that changes old assumptions we can go and break a lot of code easily. Basically we're missing the "Model" in MVC in 90% of situations (only trivially simple cases and non-internal things using the API have any Model).

Advantages:

  • If the database structure changes, the generated sql for an abstract query will be different, the abstract query will not need to be updated to fit the new db structure as SQL would.
    • This flexibility would actually flex out even to extreme database changes. A user query could still be functional even if we dropped the entire user table, the rev_user_text field, and replaced them with a usermap table that maps anon ips and registered users to ids.
  • Because queries are written more abstractly individual db engines could theoretically set rules for certain types of queries to optimize them for that specific DB engine.
    • We should theoretically be able to keep our top MySQL performance.
    • We might also get better performance in other dbs like pgsql.

Syntax[edit]

// Select category name and page counts
$q = Query::select('string $name, int $pages')
	// Get data from our list of categories
	->from('Categories $cat')
	// Limit to non-empty categories
	->where('$cat->pages > 0')
	// Extract certain pieces of information from the category we selected
	->where('{$name, $pages} = $cat')
foreach ( $q->run() as $cat ) {
	$cat->title;
	$cat->pages;
}

// Select a Title
$q = Query::selectOnly('Title $to')
	// Get data from all existing pages
	->from('Pages::all $to')
	// cross the query with page links
	->from('Pagelinks $link')
		// Filter down to pagelinks that come from a specific title, creating a JOIN ON in the SQL
		->where('$link->from == $from')
		// Restrict the links to a specific namespace
		->where('$link->to->namespace == "mw.template"')
		// The title the link points to is the title we want to select
		->where('$to = $link->to')
	// Set $from to a specific page we have a Title instance for
	->with('Page $from', $disPageObj)
foreach ( $q->run() as $title ) {
	$title->getText();
}

// Query for distinct user objects
$q = Query::distinct('User $user')
	// Query from the list of all revisions
	->from('Revisions $rev')
	// Restrict the query to a specific page
		->where('$rev->page == $page')
		// The user we want to return is the revision's author
		->where('$user = $rev->user')
		// Order our query by the most recent timestamp value for each user
		->orderBy('MAX($rev->timestamp)')
	->with('Page $page', $this->getTitle());
foreach ( $q->run() as $user ) {
	$user->getName();
}

A syntax like this is both powerful enough and separated enough to be capable of doing big queries like one of the ones we do in SpecialAllpages::showChunk.

// Select pages from the list of all existing pages
$q = Query::select('Page $page')
	->from('Pages::all $page')
	// Filter by namespace
	->where( '$page->title->namespace == $ns')
		->with('Namespace $ns', $namespace)
	// Start at a specific key
	->where( '$page->title->text >= $fromKey' )
		->with('string $fromKey', $fromKey);
	// Order by title text
	->orderBy('$page->title->text')
	// Limit to a specific number of pages
	->limit($this->maxPerPage);
	// Trigger a special mode where we query limit+1
	// items so we can tell the result if there are
	// more items on a later page
	->withPageEnd();

if ( $hideredirects ) {
	// Filter out redirects
	$q->where('$page->isRedirect == false');
}

if ( $toKey !== "" ) {
	// End at a specific key
	$q->where('$page->title->text <= $toKey')
		->with('string $toKey', $toKey);
}

// Finally run the query
$res = $q->run();

See also[edit]