User:Ebernhardson/mysql deprecated

Abstract
MediaWiki's database abstraction is starting to show its age. It depends on the mysql_* functionality which is deprecated as of PHP 5.5, released June 19, 2013. Deprecated does not mean removed, MediaWiki can conceivably run on mysql_* for at least a few more years. In those next few years we will conceivably develop a new database abstraction layer.

Within the PHP community at large development processes have changed. Projects like the Framework Interoperability Group[1], tools like composer[2] and websites like packagist.org are encouraging modern php development to share code between projects and develop common solutions to common problems.

MediaWiki, and the php community at large, can directly benefit from us using and contributing back to the php ecosphere rather than living on our own little island. If we decide to disregard the existing projects within the php ecosphere we should strive to develop an independent abstraction layer that will be used by many php projects and not just mediawiki. Locking all that knowledge up into a single project is a waste of technical talent.

[1] http://www.php-fig.org

[2] http://www.getcomposer.org

From the deprecation RFC
Moving away from ext/mysql is not only about security but also about having access to all features of the MySQL database.

ext/mysql was built for MySQL 3.23 and only got very few additions since then while mostly keeping compatibility with this old version which makes the code a bit harder to maintain. From top of my head missing features not support be ext/mysql include:
 * Stored Procedures (can't handle multiple result sets)
 * Prepared Statements
 * Encryption (SSL)
 * Compression
 * Full Charset support

So moving away from ext/mysql is a good thing.

What does the current abstraction do for us
To know if any existing solutions can meet our use case we must first look at our current abstraction and see what it does for us, and what it has failed to do for us.

Random quotes about the database layer from wikitech-l

 * You should think of Database.php as a query-building module rather than a DB abstraction layer. -- Tim Starling


 * As far as I'm concerned, DB abstraction is a courtesy to low-traffic non-Wikimedia users. The queries we use, and the schema, should be optimised for MySQL. -- Tim Starling


 * While more abstraction never hurts for testing, the abstraction that we currently have is sufficient. -- Christian Aistleitner


 * Well, having pgsql and all the other BOOL-supporting DBMSes forced to use integer types just because devs are too lazy to make sure things work properly on all databases is exactly the sort of thing the abstraction layer is meant to prevent. Each database should be able to properly use its own features. -- Aryeh Gregor


 * In my opinion that would be a bit horrifyingly scary. ;) -- Brian Vibber ( in response to the question: Would it make sense for the database abstraction layer to know our schema, so that it can automatically cast things to bool before handing them down to the actual database )


 * Incremental improvements to our current way of doing things (cutting back on raw queries, moving MySQL-specific stuff from Database to DatabaseMySql, defining more clearly what Database methods mean and avoiding undefined behavior) seem entirely sufficient to allow support for any number of additional database backends. -- Aryeh Gregor


 * Maintenance of non-MySQL DBMS code has been a problem in the past -- Tim Starling

Random other quotes from wikitech-l

 * could avoid a lot of work for the entire development team writing, testing and debugging functionality readily available in an Open Source library. We all benefit from a better product that comes from sharing our ideas. -- Evan Prodromou

Benefits

 * Small. A review of the mailing list archives indicates the lightweight nature of the current abstraction is a favorable aspect to some developers.


 * Treats all databases other than MariaDB as second hand citizens in the name of performance.


 * Safer quoting, avoidance of SQL injection attacks. The current abstraction covers perhaps 90% of use cases where a user provided variable needs to be used in a query.


 * Table prefix support


 * Filtering or emulation of unsupported options to queries such as DELAYED


 * Emulation of MySQL-specific queries such as REPLACE


 * Table name quoting


 * Simple access to `foo_col` IN (x1,x2...xN)


 * Real prepared statements involve two round trips to the database, We have short living connections, and no statements are repeated, so the actual prepared query will not be reused.


 * DBO_TRX - Some sort of automated transaction handling


 * Integrated profiling (sort of)


 * Load Balancing implementation which handles many possible edge cases


 * ORMTable impelmentation which is NOT an ORM

Failings

 * History shows us that the requirement to manually escape queries (even if only some queries) can lead to mistakes causing security issues. The php community at large has mostly settled on the solution of bound parameters.  It is quite easy to see when reviewing a query using bound parameters if it will properly protect from malicious user input.  MediaWiki core contains 372 references to DatabaseBase::addQuotes, and extensions adds 425 more for just shy of 800 total.


 * https://www.mediawiki.org/wiki/Bug_management/Triage/Databases_20111102


 * Uses mysql_*, which is not a failure except that its going away


 * Treats all databases other than MariaDB as second hand citizens in the name of performance.

Size
phploc 1.7.4 by Sebastian Bergmann.

Lines of Code (LOC):                             15367 Cyclomatic Complexity / Lines of Code:          0.14 <-- excessively complex Comment Lines of Code (CLOC):                     6638 Non-Comment Lines of Code (NCLOC):                8729

Namespaces:                                          0 Interfaces:                                          7 Traits:                                              0 Classes:                                            41 Abstract:                                          2 (4.88%) Concrete:                                         39 (95.12%) Average Class Length (NCLOC):                    238 Methods:                                           886 Scope: Non-Static:                                    869 (98.08%) Static:                                         17 (1.92%) Visibility: Public:                                        794 (89.62%) Non-Public:                                     92 (10.38%) Average Method Length (NCLOC):                    11 Cyclomatic Complexity / Number of Methods:      2.41 Anonymous Functions:                                 0 Functions:                                           0

Constants:                                           3 Global constants:                                  0 Class constants:                                   3

Options existing within the PHP Community
Over the last three years the php community has moved to a development methodology that involves sharing independant pieces of code between multiple projects rather than going full force in NIH mode. Here we should at least review what options the rest of the php community is using: