Requests for comment/image and oldimage tables

Problems

 * 1) Database table  has no auto-incrementing primary key.
 * 2) Database table  has no primary key, at all.
 * 3) File revisions as a concept (within the code and API) should have unique identifier. Right now we only have "current file title + upload timestamp". The lack of a unique id for file revisions leads to race conditions, makes the table hard to index and hard to query.
 * 4) Uploading file revisions should not involve moving rows across tables, or replacing entire rows.

Use cases

 * End-user improvements
 * T139294: Persistent media links for files - Right now we only have permalinks for non-current file revisions. The current file revision is only accessible by the canonical url for the file. This in contrast to page revisions, which always have a stable revision id.
 * Enable creation of secondary database tables that associate data with files and/or their revisions (e.g. T33257: File properties and License integration). This is not feasible with the current schema due to a lack of primary keys.
 * API Improvements
 * prop=imageinfo - Use a proper file revision ID instead of fragile timestamp as identifier.
 * Technical debt
 * LocalFile::recordUpload2 - Eliminate timestamp kludge.
 * Eliminate the need to query two tables when needing to find a file revision.
 * Eliminate the need to move rows across tables when uploading a file. This is an anti-pattern in relational databases and makes certain improvements to database stability and performance hard or impossible (as found at Wikimedia Foundation).
 * Stability: Having a separate file and filerevision table would solve data inconsistency issues. A request will resolve the pointer once and all other queries will fetch information about the same revision. It also makes it more attractive to query a replica instead of the master.

1. Add primary keys
Rejected. This solution would be relatively easy to implement, but does not solve Problem 4.

Advantages

 * Eliminates the legacy handling of needing to interact with two tables for most operations (image and oldimage).


 * Aligns the file tables with the design of the and  tables.
 * Creates proper semantic separation between file entities and the representation of their revisions.

Effective change

 * Add fields:
 * img_id: New primary key for.
 * img_latest: Pointer to  for the current revision of a file (similar to )
 * oi_id: Primary key of.
 * oi_img: Pointer to  of the file this revision corresponds to (similar to )
 * Rename tables:
 * Rename  to.
 * Rename  to.
 * Add missing rows in  for current revisions, based on rows from.
 * Reduce fields in  to only those needed for current revisions only.
 * Currently indexed:  , ,  ,  ,  ,  ,.
 * img_timestamp: Remove. Used for contribution history and creation of archive name. Replaced by filerevision.
 * img_user_text: Remove. Used for contribution history, ApiQueryAllImages, SpecialMIMEsearch. Query from  instead (using a join, if needed).
 * img_sha1: Keep. Used for duplication detection for current revisions.
 * img_media_type, img_major_mime, img_minor_mime: Keep. Used by Special:MIMESearch for current revisions. A separate initiative may start after this RFC is approved and implemented to consider removing this in favour of a SearchEngine-based approach, and/or to find a way to efficiently from  instead.
 * img_size: Remove. Not indexed. Used by checkImages.php (error check) and Special:MediaStatistics (SUM query). Can be queried by joining against  instead.
 * img_width, img_height: Remove. Not indexed. Only store in  instead.
 * img_bits: Remove. Not indexed. Not queried. Only store in  instead.
 * img_description: Remove. Not indexed. Only store in  instead. Can be queried there if needed.
 * img_user: Remove. Only store in  instead. Can be queried there if needed. Used by ApiQueryAllImages and NewFilesPager (SpecialNewimages) to join against   for filtering bots. Can join against filerevision instead. Though should probably use recentchanges.rc_bot instead.

Migration strategy
Exact schema migration script to be written as part of the implementation and fine-tuned as needed during code review. We may need two separate strategies due to the size of the migration (one for the default db updater, and an opt-in maintenance script for large farms such as Wikimedia's). A few ideas so far:
 * Tim Starling comment #2747454:
 * Rename  to  . Create a view called.
 * Add new fields to.
 * Create  table.
 * (Disable uploads.)
 * Populate  from.
 * Also move  rows into  . (Not be visible from the   view)
 * (Deploy new MediaWiki version that uses file/filerevision.)
 * (Re-enable uploads)
 * Drop image and oldimage.
 * Jaime Crespo mentioned the idea of potentially doing the migration offline while serving traffic from codfw instead of eqiad.

3. Replace with MCR
Migrate all file information to page and revision tables and migrate meta data and content references by using Multi-Content Revisions per T107595.

We could decide to stop using the image/file tables entirely, in favour of storing the information in the wiki page/revision tables, with file-backend references similar to what we do for ExternalStorage. We could keep "file" as a simplified secondary data table (similar to "categories").