Topic on Talk:Files and licenses concept

Bryan (talkcontribs)

We need to stabilize the database scheme as soon as possible. The most annoying requirement for this purpose is #2 (A list of files by copyright holder should be obtainable), but this is too important to be dropped in my opinion.

Options:

  1. Add an fileprops entry for every revision and join fp_rev_id against page_latest. Requires an index on page_latest, but no schema changes to revision
  2. Add a boolean fp_latest to fileprops and add rev_fileprops_id to revision, which is a reference to fp_id. fp_latest needs to be included in all indices. If fp_id is taken equal to the revision that changed the fileprops entry, there is no need for an extra index to revision.
  3. Add a boolean fp_latest to fileprops and add an fileprops entry for every revision. This requires no schema change to existing tables. fp_latest needs to be included in all indices.
  4. Add img_fileprops_id to image, which is a reference to fp_id, which is the revision id of the revision that changed the fileprops entry. img_fileprops_id needs to be indexed.

I need a bit of thinking before I state my preference, but those 4 are the options available that I see.

Bryan (talkcontribs)
Bryan (talkcontribs)

I have thought about this, and I think #4 is the cleanest way to implement this feature. Using img_fileprops_id a direct link between file props and a file is created, which is in my opinion the most natural thing to do. The disadvantage is that while the current version links to a file, the previous versions link only to revision and thus to a page, which is a bit inconsistent. (We could add an fp_img_name column to solve the inconsistency, but I'm not sure I like that)

#3 is the easiest to implement from an operational perspective. As stated before however, it feels a bit like a hack to have a boolean fp_latest, but perhaps I'm wrong?

I do not have very strong opinions about this, so I'm fine with any of them.

Krinkle (talkcontribs)

I have the following points on my personal "requirement list" for the file_props identifier:

  • There should be a link between mw_revision and mw_file_props in such a way that if I edit a page and click "Permanent link" that that page will keep showing the same fileproperties and wikitext. In order words, a direct link between a revision and the text.oldid and file_props that belongs to it.
  • We shouldn't duplicate mw_text or mw_file_props rows if nothing has changed.

I think #2 is a good choise. That will also allow doing queries only searching through current files and their latest version of the fileproperties (JOIN between current revision and the file_props set that belong to it).

Bryan (talkcontribs)

I agree that duplicating rows if nothing changed is not a good idea.

You asked me what I had against fp_latest, but other than "I don't like it" I have no arguments against it, so let's not take my opinion with regards to that into account.

Krinkle (talkcontribs)

A possible other option:

#5: Add rev_fileprops_id to mw_revision, which is a reference to fp_id. fp_id is equal to the revision that changed the fileprops entry, there is no need for an extra index to mw_revision.

To get a list of all files that are currently by a certain author or with a certain license we'd join page_latest with revision and file_props. The advantage is that we don't need an extra "fp_latest", downside is that such a join may not be very fast depending on where the indexes are (We need mw_page to get all files and the current revision id, then from revision the current fp_id).

I think in terms of effeciency #5 is probably not going to make it and thus a quick and fast fp_latest in mw_file_props would help. However let's not forget that there's no direct link from a set of file_props to a page. One would still need a join to page for the titles etc. So perhaps #5 isn't bad after all.

Reply to "Database schema"