Talk:Requests for comment/Associated namespaces/Database schemas

marktraceur's notes

 * A namespace can be defined as the lead namespace of a group of namespaces

Why? Why not just define a namespace and then relations to other namespaces? That way you can remove the ns_talk_of field, and probably ns_type. You replace them with a table that looks like so:

I'd do something like this:

The nice thing about this approach is that you can have multiple "talk" relations without issue, and also you can have multiple other relations easily. One possible issue is that there can be multiple talk pages for the same one page...this may be useful, though, if there are different kinds of discussion you want in a namespace...anyway, I see no reason to have a "lead" namespace, or "groups" - they can just have relationships. --MarkTraceur (talk) 22:05, 24 April 2014 (UTC)


 * Let me see if I understood your example:
 * if I create a new page in page namespace 16, I can have one associated page from ns 10
 * if I create a new page in page namespace 11, I can have one associated page from ns 10, 16 and 17
 * When checking which relations has a page from ns 10 to other namespaces, how can I know if it is associated to just ns 16, or to ns 11, 16 and 17? How would you check for inconsistencies?--Micru (talk) 06:20, 25 April 2014 (UTC)
 * OK so.
 * In the current system, every page *has* a talk page. In a new system, every page that had a relationship to another namespace (e.g. Template, Template_Talk, Template_Documentation, and Template_Data in the above example) would have all of the related pages already, they just might not exist. So when I go to a Template page, there's a SELECT statement (heavily cached) that asks "What namespaces are associated with this namespace?" and then those tabs get created in the interface.
 * The enforcement will come in when people try to add multiple relationships to different talk pages for the same namespace, IMO. Enforcing one of each type of relationship per namespace might be nice, and you might accomplish this with a finite set of table fields, but we could as easily and more extensibly accomplish it with a table and appropriate uniqueness checks. --MarkTraceur (talk) 16:02, 25 April 2014 (UTC)
 * Ok, that makes sense..then I'll update the table as you said and I'll keep asking for feedback.--Micru (talk) 17:14, 25 April 2014 (UTC)

Indeces nl_namespace_name vs. nl_namespace_default
AFAICS nl_namespace_name and nl_namespace_default duplicate each other in function; why not just drop nl_namespace and use nl_namespace_default as both the constraint and primary key? (It's been a long time since I did anything proper with DB performance, though, so not sure what the performance impact would be here.) Jdforrester (WMF) (talk) 17:59, 25 April 2014 (UTC)

semi-CSV? + My old schema
What is with the weird custom "semi-CSV" format? Custom data formats are never a good idea for something like this, we don't want custom parsing and serialization code for this, please just use json or php's serialize format. This one-level semi-CSV also looks like a bad idea since it appears this schema has separate ns_protection and ns_options, if the schema used json or serialize there would be no need to have two columns for settings.

I did some experimentation with namespace registration before, take a look at the schema I used: https://github.com/dantman/mediawiki-core/compare/master...namespace-registry#diff-a22af9c6d1b8ab836a50a290227b98c9L1489 ~ Daniel Friesen (Dantman) (talk) 22:24, 25 April 2014 (UTC)
 * I merged nw_protection and nw_configuration into ns_settings as you suggested. Could you take another look and modify it as you feel convenient? --Micru (talk) 13:07, 28 April 2014 (UTC)

What's with the ns_constant column? It may be a bit more work but IMHO it would probably be better to deprecate the use of NS_ constants instead of trying to incorporate them. I see three potential issue with NS_ constants in a dynamic system like this. Firstly in order for constants to work they must always be defined, there's no way to defer that till when they are used, this means a db or cache fetch needs to be done early on in config loading for every page load, something we've always tried to avoid. Secondly this looks like a potential for a race failure, you need to know what the DB is in order to fetch ns information and define NS_ constants, however local settings and MW code loaded before the db is configured may need to use NS_ constants. Thirdly doesn't dynamically defined NS constants like this interfere with HipHop support?

I'd also like to note that when I worked on this area of stuff myself it was for. For the canonical strings that identified namespaces I used names like 'mw.user' 'ext.smw.property' and 'site.*' in order to differentiate namespaces. Different extensions will register different namespaces with the same name (eg: one registers Blog and another registers Blog) you don't want to mistake the "Blog" namespace you registered with the Blog namespace for a different extension. You also don't want to conflict with new core namespaces. And you want to differentiate site namespaces which are completely custom and may not mean the same thing on two different wiki even if the name and key are exactly the same. Daniel Friesen (Dantman) (talk) 20:29, 28 April 2014 (UTC)


 * On the canonical naming issue -- there was some general agreement at the last IRC RFC meeting on this that canonical names would be good enough for core namespaces, but I see your point about extensions. It might be better to consider symbolic names, or else have very clear documentation in namespace setup/registration code that indicates that your canonical name is meant to be unique, while you can apply any local/alias form... Might need to think about localization & aliases as a separate table that can be easily looked up too. Will brainstorm a bit on this later. --brion (talk) 14:54, 10 May 2014 (UTC)

Some comments from BJorsch
Not organized, this is just impressions from a quick look. BJorsch (WMF) (talk) 21:38, 28 April 2014 (UTC)
 * I'm not terribly fond of an arbitrary json blob in the database, but maybe it could work here. Remember that if you go with the json blob approach then you can't do any straightforward queries on it and updating it will be something of a pain. In other words, you're not seeming to get much over just keeping the constant-to-ID mapping in the database and then assigning the globals as they're currently done. But really, get input from Sean Pringle on this.
 * Don't use "all" as a language code, ISO 639-2 could someday assign that code. Use '*' or something. Probably could get better feedback on this from the Language team.
 * Relations are probably symmetric, e.g. 1=>0 as "talk" implies 0=>1 as "subject". This becomes more important when you consider that you're having 11=>10, 11=>16, and 11=>17 all as "talk", so what's the relationship for 10=>11, 16=>11, and 17=>11?

Quick notes
Some quick notes looking at this at the hackathon: --brion (talk) 14:50, 10 May 2014 (UTC)
 * note that namespace names can get really long -- the 'MediaWiki talk' namespace for Telugu is 43 bytes, and that's just one I checked offhand. 32-byte varchar for English is probably enough, but 64 byte varchar might not be enough for some long names, especially in Indic languages or rare languages that are outside BMP.
 * JSON blob for properties should be fine for basic lookups, but won't work well for pulling lists of namespaces the way the existing list array globals are used. Might be better to split those to separate fields.
 * namespace relation sets might need a sort-order field -- if we have a large set of namespaces bundled together we may want to be able to control the order they appear in the tabs, as simple order-of-insert or alpha won't always do.