Talk:Requests for comment/Associated namespaces/Database schemas

From mediawiki.org

marktraceur's notes[edit]

A namespace can be defined as the lead namespace of a group of namespaces

Why? Why not just define a namespace and then relations to other namespaces? That way you can remove the ns_talk_of field, and probably ns_type. You replace them with a table that looks like so:

namespace_relations
Field Type
nsr_id int
nsr_from_id int (corresponds to ns_id)
nsr_target_id int (corresponds to ns_id)
nsr_type varchar(32)

I'd do something like this:

id from_id target_id type
0 11 10 talk
1 16 10 documentation
2 17 10 schema
3 11 16 talk
4 11 17 talk

The nice thing about this approach is that you can have multiple "talk" relations without issue, and also you can have multiple other relations easily. One possible issue is that there can be multiple talk pages for the same one page...this may be useful, though, if there are different kinds of discussion you want in a namespace...anyway, I see no reason to have a "lead" namespace, or "groups" - they can just have relationships. --MarkTraceur (talk) 22:05, 24 April 2014 (UTC)Reply


@MarkTraceur: Let me see if I understood your example:
  1. if I create a new page in page namespace 16, I can have one associated page from ns 10
  2. if I create a new page in page namespace 11, I can have one associated page from ns 10, 16 and 17
When checking which relations has a page from ns 10 to other namespaces, how can I know if it is associated to just ns 16, or to ns 11, 16 and 17? How would you check for inconsistencies?--Micru (talk) 06:20, 25 April 2014 (UTC)Reply
OK so.
In the current system, every page *has* a talk page. In a new system, every page that had a relationship to another namespace (e.g. Template, Template_Talk, Template_Documentation, and Template_Data in the above example) would have all of the related pages already, they just might not exist. So when I go to a Template page, there's a SELECT statement (heavily cached) that asks "What namespaces are associated with this namespace?" and then those tabs get created in the interface.
The enforcement will come in when people try to add multiple relationships to different talk pages for the same namespace, IMO. Enforcing one of each type of relationship per namespace might be nice, and you might accomplish this with a finite set of table fields, but we could as easily and more extensibly accomplish it with a table and appropriate uniqueness checks. --MarkTraceur (talk) 16:02, 25 April 2014 (UTC)Reply
Ok, that makes sense..then I'll update the table as you said and I'll keep asking for feedback.--Micru (talk) 17:14, 25 April 2014 (UTC)Reply

Indeces nl_namespace_name vs. nl_namespace_default[edit]

AFAICS nl_namespace_name and nl_namespace_default duplicate each other in function; why not just drop nl_namespace and use nl_namespace_default as both the constraint and primary key? (It's been a long time since I did anything proper with DB performance, though, so not sure what the performance impact would be here.) Jdforrester (WMF) (talk) 17:59, 25 April 2014 (UTC)Reply

semi-CSV? + My old schema[edit]

What is with the weird custom "semi-CSV" format? Custom data formats are never a good idea for something like this, we don't want custom parsing and serialization code for this, please just use json or php's serialize format. This one-level semi-CSV also looks like a bad idea since it appears this schema has separate ns_protection and ns_options, if the schema used json or serialize there would be no need to have two columns for settings.

I did some experimentation with namespace registration before, take a look at the schema I used:
https://github.com/dantman/mediawiki-core/compare/master...namespace-registry#diff-a22af9c6d1b8ab836a50a290227b98c9L1489
~ Daniel Friesen (Dantman) (talk) 22:24, 25 April 2014 (UTC)Reply

@Dantman: I merged nw_protection and nw_configuration into ns_settings as you suggested. Could you take another look and modify it as you feel convenient? --Micru (talk) 13:07, 28 April 2014 (UTC)Reply

What's with the ns_constant column? It may be a bit more work but IMHO it would probably be better to deprecate the use of NS_ constants instead of trying to incorporate them. I see three potential issue with NS_ constants in a dynamic system like this. Firstly in order for constants to work they must always be defined, there's no way to defer that till when they are used, this means a db or cache fetch needs to be done early on in config loading for every page load, something we've always tried to avoid. Secondly this looks like a potential for a race failure, you need to know what the DB is in order to fetch ns information and define NS_ constants, however local settings and MW code loaded before the db is configured may need to use NS_ constants. Thirdly doesn't dynamically defined NS constants like this interfere with HipHop support?

I'd also like to note that when I worked on this area of stuff myself it was for bug 31063 [MediaWiki should use a reservation system for namespaces]. For the canonical strings that identified namespaces I used names like 'mw.user' 'ext.smw.property' and 'site.*' in order to differentiate namespaces. Different extensions will register different namespaces with the same name (eg: one registers Blog and another registers Blog) you don't want to mistake the "Blog" namespace you registered with the Blog namespace for a different extension. You also don't want to conflict with new core namespaces. And you want to differentiate site namespaces which are completely custom and may not mean the same thing on two different wiki even if the name and key are exactly the same. Daniel Friesen (Dantman) (talk) 20:29, 28 April 2014 (UTC)Reply

On the canonical naming issue -- there was some general agreement at the last IRC RFC meeting on this that canonical names would be good enough for core namespaces, but I see your point about extensions. It might be better to consider symbolic names, or else have very clear documentation in namespace setup/registration code that indicates that your canonical name is meant to be unique, while you can apply any local/alias form... Might need to think about localization & aliases as a separate table that can be easily looked up too. Will brainstorm a bit on this later. --brion (talk) 14:54, 10 May 2014 (UTC)Reply
Btw, as a bonus I was also contemplating tweaking the export format when I was working out those plans. Export would include the canonical strings (mw.user, ext.smw.property, site.*) inside the namespace export information. On import instead of creating a title called "Property:Foo" in the main namespace the import system could create the Property: namespace itself, and since it knows that the key for it is ext.smw.property it will automatically start working when SMW is installed. Likewise if you had a Blog: namespace created for ext.SomeBlog.blog and were importing from a wiki with a Blog: namespace created by ext.YetAnotherBlogEngine.blog the import system would understand that these Blog: namespaces are different, it could create a temporary Blog_1: namespace and allow you to rename it afterwards. Daniel Friesen (Dantman) (talk) 15:38, 10 May 2014 (UTC)Reply

Some comments from BJorsch[edit]

Not organized, this is just impressions from a quick look.

  • I'm not terribly fond of an arbitrary json blob in the database, but maybe it could work here. Remember that if you go with the json blob approach then you can't do any straightforward queries on it and updating it will be something of a pain. In other words, you're not seeming to get much over just keeping the constant-to-ID mapping in the database and then assigning the globals as they're currently done. But really, get input from Sean Pringle on this.
  • Don't use "all" as a language code, ISO 639-2 could someday assign that code. Use '*' or something. Probably could get better feedback on this from the Language team.
  • Relations are probably symmetric, e.g. 1=>0 as "talk" implies 0=>1 as "subject". This becomes more important when you consider that you're having 11=>10, 11=>16, and 11=>17 all as "talk", so what's the relationship for 10=>11, 16=>11, and 17=>11?

BJorsch (WMF) (talk) 21:38, 28 April 2014 (UTC)Reply

Quick notes[edit]

Some quick notes looking at this at the hackathon:

  • note that namespace names can get really long -- the 'MediaWiki talk' namespace for Telugu is 43 bytes, and that's just one I checked offhand. 32-byte varchar for English is probably enough, but 64 byte varchar might not be enough for some long names, especially in Indic languages or rare languages that are outside BMP.
  • JSON blob for properties should be fine for basic lookups, but won't work well for pulling lists of namespaces the way the existing list array globals are used. Might be better to split those to separate fields.
  • namespace relation sets might need a sort-order field -- if we have a large set of namespaces bundled together we may want to be able to control the order they appear in the tabs, as simple order-of-insert or alpha won't always do.

--brion (talk) 14:50, 10 May 2014 (UTC)Reply