User:Lucas Werkmeister/Draft:Customizing gender in MediaWiki

MediaWiki’s support for recognizing the gender of its users is currently limited: users can specify in their preferences whether they want to be described using gender-neutral, female, or male terms; gender-neutral is the default. This does not cover the full range of ways in which users should be addressed, and the lack of distinction between “did not customize this preference” and “explicitly prefers gender-neutral words” can also create confusion. I propose a way to improve on this situation, leaving control over it to the editor community. In this way, even if no Wikimedia community is interested in the feature, third-party wikis (e. g. dedicated LGBTQ+ sites) can make use of it without requiring software modifications.

Inspiration
The first influence for this scheme is the {{PLURAL:}} magic word. Like {{GENDER:}}, it allows interface messages to adapt according to the language’s needs (even in cases where in English no distinction is necessary); however, it is more flexible, and allows not just a fixed number of forms (e. g. singular, plural), but also specific forms for certain numeric values. Thus, in addition to {{PLURAL:$1|$1 thing|$1 things}}, a message could also read {{PLURAL:$1|0=no things|$1 thing|$1 things}}, if desired.

The second influence is the understanding that interface messages are a common mechanism to customize MediaWiki without modifying its configuration. There are many examples of interface messages that the software does not directly show in the user interface; for instance, Extension:Proofread Page uses the messages Proofreadpage quality0 category through Proofreadpage quality4 category to define the names of categories which pages are sorted into. While messages are usually translated on translatewiki.net, they can be overridden on each wiki by editing pages in the MediaWiki: namespace (this is often used for project-specific customizations, either in the wording of a message or in the way it is formatted; the message text is full-featured wikitext), and new messages can also be defined by creating pages there. Customizing the wiki in this way is much simpler than changing the configuration, which requires a software change to be deployed (see Requesting wiki configuration changes for the process on Wikimedia sites).

Concepts

 * Available gender values
 * The list of values which are available for the gender preference (on Special:Preferences and via the API) is defined by the i18n message keys of the form gender-name-identifier in the wiki’s content language, where identifier is the value that is actually stored, and the message text is what is shown to the user on the special page (and may be translated). (TODO: the special page doesn’t actually show a name, it shows example usage like “she edits wiki pages”; could be separate messages, but then what is the name even used for?)
 * Note that the list of available values may change over time, and values may become unavailable if their message is deleted. MediaWiki must handle stored user preferences containing gender values which are not currently available; such values are not invalid.
 * Note also that gender values in the sense of this proposal do not directly map to genders of users. The current “unknown” value (also referred to as “neutral”) already covers a range of possibilities, from “unspecified” to any number of gender identities that could be categorized as non-binary. Gender values for this proposal are also shaped by the needs of the software; the reason the software needs the gender at all is to refer to the user, e. g. using pronouns. This means that gender identities may be grouped together where no distinction is needed to ensure correct messages, whereas even users with the same gender identity, e. g. just “nonbinary”, may need different gender values if they should be referred to differently.
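
The rule above can be sketched in a few lines of Python. This is only an illustration, not MediaWiki code: the function names, the plain dict standing in for the wiki's messages, and the messages for the three built-in values are all assumptions made for the example. It shows that the available values follow from the existing gender-name-* messages, and that a stored value whose message is gone is reported as unavailable rather than rejected.

```python
# Hypothetical sketch; a dict stands in for the wiki's interface messages.
PREFIX = "gender-name-"  # key form taken from the example pages in this proposal


def available_gender_values(messages: dict[str, str]) -> dict[str, str]:
    """Map each stored identifier to its display name."""
    return {
        key[len(PREFIX):]: text
        for key, text in messages.items()
        if key.startswith(PREFIX)
    }


def describe_stored_value(stored: str, messages: dict[str, str]) -> str:
    """Classify a stored preference without ever treating it as invalid."""
    if stored in available_gender_values(messages):
        return "available"
    # The message was deleted (or never existed); the stored value is kept.
    return "unavailable"


# Assumed message set for illustration only.
messages = {
    "gender-name-male": "male",
    "gender-name-female": "female",
    "gender-name-unknown": "unknown",
    "gender-name-nonbinary-es-ihr": "nonbinary (es/ihr in German)",
    "gender-fallback-nonbinary-es-ihr": "unknown",
}

print(sorted(available_gender_values(messages)))
print(describe_stored_value("nonbinary-es-ihr", messages))
```

Deleting the gender-name-nonbinary-es-ihr entry from the dict changes the second result to "unavailable", while the stored preference itself stays untouched.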


 * Translated gender identifiers
 * While the gender-name-identifier messages define the name of a gender value in various languages, gender-identifier-identifier messages translate the identifier itself into other languages. The translated identifiers are not used to store the setting, but may be used in the {{GENDER:}} magic word, see below. (TODO: what happens if the gender identifier in the wiki’s content language is set to something else?)


 * The magic word
 * Currently, a wikitext construct like {{GENDER:$1|form1|form2|form3}} defines forms for the male, female, and neutral/unknown gender value. This is extended with the capacity to name specific gender values before an equals sign: {{GENDER:$1|identifier1=form1|identifier2=form2}}. Any number of identifier/form pairs is permitted. The identifier can either be in the wiki’s content language (matching the initial gender-name-identifier message) or, for on-wiki uses, in the page’s content language (which may differ from the wiki’s content language, e. g. on language-specific village pumps in multilingual projects like Wikimedia Commons and Wikidata). For compatibility and convenience, omitted identifiers continue to map to the male, female, and neutral/unknown gender values, in this order; additional forms without identifiers (beyond these three) are ignored.
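
As a rough sketch of how the extended argument list might be interpreted (not actual parser code), the Python below splits the arguments into named and unnamed forms. The function name, the `aliases` table mapping translated identifiers to stored ones, and two design choices are assumptions of this sketch: an argument counts as named only if the text before the equals sign is a known identifier, and a named form wins over an unnamed one for the same value.

```python
# Hypothetical sketch of interpreting the arguments of the extended magic word.
POSITIONAL_ORDER = ["male", "female", "unknown"]


def parse_gender_forms(args: list[str], aliases: dict[str, str]) -> dict[str, str]:
    """Return a mapping from stored gender identifier to form text."""
    named: dict[str, str] = {}
    positional: list[str] = []
    for arg in args:
        name, sep, form = arg.partition("=")
        stored = aliases.get(name.strip())
        if sep and stored is not None:
            named[stored] = form  # identifier=form pair
        else:
            positional.append(arg)  # unnamed form, kept verbatim
    # zip() stops after three entries, so further unnamed forms are ignored.
    forms = dict(zip(POSITIONAL_ORDER, positional))
    forms.update(named)  # assumption: named forms take priority
    return forms


# Assumed alias table: stored identifiers plus one translated identifier.
ALIASES = {
    "male": "male",
    "female": "female",
    "unknown": "unknown",
    "nonbinary-es-ihr": "nonbinary-es-ihr",
    "nichtbinär-es-ihr": "nonbinary-es-ihr",
}

print(parse_gender_forms(["er", "sie", "nichtbinär-es-ihr=es"], ALIASES))
```

Both the content-language identifier and the translated one resolve to the same stored value here, which is one possible reading of how translated identifiers would be used in messages.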


 * Gender fallbacks
 * The gender-fallback-identifier messages define the identifier of a single other gender value which is used if a use of the {{GENDER:}} magic word does not include a form for the named user’s gender.
 * The default fallback for all genders is “male”. This matches the current behavior of the {{GENDER:}} magic word. Note that it is common for the source messages (usually English) to use the {{GENDER:}} magic word even where it is not needed in the source language, to indicate to translators that the gender is available should it be required in the target language. For example, the confirmable-confirm message could be defined as Are you sure? in English, but is actually defined as Are {{GENDER:$1|you}} sure?, so that translators know that they can use the same magic word if necessary, e. g. in French: Êtes-vous {{GENDER:$1|sûr|sûre}} ? That is the practical reason why even “female” must fall back to “male”. (In principle, it would also be possible to make “male” fall back to “female”, but the current behavior is that in {{GENDER:$1|form1|form2}}, form1 is male and form2 female, and so it seems more natural that in {{GENDER:$1|form1}}, form1 should be male rather than female.)
 * Note that languages in MediaWiki also have fallbacks, but they can have a whole list of fallbacks, which may include cycles (for instance, pt and pt-br are each other’s first fallback), whereas this proposal only allows for a single fallback per gender value. Encoding a list of fallbacks into the gender-fallback-identifier messages would be a bit more complicated, and I conjecture that this additional complexity is not needed. If it does turn out to be necessary, we can still add support for it.
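
The fallback lookup could look roughly like the following Python sketch. It rests on assumptions the text does not settle: that fallbacks are followed repeatedly (so a custom value can fall back to “unknown”, which in turn falls back to “male”), that a cycle between custom fallback messages simply ends the search, and that an empty string is returned when no form is found. The names `resolve_form` and `DEFAULT_FALLBACK` are made up for the example.

```python
# Hypothetical sketch of resolving a form via single-valued fallbacks.
DEFAULT_FALLBACK = "male"  # default fallback for all gender values


def resolve_form(gender: str, forms: dict[str, str], fallbacks: dict[str, str]) -> str:
    """Pick the form for `gender`, following one fallback per gender value."""
    seen: set[str] = set()
    current = gender
    while current not in forms:
        if current in seen:
            return ""  # cycle, or no form available at all
        seen.add(current)
        current = fallbacks.get(current, DEFAULT_FALLBACK)
    return forms[current]


# Contents of the (assumed) gender-fallback-* messages.
fallbacks = {"nonbinary-es-ihr": "unknown"}

# Only male and female forms given: the custom value ends up at "male".
print(resolve_form("nonbinary-es-ihr", {"male": "er", "female": "sie"}, fallbacks))
# A single form, as in the English source message, serves every value.
print(resolve_form("female", {"male": "you"}, fallbacks))
```

Because each value has exactly one fallback, the lookup is a simple chain walk, and the `seen` set is enough to guard against accidental loops.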

Example
Suppose that Wikidata wants to allow its German-speaking users to specify that they use es/ihr pronouns. ( es means it, but shares many forms with the male pronoun, so it is sometimes combined with the forms of the female pronoun, including ihr , her .) To do this, they would:


 * Create the page MediaWiki:gender-name-nonbinary-es-ihr with the content nonbinary ( es/ihr in German)
 * Create the page MediaWiki:gender-name-nonbinary-es-ihr/de with the content nichtbinär (es/ihr)
 * Create the page MediaWiki:gender-identifier-nonbinary-es-ihr/de with the content nichtbinär-es-ihr
 * Create the page MediaWiki:gender-fallback-nonbinary-es-ihr with the content unknown
 * Override any other messages as necessary. I couldn’t find a good example to showcase both es and ihr, but a fictional one based on emailuserfooter might look like … Falls du antwortest, erfährt  deine E-Mail-Adresse.

In other languages, and for messages that were not overridden as above, such users would receive the “unknown”/“neutral” forms of a message (in English, usually based on “they”). However, translations in other languages could also treat this gender value specially, if desired; they would only have to use the identifier nonbinary-es-ihr, rather than nichtbinär-es-ihr as in the German message (or, if the identifier was translated into their language, the translated identifier).

Compatibility
This proposal is mostly backwards compatible. MediaWiki would not define any additional gender values by default; their introduction is left up to each wiki’s community. However, the interpretation of certain uses would change, specifically ones that look like they use the new “specify gender by identifier” feature: assuming that $1 refers to a male user, {{GENDER:$1|female=text A|text B}} currently renders as female=text A, but would under this proposal render as text B instead. I do not see any way to avoid this, but I also think that this is exceedingly unlikely to happen in practice.
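
To make the difference concrete, here is a small Python sketch comparing the two interpretations for arguments of the shape “female=text A” followed by “text B”. Both functions are simplified models written for this illustration, not MediaWiki code, and the proposed behavior is reduced to the default “male” fallback.

```python
# Hypothetical, simplified models of the current and the proposed behavior.
ORDER = ["male", "female", "unknown"]


def render_current(args: list[str], gender: str) -> str:
    """Purely positional: an equals sign has no special meaning."""
    if not args:
        return ""
    index = ORDER.index(gender)
    # A missing form falls back to the first (male) form.
    return args[index] if index < len(args) else args[0]


def render_proposed(args: list[str], gender: str, known: set[str]) -> str:
    """Arguments of the form identifier=text are taken out of the positional list."""
    named: dict[str, str] = {}
    positional: list[str] = []
    for arg in args:
        name, sep, form = arg.partition("=")
        if sep and name.strip() in known:
            named[name.strip()] = form
        else:
            positional.append(arg)
    forms = dict(zip(ORDER, positional))
    forms.update(named)
    # Simplification: only the default fallback to "male" is modeled.
    return forms.get(gender, forms.get("male", ""))


args = ["female=text A", "text B"]
print(render_current(args, "male"))
print(render_proposed(args, "male", set(ORDER)))
```

For a male user the first call returns the literal text “female=text A” and the second returns “text B”, which is the change in interpretation described above.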

Risks

 * A fine-grained list of gender values may lead to users disclosing more personal information about themselves than usual. Imagine if a survey among wiki users reported that respondents were “82 male, 29 female, 1 nonbinary ( es/ihr in German), 327 unknown”; if the custom gender option is not widely used, this may inadvertently disclose or at least suggest that a certain user did or did not participate in that survey. Users should be made aware of this risk.