Topic on Extension talk:FormatNum

Verdy p (talk | contribs)

The implementation currently uses &thinsp; (U+2009 THIN SPACE) for the abbreviation "t", which it supports as a group separator (the doc calls it a "thousand separator", which is a misnomer because most South Asian languages group digits differently).

But that thin space is incorrect because it is breakable.

The correct code should use &#x202F; (U+202F NARROW NO-BREAK SPACE). A named character entity &nnbsp; is sometimes proposed as an alternative to the numeric character reference, but it is not among the standard HTML named character references, so HTML parsers do not recognize it. Still, nothing prohibits MediaWiki from recognizing that named entity, because many users are searching for the correct code to use (which is definitely not &thinsp; like you did!)
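To make the difference concrete, here is a minimal sketch in Python (not the extension's actual code) of digit grouping with U+202F as the separator. The function name `group_digits` and its parameters are hypothetical, chosen only for this illustration.

```python
NNBSP = "\u202f"   # U+202F NARROW NO-BREAK SPACE (non-breaking)
THINSP = "\u2009"  # U+2009 THIN SPACE (breakable)


def group_digits(digits: str, sep: str = NNBSP, size: int = 3) -> str:
    """Insert `sep` between groups of `size` digits, counting from the right.

    Hypothetical helper: it only handles an unsigned string of decimal digits.
    """
    if not digits.isdecimal():
        raise ValueError("expected a non-empty string of decimal digits")
    head = len(digits) % size  # length of the leftmost, possibly shorter, group
    groups = [digits[:head]] if head else []
    groups += [digits[i:i + size] for i in range(head, len(digits), size)]
    return sep.join(groups)
```

For HTML output, the separator can be written as a numeric character reference, for example `group_digits("1234567").replace(NNBSP, "&#x202F;")`, which yields `1&#x202F;234&#x202F;567`.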

Note that these narrow spaces are strongly recommended in standard French typography (the "espace fine", which must be non-breaking), and are also recommended by ISO in place of the ambiguous English comma. NNBSP is strongly preferred to &nbsp; (U+00A0 NO-BREAK SPACE), but NBSP is still a better fallback than THINSP, which must never be used in any normal number. The only exception is very long numbers, where a break is sometimes needed, generally only between groups of about 60 digits, i.e. for a width of 30em for just the digits, or about 33 to 35em when including their group separators every 3 or 5 digits!

Note that "formatnum" still does not support very long numbers, which need a secondary, breakable group separator so that they can break more cleanly: every N groups, an alternate separator such as THINSP may be used instead of the normal group separator, the default being every 20 primary groups of 3 digits. THINSP may be used instead of NNBSP only in that role, as the large-group separator.
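As a rough sketch of that two-level scheme (in Python, and not something "formatnum" currently does), the function below uses a non-breaking primary separator and switches to a breakable one every `every` primary groups. The name `group_long_number`, its parameters, and the choice to count large groups from the right are all assumptions made for this example.

```python
NNBSP = "\u202f"   # non-breaking primary group separator
THINSP = "\u2009"  # breakable secondary (large-group) separator


def group_long_number(digits: str, size: int = 3, every: int = 20,
                      primary: str = NNBSP, secondary: str = THINSP) -> str:
    """Group digits by `size`; use `secondary` once every `every` groups.

    Assumption: large groups are counted from the right, like the digits.
    """
    head = len(digits) % size
    groups = [digits[:head]] if head else []
    groups += [digits[i:i + size] for i in range(head, len(digits), size)]
    total = len(groups)
    out = []
    for index, group in enumerate(groups):
        if index > 0:
            remaining = total - index  # groups left to emit, this one included
            out.append(secondary if remaining % every == 0 else primary)
        out.append(group)
    return "".join(out)
```

With the defaults, a 63-digit number gets exactly one breakable THINSP (60 digits from the right) and NNBSP everywhere else, so a line break can only happen at that one point.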

Verdy p (talk | contribs)

You should also support a language code parameter to automatically select the correct formatting of decimal numbers, notably:

  • the correct set of decimal digits (See the Unicode general category "Nd"),
  • the correct group sizes (not 3 in all languages: South Asian languages typically use a lowest group of 3 digits, then groups of 2 digits),
  • the correct group (typically comma or nnbsp) and decimal (typically dot or comma) separators,
  • the correct signs and sign position, or other notation for negative numbers.

See Unicode CLDR data about these formats per locale, or their implementation in ICU.
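The points in the list above could be driven by a per-language table, roughly as in the Python sketch below. The table is hand-written for illustration only and is not copied from CLDR; the names `NumberStyle`, `STYLES`, `format_decimal`, and the key `hi-deva` are hypothetical. A real implementation would read these values from CLDR or delegate to ICU.

```python
from dataclasses import dataclass

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


@dataclass(frozen=True)
class NumberStyle:
    group_sep: str
    decimal_sep: str
    first_group: int        # size of the lowest (rightmost) group
    other_groups: int       # size of every group to its left
    zero_digit: str = "0"   # digit zero of the decimal digit set to use
    minus: str = "-"        # sign placed before negative numbers


# Illustrative, hand-written values (assumptions, not CLDR data).
STYLES = {
    "en": NumberStyle(",", ".", 3, 3),
    "fr": NumberStyle(NNBSP, ",", 3, 3),
    "hi": NumberStyle(",", ".", 3, 2),
    "hi-deva": NumberStyle(",", ".", 3, 2, zero_digit="\u0966"),
}


def format_decimal(value: str, lang: str) -> str:
    """Format a plain decimal string such as "-1234567.89" for `lang`."""
    style = STYLES[lang]
    negative = value.startswith("-")
    integer, _, fraction = value.lstrip("-").partition(".")
    groups = []
    size = style.first_group
    while len(integer) > size:
        groups.insert(0, integer[-size:])  # peel groups off the right end
        integer = integer[:-size]
        size = style.other_groups
    groups.insert(0, integer)
    text = style.group_sep.join(groups)
    if fraction:
        text += style.decimal_sep + fraction
    # Map ASCII digits onto the chosen decimal digit set.
    table = {ord("0") + d: chr(ord(style.zero_digit) + d) for d in range(10)}
    return (style.minus if negative else "") + text.translate(table)
```

For example, `format_decimal("1234567.89", "hi")` groups the integer part as 12,34,567, while the `fr` entry uses NNBSP between groups and a comma as the decimal separator.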

Reply to "thin separator"