Universal Language Selector/WebFonts

Webfonts - Introduction

Font Selection on Universal Language Selector (ULS)

Wikimedia wikis are available in nearly 300 languages. Although the wikis in English receive much of the traffic, the other language wikis (using Latin and non-Latin scripts) also generate a considerable volume of content. Additionally, the Latin script based wikis contain pages with content fragments written in non-Latin scripts. Apart from Wikipedia, sister projects like Wiktionary and Wikisource are also largely multilingual.

Latin script based languages rarely face serious font availability problems on the variety of devices that readers use. Non-Latin languages, however, are often hard to read because the fonts on the operating system are unavailable, outdated, bug-ridden or aesthetically sub-optimal.

In some cases, non-Latin wiki pages also display banners at the top of the page (e.g., see Malayalam Wikipedia) asking users to visit a help page if they encounter problems while reading the text. These help pages contain links to download fonts and installation instructions for different operating systems.

For many languages, fonts are unlikely to be available by default unless the operating system is recent. For example, fonts for the Odia script are available only in very recent versions of the Windows operating system. Similarly, new characters are still being encoded in Unicode for non-Latin languages, and they are missing from the fonts that already ship with various operating systems (including newer versions). For example, 5 characters that were encoded for the Malayalam script can be used only with the fonts on Windows 7. The same fonts, when used on other operating systems or versions, may not necessarily provide similar usability for reading content.

Wikimedia attempted to solve this problem in 2011 using the MediaWiki extension Webfonts. In 2013, this extension was deprecated and all its features integrated into the Universal Language Selector extension, which provides additional language features including input methods and language selection.

Besides solving the problem of non-availability of fonts for many languages, this extension provides a way to serve fonts in wikipages for a few other (somewhat interesting) use-cases as well.

Webfonts technology also makes it possible to offer special fonts that address accessibility issues related to reading. The OpenDyslexic font is one such font: it uses font weights and letter design to make reading easier. This is particularly useful for readers with dyslexia.

Wikipedia's sister project Wikisource attempts to digitize and preserve copyright-expired literature. Some of these books require special fonts to convey the content properly. For example, a grammar book in Hebrew needs a traditional writing style to explain the concepts. Webfonts helps there by specifying and providing the exact font required.

A text written in Cuneiform needs a special font to display it properly.

Challenges

Webfont technology is becoming more and more popular on the web, with Google Web Fonts as one of the best known providers of a free webfont service. But the main use case for this technology is still aesthetic. Non-Latin webfonts are not as popular as Latin script fonts, and Google's webfont repository serves only a few non-Latin fonts.

To get the desired rendering, serving fonts is not enough. The operating system must also be capable of rendering the script using the font. Windows XP Service Pack 2 and older versions cannot render Indic complex scripts, whether or not the font is present. Android could not render any complex scripts before the 4.x versions, even though its native browsers supported webfonts.

Rendering Inconsistency

In addition to issues with varied connections and browsers, it’s also important to consider how fonts are rendered across operating systems.

Mac OS and Windows have fundamentally different philosophies when it comes to rendering text. Mac OS uses anti-aliasing to smooth text, while Windows Vista and 7 use a technology called ClearType to render more detailed text.

Windows XP doesn’t do either (at least by default), and Linux might do any number of things depending on how it’s configured.

Windows has two different rendering modes: the newer DirectWrite and the older GDI. The two modes can produce substantially different results.

IE9 and Firefox use DirectWrite, and they can use hardware acceleration to render pages. If hardware acceleration isn’t available or has been turned off, fonts don’t look quite as smooth in these browsers.

Chrome uses GDI, and Firefox 7 implemented a pref that specifies a list of fonts for which GDI rendering will be used at sizes below 16px. By default the list contains fonts such as Arial, Tahoma, Verdana, Trebuchet MS, Segoe UI and Consolas. Downloadable fonts will always use DirectWrite.

Windows ClearType takes advantage of the fact that pixels on LCD screens are composed of red, green, and blue vertical bars. It treats these bars as narrow pixels, allowing Windows to display more detail in text.

ClearType can make text sharper and more readable, but sometimes it creates a colored shadow around text; some people complain it gives them a headache. There is an "Adjust ClearType Text" tool to configure ClearType for your display. If the software isn’t configured properly, it can actually make text harder to read.

CSS3 has a property called "font-smooth" that is designed to provide an anti-aliasing of web fonts. As of today, it does not appear to be supported by any of the major modern browsers.

Webfont rendering on Windows is a challenge that the Google Web Fonts service also faces. Stack Overflow has several questions about bad font rendering in Google Chrome, even with Google Web Fonts.

Google webfont users use SVG fonts whenever possible to avoid depending on operating system rendering engines. This works well as long as the font is simple, without hints and without OpenType tables or instructions. A Latin font (though not for all languages) may be rendered correctly as an SVG font, but non-Latin fonts usually need OpenType instructions for correct rendering. The SVG font format does not support them, so using an SVG font for, say, Hindi basically guarantees wrong rendering.

Impact on bandwidth

The number of glyphs present in non-Latin fonts poses a challenge for bandwidth usage. While a normal English font can be smaller than 100 KB, a Malayalam script font has about 1000 glyphs and is larger than 300 KB in TrueType (.ttf) format.

The Jomolhari font for Tibetan script is 352.4 KB; the TrueType Akkadian font used for Cuneiform is 772 KB; and the Tuladha Jejeg font for Javanese script is 399 KB. But some non-Latin fonts are very small: the Lohit Devanagari font for Devanagari script (Hindi, Marathi, Sanskrit) is just 37 KB. The smallest font we have is the Saweri font for Buginese languages: 3.6 KB. The Shapour font used for Pahlavi script is 4 KB.

Small fonts are perfect for web use. Anything more than, say, 100 KB costs bandwidth, because the font is downloaded along with the page content. A typical thumbnail image shown in a wiki page is around 15-20 KB in size (e.g., the image of Barack Obama on his wiki page is 18.6 KB), so adding a small font is comparable to including an image in a page. But if the font is bigger, more bandwidth is consumed. Several bandwidth optimizations are in place: a font is downloaded only once, when you visit the page for the first time; a font in a webfont format such as EOT or WOFF is heavily compressed; and so on. While a font is being downloaded, browsers try to render the content using a system fallback font, and switch to the downloaded font when it is ready. This causes a flash in the web page, often known as FOUT (flash of unstyled text). The FOUT is more noticeable when the font is big, but with current browser technology this is unavoidable.

The bandwidth issue gets more serious if the wiki page has multiple languages, each of which uses a webfont. A typical wiki page is multilingual. Every wiki page has a sidebar showing interwiki links to articles in other languages about the same topic. The wiki page about Obama exists in 204 languages besides English, and has interwiki links to all of them, using about 28 non-Latin scripts.
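The cost of a multilingual page can be estimated with a few lines of JavaScript. The following is only a minimal sketch: the function name estimateFontPayloadKB and the shape of the defaultFonts object are hypothetical, and the sizes are a handful of rows taken from the table of default fonts in this document. It assumes that a font shared by several languages is downloaded once.

```javascript
// Default webfont and WOFF size in KB for a few languages. The figures come
// from the table of languages and default fonts in this document; the object
// layout itself is an assumption made for this example.
const defaultFonts = {
  hi: { font: 'Lohit Devanagari', sizeKB: 37.3 },
  sa: { font: 'Lohit Devanagari', sizeKB: 37.3 },
  mr: { font: 'Lohit Marathi', sizeKB: 36.4 },
  bn: { font: 'Siyam Rupali', sizeKB: 165 },
  bo: { font: 'Jomolhari', sizeKB: 352.4 },
  am: { font: 'AbyssinicaSIL', sizeKB: 250 }
};

// Hypothetical helper: total KB of fonts needed for the languages on a page.
// A font used by several languages is counted once; languages without a
// default webfont (for example 'en') add nothing.
function estimateFontPayloadKB(langCodes, fonts) {
  const seen = new Set();
  let total = 0;
  for (const code of langCodes) {
    const entry = fonts[code];
    if (!entry || seen.has(entry.font)) {
      continue;
    }
    seen.add(entry.font);
    total += entry.sizeKB;
  }
  return total;
}
```

For example, estimateFontPayloadKB(['en', 'hi', 'sa', 'bn'], defaultFonts) counts Lohit Devanagari once and Siyam Rupali once, giving roughly 202 KB.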

Imagine a wiki page has all languages, and we use our webfont repository to render these languages, if the language has a default font. The total bandwidth usage is calculated as 3.8 MB. Does such a page exist? Yes, see https://translatewiki.net/wiki/Special:SupportedLanguages. But the strings are all language names. (We are handling this situation in a different way. Read on.)

Selection of fonts

Availability of freely licensed fonts for non-Latin languages is a challenge. Many languages have no freely licensed font at all, and sometimes the available font is of poor quality. This does not mean that good proprietary fonts exist either. Because of this, for some languages we end up using a low quality font since there is no other choice. Large-scale deployment of these fonts brings in a lot of users, and they report bugs. Another headache is an inactive upstream. Most of the fonts are not version controlled and have no issue tracker. Sometimes the maintainer or author of the font does not respond to our attempts to communicate.

Because of this, we try to choose fonts with an active upstream, version control and an issue tracker. But sometimes we end up with no choice.

Sinhala has only one free licensed font, LKLUG, by the Lanka Linux Users group. When we inquired whether that font is actively maintained, the answer was no. The GNU FreeFont project's Serif font has Sinhala glyphs copied from LKLUG, but they have not been updated for a long time. The Sri Lankan government recently released some Sinhala fonts, but unfortunately the licensing conditions prevent us from using them.

Divehi, written in the Thaana script, did not have any fonts. GNU FreeSans had glyphs for this script, but FreeSans as such cannot be used as a webfont since it is a large font covering lots of scripts. With the help of the FreeFont project, these glyphs were extracted to create a new font for Thaana named Free Thaana [1], [2].

Tamil, using Tamil script, had only one freely licensed font, Lohit Tamil from Red Hat, but the Tamil community did not want it as the default font for Tamil because of its poor rendering quality, so it has been kept as an optional font. Lohit Tamil is also distributed by Google as part of Google Web Fonts.

Fonts were added to the font repository based on bug reports we received in Wikimedia Bugzilla. Some were also added in response to community requests made through mailing lists, hackathons, talk pages and so on.

We include the bug reports or request sources for font addition in source code so that we can trace why a particular font was added and when.

Statistics about the fonts in the repository

  • Number of languages supported by ULS for webfonts: 115
  • Number of fonts: 82
  • Number of languages having 'system font' as default font: 49
    • For these languages, unless the user uses language settings -> font selection, by default no font will be embedded. Most of these languages are well supported in computers.
    • 36 of these 49 languages also have an optional font, OpenDyslexic, to help people with dyslexia to read content.
  • Number of languages with a default webfont: 66
    • For each of these languages there is a default webfont that gets embedded unless user has a preference that prevents it. These default fonts are in general suggested/requested by communities. They keep changing, slowly, as communities find a better font or find serious issues with their current default fonts. Also, for languages with big default fonts, we need to talk with their communities to make them non-default if possible.
    • 23 of these 66 are written in Tibetan script and use the Jomolhari font (352.4 KB). There was a request to support all languages with this script.
    • Only 30 of the 66 are known to MW (as per Names.php). The rest of them can appear in random wiki articles.
      • Average size of these 30 languages' fonts: 153KB
      • Smallest: 3.6 KB
      • Biggest: 772 KB: Akkadian for akk

Table of languages and default fonts

LangCode | Language name | Default font | WOFF size (KB)
bug | Buginese | Saweri | 3.6
mak | Makasar | Saweri | 3.6
pal | Middle Persian (Pahlavi) | Shapour | 4
peo | Old Persian | Artaxerxes | 1.4
pa | Punjabi | Lohit Punjabi | 12.6
lo | Lao | Phetsarath | 15
gu | Gujarati | Lohit Gujarati | 28
hbo | Ancient Hebrew | Taamey Frank CLM | 36
ahr | Khandeshi (अहिराणी) | Lohit Marathi | 36.4
mr | Marathi | Lohit Marathi | 36.4
ne | Nepali | Lohit Nepali | 36.5
bh | Bihari | Lohit Devanagari | 37.3
bho | Bhojpuri | Lohit Devanagari | 37.3
gom | Goan Konkani | Lohit Devanagari | 37.3
hi | Hindi | Lohit Devanagari | 37.3
kok | Konkani | Lohit Devanagari | 37.3
mai | Maithili | Lohit Devanagari | 37.3
sa | Sanskrit | Lohit Devanagari | 37.3
arc | Aramaic | Estrangelo Edessa | 39.5
syc | Syriac (ܣܘܪܝܝܐ) | Estrangelo Edessa | 39.5
or | Oriya | Lohit Oriya | 47.8
kn | Kannada | Lohit Kannada | 48
tcy | Tulu | Lohit Kannada | 48.1
te | Telugu | Lohit Telugu | 56.4
saz | Saurashtra (ꢱꣃꢬꢵꢯ꣄ꢡ꣄ꢬꢵ) | Pagul | 69
km | Khmer | KhmerOSbattambang | 76
dv | Divehi | FreeFont-Thaana | 104
ii | Sichuan Yi | Nuosu SIL | 133
my | Burmese | TharLon | 159
bn | Bengali | Siyam Rupali | 165
bpy | Bishnupria Manipuri | Siyam Rupali | 165
cr | Cree (ᓀᐦᐃᔭᐍᐏᐣ) | OskiEast | 169
am | Amharic | AbyssinicaSIL | 250
gez | Ge'ez | AbyssinicaSIL | 250
ti | Tigrinya | AbyssinicaSIL | 250
tig | Tigre | AbyssinicaSIL | 250
dz, bo and languages in Tibetan script | Dzongkha, Standard Tibetan,… | Jomolhari | 352.4
jv-java | Javanese (ꦧꦱꦗꦮ) | Tuladha Jejeg | 399
sux | Sumerian | Akkadian | 772
akk | Akkadian | Akkadian | 772

If around 450 languages appear on a page (as in the language selector or similar cases), the total font size is only 50 KB, because the Autonym font is used.

You can browse all the fonts included.

How webfonts are applied

Webfonts are disabled by default in all WMF wikis. Users need to enable them using the ULS preferences window. Once enabled, webfonts are applied based on the decision algorithm given below.

Webfonts are applied using a jQuery library, jquery.webfonts. jquery.webfonts is configured with the MW webfonts repository.

ULS WebFonts Workflow Diagram

On page load, jquery.webfonts starts its job. The plugin is applied to the body. A webfont applicable to the body is chosen and applied. jquery.webfonts starts looking for HTML elements with the lang attribute. If the element has a language attribute and there is a font applicable for that language, the font is applied to that element. The CSS font-family attribute of the element is also checked. If the font family contains a font, and it is present in the webfonts repository, that font should be delivered to the client. That much is the high level working of jquery.webfonts. Now to the details:

Identifying the font to be applied

By specifying font-family

Inside wiki text such as <span style="font-family: YourFontName;">YourText</span>, the webfonts extension checks whether the font is available; if so, it is downloaded to the client so that the reader has no difficulty reading the text even if the specified font is not available on their computer.

By specifying language

Inside wiki text such as <span lang="LanguageCode">YourText</span>, the extension checks whether any font is available for the given language; if so, it is downloaded to the client. If there are multiple fonts for the language, the default font is used. If the default font is not preferred, use the font-family approach to specify the font. If the tag has both lang and font-family definitions, font-family gets precedence.
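The precedence between the two approaches can be sketched as a small function. This is an illustration only, not the actual jquery.webfonts code: the repository layout, the chooseFont name and the plain-object stand-in for a DOM element are all assumptions, and listing OpenDyslexic as an optional font for English is likewise assumed for the example.

```javascript
// Hypothetical, simplified font repository. defaultFont is null when the
// language uses the system font by default.
const repository = {
  hi: { fonts: ['Lohit Devanagari'], defaultFont: 'Lohit Devanagari' },
  am: { fonts: ['AbyssinicaSIL'], defaultFont: 'AbyssinicaSIL' },
  en: { fonts: ['OpenDyslexic'], defaultFont: null }
};

// element is a plain object standing in for a DOM element:
//   { lang: 'hi', fontFamily: ['Some Font', 'sans-serif'] }
// Returns the name of the webfont to deliver, or null for none.
function chooseFont(element, repo) {
  // Collect every font name the repository can serve.
  const known = new Set();
  for (const entry of Object.values(repo)) {
    entry.fonts.forEach((name) => known.add(name));
  }
  // 1. A font-family that names a repository font takes precedence.
  for (const name of element.fontFamily || []) {
    if (known.has(name)) {
      return name;
    }
  }
  // 2. Otherwise fall back to the default font of the element's language.
  const entry = repo[element.lang];
  return entry ? entry.defaultFont : null;
}
```

With this sketch, an element with lang "hi" and a font-family of AbyssinicaSIL gets AbyssinicaSIL, while the same element without a font-family gets Lohit Devanagari.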

User preferences

For every language, users can choose a webfont from the available fonts or can choose to opt out—i.e., do not use webfonts for the language. The default font for a language is based on the user community requests. Depending on the language support available in operating systems, some communities do not demand default webfonts. For them, system fonts (no webfont) will be the default preference. For some other languages, the user community demanded setting a specific font as the default font to be applied. The list of languages with a non-system default font is given above.

The user preferences are slightly more complicated than we imagined. These font preferences are provided using a cog icon in every page or as part of the language selector at the top of the page. But there is another font selector in MediaWiki preferences to select a font for the edit text area; the options include monospace, sans serif, etc. We want to respect that choice too. So before applying the webfonts to an edit area, we check for the user's preferences from the "Preferences" page of the wiki. That preference gets precedence.

But it doesn't stop there. Browsers apply monospace fonts to editable fields for some languages without any preference from users. Monospace makes sense only for a small number of scripts; e.g., it does not make sense for Arabic or Indian scripts. Both Google Chrome and Firefox apply a monospace font to the edit area if the calculated language is a monospace-applicable language. We need to respect this browser behavior too. Our code should not override it, so as not to annoy users by changing the appearance.
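The two checks described for edit areas can be summarized in a short sketch. The function name, the option names and the 'default' value for an unset edit-font preference are assumptions made for illustration; the real extension may structure this differently.

```javascript
// Hypothetical check run before a webfont is applied to an edit area.
//   editFontPreference: the font chosen on the wiki "Preferences" page,
//     or 'default' / undefined when the user made no choice (assumed values).
//   browserAppliesMonospace: true when the browser itself picked a
//     monospace font for the field's language.
function shouldApplyWebfontToEditArea(options) {
  const { editFontPreference, browserAppliesMonospace } = options;
  // The explicit MediaWiki preference gets precedence over webfonts.
  if (editFontPreference && editFontPreference !== 'default') {
    return false;
  }
  // Do not override the browser's own monospace choice.
  if (browserAppliesMonospace) {
    return false;
  }
  return true;
}
```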

More complications

Font inheritance

In a multi-language web page, if we apply a font to a top element associated with a language, as per usual CSS inheritance behavior it gets applied to the child nodes. What happens if one of the child node languages differs? For that language, there may be a font to be applied, or no font applicable. If there is a different font to be applied, there is nothing much to worry about, because applying that font breaks the inheritance. But if there is no font to be applied, the parent font gets inherited to this element. This is undesirable, since the child element language does not have a font as per preferences or as per the repository. But how will the font get inherited if the child node is a different language? This happens only when the parent and child share the same script.

To illustrate this with an example, consider a Hindi wiki page with a paragraph in Marathi. Both languages are written using Devanagari script, differing only by a few letters. If the Marathi paragraph did not define any font, a Hindi font will be applied to it. It is not disastrous, but still a bug. We have to break the inheritance at the level of the Marathi paragraph and avoid this happening. Explicitly defining a font family for that paragraph is the way.

Avoiding too much inline css

If we apply webfonts to all elements with a lang attribute, there will be many such elements in a DOM. If a child inherits the right font from its parent, there is no need to explicitly define a style for the child element. Figuring this out while parsing the body is also important.
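Both points, breaking unwanted inheritance and skipping redundant inline styles, can be illustrated with a tree walk over a simplified document model. This is a sketch under stated assumptions: nodes are plain objects rather than DOM elements, planFontStyles and fontForLang are hypothetical names, and resetting to a generic family such as sans-serif is one assumed way of explicitly defining a font family where no webfont applies.

```javascript
// node: { id, lang (optional), children (optional) }
// fontForLang(lang): returns a font name, or null when no webfont applies.
// resetFamily: the family used to break inheritance (assumed, e.g. 'sans-serif').
// Returns the list of inline styles that actually need to be written.
function planFontStyles(node, fontForLang, resetFamily, inherited = null, out = []) {
  let effective = inherited;
  if (node.lang !== undefined) {
    const own = fontForLang(node.lang);
    if (own !== inherited) {
      // Either a different font applies, or no font applies and the
      // parent's font must not leak into this element.
      out.push({ id: node.id, fontFamily: own === null ? resetFamily : own });
    }
    // When own === inherited, plain CSS inheritance is enough: no style.
    effective = own;
  }
  for (const child of node.children || []) {
    planFontStyles(child, fontForLang, resetFamily, effective, out);
  }
  return out;
}
```

For a Hindi body containing a Hindi paragraph and a Marathi paragraph with no font of its own, the sketch emits one style for the body and one reset for the Marathi paragraph; the Hindi paragraph needs no inline style.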

Font detection

Webfonts generates CSS that uses local(fontFamilyName) as a font source, so that a font is not downloaded if a font with the same font-family name already exists on the local computer.

That means that if we have FontA as default for LangA, applying FontA to an element does not mean that it always gets downloaded to the client machine. It gets downloaded only when the client computer does not have FontA.
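As a rough illustration of this behavior, an @font-face rule can list a local() source before the downloadable files, so the browser only fetches a file when no installed font has that name. The function below is a sketch, not the extension's actual generator; the fontFaceRule name and the example URLs are hypothetical.

```javascript
// Build an @font-face rule that prefers a locally installed font.
//   family: the font-family name, e.g. 'FontA'
//   urls:   { woff: '...', ttf: '...' } (hypothetical paths; either may be omitted)
function fontFaceRule(family, urls) {
  // local() comes first: if the font is installed, nothing is downloaded.
  const sources = [`local('${family}')`];
  if (urls.woff) {
    sources.push(`url('${urls.woff}') format('woff')`);
  }
  if (urls.ttf) {
    sources.push(`url('${urls.ttf}') format('truetype')`);
  }
  return `@font-face {\n  font-family: '${family}';\n  src: ${sources.join(', ')};\n}`;
}
```

Calling fontFaceRule('FontA', { woff: 'fonts/FontA.woff' }) produces a rule whose src starts with local('FontA') followed by the WOFF URL.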

So algorithms to detect local font availability do not help us here. Such algorithms help to programmatically decide an aesthetically good font stack for a target platform. (E.g., see http://www.lalit.org/lab/javascript-css-font-detect/ and read the comments.)

So it is quite possible that even if the user's computer has a font for a particular language, the webfont system will download another font from the server to render text in that language. But in most cases this is the user's choice. For a few languages—those with a default font, listed above—this download occurs without a user preference. Each of those languages is configured to use a default font because of a user community request. So in one sense, it is again a user choice.

It is feasible to detect tofu for some languages. (Tofu is the glyph named .notdef in a font, as per the OpenType specification; it is shown when a requested glyph is not present in the font.) But that alone does not tell us whether a user will be able to read the content without issues, as explained at the beginning of this document.

Even when limited to the languages with default fonts, our servers experienced a significant number of font requests. This prompted us to tune our algorithm further by incorporating the tofu detection described above. The current version of ULS webfonts has tofu detection in place, but only for languages that have default fonts.

A generic tofu detection algorithm that works for all languages is a complex challenge. We are doing some experiments on that aspect. Behdad Esfahbod is working on a special blank font named Tofu that is only a few bytes in size. But the technology used for that is very advanced and current browsers do not support it.

Font fallback

It is quite possible that the text to which the font is applied has characters outside the font. It is common practice to add fallback fonts to the font-family attribute. We add sans-serif as the fallback font.

But IE 6 is not capable of doing font fallback properly. Because of this, if the page contains text within elements that have no lang information, squares are displayed. The extension is therefore disabled in IE 6.

Browser compatibility

Internet Explorer 6 and 7 are blacklisted.

Notable browser bugs

Google Chrome (and Chromium) up to version 15 has a bug that causes some complex-script webfonts to be rejected by the browser. This issue has been observed only with the Meera font for Malayalam. The OTS tool does not show any errors in the font, and the issue is not present in Chrome 17. See the bug report ([3]).

In Mac OS X 10.7, Opera browser is not capable of rendering complex scripts like Malayalam. See our detailed test report about various OS-browser combinations. Also see Bug 31823 - Opera 11.51 does not display saz font.

OSX 10.7.2/Opera 11.60 has no fallback for Latin characters.

Optimizations for bandwidth usage

Font formats

To convert TrueType or OpenType fonts to the WOFF and EOT formats, Google's sfntly tool is used. For the EOT format, MicroType Express compression is used for optimal compression. Since MicroType Express gives an approximate 15% gain in compression over gzip, a web font service striving for maximum performance can benefit from implementing EOT compression.

For some languages, the font size can be very big. With careful testing, we can remove all hinting instructions to reduce the size.

Caching configuration

To ensure that the web fonts files are cached on the clients' machines, font file types must be added to the web server configuration. In Apache2 this consists of:

  • Adding font file extensions to the FileTimes regex at FilesMatch for the relevant directory, for example:
<FilesMatch "\.(gif|jpe?g|png|css|js|woff|svg|eot|ttf)$">
  • Adding ExpiresByType values to the relevant MIME types, similarly to image MIME types. Note that there's no standard MIME type for TTF. application/x-font-ttf is used for Wikimedia.
  • Adding the MIME types:
AddType application/x-woff .woff
AddType application/vnd.ms-fontobject .eot
AddType application/x-font-ttf .ttf

For a full example see the caching configuration update done for the Wikimedia cluster.

Bits caches eqiad cluster network, 2013. Generated from http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=Bits+caches+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report

Browser Cache

If the font is available on the user's local system, it is not downloaded from the MediaWiki server; it is taken from the user's computer. Otherwise, the font is downloaded from the server only once, i.e. when the user selects the font for the first time. From then onwards, the font is taken from the local cache.

Autonym Font

When an article on Wikipedia is available in multiple languages, we see the list of the languages in a column on one side of the page. The names on the list are written in the script that the language uses (also known as the language autonym). This means that all the appropriate fonts need to be present for the autonyms to be displayed. For instance, an article like the one about the Nobel Prize is written in more than 125 languages and needs around 35 fonts to display the names of the languages. Initially this was handled by the native fonts available on the reader's device. If a font was not present, the user would see square boxes (known as tofu) instead of the name of a language. To work around this problem, not just for the language list but for other sections in the content area as well, the Universal Language Selector (ULS) provided a set of webfonts that were loaded with the page. While this ensured that more language texts were displayed, it also caused a serious problem: the additional fonts added overhead to the page, and users saw pages loading much slower than before. To improve client-side performance, webfonts are not applied to the interlanguage links area.

Not using webfonts for the interlanguage links seemed like the easiest solution, but it also takes us back to the sub-optimal multilingual experience that we were trying to solve in the first place. Articles may be perfectly displayed thanks to webfonts, but if a link is not displayed in the language list, many users will not be able to discover that there is a version of the article in their language. The autonyms are not just required for the interlanguage links. They are also required for the Language Search and Selection window of the Universal Language Selector, which allows users to find their language if they are on a wiki displaying content in a script unfamiliar to them.

The Autonym font tries to solve this. The font contains the glyphs and OpenType rules needed for rendering the language autonyms, and for each language it contains only those glyphs. For example, for Thai, the font has only the glyphs required for rendering ไทย.

Font available at [4]

In MediaWiki, add a class 'autonym' to the elements containing language autonyms. The font will be applied automatically from ULS extension.

WebFonts and FOUT

One unavoidable consequence of using webfonts is the flash of unstyled text (FOUT). Most webfonts don't take long to load, but some do. Browser behavior differs while waiting for the font:

  1. Use a system font immediately while the webfont loads. When it has loaded, change the font (Firefox behavior until 2011).
  2. Keep the text invisible for up to 3 seconds, then fall back to a system font. When the webfont finally loads (e.g. at 6 seconds), change the text to use it (current Firefox behavior).
  3. Keep the text invisible until the webfont is ready (current Safari and Chrome behavior).

From Chrome 35 onward, Chrome is going to follow Firefox behavior.
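The three behaviors listed above can be compared with a small model. This is a simplified sketch for illustration, not browser code: the visibleText function and the policy names are hypothetical, times are in seconds, and the 3-second timeout is the figure quoted in the list.

```javascript
// Model of what the reader sees at time t (seconds after page load)
// when the webfont finishes loading at fontLoadedAt.
//   'fallback-immediately': system font until the webfont arrives
//   'hide-then-fallback':   invisible for 3 s, then system font
//   'hide-until-loaded':    invisible until the webfont arrives
function visibleText(policy, t, fontLoadedAt) {
  if (t >= fontLoadedAt) {
    return 'webfont';
  }
  switch (policy) {
    case 'fallback-immediately':
      return 'system';
    case 'hide-then-fallback':
      return t < 3 ? 'invisible' : 'system';
    case 'hide-until-loaded':
      return 'invisible';
    default:
      throw new Error('unknown policy: ' + policy);
  }
}
```

For a font that takes 6 seconds to load, visibleText('hide-then-fallback', 4, 6) reports the system font, while visibleText('hide-until-loaded', 4, 6) reports invisible text.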

Some developers use WebFont Loader JS or Typekit in order to have control over this behavior. But all of the above options are disruptive to users. The Wikimedia webfonts implementation does not do anything special to address this and relies on the browser's behavior.

Font load events are coming to browsers and provide great insight over when fonts are loaded, allowing you to customize behavior without depending on any heavy javascript libraries.

The Google Chrome team is working on improving the situation by introducing WOFF2 to further reduce font size, along with performance improvements and font load events.

See a discussion initiated by Paul Irish on this topic

Client side performance optimization

The ResourceLoader modules were refactored so that only the required JavaScript is loaded, which reduces the JavaScript payload. Optimization of the DOM scan at page-ready time is described at https://etherpad.wikimedia.org/p/VtkX1MynVt.

Usage statistics

Webfonts are disabled by default in WMF wikis because of high bandwidth usage on the clusters and potentially slower page loads for users. Webfonts need to be enabled manually by users using the ULS preferences window. The following graph shows the number of people enabling this feature since the day the default-disabled state was introduced.

Frequently Asked Questions

I have the font for my language in my computer. Can I disable the feature using preferences?

Yes, you can set system font as the preference for your language.

If the font provided by Webfonts is the same font you have in your computer, effectively webfonts will not be applied for that language. In this case a user need not bother about selecting font preferences.

Can't the webfonts system detect whether a local computer has font for a language and download fonts?

See also #Font detection

It is possible to detect whether a specific font X is present on the computer, but we must know the name of the font to do that test, and the test is not straightforward. We need to create two temporary content elements in the page, rendering one with a fallback font like sans-serif and the other with the specific font. If the width and height of the rendered elements match, one can assume that the same font was used for both, which happens when the fallback font is used instead of the specific font. The sample text must be text that the specific font supports, so if we are testing Hindi, the text should be in Hindi. The characters should also be chosen so that different fonts render them with differing dimensions. This becomes a complex and non-scalable solution when we are talking about hundreds of languages. Knowing all possible font names for a language is also difficult.
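The measuring test described above might look roughly like the sketch below. It is illustrative only: isFontAvailable and measureInBrowser are hypothetical names, the measuring function is passed in as a parameter so the comparison logic can be tried outside a browser, and the 72px size is an arbitrary choice. The check reports a font as missing whenever its metrics happen to match the fallback, which is one reason the approach is fragile.

```javascript
// Returns true when rendering sampleText with fontName gives different
// dimensions than rendering it with the fallback alone.
//   measure(text, fontFamily) must return { width, height }.
function isFontAvailable(fontName, sampleText, measure, fallback = 'sans-serif') {
  const base = measure(sampleText, fallback);
  const candidate = measure(sampleText, `'${fontName}', ${fallback}`);
  // Same size means the browser most likely used the fallback both times.
  return base.width !== candidate.width || base.height !== candidate.height;
}

// A possible measure() for browsers: render the text in a hidden span.
function measureInBrowser(text, fontFamily) {
  const span = document.createElement('span');
  span.style.position = 'absolute';
  span.style.visibility = 'hidden';
  span.style.fontSize = '72px'; // a large size exaggerates differences
  span.style.fontFamily = fontFamily;
  span.textContent = text;
  document.body.appendChild(span);
  const size = { width: span.offsetWidth, height: span.offsetHeight };
  document.body.removeChild(span);
  return size;
}
```

In a browser, a call such as isFontAvailable('Lohit Devanagari', 'हिन्दी', measureInBrowser) would test one font with Hindi sample text; every additional language needs its own font names and sample text.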

It may be possible to detect tofu for a given language, but still not foolproof since not having tofu does not mean that the system has usable fonts for a language.

So we leave the decision of whether a default font should be applied for a language to user community consensus.