Universal Language Selector/WebFonts

Webfonts - Introduction


Wikimedia wikis are available in nearly 300 languages. Although the English wikis receive much of the traffic, the wikis in other languages (using Latin and non-Latin scripts) also generate a considerable volume of content. Additionally, the Latin-script wikis contain pages with content fragments written in non-Latin scripts. Apart from Wikipedia, sister projects like Wiktionary and Wikisource are also largely multilingual.

Languages written in the Latin script rarely face serious issues with the availability of fonts on the variety of devices that readers use. But non-Latin languages often face issues with reading content when the fonts on the operating system are unavailable, outdated, bug-ridden or aesthetically sub-optimal for reading.

In some cases, non-Latin wiki pages also display banners at the top of the page (for example, see the Malayalam Wikipedia) asking users to visit a help page if they encounter problems while reading the text. These help pages contain links to download fonts and installation instructions for different operating systems.

For many languages, if the operating system is not recent, chances are that fonts are not available by default. For example, the fonts for the Odia script are available only in very recent versions of the Windows operating system. Similarly, new characters keep being encoded in Unicode for non-Latin languages, and they are not available in the fonts that already exist in the various operating systems (including the newer versions). For example, five characters that were encoded for the Malayalam script can be used only with the fonts on Windows 7. The same fonts, when used on other operating systems or versions, may not necessarily provide similar usability for reading content.

Wikimedia attempted to solve this problem in 2011 using the MediaWiki extension Webfonts. In 2013, this extension was deprecated and all its features integrated into the Universal Language Selector extension, which provides additional language features including input methods and language selection.

Besides solving the problem of non-availability of fonts for many languages, this extension provides a way to serve fonts in wikipages for a few other (somewhat interesting) use-cases as well.

Webfonts technology also makes it possible to offer special fonts that address accessibility issues related to reading. The OpenDyslexic font is one such special font: it uses font weights and design to alleviate problems with reading content. This is particularly useful for readers with dyslexia.

Wikipedia's sister project Wikisource attempts to digitize and preserve literature whose copyright has expired. Some of these books require special fonts to convey the content properly. For example, a Hebrew grammar book needs a traditional writing style to explain its concepts. Webfonts help here by specifying and providing the exact font required.

Similarly, a text written in Cuneiform needs a special font to display properly.

Challenges
Webfonts technology is becoming more and more popular on the web. Google webfonts is one of the best-known providers of a free webfonts service. But the technology is still used mainly for aesthetic purposes. Non-Latin webfonts are not as popular as Latin-script fonts, and Google's webfont repository serves only a few non-Latin fonts.

To get the desired rendering, serving fonts is not enough. The operating system must also be capable of rendering the script using the font. Windows XP Service Pack 2 and older versions cannot render Indic complex scripts, whether or not a font is present. Android was not capable of rendering any complex scripts before the 4.x versions, even though its native browsers supported webfonts.

Rendering Inconsistency
In addition to issues with varied connections and browsers, it’s also important to consider how fonts are rendered across operating systems.

Mac OS and Windows have fundamentally different philosophies when it comes to rendering text. Mac OS uses anti-aliasing to smooth text, while Windows Vista and 7 use a technology called ClearType to render more detailed text. (Read more about ClearType here.)

Windows XP doesn’t do either (at least by default), and Linux might do any number of things depending on how it’s configured.

Windows has two different rendering modes: the newer DirectWrite and the older GDI. The two modes can produce substantially different results.

IE9 and Firefox use DirectWrite, and they can use hardware acceleration to render pages. If hardware acceleration isn’t available or has been turned off, fonts don’t look quite as smooth in these browsers.

Chrome uses GDI, and Firefox 7 implemented a pref that specifies a list of fonts for which GDI rendering will be used at sizes below 16px. By default the list contains fonts such as Arial, Tahoma, Verdana, Trebuchet MS, Segoe UI and Consolas. Downloadable fonts will always use DirectWrite.

Windows ClearType takes advantage of the fact that pixels on LCD screens are comprised of red, green, and blue vertical bars. It treats these bars as narrow pixels, allowing Windows to display more detail in text.

ClearType can make text sharper and more readable, but it sometimes creates colored shadows around text, and some people complain that it gives them headaches. There is an "Adjust ClearType Text" tool to configure ClearType for your display. If it isn't configured properly, ClearType can actually make text harder to read.

CSS3 has a property called "font-smooth" that is designed to provide an anti-aliasing of web fonts.

As of this writing, it does not appear to be supported by any of the major modern browsers.

Webfont rendering on Windows is a challenge, and the Google webfonts service also suffers from poor rendering. Stack Overflow has several questions about bad font rendering in Google Chrome, even with Google webfonts.

Users of Google webfonts use SVG fonts whenever possible to avoid depending on operating system rendering engines. This works well as long as the font is simple: without hints, OpenType tables or instructions. A font for a Latin-script language (though not for every such language) may render correctly as an SVG font. But non-Latin fonts usually need OpenType instructions for correct rendering. The SVG font format does not support them, so an SVG font for, say, Hindi is practically guaranteed to render incorrectly.

Impact on bandwidth
The number of glyphs in non-Latin fonts poses a challenge for bandwidth usage. While a typical English font can be smaller than 100 KB, a Malayalam script font will have about 1000 glyphs and a size above 300 KB in TrueType (.ttf) format.

The Jomolhari font for the Tibetan script is 352.4 KB. The Akkadian font used for Cuneiform is 772 KB in TrueType format. The Tuladha Jejeg font for the Javanese script is 399 KB. But some non-Latin fonts are very small. The Lohit Devanagari font for the Devanagari script (Hindi, Marathi and Sanskrit) is just 37 KB. The smallest font we have is the Saweri font for the Buginese language, at 3.6 KB. The Shapour font used for the Pahlavi script is 4 KB.

Small fonts are perfect for webfonts. But anything larger than, say, 100 KB has a bandwidth cost, because the font is downloaded along with the page content. A typical thumbnail image shown in a wiki page is around 15-20 KB (the image of Barack Obama on his wiki page is 18.6 KB), so adding a small font is comparable to having one more image in the page. A bigger font consumes more bandwidth. Several bandwidth optimizations are in place: the font is downloaded only once, when you visit the page for the first time, and webfont formats like EOT and WOFF are highly compressed. While the font is being downloaded, browsers try to render the content using a system fallback font, and then re-render with the downloaded font when it is ready. This causes a flash in the webpage, often known as FOUT (flash of unstyled text). The FOUT is more noticeable when the font file is large. With current browser technology this is unavoidable.

The bandwidth issue gets more serious if the wiki page has multiple languages and each uses a webfont. A typical wiki page is multilingual: every wiki page has a sidebar of interwiki links, which point to articles about the same topic in other languages. The wiki page about Obama exists in 205 languages, so that many interwiki links exist.

Imagine a wiki page that contains all languages, where we use our webfonts repository to render each language that has a default font. The total bandwidth usage comes to 3.8 MB. Does such a page exist? Yes, see https://translatewiki.net/wiki/Special:SupportedLanguages. But the strings there are all language names. (We handle this situation in a different way. Read on.)
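The arithmetic behind such an estimate can be sketched as follows. This is only an illustration, not the code used to arrive at the 3.8 MB figure: the font sizes are the ones quoted above, while the language-to-font mapping and the function name fontPayloadKB are assumptions made for the example. The point it shows is that a font shared by several languages on the same page is counted once.

```javascript
// Font sizes in KB, taken from the figures quoted in this section.
const FONT_SIZES_KB = {
  'Jomolhari': 352.4,
  'Akkadian': 772,
  'Lohit Devanagari': 37,
  'Saweri': 3.6
};

// Illustrative mapping from language code to default font (an assumption
// for this sketch, not a copy of the real repository).
const DEFAULT_FONT = {
  bo: 'Jomolhari',
  dz: 'Jomolhari',
  akk: 'Akkadian',
  hi: 'Lohit Devanagari',
  mr: 'Lohit Devanagari',
  bug: 'Saweri'
};

// Sum the size of every distinct default font needed by the page.
// Languages without a default webfont contribute nothing.
function fontPayloadKB(pageLanguages) {
  const fonts = new Set();
  for (const lang of pageLanguages) {
    const font = DEFAULT_FONT[lang];
    if (font) {
      fonts.add(font);
    }
  }
  let total = 0;
  for (const font of fonts) {
    total += FONT_SIZES_KB[font];
  }
  return total;
}

console.log(fontPayloadKB(['hi', 'mr', 'bo', 'en']));
```

Hindi and Marathi share Lohit Devanagari here, so the page pays for that font only once; English has no default webfont and adds nothing.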

Selection of fonts
Availability of freely licensed fonts for non-Latin languages is a challenge. There are a lot of languages without any freely licensed font. Sometimes the available font is of poor quality, and this does not mean that good proprietary fonts exist either. Because of this, for some languages we end up using a low-quality font since there is no other choice. Large-scale deployment of these fonts brings in a lot of users, and they report bugs. Inactive upstreams for the fonts are another headache. Most of these fonts are not version controlled and have no issue tracker. Sometimes the maintainer or author of the font does not respond to our attempts to communicate.

Because of this, we try to choose fonts with an active upstream, version control and an issue tracker. But sometimes we end up with no choice.

The Sinhala language has only one freely licensed font, LKLUG, a font by the Lanka Linux User Group. When we enquired whether that font is actively maintained, the answer was no. The GNU FreeFont project's FreeSerif font has Sinhala glyphs that were copied from the LKLUG font, but they have not been updated for a long time. The Sri Lankan government recently released some Sinhala fonts, but unfortunately the licensing conditions prevent us from using them.

The Divehi language, written in the Thaana script, did not have any fonts. GNU FreeSans had glyphs for this script, but FreeSans as such cannot be used as a webfont since it is a large font covering a lot of scripts. With the help of the GNU FreeFont project, these glyphs were extracted to create a new font for Thaana named Free Thaana.

The Tamil language (written in the Tamil script) had only one freely licensed font, Lohit Tamil from Red Hat, but the Tamil community did not want it as the default font for Tamil because of its poor rendering quality. So it is kept as an optional font. Lohit Tamil is also distributed by Google as part of Google webfonts.

Fonts were added to the font repository based on bug reports we received in Wikimedia Bugzilla. There were also some additions based on community requests made through mailing lists, hackathons, talk pages and so on.

We include the bug reports or request sources for each font addition in the source code so that we can trace why and when a particular font was added.

Statistics about the fonts in the repository
Number of languages supported by ULS for webfonts: 115

Number of fonts: 82

Number of languages having 'system font' as default font : 49

For these languages, no font is embedded by default unless the user chooses one through language settings -> font selection. Most of these are languages with good support on computers.

36 of these 49 languages also have an optional font, OpenDyslexic, to help people with dyslexia read content.

Number of languages with a default webfont: 66

For these languages there is a default webfont, which is embedded unless the user has a preference that prevents it. These default fonts are in general suggested or requested by communities. The list changes slowly as communities find a better font or find serious issues with a default font. Also, for languages with bigger default fonts, we need to talk with the communities about making them non-default if possible.

Out of these 66, 23 are languages in the Tibetan script using the Jomolhari font, with a size of 352.4 KB. There was a request to support all languages with this script.

Out of the 66, only 30 languages are known to MediaWiki (as per Names.php). The rest can appear in arbitrary wiki articles.

Average font size for these 30 languages: 153 KB. Smallest: 3.6 KB. Biggest: 772 KB (Akkadian, for akk).

Table of languages and default fonts
If around 450 languages appear in a page (as in a language selector or similar cases), the total font size is 50 KB, using the Autonym font.

You can browse all the fonts included.

How are webfonts applied?
Webfonts are applied using a jQuery library, jquery.webfonts, which is configured with the MediaWiki webfonts repository.



On page load, jquery.webfonts starts its job. The plugin is applied to the body, and a webfont applicable to the body is chosen and applied. jquery.webfonts then looks for HTML elements with a lang attribute. If an element has a lang attribute and a font is applicable for that language, the font is applied to that element. The CSS font-family property of the elements is also checked. If the font family names a font that is present in the webfonts repository, that font is delivered to the client. That is the high-level working of jquery.webfonts. Now to the details:

Identifying which font to be applied
By specifying font-family

When wiki text specifies a font-family for an element (with content such as YourText), the webfonts extension checks whether the font is available in the repository; if so, it is downloaded to the client. So the reader will not face any difficulty in reading the text even if the specified font is not installed on their computer.

By specifying language

When wiki text sets a lang attribute on an element (with content such as YourText), we check whether the extension has any font available for that language; if so, it is downloaded to the client. If there are multiple fonts for the language, the default font is used. If the default font is not preferred, use the font-family approach to specify the font. If the tag has both lang and font-family definitions, font-family gets precedence.
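The two lookup paths and their precedence can be sketched as a small function. This is a simplified model and not the jquery.webfonts source: the repository object, its shape and the name resolveWebfont are hypothetical. One further assumption is labeled in the code: when a font-family is given but names nothing in the repository, the sketch falls through to the language default.

```javascript
// Hypothetical repository: the fonts we can serve, and for each language
// an ordered font list whose first entry is the default. 'system' means
// "no webfont by default".
const repository = {
  fonts: new Set(['Lohit Devanagari', 'Jomolhari', 'OpenDyslexic']),
  languages: {
    hi: ['Lohit Devanagari'],
    bo: ['Jomolhari'],
    en: ['system', 'OpenDyslexic']
  }
};

// element: { lang?: string, fontFamily?: string }
// Returns the webfont to deliver, or null when none applies.
function resolveWebfont(element, repo) {
  // 1. An explicit font-family takes precedence over lang.
  if (element.fontFamily) {
    const families = element.fontFamily
      .split(',')
      .map((name) => name.trim().replace(/^['"]|['"]$/g, ''));
    const match = families.find((name) => repo.fonts.has(name));
    if (match) {
      return match;
    }
    // Assumption of this sketch: fall through to the language default.
  }
  // 2. Otherwise use the default font of the element's language.
  if (element.lang) {
    const fonts = repo.languages[element.lang];
    if (fonts && fonts.length > 0 && fonts[0] !== 'system') {
      return fonts[0];
    }
  }
  return null;
}

console.log(resolveWebfont({ lang: 'hi', fontFamily: "'Jomolhari', serif" }, repository));
```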

User preferences
For every language, users can choose a webfont from the available fonts or can opt out, that is, not use webfonts for that language. The default font for a language is based on requests from the user community. Depending on the language support available in operating systems, some communities do not ask for default webfonts; for them, system fonts (no webfont) are the default preference. For some other languages, the user community asked for a specific font to be applied by default. The list of languages with a non-system default font is given above.

User preferences are slightly more complicated than one might imagine. These font preferences are offered through a cog icon on every page or as part of the language selector at the top of the page. But there is another font selector in the MediaWiki preferences, used to select an edit area font. It applies to the edit text area, and its options include monospace, sans-serif and so on. We want to respect that choice too. So before applying a webfont to an edit area, we check the user's preferences from the "Preferences" page of the wiki. That preference gets precedence. But it does not stop there. Browsers apply monospace fonts to editable fields for some languages without any preference from the user. Both Google Chrome and Firefox apply a monospace font to the edit area if the computed language is one for which monospace is applicable. Monospaced letters make sense only for a small number of languages; they do not make sense for Arabic or Indian languages. We need to respect this browser behavior too. Our code should not override it, otherwise users will be annoyed to see behavior different from what they are used to.

More complications
Font inheritance

In a multilingual web page, if we apply a font to a top-level element associated with a language, the usual CSS inheritance applies it to the child nodes as well. What happens if the language of one of the child nodes differs? For that language, there may or may not be a font to apply. If there is a different font to apply, there is not much to worry about, because applying that font breaks the inheritance. But if there is no font to apply, the child inherits the parent's font. That is not desired, since the child element's language has no font as per the preferences or the repository. But how can the inherited font affect a child node in a different language? It matters only when the parent and child share the same script.

To illustrate, consider a Hindi wiki page with a paragraph in Marathi. Both languages are written in the Devanagari script and differ by a few letters. If the Marathi paragraph does not define any font, the Hindi font is applied to it. It is not disastrous, but it is still a bug. We have to break the inheritance at the level of the Marathi paragraph to prevent this. Explicitly defining a font family for that paragraph is the way to do it.

Avoiding too much inline css

If we apply webfonts to all elements with a lang attribute, there will be many such elements in a DOM. If a child would inherit the right font from its parent anyway, there is no need to define a style explicitly for the child element. Working this out while parsing the body is also important.
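Both concerns, breaking unwanted inheritance and avoiding redundant inline styles, can be sketched as one tree walk. This is a minimal model under stated assumptions: it runs on plain objects rather than a real DOM, the function name planFontStyles and the node shape are hypothetical, and resetting to sans-serif when no webfont applies is a choice made for this sketch.

```javascript
// node: { id: string, lang?: string, children?: node[] }
// fontForLang: language code -> webfont name (absent when no webfont applies).
// Returns the list of elements that need an explicit font-family.
function planFontStyles(node, fontForLang, inheritedFont = null, inheritedLang = null, out = []) {
  const lang = node.lang || inheritedLang;
  let effectiveFont = inheritedFont;

  if (node.lang && node.lang !== inheritedLang) {
    const wanted = fontForLang[node.lang] || null;
    if (wanted !== inheritedFont) {
      // Either a different webfont applies, or none applies and the
      // parent's font must not leak into this element (assumed reset
      // value: sans-serif).
      out.push({ id: node.id, fontFamily: wanted || 'sans-serif' });
      effectiveFont = wanted;
    }
    // If wanted === inheritedFont, plain CSS inheritance already does
    // the right thing and no inline style is emitted.
  }

  for (const child of node.children || []) {
    planFontStyles(child, fontForLang, effectiveFont, lang, out);
  }
  return out;
}

// A Hindi page with a Hindi paragraph, a Marathi paragraph and a Tibetan block.
const page = {
  id: 'body', lang: 'hi', children: [
    { id: 'p1', lang: 'hi' },
    { id: 'p2', lang: 'mr' },
    { id: 'p3', lang: 'bo', children: [{ id: 's1', lang: 'bo' }] }
  ]
};
const styles = planFontStyles(page, { hi: 'Lohit Devanagari', bo: 'Jomolhari' });
console.log(styles);
```

In this example only body, p2 and p3 receive a style: p1 and s1 inherit the correct font, while the Marathi paragraph p2 gets an explicit font family so that the Hindi font does not apply to it.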

Font detection
Webfonts generates CSS that uses local(fontFamilyName) to make sure fonts are not downloaded if a font with the same font-family name exists on the local computer.

That means that if FontA is the default for LangA, applying FontA to an element does not mean that it is always downloaded to the client's machine. It is downloaded only when the client computer does not have FontA.
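A rule of this kind can be generated as in the sketch below. The function name fontFaceRule and the file URLs are hypothetical, and the exact CSS emitted by the extension may differ (for example, EOT for older Internet Explorer versions needs its own src declaration, which is left out here). What it shows is local() being listed before any url() source, so the browser tries the installed font first.

```javascript
// Build an @font-face rule that prefers a locally installed copy.
// files: { woff?: string, ttf?: string } with URLs of the font files.
function fontFaceRule(family, files) {
  // local() comes first: the browser only moves on to the url() sources
  // when no installed font has this family name.
  const sources = [`local('${family}')`];
  if (files.woff) {
    sources.push(`url('${files.woff}') format('woff')`);
  }
  if (files.ttf) {
    sources.push(`url('${files.ttf}') format('truetype')`);
  }
  return [
    '@font-face {',
    `  font-family: '${family}';`,
    `  src: ${sources.join(', ')};`,
    '}'
  ].join('\n');
}

// Hypothetical file locations, for illustration only.
const rule = fontFaceRule('Lohit Devanagari', {
  woff: 'fonts/LohitDevanagari.woff',
  ttf: 'fonts/LohitDevanagari.ttf'
});
console.log(rule);
```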

So algorithms that detect whether a font is available on the local machine do not help us here. Such algorithms help to decide programmatically on an aesthetically good font stack for a target platform.

So it is quite possible that a user's computer has a font for a particular language, but the webfont system still downloads another font from the server to render text in that language. In most cases, though, this is the user's choice. There is a small set of languages for which this download happens without a user preference: the languages with a default font, listed above. Each of those languages is configured to use a default font because of a user community request. So in one sense, it is again the users' choice.

It is feasible to detect tofu for some languages. (Tofu is the glyph named .notdef in the OpenType specification, shown when a requested glyph is not present in the font.) But that alone does not determine whether a user will be able to read the content without issues, as explained at the beginning of this document.

Even for the languages with default fonts, we saw a significant number of font requests to our servers. This urged us to tune our algorithm further to incorporate the tofu detection mentioned above. The current version of ULS webfonts has tofu detection in place. Note that detection is done only for languages that have default fonts.

A generic tofu detection algorithm that works for all languages is a complex challenge. We are doing some experiments in that area. Behdad Esfahbod is attempting a special blank font named Tofu, only a few bytes in size. But the technology it uses is very advanced, and current browsers do not support it.

Font fallback
It is quite possible that the text to which a font is applied will contain characters outside the font. It is common practice to add fallback fonts to the font-family property. We add sans-serif as the fallback font.

However, IE 6 is not capable of doing font fallback properly. Because of this, if the page contains text within elements that have no lang information, squares are displayed. The extension is therefore disabled in IE 6.
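The fallback stack described in this section can be sketched as a small helper. The name buildFontStack is hypothetical, and the quoting rule (quote family names, leave generic CSS keywords bare) is standard CSS practice rather than something taken from the extension's code.

```javascript
// Generic CSS family keywords must not be quoted, or they would be
// treated as ordinary font names.
const GENERIC_FAMILIES = new Set(['serif', 'sans-serif', 'monospace', 'cursive', 'fantasy']);

function quoteFamily(name) {
  return GENERIC_FAMILIES.has(name) ? name : `'${name}'`;
}

// Build the font-family value: the webfont first, then the fallback
// used for characters the webfont does not cover.
function buildFontStack(webfont, fallback = 'sans-serif') {
  if (!webfont) {
    return quoteFamily(fallback);
  }
  return `${quoteFamily(webfont)}, ${quoteFamily(fallback)}`;
}

console.log(buildFontStack('Meera'));
```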

Browser compatibility
Internet Explorer 6 and 7 are blacklisted.

Notable browser bugs
Google Chrome (and Chromium) up to version 15 has a bug that causes some complex-script webfonts to be rejected by the browser. This issue was observed only for the Meera font for Malayalam. The OTS tool does not show any errors in the font. The issue is not present in Chrome 17. See Bug report.

On Mac OS X 10.7, the Opera browser is not capable of rendering complex scripts like Malayalam. See our detailed test report about various OS-browser combinations. Also see Bug 31823 - Opera 11.51 does not display saz font.

OSX 10.7.2/Opera 11.60 has no fallback for Latin characters.

Font formats
To convert TrueType or OpenType fonts to the WOFF and EOT formats, Google's sfntly tool is used. For EOT, MicroType Express compression is used for optimal compression. Since MicroType Express gives an approximate 15% gain in compression over gzip, a web font service striving for maximum performance can benefit from implementing EOT compression.

For some languages the font can be very big. With careful testing, we can remove all hinting instructions to reduce the size.

Caching configuration
To ensure that the web fonts files are cached on the clients' machines, font file types must be added to the web server configuration. In Apache2 this consists of:


 * Adding font file extensions to the FileTimes regex at FilesMatch for the relevant directory, example:
 * Adding ExpiresByType values to the relevant MIME types, similarly to image MIME types. Note that there's no standard MIME type for TTF. application/x-font-ttf is used for Wikimedia.
 * Adding the MIME types:

AddType application/x-woff .woff
AddType application/vnd.ms-fontobject .eot
AddType application/x-font-ttf .ttf

For a full example see the caching configuration update done for the Wikimedia cluster.



Browser Cache
If the font is available on the user's local system, it is not downloaded from the MediaWiki server; it is taken from the user's computer. Otherwise, the font is downloaded from the server only once, that is, when the user selects the font for the first time. From then onwards, the font is taken from the local cache.

Autonym Font
When an article on Wikipedia is available in multiple languages, we see the list of languages in a column on one side of the page. The names on the list are written in the script that each language uses (also known as the language autonym). This also means that all the appropriate fonts need to be present for the autonyms to be displayed. For instance, an article like the one about the Nobel Prize is written in more than 125 languages and needs around 35 fonts to display the names of the languages. Initially this was handled by the native fonts available on the reader's device. If a font was not present, the user would see square boxes (known as tofu) instead of the name of a language. To work around this problem, not just for the language list but for other sections in the content area as well, the Universal Language Selector (ULS) provided a set of webfonts that were loaded with the page. While this ensured that more language texts were displayed, it also introduced a serious problem. The additional fonts added overhead to the page, and users saw pages loading much slower than before. To improve client-side performance, webfonts are not applied to the interlanguage links area.

Not using webfonts for the interlanguage links seemed like the easiest solution, but it also takes us back to the sub-optimal multilingual experience that we were trying to fix in the first place. Articles may be perfectly displayed thanks to webfonts, but if a link is not displayed in the language list, many users will not be able to discover that there is a version of the article in their language. The autonyms are not required just for the interlanguage links. They are also required for the Language Search and Selection window of the Universal Language Selector, which allows users to find their language if they are on a wiki displaying content in a script unfamiliar to them.

The Autonym font tries to solve this. The font contains the glyphs and OpenType rules for rendering the language autonyms, and for each language it contains only those glyphs. For example, for Thai, the font has only the glyphs required for rendering ไทย.
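The idea of keeping only what the autonyms need can be illustrated by collecting the distinct code points used by a set of autonyms. This is a rough sketch and not how the Autonym font is actually built: the function name codePointsForAutonyms is hypothetical, and for complex scripts the glyphs in a font do not map one-to-one to code points (conjuncts and ligatures need extra glyphs and OpenType rules), so the result is a lower bound on what a subset font must carry.

```javascript
// autonyms: language code -> language name written in its own script.
// Returns the sorted list of distinct Unicode code points required.
function codePointsForAutonyms(autonyms) {
  const needed = new Set();
  for (const name of Object.values(autonyms)) {
    // for...of iterates by code point, so characters outside the BMP
    // are not split into surrogate halves.
    for (const ch of name) {
      needed.add(ch.codePointAt(0));
    }
  }
  return [...needed].sort((a, b) => a - b);
}

const thai = codePointsForAutonyms({ th: 'ไทย' });
console.log(thai.map((cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0')));
```

Characters shared between autonyms are counted once, which is one reason a single font covering hundreds of language names can stay small.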

Font available at

In MediaWiki, add the class 'autonym' to elements containing language autonyms. The ULS extension will then apply the font automatically.

WebFonts and FOUT
One unavoidable consequence of using webfonts is the flash of unstyled text (FOUT). Most webfonts don't take long to load, but some do. Browser behavior differs while waiting for the font:


 * 1) Use a system font immediately while the webfont loads. When it does, change the font (Firefox behavior- till 2011)
 * 2) Keep the text invisible until 3 seconds, then fallback to a system font. When the webfont finally loads (e.g. at 6 seconds) change the text to use it. (current Firefox behavior).
 * 3) Keep the text invisible until the webfont is ready (current Safari and Chrome behavior)

From Chrome 35 onwards, Chrome is going to follow Firefox behavior.
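The three behaviors listed above can be compared with a small timeline model. It is a simulation for illustration only: no browser API is involved, and the strategy names and the function textStateAt are made up for this sketch. The 3 second timeout is the figure given in the list.

```javascript
// Returns what the reader sees at time tMs (milliseconds after the page
// starts rendering): 'fallback', 'invisible' or 'webfont'.
function textStateAt(strategy, tMs, fontLoadedAtMs, timeoutMs = 3000) {
  if (tMs >= fontLoadedAtMs) {
    return 'webfont'; // every strategy ends up here once the font arrives
  }
  switch (strategy) {
    case 'swap-immediately': // behavior 1: system font right away
      return 'fallback';
    case 'block-with-timeout': // behavior 2: invisible, then fallback
      return tMs < timeoutMs ? 'invisible' : 'fallback';
    case 'block-until-loaded': // behavior 3: invisible until the font is ready
      return 'invisible';
    default:
      throw new Error(`unknown strategy: ${strategy}`);
  }
}

// A slow font that arrives after 6 seconds, sampled once per second.
for (const strategy of ['swap-immediately', 'block-with-timeout', 'block-until-loaded']) {
  const timeline = [];
  for (let t = 0; t <= 6000; t += 1000) {
    timeline.push(textStateAt(strategy, t, 6000));
  }
  console.log(strategy, timeline.join(' > '));
}
```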

Some developers use WebFont Loader JS or Typekit in order to have control over this behavior. But all of the above options are disruptive to users. The Wikimedia webfonts implementation does not do anything special to address this, and instead relies on the browser's behavior.

Font load events are coming to browsers and provide great insight over when fonts are loaded, allowing you to customize behavior without depending on any heavy javascript libraries.

The Google Chrome team is working on improving the situation by introducing WOFF2 to further reduce font size, along with performance improvements and font load events.

See a discussion initiated by Paul Irish on this topic

Client side performance optimization
The ResourceLoader modules were refactored to load only the required JavaScript, which reduces the JavaScript payload. Optimization of the DOM scan at page-ready time is elaborated at https://etherpad.wikimedia.org/p/VtkX1MynVt.

I have the font for my language in my computer. Can I disable the feature using preferences?
Yes, you can set system font as the preference for your language.

If the font provided by Webfonts is the same font you have on your computer, webfonts will effectively not be applied for that language. In this case the user need not bother with selecting font preferences.

Can't the webfonts system detect whether a local computer has font for a language and download fonts?
It is possible to detect whether a specific font X is present on the computer, but we must know the name of the font to run the test, and the test is not straightforward. We need to create two temporary content elements in the page, rendering one with a fallback font like sans-serif and the other with the specific font. If the width and height of the rendered elements match, one can assume that the same font was used for both, which happens when the fallback font is used in place of the specific font. The sample text must be text that the specific font supports, so if we are testing Hindi, the text should be in Hindi. The characters chosen should also be ones that different fonts render with differing dimensions. This becomes a complex and non-scalable solution when we are talking about hundreds of languages. Knowing all possible font names for a language is also difficult.
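The comparison at the heart of this test can be sketched as below. To keep the example self-contained, the measuring step is passed in as a function: measure is a hypothetical callback that renders the sample text with a given font-family value and returns its width and height. In a browser it could be implemented by placing the text in an off-screen element and reading its offsetWidth and offsetHeight. The name isFontAvailable is also an assumption of this sketch.

```javascript
// Returns true when rendering with the candidate font produces different
// dimensions than rendering with the fallback alone, which suggests that
// the candidate font is installed.
// measure(text, fontFamily) -> { width: number, height: number }
function isFontAvailable(fontName, sampleText, measure, fallback = 'sans-serif') {
  const base = measure(sampleText, fallback);
  const candidate = measure(sampleText, `'${fontName}', ${fallback}`);
  // Equal dimensions mean the browser most likely used the fallback
  // both times, so the candidate font is assumed to be missing.
  return base.width !== candidate.width || base.height !== candidate.height;
}

// Stand-in for real rendering: pretend only 'Lohit Devanagari' is installed.
function fakeMeasure(text, fontFamily) {
  return fontFamily.includes('Lohit Devanagari')
    ? { width: 120, height: 20 }
    : { width: 100, height: 18 };
}

console.log(isFontAvailable('Lohit Devanagari', 'हिन्दी', fakeMeasure));
console.log(isFontAvailable('Some Missing Font', 'हिन्दी', fakeMeasure));
```

The test reports a false negative when the installed font happens to have the same metrics as the fallback for the chosen sample text, which is why the choice of sample characters matters.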

It may be possible to detect tofu for a given language, but this is still not foolproof, since the absence of tofu does not mean that the system has usable fonts for the language.

So we leave the decision of whether a default font should be applied for a language to the consensus of the user community.