Universal Language Selector/WebFonts

Webfonts - Introduction
Wikimedia wikis are available in around 300 languages, and the content in these wikis covers many more than that. English Wikipedia receives most of the traffic, but even there some pages contain fragments written in non-Latin scripts. Apart from Wikipedia, sister projects such as Wiktionary and Wikisource are also largely multilingual.

Languages written in the Latin script do not face serious issues with the availability of fonts on users' computers. Non-Latin languages, however, often run into reading issues when the operating system has no fonts for them, or when the fonts it has are not up to date, bug-free, or aesthetically good.

Non-Latin wikis often show a banner at the top of the page (for example, see Malayalam Wikipedia) asking people to visit a help page if they have trouble reading. The help page contains font download links and installation instructions for different operating systems.

For many languages, if the operating system is not recent, chances are that fonts are not available by default. For example, Odia script fonts are available only in the latest version of Windows. Many new characters are being encoded in Unicode for non-Latin languages, and they are missing from the fonts bundled with operating systems, including recent versions. For example, 5 characters that were encoded for the Malayalam script are not available in any Windows version except Windows 7. So even if the OS has a font, that does not mean people can read the content.

Wikimedia attempted to solve this problem in 2011 with a MediaWiki extension named Webfonts. In 2013, this extension was deprecated and all its features were integrated into the Universal Language Selector extension, which provides additional language features such as input methods and language selection.

In addition to solving the unavailability of fonts for many languages, the feature provides a way to serve fonts in wiki pages for some other interesting use cases.

Webfonts technology also makes it possible to serve special fonts that address accessibility issues related to reading. Dyslexic readers can read content more easily if it is presented in a special font. OpenDyslexic is one such font, with special weights and a design intended to make reading easier for dyslexic people.

Wikipedia's sister project Wikisource digitizes and preserves literature whose copyright has expired. Some of these books require special fonts to convey the content properly. For example, a Hebrew grammar book needs a traditional writing style to explain its concepts. Webfonts help here by specifying and providing the exact font required.

Similarly, a text written in Cuneiform needs a special font to display properly.

Challenges
Webfonts technology is becoming more and more popular on the web. Google Webfonts is one of the best-known providers of a free webfonts service. But the technology is still used mainly for aesthetic purposes. Non-Latin webfonts are not as popular as Latin script fonts, and Google's webfont repository serves only a few non-Latin fonts.

Serving fonts is not enough to get the desired rendering. The operating system must also be capable of rendering the script using the font. Windows XP Service Pack 2 and older versions cannot render Indic complex scripts, whether or not a font is present. Android could not render any complex scripts before the 4.x versions, even though its native browsers supported webfonts.

Rendering Inconsistency
In addition to issues with varied connections and browsers, it’s also important to consider how fonts are rendered across operating systems.

Mac OS and Windows have fundamentally different philosophies when it comes to rendering text. Mac OS uses anti-aliasing to smooth text, while Windows Vista and 7 use a technology called ClearType to render more detailed text.

Windows XP doesn’t do either (at least by default), and Linux might do any number of things depending on how it’s configured.

Windows has two different rendering modes: the newer DirectWrite and the older GDI. The two modes can produce substantially different results.

IE9 and Firefox use DirectWrite, and they can use hardware acceleration to render pages. If hardware acceleration isn’t available or has been turned off, fonts don’t look quite as smooth in these browsers.

Chrome uses GDI, and Firefox 7 implemented a pref that specifies a list of fonts for which GDI rendering will be used at sizes below 16px. By default the list contains fonts such as Arial, Tahoma, Verdana, Trebuchet MS, Segoe UI and Consolas. Downloadable fonts will always use DirectWrite.

Windows ClearType takes advantage of the fact that pixels on LCD screens are comprised of red, green, and blue vertical bars. It treats these bars as narrow pixels, allowing Windows to display more detail in text.

ClearType can make text sharper and more readable, but it sometimes creates colored shadows around text, and some people complain that it gives them headaches. Windows provides an "Adjust ClearType Text" tool to configure ClearType for your display. If ClearType isn’t configured properly, it can actually make text harder to read.

CSS3 has a property called "font-smooth" that is designed to control anti-aliasing of web fonts. As of this writing, it does not appear to be supported by any of the major modern browsers.

Webfont rendering on Windows is a challenge, and the Google Webfonts service also suffers from bad rendering. Stack Overflow has several questions about bad font rendering in Google Chrome, even with Google Webfonts.

Google Webfonts users serve SVG fonts whenever possible to avoid depending on operating system rendering engines. This works well as long as the font is simple, without hints, OpenType tables, or instructions. A font for a Latin script language may render correctly as an SVG font, though not for every language. Non-Latin fonts, however, usually need OpenType instructions for correct rendering. The SVG font format does not support them, so an SVG font for, say, Hindi is guaranteed to render incorrectly.

Impact on bandwidth
The number of glyphs in non-Latin fonts poses a challenge for bandwidth usage. While a normal English font can be smaller than 100 KB, a Malayalam script font has about 1000 glyphs and is larger than 300 KB in TrueType (.ttf) format.

The Jomolhari font for the Tibetan script is 352.4 KB. The Akkadian font used for Cuneiform is 772 KB in TrueType format. The Tuladha Jejeg font for the Javanese script is 399 KB. But some non-Latin fonts are very small. The Lohit Devanagari font for the Devanagari script (Hindi, Marathi, and Sanskrit) is just 37 KB. The smallest font we have is the Saweri font for the Buginese language, at 3.6 KB. The Shapour font used for the Pahlavi script is 4 KB.

Small fonts are perfect for use as webfonts, but anything larger than, say, 100 KB has a bandwidth cost. The font is downloaded along with the page content. A typical thumbnail image in a wiki page is around 15-20 KB (the image of Barack Obama on his wiki page is 18.6 KB), so adding a small font is comparable to having one more image on the page. A bigger font consumes more bandwidth. Several optimizations reduce the cost: a font is downloaded only once, when you first visit a page that uses it, and webfont formats such as EOT and WOFF are heavily compressed. While the font is being downloaded, browsers render the content with a system fallback font and then re-render it with the downloaded font once it is ready. This causes a flash in the page, often known as FOUT (flash of unstyled text). The larger the font, the more noticeable the FOUT. With current browser technology this is unavoidable.

The bandwidth issue becomes more serious if a wiki page contains multiple languages and each uses a webfont. A typical wiki page is multilingual: every page has a sidebar with links to articles on the same topic in other languages, called interwiki links. The wiki page about Obama exists in 205 languages, so it has that many interwiki links.

Imagine a wiki page that contains all languages, rendered with our webfonts repository wherever a language has a default font. The total bandwidth usage works out to 3.8 MB. Does such a page exist? Yes, see https://translatewiki.net/wiki/Special:SupportedLanguages. But the strings on that page are all language names. (We handle this situation in a different way. Read on.)
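As a rough illustration of how such totals add up, the sketch below sums the font sizes for the languages on a page, counting a font only once even when several languages share it. The sizes are the figures quoted earlier in this section. The language-to-font mapping and the function name are assumptions made for this example, not the actual repository configuration.

```python
import math

# Font sizes in KB, taken from the figures quoted in this section.
FONT_SIZE_KB = {
    "Jomolhari": 352.4,
    "Akkadian": 772.0,
    "Tuladha Jejeg": 399.0,
    "Lohit Devanagari": 37.0,
    "Saweri": 3.6,
}

# Hypothetical language -> default font mapping, for illustration only.
DEFAULT_FONT = {
    "bo": "Jomolhari",
    "dz": "Jomolhari",
    "akk": "Akkadian",
    "jv": "Tuladha Jejeg",
    "hi": "Lohit Devanagari",
    "mr": "Lohit Devanagari",
    "bug": "Saweri",
}


def page_font_cost_kb(languages):
    """Total size of the distinct default fonts needed for the given languages."""
    # A set removes duplicates: a font shared by several languages is fetched once.
    fonts = {DEFAULT_FONT[lang] for lang in languages if lang in DEFAULT_FONT}
    # fsum gives the same result regardless of set iteration order.
    return math.fsum(FONT_SIZE_KB[font] for font in fonts)


if __name__ == "__main__":
    page = ["bo", "dz", "akk", "jv", "hi", "mr", "bug", "en"]
    print(f"{page_font_cost_kb(page):.1f} KB")
```

Languages without a default font, such as "en" in this mapping, contribute nothing to the total, which matches the idea that system fonts cost no bandwidth.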

Selection of fonts
The availability of freely licensed fonts for non-Latin languages is a challenge. Many languages have no freely licensed font at all, and the fonts that are available are sometimes of poor quality. This does not mean that good proprietary fonts exist either. As a result, for some languages we end up using a low-quality font because there is no other choice. Large-scale deployment of these fonts brings many users, and they report bugs. Inactive upstreams are another headache: most such fonts are not version controlled and have no issue tracker, and sometimes the maintainer or author does not respond to our attempts to communicate.

Because of this, we try to choose fonts with an active upstream, version control, and an issue tracker. But sometimes we have no choice.

The Sinhala language has only one freely licensed font, LKLUG, a font by the Lanka Linux Users Group. When we enquired whether the font is actively maintained, the answer was no. The GNU FreeFont project's FreeSerif font has Sinhala glyphs copied from LKLUG, but they have not been updated for a long time. The Sri Lankan government recently released some Sinhala fonts, but unfortunately their licensing conditions prevent us from using them.

The Divehi language, written in the Thaana script, did not have any fonts. GNU FreeSans had glyphs for this script, but FreeSans itself cannot be used as a webfont because it is a large font covering many scripts. With the help of the GNU FreeFont project, these glyphs were extracted to create a new font for Thaana named Free Thaana.

The Tamil language (written in the Tamil script) had only one freely licensed font, Lohit Tamil from Red Hat, but the Tamil community did not want it as the default font for Tamil because of its poor rendering quality. So it is kept as an optional font. Lohit Tamil is also distributed by Google as part of Google Webfonts.

Fonts were added to the font repository based on bug reports we received in Wikimedia Bugzilla. Some were added in response to community requests made through mailing lists, hackathons, talk pages, and so on.

We record the bug report or request behind each font addition in the source code so that we can trace why and when a particular font was added.

Statistics about the fonts in the repository
Number of languages supported by ULS for webfonts: 115

Number of fonts: 82

Number of languages having 'system font' as default font : 49

For these languages, no font is embedded by default unless the user picks one through language settings -> font selection. Most of them are languages with good support on computers.

36 of these 49 languages also have an optional font, OpenDyslexic, to help people with dyslexia read content.

Number of languages with a default webfont : 66

For these languages, there is a default webfont, which is embedded unless the user has a preference that prevents it. These default fonts are generally suggested or requested by communities. The list changes slowly as communities find better fonts or discover serious issues with the current defaults. Also, for languages whose default font is large, we need to talk with the communities about making it non-default if possible.

Of these 66, 23 are languages written in the Tibetan script, using the Jomolhari font (352.4 KB). There was a request to support all languages that use this script.

Of the 66, only 30 languages are known to MW (as per Names.php). The rest can appear in arbitrary wiki articles.

Average font size for these 30 languages: 153 KB

Smallest: 3.6 KB

Biggest: 772 KB (Akkadian, for akk)

Table of languages and default fonts
If around 450 languages appear in a page (as in a language selector or similar cases), the total font size is 50 KB, using the Autonym font.

You can browse all the fonts included.

How are webfonts applied?
Webfonts are applied using a jQuery library, jquery.webfonts, which is configured with the MW webfonts repository.

On page load, jquery.webfonts starts its job. The plugin is applied to the body: a webfont applicable to the body is chosen and applied. jquery.webfonts then looks for HTML elements with a lang attribute. If an element has a lang attribute and a font is applicable for that language, the font is applied to that element. The CSS font-family property of each element is also checked. If the font family names a font that is present in the webfonts repository, that font is delivered to the client. That is the high-level working of jquery.webfonts. Now to the details:

Identifying which font to be applied
By specifying font-family

If the wiki text contains markup such as <span style="font-family: 'YourFontName';">YourText</span>, the webfonts extension checks whether the font is available and, if so, downloads it to the client. The reader will then have no difficulty reading the text even if the specified font is not installed on their computer.

By specifying language

If the wiki text contains markup such as <span lang="YourLanguageCode">YourText</span>, the extension checks whether any font is available for the given language and, if so, downloads it to the client. If there are multiple fonts for the language, the default font is used. If the default font is not the one you want, use the font-family approach to specify the font. If the tag has both lang and font-family definitions, font-family takes precedence.
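The selection rules above can be sketched roughly as follows. This is a minimal model in Python, not the jquery.webfonts code itself. The repository contents, the font names TamilFontA and TamilFontB, and the function name are hypothetical, and the handling of an unknown font-family is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical repository data: language code -> available fonts, default first.
FONTS_BY_LANGUAGE: Dict[str, List[str]] = {
    "ta": ["TamilFontA", "TamilFontB"],
    "bo": ["Jomolhari"],
}


def resolve_font(lang: Optional[str], font_family: Optional[str]) -> Optional[str]:
    """Return the webfont to deliver for an element, or None for no webfont."""
    known_fonts = {font for fonts in FONTS_BY_LANGUAGE.values() for font in fonts}
    # An explicit font-family takes precedence over the lang attribute.
    if font_family is not None and font_family in known_fonts:
        return font_family
    # Assumption: a font-family that is not in the repository falls through
    # to the default font of the element's language, if there is one.
    fonts = FONTS_BY_LANGUAGE.get(lang or "", [])
    return fonts[0] if fonts else None
```

For example, resolve_font("ta", None) picks the default font for the language, while resolve_font("ta", "TamilFontB") honours the explicit font-family.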

User preferences
For every language, users can choose a webfont from the available fonts, or opt out, that is, not use webfonts for that language. The default font for a language is based on requests from its user community. Depending on the language support available in operating systems, some communities do not ask for a default webfont; for them, the system font (no webfont) is the default preference. For other languages, the user community asked for a specific font to be applied by default. The list of languages with a non-system default font is given above.

User preferences are slightly more complicated than one might imagine. Font preferences are offered through a cog icon on every page or as part of the language selector at the top of the page. But MediaWiki preferences contain another font selector, for the edit area font. It applies to the edit text area, and its options include monospace, sans serif, and so on. We want to respect that choice too. So before applying a webfont to an edit area, we check the user's preference from the "Preferences" page of the wiki, and that preference takes precedence. It does not stop there. Browsers apply monospace fonts to editable fields for some languages without any preference from the user: both Google Chrome and Firefox apply a monospace font to the edit area if the computed language is one they consider suitable for monospace. Monospaced letters make sense only for a small number of languages; they do not make sense for Arabic or Indian languages. We need to respect this browser behavior as well. Our code should not override it, or users will be annoyed to see behavior different from what they are used to.
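The order of precedence for edit areas described above might be modelled like this. It is only a sketch: the set of languages for which browsers choose monospace on their own is a made-up placeholder, and the function and parameter names are hypothetical.

```python
from typing import Optional

# Hypothetical placeholder: languages for which the browser picks monospace itself.
BROWSER_MONOSPACE_LANGUAGES = {"en", "de", "fr"}


def edit_area_font(
    lang: str,
    edit_font_preference: Optional[str],
    webfont: Optional[str],
) -> Optional[str]:
    """Return the font-family to set on the edit area, or None to leave it alone."""
    # 1. The edit area font chosen on the wiki's Preferences page wins.
    if edit_font_preference is not None:
        return edit_font_preference
    # 2. Do not override the monospace font the browser applies by itself.
    if lang in BROWSER_MONOSPACE_LANGUAGES:
        return None
    # 3. Otherwise the webfont selected for the language (if any) may be applied.
    return webfont
```

Returning None here stands for "set no font-family at all", so that whatever the browser does by default stays visible to the user.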

More complications
Font inheritance

In a multilingual web page, if we apply a font to a top-level element associated with a language, it is applied to the child nodes as well, following the usual CSS inheritance. What happens if the language of a child node differs? For that language, there may or may not be a font to apply. If there is a different font to apply, there is nothing to worry about, because applying that font breaks the inheritance. But if there is no font to apply, the child inherits the parent's font. That is not desired, since the child's language has no font according to the preferences or the repository. But how can the parent's font affect a child in a different language? This happens only when the parent and child share the same script, so that the parent's font has glyphs for the child's text.

To illustrate, consider a Hindi wiki page with a paragraph in Marathi. Both languages are written in the Devanagari script and differ by only a few letters. If the Marathi paragraph does not define a font, the Hindi font is applied to it. This is not disastrous, but it is still a bug. We have to break the inheritance at the level of the Marathi paragraph to prevent this, and the way to do that is to explicitly define a font family for that paragraph.

Avoiding too much inline css

If we apply webfonts to every element with a lang attribute, there can be many such elements in a DOM. If a child will inherit the right font from its parent anyway, there is no need to define an explicit style for the child. Working this out while parsing the body is also important.
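Both concerns, breaking unwanted inheritance and avoiding redundant inline styles, can be illustrated with a small tree walk. The sketch below uses a toy element tree rather than a real DOM. The Node class, the plan_styles function, the font name HindiFont, and the use of sans-serif as the "no webfont" value are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SYSTEM_FONT = "sans-serif"  # stand-in for "no webfont applies"


@dataclass
class Node:
    name: str
    lang: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def plan_styles(
    node: Node,
    font_for_lang: Dict[str, str],
    inherited: str = SYSTEM_FONT,
    styles: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return {element name: font-family} for elements that need an explicit style."""
    if styles is None:
        styles = {}
    if node.lang is None:
        wanted = inherited  # no lang attribute: inheritance is fine as it is
    else:
        wanted = font_for_lang.get(node.lang, SYSTEM_FONT)
    # Only emit a style when the inherited font would be wrong. This covers
    # both "apply a different font" and "reset to break inheritance".
    if wanted != inherited:
        styles[node.name] = wanted
    for child in node.children:
        plan_styles(child, font_for_lang, wanted, styles)
    return styles


page = Node("body", "hi", [
    Node("p-hindi", "hi"),
    Node("p-marathi", "mr", [Node("span-inside-marathi")]),
])
styles = plan_styles(page, {"hi": "HindiFont"})
print(styles)
```

In this example only the body and the Marathi paragraph receive an explicit font-family. The Hindi paragraph inherits the right font from the body, and the span inherits the reset from the Marathi paragraph.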

Font detection
The webfonts code generates CSS that uses local(fontFamilyName) to make sure a font is not downloaded if a font with the same family name exists on the local computer.

That means that if FontA is the default for LangA, applying FontA to an element does not mean it is always downloaded to the client's machine. It is downloaded only when the client's computer does not have FontA.

So algorithms that detect whether a font is available on the local machine do not help us here. Such algorithms help to programmatically choose an aesthetically good font stack for a target platform.

So it is quite possible that a user's computer has a font for a particular language, yet the webfont system still downloads another font from the server to render text in that language. In most cases this is the user's choice. For a small set of languages, the download happens without any user preference: these are the languages with a default font, listed above. Each of them is configured to use a default font because of a user community request. So in one sense, it is again the users' choice.

It may be feasible to detect tofu (the glyph named .notdef in the OpenType specification, shown when a requested glyph is not present in the font). But that alone does not tell us whether a user will be able to read the content without issues, as explained at the beginning of this document.
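To make the local() technique from this section concrete, here is a small sketch that builds an @font-face rule in which the local source is listed before the downloadable files, so the browser tries the installed font first. The function name and the file names are hypothetical, and the exact CSS the extension generates may differ.

```python
from typing import List, Tuple


def font_face_css(family: str, files: List[Tuple[str, str]]) -> str:
    """Build an @font-face rule that prefers a locally installed font."""
    # local() comes first: the browser uses the first source it can load,
    # so the URLs are fetched only if no local font has this family name.
    sources = [f"local('{family}')"]
    sources += [f"url('{url}') format('{fmt}')" for url, fmt in files]
    src = ",\n       ".join(sources)
    return f"@font-face {{\n  font-family: '{family}';\n  src: {src};\n}}"


css = font_face_css("FontA", [("FontA.woff", "woff"), ("FontA.ttf", "truetype")])
print(css)
```

Because the decision is made by the browser while resolving the src list, no script-side font detection is needed for this purpose.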

Font fallback
It is quite possible that the text to which a font is applied contains characters outside the font. It is common practice to add fallback fonts to the font-family property. We add sans-serif as the fallback font.

However, IE 6 is not capable of doing font fallback properly. Because of this, if the page contains text within elements that have no lang information, squares are displayed. The extension is therefore disabled in IE 6.
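The fallback stack described in this section amounts to appending a generic family after the webfont name. A minimal sketch, with a hypothetical function name:

```python
def font_stack(webfont: str, fallback: str = "sans-serif") -> str:
    """Build a font-family value with a generic fallback at the end."""
    # The webfont name is quoted because it may contain spaces; the generic
    # keyword must stay unquoted, or CSS would treat it as a font name.
    return f"'{webfont}', {fallback}"


print(font_stack("Lohit Devanagari"))
```

Characters missing from the webfont are then drawn with whatever the browser maps to the generic family.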