User:Santhosh.thottingal/FontDetection

From mediawiki.org

Font detection

Webfont generates CSS that uses local(fontFamilyName) so that a font is not downloaded if a font with the same font-family name exists on the local computer.

That means that if FontA is the default for LangA, applying FontA to an element does not mean the font is always downloaded to the client's machine. It is downloaded only when the client computer does not have FontA.
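As an illustration of the rule described above, here is a minimal sketch of how such CSS could be built. The helper name buildFontFaceRule, the WOFF format, and the example URL are assumptions made for this example, not the extension's actual code. The point is the order of the src list: the browser tries local() first and fetches the url() only if no local font with that name matches.

```javascript
// Hypothetical helper (not taken from the Webfonts extension):
// builds an @font-face rule that prefers a locally installed font.
function buildFontFaceRule(family, woffUrl) {
  return [
    '@font-face {',
    `  font-family: '${family}';`,
    // local() comes first, so the download is skipped when the
    // client already has a font with this family name.
    `  src: local('${family}'), url('${woffUrl}') format('woff');`,
    '}'
  ].join('\n');
}

// Example with placeholder values.
console.log(buildFontFaceRule('FontA', 'https://example.org/fonts/FontA.woff'));
```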

So algorithms that detect whether a font is available on the local machine do not help us here. Such algorithms help to programmatically choose an aesthetically good font stack for a target platform (e.g. http://www.lalit.org/lab/javascript-css-font-detect/; read the comments as well).

Our intention here is to avoid tofu, i.e. to detect whether there is at least one font that can render a given text in an arbitrary script.

What is tofu?

It is a glyph inside fonts with the glyph name .notdef, as per the OpenType specification. http://www.microsoft.com/typography/otspec/recom.htm says "The .notdef glyph is very important for providing the user feedback that a glyph is not found in the font. This glyph should not be left without an outline as the user will only see what looks like a space if a glyph is missing and not be aware of the active font's limitation. It is recommended that the shape of the .notdef glyph be either an empty rectangle, a rectangle with a question mark inside of it, or a rectangle with an “X”. Creative shapes, like swirls or other symbols, may not be recognized by users as indicating that a glyph is missing from the font and is not being displayed at that location."

So we can redefine the problem as: "Given an arbitrary atomic Unicode character in an arbitrary script, does the client browser render it as tofu or not?"

Detecting Tofu

Detect whether the fallback font on the client machine renders a rectangle, optionally with a question mark or cross inside it.

Complication

a) Some fallback fonts have an empty glyph for tofu.

Solution 1

Put generic font family values at the beginning of the font stack, e.g. font-family: sans-serif, Autonym. Theoretically the glyphs should come from the first font and fall back to the second one if they are not found in the generic sans-serif font.

Observation: generic font families are always applied last.

Solution 2

In a span that is not visible to the user, render a character for which we are sure no fallback font has a glyph. The aim is to get a tofu rendered. Then render the actual text without applying the webfont. If the rendering is the same as the tofu rendering, we are done with tofu detection.

data:text/html, <html><span lang=en> &#3328 %E4%B8%AD</span>

Observation: this does not work. The tofu glyphs come from different fonts: one tofu has a ? in it and the other does not. The size also varies.
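For reference, the comparison attempted in Solution 2 could look roughly like the sketch below. The function names measureHidden and sameBox are hypothetical, and using offsetWidth/offsetHeight as the measure of "same rendering" is an assumption of this example. As the observation above notes, the comparison is unreliable because the two tofu glyphs may come from different fonts.

```javascript
// Hypothetical helper: measure `text` in a span the user cannot see.
// `doc` is assumed to be a browser Document. visibility:hidden keeps
// the span in layout, so its size can still be read (display:none
// would report 0).
function measureHidden(doc, text) {
  const span = doc.createElement('span');
  span.style.visibility = 'hidden';
  span.style.position = 'absolute';
  span.style.fontFamily = 'sans-serif';
  span.textContent = text;
  doc.body.appendChild(span);
  const box = { width: span.offsetWidth, height: span.offsetHeight };
  doc.body.removeChild(span);
  return box;
}

// Two renderings count as "the same" here if their boxes match.
function sameBox(a, b) {
  return a.width === b.width && a.height === b.height;
}

// Browser-only usage (commented out so the file also loads outside a browser).
// The reference character is the one used in the data: URL above; whether
// it really has no glyph depends on the fonts installed on the system.
// const tofuBox = measureHidden(document, String.fromCodePoint(3328));
// const textBox = measureHidden(document, '\u4E2D');
// console.log(sameBox(tofuBox, textBox));
```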

Solution 3

Render the actual text using sans-serif, splitting it into code points, e.g. മലയാളം as മ, ല, യ, ാ, ള, ം. Get the width and height of each rendered code point. If all of them are equal, the hypothesis is that all of them are tofu.

http://codepen.io/santhoshtr/pen/BuqwK

Gerrit patch: https://gerrit.wikimedia.org/r/#/c/94613/
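The decision logic of Solution 3 can be sketched as follows. This is not the code from the CodePen or the Gerrit patch; the function names and the caller-supplied measure callback (which in a browser would render one code point in sans-serif and return its width and height) are assumptions of this example. Treating a single-code-point text as "not tofu" is also a choice made here, since one box cannot show that all boxes are equal.

```javascript
// Split a string into code points (not UTF-16 code units).
function splitCodePoints(text) {
  return Array.from(text);
}

// True when every measured box has the same width and height.
function allSameSize(boxes) {
  if (boxes.length < 2) {
    return false; // assumption: too little evidence with a single box
  }
  const first = boxes[0];
  return boxes.every(
    (b) => b.width === first.width && b.height === first.height
  );
}

// `measure` is a hypothetical callback: (character) => { width, height }.
function looksLikeTofu(text, measure) {
  const boxes = splitCodePoints(text).map((ch) => measure(ch));
  return allSameSize(boxes);
}

// Example with a stand-in measure function that returns one fixed box,
// which is what identical tofu glyphs would produce.
const fixedBox = () => ({ width: 12, height: 18 });
console.log(splitCodePoints('മലയാളം').length);
console.log(looksLikeTofu('മലയാളം', fixedBox));
```

The result remains a heuristic: a font in which the tested characters really do share the same box would also produce equal measurements.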

Experiments