Extension:UniversalLanguageSelector/Fonts for Chinese wikis

Introduction
Including all Chinese characters makes a webfont file too large. We may want to tailor the font file for every page based on characters used on that page. Once finished, this feature can be applied to other languages facing the same problem, such as Japanese.

As of writing, there is no free font of good enough quality that includes all Chinese characters in Unicode. The "wiki" concept itself encourages collaborative content creation, so when the system sees a character without existing glyph data, it would be nice to invite users to create a glyph for it.

Proposal
Go to Proposal

Mentors
DChan,  Liangent

Repository
Font Tailor

Tofu Detection

Development_Report
Go to Development_Report

Milestones

 * May 19: Start coding.
 ** Warm up with the code and the development tool set
 ** Clarify what to do next
 * June 22: Mid-term evaluation: finish the prototype of Font Tailor
 * July 20: Finish the Font Tailor (TTF tailor finished and well tested; SVG/WOFF/EOT tailors finished but without guarantee)
 * Aug 11: Pencils down: tofu detection with font family settings
 * <--- We are here
 * Aug 22: Final evaluation: documents and final-term report

Dynamic WebFonts
With the standard WebFonts service, a static font file is downloaded. The @font-face rule looks like this:

 @font-face {
   font-family: WenQuanYi;
   ...
   src: url('fontspath/wenquanyi.ttf') format('truetype'), ...;
 }

Now we should instead return a font that is tailored to contain all, and only, the characters used on that page. So we change the URL to:

 @font-face {
   font-family: WenQuanYi;
   ...
   src: url('FontRequest.php?font=WenQuanYi...') format('truetype'), ...;
 }

When the page is visited, a font request is sent to FontTailor.php. The script gets the information it needs from the parameters. If a tailored font file exists and is up to date, it is returned as an attachment:

 header( "Content-Type: application/octet-stream" );
 header( "Content-Disposition: attachment; filename=\"$wanted_filename\"" );
 readfile( $tailored_fontfile );

If no tailored font file is available, or the existing one is out of date, the script should generate one.

Tailored Font Management
Under the font's path, there are three subtrees:


 * tailored/
 ** 02c68248c6b40670c2889218987af948.ttf
 ** 9efbe2b03fd390fa3e4bec7d65b36f46.ttf
 * tailored_for_title/
 ** Main_Page_17.ttf -> 02c68248c6b40670c2889218987af948.ttf
 ** Main_Page_16.ttf -> 02c68248c6b40670c2889218987af948.ttf
 * tailored_for_url/
 ** %2Fwiki%2FMain_Page.ttf -> 02c68248c6b40670c2889218987af948.ttf
 ** %2Fwiki%2FTest%3Fdebug%3Dtrue.ttf -> 9efbe2b03fd390fa3e4bec7d65b36f46.ttf

The tailored tree contains all the actual tailored font files. Every distinct set of characters maps to its own tailored font; for example, 'abcde' and 'abcdef' map to different files. The file name is the MD5 hash of the character sequence.
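The naming rule above can be sketched as follows. This is only an illustration in JavaScript (Node.js), not the PHP code of the service. The helper names are hypothetical, and sorting and de-duplicating the characters before hashing is an assumption made here so that the same set of characters always produces the same file name.

```javascript
const nodeCrypto = require('crypto');

// Hypothetical helper: reduce a text to its de-duplicated characters,
// sorted by code point (the ordering is an assumption of this sketch).
function canonicalCharSequence(text) {
  // A Set built from a string iterates by code point, so characters
  // outside the Basic Multilingual Plane are kept whole.
  const unique = new Set(text);
  return [...unique]
    .sort((a, b) => a.codePointAt(0) - b.codePointAt(0))
    .join('');
}

// Hypothetical helper: file name of the tailored font under tailored/.
function tailoredFileName(text, extension = 'ttf') {
  const sequence = canonicalCharSequence(text);
  const digest = nodeCrypto
    .createHash('md5')
    .update(sequence, 'utf8')
    .digest('hex');
  return `${digest}.${extension}`;
}
```

With this sketch, tailoredFileName('abcde') and tailoredFileName('edcba') give the same name, while 'abcde' and 'abcdef' give different names.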

The tailored_for_title and tailored_for_url trees contain symbolic links to tailored font files.
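A minimal sketch of how a link under tailored_for_url/ could be maintained, again in JavaScript (Node.js) for illustration only. The link names in the listing above match the percent-encoded page path and query, which is what encodeURIComponent produces. The helper names and the relative link target are assumptions of this sketch.

```javascript
const nodeFs = require('fs');
const nodePath = require('path');

// Link name under tailored_for_url/: the percent-encoded path and query.
function urlLinkName(pagePathAndQuery, extension = 'ttf') {
  return `${encodeURIComponent(pagePathAndQuery)}.${extension}`;
}

// Hypothetical helper: (re)point the link for a URL at a file in tailored/.
function linkUrlToTailored(fontDir, pagePathAndQuery, tailoredName) {
  const linkDir = nodePath.join(fontDir, 'tailored_for_url');
  nodeFs.mkdirSync(linkDir, { recursive: true });
  const linkPath = nodePath.join(linkDir, urlLinkName(pagePathAndQuery));
  // Remove a stale link if one exists; the target file is left untouched.
  nodeFs.rmSync(linkPath, { force: true });
  // Assumption: a relative target, so the font directory can be moved whole.
  nodeFs.symlinkSync(nodePath.join('..', 'tailored', tailoredName), linkPath);
  return linkPath;
}
```

For example, urlLinkName('/wiki/Test?debug=true') yields '%2Fwiki%2FTest%3Fdebug%3Dtrue.ttf', as in the listing.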

Font Tailor Workflow
// TODO

FontTailor.php needs to know three parameters:
 * 1) Which page is requesting WebFonts? - We can generally get this from $_SERVER['HTTP_REFERER'].
 * 2) Which font is requested? - Get this from the URL parameters.

FontTailor then fetches the content of the source page, collects the set of characters in use, and generates a subset of the font file. Currently we use php-font-lib to tailor TTF / WOFF / EOT, and a separate SVG tailor that cuts the SVG XML DOM tree. Only the TTF tailor has been tested so far.
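The two parameters listed above could be read from a request roughly as follows. This is a sketch in JavaScript for illustration; the real script is PHP and reads $_SERVER['HTTP_REFERER'] directly. The function name, the shape of the headers object, and the handling of a missing or malformed Referer are assumptions.

```javascript
// Hypothetical helper: work out which page and which font a request is for.
function describeFontRequest(requestUrl, headers) {
  // The base URL is only needed so that a relative request URL can be parsed.
  const url = new URL(requestUrl, 'http://localhost/');
  const font = url.searchParams.get('font'); // null if the parameter is absent

  let page = null;
  const referer = headers.referer || null;
  if (referer) {
    try {
      const source = new URL(referer);
      page = source.pathname + source.search;
    } catch (error) {
      page = null; // malformed Referer: treat the page as unknown
    }
  }
  return { font, page };
}
```

Because the Referer header is optional and can be suppressed by the browser, a caller should be prepared for page to be null.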

Known Issues
Strangely, the font file output by php-font-lib does not work as a web font. But if you open it in a font editor (FontCreator or FontForge) and save it to another file, the saved file works. The two files differ in some ways, but I don't know why yet. If someone has knowledge of TTF fonts, please take a look:
 * php-font-lib bug
 * Output TTF of php-font-lib
 * Fixed TTF by FontForge

The current solution is to run an additional fix step with a FontForge script:

 #!/usr/bin/env fontforge
 Open('input.ttf', 1)
 SelectAll()
 Copy()
 Generate('output.ttf')
 Close()

It is ugly to call exec in PHP, and it is also ugly to require FontForge. So I want to fix the problem in php-font-lib if possible.

// TODO
 * Concurrent Requests

Tofu Detection with FontFamily
If a Chinese character is rendered as tofu, the reason is obviously that its glyph is not available in any font, neither from the WebFonts service nor from the system. Reportedly, the most reliable way to detect tofu is to compare its image with the image of a known tofu, such as the one rendered for U+0D00.

However, you cannot do that with a fixed fontFamily such as sans-serif, because the WebFonts service may render the character properly with a remote font. So the current detectTofu method may report false positives. We should detect tofu using the text's real fontFamily setting. Tofus also differ from font to font, as you can see below:
 * &#x0d00; [sans-serif tofu]
 * &#x0d00; [Linux Libertine tofu]
 * &#x0d00; [宋体 tofu]
 * &#x0d00; [Georgia tofu]

Detect Tofu by Comparing Image
Use HTML5's canvas element to draw each character, and compare with the tofu's image.

This was introduced in another patch of mine; see patch 122277.
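The comparison could look roughly like the sketch below. It is not the code of the patch mentioned above; the function names, the canvas size, and the default reference character are assumptions. renderPixels and isTofu need a browser with canvas support, while samePixels is a plain array comparison.

```javascript
// Browser-only: draw one character in the given font family and
// return the RGBA pixel data of the canvas.
function renderPixels(character, fontFamily, size = 32) {
  const canvas = document.createElement('canvas');
  canvas.width = size * 2;
  canvas.height = size * 2;
  const context = canvas.getContext('2d');
  context.textBaseline = 'top';
  context.font = `${size}px ${fontFamily}`;
  context.fillText(character, 0, 0);
  return context.getImageData(0, 0, canvas.width, canvas.height).data;
}

// Compare two pixel arrays element by element.
function samePixels(first, second) {
  if (first.length !== second.length) return false;
  for (let i = 0; i < first.length; i++) {
    if (first[i] !== second[i]) return false;
  }
  return true;
}

// Browser-only: a character counts as tofu when it renders exactly like
// a character assumed to have no glyph, in the same font family.
function isTofu(character, fontFamily, knownTofu = '\u0D00') {
  return samePixels(
    renderPixels(character, fontFamily),
    renderPixels(knownTofu, fontFamily)
  );
}
```

Passing the real fontFamily of the text into isTofu, rather than a fixed value, is what lets the reference tofu and the tested character go through the same font fallback chain.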

Popup to Show Tofu Information
Traverse the DOM tree to find all text nodes, mark the tofus in them in red, and bind a click event that opens a popup showing each tofu's information. In the future we can guide users to the font's contribution page or to our own glyph-contribution page.
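A minimal sketch of this traversal, with hypothetical function names. isTofuChar and showInfo are callbacks supplied by the caller and are assumptions of this sketch. collectTextNodes and splitByTofu only rely on nodeType, childNodes, and strings; markTofus needs a real browser DOM.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

// Collect all text nodes below root, in document order.
function collectTextNodes(root) {
  const found = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === TEXT_NODE) {
      found.push(node);
      continue;
    }
    const children = node.childNodes || [];
    // Push in reverse so that nodes are popped in document order.
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return found;
}

// Split a text into consecutive runs of tofu and non-tofu characters.
function splitByTofu(text, isTofuChar) {
  const runs = [];
  for (const character of text) {
    const tofu = Boolean(isTofuChar(character));
    const last = runs[runs.length - 1];
    if (last && last.tofu === tofu) last.text += character;
    else runs.push({ text: character, tofu });
  }
  return runs;
}

// Browser-only: wrap each tofu run in a red, clickable span.
function markTofus(root, isTofuChar, showInfo) {
  for (const node of collectTextNodes(root)) {
    const runs = splitByTofu(node.nodeValue, isTofuChar);
    if (!runs.some((run) => run.tofu)) continue;
    const fragment = document.createDocumentFragment();
    for (const run of runs) {
      if (!run.tofu) {
        fragment.appendChild(document.createTextNode(run.text));
        continue;
      }
      const span = document.createElement('span');
      span.style.color = 'red';
      span.textContent = run.text;
      span.addEventListener('click', () => showInfo(run.text, span));
      fragment.appendChild(span);
    }
    node.parentNode.replaceChild(fragment, node);
  }
}
```

The list of text nodes is collected before any node is replaced, so the traversal is not disturbed by the spans that markTofus inserts.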