Extension:UniversalLanguageSelector/Fonts for Chinese wikis

Introduction
Including all Chinese characters makes a webfont file too large. We may want to tailor the font file for every page based on characters used on that page. Once finished, this feature can be applied to other languages facing the same problem, such as Japanese.

At the time of writing, no free font of sufficient quality covers all the Chinese characters in Unicode. Since the wiki concept itself encourages collaborative content creation, it would be nice to invite users to create a glyph whenever the system encounters a character that has no glyph data.


 * Proposal: Click Here
 * Mentors: DChan, Liangent
 * Repository: Font Tailor

Milestones

 * May 19: Start coding.
 * Warm up with code and development tool set
 * Clarify what to do next
 * June 22: Mid-term evaluation: Finish the prototype of Font Tailor
 * < - - - We are here
 * July 20: Finish the Font Tailor (TTF and SVG at least)
 * Aug 11: Pencils down: Glyph collector for tofu
 * Aug 22: Final evaluation

Dynamic WebFonts
In the standard WebFonts service, a static font file is downloaded. The @font-face rule looks like this:

    @font-face {
        font-family: WenQuanYi;
        ...
        src: url('fontspath/wenquanyi.ttf') format('truetype'), ...;
    }

Now we want to return a different font for each page, tailored to contain all, and only, the characters used on that page. So we change the URL to:

    @font-face {
        font-family: WenQuanYi;
        ...
        src: url('FontTailor.php?key1=val1&key2=val2...') format('truetype'), ...;
    }

When the page is visited, a font request is sent to FontTailor.php, which gets the information it needs from the parameters. If a tailored font file exists and is up to date, it is returned as an attachment:

    header( "Content-Type: application/octet-stream" );
    header( "Content-Disposition: attachment; filename=\"$wanted_filename\"" );
    readfile( $tailored_fontfile );

If no tailored font file is available, or it is outdated, the script generates one.
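The serve-or-regenerate decision described above can be sketched as follows. This is a minimal illustration in Python rather than the project's PHP. The function names `needs_regeneration` and `serve_tailored_font`, the `generate` callback, and the use of the file's modification time as the freshness test are all assumptions made for the example, not part of FontTailor.php.

```python
import os


def needs_regeneration(font_path, page_modified_ts):
    """Return True when no tailored font exists or it is older than the page.

    Assumption: "up to date" means the cached file's modification time is
    not earlier than the page's last-modified timestamp (seconds since epoch).
    """
    if not os.path.exists(font_path):
        return True
    return os.path.getmtime(font_path) < page_modified_ts


def serve_tailored_font(font_path, page_modified_ts, generate):
    """Return the font bytes, regenerating the cached file first if needed.

    `generate` is a hypothetical callback that writes a fresh tailored
    font to `font_path`.
    """
    if needs_regeneration(font_path, page_modified_ts):
        generate(font_path)
    with open(font_path, "rb") as handle:
        return handle.read()
```

With this shape, repeated requests for an unchanged page only read the cached file, and the expensive tailoring step runs once per page change.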

Font Tailor
FontTailor.php needs to know a few parameters, including:
 * 1) Which page is requesting WebFonts? We can generally get this from $_SERVER['HTTP_REFERER'].
 * 2) Which font is requested? This comes from the URL parameters.

FontTailor then fetches the content of the source page, determines the set of characters in use, and generates a subset of the font file. Currently we use php-font-lib to tailor TTF / WOFF / EOT, and a separately written SVG tailor that works by pruning the SVG XML DOM tree. Only the TTF tailor has been tested so far.
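The idea of tailoring an SVG font by pruning its DOM tree can be sketched as follows. This is an illustration in Python, not the project's PHP tailor. The function names `used_characters` and `tailor_svg_font` are hypothetical, and the sketch assumes each glyph is a `<glyph>` element whose `unicode` attribute holds a single character; glyphs without that attribute, and multi-character ligatures, are simply dropped.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def used_characters(page_text):
    """Return the set of distinct characters that appear in the page text."""
    return set(page_text)


def tailor_svg_font(svg_source, keep_chars):
    """Remove every <glyph> whose unicode attribute is not in keep_chars.

    Other children of <font>, such as <missing-glyph> and <font-face>,
    are left untouched.
    """
    ET.register_namespace("", SVG_NS)  # keep the default namespace on output
    root = ET.fromstring(svg_source)
    for font in list(root.iter("{%s}font" % SVG_NS)):
        for glyph in font.findall("{%s}glyph" % SVG_NS):
            if glyph.get("unicode") not in keep_chars:
                font.remove(glyph)
    return ET.tostring(root, encoding="unicode")
```

For example, `tailor_svg_font(svg_source, used_characters(page_text))` keeps only the glyphs for characters that occur in `page_text`.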

Known Issues
 * Get wiki page content from the database instead of firing an HTTP request

Currently I use PHP's curl to fetch the page content. There may be a way to read the database directly; I will check the APIs for this later.

 * php-font-lib bug

Strangely, the output font file does not work with WebFonts. But if you open it in another font editor (FontCreator or FontForge) and save it to another file, the new file works. The two files differ in some way, and I don't know why yet. If anyone has knowledge of TTF fonts, please take a look:

- Output TTF of php-font-lib

- Fixed TTF by FontForge

The current solution is to run an additional fix script:

    #!/usr/bin/env fontforge
    Open('input.ttf', 1)
    SelectAll()
    Copy()
    Generate('output.ttf')
    Close()

Calling exec from PHP is ugly, and so is requiring FontForge. So I want to fix the problem in php-font-lib if possible.

July 13
 * Done

Finally found the most suitable way to trigger font-tailor: hook the ArticlePageDataAfter event and check:

- Whether the page uses a special font-face, such as a Chinese Hanzi font

- Whether a tailored font already exists for this article

 * Issues

It took a lot of time to explore different solutions, including reading the database during a webfonts request. It is difficult to set up the context before calling components in core. I got it working but finally discarded it because it was so ugly. I think the current approach is good enough.

You can expect to see an almost finished FontTailor next week.

July 6
 * Done

Some thoughts on optimizing FontTailor:

- Use WikiPage to load page data directly from the database instead of fetching the URL with curl over HTTP. Not done yet, because I cannot get a proper context in which to call the functions.

- Use the character set to locate the tailored font. Like Git, store data by tracking content rather than file names. Previously, I located a font file by parameters such as title=page_a, time=123456, which maps to FONTNAME_page_a_123456.ttf. That way, every page and every revision generates its own font file. Now I want to track content instead: any request with the characters "1379aceglo" maps to FONTNAME_`sha1("1379aceglo")`.ttf, so pages and revisions that use the same characters share one tailored font.

 * Issues

- Cannot call functions in /mediawiki/includes because there is no context during a webfont request.

- Start working at full speed!
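The content-addressed naming idea from the July 6 notes can be sketched as follows. This is an illustration in Python rather than the project's PHP. The function names `charset_key` and `tailored_font_filename` are hypothetical, and sorting the distinct characters by code point before hashing is an assumption made so that the same character set always yields the same name regardless of the order in which characters appear on the page.

```python
import hashlib


def charset_key(text):
    """Canonical form of a character set: distinct characters, sorted by code point."""
    return "".join(sorted(set(text)))


def tailored_font_filename(font_name, text, extension="ttf"):
    """Build FONTNAME_<sha1 of the canonical character set>.<extension>."""
    digest = hashlib.sha1(charset_key(text).encode("utf-8")).hexdigest()
    return "%s_%s.%s" % (font_name, digest, extension)
```

For instance, a page containing only the text "google1379ace" has the canonical key "1379aceglo", so it maps to the same file name as any other page or revision that uses exactly those ten characters.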

June 29
 * Doing

Finding a way to read and parse the wiki page's characters directly, without using php_curl.

 * Issues

Last week in college, I had many things to do, including graduation parties :). I will start working full time from next Monday.

June 22 / Midterm Report
Midterm Report

June 15
 * Done

Looked for a font tailoring solution and finally decided to use php-font-lib. I tested it with a demo, and it looks good.

Finished the compiler, which directs a static font request to a dynamic API: FontTailor.php

 * To do

Finish a working demo before the mid-term.

June 8
 * Done

Tested the feasibility of dynamic webfonts.

 * To do

Start working on the font tailor.

May 25
 * Done

Nothing, actually. As explained last week, I will defend my master's thesis on May 28, and I am preparing for it.

 * To do

Try implementing Dynamic Webfont.

 * Issue

Another upcoming event is a graduation trip to northwest China (May 31 ~ June 6), so the next weekly report will be on June 8. After that I will follow a more stable schedule.

Community Bonding Period Report
 * Communication

Pretty good for me. We discuss on wikizh-l, whose members are the users of my product. They have even given me some technical advice.

 * Clarify to do next

Finished some of the design: Font Tailor Design

 * About schedule

I will start coding soon. Both of my mentors will be a little busy during the event, as I was told when applying, so I will stay in closer contact with wikizh-l. I am confident I can get the job done.

As I clarified in the proposal, I have to spend some time on my master's thesis and other graduation-related matters in May and June. Full speed will start in July, so you may see some delay before the midterm. I will explain such issues in the weekly reports.

May 18

 * Done
 * Design font tailor
 * To do
 * START coding!
 * Issues
 * I'll defend my master's thesis on May 28. It will take some time to prepare.

May 11

 * Done
 * Re-announce the project on wikizh-l and Distribution list/Global message delivery/zh in Chinese. Discuss with the people there.
 * Reading code: jquery.webfonts, ext.uls.webfonts.js
 * To do
 * Finish designing the font-tailor. Be ready to start coding on May 19

May 4

 * Done
 * Knowledge preparation: Git, JS, jQuery. I use Hg and script languages like Python a lot, so it's easy to get started.
 * Study the tech talk about Unit testing.
 * To do
 * Read ULS docs and code
 * Build up development environment