Requests for comment/Hierator
Approved in RFC meeting -- Tim Starling (WMF) (talk) 21:27, 7 January 2015 (UTC)
WikiHiero, our implementation of Ancient Egyptian hieroglyphic markup, is aging and has reached the point where you can't squeeze more features from its HTML+PNG based output. It also supports only a small subset of Manuel de Codage syntax. I propose to replace WikiHiero's HTML renderer with SVGs generated by a service powered by JSesh, a FLOSS Egyptian hieroglyphics editor.
SVGs will be rendered using Hierator, a Java-based rendering service I wrote that uses JSesh libraries for rendering.
- JSesh is the most feature-complete free solution, with almost full Manuel de Codage syntax support, far more complete than WikiHiero's at present.
- Because WikiHiero's present hieroglyph images were taken from an early version of JSesh, the upgrade will preserve a particular hieroglyph style people are used to.
- The service will also minify the SVGs. I'm experimenting with simply reducing the number of floating point digits in curves, which so far seems to work for this particular use case, so I'm reluctant to bring in a full-scale SVG minifier for now.
- The resulting SVGs will be stored by MediaWiki in Swift. PNG thumbnails (needed as a fallback for ancient browsers) will be generated by MediaWiki and also stored in Swift.
- JSesh can output PNGs too, but it has a limit of 2000x2000 pixels and the process is much slower.
- The predicted load on this service will be pretty low, as there are fewer than 5k distinct hieroglyphic texts in the enwiki dump. The hardware requirements will therefore also be low, with just 2 servers for redundancy. Because they will be mostly idling, sharing them with other services would be perfectly fine. Hierator is a Java servlet, so it can work just by being given its own subdirectory on a standard Java appserver such as Jetty or Tomcat.
- Do we already have machines that can be shared with it?
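The digit-reduction idea mentioned above can be illustrated with a minimal sketch. This is not Hierator's actual code (Hierator is a Java servlet); the function names, the two-digit default and the regular expressions are assumptions made for the example. It also assumes that numbers in path data are separated by whitespace, commas, minus signs or command letters, not by a bare decimal point.

```python
import re

# A decimal number with a fractional part, e.g. "12.345678" or "-0.5".
_DECIMAL = re.compile(r"-?\d*\.\d+")
# The d="..." attribute of a <path>; \b keeps id="..." from matching.
_PATH_DATA = re.compile(r'(\bd=")([^"]*)(")')


def shorten_number(text: str, digits: int = 2) -> str:
    """Format one number with at most `digits` fractional digits."""
    out = f"{float(text):.{digits}f}"
    # Drop trailing zeros and a dangling decimal point: "1.50" -> "1.5".
    return out.rstrip("0").rstrip(".")


def minify_path_data(svg: str, digits: int = 2) -> str:
    """Shorten decimals inside path data only, leaving other attributes alone."""
    def fix_path(match: re.Match) -> str:
        data = _DECIMAL.sub(
            lambda num: shorten_number(num.group(0), digits), match.group(2)
        )
        return match.group(1) + data + match.group(3)

    return _PATH_DATA.sub(fix_path, svg)
```

Restricting the substitution to path data keeps identifiers and other attributes intact. A full-scale minifier would do much more (merging paths, dropping metadata), which is exactly the dependency the proposal tries to avoid for now.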
Comparison with other options
- Use Unicode Egyptian characters
- Almost nobody has these fonts, and the fonts are too big to be suitable as web fonts. Also, in addition to displaying glyphs, they need to be combined in a few ways, for example like this: [image], enclosed in cartouches, and so on. Some of this stuff, e.g. rotation with CSS while keeping the bounding box in mind, is unusably buggy in modern browsers.
- More package dependencies, less mature, inherently less secure because you'll need to sanitize input against TeX injection.
- SVG rendering itself is quite fast.
- WikiHiero will normalize texts sent to Hierator to improve the cache hit ratio. This is also needed because some of the existing texts were written for WikiHiero's more relaxed parsing rules and thus need to be tightened before being passed to JSesh.
- Overall, our current PHP-based parser is quite fast. To match it, I intend to cache the rendered HTML for a given hieroglyphic text in memcached, which avoids spending time on tokenization and on checking the file repo for the SVG, work that would otherwise be needed even in the best case.
- I considered using RESTBase for this; however, it's not ready yet, and we would still need to store the files in a world-readable file repo.
- JSesh has no Debian (or any other) packages. Since it's a GUI app, packaging it properly might be harder than simply putting the required JARs somewhere on the classpath.
- We could use Trebuchet for deployment; however, unlike the tiny Hierator, JSesh is over 100 megabytes (due to resources with all those hieroglyph images), so we return to that old discussion about deployment of large binary files :)
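The normalize-then-cache flow described above could look roughly like the sketch below. It is only an illustration: the real implementation would live in WikiHiero's PHP code and use memcached, whereas here a plain dict stands in for the cache, `render` is a hypothetical callable standing in for the request to Hierator, and the normalization rule (trimming and collapsing whitespace) is an assumption, since the actual rules are not spelled out in this proposal.

```python
import hashlib
import re


def normalize_hiero(text: str) -> str:
    """Hypothetical normalization: trim and collapse whitespace runs."""
    return re.sub(r"\s+", " ", text.strip())


def cache_key(text: str) -> str:
    """Derive the key from the normalized text so equivalent inputs collide."""
    digest = hashlib.sha1(normalize_hiero(text).encode("utf-8")).hexdigest()
    return f"hiero:{digest}"


class RenderCache:
    def __init__(self, render):
        self._render = render  # stand-in for the call to the rendering service
        self._store = {}       # stand-in for memcached
        self.misses = 0

    def get_html(self, text: str) -> str:
        key = cache_key(text)
        if key not in self._store:
            # Cache miss: render the normalized text once and remember it.
            self.misses += 1
            self._store[key] = self._render(normalize_hiero(text))
        return self._store[key]
```

Because the key is computed from the normalized text, two inputs that differ only in spacing share one cache entry and trigger one render, which is the cache-hit improvement that normalization is meant to provide.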
Where is development at:
- The initial version of Hierator is complete, though I still need to figure out a proper build system for it.
- The PHP side of things is in its early stages; the current snapshot is here.