Proton

Proton lets users download Wikipedia articles as PDFs. The output supports both the Desktop and the Mobile view.

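Technical details
Proton is a simple service that renders PDFs with Chromium, driven by the Puppeteer library. It has two components:
 * a queue system that handles all requests (PDF rendering jobs are both resource-intensive and time-consuming)
 * renderer code that instructs Puppeteer to render the requested page as a PDF.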

Proton is provided as a web service written in JavaScript and running on Node.js. It is intended to produce beautiful and clean PDFs. On Wikimedia wikis, Proton will be proxied behind RESTBase. It uses the puppeteer-core library. The Chromium browser is not bundled with puppeteer-core and has to be downloaded separately. An environment variable is used to point to the Chromium executable.
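As a rough illustration of pointing puppeteer-core at a separately installed Chromium, the sketch below builds launch options from an environment variable. The variable name `CHROMIUM_EXECUTABLE` and the helper `buildLaunchOptions` are hypothetical; the name Proton actually reads is not stated here. `executablePath` and `headless` are existing options of Puppeteer's `launch()`.

```javascript
// Hypothetical variable name, used only for this sketch.
const CHROMIUM_ENV_VAR = 'CHROMIUM_EXECUTABLE';

// Build the options object that would be handed to puppeteer-core's launch().
function buildLaunchOptions(env = process.env) {
  const executablePath = env[CHROMIUM_ENV_VAR];
  if (!executablePath) {
    // puppeteer-core ships without a browser, so a missing path is fatal.
    throw new Error(`${CHROMIUM_ENV_VAR} must point to a Chromium executable`);
  }
  return { executablePath, headless: true };
}

// Usage, assuming puppeteer-core is installed:
//   const puppeteer = require('puppeteer-core');
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

Failing early when the variable is unset gives a clearer error than letting the launch call fail later.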

The best way to generate an article PDF is to use the browser's built-in print-to-PDF functionality. This method gives the best results and also lets us reuse the existing print styles available for both the Desktop and Mobile versions of Wikipedia. The system does not post-process the requested HTML. Articles are printed the same way as they appear in the print preview of the user's browser. The generated PDFs are very similar (if not identical) to what anyone can get by using Print to PDF in the Chrome browser. To get the best results, Proton disables JavaScript. This turns off all dynamic content transformations, such as lazy-loaded images on Mobile pages.
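The printing approach described above could be sketched roughly as follows. The function name `printToPdf`, the `networkidle2` wait condition, and the `A4` default are assumptions made for illustration, not Proton's actual settings. The `browser` argument is expected to be a Puppeteer Browser; the Puppeteer calls used (`newPage`, `setJavaScriptEnabled`, `goto`, `pdf`, `close`) are existing API methods.

```javascript
// Render one URL to a PDF buffer. `browser` is passed in, so this function
// has no hard dependency on how the browser was launched.
async function printToPdf(browser, url, format = 'A4') {
  const page = await browser.newPage();
  try {
    // Turn off JavaScript before navigating so that dynamic transformations
    // (for example lazy-loaded images on Mobile pages) never run.
    await page.setJavaScriptEnabled(false);
    await page.goto(url, { waitUntil: 'networkidle2' });
    // page.pdf() renders with print CSS media, like the browser's print preview.
    return await page.pdf({ format, printBackground: true });
  } finally {
    await page.close();
  }
}
```

Because no HTML post-processing happens, all of the layout comes from the page's own print styles.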

Note: for some users, the PDF from browser print and the PDF from the Proton service may differ slightly, because the font configuration on the user's system can have specific font hinting or kerning settings.

QueueSystem
The queue system is the heart of the Proton renderer. It moves each job through the waiting, processing, and timeout logic. A job in the queue is in one of two states: waiting or processing. Besides capping how many jobs run at the same time, the queue handles job timeouts and job cancellation. It was implemented to:

 * limit the number of waiting jobs
 * reject a waiting job after a defined number of seconds
 * limit the number of rendering jobs (PDF rendering requires a lot of resources)
 * reject, as a safety net, rendering jobs that take too much time
 * save resources by trying to cancel a job when its request is aborted, whichever state the job is in (waiting or processing).

The queue system is based on Bluebird promises and uses their cancellation feature (see Known hacks below).
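The queue requirements listed above can be illustrated with a minimal sketch. This is not Proton's implementation: it uses native promises and `AbortController` in place of Bluebird cancellation, and every name in it (`RenderQueue`, the option names, the error classes) is hypothetical.

```javascript
class QueueFullError extends Error {}
class JobTimeoutError extends Error {}
class JobAbortedError extends Error {}

class RenderQueue {
  constructor({ maxWaiting, waitTimeoutMs, concurrency, renderTimeoutMs }) {
    Object.assign(this, { maxWaiting, waitTimeoutMs, concurrency, renderTimeoutMs });
    this.waiting = []; // jobs in the "waiting" state
    this.running = 0;  // number of jobs in the "processing" state
  }

  // task(signal) returns a promise; signal is aborted on timeout or cancellation.
  push(task, abortSignal) {
    return new Promise((resolve, reject) => {
      // Reject at once when no slot is free and the waiting list is full.
      if (this.running >= this.concurrency && this.waiting.length >= this.maxWaiting) {
        reject(new QueueFullError('too many waiting jobs'));
        return;
      }
      const controller = new AbortController();
      const job = { task, controller, state: 'waiting', timer: null };
      const finish = (settle, value) => {
        clearTimeout(job.timer);
        if (job.state === 'processing') this.running -= 1;
        job.state = 'done';
        this.waiting = this.waiting.filter((j) => j !== job);
        settle(value);
        this._next(); // a slot may have been freed
      };
      job.resolve = (value) => { if (job.state !== 'done') finish(resolve, value); };
      job.reject = (err) => {
        if (job.state === 'done') return;
        controller.abort(); // tell the task to stop working
        finish(reject, err);
      };
      // Waiting jobs are rejected after waitTimeoutMs.
      job.timer = setTimeout(
        () => job.reject(new JobTimeoutError('waited too long')), this.waitTimeoutMs);
      // An aborted request cancels the job in either state.
      if (abortSignal) {
        abortSignal.addEventListener(
          'abort', () => job.reject(new JobAbortedError('request aborted')), { once: true });
      }
      this.waiting.push(job);
      this._next();
    });
  }

  _next() {
    while (this.running < this.concurrency && this.waiting.length > 0) {
      const job = this.waiting.shift();
      clearTimeout(job.timer);
      job.state = 'processing';
      this.running += 1;
      // Safety net: reject rendering jobs that take too long.
      job.timer = setTimeout(
        () => job.reject(new JobTimeoutError('render took too long')), this.renderTimeoutMs);
      new Promise((resolve) => resolve(job.task(job.controller.signal)))
        .then(job.resolve, job.reject);
    }
  }
}
```

A caller would pass a task such as `(signal) => render(url, signal)` and the abort signal tied to the incoming request. The task is responsible for observing `signal` and stopping its own work.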

Renderer
The Renderer is a simple facade over the PDF rendering method of the Puppeteer library. It is responsible for setting up the proper Chromium environment and browser viewport, requesting the Wikipedia page, and calling the PDF rendering function. It also keeps an eye on the browser process. Each render starts a new Chromium instance, and the Chromium process exits after a successful render. To save resources and keep the system in a good state, the Renderer asks Chromium to shut down; if for any reason the browser keeps processing the request, the Renderer sends a signal to the browser process to make sure it uses no more CPU or memory.
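The one-browser-per-render lifecycle and the forced shutdown might look roughly like the sketch below. The helper names (`shutdownBrowser`, `renderOnce`), the grace period, and the choice of `SIGKILL` are assumptions, since the text above does not name the signal. `browser.close()` and `browser.process()` are existing Puppeteer methods; `process()` returns the spawned child process, or null if Puppeteer did not spawn the browser.

```javascript
// Ask the browser to close; if it has not exited after graceMs, kill it.
async function shutdownBrowser(browser, graceMs = 1000) {
  const child = browser.process();
  let timer;
  const grace = new Promise((resolve) => {
    timer = setTimeout(resolve, graceMs, 'timeout');
  });
  try {
    const outcome = await Promise.race([browser.close().then(() => 'closed'), grace]);
    // Assumed signal: SIGKILL cannot be ignored by the browser process.
    if (outcome === 'timeout' && child) child.kill('SIGKILL');
    return outcome;
  } catch (err) {
    if (child) child.kill('SIGKILL');
    throw err;
  } finally {
    clearTimeout(timer);
  }
}

// Start a fresh browser for one render and always shut it down afterwards.
async function renderOnce(launch, render) {
  const browser = await launch();
  try {
    return await render(browser);
  } finally {
    await shutdownBrowser(browser);
  }
}
```

Here `launch` would wrap the Puppeteer launch call and `render` would produce the PDF, so a failed render still releases the browser process.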

Other features
When the queue is full, or a job cannot be finished because of a timeout at any stage, the Proton service responds with an error status code and a Retry-After header. The Retry-After header instructs the load balancer to depool the given Proton node so that it can finish processing its current jobs. The system sets the Retry-After header to a configured value. After that time all processing jobs should have finished, and the system should be able to pick up new jobs.
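A minimal sketch of such a response follows, assuming status 503 (Service Unavailable), the status most commonly paired with Retry-After, and a hypothetical 30-second default. Neither value comes from Proton's configuration, and `sendBusy` is an illustrative name. `res` is expected to behave like a Node.js `http.ServerResponse`.

```javascript
// Reply that the node is busy and tell the client (or load balancer)
// how many seconds to wait before sending more work.
function sendBusy(res, retryAfterSeconds = 30) {
  // Retry-After accepts a delay in whole seconds (or an HTTP date).
  res.writeHead(503, { 'Retry-After': String(retryAfterSeconds) });
  res.end('Service busy, retry later\n');
}

// Usage with Node's built-in http module:
//   require('http').createServer((req, res) => sendBusy(res, 15)).listen(3030);
```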

Known hacks
Proton uses the BBPromise (Bluebird) cancellation feature. Cancellation is disabled by default; to enable it, BBPromise.config has to be called with the cancellation flag. The catch is that this configuration has to be set before any promise is created. Because Proton uses Service-runner, and Service-runner uses BBPromise for everything, even for reading configuration files, this was not easy to implement. The flag cannot be set in the Proton application, because Proton code runs after Service-runner initialization. It also cannot be defined in the config, as Service-runner uses promises when reading the config. Version 2.6.6 of Service-runner introduced an environment variable for this purpose, which has to be set to a truthy value. If the environment variable is not set, Proton initialization fails with an error.
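The fail-fast behaviour at startup can be sketched as a simple guard. The variable name `ENABLE_CANCELLABLE_PROMISES` and the helper `assertCancellationEnabled` are hypothetical; the name Service-runner actually checks is not given here. The commented Bluebird call is its documented way to switch cancellation on.

```javascript
// Hypothetical variable name, used only for this sketch.
const CANCELLATION_ENV_VAR = 'ENABLE_CANCELLABLE_PROMISES';

// Refuse to start when promise cancellation has not been requested.
function assertCancellationEnabled(env = process.env) {
  // Environment values are strings, so any non-empty value counts as truthy.
  if (!env[CANCELLATION_ENV_VAR]) {
    throw new Error(`${CANCELLATION_ENV_VAR} must be set before the service starts`);
  }
}

// With Bluebird, the switch itself has to happen before any promise exists:
//   const BBPromise = require('bluebird');
//   BBPromise.config({ cancellation: true });
```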

To support a wide variety of languages, it is suggested to install the following fonts in the deployment:

 * fonts-liberation
 * fonts-noto
 * fonts-noto-cjk
 * fonts-noto-cjk-extra
 * fonts-noto-color-emoji
 * fonts-noto-extra
 * fonts-noto-mono
 * fonts-noto-ui-core
 * fonts-noto-ui-extra
 * fonts-noto-unhinted

Development
Code review happens in Gerrit. See Gerrit/Getting started to set up an account for yourself. The service uses the ServiceTemplateNode project template and follows all Service development rules.

Running tests
To run all swagger tests and mocha tests:

npm test

To run the tests with coverage:

npm run coverage

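Technical documentation

 * README.MD: describes Proton's internal variables and configuration variables.
 * ServiceTemplateNode/Deployment: how to deploy the Proton service.
 * Reading/Web/Projects/Print_Styles: the output format and its limitations are the same as those of the browser print styles.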

Links for Proton developers

 * Proton GitHub repository
 * Puppeteer documentation
 * BBPromise documentation

See also

 * RESTBase: caching and storage API proxy for PDFs generated with Proton
 * Proton: status tracking, implementation, and data flow details

Contact
If you need help or have questions or feedback, ask in the chat room or on the wikitech-l mailing list.