User:APatro (WMF)/T204568 - Extend message checker framework

From mediawiki.org

Objectives

Avoid build failures caused by translator mistakes [1]. To achieve this, project maintainers can define error conditions that prevent translators from saving translations that contain errors.

Sub objectives

  1. Add support for raising errors in the system during translation.
  2. Disallow saving translations that have errors.
  3. Allow administrators and FuzzyBot to override the checks.
  4. Allow project maintainers to specify errors and warnings in groups.yaml.
  5. Maintain backwards compatibility with the older YAML file format by treating existing checkers as warnings.
  6. Merge insertables and checkers, since they are largely the same: for example, $name should be offered as an insertable, trigger a warning if it is not used, and usually prevent saving.
  7. Reduce the amount of custom code that currently has to be written to implement insertables.

Validators

A new MessageValidator framework has been added with the intent of replacing the existing MessageChecker framework. Validators run on the translated message and, depending on the configuration, show a warning or an error to the translator. Translations with warnings can still be saved, but translations with errors cannot. Only users with the translate-manage permission can save translations that have errors.
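The save rule described above can be sketched as a small decision function. This is only an illustration, not the extension's actual code: the function name canSaveTranslation and its two boolean parameters are hypothetical.

```php
<?php
/**
 * Hypothetical helper: decide whether a translation may be saved.
 *
 * @param bool $hasErrors True if any enforced validator failed.
 * @param bool $userCanManage True if the user has the translate-manage permission.
 * @return bool
 */
function canSaveTranslation( bool $hasErrors, bool $userCanManage ): bool {
	// Warnings never block saving, so only errors are considered here.
	if ( !$hasErrors ) {
		return true;
	}
	// Errors block saving unless the user has translate-manage.
	return $userCanManage;
}
```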

A regex can be supplied when configuring a validator, and the validator can also be made an insertable. Hence, when declaring a validator, it is possible to make $name insertable, warn the translator if it is not used, and prevent saving.

Adding custom validators is still possible and will be needed for more specialized validations.

Configuration

The following is a summarized validator configuration:

VALIDATORS:
    # Example 1
    - id: InsertableRegex
      enforce: true
      insertable: true
      params: /\$[a-z0-9]+/
    # Example 2
    - id: InsertableRegex
      insertable: true
      params:
          regex: /(?<pre>\[)[^]]+(?<post>\]\([^)]+\))/
          display: $pre $post
          pre: $pre
          post: $post
    # Example 3
    - class: MathJaxMessageValidator
      enforce: true
    # Example 4
    - id: BraceBalance

In the example above,

  1. InsertableRegex is a bundled validator that can accept a custom regex and run validations.
  2. MathJaxMessageValidator is a custom validator class.
  3. BraceBalance is another bundled validator.

VALIDATORS uses an array format. Let's look at the parameters used in each array item.

Parameters

Property | Type | Description
id | string | The ID of a bundled (pre-provided) validator. Required if class is not specified.
class | string | The name of a custom validator class; see example #3 in the configuration above. The AUTOLOAD option can be used to load the class. Required if id is not specified.
enforce | bool | Whether the validator is enforced. If true and a translation fails validation, an error is displayed that must be fixed before the translation can be saved.
insertable | bool | Whether the validator should also act as an insertable.
params | string / KV collection | If params is a string, it is used as the regex (see example #1). In that case, if insertable is true, display and pre are both set to the first value from the regex match, and post is left empty. If params is a collection, the params.* properties below apply (see example #2).
params.regex | string | The regex used by the validator. Must use named captures. Do not use the $ symbol in the capture names. Example #2 uses two named captures, pre and post.
params.display | string | Mandatory. The display value for the insertable. Named captures prefixed with $ are used here. See example #2.
params.pre | string | The pre value for the insertable. Named captures prefixed with $ are used here. If not specified, it is set to the display value. See example #2.
params.post | string | The post value for the insertable. Named captures prefixed with $ are used here. If not specified, it defaults to an empty string. See example #2.
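To make the defaults for display, pre and post concrete, here is a minimal sketch of how insertables could be derived from the two forms of params. It is not the extension's implementation: the function sketchInsertables is hypothetical, "the first value from the regex match" is assumed to mean the full match, and the regex is assumed to contain named captures when the collection form is used.

```php
<?php
/**
 * Hypothetical sketch: derive insertables from text using either a plain
 * regex string or a collection with regex/display/pre/post keys.
 *
 * @param string $text Text to scan, for example the source message.
 * @param string|array $params Validator params as described above.
 * @return array[] List of [ 'display' => ..., 'pre' => ..., 'post' => ... ].
 */
function sketchInsertables( string $text, $params ): array {
	$regex = is_string( $params ) ? $params : $params['regex'];
	preg_match_all( $regex, $text, $matches, PREG_SET_ORDER );

	$insertables = [];
	foreach ( $matches as $match ) {
		if ( is_string( $params ) ) {
			// String form: display and pre are the full match, post is empty.
			$insertables[] = [ 'display' => $match[0], 'pre' => $match[0], 'post' => '' ];
			continue;
		}

		// Collection form: map "$name" to the value of each named capture.
		$replacements = [];
		foreach ( $match as $name => $value ) {
			if ( is_string( $name ) ) {
				$replacements['$' . $name] = $value;
			}
		}

		$display = strtr( $params['display'], $replacements );
		$insertables[] = [
			'display' => $display,
			// pre falls back to the display value, post to an empty string.
			'pre' => isset( $params['pre'] ) ? strtr( $params['pre'], $replacements ) : $display,
			'post' => isset( $params['post'] ) ? strtr( $params['post'], $replacements ) : '',
		];
	}
	return $insertables;
}
```

With the regex from example #2 and the text "See [docs](https://example.org) now", this sketch yields one insertable whose pre value is "[" and whose post value is "](https://example.org)".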

Pre-provided / Bundled validators

The following validators are bundled:

BraceBalance

ID: BraceBalance

Ensures that the number of opening braces / brackets matches the number of closing braces / brackets in the translation.

For example, the following translations would pass:

  • {{ }}
  • [ ]

whereas the following would fail:

  • [ ]]
  • {{ }

This validator cannot be marked as insertable.
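The check can be pictured as a simple count comparison. The sketch below is an assumption-laden illustration rather than the bundled validator's code: the function name sketchIsBalanced and the exact set of bracket pairs are hypothetical. Like the description above, it only compares counts and does not verify nesting order.

```php
<?php
/**
 * Hypothetical sketch of a brace balance check.
 *
 * @param string $translation
 * @return bool True if every kind of bracket is opened and closed equally often.
 */
function sketchIsBalanced( string $translation ): bool {
	// The set of pairs checked here is an assumption made for illustration.
	$pairs = [ '[' => ']', '{' => '}', '(' => ')' ];
	foreach ( $pairs as $open => $close ) {
		if ( substr_count( $translation, $open ) !== substr_count( $translation, $close ) ) {
			return false;
		}
	}
	return true;
}
```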

InsertableRegex

ID: InsertableRegex

A generic reusable validator that can be used to specify custom validations and insertables.

For example, take the following configuration, where the validator is marked as insertable and enforced:

- id: InsertableRegex
  enforce: true
  insertable: true
  params: '/\$[a-zA-Z0-9]+/'

Given the source message "Hello $name. My name is $myName.", the translation must contain the parameters $name and $myName. They are also offered as insertables so that translators can add them easily. If either parameter is missing from the translation, an error is displayed to the translator.
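The required-parameter check can be sketched as follows. This is a hypothetical helper, not the bundled validator: the function name sketchMissingPlaceholders is invented, and the regex used here deliberately includes uppercase letters so that $myName is matched in full.

```php
<?php
/**
 * Hypothetical sketch: list the placeholders that occur in the source
 * message but are missing from the translation.
 *
 * @param string $regex
 * @param string $source
 * @param string $translation
 * @return string[]
 */
function sketchMissingPlaceholders( string $regex, string $source, string $translation ): array {
	preg_match_all( $regex, $source, $inSource );
	preg_match_all( $regex, $translation, $inTranslation );
	// array_diff keeps the values of the first array that are absent from the second.
	return array_values( array_unique( array_diff( $inSource[0], $inTranslation[0] ) ) );
}

$missing = sketchMissingPlaceholders(
	'/\$[a-zA-Z0-9]+/',
	'Hello $name. My name is $myName.',
	'Hallo $name.'
);
// $missing is [ '$myName' ], so an enforced validator would report an error.
```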

InsertableRubyVariable

ID: InsertableRubyVariable

This validator matches Ruby variables in translations. Internally, it extends InsertableRegexValidator and uses the following regex: %{[a-zA-Z_]+}

MediaWikiMisc

ID: MediaWikiMisc

Provides validations for expiry options and IP block options defined in MediaWiki core. These are usually in the following format:

indefinite:indefinite,3 hours:3 hours,12 hours:12 hours,24 hours:24 hours,31 hours:31 hours,36 hours:36 hours,48 hours:48 hours,60 hours:60 hours,72 hours:72 hours,1 week:1 week,2 weeks:2 weeks,1 month:1 month,3 months:3 months,6 months:6 months,1 year:1 year,2 years:2 years,3 years:3 years,infinite:indefinite

The validations ensure that the translation has exactly the same number of key-value pairs as the definition. They run only on messages with the following keys:

  1. protect-expiry-options
  2. ipboptions
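The pair-count comparison can be sketched like this. It is an illustration under stated assumptions, not the validator's code: the function name sketchSamePairCount is hypothetical, and the additional check that each translated item contains exactly one colon is an assumption that goes beyond the description above.

```php
<?php
/**
 * Hypothetical sketch: check that a translated options list has the same
 * number of comma-separated "label:value" pairs as the definition.
 *
 * @param string $definition
 * @param string $translation
 * @return bool
 */
function sketchSamePairCount( string $definition, string $translation ): bool {
	$definitionPairs = explode( ',', $definition );
	$translationPairs = explode( ',', $translation );
	if ( count( $definitionPairs ) !== count( $translationPairs ) ) {
		return false;
	}
	// Assumption: each translated item must still be a single "label:value" pair.
	foreach ( $translationPairs as $pair ) {
		if ( count( explode( ':', $pair ) ) !== 2 ) {
			return false;
		}
	}
	return true;
}
```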

MediaWikiPlural

ID: MediaWikiPlural

Ensures that if the source (definition) contains {{PLURAL:$1|message|messages}}, the translation contains it as well. It can also be used as an insertable.
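A minimal sketch of this rule, assuming a simple substring test, is shown below. The function name sketchPluralIsKept is hypothetical, and the case-insensitive matching of the PLURAL keyword is an assumption; the bundled validator may detect plurals differently.

```php
<?php
/**
 * Hypothetical sketch: if the definition uses {{PLURAL:...}}, require the
 * translation to use it too.
 *
 * @param string $definition
 * @param string $translation
 * @return bool True if the translation passes the check.
 */
function sketchPluralIsKept( string $definition, string $translation ): bool {
	// Assumption: the keyword is matched case-insensitively.
	$needle = '{{PLURAL:';
	if ( stripos( $definition, $needle ) === false ) {
		// The definition has no plural, so there is nothing to enforce.
		return true;
	}
	return stripos( $translation, $needle ) !== false;
}
```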

UI

The UI has been updated to differentiate between errors and warnings. The icon associated with warning notices has been updated.

During translation, if the translation has an error, the Save translation button is disabled unless the user has the translate-manage permission.

Validation is also performed on the server when the translation is saved. Users with the translate-manage permission can still save the translation even if it has errors.

A bug noticed during development, T220789 (Invalid "more warnings" label shown during message translation), has also been fixed in this patch.

Custom validators

Custom validators must implement the MediaWiki\Extensions\Translate\MessageValidator\Validator interface [1]. They can also use the MediaWiki\Extensions\Translate\MessageValidator\ValidationHelper trait [2], which contains some commonly used methods. Below is an example of a custom validator:

use MediaWiki\Extensions\Translate\MessageValidator\Validator;
use MediaWiki\Extensions\Translate\MessageValidator\ValidationHelper;

/**
 * My Custom Validator
 */
class MyCustomValidator implements Validator {
	use ValidationHelper;
	
	public function validate( $messages, $code, array &$notices ) {
		// Validation code goes here. Push notices into the $notices array.
	}
}

The format of the $notices array is:

// $key is the message key
$notices[$key][] = [
    # check identification
    [ 'printf', $subcheck, $key, $code ],
    # check notice message
    'translate-checks-parameters-unknown',
    # optional special param list, formatted later with Language::commaList()
    [ 'PARAMS', $params ],
    # optional number of params, formatted later with Language::formatNum()
    [ 'COUNT', count( $params ) ],
    'Any other parameters to the message',
];
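As a rough illustration of how a validate() body might fill $notices in this format, consider the standalone sketch below. It deliberately does not implement the real Validator interface: the function name sketchUnknownParameters, the 'unknown' subcheck name, and the representation of messages as plain arrays with key, definition and translation entries are all hypothetical.

```php
<?php
/**
 * Hypothetical stand-in for a validate() body. Messages are plain arrays
 * here; the real message objects passed by the framework differ.
 *
 * @param array[] $messages Each item has 'key', 'definition' and 'translation'.
 * @param string $code Language code.
 * @param array &$notices Notices are appended per message key.
 */
function sketchUnknownParameters( array $messages, string $code, array &$notices ): void {
	foreach ( $messages as $message ) {
		$key = $message['key'];
		preg_match_all( '/\$[0-9]+/', $message['definition'], $defVars );
		preg_match_all( '/\$[0-9]+/', $message['translation'], $transVars );

		// Parameters used in the translation but absent from the definition.
		$params = array_values( array_unique( array_diff( $transVars[0], $defVars[0] ) ) );
		if ( $params === [] ) {
			continue;
		}

		$notices[$key][] = [
			[ 'printf', 'unknown', $key, $code ],
			'translate-checks-parameters-unknown',
			[ 'PARAMS', $params ],
			[ 'COUNT', count( $params ) ],
		];
	}
}
```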

InsertablesSuggesters

The YAML structure for InsertablesSuggesters can now be specified as an array. InsertablesSuggesters provide insertables to translators. Many insertables for custom message groups on translatewiki.net simply use a regular expression to extract parts of the text and offer them as insertables. For this, project maintainers have had to create a custom InsertablesSuggester, which involves writing PHP code [3][4].

With the array format in place, and with the help of the RegexInsertablesSuggester class, this can be done without writing any PHP code.

New configuration format

Consider the suggester and checker for the MathJax project.

The YAML file configuration for the project could be updated to:

TEMPLATE:
    VALIDATORS:
        # Validation and Insertable - /\$[0-9]+/
        - id: InsertableRegex
          enforce: true # Warning or Error
          insertable: true
          params: /\$[0-9]+/
    INSERTABLES:
        # \left \begin{$1}
        - class: RegexInsertablesSuggester
          params: /\\[a-z0-9${}]+/
        # <math>
        - class: RegexInsertablesSuggester
          params: /<[a-z]+>/
        # [title]{$1}
        - class: RegexInsertablesSuggester
          params:
              regex: /(?<pre>\[)[^]]+(?<post>\]\([^)]+\))/
              display: "$pre $post"
              pre: $pre
              post: $post

INSERTABLES can be defined as an array, which allows users to specify multiple regular expressions to match insertables in the text. The structure of the params argument is the same as that of the params property described under the Validators section.