Extension:AntiSpoof/Equivalence sets

This is a copy of equivset.in. Please submit changes on GitHub, not here.

The sections are:


 * equivset 1
 * equivset 2 (CJK ideographs)
 * equivset 3 (Hiragana/Katakana/Hangeul)


This is the input file for generateEquivset.php. The format is:

     => [ ]

If the codepoint is given, it must match the character; otherwise a warning is issued and the line is ignored.
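
The codepoint check described above can be sketched as follows. This is a minimal illustration, not the logic of generateEquivset.php: it assumes each side of `=>` is an optional hexadecimal codepoint followed by a single character, and the names `parse_side` and `parse_line` are hypothetical.

```python
import sys


def parse_side(side):
    """Return the character on one side of '=>', or None if it is invalid.

    Assumed layout (not confirmed by the source): an optional hex
    codepoint, then the character itself, separated by whitespace.
    """
    tokens = side.split()
    if len(tokens) == 1:
        # Only the character was given; there is nothing to cross-check.
        char = tokens[0]
        return char if len(char) == 1 else None
    if len(tokens) == 2:
        hex_cp, char = tokens
        try:
            codepoint = int(hex_cp, 16)
        except ValueError:
            return None
        # The stated codepoint must match the character that follows it.
        if len(char) == 1 and ord(char) == codepoint:
            return char
    return None


def parse_line(line):
    """Return a (left, right) character pair, or None if the line is ignored."""
    left, sep, right = line.partition("=>")
    if not sep:
        return None
    a, b = parse_side(left), parse_side(right)
    if a is None or b is None:
        # Mirror the documented behaviour: warn and skip the line.
        print(f"warning: ignoring line {line!r}", file=sys.stderr)
        return None
    return a, b
```

Under these assumptions, `parse_line("30 0 => 4F O")` yields the pair `("0", "O")`, while a line whose stated codepoint does not match its character is reported and skipped.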

The effect of such a line is to conflate the two identified characters, i.e. to put them in the same set. If two sets share a member, they are merged into a single larger set.
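
The merging rule can be illustrated with a small union-find (disjoint-set) structure. This is a sketch of one way to obtain that behaviour, not necessarily how generateEquivset.php does it; the function name `build_sets` and the sample pairs are made up for the example.

```python
def build_sets(pairs):
    """Group characters into equivalence sets from (a, b) pairs.

    Sets that share a member are merged, so the pairs (0, O) and (O, o)
    end up in one three-member set.
    """
    parent = {}

    def find(x):
        # Register unseen characters as their own representative.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merge the two sets by pointing one root at the other.
            parent[root_b] = root_a

    groups = {}
    for char in list(parent):
        groups.setdefault(find(char), set()).add(char)
    return list(groups.values())


sets = build_sets([("0", "O"), ("O", "o"), ("1", "l")])
```

Here the first two pairs share the member "O", so they collapse into a single set containing "0", "O" and "o", while "1" and "l" form a separate set.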

We have attempted to include the following types of equivalence:

 * Case folding. Although letters of different cases are often visually distinct, they can easily be confused by people who are familiar with the alphabet. Two words that differ only in case may be read as the same word. This is a popular technique for impersonation.
 * Visually similar characters. Cross-script pairs are included, but these tend to produce false conflations within scripts, and so should be avoided. The software implements a blanket restriction against cross-script strings, which makes cross-script pairs mostly redundant.
 * Chinese Simplified/Traditional pairs.

The list is based on one by Neil Harris, which was derived by unknown methods. That list also contained transliteration pairs, which we considered excessive and have attempted to remove. For example, the Latin E and H were considered equivalent, because the Latin transliteration of the Greek "Η" (which looks like Latin H) is "E".