Extension:Chr2syl

Chr2syl is an extension written by Jeff Merkey that generates Unicode characters of the Sequoyah Syllabary for the Cherokee language from Cherokee words entered as simple text phonetics inside a set of parser tags. It can also convert Unicode characters in the Sequoyah Syllabary back into simple text phonetics, which makes Cherokee text easier to edit. With this extension, authors can write Cherokee content in the Syllabary without special keyboard maps or software.

Syntax
chr2syl tags contain Cherokee text phonetics: any collection of Cherokee or English words, which is parsed and output in the Syllabary. The extension verifies that each word adheres to the proper constructs for Cherokee nouns, verb roots and verb stems, then converts conforming words into the Syllabary. Words that do not conform, such as English words, are passed through unchanged.

chr2html tags contain Cherokee text phonetics in the same form and apply the same verification and conversion. The output, however, consists of raw HTML Unicode character reference strings rather than Syllabary characters, so this tag is intended solely for raw HTML coding of Cherokee in MediaWiki.
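The verification step can be pictured as a whole-word test: a word is converted only if it can be consumed from start to finish by syllable matches. The following is a minimal sketch of that idea, not the extension's own code. The names `demo_syllables`, `demo_match` and `demo_is_cherokee_word` are hypothetical, and the table is a small illustrative subset of the syllables rather than the full set.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Illustrative subset of phonetic syllables; NOT the full table. */
static const char *const demo_syllables[] = {
    "si", "yo", "wa", "ya", "se", "lu", "te", "o", "s"
};
#define DEMO_COUNT (sizeof(demo_syllables) / sizeof(demo_syllables[0]))

/* Length of the first table entry that is a prefix of p, or 0 if none is. */
static size_t demo_match(const char *p)
{
    size_t i;

    for (i = 0; i < DEMO_COUNT; i++) {
        size_t len = strlen(demo_syllables[i]);

        if (strncasecmp(p, demo_syllables[i], len) == 0)
            return len;
    }
    return 0;
}

/* Return 1 if the whole word is consumed by syllable matches, else 0. */
int demo_is_cherokee_word(const char *word)
{
    size_t used = 0, len = strlen(word);

    while (used < len) {
        size_t n = demo_match(word + used);

        if (n == 0)
            return 0;          /* leftover letters: not a conforming word */
        used += n;
    }
    return len > 0;            /* an empty string is not a word */
}
```

With this subset, `demo_is_cherokee_word("osiyo")` succeeds (o, si, yo), while `demo_is_cherokee_word("test")` fails because the final "t" matches no syllable, so the word would be left as English text.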

syl2chr tags contain Cherokee Unicode characters: any collection of Syllabary Unicode strings. The extension converts the Cherokee Unicode back into simple text phonetics.
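For the reverse direction, one way to recognise a Syllabary character is to decode its three UTF-8 bytes and measure the distance from U+13A0, the first character of the Cherokee block. The sketch below assumes the 85 characters U+13A0 through U+13F4. The function name `demo_syllable_index` is hypothetical, and this is an alternative illustration rather than the method the program itself uses.

```c
#include <stddef.h>

/* Decode the three-byte UTF-8 sequence at p and return its offset from
   U+13A0 (0..84), or -1 if it is not one of the 85 characters assumed
   here. Safe on NUL-terminated strings: a NUL byte fails the
   continuation-byte test before any later byte is read. */
int demo_syllable_index(const unsigned char *p)
{
    unsigned int cp;

    if ((p[0] & 0xF0) != 0xE0)              /* not a three-byte lead byte */
        return -1;
    if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80)
        return -1;                          /* bad continuation byte */

    cp = ((unsigned int)(p[0] & 0x0F) << 12) |
         ((unsigned int)(p[1] & 0x3F) << 6) |
          (unsigned int)(p[2] & 0x3F);

    if (cp < 0x13A0 || cp > 0x13F4)
        return -1;
    return (int)(cp - 0x13A0);
}
```

The returned offset could then index a table of phonetic spellings laid out in Syllabary order; for example, the bytes E1 8E AA decode to U+13AA, offset 10.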

Sample output
These input strings constructed using chr2syl:

gohi iga osda     (have a good day)
osiyo             (hello)
dohitsu           (I am fine, how are you?)
vv                (yes)
waya              (wolf)
selu              (corn)
gega              (I am going now)
kane isdi gowelvi (I am speaking words from Wikipedia)
I want to also test english text.

ᎪᎯ ᎢᎦ ᎣᏍᏓ ᎣᏏᏲ ᏙᎯᏧ ᎥᎥ ᏩᏯ ᏎᎷ ᎨᎦ ᎧᏁ ᎢᏍᏗ ᎪᏪᎸᎢ I want to also test english text.

gohi iga osda osiyo dohitsu vv waya selu gega kane isdi gowelvi I want to also test english text.

Produce the following output:

ᎪᎯ ᎢᎦ ᎣᏍᏓ ᎣᏏᏲ ᏙᎯᏧ ᎥᎥ ᏩᏯ ᏎᎷ ᎨᎦ ᎧᏁ ᎢᏍᏗ ᎪᏪᎸᎢ I want to also test english text.

gohi iga osda osiyo dohitsu vv waya selu gega kane isdi gowelvi I want to also test english text.

&#5034&#5039 &#5026&#5030 &#5027&#5069&#5075 &#5027&#5071&#5106 &#5081&#5039&#5095 &#5029&#5029 &#5097&#5103 &#5070&#5047 &#5032&#5030 &#5031&#5057 &#5026&#5069&#5079 &#5034&#5098&#5048&#5026 I want to also test english text.
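The decimal numbers in the HTML output are Unicode code points: 5034 is U+13AA, the same character that appears as the bytes E1 8E AA in UTF-8 output. The sketch below shows the standard three-byte UTF-8 encoding for code points in the range U+0800 to U+FFFF. The function name `demo_utf8_encode3` is hypothetical and is not part of the program.

```c
#include <stddef.h>

/* Encode a code point in U+0800..U+FFFF (excluding surrogates) as three
   UTF-8 bytes followed by a terminating NUL.
   Returns 0 on success, -1 if the code point is outside that range. */
int demo_utf8_encode3(unsigned int cp, unsigned char out[4])
{
    if (cp < 0x0800 || cp > 0xFFFF)
        return -1;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;                                   /* surrogate range */

    out[0] = (unsigned char)(0xE0 | (cp >> 12));         /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F)); /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));        /* 10xxxxxx */
    out[3] = '\0';
    return 0;
}
```

Calling `demo_utf8_encode3(5034, buf)` fills `buf` with E1 8E AA, which matches the first character of the Syllabary output above.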

Modern Otali (Oklahoma) Syllabary Mappings for drifted language variants
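The following Syllabary maps convert drifted Otali dialect characters and syllables back into the Sequoyah Syllabary. Drifted Otali words that use these constructs should be mapped to the Sequoyah Syllabary characters in this table. Each entry lists the phonetic spelling and the index of its target syllable, followed by the entry's position in the list, the Syllabary character, and that character's standard syllable.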


 * nah-32:(0) Ꮐ (nah)
 * hna-31:(1) Ꮏ (hna)
 * qua-38:(2) Ꮖ (qua)
 * que-39:(3) Ꮗ (que)
 * qui-40:(4) Ꮘ (qui)
 * quo-41:(5) Ꮙ (quo)
 * quu-42:(6) Ꮚ (quu)
 * quv-43:(7) Ꮛ (quv)
 * dla-60:(8) Ꮬ (dla)
 * tla-61:(9) Ꮭ (tla)
 * tle-62:(10) Ꮮ (tle)
 * tli-63:(11) Ꮯ (tli)
 * tlo-64:(12) Ꮰ (tlo)
 * tlu-65:(13) Ꮱ (tlu)
 * tlv-66:(14) Ꮲ (tlv)
 * tsa-67:(15) Ꮳ (tsa)
 * tse-68:(16) Ꮴ (tse)
 * tsi-69:(17) Ꮵ (tsi)
 * tso-70:(18) Ꮶ (tso)
 * tsu-71:(19) Ꮷ (tsu)
 * tsv-72:(20) Ꮸ (tsv)
 * hah-79:(21) Ꮿ (ya)
 * gwu-11:(22) Ꭻ (gu)
 * gwi-40:(23) Ꮘ (qui)
 * hla-61:(24) Ꮭ (tla)
 * hwa-73:(25) Ꮹ (wa)
 * gwa-38:(26) Ꮖ (qua)
 * hlv-66:(27) Ꮲ (tlv)
 * guh-11:(28) Ꭻ (gu)
 * gwe-39:(29) Ꮗ (que)
 * wah-73:(30) Ꮹ (wa)
 * hnv-37:(31) Ꮕ (nv)
 * teh-54:(32) Ꮦ (te)
 * qwa-6:(33) Ꭶ (ga)
 * yah-79:(34) Ꮿ (ya)
 * na-30:(35) Ꮎ (na)
 * ne-33:(36) Ꮑ (ne)
 * ni-34:(37) Ꮒ (ni)
 * no-35:(38) Ꮓ (no)
 * nu-36:(39) Ꮔ (nu)
 * nv-37:(40) Ꮕ (nv)
 * ga-6:(41) Ꭶ (ga)
 * ka-7:(42) Ꭷ (ka)
 * ge-8:(43) Ꭸ (ge)
 * gi-9:(44) Ꭹ (gi)
 * go-10:(45) Ꭺ (go)
 * gu-11:(46) Ꭻ (gu)
 * gv-12:(47) Ꭼ (gv)
 * ha-13:(48) Ꭽ (ha)
 * he-14:(49) Ꭾ (he)
 * hi-15:(50) Ꭿ (hi)
 * ho-16:(51) Ꮀ (ho)
 * hu-17:(52) Ꮁ (hu)
 * hv-18:(53) Ꮂ (hv)
 * ma-25:(54) Ꮉ (ma)
 * me-26:(55) Ꮊ (me)
 * mi-27:(56) Ꮋ (mi)
 * mo-28:(57) Ꮌ (mo)
 * mu-29:(58) Ꮍ (mu)
 * da-51:(59) Ꮣ (da)
 * ta-52:(60) Ꮤ (ta)
 * de-53:(61) Ꮥ (de)
 * te-54:(62) Ꮦ (te)
 * di-55:(63) Ꮧ (di)
 * ti-56:(64) Ꮨ (ti)
 * do-57:(65) Ꮩ (do)
 * du-58:(66) Ꮪ (du)
 * dv-59:(67) Ꮫ (dv)
 * la-19:(68) Ꮃ (la)
 * le-20:(69) Ꮄ (le)
 * li-21:(70) Ꮅ (li)
 * lo-22:(71) Ꮆ (lo)
 * lu-23:(72) Ꮇ (lu)
 * lv-24:(73) Ꮈ (lv)
 * sa-44:(74) Ꮜ (sa)
 * se-46:(75) Ꮞ (se)
 * si-47:(76) Ꮟ (si)
 * so-48:(77) Ꮠ (so)
 * su-49:(78) Ꮡ (su)
 * sv-50:(79) Ꮢ (sv)
 * wa-73:(80) Ꮹ (wa)
 * we-74:(81) Ꮺ (we)
 * wi-75:(82) Ꮻ (wi)
 * wo-76:(83) Ꮼ (wo)
 * wu-77:(84) Ꮽ (wu)
 * wv-78:(85) Ꮾ (wv)
 * ya-79:(86) Ꮿ (ya)
 * ye-80:(87) Ᏸ (ye)
 * yi-81:(88) Ᏹ (yi)
 * yo-82:(89) Ᏺ (yo)
 * yu-83:(90) Ᏻ (yu)
 * yv-84:(91) Ᏼ (yv)
 * to-57:(92) Ꮩ (do)
 * tu-58:(93) Ꮪ (du)
 * ko-10:(94) Ꭺ (go)
 * tv-59:(95) Ꮫ (dv)
 * qa-73:(96) Ꮹ (wa)
 * ke-7:(97) Ꭷ (ka)
 * kv-12:(98) Ꭼ (gv)
 * ah-0:(99) Ꭰ (a)
 * qo-10:(100) Ꭺ (go)
 * oh-3:(101) Ꭳ (o)
 * ju-71:(102) Ꮷ (tsu)
 * ji-69:(103) Ꮵ (tsi)
 * ja-67:(104) Ꮳ (tsa)
 * je-68:(105) Ꮴ (tse)
 * jo-70:(106) Ꮶ (tso)
 * jv-72:(107) Ꮸ (tsv)
 * a-0:(108) Ꭰ (a)
 * e-1:(109) Ꭱ (e)
 * i-2:(110) Ꭲ (i)
 * o-3:(111) Ꭳ (o)
 * u-4:(112) Ꭴ (u)
 * v-5:(113) Ꭵ (v)
 * s-45:(114) Ꮝ (s)
 * n-30:(115) Ꮎ (na)
 * l-2:(116) Ꭲ (i)
 * t-52:(117) Ꮤ (ta)
 * d-55:(118) Ꮧ (di)
 * y-80:(119) Ᏸ (ye)
 * k-6:(120) Ꭶ (ga)
 * g-6:(121) Ꭶ (ga)
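The list above places three-letter spellings first, then two-letter spellings, then single letters. That ordering matters if the table is searched from the top and the first prefix match wins, which is the assumption made in this sketch. The names `demo_first_match`, `demo_long_first` and `demo_short_first` are hypothetical and only illustrate the effect of the order.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Return the length of the first entry in table[0..n-1] that is a
   case-insensitive prefix of p, or 0 if no entry matches. */
size_t demo_first_match(const char *p, const char *const *table, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        size_t len = strlen(table[i]);

        if (strncasecmp(p, table[i], len) == 0)
            return len;
    }
    return 0;
}

/* The same three spellings in two different search orders. */
const char *const demo_long_first[]  = { "nah", "na", "a" };
const char *const demo_short_first[] = { "na", "nah", "a" };
```

Searching "nah" in `demo_long_first` consumes all three letters, while `demo_short_first` stops after "na" and leaves a stray "h" that no syllable can match. Listing longer spellings first avoids that failure.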

Installation
Place  inside LocalSettings.php. You also need the chr2syl program code (tar.gz), which performs the conversion at high speed. The source code for the chr2syl program is released under the GPLv3. After downloading the tar.gz and building the program, copy the resulting program into a directory /chr created under your main MediaWiki base directory ($IP).

Chr2Syl Source Code
//#define WINDOWS
#define LINUX

#ifdef WINDOWS

#define strncasecmp strnicmp

typedef UCHAR BYTE;
typedef USHORT WORD;
#include "windows.h"
#include "winioctl.h"
#include "winuser.h"
#include "stdarg.h"
#include "stdio.h"
#include "stdlib.h"
#include "ctype.h"
#include "conio.h"

#endif


#ifdef LINUX

// the names of sixteen further headers are missing from this listing;
// stdio.h and strings.h are assumed here because the code below calls
// printf() and strncasecmp()
#include <stdio.h>
#include <strings.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>
#include <ctype.h>

#endif

// one Syllabary character as its three UTF-8 bytes
typedef struct _CHR_UNI
{
   unsigned char c1;
   unsigned char c2;
   unsigned char c3;
} chr_uni;

// UTF-8 encodings of U+13A0 through U+13F4, in Syllabary order
chr_uni unicode_table[]=
{
   { 0xE1, 0x8E, 0xA0 }, { 0xE1, 0x8E, 0xA1 }, { 0xE1, 0x8E, 0xA2 }, { 0xE1, 0x8E, 0xA3 },
   { 0xE1, 0x8E, 0xA4 }, { 0xE1, 0x8E, 0xA5 }, { 0xE1, 0x8E, 0xA6 }, { 0xE1, 0x8E, 0xA7 },
   { 0xE1, 0x8E, 0xA8 }, { 0xE1, 0x8E, 0xA9 }, { 0xE1, 0x8E, 0xAA }, { 0xE1, 0x8E, 0xAB },
   { 0xE1, 0x8E, 0xAC }, { 0xE1, 0x8E, 0xAD }, { 0xE1, 0x8E, 0xAE }, { 0xE1, 0x8E, 0xAF },
   { 0xE1, 0x8E, 0xB0 }, { 0xE1, 0x8E, 0xB1 }, { 0xE1, 0x8E, 0xB2 }, { 0xE1, 0x8E, 0xB3 },
   { 0xE1, 0x8E, 0xB4 }, { 0xE1, 0x8E, 0xB5 }, { 0xE1, 0x8E, 0xB6 }, { 0xE1, 0x8E, 0xB7 },
   { 0xE1, 0x8E, 0xB8 }, { 0xE1, 0x8E, 0xB9 }, { 0xE1, 0x8E, 0xBA }, { 0xE1, 0x8E, 0xBB },
   { 0xE1, 0x8E, 0xBC }, { 0xE1, 0x8E, 0xBD }, { 0xE1, 0x8E, 0xBE }, { 0xE1, 0x8E, 0xBF },
   { 0xE1, 0x8F, 0x80 }, { 0xE1, 0x8F, 0x81 }, { 0xE1, 0x8F, 0x82 }, { 0xE1, 0x8F, 0x83 },
   { 0xE1, 0x8F, 0x84 }, { 0xE1, 0x8F, 0x85 }, { 0xE1, 0x8F, 0x86 }, { 0xE1, 0x8F, 0x87 },
   { 0xE1, 0x8F, 0x88 }, { 0xE1, 0x8F, 0x89 }, { 0xE1, 0x8F, 0x8A }, { 0xE1, 0x8F, 0x8B },
   { 0xE1, 0x8F, 0x8C }, { 0xE1, 0x8F, 0x8D }, { 0xE1, 0x8F, 0x8E }, { 0xE1, 0x8F, 0x8F },
   { 0xE1, 0x8F, 0x90 }, { 0xE1, 0x8F, 0x91 }, { 0xE1, 0x8F, 0x92 }, { 0xE1, 0x8F, 0x93 },
   { 0xE1, 0x8F, 0x94 }, { 0xE1, 0x8F, 0x95 }, { 0xE1, 0x8F, 0x96 }, { 0xE1, 0x8F, 0x97 },
   { 0xE1, 0x8F, 0x98 }, { 0xE1, 0x8F, 0x99 }, { 0xE1, 0x8F, 0x9A }, { 0xE1, 0x8F, 0x9B },
   { 0xE1, 0x8F, 0x9C }, { 0xE1, 0x8F, 0x9D }, { 0xE1, 0x8F, 0x9E }, { 0xE1, 0x8F, 0x9F },
   { 0xE1, 0x8F, 0xA0 }, { 0xE1, 0x8F, 0xA1 }, { 0xE1, 0x8F, 0xA2 }, { 0xE1, 0x8F, 0xA3 },
   { 0xE1, 0x8F, 0xA4 }, { 0xE1, 0x8F, 0xA5 }, { 0xE1, 0x8F, 0xA6 }, { 0xE1, 0x8F, 0xA7 },
   { 0xE1, 0x8F, 0xA8 }, { 0xE1, 0x8F, 0xA9 }, { 0xE1, 0x8F, 0xAA }, { 0xE1, 0x8F, 0xAB },
   { 0xE1, 0x8F, 0xAC }, { 0xE1, 0x8F, 0xAD }, { 0xE1, 0x8F, 0xAE }, { 0xE1, 0x8F, 0xAF },
   { 0xE1, 0x8F, 0xB0 }, { 0xE1, 0x8F, 0xB1 }, { 0xE1, 0x8F, 0xB2 }, { 0xE1, 0x8F, 0xB3 },
   { 0xE1, 0x8F, 0xB4 },
};

// phonetic spelling of each syllable, in the same order as unicode_table[]
char *phonetic_syl[]=
{
   "a", "e", "i", "o", "u", "v",
   "ga", "ka", "ge", "gi", "go", "gu", "gv",
   "ha", "he", "hi", "ho", "hu", "hv",
   "la", "le", "li", "lo", "lu", "lv",
   "ma", "me", "mi", "mo", "mu",
   "na", "hna", "nah", "ne", "ni", "no", "nu", "nv",
   "qua", "que", "qui", "quo", "quu", "quv",
   "sa", "s", "se", "si", "so", "su", "sv",
   "da", "ta", "de", "te", "di", "ti", "do", "du", "dv",
   "dla", "tla", "tle", "tli", "tlo", "tlu", "tlv",
   "tsa", "tse", "tsi", "tso", "tsu", "tsv",
   "wa", "we", "wi", "wo", "wu", "wv",
   "ya", "ye", "yi", "yo", "yu", "yv",
};

// search order used by chr_syllable(); the first prefix match wins
char *phonetic_opt[]=
{
   "na", "hna", "nah", "ne", "ni", "no", "nu", "nv",
   "qua", "que", "qui", "quo", "quu", "quv",
   "dla", "tla", "tle", "tli", "tlo", "tlu", "tlv",
   "tsa", "tse", "tsi", "tso", "tsu", "tsv",
   "ga", "ka", "ge", "gi", "go", "gu", "gv",
   "ha", "he", "hi", "ho", "hu", "hv",
   "ma", "me", "mi", "mo", "mu",
   "da", "ta", "de", "te", "di", "ti", "do", "du", "dv",
   "la", "le", "li", "lo", "lu", "lv",
   "sa", "se", "si", "so", "su", "sv",
   "wa", "we", "wi", "wo", "wu", "wv",
   "ya", "ye", "yi", "yo", "yu", "yv",
   "a", "e", "i", "o", "u", "v",
   "s",
};

// decimal character references for the same 85 code points (5024 is U+13A0)
char *wikichr[]=
{
   "&#5024", "&#5025", "&#5026", "&#5027", "&#5028", "&#5029", "&#5030",
   "&#5031", "&#5032", "&#5033", "&#5034", "&#5035", "&#5036", "&#5037",
   "&#5038", "&#5039", "&#5040", "&#5041", "&#5042", "&#5043", "&#5044",
   "&#5045", "&#5046", "&#5047", "&#5048", "&#5049", "&#5050", "&#5051",
   "&#5052", "&#5053", "&#5054", "&#5055", "&#5056", "&#5057", "&#5058",
   "&#5059", "&#5060", "&#5061", "&#5062", "&#5063", "&#5064", "&#5065",
   "&#5066", "&#5067", "&#5068", "&#5069", "&#5070", "&#5071", "&#5072",
   "&#5073", "&#5074", "&#5075", "&#5076", "&#5077", "&#5078", "&#5079",
   "&#5080", "&#5081", "&#5082", "&#5083", "&#5084", "&#5085", "&#5086",
   "&#5087", "&#5088", "&#5089", "&#5090", "&#5091", "&#5092", "&#5093",
   "&#5094", "&#5095", "&#5096", "&#5097", "&#5098", "&#5099", "&#5100",
   "&#5101", "&#5102", "&#5103", "&#5104", "&#5105", "&#5106", "&#5107",
   "&#5108",
};

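char unicode_map[85][4];   // NUL-terminated UTF-8 string for each syllable
int chr_opt_map[85];       // phonetic_opt[] index to phonetic_syl[] index
char buffer[8192];         // stdin line buffer
char work[4096];           // scratch copy of the word being checked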

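// split *s at the first delimiter found in ct; returns the token and
// advances *s past it (NULL once the input is exhausted)
unsigned char *strtoken(unsigned char **s, const char *ct)
{
   char *sbegin = (char *)*s, *end;

   if (sbegin == NULL)
      return NULL;

   end = strpbrk(sbegin, ct);
   if (end)
      *end++ = '\0';
   *s = (unsigned char *)end;

   return (unsigned char *)sbegin;
}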

// return the index of syl in phonetic_syl[], or -1 if it is not listed
int get_syl_map(char *syl)
{
   register int i;

   for (i=0; i < 85; i++)
   {
      register int len = strlen(phonetic_syl[i]);
      register int slen = strlen(syl);

      if ((len == slen) && !strncasecmp(syl, phonetic_syl[i], len))
      {
#if VERBOSE
         printf("%s = %s (%d)\n", syl, phonetic_syl[i], len);
#endif
         return i;
      }
   }
   return -1;
}

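// print the phonetic spelling of the three-byte Syllabary character at p;
// returns the number of bytes consumed (3), or 0 if p is not a syllable
int printph(unsigned char *p)
{
   register int i;

   for (i=0; i < 85; i++)
   {
      // strncmp stops at a NUL, so a truncated sequence is never read past
      if (!strncmp((char *)p, unicode_map[i], 3))
      {
         printf("%s", phonetic_syl[i]);
         return 3;
      }
   }
   return 0;
}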

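// match one syllable at the start of p; when mkword is set, also print it
// as UTF-8 or, with xml_esc, as a character reference.
// returns the number of letters consumed, or 0 if nothing matches
int chr_syllable(char *p, int mkword, int xml_esc)
{
   register int i;

   for (i=0; i < 85; i++)
   {
      register int len = strlen(phonetic_opt[i]);

      if (!strncasecmp(p, phonetic_opt[i], len))
      {
#if VERBOSE
         printf("%s = %s (%d)\n", p, phonetic_opt[i], len);
#endif
         if (mkword)
         {
            register int j = chr_opt_map[i];

            // get_syl_map() returns -1 for an unmapped spelling
            if (j >= 0 && j < 85)
            {
               if (xml_esc)
                  printf("%s", wikichr[j]);
               else
                  printf("%s", unicode_map[j]);
            }
         }
         return len;
      }
   }
   return 0;
}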

// return 1 if the word at p consists entirely of Cherokee syllables;
// *count (if not NULL) receives the number of letters matched
int chr_word(char *p, int *count, int mkword, int xml_esc)
{
   register int i, len = 0, matched = 0;
   register char *s = work;

   if (count)
      *count = 0;

   if (p)
      len = strlen(p);

   if (!len)
      return 0;

   // 'i' and 'a', which are ambiguous in English, are never used
   // as single vowel morphemes in the Cherokee language.
   if (len == 1 && ((tolower((unsigned char)*p) == 'i') ||
                    (tolower((unsigned char)*p) == 'a')))
      return 0;

   // bounded copy so that an overlong word cannot overflow work[]
   strncpy(work, p, sizeof(work) - 1);
   work[sizeof(work) - 1] = '\0';
   for (i=0; i < 4096; i++)
   {
      if (!work[i])
         break;

      if ((work[i] == ' ') || (work[i] == '\r') || (work[i] == '\n'))
      {
         work[i] = '\0';
         break;
      }
   }
   len = strlen(work);
   if (!len)
      return 0;

   s = work;
   while (s && *s)
   {
      i = chr_syllable(s, mkword, xml_esc);
      if (!i)
         break;

      s += i;
      matched += i;
   }

   // count may be NULL, so compare against the local total
   if (count)
      *count = matched;

   return (len == matched);
}

// convert and print the word s if it is a Cherokee word;
// returns 0 (printing nothing) when it is not
int printchr(char *s, int xml_esc)
{
   int cnt = 0;
   char *p = s;
   register int i, len = 0, count = 0;

   // first pass only checks the word, second pass prints it
   if (!chr_word(s, &cnt, 0, 0))
      return 0;

   if (p)
      len = strlen(p);

   if (!len)
      return 0;

   while (p && *p)
   {
      while ((*p == '(') || (*p == ')'))
      {
         putc(*p++, stdout);
         count++;
      }

      while (*p == ' ')
      {
         putc(*p++, stdout);
         count++;
      }

      i = chr_syllable(p, 1, xml_esc);
      if (!i)
         break;

      p += i;
      count += i;
   }

#if VERBOSE
   if (len != count)
      printf("%s l:%d c:%d\n", s, len, count);
#endif

   return (len == count);
}

char str[4096 * 4];    // current word, lower-cased for matching
char *str_p = NULL;
char vstr[4096 * 4];   // current word exactly as it was entered
char *vstr_p = NULL;

int main(int argc, char *argv[])
{
   register int i, id = 1, xml_esc = 0, string = 0;
   unsigned char *s = NULL;
   int syl2ph = 0, unicode = 0;

#if VERBOSE
   printf("building syllabary maps\n");
#endif
   for (i=0; i < 85; i++)
   {
      chr_opt_map[i] = get_syl_map(phonetic_opt[i]);
#if VERBOSE
      printf("%s-%d:", phonetic_opt[i], chr_opt_map[i]);
#endif
   }
#if VERBOSE
   printf("\n");
#endif

#if VERBOSE
   printf("building syllabary unicode map\n");
#endif
   for (i=0; i < 85; i++)
   {
      unicode_map[i][0] = unicode_table[i].c1;
      unicode_map[i][1] = unicode_table[i].c2;
      unicode_map[i][2] = unicode_table[i].c3;
      unicode_map[i][3] = '\0';
#if VERBOSE
      printf("%s:", unicode_map[i]);
#endif
   }
#if VERBOSE
   printf("\n");
#endif

   for (i=0; i < argc; i++)
   {
      if (!strcmp(argv[i], "--help"))
      {
         printf("USAGE:  chr2syl -[unicode|ph|xml] -s str [ out ]\n");
         printf("       -unicode - Cherokee Phonetics to Syllabary\n");
         printf("       -ph      - Syllabary to Cherokee Phonetics\n");
         printf("       -xml     - Output Syllabary in XML Format\n");
         printf("       -s       - Use arguments instead of stdin/stdout\n");
         exit(0);
      }

      if (!strcmp(argv[i], "-unicode"))
      {
         unicode = 1;
         id = i + 1;
      }

      if (!strcmp(argv[i], "-ph"))
      {
         syl2ph = 1;
         id = i + 1;
      }

      if (!strcmp(argv[i], "-xml"))
      {
         xml_esc = 1;
         id = i + 1;
      }

      if (!strcmp(argv[i], "-s"))
      {
         id = i + 1;
         // switch to argument mode only when a string really follows -s
         if (argv[id])
         {
            string = 1;
            s = (unsigned char *)argv[id];
         }
      }
   }

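   if (string)
   {
      if (syl2ph)
      {
         register int j;
         unsigned char *ph;

         // Syllabary to phonetics: every Syllabary character starts with 0xE1
         ph = s;
         while (*ph)
         {
            if (*ph == 0xE1)
            {
               j = printph(ph);
               if (!j)
                  putc(*ph++, stdout);
               else
                  ph += j;
            }
            else
               putc(*ph++, stdout);
         }
         return 0;
      }

      // phonetics to Syllabary: collect letters into a word and convert
      // the word each time a delimiter (or the end of the string) is reached
      str[0] = '\0';
      str_p = &str[0];
      vstr[0] = '\0';
      vstr_p = &vstr[0];
      while (1)
      {
         // the size test keeps an overlong argument from overflowing str[]
         if (*s && (isalpha(*s) || (*s > 127)) &&
             ((size_t)(str_p - str) < sizeof(str) - 1))
         {
            *str_p++ = tolower(*s);
            *str_p = '\0';

            *vstr_p++ = *s;
            *vstr_p = '\0';
         }
         else
         {
            // not a Cherokee word: echo the original spelling unchanged
            if (!printchr(str, xml_esc))
               printf("%s", vstr);

            if (*s)
               putc(*s, stdout);

            if (!*s)
               break;

            str[0] = '\0';
            str_p = &str[0];
            vstr[0] = '\0';
            vstr_p = &vstr[0];
         }
         s++;
      }
      return 0;
   }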

   if (syl2ph)
   {
      register int j;
      unsigned char *ph;

      // Syllabary to phonetics, reading stdin line by line
      while ((ph = (unsigned char *)fgets(buffer, 8192, stdin)))
      {
         while (*ph)
         {
            if (*ph == 0xE1)
            {
               j = printph(ph);
               if (!j)
                  putc(*ph++, stdout);
               else
                  ph += j;
            }
            else
               putc(*ph++, stdout);
         }
      }
      return 0;
   }

   // phonetics to Syllabary, reading stdin line by line
   while ((s = (unsigned char *)fgets(buffer, 8192, stdin)))
   {
      while (*s)
      {
         str_p = &str[0];
         vstr_p = &vstr[0];

         // collect one word: letters and any non-ASCII bytes
         while (*s && (isalpha(*s) || (*s > 127)))
         {
            *str_p++ = tolower(*s);
            *vstr_p++ = *s++;
         }
         *str_p = '\0';
         *vstr_p = '\0';

         // also flushes a final word that has no trailing delimiter
         if (!printchr(str, xml_esc))
            printf("%s", vstr);

         if (*s)
            putc(*s++, stdout);
      }
   }

   return 0;
}