Extension:Chr2syl

From MediaWiki.org
MediaWiki extensions manual
Chr2syl
Release status: stable
Implementation Tag, Locale
Description This extension enables the generation of Unicode symbols for the Cherokee language
Author(s) Jeff Merkey
Database changes No
License GNU General Public License 3.0 or later
Download see below
Tags

  • <chr2syl>
  • <syl2chr>
  • <chr2html>

Chr2syl is an extension written by Jeff Merkey that generates Unicode characters of the Sequoyah Syllabary for the Cherokee language from Cherokee words typed in simple phonetic (Latin-alphabet) spelling inside a set of parser tags. It can also convert Syllabary characters back into phonetic spelling, which makes existing Cherokee text easier to edit. As a result, Cherokee content can be written in the Syllabary without special keyboard layouts or input software.

Syntax

chr2syl uses <chr2syl>Cherokee text phonetics</chr2syl> tags, which may contain any mix of Cherokee and English words. Each word is converted to the Syllabary only if it can be split completely into Cherokee syllables (for example, osiyo = o + si + yo); any other word, such as an English word, is output unchanged. The one-letter words "a" and "i" are always left unchanged because they are ambiguous with English.
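The pass-through rule can be illustrated with a minimal C sketch. It mirrors the greedy, first-match prefix scan that chr_syllable() and chr_word() perform in the source code below, but syllables[], match_syllable() and segment_word() are hypothetical names, and the table is a small illustrative subset rather than the full 85-entry phonetic_opt[].

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp() (POSIX) */

/* Hypothetical subset of a syllable table, for illustration only.
   Entries are tried in order and the first prefix match wins, so "s"
   must come after "sa", "se" and "si"; otherwise "selu" would split
   as s + e + lu instead of se + lu. */
static const char *const syllables[] = {
    "tsa", "tsi", "tsu", "dla", "tla",
    "ga", "go", "gu", "hi", "ho", "lu", "se", "si", "sa",
    "wa", "ya", "yo", "da", "do",
    "a", "e", "i", "o", "u", "v", "s",
};

/* Length of the first entry that is a case-insensitive prefix of p,
   or 0 if no entry matches. */
static size_t match_syllable(const char *p)
{
    for (size_t i = 0; i < sizeof syllables / sizeof syllables[0]; i++) {
        size_t len = strlen(syllables[i]);
        if (strncasecmp(p, syllables[i], len) == 0)
            return len;
    }
    return 0;
}

/* Number of syllables if the whole word segments cleanly; 0 if some
   remainder matches no entry (or the word is empty), in which case a
   converter would output the word unchanged. */
static int segment_word(const char *word)
{
    int n = 0;

    while (*word != '\0') {
        size_t len = match_syllable(word);
        if (len == 0)
            return 0;
        word += len;
        n++;
    }
    return n;
}
```

Under these assumptions segment_word("osiyo") returns 3 (o + si + yo), while segment_word("test") returns 0 because no entry matches the start of "test", so that word would be passed through.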

chr2html uses <chr2html>Cherokee text phonetics</chr2html> tags and accepts the same input as chr2syl, but outputs each syllable as an HTML numeric character reference (for example, &#5034 for Ꭺ, go) rather than as a Unicode character. It is intended solely for producing raw HTML code for Cherokee text in MediaWiki.
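The reference for each syllable follows from its position in the Syllabary: index i corresponds to code point U+13A0 + i, so go (index 10) becomes &#5034. A minimal sketch of that arithmetic follows; syllable_ncr() is a hypothetical helper. Unlike the wikichr[] table in the source code, it appends the terminating semicolon, whose absence HTML treats as a parse error (browsers recover from it).

```c
#include <stdio.h>

/* First code point of the Cherokee block (U+13A0, "a"); syllable
   index i in chr2syl's tables is assumed to be U+13A0 + i. */
#define CHEROKEE_BASE 0x13A0

/* Writes the decimal numeric character reference for syllable index i
   (0..84) into out, e.g. index 10 ("go") -> "&#5034;".
   Returns the number of characters written, or -1 if i is out of
   range or the buffer is too small. */
static int syllable_ncr(int i, char *out, size_t outlen)
{
    if (i < 0 || i > 84)
        return -1;
    int n = snprintf(out, outlen, "&#%d;", CHEROKEE_BASE + i);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```

For example, syllable_ncr(10, buf, sizeof buf) stores "&#5034;" in buf, the same reference that appears for Ꭺ in the sample output below, plus the semicolon.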

syl2chr uses <syl2chr>Cherokee Unicode characters</syl2chr> tags, which may contain any text. Characters of the Syllabary are converted back into simple phonetic spelling; all other characters, including English text, are passed through unchanged.
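The reverse direction only has to recognise the three-byte UTF-8 sequences E1 8E A0 through E1 8F B4 listed in unicode_table[]. The source code compares each candidate against all 85 entries with memcmp(); the sketch below, a hypothetical helper named syllable_index_utf8(), instead decodes the code point arithmetically and also checks how many bytes remain, so a sequence cut off at the end of the input is rejected rather than read past.

```c
#include <stddef.h>

#define CHEROKEE_BASE  0x13A0u
#define CHEROKEE_COUNT 85u      /* U+13A0 .. U+13F4, as in chr2syl's tables */

/* If p starts with a 3-byte UTF-8 sequence for a Cherokee syllable,
   returns its index (0..84); otherwise -1. n is the number of bytes
   available at p. */
static int syllable_index_utf8(const unsigned char *p, size_t n)
{
    if (n < 3 || p[0] != 0xE1 ||
        (p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80)
        return -1;

    /* 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyyyyzzzzzz */
    unsigned cp = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu);
    if (cp < CHEROKEE_BASE || cp >= CHEROKEE_BASE + CHEROKEE_COUNT)
        return -1;
    return (int)(cp - CHEROKEE_BASE);
}
```

The returned index can then be used with a table such as phonetic_syl[] to print the phonetic spelling.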

Sample output

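The following input uses all three tags. The English glosses in parentheses are annotations for the reader; they are not part of the input that produced the output shown.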

<chr2syl>
gohi iga osda  (have a good day)
osiyo          (hello)
dohitsu        (I am fine, how are you?)
vv             (yes)
waya           (wolf)
selu           (corn)
gega           (I am going now)
kane isdi gowelvi  (I am speaking words from Wikipedia)
I want to also test english text.
</chr2syl>
<syl2chr>
ᎪᎯ ᎢᎦ ᎣᏍᏓ
ᎣᏏᏲ
ᏙᎯᏧ
ᎥᎥ
ᏩᏯ
ᏎᎷ
ᎨᎦ
ᎧᏁ ᎢᏍᏗ ᎪᏪᎸᎢ
I want to also test english text.
</syl2chr>
<chr2html>
gohi iga osda
osiyo
dohitsu
vv
waya  
selu
gega
kane isdi gowelvi
I want to also test english text.
</chr2html>


This produces the following output, one block per tag:


ᎪᎯ ᎢᎦ ᎣᏍᏓ
ᎣᏏᏲ
ᏙᎯᏧ
ᎥᎥ
ᏩᏯ
ᏎᎷ
ᎨᎦ
ᎧᏁ ᎢᏍᏗ ᎪᏪᎸᎢ
I want to also test english text.


gohi iga osda
osiyo
dohitsu
vv
waya
selu
gega
kane isdi gowelvi
I want to also test english text.


&#5034&#5039 &#5026&#5030 &#5027&#5069&#5075
&#5027&#5071&#5106
&#5081&#5039&#5095
&#5029&#5029
&#5097&#5103
&#5070&#5047
&#5032&#5030
&#5031&#5057 &#5026&#5069&#5079 &#5034&#5098&#5048&#5026
I want to also test english text.
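The Syllabary characters in the first output block are emitted as UTF-8. Every code point the program uses lies between U+13A0 and U+13F4, inside the three-byte UTF-8 range (U+0800 to U+FFFF), which is why each row of unicode_table[] in the source code holds exactly three bytes. As a sketch, a row can be computed instead of written out by hand; syllable_utf8() is a hypothetical helper, and its 0..84 index is assumed to follow the same order as phonetic_syl[].

```c
#define CHEROKEE_BASE 0x13A0u

/* Encodes syllable index i (0..84) as the 3-byte UTF-8 sequence for
   U+13A0 + i and NUL-terminates it, like one row of unicode_map[].
   Returns 0 on success, -1 if i is out of range. */
static int syllable_utf8(int i, unsigned char out[4])
{
    if (i < 0 || i > 84)
        return -1;
    unsigned cp = CHEROKEE_BASE + (unsigned)i;
    out[0] = (unsigned char)(0xE0 | (cp >> 12));          /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* 10yyyyyy */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));         /* 10zzzzzz */
    out[3] = '\0';
    return 0;
}
```

For index 0 (a) it yields E1 8E A0, the first row of unicode_table[], and for index 84 (yv) it yields E1 8F B4, the last row.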

Extension Source Code

 <?php
 if ( ! defined( 'MEDIAWIKI' ) ) die();
 /**#@+
  * A parser extension that adds tags, <chr2syl>, <syl2chr>, and <chr2html> 
  * for converting Cherokee text phonetics to and from the Sequoyah Syllabary.
  *
  * @package MediaWiki
  * @subpackage Extensions
  *
  * @author Jeff V. Merkey (jmerkey@wolfmountaingroup.com)
  * @copyright Copyright © 2006, WolfMountainGroup, L.L.C.
  * @license http://www.gnu.org/copyleft/gpl.html GNU General Public License 3.0 or later
  */
 //Add the hook function call to an array defined earlier in the wiki code execution.
 $wgExtensionFunctions[] = 'chr2syl';

 $wgExtensionCredits['parserhook'][] = array(
        'name' => 'Chr2Syl',
        'author' => 'Jeffrey Vernon Merkey',
        'description' => 'Enables the generation of unicode symbols for the Cherokee language',
        'url' => 'https://www.mediawiki.org/wiki/Extension:Chr2syl',
        'license-name' => 'GPL-3.0+',
 );

 //This is the hook function. It adds the tag to the wiki parser and tells it what callback function to use.
 function chr2syl() {
    global $wgParser;
    # register the extension with the WikiText parser
    $wgParser->setHook( 'chr2html', 'renderChr2html' );
    $wgParser->setHook( 'chr2syl', 'renderChr2Syl' );
    $wgParser->setHook( 'syl2chr', 'renderSyl2Chr' );
 }
 # The callback function for converting the input text to HTML output
 function renderChr2html( $input, $argv ) {
    global $IP;
    $cmd = "$IP/chr/chr2syl -unicode -xml -s " . wfEscapeShellArg( $input );
    wfDebug( "sylxmlcmd:" . $cmd . "\n" );
    // Quoted string: backticks are PHP's shell-execution operator.
    wfProfileIn( 'chr2syl' );
    $output = wfShellExec( $cmd );
    wfProfileOut( 'chr2syl' );
    wfDebug( "chr2sylxml:" . $output . "\n" );
    // Escape the result: the character references are displayed as HTML code,
    // and words passed through unconverted cannot inject markup into the page.
    return htmlspecialchars( $output );
 }
 # The callback function for converting phonetics to Syllabary characters
 function renderChr2Syl( $input, $argv ) {
    global $IP;
    $cmd = "$IP/chr/chr2syl -unicode -s " . wfEscapeShellArg( $input );
    wfDebug( "chrcmd:" . $cmd . "\n" );
    wfProfileIn( 'chr2syl' );
    $output = wfShellExec( $cmd );
    wfProfileOut( 'chr2syl' );
    wfDebug( "chr2syl:" . $output . "\n" );
    // English words are passed through unchanged, so escape user-supplied text.
    return htmlspecialchars( $output );
 }
 # The callback function for converting Syllabary characters back to phonetics
 function renderSyl2Chr( $input, $argv ) {
    global $IP;
    $cmd = "$IP/chr/chr2syl -ph -s " . wfEscapeShellArg( $input );
    wfDebug( "sylcmd:" . $cmd . "\n" );
    wfProfileIn( 'chr2syl' );
    $output = wfShellExec( $cmd );
    wfProfileOut( 'chr2syl' );
    wfDebug( "syl2chr:" . $output . "\n" );
    return htmlspecialchars( $output );
 }

Modern Otali (Oklahoma) Syllabary Mappings for drifted language variants

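The following table maps characters and syllables of the drifted Otali dialect back onto the Sequoyah Syllabary. Drifted Otali words that use these spellings should be written with the Syllabary character shown. Each entry reads spelling-index:(entry number) character (canonical syllable), where index is the character's position in the Syllabary, counting from 0 for Ꭰ (U+13A0).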

  • nah-32:(0) Ꮐ (nah)
  • hna-31:(1) Ꮏ (hna)
  • qua-38:(2) Ꮖ (qua)
  • que-39:(3) Ꮗ (que)
  • qui-40:(4) Ꮘ (qui)
  • quo-41:(5) Ꮙ (quo)
  • quu-42:(6) Ꮚ (quu)
  • quv-43:(7) Ꮛ (quv)
  • dla-60:(8) Ꮬ (dla)
  • tla-61:(9) Ꮭ (tla)
  • tle-62:(10) Ꮮ (tle)
  • tli-63:(11) Ꮯ (tli)
  • tlo-64:(12) Ꮰ (tlo)
  • tlu-65:(13) Ꮱ (tlu)
  • tlv-66:(14) Ꮲ (tlv)
  • tsa-67:(15) Ꮳ (tsa)
  • tse-68:(16) Ꮴ (tse)
  • tsi-69:(17) Ꮵ (tsi)
  • tso-70:(18) Ꮶ (tso)
  • tsu-71:(19) Ꮷ (tsu)
  • tsv-72:(20) Ꮸ (tsv)
  • hah-79:(21) Ꮿ (ya)
  • gwu-11:(22) Ꭻ (gu)
  • gwi-40:(23) Ꮘ (qui)
  • hla-61:(24) Ꮭ (tla)
  • hwa-73:(25) Ꮹ (wa)
  • gwa-38:(26) Ꮖ (qua)
  • hlv-66:(27) Ꮲ (tlv)
  • guh-11:(28) Ꭻ (gu)
  • gwe-39:(29) Ꮗ (que)
  • wah-73:(30) Ꮹ (wa)
  • hnv-37:(31) Ꮕ (nv)
  • teh-54:(32) Ꮦ (te)
  • qwa-6:(33) Ꭶ (ga)
  • yah-79:(34) Ꮿ (ya)
  • na-30:(35) Ꮎ (na)
  • ne-33:(36) Ꮑ (ne)
  • ni-34:(37) Ꮒ (ni)
  • no-35:(38) Ꮓ (no)
  • nu-36:(39) Ꮔ (nu)
  • nv-37:(40) Ꮕ (nv)
  • ga-6:(41) Ꭶ (ga)
  • ka-7:(42) Ꭷ (ka)
  • ge-8:(43) Ꭸ (ge)
  • gi-9:(44) Ꭹ (gi)
  • go-10:(45) Ꭺ (go)
  • gu-11:(46) Ꭻ (gu)
  • gv-12:(47) Ꭼ (gv)
  • ha-13:(48) Ꭽ (ha)
  • he-14:(49) Ꭾ (he)
  • hi-15:(50) Ꭿ (hi)
  • ho-16:(51) Ꮀ (ho)
  • hu-17:(52) Ꮁ (hu)
  • hv-18:(53) Ꮂ (hv)
  • ma-25:(54) Ꮉ (ma)
  • me-26:(55) Ꮊ (me)
  • mi-27:(56) Ꮋ (mi)
  • mo-28:(57) Ꮌ (mo)
  • mu-29:(58) Ꮍ (mu)
  • da-51:(59) Ꮣ (da)
  • ta-52:(60) Ꮤ (ta)
  • de-53:(61) Ꮥ (de)
  • te-54:(62) Ꮦ (te)
  • di-55:(63) Ꮧ (di)
  • ti-56:(64) Ꮨ (ti)
  • do-57:(65) Ꮩ (do)
  • du-58:(66) Ꮪ (du)
  • dv-59:(67) Ꮫ (dv)
  • la-19:(68) Ꮃ (la)
  • le-20:(69) Ꮄ (le)
  • li-21:(70) Ꮅ (li)
  • lo-22:(71) Ꮆ (lo)
  • lu-23:(72) Ꮇ (lu)
  • lv-24:(73) Ꮈ (lv)
  • sa-44:(74) Ꮜ (sa)
  • se-46:(75) Ꮞ (se)
  • si-47:(76) Ꮟ (si)
  • so-48:(77) Ꮠ (so)
  • su-49:(78) Ꮡ (su)
  • sv-50:(79) Ꮢ (sv)
  • wa-73:(80) Ꮹ (wa)
  • we-74:(81) Ꮺ (we)
  • wi-75:(82) Ꮻ (wi)
  • wo-76:(83) Ꮼ (wo)
  • wu-77:(84) Ꮽ (wu)
  • wv-78:(85) Ꮾ (wv)
  • ya-79:(86) Ꮿ (ya)
  • ye-80:(87) Ᏸ (ye)
  • yi-81:(88) Ᏹ (yi)
  • yo-82:(89) Ᏺ (yo)
  • yu-83:(90) Ᏻ (yu)
  • yv-84:(91) Ᏼ (yv)
  • to-57:(92) Ꮩ (do)
  • tu-58:(93) Ꮪ (du)
  • ko-10:(94) Ꭺ (go)
  • tv-59:(95) Ꮫ (dv)
  • qa-73:(96) Ꮹ (wa)
  • ke-7:(97) Ꭷ (ka)
  • kv-12:(98) Ꭼ (gv)
  • ah-0:(99) Ꭰ (a)
  • qo-10:(100) Ꭺ (go)
  • oh-3:(101) Ꭳ (o)
  • ju-71:(102) Ꮷ (tsu)
  • ji-69:(103) Ꮵ (tsi)
  • ja-67:(104) Ꮳ (tsa)
  • je-68:(105) Ꮴ (tse)
  • jo-70:(106) Ꮶ (tso)
  • jv-72:(107) Ꮸ (tsv)
  • a-0:(108) Ꭰ (a)
  • e-1:(109) Ꭱ (e)
  • i-2:(110) Ꭲ (i)
  • o-3:(111) Ꭳ (o)
  • u-4:(112) Ꭴ (u)
  • v-5:(113) Ꭵ (v)
  • s-45:(114) Ꮝ (s)
  • n-30:(115) Ꮎ (na)
  • l-2:(116) Ꭲ (i)
  • t-52:(117) Ꮤ (ta)
  • d-55:(118) Ꮧ (di)
  • y-80:(119) Ᏸ (ye)
  • k-6:(120) Ꭶ (ga)
  • g-6:(121) Ꭶ (ga)
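To apply the table, a converter first normalizes a drifted spelling to the Syllabary index it stands for. The sketch below transcribes a handful of rows from the table above into a lookup array; struct drift_map, drift_table[] and lookup_drift() are hypothetical names, and only exact, lower-case whole-syllable matches are handled.

```c
#include <stddef.h>
#include <string.h>

/* A few rows transcribed from the mapping table:
   drifted spelling, Syllabary index, canonical spelling. */
struct drift_map {
    const char *drifted;
    int index;
    const char *canonical;
};

static const struct drift_map drift_table[] = {
    { "gwa", 38, "qua" }, { "gwe", 39, "que" }, { "gwi", 40, "qui" },
    { "gwu", 11, "gu"  }, { "hla", 61, "tla" }, { "hwa", 73, "wa"  },
    { "ja",  67, "tsa" }, { "je",  68, "tse" }, { "ji",  69, "tsi" },
    { "jo",  70, "tso" }, { "ju",  71, "tsu" }, { "jv",  72, "tsv" },
};

/* Looks up a lower-case drifted spelling; returns its row or NULL. */
static const struct drift_map *lookup_drift(const char *syl)
{
    for (size_t i = 0; i < sizeof drift_table / sizeof drift_table[0]; i++)
        if (strcmp(syl, drift_table[i].drifted) == 0)
            return &drift_table[i];
    return NULL;
}
```

The returned index selects the Syllabary character directly (U+13A0 + index), so "ju" resolves to index 71, the same character as canonical "tsu".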


Installation

Save the extension source code above as extensions/Chr2Syl.php and add require_once("$IP/extensions/Chr2Syl.php"); to LocalSettings.php. The extension does no conversion itself: every tag runs the external chr2syl program, so that program must be installed as well. Its source code, released under GPLv3, is available as a tar.gz and is reproduced below. Build the program, then copy the binary into a directory named chr under your main MediaWiki base directory ($IP), so that it can be run as $IP/chr/chr2syl.

External links

Chr2Syl Source Code

//#define WINDOWS

#define LINUX

/* The converter only needs the standard C library. */

#ifdef WINDOWS

#define strncasecmp strnicmp

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#endif

#ifdef LINUX

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() */
#include <ctype.h>

#endif

typedef struct _CHR_UNI
{
   unsigned char c1;
   unsigned char c2;
   unsigned char c3;
} chr_uni;

chr_uni unicode_table[]=
{
   { 0xE1, 0x8E, 0xA0 }, { 0xE1, 0x8E, 0xA1 }, { 0xE1, 0x8E, 0xA2 }, 
   { 0xE1, 0x8E, 0xA3 }, { 0xE1, 0x8E, 0xA4 }, { 0xE1, 0x8E, 0xA5 }, 
   { 0xE1, 0x8E, 0xA6 }, { 0xE1, 0x8E, 0xA7 }, { 0xE1, 0x8E, 0xA8 }, 
   { 0xE1, 0x8E, 0xA9 }, { 0xE1, 0x8E, 0xAA }, { 0xE1, 0x8E, 0xAB }, 
   { 0xE1, 0x8E, 0xAC }, { 0xE1, 0x8E, 0xAD }, { 0xE1, 0x8E, 0xAE }, 
   { 0xE1, 0x8E, 0xAF }, { 0xE1, 0x8E, 0xB0 }, { 0xE1, 0x8E, 0xB1 }, 
   { 0xE1, 0x8E, 0xB2 }, { 0xE1, 0x8E, 0xB3 }, { 0xE1, 0x8E, 0xB4 }, 
   { 0xE1, 0x8E, 0xB5 }, { 0xE1, 0x8E, 0xB6 }, { 0xE1, 0x8E, 0xB7 }, 
   { 0xE1, 0x8E, 0xB8 }, { 0xE1, 0x8E, 0xB9 }, { 0xE1, 0x8E, 0xBA }, 
   { 0xE1, 0x8E, 0xBB }, { 0xE1, 0x8E, 0xBC }, { 0xE1, 0x8E, 0xBD }, 
   { 0xE1, 0x8E, 0xBE }, { 0xE1, 0x8E, 0xBF }, { 0xE1, 0x8F, 0x80 }, 
   { 0xE1, 0x8F, 0x81 }, { 0xE1, 0x8F, 0x82 }, { 0xE1, 0x8F, 0x83 }, 
   { 0xE1, 0x8F, 0x84 }, { 0xE1, 0x8F, 0x85 }, { 0xE1, 0x8F, 0x86 }, 
   { 0xE1, 0x8F, 0x87 }, { 0xE1, 0x8F, 0x88 }, { 0xE1, 0x8F, 0x89 }, 
   { 0xE1, 0x8F, 0x8A }, { 0xE1, 0x8F, 0x8B }, { 0xE1, 0x8F, 0x8C }, 
   { 0xE1, 0x8F, 0x8D }, { 0xE1, 0x8F, 0x8E }, { 0xE1, 0x8F, 0x8F }, 
   { 0xE1, 0x8F, 0x90 }, { 0xE1, 0x8F, 0x91 }, { 0xE1, 0x8F, 0x92 }, 
   { 0xE1, 0x8F, 0x93 }, { 0xE1, 0x8F, 0x94 }, { 0xE1, 0x8F, 0x95 }, 
   { 0xE1, 0x8F, 0x96 }, { 0xE1, 0x8F, 0x97 }, { 0xE1, 0x8F, 0x98 }, 
   { 0xE1, 0x8F, 0x99 }, { 0xE1, 0x8F, 0x9A }, { 0xE1, 0x8F, 0x9B }, 
   { 0xE1, 0x8F, 0x9C }, { 0xE1, 0x8F, 0x9D }, { 0xE1, 0x8F, 0x9E }, 
   { 0xE1, 0x8F, 0x9F }, { 0xE1, 0x8F, 0xA0 }, { 0xE1, 0x8F, 0xA1 }, 
   { 0xE1, 0x8F, 0xA2 }, { 0xE1, 0x8F, 0xA3 }, { 0xE1, 0x8F, 0xA4 }, 
   { 0xE1, 0x8F, 0xA5 }, { 0xE1, 0x8F, 0xA6 }, { 0xE1, 0x8F, 0xA7 }, 
   { 0xE1, 0x8F, 0xA8 }, { 0xE1, 0x8F, 0xA9 }, { 0xE1, 0x8F, 0xAA }, 
   { 0xE1, 0x8F, 0xAB }, { 0xE1, 0x8F, 0xAC }, { 0xE1, 0x8F, 0xAD }, 
   { 0xE1, 0x8F, 0xAE }, { 0xE1, 0x8F, 0xAF }, { 0xE1, 0x8F, 0xB0 }, 
   { 0xE1, 0x8F, 0xB1 }, { 0xE1, 0x8F, 0xB2 }, { 0xE1, 0x8F, 0xB3 }, 
   { 0xE1, 0x8F, 0xB4 }, 
};

char *phonetic_syl[]=
{
   "a", "e", "i", "o", "u", "v", 
   "ga", "ka", "ge", "gi", "go", "gu", "gv", 
   "ha", "he", "hi", "ho", "hu", "hv", 
   "la", "le", "li", "lo", "lu", "lv", 
   "ma", "me", "mi", "mo", "mu", 
   "na", "hna", "nah", "ne", "ni", "no", "nu", "nv", 
   "qua", "que", "qui", "quo", "quu", "quv", 
   "sa", "s", "se", "si", "so", "su", "sv", 
   "da", "ta", "de", "te", "di", "ti", "do", "du", "dv", 
   "dla", "tla", "tle", "tli", "tlo", "tlu", "tlv", 
   "tsa", "tse", "tsi", "tso", "tsu", "tsv", 
   "wa", "we", "wi", "wo", "wu", "wv", 
   "ya", "ye", "yi", "yo", "yu", "yv", 
};

char *phonetic_opt[]=
{
   "na", "hna", "nah", "ne", "ni", "no", "nu", "nv", 
   "qua", "que", "qui", "quo", "quu", "quv", 
   "dla", "tla", "tle", "tli", "tlo", "tlu", "tlv", 
   "tsa", "tse", "tsi", "tso", "tsu", "tsv", 
   "ga", "ka", "ge", "gi", "go", "gu", "gv", 
   "ha", "he", "hi", "ho", "hu", "hv", 
   "ma", "me", "mi", "mo", "mu", 
   "da", "ta", "de", "te", "di", "ti", "do", "du", "dv", 
   "la", "le", "li", "lo", "lu", "lv", 
   "sa", "se", "si", "so", "su", "sv", 
   "wa", "we", "wi", "wo", "wu", "wv", 
   "ya", "ye", "yi", "yo", "yu", "yv", 
   "a", "e", "i", "o", "u", "v", "s",
};

char *wikichr[]=
{
   "&#5024", "&#5025", "&#5026", "&#5027", "&#5028", "&#5029", "&#5030", 
   "&#5031", "&#5032", "&#5033", "&#5034", "&#5035", "&#5036", "&#5037", 
   "&#5038", "&#5039", "&#5040", "&#5041", "&#5042", "&#5043", "&#5044", 
   "&#5045", "&#5046", "&#5047", "&#5048", "&#5049", "&#5050", "&#5051", 
   "&#5052", "&#5053", "&#5054", "&#5055", "&#5056", "&#5057", "&#5058", 
   "&#5059", "&#5060", "&#5061", "&#5062", "&#5063", "&#5064", "&#5065", 
   "&#5066", "&#5067", "&#5068", "&#5069", "&#5070", "&#5071", "&#5072", 
   "&#5073", "&#5074", "&#5075", "&#5076", "&#5077", "&#5078", "&#5079", 
   "&#5080", "&#5081", "&#5082", "&#5083", "&#5084", "&#5085", "&#5086", 
   "&#5087", "&#5088", "&#5089", "&#5090", "&#5091", "&#5092", "&#5093", 
   "&#5094", "&#5095", "&#5096", "&#5097", "&#5098", "&#5099", "&#5100", 
   "&#5101", "&#5102", "&#5103", "&#5104", "&#5105", "&#5106", "&#5107", 
   "&#5108", 
};


char unicode_map[85][4];
int chr_opt_map[85];
char buffer[8192];
char work[4096];

unsigned char *strtoken(unsigned char **s, const char *ct)
{
   char *sbegin = *s, *end;

   if (sbegin == NULL)
      return NULL;

   end = strpbrk(sbegin, ct);
   if (end)
      *end++ = '\0';
   *s = end;

   return sbegin;
}

int get_syl_map(char *syl)
{
   register int i;

   for (i=0; i < 85; i++)
   {
      register int len = strlen(phonetic_syl[i]);
      register int slen = strlen(syl);

      if ((len == slen) && !strncasecmp(syl, phonetic_syl[i], len))
      {
#if VERBOSE
         printf("%s = %s (%d)\n", syl, phonetic_syl[i], len);
#endif
         return i;
      }
   }
   return -1;
}

int printph(unsigned char *p)
{
   register int i;

   for (i=0; i < 85; i++)
   {
      if (!memcmp(p, unicode_map[i], 3))
      {
         printf("%s", phonetic_syl[i]);
         return 3;
      }
   }
   return 0;
}

int chr_syllable(char *p, int mkword, int xml_esc)
{
   register int i;

   for (i=0; i < 85; i++)
   {
      register int len = strlen(phonetic_opt[i]);

      if (!strncasecmp(p, phonetic_opt[i], len))
      {
#if VERBOSE
         printf("%s = %s (%d)\n", p, phonetic_opt[i], len);
#endif
         if (mkword)
         {
            register int j = chr_opt_map[i];
            if (j < 85)
            {
               if (xml_esc)
                  printf("%s", wikichr[j]);
               else
                  printf("%s", unicode_map[j]);
            }
         }

         return len;
      }
   }
   return 0;
}

int chr_word(char *p, int *count, int mkword, int xml_esc)
{
    register int i, len = 0;
    register char *s = work;

    if (count)
       *count = 0;

    if (p)
       len = strlen(p);

    if (!len)
       return 0;

    // 'i' and 'a', which are ambiguous in English, are never used
    // as single vowel morphemes in the Cherokee language.  
    if (len == 1 && ((tolower(*p) == 'i') || (tolower(*p) == 'a')))
       return 0;

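    /* Copy at most sizeof(work) - 1 bytes so a very long word cannot overflow work[]. */
    strncpy(work, p, sizeof(work) - 1);
    work[sizeof(work) - 1] = '\0';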
    for (i=0; i < 4096; i++)
    {
       if (!work[i])
          break;

       if ((work[i] == ' ') || (work[i] == '\r') || (work[i] == '\n'))
       {
          work[i] = '\0';
          break;
       }
    }
    
    len = strlen(work);
    if (!len)
       return 0;

    s = work;
    while (s && *s)
    {
       i = chr_syllable(s, mkword, xml_esc);
       if (!i)
          break;

       s += i;
       if (count)
          *count += i;
    }

    return (len == *count);
}

int printchr(char *s, int xml_esc)
{
    int cnt = 0;
    char *p = s;
    register int i, len = 0, count = 0;

    if (!chr_word(s, &cnt, 0, 0))
       return 0;

    if (p)
       len = strlen(p);

    if (!len)
       return 0;

    while (p && *p)
    {
       while ((*p == '(') || (*p == ')'))
       {
          putchar(*p++);
          count++;
       }

       while (*p == ' ')
       {
          putchar(*p++);
          count++;
       }

       i = chr_syllable(p, 1, xml_esc);
       if (!i)
          break;

       p += i;
       count += i;
    }

#if VERBOSE
    if (len != count)
       printf("%s l:%d c:%d\n", s, len, count);
#endif

    return (len == count);
}

char str[4096 * 4];
char *str_p = NULL;
char vstr[4096 * 4];
char *vstr_p = NULL;

int main(int argc, char *argv[])
{
    register int i, id = 1, total = 0, xml_esc = 0, string = 0;
    unsigned char *p, *s = NULL;
    int tl = 0, cs = 0, syl2ph = 0, unicode = 0;

#if VERBOSE
    printf("building syllabary maps\n");
#endif
    for (i=0; i < 85; i++)
    {
       chr_opt_map[i] = get_syl_map(phonetic_opt[i]);
#if VERBOSE
       printf("%s-%d:", phonetic_opt[i], chr_opt_map[i]);
#endif
    }
#if VERBOSE
    printf("\n");
#endif

#if VERBOSE
    printf("building syllabary unicode map\n");
#endif
    for (i=0; i < 85; i++)
    {
       unicode_map[i][0] = unicode_table[i].c1;
       unicode_map[i][1] = unicode_table[i].c2;
       unicode_map[i][2] = unicode_table[i].c3;
       unicode_map[i][3] = '\0';
#if VERBOSE
       printf("%s:", unicode_map[i]);
#endif
    }
#if VERBOSE
    printf("\n");
#endif

    for (i=0; i < argc; i++)
    {
       if (!strcmp(argv[i], "--help"))
       {
          printf("USAGE:  chr2syl -[unicode|ph|xml] -s str [ < in > out ]\n"); 
          printf("        -unicode - Cherokee Phonetics to Syllabary\n");
          printf("        -ph      - Syllabary to Cherokee Phonetics\n");
          printf("        -xml     - Output Syllabary in XML Format\n");
          printf("        -s       - Use arguments instead of stdin/stdout\n");
          exit(0);
       }

       if (!strcmp(argv[i], "-unicode"))
       {
          unicode = 1;
          id = i + 1;
       }

       if (!strcmp(argv[i], "-ph"))
       {
          syl2ph = 1;
          id = i + 1;
       }
       
       if (!strcmp(argv[i], "-xml"))
       {
          xml_esc = 1;
          id = i + 1;
       }

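       if (!strcmp(argv[i], "-s"))
       {
          string = 1;
          id = i + 1;
          /* The string to convert is the argument that follows -s. */
          if (i + 1 < argc)
             s = (unsigned char *)argv[i + 1];
       }
    }

    if (string && !s)
    {
       fprintf(stderr, "chr2syl: -s requires a string argument\n");
       return 1;
    }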

    if (string)
    {
       if (syl2ph)
       {
          register int j;
          unsigned char *ph;

          ph = s;
          while (1)
          {
             if (*ph && *ph == 0xE1)
             {
                j = printph(ph);
                if (!j)
                   putchar(*ph++);
                else
                   ph += j;
             }
             else
             {
                if (*ph)
                   putchar(*ph++);

                if (!*ph)
                  break;
             }
          }
          return 0;
       }

       str[0] = '\0';
       str_p = &str[0];
       vstr[0] = '\0';
       vstr_p = &vstr[0];
       while (1)
       {
          if (*s && (isalpha(*s) || (*s > 127)))
          {
             /* Ignore letters that would overflow the word buffers. */
             if (str_p < &str[sizeof(str) - 1])
             {
                *str_p++ = tolower(*s);
                *str_p = '\0';

                *vstr_p++ = *s;
                *vstr_p = '\0';
             }
          }
          else
          {
             if (!printchr(str, xml_esc))  
                printf("%s", vstr);

             if (*s) 
                putchar(*s);

             if (!*s)
                break;

             str[0] = '\0';
             str_p = &str[0];
             vstr[0] = '\0';
             vstr_p = &vstr[0];
          }
          s++;
       }
       return 0;
    }

    if (syl2ph)
    {
       register int j;
       unsigned char *ph;

       while (ph = (unsigned char *)fgets(buffer, 8192, stdin))
       {
          while (*ph)
          {
             if (*ph == 0xE1)
             {
                j = printph(ph);
                if (!j)
                   putchar(*ph++);
                else
                   ph += j;
             }
             else
                putchar(*ph++);
          }
       }
       return 0;
    }

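    while (fgets(buffer, sizeof(buffer), stdin))
    {
       s = (unsigned char *)buffer;
       str_p = &str[0];
       vstr_p = &vstr[0];
       for (;;)
       {
          if (*s && (isalpha(*s) || (*s > 127)))
          {
             /* Collect the letters of the current word. */
             *str_p++ = tolower(*s);
             *vstr_p++ = *s++;
             continue;
          }

          /* End of word: convert it, or echo it unchanged if it is not
             Cherokee. This also flushes a final word with no newline. */
          *str_p = '\0';
          *vstr_p = '\0';
          if (!printchr(str, xml_esc))
             printf("%s", vstr);

          if (!*s)
             break;
          putchar(*s++);
          str_p = &str[0];
          vstr_p = &vstr[0];
       }
    }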

    return 0;
}