Topic on User talk:TJones (WMF)/Notes/Khmer Reordering/Examples

បេឡា (talkcontribs)

Hi, I'm just a new user who cares a great deal about the quality of my little Khmer community on Wikipedia. Fortunately, I happened to come across your page, and this is one of many issues with Khmer script on the site! With my understanding of the language, I would like to lend you a hand on this one:

(Pardon me if I'm coming across as rude; I taught myself English.)

??? : The subscripts for ត and ដ are (almost) identical; there is no rule for their order, since they cannot both be used on the same consonant.

From the context, the subscript of ត (្ត) is the grammatically correct one here (ស្តា : ស + ្ត + ា), though you can use the subscript of ដ (្ដ) with the same effect (ស្ដា : ស + ្ដ + ា).

Syllable boundary errors: The original កុំែ ( ក + ុ + ំ + ែ ) is in the correct order; កុែំ ( ក + ុ + ែ + ំ ) is not. The same goes for the other cases involving ុំ (ុ + ំ), ុះ (ុ + ះ) and េះ (េ + ះ): they should keep the order shown here. (The text is full of typos.)
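Because the two orderings can look nearly the same on screen, it may help to compare them by code point. The following is only a minimal sketch: the helper name `code_points` and the two constants are made up for illustration and are not part of any existing tool.

```python
# Hypothetical helper: list the code points of a string so that two
# visually similar Khmer sequences can be told apart.
def code_points(text: str) -> list[str]:
    return [f"U+{ord(ch):04X}" for ch in text]


# The two orderings discussed above, written with escapes to avoid
# any ambiguity in how they are rendered or stored.
ORIGINAL = "\u1780\u17BB\u17C6\u17C2"   # KA + U + NIKAHIT + AE
REORDERED = "\u1780\u17BB\u17C2\u17C6"  # KA + U + AE + NIKAHIT

if __name__ == "__main__":
    print(code_points(ORIGINAL))
    print(code_points(REORDERED))
```

The two strings contain the same four characters and differ only in the position of the last two.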

Another one is ពា្ឈ ( ព + ា + ្ឈ ). From a grammatical standpoint, that is certainly incorrect, but I think the author is using it to achieve this effect: ញ + ្ឈ = ញ្ឈ . One possible explanation is that Khmer Unicode wasn't fully developed back then, so the writer had to substitute ព + ា = ពា for ញ.

ច៎ា ( ច + ៎ + ា ) is the correct one. ៎ is used to emphasize the sound of ច, making it shorter and sharper, so the reordered one is incorrect.

That is all I can help with for now. And thanks for your hard work!

(P.S. Half of the samples you're using are broken beyond recognition; common reordering won't make them readable.)

TJones (WMF) (talkcontribs)

Thanks for the feedback—your English is great and I appreciate the help!

I've been off this project for a while, so I may have some more questions later if you are available. I have some questions right now:

Should I just ignore the weird case with both subscript ត and ដ as an error in the text? Or should I try to fix it? Would it make sense to treat subscript ត and ដ as the same since they look the same? That would be easy to do.
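A minimal sketch of what treating the two subscripts as the same could look like, assuming it is done as a simple fold before comparison. The function name `fold_coeng_da` and the choice to fold subscript ដ into subscript ត (rather than the other way round) are assumptions for illustration only.

```python
COENG = "\u17D2"  # KHMER SIGN COENG, marks the next consonant as a subscript
TA = "\u178F"     # KHMER LETTER TA
DA = "\u178A"     # KHMER LETTER DA


def fold_coeng_da(text: str) -> str:
    """Replace subscript DA with subscript TA (hypothetical folding rule).

    Only the subscript form (COENG + DA) is touched; a full-size DA
    that is not preceded by COENG is left unchanged.
    """
    return text.replace(COENG + DA, COENG + TA)


if __name__ == "__main__":
    with_da = "\u179F\u17D2\u178A\u17B6"  # SA + COENG + DA + AA
    with_ta = "\u179F\u17D2\u178F\u17B6"  # SA + COENG + TA + AA
    print(fold_coeng_da(with_da) == with_ta)
```

With this fold applied to both the indexed text and the query, the two spellings of the example word would compare as equal.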

I'm not sure how to understand កុំែ. In most fonts, it has a dotted circle (like this: ◌) at the end. That means the font can't display it correctly. Other fonts display it correctly.

For ពា្ឈ / ញ្ឈ, is there a consistent rule to apply? Should "ព + ា" followed by a subscript consonant be converted to "ញ"? That seems like it could cause errors, but I would have to test it.
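To make the question concrete, here is a rough sketch of such a rule, assuming "subscript consonant" means COENG (U+17D2) followed by any letter in the range U+1780 to U+17A2. The function name is hypothetical, and whether the rule is safe on real text is exactly what would need testing.

```python
import re

# PO + AA, but only when immediately followed by COENG + a consonant.
# The lookahead keeps the subscript itself out of the replacement.
PO_AA_BEFORE_SUBSCRIPT = re.compile("\u1796\u17B6(?=\u17D2[\u1780-\u17A2])")
NYO = "\u1789"  # KHMER LETTER NYO


def po_aa_to_nyo(text: str) -> str:
    """Hypothetical rule: rewrite PO + AA as NYO before a subscript."""
    return PO_AA_BEFORE_SUBSCRIPT.sub(NYO, text)


if __name__ == "__main__":
    broken = "\u1796\u17B6\u17D2\u1788"  # PO + AA + COENG + CHO
    fixed = "\u1789\u17D2\u1788"         # NYO + COENG + CHO
    print(po_aa_to_nyo(broken) == fixed)
```

An ordinary PO + AA with no subscript after it is left alone, which limits the rule to the suspicious sequence.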

Is there a more general pattern for the case of ច៎ា? Should ៎ always be close to the main consonant, or just ច, or just in this one word?
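The three options could be expressed as one reordering rule with a configurable set of base consonants. This is only a sketch under assumptions: the function name and the `bases` parameter are invented for illustration, "vowel" is taken to mean the dependent vowel signs U+17B6 to U+17C5, and nothing here settles which scope is linguistically right.

```python
import re

KAKABAT = "\u17CE"               # KHMER SIGN KAKABAT
ALL_CONSONANTS = "\u1780-\u17A2"  # regex range covering KA..QA
VOWEL_SIGNS = "\u17B6-\u17C5"     # regex range covering AA..AU


def pull_kakabat_forward(text: str, bases: str = ALL_CONSONANTS) -> str:
    """Rewrite consonant + vowel + KAKABAT as consonant + KAKABAT + vowel.

    `bases` is a regex character-class body that limits which base
    consonants the rule applies to (hypothetical parameter).
    """
    pattern = re.compile(f"([{bases}])([{VOWEL_SIGNS}]){KAKABAT}")
    return pattern.sub(lambda m: m.group(1) + KAKABAT + m.group(2), text)


if __name__ == "__main__":
    reordered = "\u1785\u17B6\u17CE"  # CA + AA + KAKABAT
    expected = "\u1785\u17CE\u17B6"   # CA + KAKABAT + AA
    print(pull_kakabat_forward(reordered) == expected)
    # Restricting the rule to CA only:
    print(pull_kakabat_forward(reordered, bases="\u1785") == expected)
```

Passing a narrower `bases` value corresponds to the "just ច" option, and the default corresponds to "always close to the main consonant".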

Thanks for the help!
