Talk:Content translation/Translation tools

From mediawiki.org
Latest comment: 10 years ago by DChan (WMF) in topic Stemming

Stemming

I like this introduction very much. One thing that puzzles me is the term "approximate stemming". Another is why the page focuses so heavily on stemming.

In my opinion, stemming is always an approximation of the result we actually want to achieve: finding the dictionary form of the word. Stemming algorithms exist for many languages. I can also think of other methods, such as fuzzy matching, machine learning with user training, or morphological analysis, which is available for Finnish, for example.
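To illustrate why stemming only approximates the dictionary form, here is a minimal sketch contrasting a crude suffix-stripping stemmer with a dictionary lookup. The suffix list, the tiny `LEMMAS` table, and the function names are hypothetical toy data chosen for illustration; real stemmers and morphological analysers are far more elaborate.

```python
# Hypothetical toy data for illustration only; not a real stemmer or lexicon.
SUFFIXES = ["ies", "es", "s", "ing", "ed"]

LEMMAS = {
    "running": "run",
    "ran": "run",
    "studies": "study",
    "mice": "mouse",
    "books": "book",
}


def naive_stem(word: str) -> str:
    """Strip the first matching suffix: a crude approximation of the dictionary form."""
    for suffix in SUFFIXES:
        # Keep at least three characters of stem so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the word up in a dictionary; fall back to the stemmer if it is unknown."""
    return LEMMAS.get(word, naive_stem(word))


for w in ["running", "studies", "mice", "walked"]:
    print(w, naive_stem(w), lemmatize(w))
```

The stemmer turns "running" into "runn" and "studies" into "stud", and it cannot relate "mice" to "mouse" at all, whereas the lookup returns the actual dictionary forms for words it knows.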

The above processes can be combined with some parsing to learn, for example, whether a word is a noun or a verb, which makes a huge difference in English.

Also, the section about word segmentation does not say anything about identifying multiword expressions, which are also important for using glossaries and dictionaries effectively.
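As a rough illustration of why multiword expressions matter for glossary lookup, the sketch below groups tokens with a greedy longest-match against a set of known expressions, so that "machine translation" is looked up as one unit rather than as two separate words. The `MWES` set and the `segment` function are hypothetical examples; greedy longest-match is just one simple strategy, not necessarily what any translation tool uses.

```python
from typing import List, Set, Tuple

# Hypothetical glossary of multiword expressions, stored as lowercase token tuples.
MWES: Set[Tuple[str, ...]] = {
    ("machine", "translation"),
    ("kick", "the", "bucket"),
    ("new", "york"),
}
MAX_LEN = max(len(m) for m in MWES)


def segment(tokens: List[str]) -> List[str]:
    """Greedy longest-match: join runs of tokens that form a known multiword expression."""
    result: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so longer expressions win over shorter ones;
        # a single token (n == 1) always matches, so the loop always advances.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i : i + n])
            if n == 1 or candidate in MWES:
                result.append(" ".join(tokens[i : i + n]))
                i += n
                break
    return result


print(segment("statistical machine translation in New York".split()))
# ['statistical', 'machine translation', 'in', 'New York']
```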

Finally, perhaps the external links could be converted to internal links with the wikipedia: prefix to avoid the ugly icons. --Nikerabbit (talk) 14:01, 3 March 2014 (UTC)

Thanks; I've updated the page based on your comments. Let me know what you think! DChan (WMF) (talk) 15:17, 3 March 2014 (UTC)