Topic on User talk:TJones (WMF)/Notes/Esperanto Stemmer Analysis

Next Steps: Implementation and Deployment

TJones (WMF) (talk | contribs)

@Brion Vibber (WMF), @Dominik, and any others who are interested: My plan is to go ahead and implement the Esperanto stemmer as an Elasticsearch plugin, and then deploy it on Esperanto-language wikis. The results look reasonable to me, and several people have given generally positive feedback. The questions and concerns people have raised don’t suggest that the word groupings are bad enough to make the stemmer unhelpful overall.

Thanks to the feedback people have kindly shared, the stemmer’s accuracy and coverage have improved, and the exception list is much better. There are still some ambiguous and incorrect stems, but that just makes Esperanto seem more like a natural language! I’m not too concerned about how non-Esperanto words are treated, because that’s a problem all rule-based stemmers have, especially on projects like Wikipedia and Wiktionary, which contain plenty of foreign words.

The next step will be to wrap the stemmer in an Elasticsearch plugin, which should take only a small amount of work. I’ll start on it after I finish my current project. If you have any objections to my continuing with this implementation plan, please let me know. Thanks to everyone who has commented and asked questions!
