Topic on Talk:ORES/BWDS review

This approach doesn't really work

2 comments • 21:54, 21 October 2019

Wladek92 (talk | contribs)

Thank you for your point of view and the explanations. Nobody claims the method is universal, but it does provide basic filtering. In French, the method as explained works. In any case, we are encouraged to update the lists manually per language (if your 7000 entries hurt performance, select at least the most common Norwegian subsets). These words feed an automatic process; if you do not declare them, a human reviewer will always have to filter them later (and declare them manually).

Christian FR (talk) 06:46, 25 October 2019 (UTC)


Jeblad (talk | contribs)

The set has some common items, but then turns into a “very biggely” (!) long tail. The phrases “hæstpeis”, “hæstkuk”, “hæstskjit”, and “mainskjit” are winners, but terms such as “hyspeis” (dick of a haddock) and “frosk-kuk” (dick of a frog) can also be found in the long tail. I got some feedback a few years ago, and it seems the same thing exists in other cultures too.

My idea back then was to create lists of terms and merge them to form composite words. Doing so requires affix rules, in particular infix rules, which would be used to brute-force the creation of composite words. This approach gives a list of regular expressions of order $O(N\times M)$. A better solution would be to look for forms that can be merged. Only after sufficient coverage of a word is achieved is it flagged as a match, and further processing triggered. This approach gives a list of regular expressions of order $O(N+M)$.
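To make the difference between the two list sizes concrete, here is a rough Python sketch. It is not how ORES or the BWDS lists actually work; the term lists are just the examples from this thread, and the optional-hyphen infix rule, the function names, and the 0.8 coverage threshold are all assumptions made for illustration.

```python
import re
from itertools import product

# Hypothetical term lists, taken from the examples in this thread.
FIRST_PARTS = ["hæst", "hys", "frosk", "main"]  # N entries
SECOND_PARTS = ["peis", "kuk", "skjit"]         # M entries
INFIXES = ["", "-"]                             # assumed infix rule: optional hyphen


def brute_force_patterns(first, second, infixes):
    """Enumerate every composite; the list grows as N x M (times the infixes)."""
    return [
        re.compile(re.escape(a + infix + b))
        for a, infix, b in product(first, infixes, second)
    ]


def coverage(word, fragments):
    """Return the fraction of characters in `word` covered by known fragments."""
    if not word:
        return 0.0
    covered = [False] * len(word)
    for fragment in fragments:
        for match in re.finditer(re.escape(fragment), word):
            for k in range(match.start(), match.end()):
                covered[k] = True
    return sum(covered) / len(word)


def is_match(word, fragments, threshold=0.8):
    """Flag a word only once enough of it is covered (threshold is an assumption)."""
    return coverage(word.lower(), fragments) >= threshold


# The coverage approach only needs the N + M fragments, not their products.
FRAGMENTS = FIRST_PARTS + SECOND_PARTS
```

With these lists, `brute_force_patterns` has to hold every combination, while `is_match("frosk-kuk", FRAGMENTS)` works from the seven fragments alone and still tolerates the hyphen, because only one of the nine characters is left uncovered. A word like "froskedam" shares a single fragment and stays below the threshold, so it is not flagged.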

Edited 16:50, 25 October 2019
