User:Jeblad/Readabillity

Readability and possible good metrics.


 * Had some good results with a variation of Readability. The easy words are extracted from a large text corpus: the words are sorted by frequency and the most common ones are taken as candidates. Those words are then stemmed and used in the later processing. An example implementation is at w:no:Bruker:Jeblad/stats.js.
 * An alternative idea that could be useful goes like "if a sentence contains too much information it will be hard to digest". The solution is to keep sentences short, as the penalty rises rapidly due to summation over the words. See also Entropy_(information_theory).
 * A variation says that short-term memory is limited, so weight the previous words within a sliding window, but otherwise do as in the previous idea. A correct solution would weight the words that reach the end of the window more than words at the beginning of the window.
 * Give the penalty as a small constant for simple words (DC easy words), and as a high constant for each new word within the reading context.
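
The ideas above could be sketched roughly as follows. This is only an illustration under several assumptions and not the code in stats.js: stemming is left out, the function names, the window size and the cost constants are invented for the example, a non-easy word that is still inside the window is charged the small constant, and the "information" of a sentence is taken as the summed surprisal of its words using add-one smoothed corpus frequencies.

```python
import math
import re
from collections import Counter, deque


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def easy_words(corpus, top_n):
    """Take the top_n most frequent corpus words as the easy-word set.

    Stemming is omitted here; a real list would be stemmed first.
    """
    counts = Counter(tokenize(corpus))
    return {word for word, _ in counts.most_common(top_n)}


def sentence_information(sentence, counts):
    """Sum the surprisal (-log2 p) of each word, in bits.

    Assumption: p is an add-one smoothed relative frequency, so words
    missing from the corpus get a small but non-zero probability.
    """
    total = sum(counts.values())
    vocab = len(counts)
    bits = 0.0
    for word in tokenize(sentence):
        p = (counts[word] + 1) / (total + vocab + 1)
        bits += -math.log2(p)
    return bits


def sentence_penalty(sentence, easy, window=7, easy_cost=0.1, new_cost=1.0):
    """Sum constant per-word penalties over one sentence.

    Easy words cost easy_cost. Any other word costs new_cost when it is
    not among the last `window` words (the assumed reading context),
    and easy_cost when it is still inside that window.
    """
    recent = deque(maxlen=window)  # sliding window of previous words
    total = 0.0
    for word in tokenize(sentence):
        if word in easy or word in recent:
            total += easy_cost
        else:
            total += new_cost
        recent.append(word)
    return total
```

Because both scores are plain sums over the words, a longer sentence can never score lower than one of its prefixes, which is the reason for keeping sentences short. The sketch treats every position in the window alike; weighting words by their position in the window, as suggested in the list, is not implemented here.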