Readability and possible good metrics.

  • I had some good results with a variation of w:en:Readability#The Dale–Chall formula. The easy words are extracted from a large text corpus by sorting the words by frequency and taking the most common ones as candidates. Those words are then stemmed and used in the later processing. An example implementation is at w:no:Bruker:Jeblad/stats.js.
  • An alternative idea that could be useful: "if a sentence contains too much information it will be hard to digest". The solution is to keep sentences short, because the penalty is summed over the words in a sentence and so rises rapidly with length. See also w:en:Entropy_(information_theory).
  • A variation says that short-term memory is limited, so weight the previous words within a sliding window, but otherwise proceed as in the previous idea. A correct solution would weight the words that reach the end of the window more than the words at the beginning of the window.
  • Give the penalty as a small constant for simple words (the Dale–Chall easy words), and as a high constant for each new word within the reading context.
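
The ideas above can be sketched roughly in Python. This is only an illustration under several assumptions and is not the code in stats.js: the names `tokenize`, `stem`, `easy_words`, `sentence_information` and `context_penalty` are hypothetical, the stemmer is a crude suffix stripper standing in for a real one, word probabilities use add-one smoothing, the "reading context" is taken to be a fixed window of recent words, and a hard word that is still inside that window is assumed to cost the same as an easy word.

```python
from collections import Counter, deque
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def stem(word):
    """Crude suffix stripper (assumption: stands in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def easy_words(corpus, top_n):
    """Take the top_n most frequent corpus words and return their stems."""
    counts = Counter(tokenize(corpus))
    return {stem(word) for word, _ in counts.most_common(top_n)}


def sentence_information(sentence, counts):
    """Sum -log2 p(word) over the sentence, with add-one smoothing.

    Every word adds a positive amount, so the total grows with length.
    """
    total = sum(counts.values())
    vocabulary = len(counts) + 1  # one extra slot for unseen words
    bits = 0.0
    for word in tokenize(sentence):
        probability = (counts[word] + 1) / (total + vocabulary)
        bits -= math.log2(probability)
    return bits


def context_penalty(text, easy, window=7, easy_cost=0.1, new_cost=1.0):
    """Small constant for easy words, high constant for words new to the window.

    Assumption: a hard word still inside the window costs easy_cost.
    """
    recent = deque(maxlen=window)  # stems of the last `window` words
    penalty = 0.0
    for word in tokenize(text):
        stemmed = stem(word)
        if stemmed in easy or stemmed in recent:
            penalty += easy_cost
        else:
            penalty += new_cost
        recent.append(stemmed)
    return penalty


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran"
    counts = Counter(tokenize(corpus))
    easy = easy_words(corpus, top_n=2)
    print(sorted(easy))
    print(sentence_information("the cat sat on the mat", counts))
    print(context_penalty("the cat chased the dog", easy, window=3))
```

The window here is unweighted. The weighted variant described above would replace the membership test on `recent` with a sum of per-position weights, and the choice of weights is left open.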