What I find surprising, even this early in the operation of the tool, is the skew towards the upper end of the ratings. It appears that readers hand out 3s, 4s and 5s (particularly 4s, I suspect) without much thought. Remarkably, even stubs receive average ratings of 3.5+ for completeness.
I'm unsure how to encourage readers to use a wider range of the ratings (1–5) and to be more demanding. Or, in interpreting the scores, should we regard the difference between an average of 3 and an average of 4 as critical? (That is, 3 bad; 4 good.) Should there be an implicit aim to reach 4+ in all four aspects? tony 12:58, 19 August 2011 (UTC)