Topic on Talk:XTools

Evolution and evolvability (talkcontribs)

Would it be possible to calculate this as the root square to measure absolute differences?

I.e. net change average of +3, -4 and -5 = ±4

Otherwise stats like average edit size can be skewed by deletions and lead to misleadingly low (or even negative) changes.

Alternatively, handle +ve and -ve values separately when calculating, and report average addition size and average reduction size, or equivalent.
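
As a quick check on the +3, -4, -5 example, the sketch below compares the plain signed mean with two sign-insensitive measures: the mean of absolute values and the root mean square. Which of the two "root square" refers to is an assumption here. The mean of absolute values gives exactly 4, while the root mean square gives roughly 4.08.

```python
import math

# Byte changes from the example above.
diffs = [3, -4, -5]

# Plain average: deletions pull it down and can make it negative.
signed_mean = sum(diffs) / len(diffs)

# Mean of absolute values: ignores the direction of each change.
mean_abs = sum(abs(d) for d in diffs) / len(diffs)

# Root mean square: also ignores direction, but weights large edits more.
rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))

print(signed_mean)    # -2.0
print(mean_abs)       # 4.0
print(round(rms, 2))  # 4.08
```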

MusikAnimal (talkcontribs)

Maybe you're talking about variance? If so then yes, this should be an easy change thanks to MySQL built-in functions. They also have a function for standard deviation.
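
For comparison, here is a small sketch of what variance and standard deviation would report for the +3, -4, -5 example. It uses Python's `statistics` module as a stand-in for the MySQL built-ins, and it assumes the population form of both measures. Note that these describe the spread around the mean (-2), not the typical magnitude of a change.

```python
from statistics import pstdev, pvariance

# Byte changes from the example in the first post.
diffs = [3, -4, -5]

# Population variance: average squared distance from the mean (-2).
variance = pvariance(diffs)

# Population standard deviation: square root of the variance.
std_dev = pstdev(diffs)

print(round(variance, 2))  # 12.67
print(round(std_dev, 2))   # 3.56
```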

Evolution and evolvability (talkcontribs)

Although variance is the most mathematically reasonable, it is possibly a little technical for many.

I'd actually intended a simpler-to-interpret measure of average byte change that ignores whether it's +ve or -ve. For example:

  • edit A, +2 bytes
  • edit B, +3 bytes
  • edit C, -4 bytes
  • edit D, -5 bytes

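  • Average: (2 + 3 - 4 - 5) / 4 = -1
  • Average change, ignoring sign: (2 + 3 + 4 + 5) / 4 = 3.5
  • Average addition size, accounting only for addition edits A and B: (2 + 3) / 2 = 2.5
  • Average reduction size, accounting only for deletion edits C and D: (4 + 5) / 2 = 4.5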

For the four edits above, the average is -1, which is tricky to interpret. Does the editor just make small deletions? Or do they make large deletions and additions that happen to almost cancel out? By splitting the additions and reductions, it becomes possible to see whether they e.g. mostly make lots of small additions with the occasional massive deletion.
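
The four measures for edits A to D can be worked through in a few lines. This is only an illustrative sketch of the arithmetic; the variable names are made up for the example.

```python
# Edits A-D from the example above, as signed byte changes.
edits = {"A": 2, "B": 3, "C": -4, "D": -5}
sizes = list(edits.values())

additions = [s for s in sizes if s > 0]
reductions = [s for s in sizes if s < 0]

# Signed average over all edits.
average = sum(sizes) / len(sizes)

# Average change, ignoring whether each edit added or removed bytes.
average_change = sum(abs(s) for s in sizes) / len(sizes)

# Averages over additions only and over deletions only.
average_addition = sum(additions) / len(additions)
average_reduction = sum(abs(s) for s in reductions) / len(reductions)

print(average)            # -1.0
print(average_change)     # 3.5
print(average_addition)   # 2.5
print(average_reduction)  # 4.5 (bytes removed per deletion edit)
```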

MusikAnimal (talkcontribs)

So much math! =P The issue here is we are bound by what MySQL can do for us. Otherwise we have to pull in the edit size of every edit, and run our calculations, which will consume too much memory for some users. I can say of the options you've laid out, the average addition and reduction size should be doable, assuming the query is still fast enough. Note we do show the number of small edits (< -20 diff size) vs large edits (> 1000), so I hope that in a way also gives an idea of the size of edits a user typically makes.
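
To illustrate how the database could do this work without pulling every edit size into memory, here is a rough sketch using conditional aggregation. It runs against an in-memory SQLite database as a stand-in for MySQL, and the `edits` table and `diff_size` column are hypothetical, not the actual XTools schema or query. `AVG`, `ABS` and `CASE` are also available in MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table holding one signed byte change per edit.
conn.execute("CREATE TABLE edits (diff_size INTEGER)")
conn.executemany("INSERT INTO edits VALUES (?)", [(2,), (3,), (-4,), (-5,)])

# CASE without ELSE yields NULL, and AVG skips NULLs, so each
# conditional AVG only sees the rows matching its condition.
row = conn.execute(
    """
    SELECT
        AVG(diff_size),
        AVG(ABS(diff_size)),
        AVG(CASE WHEN diff_size > 0 THEN diff_size END),
        AVG(CASE WHEN diff_size < 0 THEN diff_size END)
    FROM edits
    """
).fetchone()

print(row)  # (-1.0, 3.5, 2.5, -4.5)
conn.close()
```

All four figures come back from a single pass over the table, so only one row is returned to the application.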

Seppi333 (talkcontribs)

I think the average addition size and average reduction size would be interesting statistics to know, but being a statistician makes me more of a dork than the typical editor. The edit size distribution for most editors is probably highly positively skewed, so it might be worthwhile to report the median addition size and median reduction size (either as an alternative to or in addition to those averages).
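
A small sketch of why the median may be more representative for a positively skewed distribution. The numbers below are made up purely for illustration and are not taken from any real editor.

```python
from statistics import mean, median

# Made-up addition sizes in bytes: mostly small edits plus one very large one.
additions = [5, 8, 12, 15, 20, 30, 4000]

# The single large edit drags the mean far above a typical edit.
mean_size = mean(additions)

# The median is the middle value and is barely affected by the outlier.
median_size = median(additions)

print(round(mean_size, 1))  # 584.3
print(median_size)          # 15
```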

Reply to "Average edit size"