Talk:Article feedback/Public Policy Pilot/Design Phase 2

From mediawiki.org

Thanks for putting this together. Personally, I think the jQuery popup would be a little too in-your-face, though I'd be OK with testing it if there is a "don't show this again" checkbox.

Some more comments:

  • Measurement: There are a number of little experiments we're running and I want to make sure we're going to be able to get quality data that we can base decisions on. From your design page, the two main interface items we're testing are:
1) Bottom of page vs. side-bar: Which one gives us a higher volume of ratings? A secondary question is whether there is a correlation between volume and quality of ratings.
2) Submit vs. no-submit: With the submit button, do users feel like they need to rate all categories? If so, the completion rates we're observing would be inflated (and correspondingly, the quality of ratings might be lower, since users are simply filling out categories they may not have a real opinion on).
  • An appropriate measurement of #1 would be something like (ratings submitted) / (times ratings tool is shown) for bottom of page vs. side-bar. So far, page views have been used as the denominator, but unfortunately, page views don't distinguish between the types of rating tool shown. We'll need some way to reliably estimate this denominator. The simplest way is to track the % weighting for each tool, but things become a little hairy if the weightings change. Is there any way to track each display of the ratings tool by type?
  • #2 has the additional complexity of what qualifies as a submit. Where there is a submit button, the answer is clear. But what about where there is not? If a user rates Well-Sourced, then comes back 2 days later to rate Readable, does that count as one submit action or two? My inclination is to define a submit (in the no-submit-button case) as all ratings that occur within one session. But we don't have session tracking, so is there another way to get this kind of approximation?
  • Weightings: Can we change the weightings easily when the feature is in production?
  • Stale vs. Expired: What's the difference between Stale and Expired? Both clearly indicate that the rating is out of date, but is the difference that a stale rating is still used in average calculations and an expired rating is not? My gut reaction is that we're overcomplicating this feature; I'm not sure users care about this level of detail. It may also be easier to come up with a simpler definition without the _and_ condition, e.g., a rating is stale if either 10 revisions have been made or 20% of the article has changed since the rating.
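A minimal sketch of the (ratings submitted) / (times ratings tool is shown) measure for bottom of page vs. side-bar, assuming displays are not logged by type and have to be estimated from the % weighting. It further assumes the weighting is applied uniformly across page views; splitting the pilot into periods during which the weightings were fixed is one way to cope with weightings that change. The `Period` record, its field names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Hypothetical record: a stretch of time during which tool weightings were fixed."""
    page_views: int              # views of pilot articles in this period
    weights: dict[str, float]    # fraction of views that were shown each tool
    ratings: dict[str, int]      # ratings submitted through each tool


def estimated_impressions(periods: list[Period], tool: str) -> float:
    """Estimate how often `tool` was shown: page views x weighting, summed per period."""
    return sum(p.page_views * p.weights.get(tool, 0.0) for p in periods)


def rating_rate(periods: list[Period], tool: str) -> float:
    """(ratings submitted) / (estimated times the tool was shown)."""
    shown = estimated_impressions(periods, tool)
    submitted = sum(p.ratings.get(tool, 0) for p in periods)
    return submitted / shown if shown else 0.0


# Illustrative numbers only: the weighting shifts from 50/50 to 25/75 mid-pilot.
periods = [
    Period(10_000, {"bottom": 0.5, "sidebar": 0.5}, {"bottom": 150, "sidebar": 100}),
    Period(20_000, {"bottom": 0.25, "sidebar": 0.75}, {"bottom": 100, "sidebar": 300}),
]
for tool in ("bottom", "sidebar"):
    print(tool, rating_rate(periods, tool))  # bottom: 250 / 10000, sidebar: 400 / 20000
```

If displays of the tool can be tracked by type directly, the logged counts would simply replace `estimated_impressions` and the uniform-weighting assumption goes away.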
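For the no-submit-button case, one way to approximate sessions without session tracking is to group a rater's ratings on a page into one submit action whenever they fall within an inactivity window. This is a sketch under loud assumptions: the (rater_id, page_id, timestamp) log format is hypothetical, anonymous raters would need some stable identifier, and the 30-minute gap is a common web-analytics convention rather than anything decided for this pilot. The sample data mirrors the Well-Sourced/Readable scenario raised in the comment.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Assumed inactivity cutoff; not a value chosen for the pilot.
SESSION_GAP = timedelta(minutes=30)


def count_submit_actions(events, gap=SESSION_GAP):
    """Count 'submit actions' when there is no submit button.

    events: iterable of (rater_id, page_id, timestamp) tuples (hypothetical format).
    Ratings by the same rater on the same page belong to one action unless
    consecutive ratings are more than `gap` apart.
    """
    actions = 0
    ordered = sorted(events)  # orders by rater, then page, then time
    for _key, group in groupby(ordered, key=lambda e: (e[0], e[1])):
        last_ts = None
        for _rater, _page, ts in group:
            if last_ts is None or ts - last_ts > gap:
                actions += 1  # first rating, or gap too long: a new action starts
            last_ts = ts
    return actions


events = [
    ("u1", "PageA", datetime(2010, 11, 15, 10, 0)),  # rates Well-Sourced
    ("u1", "PageA", datetime(2010, 11, 15, 10, 5)),  # rates Readable in the same visit
    ("u1", "PageA", datetime(2010, 11, 17, 10, 0)),  # comes back two days later
    ("u2", "PageA", datetime(2010, 11, 15, 11, 0)),
]
print(count_submit_actions(events))  # 3
```

Under this definition the user who returns two days later is counted as two submit actions; a longer gap (or none at all) would change that answer, so the cutoff itself is a decision worth making explicitly.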
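The simpler either/or staleness rule suggested in the Stale vs. Expired point could look like the sketch below. The 10-revision and 20% thresholds come from the example in the comment; treating "10 revisions" as "at least 10", and measuring "20% of the article has changed" as 1 minus the similarity ratio from Python's `difflib.SequenceMatcher`, are both assumptions. On long articles a cheaper diff-based measure may be preferable, since `SequenceMatcher` can be slow on large texts.

```python
import difflib

REVISION_LIMIT = 10   # example threshold from the comment
CHANGE_LIMIT = 0.20   # "20% of the article has changed"


def fraction_changed(old_text: str, new_text: str) -> float:
    """Rough proxy (an assumption) for how much of the article changed: 1 - similarity."""
    return 1.0 - difflib.SequenceMatcher(None, old_text, new_text).ratio()


def is_stale(revisions_since_rating: int, text_at_rating: str, current_text: str) -> bool:
    """Stale if EITHER enough revisions were made OR enough text changed (no AND condition)."""
    return (revisions_since_rating >= REVISION_LIMIT
            or fraction_changed(text_at_rating, current_text) >= CHANGE_LIMIT)


print(is_stale(10, "text", "text"))                   # True: revision limit reached
print(is_stale(2, "x" * 100, "x" * 70 + "y" * 30))    # True: about 30% changed
print(is_stale(2, "x" * 100, "x" * 90 + "y" * 10))    # False: about 10% changed
```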

Howief 20:19, 15 November 2010 (UTC)