Topic on Talk:WMF product development process

Quiddity (WMF) (talkcontribs)
(Copying a post by @Wittylama that was misplaced due to a pagemove problem)

Could you perhaps discuss in this document the role of the "Beta features" system?

This was created as a place for people to opt in to new tools but has been used somewhat randomly. Sometimes new features bypass the beta system altogether - going straight into production (a recent example being the new notification icons) - and sometimes things sit in the beta system indefinitely. Personally I'd really like it if there was consistency in how the beta system was used, and also some objective, public, and measurable goals applied to each feature as the criteria for 'graduation' to being an opt-out/default feature. Currently the beta system tells you how many people have opted in to a given tool, but not the retention rate or how many people that is as a % of active users, etc. No one likes surprises, and you've got a perfect platform to test and receive feedback, but it's seemingly just not part of the standard procedures. Wittylama (talk) 18:11, 5 November 2015 (UTC)

Qgil-WMF (talkcontribs)

@Wittylama, I agree that the use of Beta features should be systematic and documented. In my opinion, the use of Beta features should be part of the Release plan (as a requirement to enter the Release stage) and the intended results should be integrated in the quality criteria for release (as a requirement to move from individual opt-in to regular deployment).

See also the related discussion about Scope of "release" stage.

Qgil-WMF (talkcontribs)

@Jdforrester (WMF), @Quiddity (WMF), could you help clarify whether there is a documented process defining when and how Beta Features should be used? If not, is there a task in Phabricator to fix this?

Jdforrester (WMF) (talkcontribs)

See Beta Features/Package. The guidance is that Beta Features should only be used for bigger changes on which you're not sure; trivial things like changing icons are not in scope. The system was built back when desktop vs. mobile was still a thing, and there's no good plan for resolving that. In general, I would rather we killed Beta Features than added yet more bureaucracy to it.

Wittylama (talkcontribs)

On the one hand what you say sounds quite reasonable, but then again the WMF did just recently roll out a change that was considered a cosmetic adjustment to buttons (splitting the notification icon into two) that had to be rolled back for an important fix. It's a quite recent example of where a 2 week 'beta' period would have been perfectly sensible:

*the change wasn't urgent

*the audience for the change was active editors

*the success criteria for default rollout are easy to define (no slowdown in load time, no broken extensions, WCAG accessibility compliance, and whatever it was that the change was designed to improve in the first place).

I completely agree that minor changes should not have unnecessary bureaucracy, but that should speak more to the way that a 'beta testing' period should be handled as a smooth and natural transition. If the change isn't urgent, what's the harm in having people who opt in test it for two weeks? If there's no problem, no problem. If there's an issue raised then it can be quickly fixed without the embarrassment of having to undo a rollout.

P.s. Jdforrester (WMF) can you explain what you mean by "back when desktop vs. mobile was still a thing"? Wittylama (talk) 09:30, 4 December 2015 (UTC)

Jdforrester (WMF) (talkcontribs)

By "back when desktop vs. mobile was still a thing", I was making a slightly snippy off-hand reference to the out-dated idea that we wanted to build two different websites, one for desktop and one for mobile, instead of working towards a single site for users of any and all device sizes. What we would have considered acceptable then isn't something we'd be OK with now. Sorry for the confusion.

Wittylama (talkcontribs)

It remains my understanding that the mobile development and the desktop development are undertaken separately, and that the features, user-experience and workflows are intentionally different. See, for example, this recent comment by @Jdlrobson on Phabricator https://phabricator.wikimedia.org/T118338#1840066. I'm not saying that either approach is wrong, but I am getting very mixed messages about how the mobile development and the desktop development are undertaken.

Jdforrester (WMF) (talkcontribs)

Sorry, I don't run the Reading department, I was just relaying what I'd understood from them.

Wittylama (talkcontribs)

Stop pretending, @Jdforrester (WMF) - we all know that you secretly run everything ;-)

TheDJ (talkcontribs)

I actually highly doubt that such a problem would have made itself obvious in beta mode. If it wasn't obvious on mediawiki.org and test.wikipedia.org, then it wouldn't have been obvious in beta either.

It's actually a good example of what James meant I think: Beta will not protect you from bugs or implementation errors. Regression testing and performance metrics will.

At most, Beta allows you to gauge interest and gather feedback on something you are 'considering'. But unfortunately, it's not instrumented well enough to actually do so in practice.

And there is another problem: in its current form Beta is not much more than 'gadgets on steroids', which also means it comes with implementation limitations that require you to reimplement whatever you tested. If we ever want to use it more, its entire architectural underpinnings will need to be reworked to change that....

Wittylama (talkcontribs)

Thanks for the reply user:TheDJ, even if your response is rather depressing. If that's the case, it basically tells me that the Beta features system is pretty much a wasted project in the first place.

P.S. There's a related thread on Phabricator where I've just mentioned this discussion here: https://phabricator.wikimedia.org/T76573

TheDJ (talkcontribs)

Part of this has to do with the fact that we are 'hyper optimized'. Unfortunately, that also makes it very difficult to 'vary' inside the software stack, since the stack was never designed to allow for much variance. MediaWiki, and Wikipedia specifically, were always designed on the premise of "Everyone gets the same, where possible", and it's very difficult to change all those thousands of assumptions throughout the system into "Everyone should be able to get something totally different" without toppling the entire website.

I think we will have to change that going into the future, but it will require massive changes, not only in our software, but definitely also in our operations and hosting. What ops is doing with new deployment tools might help with that, but it's a long road. It's what is needed to make 'Beta' actually work and to make that a 'Core' feature.

TheDJ (talkcontribs)

And I definitely wouldn't say it is useless/wasted. Every system has limitations, you just need to be aware of them and understand how they impact your usage of the system. But currently there are soooo many limitations, that the system is basically not worth the extra effort required to make use of it.

It can be improved, the linked ticket describes a few things that could be worthwhile improvements to at least get better/more understandable/interpretable feedback out of a deployed beta feature.

But using it to 'catch bugs' when only about 5% of a couple thousand active editors will ever use the system is unrealistic. You need the test group of 120,000 active English Wikipedians at some point to really reach those 50 editors that, more than once a week, use the thing you changed (on Chrome 35). This is part of our challenge: we have areas of the software that really have a minute number of users, yet those few people can be impacted greatly by even the smallest change.
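
A rough sketch of the arithmetic behind that last point, in Python. The figures 120,000 and 50 come from the comment above; the 100-person opt-in group (5% of an assumed 2,000 editors) and the idea that opt-in testers behave like a uniformly random sample of active editors are assumptions made purely for illustration, not measurements.

```python
def expected_affected(sample_size: int, affected: int, population: int) -> float:
    """Expected number of affected editors in a uniformly random sample."""
    return sample_size * affected / population


def chance_of_at_least_one(sample_size: int, affected: int, population: int) -> float:
    """Probability that a sample drawn without replacement contains
    at least one affected editor (hypergeometric model)."""
    p_none = 1.0
    for i in range(sample_size):
        # Probability that the (i+1)-th draw is also an unaffected editor.
        p_none *= (population - affected - i) / (population - i)
    return 1.0 - p_none


POPULATION = 120_000  # active English Wikipedians (figure from the comment)
AFFECTED = 50         # editors who regularly use the changed thing (from the comment)
BETA_TESTERS = 100    # assumption: 5% of roughly 2,000 editors

if __name__ == "__main__":
    print(expected_affected(BETA_TESTERS, AFFECTED, POPULATION))
    print(chance_of_at_least_one(BETA_TESTERS, AFFECTED, POPULATION))
```

Under these assumptions the expected number of affected editors in the opt-in group is well below one, which is why a small opt-in group is unlikely to surface problems in rarely used corners of the software.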

Pginer-WMF (talkcontribs)

I think that being able to expose features gradually to our users has many benefits.

Some features may need more steps than others. For example, a feature can be available as opt-in for those users that look for it, then it can be announced in a relevant context inviting more users to try, later it can become the default with an option to opt-out, and finally become just the default experience. Other features may need fewer steps, or none at all for the most trivial cases (although the devil is in the details and most uncontroversial changes may have more ramifications than the ones expected initially).
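
The progression described here could be modelled roughly as follows. This is only an illustrative Python sketch: `RolloutStage`, `is_enabled` and `next_stage` are hypothetical names, and nothing here reflects how the Beta Features software is actually implemented.

```python
from enum import IntEnum


class RolloutStage(IntEnum):
    """Hypothetical stages, in the order described in the comment above."""
    OPT_IN = 1                # available to users who go looking for it
    ANNOUNCED_OPT_IN = 2      # still opt-in, but advertised in a relevant context
    DEFAULT_WITH_OPT_OUT = 3  # on for everyone unless they switch it off
    DEFAULT = 4               # simply the default experience


def is_enabled(stage: RolloutStage, opted_in: bool = False, opted_out: bool = False) -> bool:
    """Whether a given user sees the feature at a given stage."""
    if stage in (RolloutStage.OPT_IN, RolloutStage.ANNOUNCED_OPT_IN):
        return opted_in
    if stage == RolloutStage.DEFAULT_WITH_OPT_OUT:
        return not opted_out
    return True  # DEFAULT: no way out any more


def next_stage(stage: RolloutStage) -> RolloutStage:
    """Advance one step; the final stage stays where it is."""
    return RolloutStage(min(stage + 1, RolloutStage.DEFAULT))
```

A trivial change could start directly at `RolloutStage.DEFAULT`, while a larger one would walk through every stage.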

Being able to try a new feature in a real context (not an example in a testing server that may or may not fit with your workflows) and having always a way out to go back to normal in case things go wrong, seems valuable and helps to set expectations about the feature.

Communication is a big part of this: it makes users aware of how the feature is evolving, and lets them indicate whether the feature is working for them and describe which issues they experience.

I think many of the aspects mentioned above connect with several of the challenges teams face when launching products and result in much time spent afterwards. Beta Features does not completely support all of them, but I think that what we need is to improve the platform. I think that problems such as few people joining, or the need to make the process more agile, require less effort than the problem of not having a way to get our users on board gradually, early and often.

Jdforrester (WMF) (talkcontribs)

Yup, this is right. Beta Features isn't for "beta testing" of whether the code works overall, or has performance issues like with the notification badge split (that's the job of the developers to get right before it ever is seen by a user). It's more like a tool for User acceptance testing of bigger changes.

Deskana (WMF) (talkcontribs)

Beta Features is essentially intended for performing user acceptance testing, i.e. intended to answer the question "Does this feature meet the user's needs?". It's not intended to be used as a way of finding regressions like those in the recent notifications rollout, although of course any time anyone's using software, bugs and regressions can be found.

test.wikipedia.org and en.wikipedia.beta.wmflabs.org are intended to find regressions and test integration, i.e. "Does this feature simply not work, or break other features?". In theory, the regression with notifications should've been caught here. However, it is known that these wikis are not really enough like production to catch these regressions. That's something that @Greg (WMF) et al. have been interested in improving for a while, but the problem is very complex and would take significant work.

Qgil-WMF (talkcontribs)

Let's try to summarize this long discussion. All that this product development process would define are checkpoints about public testing in the Develop and Release stages. These checkpoints could define some expectations:

  • Software quality: prototype, alpha, beta, stable opt-in...
  • Target groups for each announcement: tech ambassadors, Beta Feature users...
  • Minimum duration of the testing
  • Feedback channels used

This process should not require the use of Beta Features or any other specific technology to get early feedback from users.

Is this a good summary?

Wittylama (talkcontribs)

I'd say that one thing missing from that bullet-point list is that it needs specific and measurable criteria for acceptance. Depending on the type of thing, that might be something needing community consultation to define or might be a statistical measurement (like 'no slowdown in load time'). My primary criticism of the Beta Features system (other than confusion about when it is used or not used) is that there's no way to tell whether something is "successful" and should be promoted, or "unsuccessful" and needs to be demoted. At present, things just live there in limbo-land. I refer again to my favourite ever rollout 'go/no go' criterion in Wikimedia history - the "Usability Initiative" (for building the Vector skin). Its published-in-advance criterion was "80% opt-in user retention". This was a measurable, achievable, and objective criterion. It also baked in the concept of valuing the feedback from the early adopters who chose to drop out, which resulted in those people trying it again and eventually becoming advocates for it within the community (ping @Trevor Parscal (WMF) who was part of that team).
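
As an illustration of how simple such a published-in-advance criterion can be to check, here is a minimal Python sketch. The 80% threshold is the one quoted above; the function names, the definition of retention as "still enabled divided by ever opted in", and the example numbers are assumptions for illustration only.

```python
def retention_rate(opted_in: int, still_enabled: int) -> float:
    """Share of users who tried the feature and have kept it switched on."""
    if opted_in <= 0:
        raise ValueError("retention is undefined until someone has opted in")
    if not 0 <= still_enabled <= opted_in:
        raise ValueError("still_enabled must be between 0 and opted_in")
    return still_enabled / opted_in


def meets_graduation_criterion(opted_in: int, still_enabled: int,
                               threshold: float = 0.80) -> bool:
    """True once retention reaches the threshold published in advance."""
    return retention_rate(opted_in, still_enabled) >= threshold


if __name__ == "__main__":
    # Made-up numbers: 1,000 users tried the feature, 830 kept it enabled.
    print(meets_graduation_criterion(opted_in=1000, still_enabled=830))
```

The users who dropped out (`opted_in - still_enabled`) are exactly the group whose feedback the original criterion was designed to value.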

Trevor Parscal (WMF) (talkcontribs)

@Wittylama I also think that was a good approach. One thing to note, however, is that we actively surfaced the feature's availability by adding links to the personal tools and running banners. This dramatically increases awareness of features, but it also requires we do this one-at-a-time and once-in-a-while or there will be advertisement fatigue.
