User:AKlapper (WMF)/Sandbox

TODO: Should use proper references to literature instead of inline. And all these topics are so mingled they are really hard to structure...

Influential factors
TODO: Split into: Contributor managing to create a patch (set up dev env, finding right place in code, docs and their size and findability, finding right feedback place); Factors for time to review and time to merge; Factors for likeliness to get the patch reviewed and merged

Factors: social, technical, organizational. Roles: contributor, reviewer. Aspects: Acceptance rate, number of revisions per patch before merge.


 * 1) Social: Time to review for first patch of a contributor is crucial
 * 2) Some patches need many iterations; takes time for everybody: Poor quality requires big effort and makes reviewers ignore instead of reviewing again and again with CR-1.
 * 3) Social: Prioritization / weak open source culture: more pressure to write new code than to review patches contributed?
 * 4) Social: Lack of sync between developer teams: team A stuck because team B doesn't review their patches?
 * 5) Organizational: Reviewer workload; too much on the plate already
 * 6) Organizational: Not enough reviewers in general?
 * 7) Organizational: Not enough people with CR+2 rights?
 * 8) Social/Organizational: Finding the right reviewer and making sure they are added; Unclear or no responsibilites: "Changes failing to capture a reviewer's interest remain unreviewed" due to self-selecting process of reviewers, or everybody expects another person in the team to review
 * 9) Organizational: Unclear responsibilites: Outdated documentation how to find maintainer to ping
 * 10) Technical: CR tool to allow a reviewer to notify them of patches in their areas of interest
 * 11) Social: Unmaintained repositories: Lack of owners / maintainers, or under-resourced
 * 12) Technical/Social: Unmaintained repositories: Lack of disclaimers for potential contributors
 * 13) Social: Time to review for first patch of a contributor is crucial

Potential actions

 * 1) CR tool to allow finding / explicitly marking first contributions. korma.wmflabs.org to list recent first contributions and their time to review
 * 2) Patches should be "small, independent, and complete". Stakeholders with different expertise areas to review aspects need to split reviewing parts of a larger patch.
 * 3) Introduce and foster routine / habit across developers to spend a certain amount of time each day for reviewing patches (or part of standup), and team peer review on complex / controversial patches
 * 4) Blocked-on-* projects in Phabricator?
 * 5) Tool support to propose reviewers or display on how many unreviewed patches a reviewer is already added so the author can choose other reviewers. Proposal to add reviewers to patches but needs good knowledge of community members as otherwise creating noise; "two reviewers find an optimal number of defects - the cost of adding more reviewers isn't justified [...]"
 * 6) Capacity building: Recognize contributors (supported by korma stats?) and "somehow" encourage them to become habitual and trusted reviewers; actively nominate to become maintainers
 * 7) Review current CR+2 handout practice (cf. changes in numbers), total number of volunteers and staff plus ratio. Consider handing out code review rights to more volunteers? Recognize people not executing their CR+2 rights anymore. Related: Trust levels.
 * 8) Need for better tools to identify such areas within a codebase. Then clarify within teams. "Assign reviews that nobody selects." Share knowledge: There might be (old) code areas that only one or zero developers understand.
 * 9) Automate updating the outdated and manual Developers/Maintainers; publically documented (and findable) Team/Maintainer <-> Codebase/Repository relations.
 * 10) Existing via https://www.mediawiki.org/wiki/Gerrit/watched_projects but limited
 * 11) Need for better technical implementation to display information; allow contributor to act
 * 12) Structured patch review process: Three steps proposed by Sarah Sharp for maintainers / reviewers, quoting:
 * 13) Is the idea behind the contribution sound? / Do we want this? Yes, no. If the contribution isn’t useful or it’s a bad idea, it isn’t worth reviewing further. Or “Thanks for this contribution!  I like the concept of this patch, but I don’t have time to thoroughly review it right now.  Ping me if I haven’t reviewed it in a week.” The absolute worst thing you can do during phase one is be completely silent.
 * 14) Is the contribution architected correctly? Squash the nit-picky, perfectionist part of yourself that wants to comment on every single grammar mistake or code style issue.  Instead, only include a sentence or two with a pointer to coding style documentation, or any tools they will need to run their contribution through.
 * 15) Is the contribution polished? Get to comment on the meta (non-code) parts of the contribution.  Correct any spelling or grammar mistakes, suggest clearer wording for comments, and ask for any updated documentation for the code
 * 1) Is the contribution polished? Get to comment on the meta (non-code) parts of the contribution.  Correct any spelling or grammar mistakes, suggest clearer wording for comments, and ask for any updated documentation for the code

Misc stuff to merge
Code unit factors (code size, functionality) and reviewers (presence of specific reviewers) are most influential factors for review effectiveness.


 * Useful CR comments: identifying functional issues; identifying corner cases potentially not covered; suggestions for APIs/designs/code conventions to follow.
 * Somewhat useful CR comments: coding guidelines; identifying alternative implementations or refactoring
 * Not useful CR comments: Authors consider reviewers praising on code segments, reviewers asking questions to understand the implementation, and reviewers pointing out future issues not related to the specific code (should be filed as tasks) as not useful.

"among the factors we studied, non-technical (organizational and personal) ones are betters predictors compared to traditional metrics such as patch size or component, and bug priority."
 * Patch Size: "Review time weakly correlated to the patch size" but "Smaller patches undergo fewer rounds of revisions"
 * Priority: significant correlation between priority in issue tracker and positivity
 * Component: "no relation between positivity and the component factor"
 * Review Queue Length: "the shorter the queue, the more likely the reviewer is to do a thorough review and respond quickly" and the longer the more likely it takes longer but "better chance of getting in" (due to more sloppy review?)
 * Organization: "both Apple and Google are more positive about their own patches than 'foreign' patches" to WebKit
 * Reviewer Activity: "choice of reviewers plays an important role on reviewing time. More active reviewers provide faster responses"; "no correlation between the amount of reviewed patches on the reviewer positivity"
 * Patch Writer Experience: "more experienced patch writers receive faster responses" plus more positive ones. Contributors' very first patch is likely to get positive feedback in WebKit; for their 3rd-6th patch it is harder.

Reviewers: Changesets:
 * Reviewers who have prior experience give more useful comments as they have more knowledge about design constraints and implementation.
 * "[R]eviewers from different teams gave slightly more useful comments than reviewers from the same team [...] however, the magnitude of the difference is quite small"
 * TODO: Future: Automatic reviewer suggestion systems?
 * TODO: "we recommend including inexperienced reviewers so that they can gain the knowledge and experiences required to provide useful comments to change authors"
 * Vague: "Project management can also identify weak reviewers and take necessary steps to help them become efficient."
 * "[I]f there are more files to review, then a thorough review takes more time and effort"
 * "review effectiveness decreases with the number of files in the change set."
 * Small patches (max 4 lines changed) "have a higher chance to be accepted than average, while large patches are less likely to be accepted" (probability) but "one cannot determine that the patch size has a significant influence on the time until a patch is accepted" (time) Small, independent, complete patches are more likely to be accepted.

Patch acceptance/positivity: Developer experience, patch maturity; Review time impacted by submission time, number of code areas affected, number of suggested reviewers, developer experience.

Reasons for rejecting a patch (not all are equally decisive; "less decisive reasons are usually easier to judge" when it comes to costs explaining rejections): Mismatch of judgement: Patch reviewers consistently consider test failures, incomplete fix, introducing new bugs, suboptimal solution, inconsistent docs way more decisive for rejecting than authors.
 * Problematic implementation or solution: Compilation errors; Test failures; Incomplete fix; Introducing new bugs; Wrong direction; Suboptimal solution 9works but there is a more simple or efficient way); Solution too aggressive for end users; Performance; Security
 * Difficult to read or maintain: Including unnecessary changes (to split into separate patch); Violating coding style guidelines; Bad naming (e.g. variable names); Patch size too large (but rarely matters as it's ambiguous - if necessary it's not a problem); Missing docs; Inconsistent or misleading docs; No accompanied test cases (TODO: How much is this a NoGo in WM?); Integration conflicts with existing code; Duplication; Misuse of API; risky changes to internal APIs; not well isolated
 * Deviating from the project focus or scope: Idea behind is not of core interest; irrelevant or obsolete
 * Affecting the development schedule / timing: Freeze; low urgency; Too late
 * Lack of communication or trust: Unresponsive patch authors; no discussion prior to patch submission; patch authors' expertise and reputation

Proposed guidelines for writing acceptable patches:
 * Authors should make sure that patch is in scope and relevant before writing patch
 * Authors should be careful to not introduce new bugs instead of only focussing on the target
 * Authors should not only care if the patch works well but also whether it's an optimal solution
 * Authors should not include unnecessary changes and should check that corner cases are covered
 * Authors should update or create related documentation

Related pages

 * https://www.mediawiki.org/wiki/Manual:Coding_conventions
 * https://www.mediawiki.org/wiki/Gerrit/Code_review for reviewers
 * https://www.mediawiki.org/wiki/Gerrit/Code_review/Getting_reviews for authors (too long?)
 * https://meta.wikimedia.org/wiki/Grants:IdeaLab/Making_Gerrit_access_easier_for_developers_new_to_MediaWiki

T115852, T1287 - Unclear maintenance responsibilities for some parts of MediaWiki core repository

 * Related: Automate updating the outdated and manual Developers/Maintainers; publically documented (and findable) Team/Maintainer <-> Codebase/Repository relations.

Influential factors

 * 1) Social: Unclear responsibilites: everybody expecting another person to review
 * 2) Organizational: The last developer with knowledge of a code area has been moved to a different code area

Potential actions

 * 1) Need for better tools to identify such areas within a codebase. Then clarify within teams.

T113707 - SLA on time to review for CR

 * TODO: Nothing found in literature. Andre blogged: https://blogs.gnome.org/aklapper/2015/12/01/volunteer-contributions/
 * Requires clear responsibilities who owns what.

T113706 - Cut-off date to abandon patches in CR

 * TODO: Nothing found in literature. Andre blogged: https://blogs.gnome.org/aklapper/2015/12/01/volunteer-contributions/

Volunteer onboarding

 * "If a project does not make a good first impression, newcomers may wait a long time before giving it a second chance."
 * 3 different profiles of newcomers: "Newcomers can be novice developers who are starting their career, people who are experienced developers from industry but are not used to OSS projects, or people who are migrating from other OSS projects."
 * Entry barriers are not always bad: Some lead to improved contributions in the long run.
 * "Successful FOSS projects grow their communities outward to drive contribution to the core project. To build that community, a project needs to develop three onramps for software users, developers, and contributors, and ultimately commercial contributors."
 * "Sometimes, a contribution barrier cannot be lowered, but instead, the FLOSS project may shift the contribution barrier to the inside of the community's onion model."
 * "Make it easy to contribute by making the software easy to configure, build, and test to a known state. The more time you save outside developers that might be interested in contributing, the more time they have to work on the contribution they want to make, rather than losing time and possibly interest in trying to get past building the software."
 * Dialogs shown to first time contributors trigger additional contributions.
 * AFAIK no study exists to compare successful and unsuccessful contributors specifically.
 * TODO: READ about Onboarding and Mozilla::

Barriers faced by FOSS newcomers
Steinmacher et al. went through 291 studies and considered 20 relevant. The majority relies on quantitative past data instead of qualitative studies. Via grounded evidence they analyzed 15 newcomer contribution barriers in 5 categories with 3 origins (Newcomers, Community, Product). The following items and quotes are from Steinmacher et al. if not noted differently.


 * 1) Social interaction.
 * 2) Lack of social interaction with project members. Newcomer's social network; social status and need to build an identity. All 7 studies "show a correlation between the centrality of necomers' social relationships and newcomers' successful permanence of a contributor. However, there is no clear evidence of the causal relationship between social network centrality and newcomer success."
 * 3) Receiving an improper answer. "Three studies brought evidence of the negative impact of the content of answers received by newcomers" when it comes to answering "politely or positively". "[N]ewcomers demand attention and firendly hands to start contributing".
 * 4) Not receiving a (timely) answer. Feeling demotivated or unimportant. (Studies with contradictory results, however:) "absence of responses, improper answers, and not receiving recognition [...] can lead to newcomer dropout." "[...] coudl nominate people with social skills to receive newcomers in the communication channels.", "avoiding the use of project specific terms and jargons", "need to receive proper directions in a positive way". Potentially "make newcomers aware of the average time to receive answers [...] to help manage their expectations."
 * 5) Newcomers' previous knowledge
 * 6) Lack of domain expertise. Not indicative. People who contribute are nearly always users. No specific study exists about domain expertise though!
 * 7) Lack of technical expertise. Practical hands-on experience strongly associated with continued contribution; knowledge received via academic education not significant for successful contributing. A contributor can benefit from presenting skills and demonstrating expertise by sending mail or patches to influence the perception of the community. "[B]efore contributing to a project, newcomers must, for example, verify whether their skills math wtih skills needed". "Newcomers that showed proactivity [...] were better received by the community" and "the content of a message influences the reception of a newcomer". "[S]ocial and political behavior was important for newcomers to become long-term contributors or to be accepted [...] as members."
 * 8) Lack of knowledge of project practices. Not indicative, but "making it clear what is expected from them and what the process and practices are that must be followed to contribute" by having a "more informative and less technical initial environment" which looks "less geeky/daunting for newcomers".
 * 9) Finding a way to start
 * 10) Difficulty to find an appropriate first task. "the community wants the newcomers to pick the task themselves; however, newcomers have no clue of how to do this." Capiluppi and Michlmayr report that it is easier for new contributors to work on newer codebases than on older codebases.
 * 11) Difficulty to find a mentor. Not indicative for FOSS projects as no specific studies. Not common in FOSS to offer formal mentorship.
 * 12) Documentation
 * 13) Outdated docs. Not indicative for FOSS projects as no specific studies. Creates uncertainty whether docs can be trusted. Potential waste of time on already existing features. TODO: "make newcomers aware of the status of the documents".
 * 14) Too much docs. Supported by two experimental studies. Overwhelming information overload. "The projects need to provide easy ways to find and navigate the information provided by the projects, linking different sources of information and enabling the recommendation of relevant parts of the group memory" for the newcomer's task.
 * 15) Unclear code comments. Rather irrelevant.
 * 16) Technical hurdles (in general rather poorly studied)
 * 17) Setting up a local development environment. Getting stuck setting it up due to disproportionate configuration effort, reuse which creates dependencies and complicates building, platform diversity due to build configurations, constant changes in the build process requiring the developer who once mastered has to constantly keep up, prior experience, (not applicable:) interpreted languages, nobody in charge to simplify the build configuration. Hence provide a VM like Vagrant.
 * 18) Code complexity. Too hard to fix in this scope. "code complexity negatively influenced newcomers' decision to contribute". Hence "directing newcomers to peripheral modules" is better: TODO: e.g. do not list MediaWiki core as first item on Annoying little bugs? Furthermore, fear of introducing new issues and feeling embarrassed, introducing platform specific bugs (e.g. different database backends), only superficially testing and creating unpredictable side effects can be reduced by providing easy access to unit tests via CI (Jenkins).


 * 1) Software Architecture complexity. Providing visual information (e.g. class diagrams) reduces newcomers' challenges. Too hard to fix in this scope.

Hospitality and tone

 * cf. https://en.wikisource.org/wiki/Hospitality,_Jerks,_and_What_I_Learned
 * "the level of politeness in the communication process among developers does have an effect on the time required to fix issues and, in the majority of the analysed projects, it has a positive correlation with attractiveness of the project to both active and potential developers. The more polite developers were, the less time it took to fix an issue, and, in the majority of the analysed cases, the more the developers wanted to be part of project, the more they were willing to continue working on the project over time."; "Issue fixing time for polite issues is faster than issue fixing time for impolite issues for 10 out of 14 analysed projects."; "Findings. In the majority of projects Magnet and Sticky are positively correlated with Politeness."; ”Politeness is the practical application of good manners or etiquette. It is a culturally defined phenomenon and therefore what is considered polite in one culture can sometimes be quite rude or simply eccentric in another cultural context. The goal of politeness is to make all of the parties relaxed and comfortable with one another.”
 * "Would you be interested in contributing a fix and a test case for this as well?" style instead of "this isn't the forum to clarify support requests"?; "The latest changeset isn't applying cleanly to git master anymore - could you resubmit it please?"
 * Cultural differences: Gupta et al tested four strategies (Brown and Levinson's theory): Direct ("Do X", "You could do X"), Approval ("Would it be possible/Could you do X"), Autonomy ("I'm wondering whether it would be possible if"/"Could you possibly do X"), Indirect ("X is not done yet", "Someone should do X") on 26 people (11 british, 15 indian, mixed gender, 20-30y) asked to rate on a scale between overpolite to excessively rude. While Brown and Levinson posit that the indirect strategy should be the politest form, to Indian people it sometimes sounds "like a complaint or sarcasm" as "English and Indian native speakers of English have different perceptions of politeness". Still "utterances to strangers need to be much more polite than those to friends".

T78639 - Addressing the long tail of low priority tasks in active projects
Saha et al. looked at 7 FOSS projects using Bugzilla and tried to find out why 125 manually analyzed bug reports open for more than a year take so long to get fixed. (Andre challenges some smaller assumptions but in general the authors have a clue, e.g. that triaging does not always and immediately take place, that severity and priority fields often keep their default values etc). Reasons are diverse:
 * Hard to understand and to find the right place where to fix in the code
 * Uncertain how to fix due to missing the bigger picture or related technical debt
 * Hard to fix due to required complex solution or architecture
 * Risky to fix with regard to release schedule
 * Incomplete fix due to missing corner cases
 * Importance not realized until duplicates were reported or someone comments, then activity follows
 * Hard to reproduce, e.g. steps to reproduce missing
 * Scheduling: Developer's workload and personal to-do lists
 * Unaware of corresponding task for a code change fix, hence not closed
 * Infrequent use case: Destructive bug but in a not very common area
 * Others, e.g. blocked by other tasks; developer vacations etc
 * No specific reasons / as-usual delay: limited resources etc, "there are many bug-fixes that got delayed without any specific reason".
 * Added by Andre: Misclassification in wrong basket/project hence not on the screen of devs?
 * Solution for more important tasks: Careful prioritization, predicting severity, change effort, and change impact. Added by Andre: Also by just saying "No until you do it yourself?" TODO: Expectation setting, again - maybe "The developers have a lot to do but if you feel strongly about this particular issue, please consider a code contribution which the developers would be happy to review."
 * Misc: "40% of long-lived bug fixes involved only a few changes in only one file."

Bounties / Crowdfunding

 * cf. T88265


 * "All bounty programs are characterized by a winner-take-all incentive structure. [...] a developer must consider if they should spend a considerable amount of time working for an uncertain prize of not enter." (page 176; cf. game theory).
 * Advantages from Company/corporation perspective: Can reduce development costs; can lead to greater number of alternative implementation designs due to competition/rivalry; can create broader interest in product; might change priorities for developer community. (pages 174--179)
 * Disadvantages from developer perspective: Risk of not making it due to getting paid only in case of success; hard for developer to evaluate difficulty of task and quality of documentation (pages 174--179)
 * Disadvantages from Company/corporation perspective: Can be hard to define the amount of money - if amount is too low might not attract qualified developers; potentially attracting students who wish to learn; potentially attracting developers in countries where salaries are low (cf. GSoc?); in FOSS it could be work that someone else might have done for free at some point been (cf. Free rider problem); time spend to judge contribution is a cost (though also with non-bounty reviews). (pages 174--179); Target group: "Typically, bounties have been won by a few people who have worked on projects for a long time." (page 177)
 * Risks for project perspective: might influence direction that project is heading; short-term orientation may create direction that is not scalable and creates maintenance costs (but also a non-bounty issue?) hence long-term orientation required; potential rivalry with volunteer developers and potential withdrawing of them; potential bypassing of maintainers if bounty organizers decide which tasks are pushed/prioritized instead of actual maintainers/community; attracting developers which do not know the direction and context of project. (page 177--178)

Annoying little bugs / Onboarding programs
Labuschagne and Holmes looked at Mozilla's 'Good First Bugs' (GFB) and 'Mentored Bugs' (MeB; something that Wikimedia does not have) programs taking quantitative data from Bugzilla and a qualitative survey with 11 developers with at least 10 contributions who started in one of these programs.
 * "newcomers who make 10 contributions take an average of 203 days to complete this work"
 * Interviewees appreciated mentorship, list a variety of starting points (Bugsahoy, Bugzilla search, IRC, Codefirefox.com), and underestimated the required effort to contribute compared to size and impact of the contribution.
 * "The likelihood that a developer's first contribution is successful" is 86% for Meb&&GFB, 80% for MeB, 67% for GFB, 73% for not in any program, hence "The dropout rate for program participants is much higher than for developers that did not start in a program" and "the data suggests that while developers in onboarding programs are more likely to succeed with their first attempts they are generally less likely to become long-term contributors". Possible reasons: "attract individuals who would not otherwise have attempted to make a contribution at all" who are "less equipped to transition into long-term contributors". "downplaying the difficulty of tackling a GFB may in fact increase the chances of failure since the expectation may be that these bugs should be 'easy'".
 * Note that "it is unclear whether these successful contributors would have started contributing without these programs, this paper provides quantitative evidence that these programs alone do not automatically improve the odds of a new developer becoming a long-term contributor" but newcomers might potentially "continue to be actively involved in the project in other ways, such as reporting bugs" (which was not measured in this paper).

Other related tasks

 * T114320, T114311 - CR migration to Differential
 * T114419 - Make CR not suck, especially for volunteers
 * T89907 - MW developer community governance model
 * T102920 - Unmaintained/inactive repos
 * T115659 - Collaboration on prioritization

Misc

 * Large single vendor governed FOSS projects tend to be controversial; non-profit foundations seem to be successful
 * "Solid Engineering Practices + Strong Community Governance + Clear IP Management enables Growth."
 * Magnetism and Stickiness: A project is Magnetic (magnetism) if it attracts new developers over time. A project is Sticky (stickiness) if it keeps its developers over time.
 * Magnetism is the portion of new active developers during the observed time interval, in our example 2/10 (dev 6 + dev 7 were active in 2011 but not in 2010).
 * Stickiness is the portion of active developers that were also active during next time interval, in our example 3/7 (dev 1, dev 2, dev 3 were active in 2011 and in 2012).
 * Covered by korma's Demographics page
 * Time to merge
 * OpenStack: Number of active core reviewers growing, 0% of the changesets that landed into master were merged in less than 3 days since the review process was opened.
 * Linux Kernel: 33% of the patches make it into the Linux kernel within 3-6 months; Factors are submission time, affected subsystems, number of requested reviewers