Analytics/Wikimetrics/CommitGuidelines

Document Goal
The goal of this document is to define a criteria to evaluate commit "goodness" when it comes to quality. It should be a short guide that a new hire or a volunteer can use to learn how commits should be structured to contribute to wikimetrics codebase.

Feature versus Atomic
We prefer Atomic Commits and we are trying to get towards more incremental, more "1-change" commits iteratively. We shouldn't block things that are not atomic but it is to be understood that, when possible, commits should be very focus.

What does Atomic mean?
It means that commits should not contain multiple unrelated changes Rather they are specific to one task. Try and make piecemeal changes if you can. In most cases, atomic commits will be small in size but that is not always the case. We could have, for example, a commit that changes the format of every line we are logging to the logs. Change might be across a bunch of files, but it's one change.

Value Proposition
Atomic Commits are easier to CR, easier to spot failure on and easier to rollback. However, their advantages do not come without cost, in a large stream of commits it might be hard to see what is the overall goal. We try to keep track of changes that belong together using "topics" in Gerrit.

Incremental improvements
In many occasions you will be working on a task and, as part of your commit you see a way to do things better in existing code that requires some refactor. The rule of thumb we try to use about refactoring when working on a task:


 * If we come across a situation where we think something needs refactoring, and it takes <50% of the time allocated for the task, we /try/ to refactor in a separate commit as part of that one task.


 * If it's >50% or "try" does not work out, we file a bug about it, we might have code in the codebase that does the same thing in two different fashions until the bug gets resolved.

Writing tests
We should emphasize when possible unit testing versus integration testing. Strive for unit-testing coverage as much as possible, using mocking, etc.

Application level testing
Before merging your code you should test it in the sandbox to make sure you do not run into issues when deploying to production. The sandbox development environment is mostly identical to production, it just holds its own wikimetrics database separated from the production one. The labs db database (read only) is shared with production.

Deployability
We should strive to do consistent, complete, environment-aware commits Environment-aware means that all required repos should have a patch prepared (need not yet be merged). Example: your change adds a new piece of infrastructure that needs to be started by puppet. You should submit two patches (or maybe three) for CR. The changes to wikimetrics codebase and the changes to puppet modules. Please make sure that ops code reviews puppet changes.

Depending on the change puppet code might need to be deployed before or after (likely after) but, regardless of deployment schedule, the puppet code should be reviewed about the same time the wikimetrics code is reviewed. Changesets on wikimetrics depot should link to puppet changesets if those are needed. Commits should not require to be deployed in groups.

Continuous Integration
Wishlist: we should set up Continuous Integration to development sandbox and move stable candidate to staging once ready. If needed, we should be able to take over sandbox environment and test there.

Documentation

 * Not updating READMEs, diagrams and inline documentation all that stuff will be a blocker during CR.
 * Updating Wikipages is optional, but should be done. Shared responsibility with the Product Owner.
 * Diagrams will be updated using yED.

Gerrit: Tool Limitations
We are using Gerrit as our code review tool, it works reasonably well but we see two main limitations: merge commits and usability.

Merge Commits
Merge commits are annoying in Gerrit cause you just ...cannot do them, Gerrit is opinionated as to how commit history should look like and the tool is geared towards "rebases".

Usability
Gerrit is definitely not intuitive, specially if you have used other CR tools. Developers that might have used source control for years have a hard time wrapping their minds around the fact that there are no branches nor merges. It definitely has a learning curve for people that are coming from outside. Now, that being said, once learned, Gerrit behaves very consistently.