Wikimedia Technology/Annual Plans/FY2019/TEC3: Deployment Pipeline

''NOTE: This is a continuation of the

Program Goal
We will streamline and integrate the delivery of services by building a new production platform for integrated development, testing, deployment, and hosting of applications.

Wikimedia developers experience tooling parity between our Continuous Integration (CI) and production environments, which enables them to release code more frequently by continuously reducing risk.

Outcome 1: Continuous Integration is unified with production tooling and developer feedback is faster

 * Output 1.1
 * Convert current CI builds to use the new tooling (Blubber).


 * Output 1.2
 * Set up test execution time profiling with a report, and make a prioritized list of improvements to how tests are run.


 * Output 1.3
 * Research our options for implementing delta-only/code-path-aware testing and share a report.

Outcome 2: Deployers have a better assessment of risk with each deploy

 * Output 2.1
 * Create a deployments report with metrics from the Code Health Group.


 * Output 2.2
 * Stretch: Create a dashboard for real-time insight into the deployment report.


 * Output 2.3
 * Improve our incident response, post-mortem, and follow-up management tooling.

Outcome 3: Deployments happen through percentage-based stages (e.g., canaries, 10%, 100%)

 * Output 3.1
 * Migrate services currently on our "shared service cluster" to Kubernetes deployments with staged rollout.

Outcome 6: Developers and deployers are aware of the platform, its benefits and how to make use of it

 * Output 6.1
 * Create a developer portal for the Deployment Pipeline platform with documentation and instructions.


 * Output 6.2
 * Promote the platform's adoption.

Outcome 1: Continuous Integration is unified with production tooling and developer feedback is faster

 * Target 1.1
 * All Continuous Integration jobs are migrated to use production deployment tooling (e.g., Helm, Minikube, Docker, and Blubber).


 * Measurement method
 * 1) This is measured by the number of Jenkins jobs migrated to our production deployment tooling (e.g., Blubber).

Outcome 2: Deployers have a better assessment of risk with each deploy

 * Target 2.1
 * We reduce the number of MediaWiki deployment incidents by 10%.


 * Measurement method
 * 1) This is measured by the number of rollback-inducing deployments, whether through the weekly release train or SWAT deploys.
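
As an illustration of the measurement above, the 10% target can be checked by comparing rollback counts between a baseline period and the current period. This is a minimal sketch, not the team's actual tooling: the `Deployment` record, the `"train"` and `"swat"` labels, and all function names are hypothetical, and the choice of baseline period is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Deployment:
    kind: str          # hypothetical labels: "train" or "swat"
    rolled_back: bool  # True if the deployment had to be rolled back


def count_rollbacks(deployments: Iterable[Deployment]) -> int:
    """Count rollback-inducing deployments from the train or SWAT windows."""
    return sum(
        1 for d in deployments if d.kind in {"train", "swat"} and d.rolled_back
    )


def reduction(baseline: int, current: int) -> float:
    """Fractional reduction in rollbacks relative to the baseline period."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive count")
    return (baseline - current) / baseline


def meets_target(baseline: int, current: int, target: float = 0.10) -> bool:
    """True if rollbacks dropped by at least the target fraction."""
    return reduction(baseline, current) >= target
```

For example, going from 20 rollback-inducing deployments in the baseline period to 17 in the current one is a 15% reduction, which would satisfy the target.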

Outcome 3: Deployments happen through percentage-based stages (e.g., canaries, 10%, 100%)

 * Target 3.1
 * All services currently on our "shared service cluster" are deployed through percentage-based stages.


 * Measurement method
 * 1) This is measured by identifying which services are deployed on Kubernetes through a percentage-based rollout method.
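
One way to make this measurement concrete is to classify each service in an inventory by platform and rollout stages. The sketch below is illustrative only: the `Service` record, the platform labels, and the rule that a staged rollout is a strictly increasing list of percentages ending at 100 are all assumptions, not a description of the real deployment configuration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Service:
    name: str
    platform: str                  # hypothetical: "kubernetes" or "shared-cluster"
    rollout_stages: Sequence[int]  # percent of instances per stage, e.g. (1, 10, 100)


def is_staged(service: Service) -> bool:
    """True if the service runs on Kubernetes with a multi-stage percentage rollout."""
    stages = list(service.rollout_stages)
    increasing = all(a < b for a, b in zip(stages, stages[1:]))
    return (
        service.platform == "kubernetes"
        and len(stages) >= 2
        and stages[0] > 0
        and stages[-1] == 100
        and increasing
    )


def staged_services(services: Sequence[Service]) -> List[str]:
    """Names of services that meet the staged-rollout criterion."""
    return [s.name for s in services if is_staged(s)]


def coverage(services: Sequence[Service]) -> float:
    """Fraction of services deployed through percentage-based stages."""
    if not services:
        return 0.0
    return len(staged_services(services)) / len(services)
```

The target is met when `coverage` reaches 1.0 for the set of services that started on the "shared service cluster".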

Outcome 4: Developers are able to create services that achieve production level standards with minimal overhead

 * Targets
 * 1) 100% of new services following our own coding standards will have their logs collected and their metrics exposed and monitored, and will use encryption.
 * Measurement method
 * 1) Number of Phabricator tasks under https://phabricator.wikimedia.org/tag/service-deployment-requests/

Outcome 5: Services and the deployment pipeline are hosted on production-level infrastructure

 * Targets
 * 1) 99% availability for the deployment pipeline
 * Measurement method
 * 1) CI availability metrics
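
To give a sense of scale for the 99% target, availability can be computed as the fraction of a measurement window in which the pipeline was up. The sketch below assumes a 30-day window and downtime measured in seconds. Both are illustrative choices, and the function names are hypothetical. Under those assumptions, 99% availability leaves a downtime budget of roughly 7.2 hours per 30 days.

```python
def availability(total_seconds: float, downtime_seconds: float) -> float:
    """Fraction of the window during which the pipeline was available."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= downtime_seconds <= total_seconds:
        raise ValueError("downtime must lie between 0 and total_seconds")
    return 1.0 - downtime_seconds / total_seconds


def downtime_budget(total_seconds: float, target: float = 0.99) -> float:
    """Maximum downtime (in seconds) that still meets the availability target."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    return total_seconds * (1.0 - target)


# Assumed measurement window: 30 days, expressed in seconds.
THIRTY_DAYS = 30 * 24 * 3600
```

Calling `downtime_budget(THIRTY_DAYS)` gives the allowed downtime in seconds, and `availability(THIRTY_DAYS, observed_downtime)` can then be compared against 0.99.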

Outcome 6: Developers and deployers are aware of the platform, its benefits and how to make use of it

 * Targets


 * Measurement method
 * 1) Survey among the target audience

Dependencies

 * MediaWiki Platform: This program requires cross-team collaboration and planning for deploying MediaWiki and Services on a Kubernetes cluster.