Enhancing both ops and public monitoring to:
- notice potential outages sooner,
- increase transparency to the community,
- support progress tracking required in the 5-year plan.
Status: We use Nagios for systems/load monitoring, but we have not yet tuned its alerting enough to make it really useful to us. We also need more tooling to monitor performance metrics. One example is page load time in target markets (such as India).
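As a rough illustration of the page load example, a Nagios-style check could time a page fetch and map the result onto the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is only a sketch under several assumptions: the `PAGELOAD` label, the function names, and the thresholds are hypothetical, and it times only the HTML download from wherever the script runs, not a full browser page load. To say anything about a target market such as India, it would have to run from a host located there.

```python
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed, warn, crit):
    """Map a fetch time in seconds to a Nagios status code."""
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def format_status(code, elapsed):
    """Build the one-line plugin output; perfdata follows the pipe."""
    return "PAGELOAD %s - %.3fs|time=%.3fs" % (LABELS[code], elapsed, elapsed)


def fetch_seconds(url, timeout):
    """Time a full download of the response body for url."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def main(argv):
    # Hypothetical usage: check_pageload.py URL WARN_SECONDS CRIT_SECONDS
    if len(argv) != 4:
        print("PAGELOAD UNKNOWN - usage: %s URL WARN CRIT" % argv[0])
        return UNKNOWN
    try:
        url, warn, crit = argv[1], float(argv[2]), float(argv[3])
    except ValueError:
        print("PAGELOAD UNKNOWN - thresholds must be numbers")
        return UNKNOWN
    try:
        # Assumed timeout: twice the critical threshold.
        elapsed = fetch_seconds(url, timeout=crit * 2)
    except Exception as exc:
        print("PAGELOAD CRITICAL - fetch failed: %s" % exc)
        return CRITICAL
    code = classify(elapsed, warn, crit)
    print(format_status(code, elapsed))
    return code

# To use this as a plugin, end the file with:
#   import sys; sys.exit(main(sys.argv))
```

The warning and critical thresholds are passed as arguments so they can be tuned per check, which is the kind of alert tuning described above. The `time=` perfdata field lets the same check feed progress tracking over time.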