Reading/Web/Notable incidents


This page is similar to the Release timeline, but specifically records trends and bugs on the site. For the dates on which different branches went live, see MediaWiki_1.33/Roadmap.




A performance regression was noted on the site. However, we believe this is likely a problem with the tooling or the Chrome browser, not the site itself.



MobileWebMainMenuClickTracking was broken in the deploy and was disabled shortly after.



A spike in EventLogging errors occurred during the deploy of the broken main menu schema.


MobileWebMainMenuClickTracking was broken in the train deploy (T220016).


8th (1.33.0-wmf.25)

Errors spike to 75k! This seems to be related to T219841.


29th (1.33.0-wmf.23)

Errors spike to 5-15k.

7th (1.33.0-wmf.20)

Errors spike to 12k-13k. A SWAT fix for T217820 shortly afterwards has little to no impact, so this is likely a new error.


21st (1.33.0-wmf.18)

A new deploy causes errors to spike to around 8k an hour (a little less than the spike on the 15th). Some of these appear to relate to change I4db0551a7661eb5c41d7b2a27e78afb885bb9ce5, which probably should have shipped in 1.33.0-wmf.19, NOT 1.33.0-wmf.18.


Errors fell to around 2-4k as the bugs related to caching ceased.

15th (1.33.0-wmf.17)

An error spike in MinervaClientError (12k an hour), up from the usual 3k. The problem seems to mostly affect US and Japanese users, and 1395 errors occurred on a single page within one hour on Android Chrome Mobile, but I couldn't replicate any issues, even with the page and browser version available. Finally, I managed to replicate the problem: caching (explanation here). It shipped in 1.33.0-wmf.17 but probably should have shipped in 1.33.0-wmf.18.

7th (1.33.0-wmf.16)

The regression went live. No notable incident was recorded, so its impact was likely low.



A patch, "Explicitly pass in parseHTML", was deployed in the hope of dealing with many of the issues that appeared on the 17th.


Grafana was missing some events (most notably ReadingDepth and VirtualPageViews), although they were recorded correctly in the EventLogging (EL) databases.

This was due to a PDU issue that affected prometheus1003.


MinervaClientError jumps again, from 30 to 120k, on 2019-01-17 at 18:00:00 UTC(?). We are not seeing anything obvious in the Deployment calendar. Stephen saw a problematic banner, so this might also be related. 35% of errors come from iOS and 74% of traffic is on enwiki. The Steward nominations banner does appear to be throwing an error, and when looking at referrer traffic for client-side errors, the affected pages do seem to coincide with the places where the banner is running. This was tracked and fixed, but bugs were still at normal levels, with the majority coming from iOS. Some of these bugs may be related to the page issues deploy, so we are looking more closely...


MinervaClient errors jump from 4k a minute to 40k a minute with the 1.33.0-wmf.12 deploy. Ouch. It turned out to be due to QuickSurveys being disabled on English Wikipedia while some surveys were still active in cached HTML. We promptly pushed it back to normal levels.




Bug fix deployed: T211986



[MobileFrontend refactor bug]

Bug T208605 was squashed. Minerva.WebClientError returned to baseline.



[MobileFrontend refactor bug]

A suspected iOS Safari bug caused a huge spike in the number of errors in Minerva.WebClientError (~30k to ~120k). The error appeared in MediaWiki_1.33/wmf.1 (T208605).