Reading/Web/Notable incidents

This page is similar to the Release timeline, but it specifically records trends and bugs on the site. For the dates on which different branches went live, see MediaWiki_1.33/Roadmap.

January
13th

Site scripts and styles (e.g. MediaWiki:Common.css) were loaded on mobile and swiftly reverted (caught via Grafana). Luckily this never hit production. https://phabricator.wikimedia.org/T237050#5800024

September
5th

A performance regression was noted on the site. However, we believe this is likely a problem with the tooling or the Chrome browser, not the site itself: https://phabricator.wikimedia.org/T232174

August
1st

MobileWebMainMenuClickTracking was broken in a deploy and disabled shortly after.

July
30th

Spike in EventLogging errors during a deploy of the broken main menu schema.

25th

MobileWebMainMenuClickTracking was broken in the train deploy (T220016).

April
8th (1.33.0-wmf.25)

Error spike of 75k! This seems to be related to T219841.

March
29th (1.33.0-wmf.23)

Error spikes of 5-15k.

7th (1.33.0-wmf.20)

Errors spike to 12k-13k. A SWAT fix for T217820 shortly after has little to no impact, so this is likely a new error.

February
21st (1.33.0-wmf.18)

A new deploy causes errors to spike to around 8k an hour (a little less than the spike on the 15th). Some of these appear to relate to skins.minerva.top (I4db0551a7661eb5c41d7b2a27e78afb885bb9ce5), which probably should have been shipped in wmf.19, not 1.33.0-wmf.18.

20th

Around 2-4k errors as bugs related to caching ceased.

15th (1.33.0-wmf.17)

An error spike in MinervaClientError (12k an hour), up from the usual 3k. The problem seems to mostly affect US and Japanese users on en.wiki and jp.wiki. 1395 errors occurred on a single page within one hour on Android Chrome Mobile, but I couldn't replicate any issues even with the page and browser version available. Finally, I managed to replicate the problem: caching. Explanation here: https://phabricator.wikimedia.org/T208915#4958060. The change shipped in 1.33.0-wmf.17 but probably should have shipped in wmf.18.

7th (1.33.0-wmf.16)

The regression https://phabricator.wikimedia.org/T217820 went live. No notable incident was recorded, so its impact was likely low.

January
23rd

A patch, "Explicitly pass in parseHTML", was deployed in the hope of dealing with many of the issues that appeared on the 17th.

15th-17th

Grafana is missing some events (most notably ReadingDepth, VirtualPageViews), although they were recorded correctly in the EL databases.

This was due to a PDU issue that affected

17th

MinervaClientError jumps again from 30k to 120k on 2019-01-17 at 18:00:00 UTC(?). Not seeing anything obvious in https://wikitech.wikimedia.org/wiki/Server_Admin_Log or the Deployment calendar. Stephen saw a problematic banner, so this might also be related. 35% of errors come from iOS and 74% of traffic is on enwiki. The Steward nominations banner does appear to be throwing an error, and when looking at referrer traffic for client-side errors, the pages impacted do seem to coincide with places the banner is running. This was tracked and fixed but bugs were still at normal levels with the majority coming from iOS. Some of these bugs may be related to the page issues deploy so we are looking more closely...

9th

MinervaClient errors jump from 4k a minute to 40k a minute with the 1.33.0-wmf.12 deploy. Ouch. It turned out to be due to QuickSurveys being disabled on English Wikipedia while some surveys were still active in cached HTML. We promptly pushed it back to normal levels.

December
20th

Bug fix deployed: T211986

November
5th

[MobileFrontend refactor bug]

Bug T208605 was squashed. Minerva.WebClientError returns to baseline.

October
19th

[MobileFrontend refactor bug]

A suspected iOS Safari bug caused a huge spike in the number of Minerva.WebClientError errors (~30k to ~120k). Error in MediaWiki_1.33/wmf.1 (T208605).