Analytics/Server Admin Log/Archive/2015

From mediawiki.org

2015-12-30

  • 15:23 ottomata: killing oozie legacy_tsv job 0102159-150605005438095-oozie-oozi-B to restart it without mobile, 5xx-mobile and zero outputs

2015-11-10

  • 03:14 ottomata: restarted eventlogging

2015-11-09

  • 14:40 ottomata: restarting eventlogging to see if it is ok after enabling firewall rules on kafka1014

2015-11-06

  • 15:51 joal: Change replication factor to 2 in cassandra per_article_flat keyspace
  • 15:47 ottomata: deploying aqs

2015-11-05

  • 18:24 ottomata: deploying aqs

2015-10-29

  • 10:35 joal: Gzipped already archived pageview files
  • 10:34 joal: restarted pageview job to archive gzipped files
  • 10:34 joal: refinery deployed

2015-10-28

  • 19:16 joal: Downsizing cassandra replication from 3 to 2 on per_article_flat keyspace
  • 19:07 joal: Restart load job (based on IMPORTED flag)
  • 15:48 joal: Deploying refinery
  • 15:40 joal: deploying refinery-source v0.0.22

2015-10-27

  • 19:06 ottomata: deploying aqs
  • 18:24 joal: deploying refinery
  • 16:46 joal: Releasing refinery-source v0.0.21
  • 10:34 joal: manual aggregator launch after a small bug fix

2015-10-26

  • 18:42 joal: refine bundle, pageview_hourly and projectview_hourly coord restarted
  • 18:41 joal: refinery deployed on HDFS
  • 14:33 joal: truncating "local_group_default_T_pageviews_per_article".data on aqs
  • 09:58 joal: Restart cassandra on aqs1001

2015-10-22

  • 20:24 ottomata: deploying aqs
  • 09:51 joal: restart cassandra on aqs1003

2015-10-21

  • 22:53 milimetric: deployed EventLogging and tried to backfill data lost on 2015-10-14, but failed
  • 18:24 joal: Stopped per article loading in cassandra
  • 13:39 ottomata: deploying aqs

2015-10-20

  • 10:12 joal: restart cassandra on aqs1002

2015-10-19

  • 18:35 ottomata: restarting eventlogging with change to parse schema names out of errored events

2015-10-16

  • 20:38 joal: restarted cassandra on aqs100[1,2,3]

2015-10-15

  • 12:17 joal: Refinery deploy needed before restart; deploying
  • 12:12 joal: Restarting daily and monthly mobile unique coordinators with new patch
  • 12:12 joal: Rerunning daily mobile unique jobs for days 2015-08-[03,04,11,12,12,14,17], 2015-09-16
  • 12:10 joal: Stopped daily and monthly mobile unique coordinators

2015-10-14

  • 15:22 ottomata: restarting lagging eventlogging mysql consumer

2015-10-09

  • 19:26 ottomata: releasing refinery 0.20
  • 15:19 ottomata: moved camus property files out of refinery repository and into puppet. Camus properties now live on an27 at /etc/camus.d, and camus log files are in /var/log/camus
  • 14:54 joal: Cassandra restarted on aqs1003
  • 09:15 joal: Restart cassandra on aqs1002

2015-10-08

  • 17:38 joal: Backfilling load from hadoop to cassandra from beginning of october

2015-10-07

  • 16:32 joal: Started cassandra load jobs from 2015-10-01

2015-10-01

  • 16:27 valhallasw`cloud: testing again
  • 16:13 valhallasw`cloud: test

2015-09-29

  • 10:51 joal: cluster back to normal state. Some errors are still unexplained; need to be careful.

2015-09-28

  • 14:56 joal: backfilling various load jobs that failed at stages earlier than check_sequence_statistics
  • 13:03 joal: Errors on cluster; some refine jobs have failed. Investigating.

2015-08-19

  • 18:20 ottomata: does this log work?

2015-03-25

  • 22:09 qchris: starting HDFS balance for unhealthy node analytics1016.eqiad.wmnet with healthy nodes analytics1037.eqiad.wmnet, analytics1040.eqiad.wmnet

2015-02-25

  • 16:07 ottomata: hello?

2015-02-07

  • 02:10 qchris: Ran kafka leader re-election as analytics1021 dropped out of its partition leader role.
  • 01:32 qchris: namenodes died with error "Java heap space" and did not come back up. Bumping the allowed heap size resurrected them (See task T88871).

2015-02-04

  • 23:22 qchris: Manual failover of Hadoop namenode from analytics1001 to analytics1002, as analytics1001 had Heap space errors
  • 07:49 qchris: Manual failover of Hadoop namenode from analytics1002 to analytics1001, as analytics1002 had Heap space errors

2015-01-30

  • 20:21 ottomata: deployed refinery 0.0.4
  • 19:37 ottomata: released refinery 0.0.4

2015-01-25

  • 21:53 qchris: Marked raw text webrequest partition for 2015-01-24T00/1H ok (See task T87545)

2015-01-23

  • 22:46 qchris: Marked raw upload webrequest partition for 2015-01-16T12/1H ok (The partition only needed deduping)
  • 22:23 qchris: Marked raw upload webrequest partition for 2015-01-16T01/1H ok (The partition only needed deduping)
  • 22:11 qchris: Marked raw upload webrequest partition for 2015-01-15T17/1H ok (The partition only needed deduping)
  • 22:04 qchris: Marked raw text webrequest partition for 2015-01-15T15/1H ok (The partition only needed deduping)
  • 22:01 qchris: Marked raw mobile webrequest partition for 2015-01-16T01/1H ok (The partition only needed deduping)

2015-01-15

  • 08:25 qchris: Ran kafka leader re-election to bring analytics1021 back into the set of leaders

2015-01-10

2015-01-06

  • 12:15 qchris: Marked raw mobile+text webrequest partitions for 2015-01-05T17/1H ok (See task T85918)

2015-01-04

  • 12:06 qchris: Marked raw mobile and upload webrequest partitions for 2015-01-03T10/1H ok (See task T85758)

2015-01-02

  • 21:21 qchris: Ran kafka leader re-election to bring analytics1021 back into the set of leaders
  • 21:07 qchris: Marked raw bits, text, and upload webrequest partitions for 2014-12-11T14/1H ok (See task T85712)
  • 19:05 qchris: Marked raw text+upload webrequest partitions for 2014-12-26T06/1H ok (See task T85709)
  • 15:51 qchris: Marked raw text webrequest partition for 2014-12-11T20/1H ok (See task T85699)
  • 12:39 qchris: Marked raw mobile webrequest partition for 2014-12-29T17/1H ok (See task T85695)
  • 11:21 qchris: Marked raw text webrequest partition for 2014-12-30T20/1H ok (See task T85692)

2015-01-01

  • 20:26 qchris: Marked raw webrequest partitions for 2014-12-10T14/2H ok (See task T85675)