Last check: 3.1.2012
March 2012
- 01.03.2012: tcx090, tcx110, tcx130 rebooted by IT: CVMFS installation
February 2012
- 28.02.2012: tcx080, tcx120, tcx110 rebooted by IT: CVMFS installation
- 10.02.2012: tcx120 rebooted by IT: [09:06] memory problem (janetd)
- 09.02.2012: tcx110 rebooted by IT: [11:43] memory problem (janetd?)
- 09.02.2012: tcx120 rebooted by IT: [08:48] memory problem (janetd)
January 2012
- 18.01.2012: all HH rebooted by IT: ???
- 12.01.2012: tcx090 rebooted by IT: [08:52] maintenance
- 10.01.2012: all rebooted by IT: [18:00] maintenance
- 09.01.2012: tcx130 rebooted by IT: [20:03] lustre errors, no login possible
- 06.01.2012: tcx080 rebooted by IT: [15:26]
December 2011
- 16.12.2011: tcx080 rebooted by IT: [09:45]
- 15.12.2011: tcx080 rebooted by IT: memory problems
- 05.12.2011: tcx110 rebooted by IT [15:08]: lustre errors, no login possible
- 05.12.2011: tcx120 rebooted by IT [11:14]: lustre errors, no login possible
November 2011
- 04.11.2011: tcx110 rebooted by IT [13:29]: out of memory (swapping) (janetd?, proof master)
September 2011
- 01.09.2011: tcx090 rebooted by IT: Zeuthen downtime (cooling)
August 2011
- 25.08.2011: all rebooted by IT: downtime
- 02.08.2011: tcx100 rebooted by IT [14:37]: out of memory (luzgomez, sframe)
June 2011
- 23.06.2011: tcx090 rebooted by IT [08:33]: problems with installation (AFS affected)
- 03.06.2011: tcx080 rebooted by IT [07:24]: out of memory?
- 07.06.2011: tcx120 rebooted by IT [20:09]: out of memory?
- 09.06.2011: tcx110 rebooted by IT [15:32]: out of memory?
May 2011
- 10.05.2011: tcx080,tcx090: AFS client upgrade
- 11.05.2011: tcx100,tcx110: AFS client upgrade
- 12.05.2011: tcx120,tcx130: AFS client upgrade
- 20.05.2011: tcx110 reboot by IT [12:16]: Lustre(?) problems in the night
March 2011
- 28.03.2011: tcx100 reboot by IT [04:28] sar log stopped at 11:02pm of 15.3.2011, memory problem, gfischer?
  gfischer pts/6 tcsh16-vm1.naf.d Fri Mar 25 22:14 - crash (2+07:13)
  luzgomez pts/3 tcsh15-vm1.naf.d Fri Mar 25 19:37 - crash (2+09:49)
  luzgomez pts/1 tcsh15-vm1.naf.d Fri Mar 25 19:25 - crash (2+10:01)
  katzy pts/5 localhost:14.0 Thu Mar 24 09:37 - crash (3+19:50)
  katzy pts/4 tcsh16-vm1.naf.d Thu Mar 24 09:36 - crash (3+19:50)
  leffhalm pts/0 tcsh16-vm1.naf.d Tue Mar 22 09:02 - crash (5+20:24)
- 22.3.2011: all interactive by IT [06:20] maintenance slot
- 21.3.2011: tcx130 reboot by IT [06:58] sar log stopped at 8:50pm of 19.3.2011, memory problem, gfischer?
  gfischer pts/21 tcsh5-vm1.naf.de Sat Mar 19 20:26 - crash (1+10:31)
  wildt pts/19 tcsh6-vm1.naf.de Sat Mar 19 20:12 - crash (1+10:46)
  wasicki pts/18 tcsh6-vm1.naf.de Sat Mar 19 20:02 - crash (1+10:55)
  glazov pts/7 tcsh5-vm1.naf.de Sat Mar 19 15:45 - crash (1+15:12)
  luzgomez pts/4 tcsh5-vm1.naf.de Sat Mar 19 13:39 - crash (1+17:19)
  glazov pts/0 tcsh5-vm1.naf.de Sat Mar 19 08:23 - crash (1+22:35)
  leyton pts/2 tcsh5-vm1.naf.de Sat Mar 19 07:14 - crash (1+23:44)
  leyton pts/1 tcsh5-vm1.naf.de Sat Mar 19 07:14 - crash (1+23:44)
  leyton pts/22 tcsh6-vm1.naf.de Fri Mar 18 10:06 - crash (2+20:52)
  leffhalm pts/17 tcsh6-vm1.naf.de Fri Mar 18 09:50 - crash (2+21:08)
  mbecking pts/43 tcsh5-vm1.naf.de Thu Mar 17 16:03 - crash (3+14:55)
  mbecking pts/24 tcsh6-vm1.naf.de Thu Mar 17 11:40 - crash (3+19:18)
  tkohno pts/11 tcsh5-vm1.naf.de Thu Mar 17 10:00 - crash (3+20:58)
  tkohno pts/13 tcsh6-vm1.naf.de Tue Mar 15 14:30 - crash (5+16:28)
  warsinsk pts/5 tcsh5-vm1.naf.de Fri Mar 11 21:33 - crash (9+09:25)
- 18.3.2011: tcx100 reboot by IT [06:46] sar log stopped at 10pm of 17.3.2011, memory problem, gfischer?
  luzgomez pts/27 tcsh6-vm1.naf.de Thu Mar 17 21:56 - crash (08:50)
  gfischer pts/42 tcsh5-vm1.naf.de Thu Mar 17 21:41 - crash (09:05)
  stanescu pts/41 tcsh6-vm1.naf.de Thu Mar 17 21:21 - crash (09:25)
  wasicki pts/2 tcsh5-vm1.naf.de Thu Mar 17 20:10 - crash (10:36)
  wildt pts/36 :pts/29:S.0 Thu Mar 17 15:23 - crash (15:23)
  wildt pts/29 tcx130.naf.desy. Thu Mar 17 15:23 - crash (15:23)
  efeld pts/26 tcsh5-vm1.naf.de Thu Mar 17 13:05 - crash (17:41)
  almutp pts/40 tcsh5-vm1.naf.de Thu Mar 17 12:23 - crash (18:23)
  almutp pts/39 tcsh5-vm1.naf.de Thu Mar 17 12:23 - crash (18:23)
  almutp pts/38 tcsh5-vm1.naf.de Thu Mar 17 12:22 - crash (18:23)
  almutp pts/37 tcsh5-vm1.naf.de Thu Mar 17 12:22 - crash (18:24)
  efeld pts/33 tcsh5-vm1.naf.de Thu Mar 17 10:47 - crash (19:59)
  mzvolsky pts/10 tcsh5-vm1.naf.de Thu Mar 17 09:19 - crash (21:27)
  efeld pts/7 tcsh5-vm1.naf.de Wed Mar 16 15:38 - crash (1+15:08)
  efeld pts/34 tcsh5-vm1.naf.de Wed Mar 16 12:02 - crash (1+18:44)
  wolter pts/16 tcsh5-vm1.naf.de Wed Mar 16 10:55 - crash (1+19:51)
  efeld pts/17 tcsh5-vm1.naf.de Mon Mar 14 17:18 - crash (3+13:28)
  efeld pts/1 tcsh5-vm1.naf.de Mon Mar 14 17:12 - crash (3+13:34)
  mijovic pts/4 tcsh6-vm1.naf.de Sat Mar 12 19:36 - crash (5+11:10)
  warsinsk pts/3 tcsh6-vm1.naf.de Fri Mar 11 19:37 - crash (6+11:09)
  boehler pts/9 tcsh16-vm1.naf.d Fri Mar 11 10:16 - crash (6+20:29)
  katzy pts/21 tcsh5-vm1.naf.de Wed Mar 9 16:14 - crash (8+14:32)
  finnern pts/11 tcsh6-vm5.naf.de Tue Mar 8 10:33 - crash (9+20:13)
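The session listings attached to the March 2011 reboots look like `last` records that end in "crash". To compare who was logged in across several such reboots, the records can be tallied per user. The following is only a minimal sketch: the column layout is inferred from the listings, one record per line is assumed, and the names `CRASH_RE`, `parse_crash_sessions` and `sessions_per_user` are hypothetical helpers, not part of any existing tool. A user appearing in the tally is not evidence that the user caused the memory problem.

```python
import re
from collections import Counter

# Assumed layout of one record: user, tty, origin host,
# weekday, month, day, HH:MM, "- crash", (duration).
CRASH_RE = re.compile(
    r"^\s*(?P<user>\S+)\s+(?P<tty>pts/\d+)\s+(?P<host>\S+)\s+"
    r"(?P<login>\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2})\s+-\s+crash\s+"
    r"\((?P<duration>[\d+:]+)\)\s*$"
)


def parse_crash_sessions(text):
    """Return one dict per line that matches the assumed record layout."""
    sessions = []
    for line in text.splitlines():
        match = CRASH_RE.match(line)
        if match:
            sessions.append(match.groupdict())
    return sessions


def sessions_per_user(sessions):
    """Count how many open sessions each user had at the time of the crash."""
    return Counter(s["user"] for s in sessions)


# A few records copied from the 28.03.2011 entry, used as sample input.
SAMPLE = """\
gfischer pts/6 tcsh16-vm1.naf.d Fri Mar 25 22:14 - crash (2+07:13)
luzgomez pts/3 tcsh15-vm1.naf.d Fri Mar 25 19:37 - crash (2+09:49)
luzgomez pts/1 tcsh15-vm1.naf.d Fri Mar 25 19:25 - crash (2+10:01)
katzy pts/5 localhost:14.0 Thu Mar 24 09:37 - crash (3+19:50)
"""

sessions = parse_crash_sessions(SAMPLE)
counts = sessions_per_user(sessions)
```

Intersecting the user sets from several reboots would narrow the list of candidates, but the sar logs remain the place to confirm actual memory use.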
February 2011
- 28.2.2011: tcx080 reboot [18:29] lustre problems (IT asked for advice)
- 24.2.2011: tcx100 reboot [13:35] had problems
- 1.2.2011: all work group server [morning] update kernels in downtime (firewall upgrade)
January 2011
- 17.1.2011: tcx080 reboot [17 15:44] kswap problem (high load) ATLAS not notified
- 6.1.2011: tcx040 reboot [10:39] kswap problem (high load) WE reported
- 6.1.2011: tcx080 reboot [10:26] kswap problem (high load) WE reported
- 6.1.2011: tcx120 reboot [11:21] kswap problem (high load) WE reported
December 2010
- 15.12.2010: tcx080 reboot [11:37] kswap problem
- 14.12.2010: tcx120 reboot [07:54] ??? (sar: normal load)
- 13.12.2010: tcx060 reboot [08:58] no ssh login possible, nothing obvious from IT (sar: high load -> kswapd)
- 9.12.2010: tcx080 reboot [10:58] kswap problem
- 8.12.2010: tcx120 reboot [18:04] nothing obvious (sar: high load -> kswapd)
November 2010
- 17.11.2010: tcx080 reboot [14:27] under investigation
- 17.11.2010: tcx120 reboot [09:49] two long running jobs, not killable
- 11.11.2010: tcx040, tcx060 reboot [09:04] reboot for AFS client update (reduce dead lock)
- 8.11.2010: tcx080, tcx120 [09:11] reboot for AFS client update (reduce dead lock)
October 2010
- 11.10.2010: tcx040 reboot [6:27]
- 6.10.2010: tcx040 reboot [8:34] (offline due to hardware problems since 23.9.2010, partly replaced by tutorial machines)
- 5.10.2010: dCache problems: crash on one pool - same as 1.10.2010 (LOCALGROUPDISK ())
- 4.10.2010: dCache problems: problems with dcache-atlas17-02, DATADISK
- 1.10.2010: dCache problems: crash on two pools (LOCALGROUPDISK (13597 recovered, 8+18+2697 lost), DATADISK (13901 recovered, 40 lost (SAM test, ...)))
September 2010
- 23.9.2010:
  - all work group server reboots due to down time
- 18.9.2010:
  - reboot all work group server due to kernel update [14:45]
- 9.9.2010:
  - various logging/AFS problems (10:00)
  - tcx120 reboot [11:28]
- 1.9.2010:
  - various logging/AFS problems (13:40)
August 2010
- 30.8.2010:
  - tcx060 reboot [07:06] ? (29.8.2010)
  - various logging/AFS problems (11:05, 11:35, 17:21)
- 18.8.2010:
  - tcx120 reboot [11:30] lustre problems
  - tcx080 reboot [12:27], memory full (started around 11:14)
  - login problem (around 11:14)
- 16.8.2010:
  - load balancing [around 11:00] (two machines off, all to tcx060 with load of 6, tcx080 load of 0.9)
  - tcx120 reboot (13:08), off since 11.8.2010
  - tcx040 reboot (12:59), off since 12.8.2010
  - tcx080 reboot (15:11), see 7.8.2010
  - Lustre problem: /scratch/hh/lustre/atlas/users/mwildt/FirstData/SFrame_Output/DATA_7TeV (14:16)
- 11.8.2010:
  - Various short login problems (whole day), AFS instabilities
- 10.8.2010:
  - Various short login problems, AFS instabilities
- 7.8.2010:
  - tcx080 needs reboot due to infiniband problems (blocked for users in load balancing only), reboot on 16.8.2010 (missing ATLAS support feedback due to vacation)
- 5.8.2010:
  - reboot tcx060 (19:03)
- 2.8.2010:
  - reboot tcx040 (13:37, 12:50) drained, IB problems
  - reboot tcx080 (15:20) 100% swap
July 2010
- 15.7.2010:
  - maintenance with kernel upgrade
- 27.7.2010:
  - load balancing problem [10:30-10:45]
  - load balancing problem [13:40-14:10]
  - reboot of tcx080 (10:47, 14:18), tcx060 (11:03), tcx120 (10:48, 14:21)
  - problem with PROOF lite and kswapd
- 28.7.2010:
  - reboot tcx060 (12:31), tcx080 (12:39)
  - problem with PROOF lite and kswapd