From nobody@hyperreal.com Thu Feb 6 07:23:30 1997 Received: by taz.hyperreal.com (8.8.4/V2.0) id HAA26040; Thu, 6 Feb 1997 07:23:30 -0800 (PST) Message-Id: <199702061523.HAA26040@taz.hyperreal.com> Date: Thu, 6 Feb 1997 07:23:30 -0800 (PST) From: Rob Leggett Reply-To: rleggett@mcs.kent.edu To: apbugs@hyperreal.com Subject: multiple kernel panics under SunOS 4.1.3_U1 sun4m since upgrading to 1.2b6 X-Send-Pr-Version: 3.2 >Number: 164 >Category: os-sunos >Synopsis: multiple kernel panics under SunOS 4.1.3_U1 sun4m since upgrading to 1.2b6 >Confidential: no >Severity: critical >Priority: medium >Responsible: apache >State: closed >Class: sw-bug >Submitter-Id: apache >Arrival-Date: Thu Feb 6 07:30:01 1997 >Last-Modified: Sun Jun 29 17:36:41 PDT 1997 >Originator: rleggett@mcs.kent.edu >Organization: >Release: 1.2b6 >Environment: SunOS hermes 4.1.3_U1 4 sun4m, gcc version 2.6.3 >Description: We are getting multiple kernel panics after upgrading to 1.2b6. This is the third panic we received in the last two days. Folowing is the dump from the messages file. Notice the line: "pid 10160, `httpd': Data access exception" I am going back to 1.1 to and will see if the panics stop. Feb 6 09:25:52 hermes vmunix: BAD TRAP: cpu=3 type=9 rp=f1aced54 addr=20 mmu_fsr=326 rw=1 Feb 6 09:25:52 hermes vmunix: MMU sfsr=326: Invalid Address on supv data fetch at level 3 Feb 6 09:25:52 hermes vmunix: regs at f1aced54: Feb 6 09:25:52 hermes vmunix: psr=1f4010c7 pc=f0020654 npc=f0020658 Feb 6 09:25:52 hermes vmunix: y: 40000000 g1: f0020648 g2: effffd98 g3: ffffff00 Feb 6 09:25:52 hermes vmunix: g4: 400 g5: f1acf000 g6: 0 g7: 0 Feb 6 09:25:52 hermes vmunix: o0: ffbfffff o1: 88001 o2: f1acefe0 o3: 0 Feb 6 09:25:52 hermes vmunix: o4: f1acefe0 o5: f1acf000 sp: f1aceda0 ra: fd00a800 Feb 6 09:25:52 hermes vmunix: pid 10160, `httpd': Data access exception Feb 6 09:25:52 hermes vmunix: kernel read fault at addr=0x20, pme=0x0 Feb 6 09:25:52 hermes vmunix: MMU sfsr=326: Invalid Address on supv data fetch at level 3 Feb 6 09:25:52 hermes vmunix: rp=0xf1aced54, pc=0xf0020654, sp=0xf1aceda0, psr=0x1f4010c7, context=0xf6 Feb 6 09:25:52 hermes vmunix: g1-g7: f0020648, effffd98, ffffff00, 400, f1acf000, 0, 0 Feb 6 09:25:52 hermes vmunix: Begin traceback... sp = f1aceda0 Feb 6 09:25:52 hermes vmunix: Called from f0063c1c, fp=f1acee00, args=1 ff66328c 6 1 f1aceeb0 0 Feb 6 09:25:52 hermes vmunix: Called from f006697c, fp=f1acee60, args=ff66328c 6 1 ff654a80 0 ff654a80 Feb 6 09:25:52 hermes vmunix: Called from f013fbd8, fp=f1aceec0, args=f1acefe0 348 f01bc1d8 f1369cfc ff654a80 f1acefe0 Feb 6 09:25:52 hermes vmunix: Called from f0005cf4, fp=f1acef58, args=f1acf000 f1acefb4 f1acefe0 f1acf000 f1acf000 f1acefb4 Feb 6 09:25:52 hermes vmunix: Called from 4e4c, fp=effffc60, args=7 6 1 effffcc4 4 f1acefb4 Feb 6 09:25:52 hermes vmunix: End traceback... Feb 6 09:25:52 hermes vmunix: panic on cpu 3: Data access exception Feb 6 09:25:52 hermes vmunix: syncing file systems... done Feb 6 09:25:52 hermes vmunix: 06252 low-memory static kernel pages Feb 6 09:25:52 hermes vmunix: 04104 additional static and sysmap kernel pages Feb 6 09:25:52 hermes vmunix: 00000 dynamic kernel data pages Feb 6 09:25:52 hermes vmunix: 01336 additional user structure pages Feb 6 09:25:52 hermes vmunix: 00000 segmap kernel pages Feb 6 09:25:52 hermes vmunix: 00000 segvn kernel pages Feb 6 09:25:52 hermes vmunix: 00436 current user process pages Feb 6 09:25:52 hermes vmunix: 01424 user stack pages Feb 6 09:25:52 hermes vmunix: 13552 total pages (3388 chunks) Feb 6 09:25:52 hermes vmunix: Feb 6 09:25:52 hermes vmunix: dumping to vp fb01ce54, offset 1454200 Feb 6 09:25:52 hermes vmunix: VAC ENABLED in COPYBACK mode Feb 6 09:25:52 hermes vmunix: SunOS Release 4.1.3_U1 (HERMES) #4: Tue Jun 13 12:36:29 EDT 1995 Feb 6 09:25:52 hermes vmunix: Copyright (c) 1983-1993, Sun Microsystems, Inc. Feb 6 09:25:52 hermes vmunix: cpu = SUNW,SPARCsystem-600 Feb 6 09:25:52 hermes vmunix: mod0 = Ross,RT625 (mid = 8) Feb 6 09:25:52 hermes vmunix: mod1 = Ross,RT625 (mid = 9) Feb 6 09:25:52 hermes vmunix: mod2 = Ross,RT625 (mid = 10) Feb 6 09:25:52 hermes vmunix: mod3 = Ross,RT625 (mid = 11) Feb 6 09:25:52 hermes vmunix: mem = 523912K (0x1ffa2000) Feb 6 09:25:52 hermes vmunix: avail mem = 510738432 Feb 6 09:25:52 hermes vmunix: cpu0 at Mbus 0x8 0x224000 Feb 6 09:25:52 hermes vmunix: cpu1 at Mbus 0x9 0x228000 Feb 6 09:25:52 hermes vmunix: cpu2 at Mbus 0xa 0x22c000 Feb 6 09:25:52 hermes vmunix: cpu3 at Mbus 0xb 0x230000 Feb 6 09:25:52 hermes vmunix: entering multiprocessor mode Feb 6 09:25:52 hermes vmunix: Ethernet address = 8:0:20:f:1a:e9 Feb 6 09:25:52 hermes vmunix: dma0 at SBus slot f 0x81000 Feb 6 09:25:52 hermes vmunix: esp0 at SBus slot f 0x80000 pri 4 (onboard) Feb 6 09:25:52 hermes vmunix: sd0 at esp0 target 3 lun 0 Feb 6 09:25:52 hermes vmunix: sd0: Feb 6 09:25:52 hermes vmunix: sd2 at esp0 target 2 lun 0 Feb 6 09:25:52 hermes vmunix: sd2: Feb 6 09:25:52 hermes vmunix: sd16 at esp0 target 5 lun 0 Feb 6 09:25:52 hermes vmunix: sd16: Feb 6 09:25:52 hermes vmunix: sr0 at esp0 target 6 lun 0 Feb 6 09:25:52 hermes vmunix: lebuffer0 at SBus slot f 0x40000 Feb 6 09:25:52 hermes vmunix: le0 at SBus slot f 0x60000 pri 6 (onboard) Feb 6 09:25:52 hermes vmunix: zs0 at obio 0x100000 pri 12 (onboard) Feb 6 09:25:52 hermes vmunix: zs1 at obio 0x0 pri 12 (onboard) Feb 6 09:25:52 hermes vmunix: audio0 at obio 0x500000 pri 13 (onboard) Feb 6 09:25:52 hermes vmunix: root on sd0a fstype 4.2 Feb 6 09:25:52 hermes vmunix: swap on sd0b fstype spec size 781320K Feb 6 09:25:52 hermes vmunix: dump on sd0b fstype spec size 781300K >How-To-Repeat: I looked in the access_log, saw accesses to pages at 09:20 and after the reboot, but nothing at 09:25. >Fix: n >Audit-Trail: State-Changed-From-To: open-analyzed State-Changed-By: marc State-Changed-When: Fri Feb 7 11:46:23 PST 1997 State-Changed-Why: Could this be due to connections building up in the FIN_WAIT_2 state? Do a netstat after running 1.2b6 for a while and see if there are lots of connections in FIN_WAIT_2. See http://www.apache.org/docs/misc/fin_wait_2.html for some details on that problem. If the problem remains when you go back to 1.1.x and it wasn't happening before, you may want to see if you have hardware problems. State-Changed-From-To: analyzed-feedback State-Changed-By: fielding State-Changed-When: Mon Feb 10 04:41:15 PST 1997 State-Changed-Why: If it does appear to be related to the FIN_WAIT_2 problem, then you should disable KeepAlives (see httpd.conf). There are problems with current browser implementations of HTTP Keep-Alive that will cause a server OS without a FIN_WAIT_2 timeout (as is the case for SunOS4) to exceed the kernel memory allocation for TCP buffers. Please let us know if you can narrow the problem. State-Changed-From-To: feedback-closed State-Changed-By: dgaudet State-Changed-When: Sun Jun 29 17:36:41 PDT 1997 State-Changed-Why: We're now recommending SunOS4 systems put "KeepAlive off" into their httpd.conf files (in place of the KeepAlive on already there). This should solve this problem. Thanks for using Apache. Dean >Unformatted: