From nobody@hyperreal.com Tue Apr 22 07:46:21 1997 Received: (from nobody@localhost) by hyperreal.com (8.8.4/8.8.4) id HAA07104; Tue, 22 Apr 1997 07:46:21 -0700 (PDT) Message-Id: <199704221446.HAA07104@hyperreal.com> Date: Tue, 22 Apr 1997 07:46:21 -0700 (PDT) From: Bob Ramstad Reply-To: rramstad@nfic.com To: apbugs@hyperreal.com Subject: httpd crashes so hard the system reboots X-Send-Pr-Version: 3.2 >Number: 454 >Category: os-sunos >Synopsis: httpd crashes so hard the system reboots >Confidential: no >Severity: critical >Priority: medium >Responsible: apache >State: closed >Class: sw-bug >Submitter-Id: apache >Arrival-Date: Tue Apr 22 07:50:01 1997 >Last-Modified: Sun Jun 29 18:34:38 PDT 1997 >Originator: rramstad@nfic.com >Organization: >Release: 1.2b8 >Environment: SunOS 4.1.3_U1 rev B with all recommended patches gcc 2.7.2 >Description: BAD TRAP: cpu=0 type=9 rp=f0aacd54 addr=20 mmu_fsr=326 rw=1 MMU sfsr=326: Invalid Address on supv data fetch at level 3 regs at f0aacd54: psr=404010c5 pc=f001d37c npc=f001d380 y: 0 g1: f001d370 g2: 4b000 g3: ffffff00 g4: 13390000 g5: f0aad000 g6: 0 g7: 0 o0: ffbfffff o1: 88001 o2: f0aacfe0 o3: 0 o4: f0aacfXD f0112d68 pid 20111, `httpd': Data access exception kernel read fault at addr=0x20, pme=0x0 MMU sfsr=326: Invalid Address on supv data fetch at level 3 rp=0xf0aacd54, pc=0xf001d37c, sp=0xf0aacda0, psr=0x404010c5, context=0x6f3 g1-g7: f001d370, 4b000, ffffff00, 13390000, f0aad000, 0, 0 Begin traceback... sp = f0aacda0 Called from f005796c, fp=f0aace00, args=1 ff64340c 6 1 f0aaceb0 fd00a840 Called from f005a6cc, fp=f0aace60, args=ff64340c 6 1 ff64c180 0 ff64c180 Called from f0112d68, fp=f0aacec0, args=f0aacfe0 348 f017ea50 f0826ffc ff64c180 f0aacfe0 Called from f0005cd0, fp=f0aacf58, args=f0aad000 f0aacfb4 f0aacfe0 f0aad000 f0aa d000 f0aacfb4 Called from 505c, fp=effffc08, args=6 6 1 effffc6c 4 f0aacfb4 End traceback... panic on cpu 0: Data access exception syncing file systems... done >How-To-Repeat: i can't repeat it. this has happened about every 48 hours since i upgraded from 1.2b7 -- and i've gone back to using 1.2b7 as this is a production machine. >Fix: nope, but i'm happy to help in any way i can i.e. if you can tell me how to get a meaningful core file i'll be happy to do that. %0 >Audit-Trail: State-Changed-From-To: open-analyzed State-Changed-By: coar@decus.org State-Changed-When: Wed Apr 23 08:51:02 PDT 1997 State-Changed-Why: [Out-of-band discussions ongoing between submittor, Dean, and Randy.] From: coar@decus.org (Rodent of Unusual Size) To: apbugs@apache.org, Coar@decus.org Subject: Re: os-sunos/454: Date: Tue, 27 May 1997 12:47:07 -0400 [Message from customer entered as PR#625; text moved here since it's about this PR]. ------------------------------------------------------------------------ (hope this goes to the right list. this is regarding PR#454 which i filed a few months ago.) i upgraded to Apache 1.2b10 and removed one of the two processors on the machine. it crashed within 24 hours. if there is anything i can do to help debug the situation, please let me know. this problem has been consistent since 1.2b8 -- 1.2b7 seemed somewhat more stable but still had the problem. BAD TRAP: cpu=0 type=9 rp=f0bbdd54 addr=20 mmu_fsr=326 rw=1 MMU sfsr=326: Invalid Address on supv data fetch at level 3 regs at f0bbdd54: psr=404000c7 pc=f001d37c npc=f001d380 y: 0 g1: f001d370 g2: 8000000 g3: ffffff00 g4: 0 g5: f0bbe000 g6: 0 g7: 0 o0: 2 o1: f0bbde04 o2: 47000 o3: 44000 o4: 1000 o5: fb097738 sp: f0bbdda0 ra: 1000 pid 27029, `httpd': Data access exception kernel read fault at addr=0x20, pme=0x0 MMU sfsr=326: Invalid Address on supv data fetch at level 3 rp=0xf0bbdd54, pc=0xf001d37c, sp=0xf0bbdda0, psr=0x404000c7, context=0x270 g1-g7: f001d370, 8000000, ffffff00, 0, f0bbe000, 0, 0 Begin traceback... sp = f0bbdda0 Called from f005796c, fp=f0bbde00, args=1 ff65f70c 6 1 f0bbdeb0 200 Called from f005a6cc, fp=f0bbde60, args=ff65f70c 6 1 ff651a00 0 ff651a00 Called from f0112d68, fp=f0bbdec0, args=f0bbdfe0 348 f017ea50 f080bb18 ff651a00 f0bbdfe0 Called from f0005cd0, fp=f0bbdf58, args=f0bbe000 f0bbdfb4 f0bbdfe0 f0bbe000 f0bbe000 f0bbdfb4 Called from 50cc, fp=effffad8, args=6 6 1 effffb3c 4 f0bbdfb4 End traceback... panic on cpu 0: Data access exception syncing file systems... done 02560 low-memory static kernel pages 02572 additional static and sysmap kernel pages 00000 dynamic kernel data pages 01096 additional user structure pages 00000 segmap kernel pages 00000 segvn kernel pages 00424 current user process pages 01200 user stack pages 07852 total pages (1963 chunks) dumping to vp fb0039ec, offset 248760 -- Bob Ramstad | 222 Third Street, Suite 174 Banta NFIC | Cambridge, MA 02142 http://www.nfic.com/ | voice (617) 497-6811 x112 info@nfic.com | FAX (617) 441-9265 From: "Roy T. Fielding" To: Bob Ramstad Subject: Re: os-sunos/454 Date: Tue, 27 May 1997 15:27:04 -0700 Apache does not directly cause kernel panics -- they are due to either a hardware failure (e.g., bad SIMM) or a bug in the kernel which is triggered by something Apache is doing. In this case, you are probably running out of TCP mbufs, which occurs when KeepAlive is enabled and connections get stuck in FIN_WAIT_2 state. More info on that is at http://www.apache.org/docs/misc/fin_wait_2.html Please try setting KeepAlive Off in httpd.conf and let us know if that helps. Configure should already be setting -DNO_LINGCLOSE during the complile. ....Roy From: Bob Ramstad To: fielding@kiwi.ICS.UCI.EDU Subject: Re: os-sunos/454 Date: Thu, 29 May 1997 11:38:41 -0400 (EDT) in httpd.conf and let us know if that helps. Configure should already be setting -DNO_LINGCLOSE during the complile. Configure is NOT setting -DNO_LINGCLOSE during the compile. i will set this option manually and see what happens. -- Bob From: Bob Ramstad To: fielding@kiwi.ICS.UCI.EDU Subject: Re: os-sunos/454 Date: Thu, 29 May 1997 11:44:04 -0400 (EDT) err, yeah, not any coffee this morning yet. conf.h sets NO_LINGCLOSE if SUNOS4 is defined. i'll try turning keep alives off. -- Bob From: Marc Slemko To: Bob Ramstad Subject: Re: os-sunos/454: Date: Tue, 27 May 1997 23:54:06 -0600 (MDT) > ------------------------------------------------------------------------ > (hope this goes to the right list. this is regarding PR#454 which i > filed a few months ago.) > > i upgraded to Apache 1.2b10 and removed one of the two processors on > the machine. it crashed within 24 hours. > > if there is anything i can do to help debug the situation, please let > me know. this problem has been consistent since 1.2b8 -- 1.2b7 seemed > somewhat more stable but still had the problem. > Are you using compiler optimization (eg. -O or -O2 when compiling)? Apache defaults to doing so. Try disabling optimization; to do this, uncomment the "OPTIM=" line in the Configuration file and leave it blank. Verify that it is properly compiling without optimization by being sure there is no -O or -O2 flag on the gcc command lines when you make it. State-Changed-From-To: analyzed-closed State-Changed-By: dgaudet State-Changed-When: Sun Jun 29 18:34:38 PDT 1997 State-Changed-Why: We're now recommending that SunOS4 users use "KeepAlive off" in their httpd.conf. Without that setting it is known that the kernel will panic eventually when it runs out of mbufs. Please also upgrade to 1.2 release. Thanks Dean >Unformatted: