From nobody@hyperreal.com Tue Dec 3 10:45:01 1996
Received: by taz.hyperreal.com (8.8.3/V2.0) id KAA15180; Tue, 3 Dec 1996 10:45:01 -0800 (PST)
Message-Id: <199612031845.KAA15180@taz.hyperreal.com>
Date: Tue, 3 Dec 1996 10:45:01 -0800 (PST)
From: Richard Drage
Reply-To: richard@proweb.net
To: apbugs@hyperreal.com
Subject: Child processes don't expire
X-Send-Pr-Version: 3.2

>Number: 23
>Category: os-bsdi
>Synopsis: Child processes don't expire
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: apache
>State: closed
>Class: sw-bug
>Submitter-Id: apache
>Arrival-Date: Tue Dec 3 10:50:01 1996
>Last-Modified: Thu Feb 27 08:29:38 PST 1997
>Originator: richard@proweb.net
>Organization:
>Release: 1.1.1
>Environment:
BSDI 2.1
Pentium p166+, 64 meg ram, 3x2.1 gig ide h/ds
gcc (flags -m486)
>Description:
Child processes don't expire and hang in the 'W' (Sending Reply) state.
Effect: the hung processes, so far always in pairs, consume all available
processor time and the server stops!
Both appeared to be posting to CGI scripts, both local and remote.
Noticed when running top on the console.

Examples:

Acc        M  Cpu   SS      Conn  Child  Slot  Host
1/19/1118  W  0.20  100542  50.1  0.10   5.31  ppp00.proweb.co.uk     POST
1/21/372   W  0.27  190884  0.9   0.04   0.97  www-proxy.pobox.co.uk  POST
>How-To-Repeat:
Problem occurs after approx 24 hours, during busy periods only.
>Fix:
I currently restart the server every 24 hours and the problem hasn't
occurred since, but it's a pain if I forget.
>Audit-Trail:
State-Changed-From-To: open-closed
State-Changed-By: fielding
State-Changed-When: Wed Dec 4 17:47:58 PST 1996
State-Changed-Why:
I believe we fixed this in 1.2b1. It was caused by a missing
reduction of the "remaining" counter on the loop that ate additional
request input after a CGI dies. This code was entirely replaced
in 1.2b1, with care taken not to repeat the reported bug.
Responsible-Changed-From-To: gnats-admin (GNATS administrator)-fielding
Responsible-Changed-By: fielding
Responsible-Changed-When: Wed Dec 4 17:47:58 PST 1996
Responsible-Changed-Why:
Roy wrote the new code in mod_cgi.c that should prevent this
problem in 1.2b1

State-Changed-From-To: closed-analyzed
State-Changed-By: fielding
State-Changed-When: Mon Dec 9 19:32:48 PST 1996
State-Changed-Why:
Nope, wasn't fixed after all. The problem appears to be related
to BSDI and many processes being left in FIN_WAIT_2 state, which
is what happens when a browser cuts-off the connection prematurely.
You can see this by running 'netstat -f inet -n'.
It is possible that the only way to fix this is to fix the OS so
that it sets a timeout for FIN_WAIT_2, which I believe was discussed
on some newsgroups a while back. Input from the BSDI folks would
be desirable.
Responsible-Changed-From-To: fielding-gnats-admin
Responsible-Changed-By: fielding
Responsible-Changed-When: Mon Dec 9 19:32:48 PST 1996
Responsible-Changed-Why:
Needs BSDI owner

State-Changed-From-To: analyzed-feedback
State-Changed-By: marc
State-Changed-When: Sat Feb 15 20:33:53 PST 1997
State-Changed-Why:
Is this problem with child processes hanging in the sending
reply state still an issue?

State-Changed-From-To: feedback-closed
State-Changed-By: marc
State-Changed-When: Thu Feb 27 08:29:37 PST 1997
State-Changed-Why:
User reports problem gone under 1.2b7.
>Unformatted:
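
The audit trail suggests checking for connections stuck in FIN_WAIT_2 by
running 'netstat -f inet -n'. As a rough illustration only, the sketch
below tallies TCP connection states from that kind of output. It is not
part of Apache or of this report: the function name count_tcp_states and
the SAMPLE text are hypothetical, and the sketch assumes the BSD-style
layout in which the state is the last column of each tcp row.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from BSD-style netstat text.

    Assumes rows of the form:
        Proto Recv-Q Send-Q Local-Address Foreign-Address State
    Rows that are not TCP, or that carry no state column, are skipped.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Made-up sample output for illustration; not taken from the reporter's host.
SAMPLE = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
tcp        0      0  10.0.0.1.80        10.0.0.7.1034      FIN_WAIT_2
tcp        0      0  10.0.0.1.80        10.0.0.8.2210      ESTABLISHED
tcp        0      0  10.0.0.1.80        10.0.0.9.1188      FIN_WAIT_2
udp        0      0  10.0.0.1.53        *.*
"""

if __name__ == "__main__":
    counts = count_tcp_states(SAMPLE)
    print("FIN_WAIT_2 connections:", counts["FIN_WAIT_2"])
```

To try it against a live host, the text could presumably be captured with
subprocess.run(["netstat", "-f", "inet", "-n"], capture_output=True,
text=True).stdout and passed to count_tcp_states. A FIN_WAIT_2 count that
keeps growing over time would be consistent with the diagnosis above, but
the column layout may differ between systems.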