Received: (qmail 16371 invoked by uid 2012); 13 Nov 1999 04:59:55 -0000 Message-Id: <19991113045955.16370.qmail@hyperreal.org> Date: 13 Nov 1999 04:59:55 -0000 From: Zoli Kiss Reply-To: zoli1000@yahoo.com To: apbugs@hyperreal.org Subject: When server side includes are enabled paths such as: http://server/index.html/foo/foo/ are accepted. X-Send-Pr-Version: 3.2 >Number: 5300 >Category: mod_include >Synopsis: When server side includes are enabled paths such as: http://server/index.html/foo/foo/ are accepted. >Confidential: no >Severity: critical >Priority: medium >Responsible: apache >State: closed >Class: sw-bug >Submitter-Id: apache >Arrival-Date: Fri Nov 12 21:10:01 PST 1999 >Last-Modified: Sat Nov 27 19:50:01 PST 1999 >Originator: zoli1000@yahoo.com >Organization: >Release: 1.3.6 >Environment: Solaris 2.6 Latest patches >Description: When server side includes are enabled, I can enter additional junk on the end of the URL, and Apache does not complain. This may not really be an error, but somehow while Verity is spidering the site, it gets caught in endless loops trying to insert URLs like: http://www.server.com/index.html/IT/info/ If I change my Option line in access.conf, and remove Includes, then I get a page not found error, as I would expect, from Apache. I thought I got rid of this problem when I disabled MultiViews, but I guess not. I cannot disable SSI since tons of our pages use it, but I can't get Verity to work correctly either. I would be very happy to receive any fixes, work arounds, comments, etc. Please Help Thanks, Zoli >How-To-Repeat: To duplicate this, type in the following URL: http://www.apache.org/index.html/foo/foo/foo/foo/ Then click on one of the relative links, like FAQ or Foundation You will see that it will not make it to these URLs, but it will accept them. If you update the conf file, and remove the Include Option, this same URL will not be allowed. >Fix: I guess, the mod_include code would need to verify that the full/entire path is valid ? >Audit-Trail: State-Changed-From-To: open-closed State-Changed-By: marc State-Changed-When: Fri Nov 12 21:21:47 PST 1999 State-Changed-Why: That is correct, and the behaviour is that way on purpose. There are lots of ways a spider can be too dumb for itself; you can't prevent against stupidity. There are numerous other PRs that explain the reasons for the behaviour in more detail. From: Zoli Kiss To: marc@apache.org Cc: apbugs@Apache.Org Subject: Re: mod_include/5300: When server side includes are enabled paths such as: http://server/index.html/foo/foo/ are accepted. Date: Sun, 14 Nov 1999 22:50:52 -0800 (PST) Hi Marc, I didn't realize that I could do searches against each module in the PR database; I guess it was just too late on a Friday night ... I was trying to find the info doing free text searches. Sorry. I've searched through all of the PRs for mod_include, but did not find any answers/solutions. Personally, I don't like Verity all that much, but this problem is possibly related to user error, i.e. links like: http://server/index.html/ Since the site I administer has 20000+ links, I can't catch everything. I don't think that it's Verity's fault when the web server allows links like: http://server/index.html/foo/junk/junk/ This doesn't work when SSIs are turned off. Shouldn't a ? be used to separate a path from path_info? I can't understand why there is a need for this, the standard server config, i.e. without SSIs doesn't allow this behavior. I don't see how a search engine can combat this, and I don't think that this can be blamed on Verity, or any standard spidering tools. Is there no work around for this ? I didn't see an explanation of why there would be a need for: http://server/index.html/foo/foo/ instead of: http://server/index.html?/foo/foo/ Thanks, Zoli --- marc@apache.org wrote: > > > Synopsis: When server side includes are enabled > paths such as: http://server/index.html/foo/foo/ are > accepted. > > State-Changed-From-To: open-closed > State-Changed-By: marc > State-Changed-When: Fri Nov 12 21:21:47 PST 1999 > State-Changed-Why: > That is correct, and the behaviour is that way on > purpose. > There are lots of ways a spider can be too dumb for > itself; > you can't prevent against stupidity. > > There are numerous other PRs that explain the > reasons for > the behaviour in more detail. > > ===== __________________________________________________ Do You Yahoo!? Bid and sell for free at http://auctions.yahoo.com From: Zoli Kiss To: marc@apache.org, apache-bugdb@apache.org Cc: apbugs@Apache.Org Subject: Re: mod_include/5300: When server side includes are enabled paths such as: http://server/index.html/foo/foo/ are accepted. Date: Mon, 15 Nov 1999 18:40:29 -0800 (PST) I just realized that I incorrectly stated something in my previous note. Just to clarify: I don't understand why html pages would every require path info. If a URL: http://server/index.cgi/foo/foo/ is used, this is fine, but if a URL: http://server/index.html/foo/foo/ is used, why is this ever needed ? Would it make sense to check if the requested file is executable, and then make a decision on whether or not to return a page not found error ? I agree that I don't quite understand the need for this on regular html files, but I could not find any answers in the PR database. Thanks ===== __________________________________________________ Do You Yahoo!? Bid and sell for free at http://auctions.yahoo.com From: Rodent of Unusual Size To: Apache bug database Cc: Subject: Re: mod_include/5300: When server side includes are enabled paths such as: http://server/index.html/foo/foo/ are accepted. Date: Sat, 27 Nov 1999 22:45:52 -0500 Zoli Kiss wrote: > > I don't understand why html pages would every require > path info. If a URL: > http://server/index.cgi/foo/foo/ > is used, this is fine, but if a URL: > http://server/index.html/foo/foo/ > is used, why is this ever needed ? Because "index.html" might be a PHP file, or use server-side includes, or otherwise be more than just a text file. >Unformatted: [In order for any reply to be added to the PR database, you need] [to include in the Cc line and make sure the] [subject line starts with the report component and number, with ] [or without any 'Re:' prefixes (such as "general/1098:" or ] ["Re: general/1098:"). If the subject doesn't match this ] [pattern, your message will be misfiled and ignored. The ] ["apbugs" address is not added to the Cc line of messages from ] [the database automatically because of the potential for mail ] [loops. If you do not include this Cc, your reply may be ig- ] [nored unless you are responding to an explicit request from a ] [developer. Reply only with text; DO NOT SEND ATTACHMENTS! ]