Received: (qmail 50232 invoked by uid 501); 11 Mar 2002 07:53:55 -0000 Message-Id: <20020311075355.50231.qmail@apache.org> Date: 11 Mar 2002 07:53:55 -0000 From: Gyula "Mészáros" Reply-To: mgy@borland.hu To: submit@bugz.apache.org Subject: declarations do not accept accented characters. It works in 1.3.23. X-Send-Pr-Version: 3.110 >Number: 10125 >Category: general >Synopsis: Need to strip first three bytes of microsoftish utf-8 misencoding for .conf/.htaccess >Confidential: no >Severity: non-critical >Priority: medium >Responsible: apache >State: closed >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: apache >Arrival-Date: Mon Mar 11 00:00:00 PST 2002 >Closed-Date: Thu Apr 04 20:24:18 PST 2002 >Last-Modified: Thu Apr 04 20:24:18 PST 2002 >Originator: mgy@borland.hu >Release: 2.0.32 >Organization: >Environment: Win2000 OEM ENU (English US) SP2 >Description: I tried to biuld a ScriptAlias for a directory which contained an accented "E" ("É") in the middle. The ScriptAlias declaration itself was accepted but the corresponding declaration caused Apache service not to start anymore (ApacheMonitor has died when I tried to use it to restart the service after the configuration change.) There was no entry in the error log. When the character in question was replaced with its accentless counterpart the things run smoothly. I remembered it worked earlier with 1.3.23 on the same PC. I could try it with 1.3.22 on WinNT 4 OEM ENU SP6.0a and it worked, too. So gimme back my accented letters, please! :-) >How-To-Repeat: (I think it should be straightforward.) >Fix: (Unfortunately not.) >Release-Note: >Audit-Trail: From: =?ISO-8859-1?Q?M=E9sz=E1ros?= Gyula To: "William A. Rowe, Jr." Cc: apbugs@Apache.Org Subject: Re: general/10125: declarations do not accept accented characters. It works in 1.3.23. Date: Tue, 12 Mar 2002 07:50:14 +0100 Hi Bill, I made some tests and really, if I save httpd.conf from Notepad in UTF-8 format and then delete the first 3 bytes (the UTF-8 header) with another editor, then Apache service starts and uses the directories containing accented characters normally. So I can live with it for now and wait for the next version. Thanks a lot for your efforts! Best regards: Gyula Mészáros Borland Hungary William A. Rowe, Jr. wrote: > Then Apache has a bug [well, sort of] because Microsoft likes to add the > FEFF signature, > utf-8 encoded, as the first three bytes. > > I will look at accepting utf-8 files with those three leading > characters, possibly look > at utf-8 decoding a real Unicode file, and possibly at decoding 'local > code pages'. > But for the most part, Apache 2.0 was redesigned [on Windows] to deal > with all > files, commands, etc by using utf-8, which is a text format [e.g. isn't > wide chars], > that allows us to access any file on a FAT-32 or NTFS volume. > > Look for at least that first feature to be fixed by 2.0.34. Would .conf > files in > utf-8 solve your entire problem? > > Yours, > > Bill > > At 08:14 AM 3/11/2002, you wrote: > >> Sorry (sometimes I tend to think I understand something about >> computers, but then there are moments...). >> >> The result: if I save the httpd.conf in UTF-8 from Notepad of Windows >> 2000 then Apache service does not start even if there is no accented >> character in the .conf file at all. (The one saved in ANSI format >> still lets Apache start if the accented character appears only in the >> ScriptAlias line.) >> >> Gyula >> >> >> William A. Rowe, Jr. wrote: >> >>> Gyula, >>> I didn't ask you to send the file, I asked if you would open it in >>> notepad, >>> and choose File - Save As - Encoding: UTF-8. >>> Apache doesn't understand 'character sets' or 'code pages', but the >>> APR >>> library now handles -every- filename as UTF-8. That means requests to >>> every possible filename work, and you should be able to use any text as >>> a directory name, or log file name, and so on. >>> Save the conf file as UTF-8 and let me know if that solves your >>> problem. >>> Bill >> >> >> > > State-Changed-From-To: open-analyzed State-Changed-By: wrowe State-Changed-When: Mon Mar 25 23:55:57 PST 2002 State-Changed-Why: Thanks for the detailed experiments ... pending for the patch resolution to .34. Synopsis-Changed-From: declarations do not accept accented characters. It works in 1.3.23. Synopsis-Changed-To: Need to strip first three bytes of microsoftish utf-8 misencoding for .conf/.htaccess Synopsis-Changed-By: wrowe Synopsis-Changed-When: Mon Mar 25 23:55:57 PST 2002 State-Changed-From-To: analyzed-closed State-Changed-By: wrowe State-Changed-When: Thu Apr 4 20:24:18 PST 2002 State-Changed-Why: This will be fixed in 2.0.34. The new patch simply eats those first three bytes of EF BB BF whenever a config file is opened [that will -include- any .htaccess or Include foo.conf directives.] So we are now at 80% on utf-8 i18n. Too bad it's always the last 20% that eat up 80% of the effort :) Thanks for your reports and thorough experiments! >Unformatted: [In order for any reply to be added to the PR database, you need] [to include in the Cc line and make sure the] [subject line starts with the report component and number, with ] [or without any 'Re:' prefixes (such as "general/1098:" or ] ["Re: general/1098:"). If the subject doesn't match this ] [pattern, your message will be misfiled and ignored. The ] ["apbugs" address is not added to the Cc line of messages from ] [the database automatically because of the potential for mail ] [loops. If you do not include this Cc, your reply may be ig- ] [nored unless you are responding to an explicit request from a ] [developer. Reply only with text; DO NOT SEND ATTACHMENTS! ]