Re: Behavioural problem of cache/proxy (latest version)

Henrik Frystyk (frystyk@bay.lcs.mit.edu)
Thu, 6 Oct 1994 02:06:06 +0100

Date: Thu, 6 Oct 1994 02:06:06 +0100
Message-Id: <9410052205.AA00859@bay.lcs.mit.edu>
From: frystyk@bay.lcs.mit.edu (Henrik Frystyk)
To: Multiple recipients of list <www-proxy@www0.cern.ch>
Subject: Re: Behavioural problem of cache/proxy (latest version)

Hi

First let me correct a small mistake: The result of the following URL

http://echo.brunel.ac.uk:4040/path=<a_ufn_to_dn_path>?ufn=<ufn>

is, once it has passed through the proxy,

http://echo.brunel.ac.uk:4040/path%3D<a_ufn_to_dn_path>?ufn=<ufn>

and _not_

http://..../path%3D......?ufn%3D....

as stated. I will later explain why this is essential for understanding
the problem. The '=' _is_ a reserved character in the path according to
the URL specifications in RFC1630 - please take a look at

http://info.cern.ch/hypertext/WWW/Addressing/Addressing.html

for more information. However, it is _not_ illegal to have a '=' sign
after a ';' in the URL. The ';' indicates a set of parameters and both
WAIS and FTP URLs use it as a delimiter between the path of the URL and
a set of parameters. For FTP it is

;type=<DATA_TYPE>

and for WAIS it is the fields necessary for accessing a document from a
WAIS server. The server doesn't generally recognize the ';' as a
delimiter, as this is a relatively new feature, but I will fix this in
the next release (for now it only works for WAIS).
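
To make the ';' convention concrete, here is a minimal sketch in Python
of splitting a URL path from its parameter set. The function name
split_params and the sample path are hypothetical and only for
illustration; this is not how the server itself does it.

```python
def split_params(url_path):
    """Split 'path;key=value;...' at the first ';' (illustrative helper)."""
    path, _, raw = url_path.partition(";")
    params = {}
    for item in raw.split(";"):
        if not item:
            continue  # no parameters, or an empty segment between ';'
        key, _, value = item.partition("=")
        params[key] = value
    return path, params


# Hypothetical FTP-style path carrying a ';type=' parameter
print(split_params("/pub/README;type=a"))
```

The point is only that a '=' appearing after the ';' belongs to the
parameters and is not part of the path proper.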

Another thing is that there is nothing wrong in escaping parts of the
URL path which normally can be sent unescaped, so the URL generated by
the proxy as shown above is just as good as the unescaped one. However,
many clients (and scripts) are not aware of this :-(

The search string is _not_ specified in RFC1630 and there are
therefore no escaped characters in this part of the URL. This is also
what is shown above.
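
As a rough illustration of the behaviour described above (the path is
escaped, the search string is passed on untouched), here is a small
Python sketch. It uses urllib.parse.quote as a stand-in for the proxy's
own escaping; that function follows later URL specifications rather
than RFC1630, and escape_path_only and the sample values are
hypothetical.

```python
from urllib.parse import quote, unquote


def escape_path_only(url):
    """Escape the path of a URL but leave the search string as it came in."""
    scheme, sep, rest = url.partition("://")
    host, slash, path_and_search = rest.partition("/")
    path, qmark, search = path_and_search.partition("?")
    # Keep '/' as the path separator; '=' in the path becomes %3D
    escaped = quote(path, safe="/")
    return scheme + sep + host + slash + escaped + qmark + search


# Hypothetical values in place of <a_ufn_to_dn_path> and <ufn>
url = "http://echo.brunel.ac.uk:4040/path=some/dn?ufn=smith&o=brunel"
print(escape_path_only(url))

# The escaped and unescaped path decode to the same string
print(unquote("path%3Dsome/dn") == "path=some/dn")
```

A client or script that unescapes the path before looking at it sees
the same thing either way; the '=' and '&' in the search string are
never touched.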

Hence, the current version of the proxy _does_ behave correctly
according to the URL specs in this example even though there _is_ a bug
in the handling of ';'. However, I think we have to specify exactly
when URLs must be escaped/unescaped in the configuration file and the
scripts.

The whole reason for changing the behavior of the proxy in this release
is that it now uses a canonicalized URL when accessing the server
cache and the host name cache. This means that the URLs

http://INFO.CeRn.CH = http://info.cern.ch:80 =
http://info.cern.ch./ = http://info/ = http://info.cern.ch

(The `http://info' is only valid inside the cern.ch `domain') all refer
to the SAME entry in the cache. See more information on treatment of
URLs in

http://info.cern.ch/hypertext/WWW/Library/User/Paper/LibraryPaper.html
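
To show what such a canonicalization amounts to for the five forms
listed above, here is a minimal Python sketch. It is not the library's
actual code: the function name canonical_key, the local_domain default
and the choice of output form are all assumptions made for the
example.

```python
from urllib.parse import urlsplit


def canonical_key(url, local_domain="cern.ch", default_port=80):
    """Reduce equivalent URL spellings to one cache key (illustrative)."""
    parts = urlsplit(url)
    # .hostname is already lower-cased; drop a trailing root dot
    host = (parts.hostname or "").rstrip(".")
    if host and "." not in host:
        # Unqualified name: assume it lives in the local domain
        host = host + "." + local_domain
    port = parts.port or default_port
    path = parts.path or "/"
    return "%s://%s:%d%s" % (parts.scheme, host, port, path)


urls = [
    "http://INFO.CeRn.CH",
    "http://info.cern.ch:80",
    "http://info.cern.ch./",
    "http://info/",
    "http://info.cern.ch",
]
# All five spellings collapse to a single key
print({canonical_key(u) for u in urls})
```

With the assumed cern.ch default, every one of the five URLs maps to
the same key, which is what lets them share one cache entry.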

-- cheers --

Henrik Frystyk
frystyk@dxcern.cern.ch
+ 41 22 767 8265
World-Wide Web Project,
CERN, CH-1211 Geneva 23,
Switzerland

> On Wed, 5 Oct 1994 03:15:53 -0700 (PDT) Ari Luotonen wrote:
> > >
> > > However by the time it has passed through the cache/proxy it ends up as:
> > >
> > > http://..../path%3D......?ufn%3D....
> > >
> > You are absolutely right about this, and I *did* fix it, and I'm 100%
> > sure about it. I just checked my last version 3.0pre6, and it worked
> > fine, and then the newest one 3.0, and noticed that it was indeed
> > broken again... :-( So with pre6 it works as you describe and as it
> > should. I suppose when they've cleaned up the libwww they've by
> > mistake removed the special case that made sure that the URI was
> > passed on exactly as it came in.
>
> Thanks for the reply Ari, I was sure I wasn't ghosting memories :)
>
> Okay so it ended up even worse as it was also escaping the & delimiters between
> fields so the result is unusable :(
>
> I have just backed our cache server back down to pre5 (I never had pre6), until
> this one is fixed; but in the mean time, I think that people will find that a
> lot of form handling is back to being broken again :( :( :(