Re: Behavioural problem of cache/proxy (latest version)

Paul Wain (Paul.Wain@brunel.ac.uk)
Thu, 6 Oct 1994 09:53:15 +0100

Date: Thu, 6 Oct 1994 09:53:15 +0100
Message-Id: <ECS9410060953B@brunel.ac.uk>
From: Paul Wain <Paul.Wain@brunel.ac.uk>
To: Multiple recipients of list <www-proxy@www0.cern.ch>
Subject: Re: Behavioural problem of cache/proxy (latest version)

Sorry if I've hacked Henrik's original around a bit. I've tried to preserve what
he said on URL escaping without destroying the flow...

On Thu, 6 Oct 1994 02:07:30 +0100 Henrik Frystyk wrote:
> as stated. I will later explain why this is essential for understanding
> the problem. The '=' _is_ a reserved character in the path according to
> the URL specifications in RFC1630. However, it is _not_ illegal to
> have a '=' sign after a ';' in the URL. The ';' indicates a set of
> parameters and both
> WAIS and FTP URLs use it as a delimiter between the path of the URL and
> a set of parameters.
>
> Another thing is that there is nothing wrong in escaping parts of the
> URL path which normally can be sent unescaped, so the URL generated by
> the proxy as shown above is just as good as the unescaped one. However,
> many clients (and scripts) are not aware of this :-(

Actually this is wrong. Say I had a URL that was of the form:

http://host/path=x%3D2y/

where the form has correctly escaped "x=2y" to "x%3D2y". As far as I can see,
the following URL is then not equivalent:

http://host/path%3Dx%3D2y/

It should be:

http://host/path%3Dx%253D2y/

i.e. since %'s are also reserved chars, they should be escaped too :)
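
A minimal sketch of the point, using Python's standard urllib.parse (my choice
for illustration, not anything the proxy itself uses). It assumes the path
segment from the example above and compares escaping only the '=' with escaping
every reserved character, '%' included:

```python
from urllib.parse import quote, unquote

original = "path=x%3D2y"   # path segment exactly as the form emitted it

# Escaping only '=' and leaving the existing '%' alone loses information.
partial = original.replace("=", "%3D")   # "path%3Dx%3D2y"

# Escaping every reserved character, '%' included, stays reversible.
full = quote(original, safe="")          # "path%3Dx%253D2y"

print(unquote(partial))   # path=x=2y    (not what was sent)
print(unquote(full))      # path=x%3D2y  (the original, intact)
```

Only the second form unescapes back to what the form actually sent.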

Okay, now the real problem was not that it was escaping the ='s after all. There
is a nice workaround for that if you work hard. No, it was that it was escaping
the '&'s in form output. I haven't checked the RFC on this one, but if you had
the URL:

http://host/?exp1=2x%3Dy&exp2=y%3D2

this ends up as:

http://host/?exp1%3D2x%3Dy%26exp2%3Dy%3D2

Now from the 1st example you can see that the delimiter between fields is an &,
and that each field can contain an escaped sequence. Thus the 2 are most
definitely not equivalent. Unescape it and you end up with:

exp1=2x=y&exp2=y=2

When what you are looking for is:

exp1=2x%3Dy&exp2=y%3D2
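
To make the difference concrete, here is a rough sketch of how a script might
pull the fields apart. split_fields is a hypothetical helper written for this
example; it splits on '&' and on the first '=' before unescaping each piece,
which is what I assume a form-handling script does:

```python
from urllib.parse import unquote

def split_fields(query):
    """Split on '&' and the first '=' BEFORE unescaping each part."""
    fields = []
    for part in query.split("&"):
        name, _, value = part.partition("=")
        fields.append((unquote(name), unquote(value)))
    return fields

sent = "exp1=2x%3Dy&exp2=y%3D2"             # what the form produced
rewritten = "exp1%3D2x%3Dy%26exp2%3Dy%3D2"  # what the cache/proxy passed on

print(split_fields(sent))       # [('exp1', '2x=y'), ('exp2', 'y=2')]
print(split_fields(rewritten))  # [('exp1=2x=y&exp2=y=2', '')]
```

With the rewritten query the script sees one oddly named field with an empty
value instead of two fields.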

Another example. Suppose that I'm passing logical expressions around, and I want
to pass y=x&z and z=a&b. Quite correctly I have the output as:

http://host/?expr1=y%3Dx%26z&expr2=z%3Da%26b

The cache/proxy spits out:

http://host/?expr1%3Dy%3Dx%26z%26expr2%3Dz%3Da%26b

Decode that back to what it should be without knowing what it was in the 1st
place! (Assume that you are also allowed expressions of the form x=y=z.) YOU
CAN'T DO IT!
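
A small sketch of why it cannot be done. proxy_rewrite is a hypothetical model
of the behaviour described here (unescape everything, then escape everything);
I am assuming that is roughly what the proxy does, not quoting its code. Two
different queries come out identical, so nothing downstream can tell them
apart:

```python
from urllib.parse import quote, unquote

def proxy_rewrite(query):
    """Hypothetical model: unescape the whole query, then re-escape it."""
    return quote(unquote(query), safe="")

# Two fields: expr1 = "y=x&z", expr2 = "z=a&b"
a = "expr1=y%3Dx%26z&expr2=z%3Da%26b"
# Four fields: expr1 = "y=x", "z", expr2 = "z=a", "b"
b = "expr1=y%3Dx&z&expr2=z%3Da&b"

print(proxy_rewrite(a))   # expr1%3Dy%3Dx%26z%26expr2%3Dz%3Da%26b
print(proxy_rewrite(b))   # expr1%3Dy%3Dx%26z%26expr2%3Dz%3Da%26b
print(a != b and proxy_rewrite(a) == proxy_rewrite(b))   # True
```

The mapping is many-to-one, so no decoder can recover which query was sent.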

So there are 2 fixes:

1) If you are going to insist on re-escaping everything, just re-escape at that
point (i.e. without unescaping first). Or as Ari says:

2) Don't touch it. Ari gave one good reason; you don't know what might happen in
6 weeks/months/years time. Also, as is often said on the www-talk list: if you
can possibly help it, don't break old implementations! I really favour this way.

> The whole reason for changing the behavior of the proxy in this release
> is that it now uses a canonicalized URL when accessing the server
> cache and the host name cache,

Hostnames can be done that way. BUT why touch the URI? It has nothing to do
with hostnames, right?
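
Something along these lines is what I have in mind, as a sketch only. cache_key
is a hypothetical function, not the proxy's real one: it lowercases the host
part for the cache lookup and passes the path and query through byte for byte.
(Lowercasing the whole netloc would also touch any user:password part, so treat
that as a simplification.)

```python
from urllib.parse import urlsplit, urlunsplit

def cache_key(url):
    """Canonicalize only the host; leave path and query untouched."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=parts.netloc.lower()))

print(cache_key("http://HOST/?exp1=2x%3Dy&exp2=y%3D2"))
# http://host/?exp1=2x%3Dy&exp2=y%3D2
```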

Paul

.--------Paul Wain ( X.500 Project Engineer and WWW Person at Brunel)---------.
| Brunel WWW Support: www@brunel.ac.uk MPhil Email: Paul.Wain@brunel.ac.uk |
| Work Email (default): Paul.Wain@brunel.ac.uk (Brunel internal extn: 2391) |
| http://http2.brunel.ac.uk:8080/paul or http://http2.brunel.ac.uk/~eepgpsw |
`-------------------So much to fit in, and so little space!-------------------'