Date: Thu, 6 Oct 1994 09:53:15 +0100
Message-Id: <ECS9410060953B@brunel.ac.uk>
From: Paul Wain <Paul.Wain@brunel.ac.uk>
To: Multiple recipients of list <www-proxy@www0.cern.ch>
Subject: Re: Behavioural problem of cache/proxy (latest version)
Sorry if I've hacked Henrik's original around a bit. I've tried to preserve
what he said on URL escaping without destroying the flow....
On Thu, 6 Oct 1994 02:07:30 +0100 Henrik Frystyk wrote:
> as stated. I will later explain why this is essential for understanding
> the problem. The '=' _is_ a reserved character in the path according to
> the URL specifications in RFC1630. However, it is _not_ illegal to
> have a '=' sign
> after a ';' in the URL. The ';' indicates a set of parameters and both
> WAIS and FTP URLs use it as a delimiter between the path of the URL and
> a set of parameters.
>
> Another thing is that there is nothing wrong in escaping parts of the
> URL path which normally can be sent unescaped, so the URL generated by
> the proxy as shown above is just as good as the unescaped one. However,
> many clients (and scripts) are not aware of this :-(
Actually this is wrong. Say I had a URL that was of the form:
Since the form correctly escaped "x=2y" to "x%3Dy". As far as I can see then,
the following URL is not equivalent:
It should be:
i.e. since %'s are also reserved chars, they should be escaped too :)
Okay, now the real problem was not that it was escaping the ='s after all.
There is a nice workaround for that if you work hard. No, it was that it was
escaping the '&'s in a form output. I haven't checked the RFC on this one, but
if you had the URL:
http://host/?exp1=2x%3Dy&exp2=y%3D2
this ends up as:
http://host/?exp1%3D2x%3Dy%26exp2%3Dy%3D2
Now from the 1st example you can see that the delimiter between fields is an &,
and also that each field can contain an escaped sequence. Thus, the 2 are most
definitely not equivalent. Unescape it and you end up with:
exp1=2x=y&exp2=y=2
When what you are looking for is:
exp1=2x%3Dy&exp2=y%3D2
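
A minimal Python sketch of the round trip described above. It assumes the
proxy unescapes the whole query and then escapes every character outside the
unreserved set; the function name unescape_then_escape is hypothetical and is
not taken from the proxy source.

```python
from urllib.parse import quote, unquote, parse_qsl

def unescape_then_escape(query):
    # Assumed proxy behaviour: decode every %XX, then encode everything
    # that is not unreserved, which includes the '=' and '&' delimiters.
    return quote(unquote(query), safe="")

original = "exp1=2x%3Dy&exp2=y%3D2"
mangled = unescape_then_escape(original)

print(mangled)              # exp1%3D2x%3Dy%26exp2%3Dy%3D2
print(unquote(mangled))     # exp1=2x=y&exp2=y=2
print(parse_qsl(original))  # [('exp1', '2x=y'), ('exp2', 'y=2')]
print(parse_qsl(mangled))   # []  (no literal '&' or '=' left to split on)
```

A script that splits on literal '&' and '=' finds two fields in the original
query and none at all in the rewritten one.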
Another example. Suppose that I'm passing logical expressions around, and I
want to pass y=x&z and z=a&b. Quite correctly I have the output as:
http://host/?expr1=y%3Dx%26z&expr2=z%3Da%26b
The cache/proxy spits out:
http://host/?expr1%3Dy%3Dx%26z%26expr2%3Dz%3Da%26b
Decode that back to what it should be without knowing what it was in the 1st
place! (Assume that you are also allowed expressions of the form x=y=z). YOU
CAN'T DO IT!
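
To make the ambiguity concrete, here is a small Python sketch under the same
assumption as before (unescape everything, then escape everything). The second
query, "other", is a made-up example that means something different from the
intended one, yet both come out of the rewrite identical.

```python
from urllib.parse import quote, unquote, parse_qsl

def unescape_then_escape(query):
    # Assumed proxy behaviour, not the actual proxy code.
    return quote(unquote(query), safe="")

# Two fields whose values contain '=' and '&'.
intended = "expr1=y%3Dx%26z&expr2=z%3Da%26b"
# Hypothetical different query: four fields, two of them without values.
other = "expr1=y%3Dx&z&expr2=z%3Da&b"

print(parse_qsl(intended, keep_blank_values=True))
# [('expr1', 'y=x&z'), ('expr2', 'z=a&b')]
print(parse_qsl(other, keep_blank_values=True))
# [('expr1', 'y=x'), ('z', ''), ('expr2', 'z=a'), ('b', '')]

# Both collapse to the same string, so the rewrite cannot be inverted.
print(unescape_then_escape(intended))  # expr1%3Dy%3Dx%26z%26expr2%3Dz%3Da%26b
print(unescape_then_escape(other))     # expr1%3Dy%3Dx%26z%26expr2%3Dz%3Da%26b
```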
So there are 2 fixes:
1) If you are going to insist on re-escaping everything, just re-escape at that
point (i.e. without unescaping first). Or as Ari says:
2) Don't touch it. Ari gave one good reason; you don't know what might happen
in 6 weeks/months/years time. Also, as is often said on the www-talk list: if
you can possibly help it, don't break old implementations! I really favour
this way.
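
Both fixes can be sketched in a few lines of Python. The names escape_only and
pass_through are hypothetical; the point is only that escaping without
unescaping first also escapes the '%' signs, so a single unescape recovers the
original exactly.

```python
from urllib.parse import quote, unquote

def escape_only(query):
    # Fix 1: escape the string as it stands, so '%' itself becomes '%25'.
    return quote(query, safe="")

def pass_through(query):
    # Fix 2: leave the query exactly as the client sent it.
    return query

original = "expr1=y%3Dx%26z&expr2=z%3Da%26b"
wrapped = escape_only(original)

print(wrapped)            # expr1%3Dy%253Dx%2526z%26expr2%3Dz%253Da%2526b
print(unquote(wrapped))   # expr1=y%3Dx%26z&expr2=z%3Da%26b
print(pass_through(original) == original)  # True
```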
> The whole reason for changing the behavior of the proxy in this release
> is that it now uses a canonicalized URL when accessing the server
> cache and the host name cache,
Hostnames can be done that way. BUT why touch the URI? It has nothing to do
with hostnames right?
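
As a rough illustration of canonicalizing the hostname while leaving the rest
of the URI alone, here is a Python sketch. It assumes that "canonicalized"
means nothing more than lowercasing the host, which may not be what the proxy
actually does; canonicalize_host_only is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize_host_only(url):
    # Lowercase the network location and leave path and query untouched.
    # Note: this simple version would also lowercase any user:password@
    # prefix, which a real implementation should leave alone.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=parts.netloc.lower()))

url = "http://HOST/?exp1=2x%3Dy&exp2=y%3D2"
print(canonicalize_host_only(url))  # http://host/?exp1=2x%3Dy&exp2=y%3D2
```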
Paul
.--------Paul Wain ( X.500 Project Engineer and WWW Person at Brunel)---------.
| Brunel WWW Support: www@brunel.ac.uk MPhil Email: Paul.Wain@brunel.ac.uk |
| Work Email (default): Paul.Wain@brunel.ac.uk (Brunel internal extn: 2391) |
| http://http2.brunel.ac.uk:8080/paul or http://http2.brunel.ac.uk/~eepgpsw |
`-------------------So much to fit in, and so little space!-------------------'