Date: Mon, 7 Nov 1994 15:20:02 +0100
Message-Id: <22338.784216346@snipe>
From: "N.G.Smith" <N.G.Smith@ukc.ac.uk>
To: Multiple recipients of list <www-proxy@www0.cern.ch>
Subject: Chaining proxies, was Re: Handbook On Running A WWW Service
## Please note:
##
## I have posted this to both www-proxy@info.cern.ch and
## www-managers@list.Stanford.EDU as the discussion started in the latter,
## but is, I believe, more appropriate to the former. I would like to see
## the discussion continue on www-proxy@info.cern.ch.
>As I understand it you can set up the current CERN proxy to chain if it
>doesn't find a file in its cache using the *_proxy environment variables.
Unless there is something new, this is merely the firewall support, no?
The more useful capability would be for the *_proxy variables to be
chosen dynamically, based on the domain in the URL that is being
requested.
>2) Version controls. I, for example, would be worried about linking to a site
>that used versions of cache/proxy service that I didn't know about. The
>problems with the current CERN cache are a good example of this.
Yes, this would have to be a supported part of the mainstream CERN server.
>3) Is there really a need to take the abstraction one level higher than
>the local cache? I would say that at the moment there isn't. We are
>getting a hit rate of around 30% on our cache. Now that may sound low,
>but considering it's a 250MB cache serving around 8-10 requests/second(!!!)
>I would say that is very good. I don't think that there is a real need
>to start chaining caches just yet, but there will be soon, I think.
I see no reason why we should hang back just because we feel that the
current delays are short enough. For me, they are not and I work from a
1.6GB cache with greater than 50% hit rate. I still feel the bandwidth
bite when I access anything other than the most popular pages.
However, when it comes to chaining caches I think that we need to consider
the client cache chain and the server cache chain as different entities.
When I say client, I imagine a user at a UK university wishing to
retrieve a popular document from the US, for example. Their university
may have a small cache which would obviously be searched first. If the
document is not found there, then I see no reason why their server
shouldn't then contact a national cache like HENSA Unix. In turn HENSA
Unix retrieves the document from the States if necessary.
I don't believe that the reverse is true. A user in the States wishing
to retrieve a document from the UK university should not go through
HENSA Unix. The exception to this might be if the university had a
particularly slow link, or didn't want the load. In this case, they
could have a scheme whereby they only accept connections from a cache
and redirect all other connections to use a cache.
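To make the client-side chain concrete (local cache first, then a national cache, then the origin), here is a minimal sketch in Python. It is only an illustration of the lookup order described above, not how the CERN server or Lagoon works; the `Cache` class, its `get` method and the `fetch_origin` callback are all hypothetical names.

```python
class Cache:
    """A toy cache that asks its parent (or the origin) on a miss."""

    def __init__(self, name, parent=None):
        self.name = name        # label used to report who served the document
        self.parent = parent    # next cache in the chain, or None for the top
        self.store = {}         # url -> document body

    def get(self, url, fetch_origin):
        """Return (body, source), where source names who supplied the body."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            # Miss: ask the next cache up the chain (e.g. a national cache).
            body, source = self.parent.get(url, fetch_origin)
        else:
            # Top of the chain: go to the origin server.
            body, source = fetch_origin(url), "origin"
        self.store[url] = body  # keep a copy for the next request
        return body, source
```

A university cache would be built as `Cache("local", parent=Cache("national"))`. The first request for a document goes all the way to the origin; a second university sharing the same national cache is then served from that cache rather than from across the Atlantic.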
Reinier Post implemented the code to provide dynamic choice of
secondary caches in Lagoon. The necessary routines probably already
exist in the CERN server; it's a matter of making the *_proxy variables
take a pattern-matching expression.
For example:
# Proxy config for a small UK university, small.ac.uk
#
# These lines define which cache to visit if we don't have a document
#
# Anything from within the UK is likely to be V.Fast anyway
# so don't bother HENSA
#
ProxyChain http://*.uk/* Direct
#
# Anything outside the UK is likely to be V.Slow so we might as
# well go and see if HENSA have it first.
#
ProxyChain * http://www.hensa.ac.uk/
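As a rough sketch of how a server might evaluate rules like these, the Python below walks the patterns in order and takes the first match. Note that `ProxyChain` is only a proposed directive, and the rule table and the `choose_next_hop` function here are hypothetical; shell-style wildcard matching via the standard `fnmatch` module is my assumption about what "pattern matching" would mean.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table mirroring the two ProxyChain lines above:
# (URL pattern, next hop), with None standing for "Direct".
RULES = [
    ("http://*.uk/*", None),
    ("*", "http://www.hensa.ac.uk/"),
]


def choose_next_hop(url, rules=RULES):
    """Return the parent cache to ask for url, or None to fetch it directly."""
    for pattern, next_hop in rules:
        # First matching rule wins, so order the specific rules first.
        if fnmatchcase(url, pattern):
            return next_hop
    return None  # no rule matched: fall back to a direct fetch
```

With these rules `choose_next_hop("http://www.ukc.ac.uk/index.html")` gives `None` (go direct), while a CERN or US URL gives the HENSA address. One caveat: `*` in `fnmatch` also matches `/`, so a real implementation would want to match on the host part of the URL alone.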
Obviously this hand crafting of configuration files is not a good
thing. However, I believe that with this kind of capability already
built into the servers, an autonomous URN type scheme would be easier
to implement. It would simply be a database of the relationships
between caches. (Perhaps `simply' wasn't the right word :-)
>I am sure that there are more issues I can raise. Are they relevant here
>or should these perhaps be raised in a splinter group :) I feel here is
>as good a place as any.
>
>Paul
I believe that this is the way to go. There is a definite, but
technically simple, step going from the current CERN server to one that
supports dynamic choice of secondary proxy/cache. Once this is in
place, providing a URN type service should be significantly simpler
than making the whole URL->URN transition in one go.
Neil G. Smith,
HENSA/Unix Administrator,
The University of Kent at Canterbury.