Chaining proxies, was Re: Handbook On Running A WWW Service

N.G.Smith (N.G.Smith@ukc.ac.uk)
Mon, 7 Nov 1994 15:20:02 +0100

Date: Mon, 7 Nov 1994 15:20:02 +0100
Message-Id: <22338.784216346@snipe>
From: "N.G.Smith" <N.G.Smith@ukc.ac.uk>
To: Multiple recipients of list <www-proxy@www0.cern.ch>
Subject: Chaining proxies, was Re: Handbook On Running A WWW Service

## Please note:
##
## I have posted this to both www-proxy@info.cern.ch and
## www-managers@list.Stanford.EDU as the discussion started in the latter,
## but is, I believe, more appropriate to the former. I would like to see
## the discussion continue on www-proxy@info.cern.ch.

>As I understand it, you can set up the current CERN proxy to chain if it
>doesn't find a file in its cache, using the *_proxy environment variables.

Unless there is something new, this is merely the firewall support, no?
The more useful capability would be for the *_proxy variables to be
chosen dynamically, based on the domain in the URL that is being
requested.
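To make the distinction concrete, here is a minimal Python sketch. The static form mirrors the *_proxy convention: one upstream per URL scheme, whatever host is requested. The dynamic form picks the upstream from the domain of the requested URL. The table contents, the firewall.example host and the longest-suffix rule are all hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit

# Static choice, like the *_proxy environment variables: one upstream per
# URL scheme, regardless of host. (firewall.example is a hypothetical host.)
STATIC = {"http": "http://firewall.example:8080/"}

# Dynamic choice (hypothetical table): domain suffix -> upstream proxy,
# where None means "connect directly".
DYNAMIC = {
    ".uk": None,
    "": "http://www.hensa.ac.uk/",  # the empty suffix matches every host
}


def static_proxy(url):
    """Return the upstream for url using only its scheme."""
    return STATIC.get(urlsplit(url).scheme)


def dynamic_proxy(url, table=DYNAMIC):
    """Return the upstream for url, chosen by the longest matching domain suffix."""
    host = urlsplit(url).hostname or ""  # hostname is already lower-cased
    best = None
    for suffix in table:
        if host.endswith(suffix) and (best is None or len(suffix) > len(best)):
            best = suffix
    return table[best] if best is not None else None
```

With this table, a request for http://www.ukc.ac.uk/ goes direct while one for http://info.cern.ch/ is sent to HENSA; the static form sends both to the same firewall proxy.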

>2) Version controls. I, for example, would be worried about linking to a site
>that used versions of cache/proxy service that I didn't know about. The
>problems with the current CERN cache are a good example of this.

Yes, this would have to be a supported part of the mainstream CERN server.

>3) Is there really a need to take the abstraction one level higher than
>the local cache? I would say that at the moment there isn't. We are
>getting a hit rate of around 30% on our cache. Now that may sound low,
>but considering it's a 250Mb cache serving around 8-10 requests/second(!!!)
>I would say that is very good. I don't think that there is a real need
>to start chaining caches just yet, but there will be soon, I think.

I see no reason why we should hang back just because we feel that the
current delays are short enough. For me they are not, and I work from a
1.6Gb cache with a hit rate of over 50%. I still feel the bandwidth
bite when I access anything other than the most popular pages.
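The hit rates quoted above translate directly into upstream load. This small sketch does the arithmetic; the chain calculation treats the two quoted caches as if one sat in front of the other and assumes each hits independently at its quoted rate, which is a simplifying assumption for illustration only.

```python
def upstream_rate(requests_per_sec, hit_rate):
    """Requests per second that miss the cache and must go upstream."""
    return requests_per_sec * (1.0 - hit_rate)


def chain_miss_fraction(hit_rates):
    """Fraction of requests that miss every cache in a chain.

    Assumes each cache's hit rate applies independently to the requests
    that reach it (a hypothetical simplification).
    """
    fraction = 1.0
    for h in hit_rates:
        fraction *= 1.0 - h
    return fraction


# A 30% hit rate at 8-10 requests/second leaves about 5.6-7 requests/second
# going out over the network.
local_misses = (upstream_rate(8, 0.30), upstream_rate(10, 0.30))
# A 30% local cache in front of a 50% national cache would send about 35%
# of requests all the way to the origin server.
to_origin = chain_miss_fraction([0.30, 0.50])
```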

However, when it comes to chaining caches I think that we need to consider
the client cache chain and the server cache chain as different entities.

When I say client, I imagine a user at a UK university wishing to
retrieve a popular document from the US, for example. Their university
may have a small cache which would obviously be searched first. If the
document is not found there, I see no reason why their server
shouldn't then contact a national cache like HENSA Unix. In turn, HENSA
Unix retrieves the document from the States if necessary.
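The client-side chain can be sketched as an ordered list of caches tried in turn, falling back to the origin server. This is a toy model under stated assumptions: dictionaries stand in for the caches, a callable stands in for the HTTP fetch, and filling every cache that missed on the way back is one possible policy, not a described feature of any server.

```python
class Cache:
    """A toy document cache: a name plus a dict from URL to document."""

    def __init__(self, name):
        self.name = name
        self.store = {}


def fetch_through_chain(url, chain, fetch_origin):
    """Try each cache in order; on a full miss, fetch from the origin.

    Returns (document, name of where it was found). Every cache that
    missed is filled on the way back, so the next request stops earlier.
    """
    missed = []
    for cache in chain:
        if url in cache.store:
            doc, source = cache.store[url], cache.name
            break
        missed.append(cache)
    else:  # no cache had it
        doc, source = fetch_origin(url), "origin"
    for cache in missed:
        cache.store[url] = doc
    return doc, source
```

For example, with `chain = [Cache("small.ac.uk"), Cache("HENSA Unix")]`, the first request for a US document comes from the origin; a repeat is served by the university cache, and if that cache has since dropped it, by HENSA Unix.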

I don't believe that the reverse is true. A user in the States wishing
to retrieve a document from a UK university should not go through
HENSA Unix. The exception to this might be if the university had a
particularly slow link, or didn't want the load. In this case, they
could have a scheme whereby they only accept connections from a cache
and redirect all other connections to use a cache.
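A university server following that scheme might apply a policy like the one below. This is a hypothetical sketch: it assumes the client has already been identified by host name (for example via a reverse lookup), and uses the 305 Use Proxy status that HTTP/1.1 later defined for exactly this kind of redirect, and which has since been deprecated.

```python
# Hypothetical: the only cache allowed to fetch from this server directly.
TRUSTED_CACHES = {"www.hensa.ac.uk"}
CACHE_URL = "http://www.hensa.ac.uk/"


def handle_request(client_host):
    """Serve trusted caches directly; send everyone else to the cache.

    Returns (status, headers). 305 Use Proxy tells the client to repeat
    the request through the proxy named in the Location header.
    """
    if client_host.lower() in TRUSTED_CACHES:
        return 200, {}
    return 305, {"Location": CACHE_URL}
```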

Reinier Post implemented the code to provide dynamic choice of
secondary caches in Lagoon. The necessary routines probably already
exist in the CERN server; it's a matter of making the *_proxy variable
take a pattern matching expression.

For example:

# Proxy config for a small UK university, small.ac.uk
#
# These lines define which cache to visit if we don't have a document
#
# Anything from within the UK is likely to be V.Fast anyway
# so don't bother HENSA
#
ProxyChain http://*.uk/* Direct
#
# Anything outside the UK is likely to be V.Slow so we might as
# well go and see if HENSA have it first.
#
ProxyChain * http://www.hensa.ac.uk/
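ProxyChain is a proposed directive rather than one the CERN server understands, so the following is only a sketch of how such a file might be read and applied. It assumes the first matching rule wins (which the example relies on, since `*` would otherwise swallow the UK rule), that `Direct` means no upstream, and that patterns are shell-style globs matched against the whole URL.

```python
from fnmatch import fnmatchcase

CONFIG = """\
# Anything from within the UK is likely to be V.Fast anyway
ProxyChain http://*.uk/* Direct
# Anything outside the UK: see if HENSA have it first
ProxyChain * http://www.hensa.ac.uk/
"""


def parse_proxy_chain(text):
    """Return a list of (pattern, upstream) pairs; upstream None means Direct."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, pattern, target = line.split()
        if keyword != "ProxyChain":
            raise ValueError("unknown directive: " + keyword)
        rules.append((pattern, None if target == "Direct" else target))
    return rules


def choose_upstream(url, rules):
    """Return the upstream for url from the first matching rule (None = Direct)."""
    for pattern, upstream in rules:
        if fnmatchcase(url, pattern):
            return upstream
    return None  # no rule matched: fetch directly
```

Globs over the whole URL are crude (a path containing `.uk/` would also match the first rule), but they are enough to show how little machinery the proposal needs.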

Obviously, this hand-crafting of configuration files is not a good
thing. However, I believe that with this kind of capability already
built into the servers, an autonomous URN type scheme would be easier
to implement. It would simply be a database of the relationships
between caches. (Perhaps `simply' wasn't the right word :-)
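One way to picture such a database is a table giving, for each cache, its ordered chaining rules; following the rules hop by hop yields the full route a request takes. The table below, including the cache.small.ac.uk host, is hypothetical, and the loop check is an assumption about what such a scheme would need once caches are linked automatically.

```python
from fnmatch import fnmatchcase

# Hypothetical relationship table: for each cache, an ordered list of
# (URL pattern, next cache) rules. None means "go to the origin server".
RELATIONSHIPS = {
    "cache.small.ac.uk": [("http://*.uk/*", None), ("*", "www.hensa.ac.uk")],
    "www.hensa.ac.uk": [("*", None)],
}


def route(url, start, table=RELATIONSHIPS):
    """Return the list of caches a request for url passes through, in order."""
    path = [start]
    current = start
    while True:
        next_hop = None
        for pattern, target in table.get(current, []):
            if fnmatchcase(url, pattern):
                next_hop = target
                break  # first matching rule wins
        if next_hop is None:
            return path  # this cache fetches from the origin
        if next_hop in path:
            raise ValueError("cache loop: " + " -> ".join(path + [next_hop]))
        path.append(next_hop)
        current = next_hop
```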

>I am sure that there are more issues I can raise. Are they relevant here
>or should these perhaps be raised in a splinter group? :) I feel here is
>as good a place as any.
>
>Paul

I believe that this is the way to go. There is a definite, but
technically simple, step going from the current CERN server to one that
supports dynamic choice of secondary proxy/cache. Once this is in
place, providing a URN type service should be significantly simpler
than making the whole URL->URN transition in one go.

Neil G. Smith,
HENSA/Unix Administrator,
The University of Kent at Canterbury.