February 2001
In 2000, on the bsdi-users mailing list, a question was asked
about how to do transparent HTTP caching/proxying using BSD/OS.
At the time, I hinted at using the SO_BINDANY
socket option in BSD/OS, along with the Squid cache (http://squid.nlanr.net) to
implement this function. Recently, I was asked the specifics of
how to do this, so I spent a little time making this technology
go. What follows is a short writeup of the work I did to make
transparent Web caching work.
Transparent HTTP caching/proxying is an attempt to make a user's HTTP application (typically a Web browser) use an HTTP cache or proxy without having to modify the configuration of the application. In some cases, the application cannot be directly configured to use a proxy. Transparent caching is a useful tactic when all HTTP traffic must be run through a single machine, either for routing or security purposes. It is also useful when you are attempting to reduce network bandwidth requirements by aggressively caching Web content close to the data consumers.
At the Squid website, in one of the FAQs, there is a list of the four basic actions that need to be accomplished to enable transparent caching/proxying:
For a more complete version of these actions and exactly what they entail, visit the Squid FAQ pages at http://squid.nlanr.net/Doc/FAQ/FAQ.html (at the time of this writing, FAQ 17 was the relevant section). This document is useful in that it spells out the specific steps needed to accomplish this on BSD/OS.
BSD/OS 4.2 ships with a fairly recent, stable version of Squid
installed in /usr/contrib/bin/squid. The source
code for that version of squid is on the "Contributed Sources"
CD-ROM that ships with both the binary and source releases of
BSD/OS. I didn't try to make transparent caching work with any
version of Squid other than the version that is shipped with
BSD/OS 4.2, since that version of Squid is relatively recent. I
didn't know of any good reason to upgrade to a newer release of
Squid.
The first thing that needs to be done is a small set of changes
to Squid to make it accept HTTP connections for any destination
IP address. This is fairly easy to implement under BSD/OS,
using its proprietary SO_BINDANY socket option.
This socket option allows an application to bind to and
accept connections for any IP address that gets routed to
the local host for processing. To that end, I created a small set of
patches for Squid that turns on the SO_BINDANY
option inside of Squid. In order to apply these patches to
Squid, you will need to find your "Contributed Sources" CD-ROM
and put it in your CD-ROM drive to retrieve the Squid source
code.
# mount /cdrom
# cd /var/tmp
# gzcat < /cdrom/contrib/squid.tar.gz | tar xvf -
# umount /cdrom
Apply the patches in the squid.patch file.
# patch -p0 < squid.patch
Build the (now patched) Squid distribution.
# gmake all
First, save the original binary, in case you need to revert
to it. Then copy the new squid binary to the
installation directory.
# mv /usr/contrib/bin/squid /usr/contrib/bin/squid.FCS
# cp src/squid /usr/contrib/bin/squid
You have now completed the first step of the process -- your
squid binary will now set the socket option
SO_BINDANY on the sockets it creates for accepting
connections.
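If you expect to rebuild more than once, the save-and-copy step above can be wrapped so that a repeated run does not overwrite the saved original with an already patched binary. The following is only a minimal sketch and not part of the procedure I actually ran; the function name install_keep_original is hypothetical, and the .FCS suffix simply follows the naming used above.

```sh
# Hypothetical helper: install a new binary, keeping the first original
# as <dest>.FCS. Later runs never overwrite that saved copy.
install_keep_original() {
    new=$1      # freshly built binary, e.g. src/squid
    dest=$2     # installed location, e.g. /usr/contrib/bin/squid

    [ -f "$new" ] || { echo "missing build output: $new" >&2; return 1; }

    # Save the shipped binary only once.
    if [ -f "$dest" ] && [ ! -e "$dest.FCS" ]; then
        mv "$dest" "$dest.FCS" || return 1
    fi

    cp "$new" "$dest"
}
```

It would be called from the top of the Squid source tree as: install_keep_original src/squid /usr/contrib/bin/squid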
You need to configure Squid to accept the connections from the hosts on your network. Configuring Squid to accept connections from a given set of hosts or networks is fairly well documented in the Squid FAQs, but it does require careful reading. I've included the diff of the BSD/OS installed squid.conf.default and the configuration file that is being used on my proxy server.
Next you need to create a configuration file and apply the patches in the squid.conf.patch file.
# cd /var/www/squid/conf
# cp squid.conf.default squid.conf
# patch -p0 < squid.conf.patch
# vi squid.conf
Note: You WILL need to add an ACL for your network/hosts that allows them to connect to the proxy. I've called my ACL "dummy" and it allows anybody on 192.168.1.0/24 to connect to the proxy. You MUST edit this line of the configuration file for it to work properly on your network. There are many other settings that you may want or need to tweak to make Squid behave the way you want.
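Because the ACL is the edit most easily forgotten, it can be worth confirming that it is present before starting Squid. This is a rough sketch under stated assumptions: the function name has_allow_acl is hypothetical, and it assumes the ACL is declared with Squid's usual "acl NAME src ..." line and enabled with "http_access allow NAME". The exact lines in your patched squid.conf may differ.

```sh
# Hypothetical check: does the config declare a source ACL with this name
# and allow it? Prints the matching lines; returns non-zero if either
# one is missing.
has_allow_acl() {
    conf=$1     # path to squid.conf
    name=$2     # ACL name, e.g. dummy

    grep -E "^[[:space:]]*acl[[:space:]]+${name}[[:space:]]+src[[:space:]]" "$conf" || {
        echo "no 'acl ${name} src ...' line in $conf" >&2
        return 1
    }
    grep -E "^[[:space:]]*http_access[[:space:]]+allow[[:space:]]+${name}([[:space:]]|\$)" "$conf" || {
        echo "no 'http_access allow ${name}' line in $conf" >&2
        return 1
    }
}
```

For the configuration described here it would be run as: has_allow_acl /var/www/squid/conf/squid.conf dummy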
Now, start the Squid process:
# /var/www/squid/bin/start-squid
Getting the cache server to actually accept the packets is
really easy under BSD/OS, assuming you have left the
IPFW option on in the kernel. BSD/OS 4.2 ships
with this facility turned on, so unless you turned it off
explicitly in the kernel you are running, you should have it in
your current kernel.
You need to install a small filter at the
pre-input location in the kernel. Here's a filter
similar to what is installed on my router host:
tcp && srcaddr(192.168.1.25) && dstport(service(http/tcp)) {
    forcelocal;
    accept;
}
This filter only turns on the transparent proxying for a single host (192.168.1.25) -- which is all that I needed for demonstrating that the proxying was working. In a normal situation, you will need to change the filter to allow your entire netblock(s) to connect to the service. This could be done by modifying the srcaddr(...) part in the above example to srcaddr(192.168.1.0/24). If you have multiple netblocks that you want to allow, you can list them like this: srcaddr(192.168.1.0/24, 192.168.2.0/24)
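For a site with several netblocks, the filter source can be generated instead of edited by hand. The sketch below only reproduces the filter shown above with a different srcaddr(...) list; the function name make_pre_input_filter is hypothetical, and no filter syntax beyond what appears in this article is assumed.

```sh
# Hypothetical generator: print a pre-input filter that applies the
# forcelocal rule to every netblock given on the command line.
make_pre_input_filter() {
    [ "$#" -gt 0 ] || { echo "usage: make_pre_input_filter netblock..." >&2; return 1; }

    # Join the arguments as "a, b, c" for srcaddr(...).
    list=$1
    shift
    for block in "$@"; do
        list="$list, $block"
    done

    cat <<EOF
tcp && srcaddr($list) && dstport(service(http/tcp)) {
    forcelocal;
    accept;
}
EOF
}
```

Redirect the output to your filter file, for example: make_pre_input_filter 192.168.1.0/24 192.168.2.0/24 > /path/to/pre-input.filter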
To install the filter, you will need to run the following commands.
# ipfwcmp -o /var/run/ipfw.pre-input /path/to/pre-input.filter
# ipfw pre-input -replace /var/run/ipfw.pre-input
Don't forget to put these commands in your startup files so
this filter will get installed each time your machine is
rebooted! I would suggest putting these commands at the end of
the /etc/netstart file.
The first command compiles the ASCII representation of the
filter into the binary format that the ipfw
command uses. The second command will actually download the
filter into the running kernel, replacing any existing
pre-input filter. If you need to make changes to the filter,
you can execute these commands again (after editing the filter
file) and implement the changes to the filtering rules in the
running kernel without having to reboot.
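For a startup file, the two commands can be wrapped so that a missing or broken filter file never replaces a working filter. This is a sketch, not what is installed on my router: the function name install_pre_input_filter is hypothetical, and it assumes that ipfwcmp exits with a non-zero status when compilation fails.

```sh
# Hypothetical wrapper around the two commands above.
install_pre_input_filter() {
    src=$1                              # ASCII filter source
    bin=${2:-/var/run/ipfw.pre-input}   # compiled filter

    [ -r "$src" ] || { echo "filter source not readable: $src" >&2; return 1; }

    # Compile first; only touch the running kernel if that succeeded
    # (assumes ipfwcmp reports failure through its exit status).
    ipfwcmp -o "$bin" "$src" || { echo "ipfwcmp failed; filter not replaced" >&2; return 1; }
    ipfw pre-input -replace "$bin"
}
```

In /etc/netstart this would be followed by a call such as: install_pre_input_filter /path/to/pre-input.filter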
There is no step four, at least not for my network setup. In my configuration, there is a BSD/OS 4.2 machine that acts as the gateway to the Internet. This host terminates the Frame Relay connection directly on a serial interface and has multiple ethernet interfaces. It does the filtering for the network and now it runs the Squid cache too. If you are planning to run the Squid cache on a machine that doesn't have all the outbound network traffic already flowing through it, you will need to investigate the FAQs at the Squid homepage and see how to do that. They have notes about configuring some popular brands of routers to do just this.
After I wrote up all the above information, it was pointed out that there are several bugs in this version of Squid (2.3STABLE4) for which patches have been posted. I looked at all the patches, and while some were gratuitous (in that they fixed code that wasn't enabled in the BSD/OS configuration of Squid), they were all very easy to apply. None of the patches posted at the Squid website conflict with the patches that I wrote. For your reference, the bugs, their descriptions and patches for them are located at http://squid.nlanr.net/Versions/v2/2.3/bugs/. It is probably worth the effort to get and apply these patches for the reported bugs.
After getting the transparent HTTP caching working with BSD/OS and Squid, I used the system on and off for the next day and a half. My perception was that browsing the Internet without the Squid cache was definitely faster than browsing with the cache enabled. This was obviously not acceptable -- there is little point in running a cache if the perceived speed of the network connection goes down.
I examined the log files as they were written by the Squid proxy. Whenever a new Web site was visited, there was a long pause before the first log entry for that site was written into the access file. All the subsequent log entries were written quite rapidly. This resembled a DNS lookup problem: the Squid cache was making the end user wait while it resolved the name of the new Web host. This is not acceptable!
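I found the pattern simply by watching the log, but the same check can be made after the fact. The sketch below assumes Squid's native access log format, in which the second field is the elapsed time in milliseconds and the seventh is the URL; the function name slowest_requests is hypothetical, and the location of the log depends on your squid.conf. If name lookups are the bottleneck, the first request to each newly visited host would be expected near the top of the list.

```sh
# Hypothetical helper: list the slowest requests in a native-format
# Squid access log as "elapsed-ms URL", slowest first.
slowest_requests() {
    log=$1              # path to the access log
    count=${2:-10}      # how many entries to show

    awk '{ print $2, $7 }' "$log" | sort -rn | head -n "$count"
}
```

For example, slowest_requests /path/to/access.log 20 shows the twenty slowest entries.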
Reading more of the Web pages at the Squid Web server, I
stumbled across an extremely important piece of information.
With the 2.3 release of Squid, the default nameserver lookup
routines used were the internal proxy routines. In other words,
all the nameservice lookups were being done internally by the
program using the standard system resolver. While there is
nothing wrong with the system resolver, it does operate
synchronously. So, while the proxy waited for a nameserver to
respond, it wasn't relaying other HTTP traffic and was
making the end user wait while the DNS information
resolved. Simply rebuilding the squid executable
with the --disable-internal-dns option solved this
problem. This flag forces Squid to use the external
dnsserver program. After restarting the Web proxy
with this change, browsing through the proxy did not seem
noticeably slower than without the proxy.
The very small patch for the Makefile to specify this flag is available.
If you choose to perform this tuning, you will want to rebuild from scratch and reinstall the resulting binary:
# gmake clean
# gmake all
# cp src/squid /usr/contrib/bin/squid
Don't forget to kill and restart the squid daemon for the change to take effect!
The notes on the Squid homepage that describe using the
external dnsserver program state that you should
always try to run at least as many copies of the
dnsserver program as the cache will have
nameservice requests outstanding, and then run two more copies
of the program for good measure. However, it doesn't appear
that Squid keeps track of how many requests each of the
dnsserver instances has handled, so figuring out
when there are enough dnsserver processes running
is not as simple as it could be.
Hacking a little code into the dnsserver
program to do this counting seemed like the right solution. So,
after a little work on the code to put in a call to
setproctitle(), you can now look at the dnsserver
processes with ps and see how many requests each
of the dnsserver processes has handled. The patches for the dnsserver program
are available.
# ps -auxw -U www | grep dnsserver
www 829 0.0 1.7 1396 492 ?? Is 10:30PM 0:00.06 (16 requests) (dnsserver)
www 830 0.0 1.7 1396 492 ?? Is 10:30PM 0:00.04 (1 requests) (dnsserver)
www 831 0.0 0.5 1060 144 ?? Is 10:30PM 0:00.02 (0 requests) (dnsserver)
www 832 0.0 0.5 1060 144 ?? Is 10:30PM 0:00.02 (0 requests) (dnsserver)
www 833 0.0 0.5 1060 144 ?? Is 10:30PM 0:00.02 (0 requests) (dnsserver)
This is much more useful than the default listing in the
process table for dnsserver, at least in my
opinion. If you patch dnsserver, you will need to
recompile everything and reinstall at least the
dnsserver binary. You should probably save the
original copy of the program, in case you need to revert
to it for some reason.
# gmake clean
# gmake all
# mv /var/www/squid/bin/dnsserver /var/www/squid/bin/dnsserver.FCS
# install -c -o bin -g bin src/dnsserver /var/www/squid/bin/dnsserver
It's not completely obvious, but you will need to kill and
restart the squid daemon for the new version of
dnsserver to be started. This is necessary because
squid starts up all the copies of the
dnsserver program when it first starts and uses
them until the squid daemon is stopped.
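Once the patched dnsserver is running, the per-process counts in the ps listing shown earlier can be summarized to judge whether enough copies are running. This is a minimal sketch; the function name dnsserver_usage is hypothetical, and it relies on the "(N requests)" process title produced by my patch. If every process shows a non-zero count, that is a hint to raise the number of copies (the dns_children setting in squid.conf).

```sh
# Hypothetical summary of the patched process titles: reads ps output on
# standard input and reports how many dnsserver processes have handled
# at least one request.
dnsserver_usage() {
    awk '
        /\(dnsserver\)/ {
            for (i = 1; i < NF; i++) {
                if ($(i + 1) == "requests)") {
                    n = substr($i, 2) + 0   # strip the leading "("
                    total++
                    if (n > 0) busy++
                    sum += n
                }
            }
        }
        END {
            printf "%d of %d dnsserver processes used, %d requests total\n", busy, total, sum
        }
    '
}
```

Run it as: ps -auxw -U www | dnsserver_usage. For the five processes in the listing above it prints "2 of 5 dnsserver processes used, 17 requests total".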
Thanks go to Paul Borman (of BSDi) for explaining to me what I
didn't understand in the way that the SO_BINDANY
socket option works in BSD/OS.