A parent’s guide to Linux Web filtering
http://desktops.linux.com/desktops/04/07/01/1833212.shtml?tid=49&tid=99&tid=13
Date 2004.07.01 13:20
Author warthawg
http://www.linux.com/article.pl?sid=04/07/01/1833212
Having converted quite a few people to the world of GNU/Linux, I am often asked by parents, “Can I set up parental Web filters for my children using Linux?” The answer is yes, and here’s how.
A Web filter is software that controls the type of content a Web browser displays. The filter checks the content of a Web page against a set of rules and replaces any unwanted content with an alternative Web page, usually an “Access Denied” page. The type of content to be filtered is usually controlled by a systems administrator or a parent. Web filters are used in schools, libraries, and homes to safeguard children from obscene content on the Internet.
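To make the idea concrete, here is a toy sketch of the check-and-replace behavior described above. It is not how DansGuardian is implemented; the function name filter_page and the one-phrase-per-line file format are assumptions made purely for illustration.

```shell
#!/bin/sh
# Toy illustration of content filtering; NOT DansGuardian's actual logic.
# Usage: filter_page PHRASE_FILE < page.html
# PHRASE_FILE holds one banned phrase per line (hypothetical format).
filter_page() {
    phrase_file=$1
    page=$(cat)                       # read the whole page from stdin
    # -q: quiet, -i: ignore case, -F: fixed strings, -f: read patterns from file
    if printf '%s\n' "$page" | grep -qiF -f "$phrase_file"; then
        echo "Access Denied"          # replace unwanted content
    else
        printf '%s\n' "$page"         # pass clean content through unchanged
    fi
}
```

A real filter weighs many phrases, URLs, and labels together, but the basic decision of passing the page through or substituting a denial page is the same.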
Before you begin, you should be familiar with some basic networking concepts:
* A server, as in “Web server,” is nothing more than an application that runs on a computer and listens for incoming requests. It sends back, or serves, information to the source that requested the information. This information can be anything from Web pages to databases. Each server communicates through the use of an IP address and a port number.
* Ports are logical addresses that applications on a computer use in a way similar to how we use phone numbers. Each server program must have a unique port that it uses for communications.
* Every computer connected to the Internet has both an external IP (Internet Protocol) address, usually assigned by an Internet service provider, and an internal address of 127.0.0.1. The internal address allows the computer to “listen” and “talk” to itself and is referred to as the loopback address. Normally a server is set up to accept requests from other computers on the Internet by listening on its external address. Since this can present a security risk for our single computer, we will use the loopback address instead. This will cause our server to only listen for requests from the computer that the server resides on.
* A firewall is an application that controls the types of communication your computer can send and receive. GNU/Linux has an excellent firewall called netfilter/iptables, or simply iptables, built right into the kernel, which we will make use of to redirect users’ Web surfing through our Web filter.
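The difference between listening on the loopback address and listening on an external address can be sketched with a small helper. The name listen_scope is hypothetical, and the function only handles IPv4 addresses written as ADDRESS:PORT, which is the form used in the configuration later in this article.

```shell
#!/bin/sh
# Classify a listening address of the form ADDRESS:PORT (IPv4 only).
# listen_scope is a hypothetical helper for illustration.
listen_scope() {
    addr=${1%:*}      # strip the trailing :PORT
    port=${1##*:}     # keep only the port
    case $addr in
        127.*)   echo "port $port: loopback only (this computer)" ;;
        0.0.0.0) echo "port $port: all interfaces (reachable from the network)" ;;
        *)       echo "port $port: interface $addr only" ;;
    esac
}
```

For example, listen_scope 127.0.0.1:3128 reports a loopback-only listener, which is what we want for both servers in this setup.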
Getting the software
The only software you need to set up parental filters under GNU/Linux is iptables, DansGuardian, and Squid.
DansGuardian is the actual filtering software. It supports phrase matching, which allows you to block Web sites that contain certain phrases or words; PICS filtering, which blocks content that’s been labeled as possibly objectionable material by the creator of the Web site; URL filtering, to block content from specific sites that are known to contain offensive material; and blacklists, or lists of sites that contain content you want to block. Blacklists usually come from third parties, though you can create and maintain your own.
Squid is a Web proxy server that acts as a middleman between your computer and the Internet. You need a proxy server because DansGuardian isn’t able to fetch Web pages by itself. We’ll configure Squid as a transparent proxy, meaning we’ll hijack network traffic and redirect it to a new destination — our filter program, in this case — without the need for the user to know that it is happening.
Most modern distributions have packaged versions of Squid and DansGuardian available. If yours doesn’t, you will need to install them from source code. Both the Squid and DansGuardian Web sites have complete instructions for how to compile and install the programs from source.
Iptables is the firewall management tool used with the 2.4.x and higher kernels. Most modern distributions provide iptables. If yours doesn’t, you will need to compile a new kernel and enable iptables, which is beyond the scope of this article (and probably beyond the abilities of most parents). You’d probably be better off upgrading to a newer Linux distribution.
Configuring Squid
The default location for the Squid configuration file on most systems is /etc/squid/squid.conf. While most of the default settings for Squid are all right for our usage, you will need to edit the configuration file just a bit.
You will need to become the root user in order to make the changes and issue the commands shown in this article. You can do this by either logging in as root or with the su command.
Add or edit the following line to have Squid listen only on the loopback device on port 3128. This makes Squid act as a proxy server for this computer only and assigns it a specific port number to listen on:
http_port 127.0.0.1:3128
To configure Squid as a transparent proxy, add the following lines to squid.conf:
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
Your system should have created a user and a group named squid when you installed Squid. If it didn’t, you should create them yourself by using the following two commands from the command line:
groupadd -r squid
useradd -g squid -d /var/spool/squid -s /bin/false -r squid
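Before creating the accounts, you may want to check whether they already exist. The two helpers below are a minimal sketch; their names are made up for this example, and group_exists only looks in the local /etc/group file, which is an assumption that holds for a typical standalone home computer.

```shell
#!/bin/sh
# Report whether a user account or a local group already exists.
# Both helpers are illustrative names, not part of any package.
user_exists()  { id -u "$1" >/dev/null 2>&1; }
group_exists() { grep -q "^$1:" /etc/group; }
```

With these defined, running group_exists squid || groupadd -r squid creates the group only when it is missing, and the same pattern works for user_exists and the useradd command above.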
Since Squid is normally started by the system and run as root, you need to add the next two lines to /etc/squid/squid.conf in order to make Squid run with squid’s user and group IDs:
cache_effective_user squid
cache_effective_group squid
We will later use this to identify Squid to our firewall. Then we will allow the user squid to access the Internet while we redirect all other Web traffic through our filter.
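A quick way to confirm that none of the Squid settings from this section were missed is to look for each one in the configuration file. The following is a sketch only: check_squid_conf is a hypothetical helper, and it expects each directive to appear exactly as written in this article, so a line with extra spaces or a trailing comment would be reported as missing.

```shell
#!/bin/sh
# Print each required Squid directive that is missing from the given file.
# check_squid_conf is a hypothetical helper; the directive list mirrors
# the lines this article adds to squid.conf.
check_squid_conf() {
    conf=$1
    rc=0
    while IFS= read -r want; do
        # -q: quiet, -x: match the whole line, -F: fixed string
        if ! grep -qxF "$want" "$conf"; then
            echo "missing: $want"
            rc=1
        fi
    done <<'EOF'
http_port 127.0.0.1:3128
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
cache_effective_user squid
cache_effective_group squid
EOF
    return $rc
}
```

Running check_squid_conf /etc/squid/squid.conf prints nothing and returns success when every line is present.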
Configuring DansGuardian
Our next step is to configure DansGuardian. On most systems, the default location of its main configuration file is /etc/dansguardian/dansguardian.conf. Once again, most of the default values are fine, but we need to make a few changes.
First, add or edit the following line to make the filter use HTML templates, which are static Web pages that our filter will use to display the “Access Denied” page instead of the inappropriate sites. Using HTML templates keeps us from having to set up a Web server to display the “Access Denied” information.
reportinglevel = 3
Next, add or edit the following lines to make DansGuardian listen on the loopback address and port 8080:
filterip = 127.0.0.1
filterport = 8080
Add or edit the following lines to tell DansGuardian which address and port Squid is listening on. This enables our filter to fetch the requested Web content through the proxy.
proxyip = 127.0.0.1
proxyport = 3128
Again, to keep your filter from running as root you need to change the user that it will run as. For simplicity, we will reuse the user and group that we previously set up for Squid. Add or edit the following to make DansGuardian run with UID and GID of squid:
daemonuser = 'squid'
daemongroup = 'squid'
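To double-check the values you just set, you can read them back out of the file. The helper below is a sketch under the assumption that the file uses simple key = value lines, as the settings shown here do; conf_get is a made-up name, and commented-out lines are ignored.

```shell
#!/bin/sh
# Print the value of KEY from a "key = value" style file such as
# dansguardian.conf. conf_get is a hypothetical helper name.
conf_get() {
    key=$1
    file=$2
    # First sed: keep only the value part of uncommented "key = value" lines.
    # Second sed: drop trailing comments and whitespace, then single quotes.
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file" |
        sed "s/[[:space:]]*#.*\$//; s/[[:space:]]*\$//; s/^'//; s/'\$//" |
        head -n 1
}
```

For instance, conf_get filterport /etc/dansguardian/dansguardian.conf should print 8080 after the edits above, and conf_get daemonuser should print squid without the quotes.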
While DansGuardian provides an excellent filter all by itself, you may want to exercise further control over the Web filtering by editing the other files in the /etc/dansguardian directory that contain external blacklists. Blacklists from squidGuard and URLBlacklist work perfectly with DansGuardian. Each file contains a brief explanation for its contents to make configuration easier.
Putting it in action
Once you have Squid and DansGuardian set up, the final step is to implement a transparent proxy using iptables. Use the following commands at the command line to add rules to the firewall to allow the user squid to access both the Internet and the Squid proxy we set up.
iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner squid -j ACCEPT
iptables -t nat -A OUTPUT -p tcp --dport 3128 -m owner --uid-owner squid -j ACCEPT
If you want a user to be exempt from filtering — a parent, for example — issue the following command. Replace EXEMPT_USER with the username that you wish to exempt from filtering:
iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner EXEMPT_USER -j ACCEPT
The next command redirects Internet traffic from all users, other than squid and any exempt users, to the filter on port 8080:
iptables -t nat -A OUTPUT -p tcp --dport 80 -j REDIRECT --to-ports 8080
Since we have a proxy server set up, a user could configure a Web browser to bypass the filter and access the proxy directly. The Squid proxy is listening for requests from the computer, and it doesn’t care which user sends the request. We could set up our firewall to deny all access to the proxy except from our filter, but let’s be a little sneakier. Let’s set it up so that direct requests to the Squid proxy server, except from our filter, get redirected through the filter. To do this, use the following command:
iptables -t nat -A OUTPUT -p tcp --dport 3128 -j REDIRECT --to-ports 8080
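The order of these rules matters: iptables checks a packet against the rules in a chain from top to bottom, so the ACCEPT rules for squid and any exempt users must be added before the REDIRECT rules. As a sketch, the helper below prints the full rule set in a safe order without executing anything, so you can review it first. The name print_filter_rules is hypothetical; exempt usernames are passed as arguments.

```shell
#!/bin/sh
# Print, without executing, the NAT rules from this section in the order
# they must be added. print_filter_rules is a hypothetical helper; pass
# any exempt usernames as arguments.
print_filter_rules() {
    # 1. Let the squid user reach the Web and the proxy directly.
    echo "iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner squid -j ACCEPT"
    echo "iptables -t nat -A OUTPUT -p tcp --dport 3128 -m owner --uid-owner squid -j ACCEPT"
    # 2. Exempt users bypass the filter for Web traffic.
    for user in "$@"; do
        echo "iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner $user -j ACCEPT"
    done
    # 3. Everything else, including direct proxy requests, goes to the filter.
    echo "iptables -t nat -A OUTPUT -p tcp --dport 80 -j REDIRECT --to-ports 8080"
    echo "iptables -t nat -A OUTPUT -p tcp --dport 3128 -j REDIRECT --to-ports 8080"
}
```

Once the output of print_filter_rules EXEMPT_USER looks right, you can run the printed commands as root, for example by piping the output to sh.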
Some systems, such as MandrakeLinux, utilize an application called Shorewall to manage firewall rules. For these systems, place the above firewall rules in /etc/shorewall/start, to use the filtering when Shorewall starts, and in /etc/shorewall/stop, to make them stick if you should stop Shorewall for some reason. To implement the new rules simply restart Shorewall using the following command:
service shorewall restart
For systems using Shorewall, your firewall rules are set. For all other systems, you’ll need to perform the next two steps in order to get the new firewall rules started at boot time. Issue the following command to save your firewall rules:
iptables-save > /etc/sysconfig/iptables
Now issue the following to make sure iptables is started at boot time and to start the iptables firewall:
chkconfig iptables on
service iptables restart
You may also need to make sure that DansGuardian and Squid get started at boot by using the following two commands:
chkconfig squid on
chkconfig dansguardian on
To get the filtering started, you can now enter the following commands:
service squid restart
service dansguardian restart
[Figure: The “Access Denied” screen]
Now when users enter a forbidden Web address they will be presented with an “Access Denied” page instead of the offending site. You can customize the look of the “Access Denied” page by editing the template.html file in the appropriate language section located in /etc/dansguardian/languages.
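If you would rather test the filter from the command line than from a browser, one approach is to fetch a page and look for wording from the denial template in the response. This is only a sketch: looks_blocked is a made-up helper, and the phrase you pass it must be taken from your own template.html, since the exact wording depends on the template and language in use.

```shell
#!/bin/sh
# Succeed if the HTML on standard input contains PHRASE (case-insensitive).
# looks_blocked is a hypothetical helper; PHRASE should be wording copied
# from the denial template actually installed on your system.
looks_blocked() {
    phrase=$1
    # -q: quiet, -i: ignore case, -F: treat the phrase as a fixed string
    grep -qiF -- "$phrase"
}
```

Assuming a command-line fetcher such as curl is installed, run it as a filtered user against a site you expect to be blocked and pipe the page into looks_blocked with a phrase from your template; a successful exit status suggests the request went through the filter.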
Final thoughts
While the setup discussed in this article is intended for use on a single computer, this method of Web filtering can be applied to a wide range of scenarios. These tools can be easily and successfully implemented on a small home network, a large business infrastructure, or any environment that needs to comply with the Children’s Internet Protection Act.
Bear in mind that Web filtering software of any kind is not 100% failsafe, nor is it a substitute for parental supervision. Along with installing filtering software, educate yourself and your children about the Internet.