Varnish with Apache on Ubuntu, with throttling, logging for multiple sub-sites, handling cookies and more

18th January, 2014 - Posted by david

One of the good things about working in a small company is that you’re more likely to be asked to do things that you’ve never done before and are way out of your comfort zone. Although I’m a programmer/web developer primarily, I like the whole infrastructure aspect of web development too and am always happy to learn how to get things working. Earlier last year I got a Beanstalkd queueing system up and running, which has proven to be a great success. Next up, we wanted an intermediary cache that could also handle throttling/rate limiting and, after some research, we settled on the excellent Varnish Cache. At work we run Apache as our webserver, which is the scope of this article, but Varnish can easily run with Nginx too. I should also mention that the below applies to Varnish 3.0.5; some of it might not work in earlier or later major versions.

The way Varnish works is that it listens on port 80 (i.e. your webserver’s default port) for incoming HTTP connections and looks up its cache. If it’s a hit, it simply serves the request, whether that’s an HTML page, a JSON request, an image, a static CSS/JS file etc., and the webserver is never touched. On a miss, however, it passes the request on to your webserver via port 8080 (or whatever you choose), which processes it and gives the response back to Varnish to cache (for 2 minutes by default) and serve back to the client.

*** N.B. The way this guide works, you’ll need to restart Apache a couple of times. I wrote it this way so you can be sure each stage is working correctly. Ideally in a live environment, you’d have everything set up correctly, then simply do one restart of Apache (to get it to listen on port 8080 instead of 80). At the end of this article I show a way to test everything before you do your Apache restart, i.e. while it’s still listening on port 80. ***

Initial Apache Set-up

So, first up we need to tell Apache to listen on port 8080. This is done by changing the first line of your website’s sites-available file, as follows:

# /etc/apache2/sites-available/default
<VirtualHost *:8080>

If you have multiple sites-available entries and want to serve them all from the same port, you’ll need to edit each of them. One thing you could do, for instance, would be to store the port as a variable in your Apache configuration, then use the variable in your sites-available. I did this by storing a variable called VARNISH_PORT in /etc/apache2/envvars, then using that in each of our sites-available, as follows:

# /etc/apache2/envvars
export VARNISH_PORT=8080

and then

# /etc/apache2/sites-available/default
<VirtualHost *:${VARNISH_PORT}>

Additionally, you’ll want to change any reference to port 80 in /etc/apache2/ports.conf to either 8080 or ${VARNISH_PORT}.

Varnish

So, next up we want to install Varnish, get it listening for connections on port 80, then forwarding to Apache on port 8080. Installation is a simple apt-get:

sudo apt-get install varnish

Varnish’s 2 main configuration files are /etc/default/varnish for boot-up options and /etc/varnish/default.vcl for actual configuration. In the former we tell it to listen for connections on port 80, allocate 256MB of memory to Varnish and set a few other configuration options:

# /etc/default/varnish
DAEMON_OPTS="-a :80 \
            -T localhost:1234 \
            -f /etc/varnish/default.vcl \
            -S /etc/varnish/secret \
            -s malloc,256m"

# -a is the address/port to listen on (port 80), -T is the address of the management (CLI) interface used by varnishadm (it is not a web page), -s is the storage type and the amount of memory it can use

To the latter file, in order to tell Varnish to pass any cache misses on to Apache, running locally on port 8080, we add the following:

# /etc/varnish/default.vcl
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

Now we’re ready to restart Apache and turn on Varnish (needs to be done in this order):

sudo service apache2 restart
sudo service varnish start

Now, if you go to a page on your site and look at the response headers for each request, you should see an X-Varnish and an Age header. Age will probably be set to 0 initially, but if you reload the page, it’ll have a higher value, i.e. its age in the cache. By default, Age shouldn’t go over 120, unless you tell Varnish to cache things for longer (see below).
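
If you’d rather check this from the command line than the browser, something like the following should do. It’s only a minimal sketch: the function name cache_status is made up for illustration, and it assumes Varnish 3’s behaviour of putting one transaction ID in the X-Varnish header on a miss and two on a hit.

```shell
#!/bin/sh
# cache_status (hypothetical helper): read HTTP response headers on stdin
# and print "hit age=N", "miss age=N" or "no-varnish".
# Assumption: X-Varnish carries two IDs on a cache hit and one on a miss.
cache_status() {
    # Strip the carriage returns HTTP headers end with, then inspect
    # the X-Varnish and Age headers (header names are case-insensitive).
    tr -d '\r' | awk '
        BEGIN { ids = 0; age = 0 }
        tolower($1) == "x-varnish:" { ids = NF - 1 }
        tolower($1) == "age:"       { age = $2 }
        END {
            if (ids == 0)      print "no-varnish"
            else if (ids >= 2) print "hit age=" age
            else               print "miss age=" age
        }'
}
```

You’d run it as curl -sI http://www.site1.com/ | cache_status (with your own URL), a couple of times in a row, and watch a miss turn into a hit.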

Logging the correct IP address for Apache

With Varnish passing cache misses on to Apache, you’ll notice that entries in Apache’s access log will have 127.0.0.1 as their IP address. Ideally we’d like to store the originating client address, as we did before. We do this by installing an Apache module called RPAF, which gets Apache to use the X-Forwarded-For IP address when logging, instead of the normal one. We also need to set Varnish up to pass an X-Forwarded-For value on to Apache. Most of this bit I got from a post on theyusedtocallitablog.net.

Let’s start with Varnish. Add the following to your default.vcl:

sub vcl_pipe {
    set bereq.http.connection = "close";

    if (req.http.X-Forwarded-For) {
        set bereq.http.X-Forwarded-For = req.http.X-Forwarded-For;
    } else {
        set bereq.http.X-Forwarded-For = regsub(client.ip, ":.*", "");
    }
}

sub vcl_pass {
    set bereq.http.connection = "close";

    if (req.http.X-Forwarded-For) {
        set bereq.http.X-Forwarded-For = req.http.X-Forwarded-For;
    } else {
        set bereq.http.X-Forwarded-For = regsub(client.ip, ":.*", "");
    }
}

Next up we need to install RPAF (be careful, for me the install step restarted Apache without asking!):

sudo apt-get update
sudo apt-get install libapache2-mod-rpaf
sudo a2enmod rpaf
sudo apache2ctl graceful

You then need to add the RPAF module to either each sites-available or, if you want it to apply for all sites, you could add it to /etc/apache2/apache2.conf:

# if "mod_rpaf.c" doesn't work, try simply "rpaf.c" (Apache comments must be on their own line)
<IfModule mod_rpaf.c>
    RPAFenable On
    RPAFsethostname On
    RPAFproxy_ips 127.0.0.1
</IfModule>

You’ll then need to restart Varnish (sudo service varnish restart) to load the updated default.vcl. Now, if you look at requests coming into your Apache’s access.log file, you should see the correct IP address coming through.
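
A quick way to confirm this is to count how many recent log entries still show the loopback address. This is just a sketch: loopback_hits is a name I’ve made up, and it assumes the client address is the first field of each line, as in Apache’s common/combined log formats.

```shell
#!/bin/sh
# loopback_hits (hypothetical helper): count access-log lines on stdin
# whose first field (the client address) is 127.0.0.1.
loopback_hits() {
    # "n + 0" makes awk print 0 rather than an empty string when nothing matches
    awk '$1 == "127.0.0.1" { n++ } END { print n + 0 }'
}
```

For example, tail -n 200 /var/log/apache2/access.log | loopback_hits (the path is the usual Ubuntu one, yours may differ). After a few fresh requests through Varnish, the count for new entries should stay at 0, apart from anything that genuinely originates on the server itself.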

Varnish logging

So, you’ll probably want to log any requests coming into Varnish too. This can be done using either varnishlog or varnishncsa. The latter program writes a log file similar to that of a webserver (i.e. to NCSA standard) and it’s the one we’ll be using here. The standard way to run it as a daemon, logging everything to the one file is:

sudo touch /var/log/varnish/access.log # make sure it exists!
# -D option below means 'run as a daemon'
sudo varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid

If you serve multiple sites from the one server and want to split the logs out into separate log files, you can have multiple instances of varnishncsa running and filter what gets logged via the incoming Host header. To do this, you could run the following (courtesy of linuxaria.com):

sudo varnishncsa -m "RxHeader:^Host: (www\.)?site1\.com$" -a -w /var/log/varnish/site1.access.log -D
sudo varnishncsa -m "RxHeader:^Host: (www\.)?site2\.com$" -a -w /var/log/varnish/site2.access.log -D
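
If you have more than a couple of sites, typing those out by hand gets tedious. Here’s a rough sketch of how the filter expression could be generated per site. The function names host_filter and start_site_loggers are my own invention, and note that this version names the log files after the full hostname (e.g. site1.com.access.log).

```shell
#!/bin/sh
# host_filter (hypothetical helper): print a varnishncsa -m expression that
# matches the given hostname with or without a leading "www.", dots escaped.
host_filter() {
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf 'RxHeader:^Host: (www\\.)?%s$\n' "$escaped"
}

# start_site_loggers (hypothetical helper): start one varnishncsa daemon per
# hostname given, using the same options as the commands above.
start_site_loggers() {
    for site in "$@"; do
        sudo varnishncsa -m "$(host_filter "$site")" -a \
            -w "/var/log/varnish/$site.access.log" -D
    done
}
```

Usage would be along the lines of start_site_loggers site1.com site2.com.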

If you want to pipe the output of your logging to a program like cronolog, it’s a bit trickier. I wasn’t able to figure out how to do this with varnishncsa running as a daemon; all I could get to work was to run it as a background process (which will stay running after you log out), something like:

sudo varnishncsa -m "RxHeader:^Host: (www\.)?site1\.com$" | /usr/bin/cronolog /var/www/site1/logs/%Y/%m/varnish.access_%Y%m%d.log &

We should also really ensure that any logging commands are started automatically on start-up, but that’s outside the scope of this article. I’ll just say you do it by writing a script in /etc/init.d/ and calling update-rc.d on it. A quick Google should point you in the right direction.

Another thing you could do is integrate your varnishncsa logs with your Apache ones, so everything is in the one log file. You can tell what’s a Varnish log entry and what’s an Apache one by looking at the request: Apache’s will be something like GET /index.htm, while Varnish’s will be more like GET http://www.site1.com/index.htm.

Bypassing Varnish for certain pages/sub-sites

In certain instances, you might not want to use Varnish and always pass the request onto Apache. You can do this by adding some code to your default.vcl. In the following example, we’re going to skip the cache for the stats/ sub-section and the admin.site1.com sub-site:

# /etc/varnish/default.vcl
sub vcl_recv {
    if (req.url ~ "stats/" ||
        req.http.host ~ "admin.site1.com"
    ) {
        return (pass);
    }
}
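
One thing to be aware of is that the ~ operator does an unanchored regex match, so "stats/" will match anywhere in the URL and the dots in "admin.site1.com" match any character. To get a feel for which requests the two conditions above would catch, here’s a small shell sketch that mimics them with grep -E (recv_action is a made-up name, and grep’s regex flavour is only equivalent to Varnish’s PCRE for simple patterns like these):

```shell
#!/bin/sh
# recv_action HOST URL (hypothetical helper): print "pass" if the request
# would match either condition in the vcl_recv example, otherwise "lookup".
recv_action() {
    if printf '%s\n' "$2" | grep -Eq 'stats/' ||
       printf '%s\n' "$1" | grep -Eq 'admin.site1.com'; then
        echo pass
    else
        echo lookup
    fi
}

recv_action www.site1.com /stats/daily      # pass
recv_action admin.site1.com /index.htm      # pass
recv_action www.site1.com /mystats/page     # pass (unanchored match)
recv_action www.site1.com /index.htm        # lookup
```

If the third case isn’t what you want, tightening the patterns to something like "^/stats/" and "^admin\.site1\.com$" in the VCL is probably worth considering.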

Caching items for longer than the default

For certain files (especially static ones such as images) you might want to cache them for longer. The TTL is a property of the backend response (beresp), which is only available in vcl_fetch, not vcl_recv, so this is done by adding the following sub vcl_fetch block to /etc/varnish/default.vcl:

# /etc/varnish/default.vcl
sub vcl_fetch {
    if (req.url ~ "\.(jpg|png|gif)$") {
        set beresp.ttl = 3600s; # cache images for 1 hour
    }
}

Varnish and Cookies

Varnish won’t cache any request coming in that contains cookies (it also won’t cache POST requests, but that’s understandable). However, often cookies are only used by the client and have no impact on the server, for example Google tracking cookies. There will obviously also be certain cookies that the server does need (e.g. a session cookie) that we’d want/need to keep and thus not cache the generated content. Fortunately there’s a workaround we can use. To only keep the cookies cookie1 and cookie2 but disregard all others, again in our default.vcl vcl_recv block, we can do:

# /etc/varnish/default.vcl - in vcl_recv
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(cookie1|cookie2)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie == "") {
            remove req.http.Cookie;
        }
    }

There are a couple of other examples on the original workaround page you can check out in your own time.
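
The chain of regsuball calls in the cookie snippet above is a bit cryptic, so to see what each step does to a real Cookie header, here’s a rough equivalent using sed. It’s only a sketch for experimenting on the command line (filter_cookies is my own name for it), with one sed expression per regsuball line in the VCL.

```shell
#!/bin/sh
# filter_cookies (hypothetical helper): apply the same substitutions as the
# VCL cookie snippet to a Cookie header value, keeping only cookie1 and cookie2.
filter_cookies() {
    # Prepend ";" so every cookie, including the first, starts with ";"
    printf ';%s\n' "$1" |
        sed -E -e 's/; +/;/g' \
               -e 's/;(cookie1|cookie2)=/; \1=/g' \
               -e 's/;[^ ][^;]*//g' \
               -e 's/^[; ]+|[; ]+$//g'
    # 1: remove the spaces after each ";"
    # 2: put a space back in front of the cookies we want to keep
    # 3: delete every cookie that has no space in front of it
    # 4: trim leftover ";" and spaces from both ends
}

filter_cookies "__utma=123; cookie1=abc; foo=bar; cookie2=xyz"
# prints: cookie1=abc; cookie2=xyz
```

If none of the wanted cookies are present, the result is an empty string, which is the case where the VCL removes the Cookie header altogether.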

Throttling with Varnish

Another great feature of Varnish is the ability to install various modules to enhance its functionality. One we needed to use in my work was throttling/rate-limiting, to thwart scrapers and protect against basic DoS attacks. This was achieved via the libvmod-throttle module (there are plenty of configuration options at the module’s main webpage). However, for me at least, installation wasn’t as straightforward as one would’ve hoped. For this part, I’ll go through the order I tried things in, the errors I received and their subsequent solutions.

Initially, I downloaded the package, unzipped it and read the README:

wget http://github.com/nand2/libvmod-throttle/archive/master.zip
unzip master.zip
cd libvmod-throttle-master/
sudo ./autogen.sh

(If you’re running Xubuntu as opposed to Ubuntu, you may get an error saying either automake or libtoolize is missing. For these you can simply sudo apt-get install automake or sudo apt-get install libtool (note, not “libtoolize”) and then you should be able to run autogen.sh.)

According to the README, you need to run a configure script, which also needs to be passed a directory containing the Varnish source. If you simply run ./configure, you’ll most likely get the following error:

configure: error: No Varnish source tree specified

So, to download the Varnish source and store it somewhere, I had to do the following (not all steps might be necessary):

sudo apt-get install dpkg-dev
sudo apt-get source varnish # creates a directory varnish-3.0.2 in the current directory
sudo mv varnish-3.0.2/ /usr/local/src/ # or wherever you want to move it to

Now that you have the Varnish source, you need to build it, otherwise you’ll get an error saying

Please build your varnish source directory

When I was building it, I got the error

No package 'libpcre' found

… so I had to install that too, which is included in the next step:

cd /usr/local/src/varnish-3.0.2
sudo ./autogen.sh
sudo apt-get install libpcre3 libpcre3-dev # may also need pkg-config
sudo ./configure
sudo make

Now we can try and install libvmod-throttle again (I had the source downloaded to my home directory, hence the first cd):

cd ~/libvmod-throttle-master/
sudo ./configure VARNISHSRC=/usr/local/src/varnish-3.0.2 VMODDIR=/var/lib/varnish
sudo make
sudo make install

Hopefully, that will all run smoothly and you’ll have successfully installed the module. Now, to enable it for Varnish, add the following to the top of your default.vcl:

# /etc/varnish/default.vcl
import throttle;

If you restart Varnish now and don’t get any errors (e.g. something along the lines of “unknown module ‘throttle’”, which is not the exact error message), you’ll know everything is installed correctly. To start throttling, add something like the following to the vcl_recv block in your default.vcl:

# /etc/varnish/default.vcl - in sub vcl_recv
    # is_allowed returns how long the client has to wait: 0s means "go ahead"
    if (throttle.is_allowed("ip:" + client.ip, "20req/30s") > 0s) {
        error 429 "Too many requests";
    }

More options can be found on the module’s github page.
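
To check the throttling is actually kicking in, you can fire a burst of requests at your site and tally up the status codes that come back. The following is only a sketch, with made-up function names (tally_statuses and hammer); it uses curl’s -w '%{http_code}' option to print just the status code of each response.

```shell
#!/bin/sh
# tally_statuses (hypothetical helper): read one HTTP status code per line
# on stdin and print "code count" pairs, sorted by code.
tally_statuses() {
    awk 'NF { count[$1]++ } END { for (code in count) print code, count[code] }' | sort
}

# hammer URL N (hypothetical helper): request URL N times in a row and
# tally the status codes returned.
hammer() {
    i=0
    while [ "$i" -lt "$2" ]; do
        curl -s -o /dev/null -w '%{http_code}\n' "$1"
        i=$((i + 1))
    done | tally_statuses
}
```

With the 20req/30s limit above, running something like hammer http://www.site1.com/ 25 should show some 429 responses alongside the 200s once the limit is exceeded. Obviously only do this against your own server!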

Getting Varnish running and testing it before restarting Apache

So, as I mentioned above, in a live environment you don’t want to be restarting your webserver that often, and ideally you’d like to know that things are going to work before you do the restart. To do this, I initially set up Varnish to listen on port 8080 (the -a option in /etc/default/varnish) and to pass requests to Apache on port 80 (the .port option in the backend default block in /etc/varnish/default.vcl), then accessed my website via http://www.mysite.com:8080. This way, you should still see the X-Varnish and Age headers, without disturbing Apache.

Conclusion

So, hopefully this guide will help you set up a Varnish server and configure it to work as you need. There are loads of different things Varnish can do and the documentation is pretty good.

Tags: apache varnish | david | 18th Jan, 2014 at 18:54pm | No Comments
