Cache Hierarchy Structures

Deciding what hierarchy structure to use is difficult. Not only that, but it's quite often very difficult to change, since you can have thousands of clients accessing you directly, and even more through your peers.

Here, I cover the most common cache-peer architectures. We start off with the most simple setup that's considered peering: two caches talking to one another as siblings. I am going to try and cover all the issues with this simple setup, and then move to larger cache meshes.

Two Peering Caches

We have two caches, your cache and their cache. You have a nice fast link between them, with low-latency, both caches have quite a lot of disk space, and they aren't going to be running into problems with load anytime soon. Let's look at the peering options you have:


The traditional cache hierarchy structure involves lots of small servers (with their own disk space, each holding the most common objects) which query another set of large parent servers (there can even be only one large server.) These large servers then query the outside on the client cache's behalf. The large servers keep a copy of the object so that other internal caches requesting the page get it from them. Generally, the little servers have a small amount of disk space, and are connected to the large servers by quite small lines.

This structure generally works well, as long as you can stop the top-level servers from becoming overloaded. If these machines have problems, all performance will suffer.

Client caches generally do not talk to one another at all. The parent cache server should have any object that the lower-down cache may have (since it fetched the object on behalf of the lower-down cache). It's invariably faster to communicate with the head-office (where the core servers would be situated) than another region (where another sibling cache is kept).

In this case, the smaller servers may as well treat the core servers as default parents, even using the no-query option, to reduce cache latency. If the head-office is unreachable it's quite likely that things may be unusable altogether (if, on the other hand, your regional offices have their own Internet lines, you can configure the cache as a normal parent: this way Squid will detect that the core servers are down, and try to go direct. If you each have your own Internet link, though, there may not be a reason to use a tree structure. You might want to look at the mesh section instead, which follows shortly.)

To avoid overloading one server, you can use the round-robin option on the cache_peer lines for each core server. This way, the load on each machine should be spread evenly.


Large hierarchies generally use either a tree structure, or they are true meshes. A true mesh considers all machines equal: there is no set of large root machines, mainly since they are almost all large machines. Multicast ICP and Cache digests allow large meshes to scale well, but some meshes have been around for a long time, and are only using vanilla ICP.

Cache digests seem to be the best for large mesh setups these days: they involve bulk data transfer, but as the average mesh size increases machines will have to be more and more powerful to deal with the number of queries coming in. Instead of trying to deal with so many small packets, it is almost certainly better to do a larger transfer every 10 minutes. This way, machines only have to check their local ram to see which machines have the object.

Pure multicast cache meshes are another alternative: unfortunately there are still many reply packets generated, but it still effectively halves the number of packets flung around the network.

Load Balancing Servers

Sometimes, a single server cannot handle the load required. DNS or CARP load balancing will allow you to split the load between two (or more) machines.

DNS load balancing is the simplest option: In your DNS file, you simply add two A records for the cache's hostname (you did use a hostname for the cache when you configured all those thousands of browsers like I told you, right?) The order that the DNS server returns the names in is continuously, randomly switched, and the client requesting the lookup will connect to a random server. These server machines can be setup to communicate with one-another as peers. By using the proxy-only option, you reduce duplication of objects between the machines, saving disk space (and, hopefully, increasing your hit rate.)

There are other load-balancing options. If you have client caches accessing the overloaded server (rather than client pcs), you can configure Squid on these machines with the round-robin option on the cache_peer lines. You could also use the CARP (Cache Array Routing Protocol) to split the load unevenly (if you have one very powerful machine and two less powerful machines, you can use CARP to load the fast cache twice as much as the other machines).