Cache Java webapps with Squid Reverse Proxy
This article shows you, step by step, how to cache your entire Tomcat web application with a Squid reverse proxy, without writing any Java code.
What is Squid
Squid is a free caching proxy server for HTTP, HTTPS and FTP. It saves bandwidth and improves response times by caching frequently requested web pages. Squid is commonly used as a forward proxy when users download pages from the internet, but it can also be used as a reverse proxy by placing it between the user and your webapp. All user requests hit Squid first. If the requested page is already in Squid’s cache, it is served directly from the cache without hitting your webapp. If it is not, Squid fetches the page from your web application and stores it in the cache for future requests.
Squid reduces hits to your server by caching response pages. You don’t have to build page-level caching into every application that you write; Squid takes care of that part.
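The hit/miss flow described above can be modeled in a few lines of Java. This is a minimal sketch, not how Squid is implemented: the class name ReverseProxyCacheSketch is hypothetical, the origin function stands in for your Tomcat webapp, and a single fixed time-to-live replaces Squid's per-URL refresh rules.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of a caching reverse proxy: a URL-keyed map with a fixed TTL.
// Real Squid also applies HTTP headers, refresh_pattern rules, and memory/disk limits.
final class ReverseProxyCacheSketch {
    private static final class Entry {
        final String body;
        final long storedAtMillis;

        Entry(String body, long storedAtMillis) {
            this.body = body;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, String> origin; // stands in for the Tomcat webapp
    private final long ttlMillis;
    int hits;
    int misses;

    ReverseProxyCacheSketch(Function<String, String> origin, long ttlMillis) {
        this.origin = origin;
        this.ttlMillis = ttlMillis;
    }

    // HIT: a fresh copy is cached, so the origin is not called.
    // MISS: fetch from the origin and store the response for later requests.
    String handle(String url, long nowMillis) {
        Entry cached = cache.get(url);
        if (cached != null && nowMillis - cached.storedAtMillis < ttlMillis) {
            hits++;
            return cached.body;
        }
        misses++;
        String body = origin.apply(url);
        cache.put(url, new Entry(body, nowMillis));
        return body;
    }

    public static void main(String[] args) {
        int[] originCalls = {0};
        ReverseProxyCacheSketch proxy = new ReverseProxyCacheSketch(url -> {
            originCalls[0]++;
            return "page for " + url;
        }, 60_000);
        proxy.handle("/news", 0);      // MISS: fetched from the webapp
        proxy.handle("/news", 10_000); // HIT: served from the cache
        proxy.handle("/news", 70_000); // MISS: the cached copy expired
        System.out.println(proxy.hits + " hit(s), " + proxy.misses + " miss(es), "
                + originCalls[0] + " origin call(s)");
    }
}
```

Running main prints "1 hit(s), 2 miss(es), 2 origin call(s)": only the first request and the request after expiry reach the webapp.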
When should I use Squid
Ideally you should use Squid for pages which have a high ratio of reads to writes. In other words, a page that changes less frequently but is accessed very often. Here are some scenarios:
- A dynamic web page which displays news, is updated once an hour, and receives hundreds of hits during that hour
- A static web page that is accessed frequently. Squid can give a performance boost by caching frequently accessed static pages in memory
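To put the news example into numbers: if each cached copy stays valid for the whole hour and every request in that hour is for the same URL, only the first request reaches the webapp. The estimate below is a back-of-the-envelope sketch under exactly those assumptions (HitRatioEstimate is a hypothetical name); real traffic spread over many URLs typically scores lower.

```java
// Back-of-the-envelope estimate, assuming every request in the window targets the same URL
// and the cached copy lives for the whole window: only the first request is a MISS.
final class HitRatioEstimate {
    static double expectedHitRatio(int requestsPerWindow) {
        if (requestsPerWindow <= 0) {
            throw new IllegalArgumentException("requestsPerWindow must be positive");
        }
        return (requestsPerWindow - 1) / (double) requestsPerWindow;
    }

    public static void main(String[] args) {
        // e.g. a news page updated hourly and requested 300 times per hour
        System.out.printf("hit ratio: %.1f%%%n", 100 * expectedHitRatio(300));
    }
}
```

For 300 requests per hour this gives a hit ratio of about 99.7%, so the webapp renders the page once instead of 300 times.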
When should I not use Squid
As a rule of thumb, if the request URL alone determines the response, you can safely use Squid. Avoid Squid in the following cases:
- The entire app is highly dynamic, and pages become stale as soon as they are generated.
- The app requires login. Squid is not suitable for such apps, which unfortunately covers a large number of applications. They need to rely on back-end caching instead, for example a caching framework such as Ehcache to cache reusable page fragments, database query results, and other performance bottlenecks.
- The app relies heavily on browser cookies. Squid uses URLs to cache pages, so if a page is computed from the URL plus cookies, you should not cache it in Squid.
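The cookie problem is easy to reproduce with a toy cache keyed only on the URL, mirroring the URL-based caching described above. In this hypothetical sketch (CookieKeyPitfall and render are illustrative names), a page greets the user named in a cookie; once the first response is cached, every later user gets that same greeting.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a URL-keyed cache never looks at the Cookie header,
// so a page personalised by a cookie gets served to the wrong user.
final class CookieKeyPitfall {
    private final Map<String, String> cache = new HashMap<>();

    // Stand-in for a webapp page whose content depends on a "user" cookie.
    static String render(URI url, String cookieUser) {
        return "Hello " + cookieUser + " at " + url.getPath();
    }

    // The cache key is the URL only; the cookie value is ignored.
    String fetch(URI url, String cookieUser) {
        return cache.computeIfAbsent(url.toString(), key -> render(url, cookieUser));
    }

    public static void main(String[] args) {
        CookieKeyPitfall proxy = new CookieKeyPitfall();
        URI home = URI.create("http://www.ashoklabs.com/home");
        System.out.println(proxy.fetch(home, "alice"));
        System.out.println(proxy.fetch(home, "bob")); // still greets alice: wrong page for bob
    }
}
```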
How does the overall setup work
Apache receives requests on port 80 and forwards them to Squid. Squid checks whether it already has a cached response for the request. If it does and the response has not expired, Squid returns the cached response and adds the following header to it:
X-Cache: HIT from www.ashoklabs.com
If the response is not in Squid’s cache, Squid forwards the request to Tomcat on port 8082, where Tomcat’s proxy connector is listening. Tomcat processes the request and sends the response back to Squid. Squid saves the response in its cache, unless caching is disabled for that URL, and returns it to Apache, which sends it back to the user.
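You can check which path a request took by looking for the X-Cache header in the response. The sketch below assumes Java 11+ (java.net.http.HttpClient); the class name XCacheCheck and the default URL http://localhost/ are placeholders, so pass your own site's URL as the first argument. Requesting a cacheable page twice should typically show a MISS followed by a HIT.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: request a page through the front end and report whether Squid served it from cache.
// Assumes Squid adds an "X-Cache: HIT from <host>" or "X-Cache: MISS from <host>" header.
final class XCacheCheck {
    // True only for header values of the form "HIT from ...".
    static boolean isHit(String xCacheValue) {
        return xCacheValue != null && xCacheValue.trim().startsWith("HIT");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = args.length > 0 ? args[0] : "http://localhost/"; // placeholder default
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        String xCache = response.headers().firstValue("X-Cache").orElse(null);
        System.out.println(url + " -> " + (xCache == null ? "no X-Cache header" : xCache)
                + (isHit(xCache) ? " (served from Squid's cache)" : ""));
    }
}
```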
What if I don’t want to use Apache
You don’t need Apache to use Squid. You can run Squid on port 80 (set http_port 80 in squid.conf) and point your users directly at it. In that case, skip Step 1 and jump directly to Step 2 below.
Step 1/3: Apache Httpd Config
If you are using Apache as a front end, you need to instruct Apache to forward requests to Squid at port 3128. See the following code snippet. Change the server name and paths to reflect your real values.
Apache config file:
/etc/httpd/conf/httpd.conf
ServerName www.ashoklabs.com
DocumentRoot /home/webadmin/www.ashoklabs.com/html
# forward requests to squid running on port 3128
ProxyPass / http://localhost:3128/
ProxyPassReverse / http://localhost:3128/
In addition to the above, you need mod_proxy and mod_proxy_http loaded. If you see the following lines in your httpd.conf, you probably already have them; if not, you first need to install and enable mod_proxy.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
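Before testing the whole chain, it can help to confirm that each tier is listening on the port this article uses: Apache on 80, Squid on 3128 and Tomcat's proxy connector on 8082. A minimal sketch that only attempts a TCP connection (PortCheck is a hypothetical name; a successful connect does not prove the right service is behind the port):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: try a TCP connection to each tier of the Apache -> Squid -> Tomcat chain.
// Ports follow this article; adjust them if your setup differs.
final class PortCheck {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unknown host
        }
    }

    public static void main(String[] args) {
        int[] ports = {80, 3128, 8082};
        String[] names = {"Apache", "Squid", "Tomcat proxy connector"};
        for (int i = 0; i < ports.length; i++) {
            System.out.printf("%-24s port %-5d %s%n", names[i], ports[i],
                    isListening("localhost", ports[i], 1000) ? "listening" : "NOT reachable");
        }
    }
}
```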
Step 2/3: Squid Config
First make sure that Squid is installed on your server. You can download Squid from the Squid project website.
On Linux/Unix, the Squid config file is usually located at:
/etc/squid/squid.conf
The config file is pretty long. Follow these instructions and set the values appropriately.
# leave the port at 3128
http_port 3128
# how much memory cache do you want? It depends on how much memory the machine has.
cache_mem 200 MB
# the biggest object that you want stored in memory. If your home page is 100 KB and
# you want it stored in memory, you may set this to a value bigger than that.
maximum_object_size_in_memory 100 KB
# how much disk cache do you want. It is 6400 MB in the following example, change it as per
# your needs. Make sure you have that much disk space free.
cache_dir ufs /var/spool/squid 6400 16 256
# this is probably the most important config section. Here you can configure the cache life for
# each URL pattern.
# Time is in minutes
# 1 day = 1440, 2 days = 2880, 7 days = 10080, 28 days = 40320
# do not cache url1
refresh_pattern ^http://127.0.0.1:8082/url1/ 0 20% 0
# cache url2 for 1 day
refresh_pattern ^http://127.0.0.1:8082/url2/ 1440 20% 1440 override-expire override-lastmod reload-into-ims ignore-reload
# cache css for 7 days
refresh_pattern ^http://127.0.0.1:8082/css 10080 20% 10080 override-expire override-lastmod reload-into-ims ignore-reload
# by default cache the whole website for 1 minute
refresh_pattern ^http://127.0.0.1:8082/ 1 20% 1 override-expire override-lastmod reload-into-ims ignore-reload
# how long errors, for example 404s and HTTP 500 errors, should be cached for
negative_ttl 0 seconds
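# Note: the httpd_accel_* directives below only exist in Squid 2.5 and earlier. Squid 2.6 and
# later replaced them with accelerator options on http_port plus a cache_peer line, e.g.:
#   http_port 3128 accel defaultsite=www.ashoklabs.com
#   cache_peer 127.0.0.1 parent 8082 0 no-query originserver
# (the refresh_pattern regexes may then need to match the public site URL instead)
# the host on which Tomcat runs. Set 127.0.0.1 for localhost
httpd_accel_host 127.0.0.1
# this is the proxy port as defined in Tomcat's server.xml. By default it is "8082"
httpd_accel_port 8082
# set this to "on". Read the Squid documentation before changing it.
httpd_accel_single_host on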
# To access Squid stats via the manager interface, you need to enter a password here
cachemgr_passwd your_clear_text_password all
# Set this to "off" if you want query strings to appear in the Squid logs.
strip_query_terms off
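Because the refresh_pattern rules carry most of the caching logic, here is a simplified Java model of them. It encodes two assumptions: rules are checked in the order listed and the first matching regex wins (which is why the catch-all line comes last), and freshness follows an approximation of the documented min/percent/max heuristic with override-expire and override-lastmod in effect. RefreshPatternSketch and its methods are hypothetical, and Squid's real algorithm handles more cases (Expires headers, client reloads, and so on).

```java
import java.util.List;
import java.util.regex.Pattern;

// Simplified, hypothetical model of Squid's refresh_pattern handling; not Squid's actual code.
// 1) Rules are tried in order and the first regex that matches the URL is used.
// 2) Freshness approximates the min/percent/max heuristic as it behaves with
//    override-expire and override-lastmod (origin headers cannot shorten min).
final class RefreshPatternSketch {
    static final class Rule {
        final Pattern regex;
        final long minMinutes;
        final double percent; // 20% is stored as 0.20
        final long maxMinutes;

        Rule(String regex, long minMinutes, int percent, long maxMinutes) {
            this.regex = Pattern.compile(regex);
            this.minMinutes = minMinutes;
            this.percent = percent / 100.0;
            this.maxMinutes = maxMinutes;
        }
    }

    // Same URL patterns as the squid.conf above; the catch-all uses the intended 1-minute lifetime.
    static final List<Rule> RULES = List.of(
            new Rule("^http://127.0.0.1:8082/url1/", 0, 20, 0),
            new Rule("^http://127.0.0.1:8082/url2/", 1440, 20, 1440),
            new Rule("^http://127.0.0.1:8082/css", 10080, 20, 10080),
            new Rule("^http://127.0.0.1:8082/", 1, 20, 1));

    // First match wins, so the catch-all pattern must come last.
    static Rule match(String url) {
        for (Rule rule : RULES) {
            if (rule.regex.matcher(url).find()) {
                return rule;
            }
        }
        return null; // no rule matched
    }

    // ageMinutes: time since the response was cached.
    // lmAgeMinutes: age of the page (Date minus Last-Modified) when cached, or -1 if unknown.
    static boolean isFresh(Rule rule, long ageMinutes, long lmAgeMinutes) {
        if (ageMinutes > rule.maxMinutes) {
            return false; // never fresh beyond max
        }
        if (ageMinutes < rule.minMinutes) {
            return true; // always fresh below min
        }
        if (lmAgeMinutes > 0) {
            // LM-factor: fresh while age is less than percent of the page's age when cached
            return (double) ageMinutes / lmAgeMinutes < rule.percent;
        }
        return false;
    }

    public static void main(String[] args) {
        String url = "http://127.0.0.1:8082/url2/report";
        Rule rule = match(url);
        System.out.println(url + " fresh after 12 hours? " + isFresh(rule, 720, -1));
        System.out.println(url + " fresh after 25 hours? " + isFresh(rule, 1500, -1));
    }
}
```

Here main reports that a /url2/ page is still fresh after 12 hours but stale after 25 hours, matching the 1-day lifetime configured for url2.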
Step 3/3: Tomcat Config
Make sure that the HTTP proxy connector is defined in TOMCAT_HOME/conf/server.xml:
<Connector port="8082"
           maxThreads="50" minSpareThreads="5" maxSpareThreads="10"
           enableLookups="false" acceptCount="100" connectionTimeout="20000"
           proxyName="www.ashoklabs.com" proxyPort="80"
           compression="on"
           compressableMimeType="text/html,text/xml,text/css,text/javascript,text/plain"
           disableUploadTimeout="true" />
If needed, see the Tomcat documentation on the HTTP connector’s proxy support.
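The proxyName and proxyPort attributes matter because Tomcat returns them from request.getServerName() and request.getServerPort() for requests arriving on this connector. Without them, those calls reflect the Host header Tomcat receives, which behind Apache and Squid may be an internal address such as 127.0.0.1:8082, and redirects or absolute links built from them would point there. The plain-Java sketch below uses a hypothetical helper, not the servlet API, to show the URL a webapp would typically build in each case:

```java
// Hypothetical helper mimicking how a webapp often builds absolute URLs from
// request.getScheme(), getServerName() and getServerPort().
final class AbsoluteUrlSketch {
    static String absoluteUrl(String scheme, String serverName, int serverPort, String path) {
        // Omit the port when it is the scheme's default, as browsers do.
        boolean defaultPort = ("http".equals(scheme) && serverPort == 80)
                || ("https".equals(scheme) && serverPort == 443);
        return scheme + "://" + serverName + (defaultPort ? "" : ":" + serverPort) + path;
    }

    public static void main(String[] args) {
        // Values that may reach Tomcat without proxyName/proxyPort: an internal address.
        System.out.println(absoluteUrl("http", "127.0.0.1", 8082, "/login"));
        // With proxyName="www.ashoklabs.com" and proxyPort="80": the public site.
        System.out.println(absoluteUrl("http", "www.ashoklabs.com", 80, "/login"));
    }
}
```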
Squid Manager Interface
You can access Squid’s config and stats via the Squid Manager HTTP interface. Make sure that the “cachemgr.cgi” file which ships with Squid is in your cgi-bin directory; see the Squid documentation for details on setting it up.
Once you’ve set it up, you can access the cache manager via this URL:
http:///cgi-bin/cachemgr.cgi
To continue, enter the following values:
Cache host: localhost
Cache port: 3128
Manager name: manager
Password: the password you set with cachemgr_passwd in squid.conf
- Store Directory Stats shows how much disk space the disk cache uses.
- Cache Client List shows the cache HIT/MISS ratio as a percentage. Monitor it regularly and tune your cache for a higher hit percentage.
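Besides the manager interface, you can estimate the hit ratio from Squid's access log. This sketch assumes Squid's native log format, where the fourth field is the result code such as TCP_MISS/200 or TCP_MEM_HIT/200, and it simply treats any result code containing HIT as a hit. The default log path /var/log/squid/access.log is an assumption; check the access_log setting in your squid.conf. AccessLogHitRatio is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch: estimate the cache hit ratio from Squid's access.log (native format).
// Assumption: any result code containing "HIT" counts as a hit; everything else as a miss.
final class AccessLogHitRatio {
    static double hitRatio(List<String> lines) {
        long hits = 0;
        long total = 0;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // skip blank or malformed lines
            }
            String resultCode = fields[3].split("/")[0]; // e.g. "TCP_MEM_HIT" from "TCP_MEM_HIT/200"
            total++;
            if (resultCode.contains("HIT")) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) throws IOException {
        Path log = Path.of(args.length > 0 ? args[0] : "/var/log/squid/access.log");
        System.out.printf("hit ratio: %.1f%%%n", 100 * hitRatio(Files.readAllLines(log)));
    }
}
```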
Reload Squid Config without restarting
Edit the squid config using “vi” or your favorite editor.
vi /etc/squid/squid.conf
Once you are done editing, reload the new config without restarting Squid.
/usr/sbin/squid -k reconfigure
Clearing Squid Cache
To clear Squid cache:
1) Set the memory cache to a small value, such as 8 MB:
cache_mem 8 MB
2) Set the disk cache to a small value, such as 20 MB. The disk cache must be larger than the memory cache.
cache_dir ufs /var/spool/squid 20 16 256
3) Reload squid config without restart as described in the previous section
4) You may need to wait a few hours for the cache to clear. Once it is clear, restore the previous cache sizes and reload the config again. You can monitor the cache size through the Squid Manager HTTP interface.
Bypassing Squid
If for some reason you need to bypass Squid, reconfigure Apache to directly send requests to Tomcat. Edit the Apache config file /etc/httpd/conf/httpd.conf
# forward requests directly to Tomcat's proxy connector running on port 8082
ProxyPass / http://localhost:8082/
ProxyPassReverse / http://localhost:8082/
You will need to restart Apache after making this change.
/etc/init.d/httpd restart
Conclusion
Squid is a very powerful caching tool, but it is not suitable for every application. Examine the needs of your application and use Squid where it fits. I’ve used Squid for several years to cache the output of a Java data mashup application and am very satisfied with its ease of use and benefits. I hope you found this tutorial useful. Feel free to post a comment or share your experience with Squid.