Wednesday, July 28, 2010

Cache Java webapps with Squid Reverse Proxy



This article shows you, step by step, how to cache your entire Tomcat web application with a Squid reverse proxy without writing any Java code.

What is Squid

Squid is a free proxy server for HTTP, HTTPS and FTP that saves bandwidth and improves response times by caching frequently requested web pages. Squid is often used as a forward proxy when users download pages from the internet, but it can also be used as a reverse proxy by placing it between your users and your webapp. All user requests first hit Squid. If the requested page is already in Squid’s cache, it is served directly from the cache without hitting your webapp. If it is not, Squid fetches it from your web application and stores it in the cache for future requests.

Squid reduces hits to your server by caching response pages. You don’t have to build page-level caching into every application that you write; Squid takes care of that part.
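The setup in this article needs no Java code, but the request flow just described can be modelled in a few lines of Java to make the HIT/MISS logic concrete. This is a toy sketch, not Squid's implementation: the class name, the fixed TTL, and the `fetchFromWebapp` function standing in for Tomcat are all hypothetical, and real Squid also honours HTTP headers, uses disk storage and evicts old objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of reverse-proxy caching keyed on the request URL (hypothetical, not Squid code).
class ReverseProxyCacheSketch {
    // A cached response body and the time (ms) at which it was stored.
    private record Entry(String body, long storedAtMillis) {}

    private final Map<String, Entry> cache = new HashMap<>();
    private final long ttlMillis;
    private final Function<String, String> fetchFromWebapp; // stands in for Tomcat
    private int backendCalls = 0;

    ReverseProxyCacheSketch(long ttlMillis, Function<String, String> fetchFromWebapp) {
        this.ttlMillis = ttlMillis;
        this.fetchFromWebapp = fetchFromWebapp;
    }

    // Returns "HIT " or "MISS " plus the body, loosely mimicking Squid's X-Cache header.
    String handle(String url, long nowMillis) {
        Entry e = cache.get(url);
        if (e != null && nowMillis - e.storedAtMillis() < ttlMillis) {
            return "HIT " + e.body();               // served from cache, webapp not called
        }
        backendCalls++;
        String body = fetchFromWebapp.apply(url);  // miss or expired: ask the webapp
        cache.put(url, new Entry(body, nowMillis)); // store for future requests
        return "MISS " + body;
    }

    int backendCalls() { return backendCalls; }

    public static void main(String[] args) {
        ReverseProxyCacheSketch proxy =
            new ReverseProxyCacheSketch(60_000, url -> "page for " + url);
        System.out.println(proxy.handle("/news", 0));      // MISS page for /news
        System.out.println(proxy.handle("/news", 30_000)); // HIT page for /news
        System.out.println(proxy.handle("/news", 90_000)); // MISS page for /news (expired)
    }
}
```

Three requests for the same URL reach the webapp only twice: once on the first request and once after the one-minute lifetime expires.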

When should I use Squid

Ideally, you should use Squid for pages with a high ratio of reads to writes, in other words, pages that change infrequently but are accessed very often. Here are some scenarios:

  • A dynamic web page that displays news, is updated once an hour, and receives hundreds of hits during that hour
  • A static web page accessed frequently. Squid can give a performance boost by caching frequently accessed static web pages in memory
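A rough worked example for the news-page scenario: assume, purely for illustration, N = 300 hits during the hour and a Squid cache lifetime of one hour that matches the update interval. Roughly one request per hour then reaches Tomcat, and the rest are cache hits:

$$
\text{hit ratio} \approx \frac{N - 1}{N} = \frac{300 - 1}{300} \approx 99.7\%
$$

In other words, Tomcat renders the page about once an hour instead of 300 times. The trade-off is that a visitor may see a copy that is up to an hour old.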

When should I not use Squid

In most cases, if the request URL is the only factor that determines the response, you can safely use Squid. Here are some more specific cases where you should not:

  • Apps that are highly dynamic overall, where pages become out of date almost as soon as they are generated.
  • Apps that require login. Unfortunately, this covers a large number of applications. Such applications need to resort to back-end caching instead, for example using a caching framework like Ehcache to cache reusable page fragments, database queries, and other performance bottlenecks.
  • Apps that rely heavily on browser cookies. Squid relies on URLs to cache pages, so if a page is computed from the URL plus cookies, you should not cache it in Squid.
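The cookie problem is easy to see in a toy model of a cache whose only key is the URL. The sketch below is hypothetical: the `user` cookie, `renderPage()` and the class name are made up for illustration, and it is not how Squid is implemented internally.

```java
import java.util.HashMap;
import java.util.Map;

// Shows why a cache keyed only on the URL must not store pages that depend on cookies.
// Hypothetical example; renderPage() and the "user" cookie value are illustrative only.
class CookieCachePitfall {
    private final Map<String, String> cacheByUrl = new HashMap<>();

    // The "webapp": the response depends on the URL AND the user's cookie.
    static String renderPage(String url, String userCookie) {
        return "Hello " + userCookie + " at " + url;
    }

    // A URL-keyed cache in front of the webapp: the cookie is NOT part of the cache key.
    String request(String url, String userCookie) {
        return cacheByUrl.computeIfAbsent(url, u -> renderPage(u, userCookie));
    }

    public static void main(String[] args) {
        CookieCachePitfall proxy = new CookieCachePitfall();
        System.out.println(proxy.request("/account", "alice")); // Hello alice at /account
        System.out.println(proxy.request("/account", "bob"));   // Hello alice at /account (wrong user!)
    }
}
```

Bob receives the page rendered for Alice, because the second request has the same URL and is answered from the cache.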

How does the overall setup work


[Figure: Apache Squid Tomcat architecture]


Apache receives requests on port 80 and forwards them to Squid. Squid checks its cache to see if it already has a response for the request. If it does and the response has not expired, Squid returns the cached response and adds the following header to it:

X-Cache: HIT from www.ashoklabs.com

If the response is not in Squid’s cache, Squid calls Tomcat on port 8082, where Tomcat’s proxy connector is listening. Tomcat processes the request and sends the response back to Squid. Squid saves the response in its cache, unless caching is disabled for that URL, and returns it to Apache, which sends it back to the user.
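To check whether a particular page is actually being served from the cache, look for the X-Cache header in the response. The small Java 11+ program below is one way to do that; the class name and the HIT/MISS/UNKNOWN classification are illustrative, not part of Squid or Tomcat.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

// Reads Squid's X-Cache response header to tell whether a page came from the cache.
class XCacheCheck {
    // Classifies a header value such as "HIT from www.ashoklabs.com".
    // A missing header (for example when Squid is bypassed) gives "UNKNOWN".
    static String classify(Optional<String> xCache) {
        if (xCache.isEmpty()) return "UNKNOWN";
        String value = xCache.get().trim();
        if (value.startsWith("HIT")) return "HIT";
        if (value.startsWith("MISS")) return "MISS";
        return "UNKNOWN";
    }

    // Fetches the URL (body discarded) and classifies its X-Cache header.
    static String check(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return classify(response.headers().firstValue("X-Cache"));
    }

    public static void main(String[] args) throws Exception {
        // Pass the page to test, e.g. http://www.ashoklabs.com/
        System.out.println(check(args[0]));
    }
}
```

Requesting the same cacheable URL twice in a row should typically print MISS and then HIT.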

What if I don’t want to use Apache

Apache is not required to use Squid. You can run Squid on port 80 (set http_port 80 instead of 3128 in squid.conf) and point your users directly at it. In that case, skip Step 1 and jump directly to Step 2 below.

Step 1/3: Apache Httpd Config

If you are using Apache as a front end, you need to instruct Apache to forward requests to Squid at port 3128. See the following code snippet. Change the server name and paths to reflect your real values.

Apache config file:

/etc/httpd/conf/httpd.conf

ServerName www.ashoklabs.com
DocumentRoot /home/webadmin/www.ashoklabs.com/html
# forward requests to squid running on port 3128
ProxyPass / http://localhost:3128/
ProxyPassReverse / http://localhost:3128/

In addition to the above, you also need mod_proxy and mod_proxy_http loaded. If you see the following lines in your httpd.conf, they are probably already installed; if not, you first need to install mod_proxy.

LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
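What the ProxyPass and ProxyPassReverse lines above do can be modelled as simple prefix rewriting: incoming paths are mapped onto Squid's URL, and redirects coming back from the backend are rewritten to the public host. This Java sketch is a simplification for explanation only (mod_proxy also handles headers, encoding and much more), and the class and method names are hypothetical.

```java
// Simplified model of:
//   ProxyPass        / http://localhost:3128/
//   ProxyPassReverse / http://localhost:3128/
// Hypothetical illustration, not Apache's mod_proxy code.
class ProxyPassSketch {
    static final String PATH_PREFIX = "/";
    static final String BACKEND = "http://localhost:3128/";

    // ProxyPass: map an incoming request path onto the backend (Squid) URL.
    static String forward(String requestPath) {
        if (!requestPath.startsWith(PATH_PREFIX)) {
            return null; // paths outside the prefix are not proxied
        }
        return BACKEND + requestPath.substring(PATH_PREFIX.length());
    }

    // ProxyPassReverse: rewrite a backend redirect (Location header) so the browser
    // is sent to the public site instead of localhost:3128.
    static String reverse(String location, String publicBase) {
        if (location.startsWith(BACKEND)) {
            return publicBase + PATH_PREFIX + location.substring(BACKEND.length());
        }
        return location; // redirects to other hosts are left alone
    }

    public static void main(String[] args) {
        System.out.println(forward("/news/today.html"));
        // http://localhost:3128/news/today.html
        System.out.println(reverse("http://localhost:3128/login", "http://www.ashoklabs.com"));
        // http://www.ashoklabs.com/login
    }
}
```

Without the reverse rewrite, a redirect issued behind the proxy could send the browser to localhost:3128, which is not reachable from outside.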

Step 2/3: Squid Config

First, make sure that Squid is installed on your server. You can download Squid from the official Squid website (squid-cache.org).

On Linux/Unix, the Squid config file is usually located at:

/etc/squid/squid.conf

The config file is pretty long. Follow these instructions and set the values appropriately.

# leave the port at 3128
http_port 3128

# how much memory cache do you want? This depends on how much memory the machine has.
cache_mem 200 MB

# the biggest page that you want stored in memory. If your home page is 100 KB and
# you want it stored in memory, set this to a value at least that big.
maximum_object_size_in_memory 100 KB

# how much disk cache do you want? It is 6400 MB in the following example; change it as per
# your needs. Make sure you have that much free disk space.
cache_dir ufs /var/spool/squid 6400 16 256

# this is probably the most important config section. Here you can configure the cache life for
# each URL pattern.

# Time is in minutes
# 1 day = 1440, 2 days = 2880, 7 days = 10080, 28 days = 40320

# effectively do not cache url1: with min and max set to 0, responses are always stale and revalidated
refresh_pattern ^http://127.0.0.1:8082/url1/ 0 20% 0

# cache url2 for 1 day
refresh_pattern ^http://127.0.0.1:8082/url2/ 1440 20% 1440 override-expire override-lastmod reload-into-ims ignore-reload

# cache css for 7 days
refresh_pattern ^http://127.0.0.1:8082/css 10080 20% 10080 override-expire override-lastmod reload-into-ims ignore-reload

# by default cache the whole website for 1 minute
refresh_pattern ^http://127.0.0.1:8082/ 1 20% 1 override-expire override-lastmod reload-into-ims ignore-reload

# how long errors, for example 404s and HTTP 500 errors, should be cached
negative_ttl 0 seconds

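# the host Tomcat runs on. Use 127.0.0.1 for localhost.
# Note: the httpd_accel_* directives exist only in Squid 2.5 and earlier. Squid 2.6 and later
# replace them with "http_port ... accel" plus a cache_peer line pointing at Tomcat.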
httpd_accel_host 127.0.0.1

# this is the proxy port as defined in Tomcat server.xml. By default it is "8082"
httpd_accel_port 8082

# set this to "on". Read more documentation if you want to change this.
httpd_accel_single_host on

# To access Squid stats via the manager interface, you need to enter a password here
cachemgr_passwd your_clear_text_password all

# Say "off" if you want the query string to appear in the squid logs.
strip_query_terms off
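The "min percent max" triple in each refresh_pattern line decides whether a cached object is still fresh. The Java sketch below is a rough, simplified model of that decision, not Squid's source: it assumes the override-expire and override-lastmod options used above (so Expires headers are ignored and min is checked before the last-modified factor), ignores Cache-Control and client reloads, and its method names are hypothetical. It also includes the days-to-minutes conversion used in the comments.

```java
// Simplified model of a refresh_pattern rule "min percent% max" (all times in minutes).
// Assumes override-expire and override-lastmod; hypothetical helper, not Squid code.
class RefreshPatternSketch {
    // refresh_pattern times are in minutes: 1 day = 1440, 7 days = 10080, 28 days = 40320.
    static long daysToMinutes(long days) {
        return days * 24 * 60;
    }

    /**
     * @param ageMin        minutes since Squid stored the object
     * @param lastModAgeMin minutes between Last-Modified and the time it was stored, or -1 if unknown
     */
    static boolean isFresh(long ageMin, long lastModAgeMin, long min, double percent, long max) {
        if (ageMin > max) return false; // older than max: always stale
        if (ageMin <= min) return true; // not older than min: always fresh
        if (lastModAgeMin > 0) {
            // lm-factor: time in cache relative to how old the object was when fetched
            double lmFactor = (double) ageMin / lastModAgeMin;
            return lmFactor < percent / 100.0;
        }
        return false; // no Last-Modified information: treat as stale
    }

    public static void main(String[] args) {
        long day = daysToMinutes(1); // 1440, as in the url2 rule
        System.out.println(isFresh(1000, -1, day, 20, day)); // true: younger than 1 day
        System.out.println(isFresh(1500, -1, day, 20, day)); // false: older than max
        System.out.println(isFresh(5, -1, 0, 20, 0));        // false: url1 rule, always stale
    }
}
```

Under this model, the url2 rule keeps a page fresh for exactly one day, while the url1 rule makes every cached copy stale immediately.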

Step 3/3: Tomcat Config

Make sure that the HTTP Proxy Connector is defined in TOMCAT_HOME/conf/server.xml.

<Connector port="8082"
           maxThreads="50" minSpareThreads="5" maxSpareThreads="10"
           enableLookups="false" acceptCount="100" connectionTimeout="20000"
           proxyName="www.ashoklabs.com"
           compressableMimeType="text/html,text/xml,text/css,text/javascript,text/plain" compression="on"
           proxyPort="80" disableUploadTimeout="true" />

If needed, see the Tomcat documentation on the proxy connector, in particular the proxyName and proxyPort attributes.
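Why proxyName and proxyPort matter: Tomcat returns them from request.getServerName() and request.getServerPort(), so absolute URLs that the webapp builds point at the public site. Without them, such links may expose an internal address such as 127.0.0.1:8082 that sits behind Squid. The helper below is a hypothetical sketch of how a webapp might assemble an absolute URL from those values; it is not a Tomcat or Servlet API method.

```java
// Hypothetical helper modelled on how a webapp typically builds an absolute URL
// from the scheme, server name and server port reported by the container.
class ProxyNameSketch {
    static String absoluteUrl(String scheme, String serverName, int serverPort, String path) {
        // Omit the port when it is the default for the scheme.
        boolean defaultPort = ("http".equals(scheme) && serverPort == 80)
                || ("https".equals(scheme) && serverPort == 443);
        return scheme + "://" + serverName + (defaultPort ? "" : ":" + serverPort) + path;
    }

    public static void main(String[] args) {
        // Internal address leaking into links when proxyName/proxyPort are not set:
        System.out.println(absoluteUrl("http", "127.0.0.1", 8082, "/news"));
        // http://127.0.0.1:8082/news
        // With proxyName="www.ashoklabs.com" proxyPort="80":
        System.out.println(absoluteUrl("http", "www.ashoklabs.com", 80, "/news"));
        // http://www.ashoklabs.com/news
    }
}
```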

Squid Manager Interface

You can view Squid’s configuration and statistics through the Squid Manager HTTP interface. Make sure that the “cachemgr.cgi” file that ships with Squid is in your cgi-bin directory; see the Squid documentation for details on setting this up.

Once you’ve set it up, you can access the cache manager via this URL:

http:///cgi-bin/cachemgr.cgi

To continue, enter the following values:

Cache host: localhost
Cache port: 3128
Manager name: manager
Password: the cachemgr_passwd value you set in squid.conf

Two reports are especially useful:
  • Store Directory Stats shows how much disk space the disk cache uses.
  • Cache Client List shows the cache HIT/MISS ratio as a percentage. Monitor this frequently and tune your cache to get a higher hit percentage.
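Besides the Cache Client List page, you can get a rough HIT % straight from Squid’s access log. The Java sketch below is illustrative only: it assumes Squid’s default native log format, where the fourth field is RESULT_CODE/STATUS (for example TCP_MEM_HIT/200), treats every result code containing "HIT" as a hit, and uses /var/log/squid/access.log as a typical default path that may differ on your system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Rough HIT % from a Squid native-format access.log (assumptions described above).
class AccessLogHitRatio {
    static double hitPercent(List<String> lines) {
        int total = 0, hits = 0;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) continue;             // skip blank or malformed lines
            String resultCode = fields[3].split("/")[0]; // e.g. TCP_MEM_HIT
            total++;
            if (resultCode.contains("HIT")) hits++;
        }
        return total == 0 ? 0.0 : 100.0 * hits / total;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass a different path as the first argument if needed.
        Path log = Path.of(args.length > 0 ? args[0] : "/var/log/squid/access.log");
        System.out.printf("HIT %%: %.1f%n", hitPercent(Files.readAllLines(log)));
    }
}
```

Because revalidated hits and cached errors also contain "HIT" in their result codes, treat the number as an approximation to compare against the cache manager report.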

Reload Squid Config without restarting

Edit the squid config using “vi” or your favorite editor.

vi /etc/squid/squid.conf

Once you are done editing, reload the new config without restarting Squid.

/usr/sbin/squid -k reconfigure

Clearing Squid Cache

To clear Squid cache:

1) Set the memory cache to 8 MB (or a lower number)

cache_mem 8 MB

2) Set the disk cache to 20 MB (or a lower number). The disk cache must be larger than the memory cache.

cache_dir ufs /var/spool/squid 20 16 256

3) Reload squid config without restart as described in the previous section

4) You may need to wait a few hours for the cache to get cleared. Once the cache is clear, you may restore the previous cache sizes and reload the new config again. You can monitor the cache size through the Squid Manager HTTP interface.

Bypassing Squid

If for some reason you need to bypass Squid, reconfigure Apache to send requests directly to Tomcat. Edit the Apache config file /etc/httpd/conf/httpd.conf:

# forward requests directly to Tomcat's proxy connector running on port 8082
ProxyPass / http://localhost:8082/
ProxyPassReverse / http://localhost:8082/

You will need to restart Apache after making this change.

/etc/init.d/httpd restart

Conclusion

Squid is a very powerful caching tool, but it is not suitable for every application. Examine your application’s needs and use Squid where it fits. I’ve used Squid for several years to cache the output of a Java data mashup application and am very satisfied with its ease of use and benefits. I hope you found this tutorial useful. Feel free to post a comment or share your experience with Squid.

