When BSD meet Linux and Windows: What is Reverse Proxy Cache?

Reverse proxy cache, also known as Web Server Acceleration, is a method of reducing the load on a busy web server by using a web cache between the server and the internet. Another benefit that can be gained is improved security. It's one of many ways to improve scalability without increasing the complexity of maintenance too much. A good use of a reverse proxy is to ease the burden on a web server that provides both static and dynamic content. The static content can be cached on the reverse proxy while the web server will be freed up to better handle the dynamic content.

By deploying a reverse proxy server alongside its web servers, a site can:

• Avoid the capital expense of purchasing additional web servers by increasing the capacity of existing servers.
• Serve more requests for static content from web servers.
• Serve more requests for dynamic content from web servers.
• Increase profitability of the business by reducing operating expenses including the cost of bandwidth required to serve content.
• Accelerate web response and page download times for end users, delivering a faster, better experience to site visitors.

When planning a reverse proxy implementation, the origin server's content should be written with the proxy server in mind, i.e. it should be "cache friendly". If the origin server's content is not "cache aware", it will not be able to take full advantage of the reverse proxy cache.

In reverse proxy mode, the proxy server functions more like a web server with respect to the clients it serves. Unlike internal clients, external clients are not reconfigured to access the proxy server. Instead, the site URL routes the client to the proxy as if it were a web server. Replicated content is delivered from the proxy cache to the external client without exposing the origin server or the private network residing safely behind the firewall. Multiple reverse proxy servers can be used to balance the load on an overtaxed web server in much the same way.

The objective of this white paper is to explain the implementation of Squid as a reverse proxy, also known as a web server accelerator. The basic concept of caching is explained, followed by the actual implementation and testing of the reverse proxy mode of Squid.
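To make the "cache friendly" requirement above a little more concrete, here is a minimal Python sketch of the headers an origin server might attach to a static object so that a reverse proxy is allowed to keep it. The function name `cache_friendly_headers` and its parameters are made up for illustration; this is not part of Squid or of any particular web server.

```python
from email.utils import formatdate


def cache_friendly_headers(last_modified_epoch, max_age_seconds, now_epoch):
    """Build response headers that invite a shared cache to store the object.

    All three arguments are plain numbers (seconds); the names are
    illustrative assumptions, not an existing API.
    """
    return {
        # When the object last changed, as an HTTP date in GMT
        "Last-Modified": formatdate(last_modified_epoch, usegmt=True),
        # Absolute time after which the cached copy should be dropped
        "Expires": formatdate(now_epoch + max_age_seconds, usegmt=True),
        # "public" allows shared caches such as a reverse proxy to store it
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
```

An origin that sends none of these headers, or that marks everything as private, leaves the reverse proxy with little it can safely keep.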

Squid is an open source, high-performance proxy caching server designed to run on Unix systems. The Squid project has been funded by the National Science Foundation, and Squid is deployed at numerous ISPs and corporations around the globe. Squid can do much more than most other proxy servers.

Reverse Proxy compared with other Proxy caches

There are three main ways that proxy caches can be configured on a network:

Standard Proxy Cache

A standard proxy cache is used to cache static web pages (HTML and images) on a machine on the local network. When the page is requested a second time, the browser receives the data from the local proxy instead of the origin web server. The browser is explicitly configured to direct all HTTP requests to the proxy cache, rather than to the target web server. The cache then either satisfies the request itself or passes the request on to the target server.

Transparent Cache

A transparent cache achieves the same goal as a standard proxy cache, but operates transparently to the browser. The browser does not need to be explicitly configured to access the cache. Instead, the transparent cache intercepts network traffic, filters HTTP traffic (on port 80), and handles the request if the item is in the cache. If the item is not in the cache, the packets are forwarded to the origin web server. For Linux, the transparent cache uses iptables or ipchains to intercept and filter the network traffic. Transparent caches are especially useful to ISPs, because they require no browser setup modification. Transparent caches are also the simplest way to use a cache internally on a network (at peering hand off points between an ISP and a larger network, for example), because they don't require explicit coordination with other caches.

Reverse Proxy Cache

A reverse proxy cache differs from standard and transparent caches in that it reduces load on the origin web server, rather than reducing upstream network bandwidth on the client side. Reverse proxy caches offload client requests for static content from the web server, preventing unforeseen traffic surges from overloading the origin server. The proxy server sits between the Internet and the web site and handles all traffic before it can reach the web server. A reverse proxy server intercepts requests to the web server and, where it can, answers them from a store of cached pages. This improves performance by reducing the number of pages actually created "fresh" by the web server.

How reverse proxy caches work

When a client browser makes an HTTP request, DNS resolves the site's hostname to the reverse proxy machine, not the actual web server, so the request arrives at the proxy. The reverse proxy checks its cache to see if it contains the requested item. If not, it connects to the real web server and downloads the requested item to its disk cache. The reverse proxy can serve only cacheable URLs (such as HTML pages and images) from its cache.
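The lookup described above can be sketched in a few lines of Python. This is only an illustration of the hit/miss logic, not how Squid is implemented: the class `ReverseProxyCache`, the `demo_origin` function, and the rule that `.html` and `.png` URLs are static are all assumptions made for the example, and an in-memory dictionary stands in for the disk cache.

```python
class ReverseProxyCache:
    """Toy reverse proxy: answer from the cache, else ask the origin server."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin(url) must return a (body, cacheable) tuple
        self._fetch = fetch_from_origin
        self._store = {}  # stands in for the on-disk cache
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            # Cache hit: the origin server is not contacted at all
            self.hits += 1
            return self._store[url]
        # Cache miss: fetch from the real web server
        self.misses += 1
        body, cacheable = self._fetch(url)
        if cacheable:
            self._store[url] = body  # keep it for later requests
        return body


def demo_origin(url):
    """Hypothetical origin server: .html and .png are static, the rest dynamic."""
    cacheable = url.endswith((".html", ".png"))
    return f"content of {url}", cacheable
```

Requesting the same static URL twice reaches the origin only once, while a dynamic URL reaches it on every request, which is the load reduction the reverse proxy is meant to provide.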

Dynamic content such as CGI scripts and Active Server Pages generally cannot be cached. The proxy caches static pages based on the HTTP headers that the web server returns with each page.

The four most important headers are:

• Last-Modified: Tells the proxy when the page was last modified.
• Expires: Tells the proxy when to drop the page from the cache.
• Cache-Control: Tells the proxy if the page should be cached.
• Pragma: Also tells the proxy if the page should be cached.

For example, by default all Active Server Pages return "Cache-control: private". Therefore, no Active Server Pages will be cached on a reverse proxy server.
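As a rough illustration of how a proxy might act on these headers, the Python sketch below decides whether a response may be stored. It is deliberately simplified: the function name `is_cacheable` is hypothetical, it treats "private", "no-store" and "no-cache" all as "do not cache", and it assumes a response with no caching headers may be stored. Real proxies such as Squid apply considerably more detailed rules.

```python
def is_cacheable(headers):
    """Return True if a shared cache may store a response with these headers.

    `headers` is a dict of response header names to values. This is a
    simplified assumption-laden check, not a full HTTP caching policy.
    """
    # Header names are case-insensitive, so normalise them first
    lowered = {name.lower(): value for name, value in headers.items()}

    # Split "public, max-age=3600" into directive names: {"public", "max-age"}
    directives = {
        part.split("=")[0].strip().lower()
        for part in lowered.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"private", "no-store", "no-cache"}:
        return False

    # Pragma: no-cache is the older way of saying the same thing
    if "no-cache" in lowered.get("pragma", "").lower():
        return False

    return True
```

With this check, the Active Server Pages default of "Cache-control: private" is rejected, while a static page sent with "Cache-Control: public, max-age=3600" is accepted.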

Monday, June 9, 2008
