Routing protocols and architectures/Content Delivery Networks
A web cache is a device that stores a local copy of the most recently requested content (e.g. HTTP resources) and acts as a proxy server for clients' requests:
- the web cache is closer to the user than the web server is:
  - performance: the reply is faster when the requested resource is already in the cache;
  - bandwidth: expensive long-distance links (e.g. transoceanic links) are not loaded;
- reactive solution: if the requested resource is not in the cache, the user must wait for the web cache to acquire (pull) it from the web server;
- no transparency: the user's web browser must be manually configured to contact that web cache.
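The pull behaviour of a web cache can be sketched as a small least-recently-used (LRU) cache placed in front of the origin server. This is a minimal sketch, assuming a capacity counted in number of resources and a hypothetical `fake_origin` function standing in for the HTTP request to the web server; real web caches also apply expiration and validation rules, which are omitted here.

```python
from collections import OrderedDict
from typing import Callable, List


class WebCache:
    """Minimal LRU web cache: hits are served locally, misses are pulled from the origin."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin  # hypothetical stand-in for an HTTP GET
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            # Hit: fast reply, and the long-distance link is not used.
            self.store.move_to_end(url)
            self.hits += 1
            return self.store[url]
        # Miss (reactive behaviour): the client waits while the cache pulls the resource.
        self.misses += 1
        body = self.fetch_from_origin(url)
        self.store[url] = body
        if len(self.store) > self.capacity:
            # Evict the least recently requested resource.
            self.store.popitem(last=False)
        return body


origin_requests: List[str] = []


def fake_origin(url: str) -> bytes:
    """Hypothetical origin server that records every request it receives."""
    origin_requests.append(url)
    return f"content of {url}".encode()


cache = WebCache(capacity=2, fetch_from_origin=fake_origin)
for path in ["/a", "/a", "/b", "/c", "/a"]:
    cache.get("http://example.com" + path)
```

Of the five requests, only the second `/a` is a hit; the last `/a` has to be pulled again because it was evicted when `/c` arrived.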
A Content Delivery Network (CDN) is an overlay network[1] of web caches distributed around the world that cooperate to offer users a better quality of experience[2]:
- proactive solution: the web server copies (pushes) content (generally the most popular content) to the web caches before users ask for it;
- transparency: the user connects to a web cache automatically, without any manual configuration on their own client;
- performance: the user always connects to the closest web cache, even when moving;
- load balancing: the user always connects to the least loaded web cache;
- scalability: deploying content into multiple replicas makes it possible to serve a large number of requests that a single web server alone could not handle;
- conditional access: the returned content can be customized based on the user (e.g. targeted advertisements).
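The proactive (push) side of a CDN can be illustrated by an origin that copies its most popular resources to every edge cache in advance. This is a minimal sketch, assuming popularity is estimated simply by counting entries in a hypothetical `request_log` and that each edge cache is modelled as a plain dictionary; actual CDNs use their own, more elaborate policies.

```python
from collections import Counter
from typing import Dict, List


def push_popular_content(
    request_log: List[str],
    catalogue: Dict[str, bytes],
    edge_caches: Dict[str, Dict[str, bytes]],
    top_k: int,
) -> List[str]:
    """Copy the top_k most requested resources to every edge cache in advance."""
    popularity = Counter(request_log)
    # most_common() sorts by count; equal counts keep first-seen order.
    popular = [url for url, _ in popularity.most_common(top_k)]
    for cache in edge_caches.values():
        for url in popular:
            cache[url] = catalogue[url]  # pushed before any user asks for it
    return popular


log = ["/video1", "/logo.png", "/video1", "/news", "/video1", "/logo.png"]
catalogue = {url: url.encode() for url in log}
edges: Dict[str, Dict[str, bytes]] = {"eu-west": {}, "us-east": {}}
pushed = push_popular_content(log, catalogue, edges, top_k=2)
```

A user asking the `eu-west` cache for `/video1` is then served immediately, whereas `/news` would still have to be fetched from the origin.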
CDNs are ideal for content that generates large amounts of traffic (e.g. multimedia resources), but not all content can be cached:
- dynamic web pages (e.g. stock market prices);
- customized-content web pages (e.g. user account).
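In HTTP, an origin can mark such pages explicitly through the `Cache-Control` response header: `no-store` forbids any cache from keeping a copy, and an unqualified `private` means that shared caches, such as a CDN replica, must not store the response. The check below is a minimal sketch, assuming header names have already been normalized to `Cache-Control`; it ignores every other caching rule (freshness lifetimes, validation, etc.) and only illustrates the idea.

```python
from typing import Dict


def origin_forbids_shared_caching(headers: Dict[str, str]) -> bool:
    """Return True if Cache-Control tells shared caches not to store the response.

    Naive parsing for illustration: directives are split on commas, so quoted
    lists inside a directive are not handled.
    """
    cache_control = headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name == "no-store":
            # No cache at all may keep a copy.
            return True
        if name == "private" and not value:
            # Unqualified 'private': the response is meant for a single user.
            return True
    return False
```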
CDNs can be deployed in a variety of ways:
- DNS-based CDNs: traffic is redirected to the best replica based on host names:
  - DNS-based routing: the hosting provider needs to enter into agreements with DNS server managers;
  - Akamai approach: no intervention on DNS servers is needed;
- URL-based CDNs: traffic is redirected to the best replica based on full URLs:
  - server load balancing: the TCP connection termination point is close to the server;
  - content routing: the TCP connection termination point is close to the client.
DNS-based CDNs
DNS-based routing
Selection of the best replica takes place when the host name is translated to an IP address. The DNS reply to a query depends not only on the host name but also on the source of the query: a special DNS server computes, based on as many metrics as possible (RTT, server load, response time, etc.), a replica routing table containing entries like:
{host name, client IP address} → replica IP address
The routing engine in the 'modified' DNS server has a standard interface to guarantee transparency: the user believes that the IP address corresponding to the host name is the IP address of the real web server, while it is the IP address of one of its replicas.
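The replica routing table and the source-dependent DNS answer can be sketched as follows. This is a minimal sketch, not the algorithm of any particular DNS-based CDN: it assumes clients are grouped by network prefix instead of being listed one address at a time, that RTTs and loads have already been measured, and it combines them with a hypothetical `load_penalty_ms` weight. All addresses are taken from documentation ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Replica:
    ip: str
    rtt_ms: Dict[str, float]  # measured RTT towards each client network (assumed known)
    load: float               # fraction of capacity in use, 0.0 .. 1.0


def build_routing_table(
    host: str,
    replicas: List[Replica],
    client_networks: List[str],
    load_penalty_ms: float = 100.0,
) -> Dict[Tuple[str, str], str]:
    """Compute {host name, client network} -> replica IP address entries."""
    table: Dict[Tuple[str, str], str] = {}
    for net in client_networks:
        # Hypothetical cost: RTT plus a penalty proportional to the replica's load.
        best = min(replicas, key=lambda r: r.rtt_ms[net] + load_penalty_ms * r.load)
        table[(host, net)] = best.ip
    return table


def resolve(table: Dict[Tuple[str, str], str], host: str, client_ip: str) -> str:
    """Answer a query: the reply depends on the host name and on the source address."""
    addr = ipaddress.ip_address(client_ip)
    for (name, net), replica_ip in table.items():
        if name == host and addr in ipaddress.ip_network(net):
            return replica_ip
    raise LookupError(f"no replica for {host} from {client_ip}")


replicas = [
    Replica("192.0.2.10", {"198.51.100.0/24": 10.0, "203.0.113.0/24": 150.0}, load=0.2),
    Replica("192.0.2.20", {"198.51.100.0/24": 120.0, "203.0.113.0/24": 15.0}, load=0.9),
]
table = build_routing_table("www.example.com", replicas, ["198.51.100.0/24", "203.0.113.0/24"])
```

With these numbers the heavily loaded replica `192.0.2.20` is still chosen for `203.0.113.0/24` (cost 15 + 90 = 105 versus 150 + 20 = 170), because it is much closer to those clients.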
The addition of a new actor, the hosting provider, creates a new business opportunity in the networking world:
- access provider: it provides network access to users;
- backbone provider: it provides long-range connectivity;
- hosting provider: it provides the CDN service to content providers;
- content provider: it provides content.
Issues:
- metrics: measuring metrics, especially dynamic ones, is not easy, and layer-3 metrics alone are not particularly meaningful;
- DNS caching: only the authoritative server knows all the replicas and can select the best one based on the client's location → intermediate DNS servers in the hierarchy cannot cache DNS replies;
- granularity: redirection works at host-name level, not at single-URL level → the content of a large web site cannot be split across multiple caches, so requests for two different pages of the same web site are sent to the same replica.
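The DNS caching issue can be reproduced with a toy resolver. This is a minimal sketch, assuming, as a simplification, that the intermediate server passes the client's address upstream and caches each answer by host name alone, as an ordinary DNS cache would; the authoritative logic and replica addresses are hypothetical.

```python
from typing import Callable, Dict


def authoritative(host: str, client_ip: str) -> str:
    """Hypothetical authoritative server that picks the replica closest to the client."""
    return "192.0.2.10" if client_ip.startswith("198.51.100.") else "192.0.2.20"


class CachingResolver:
    """Intermediate DNS server that caches replies by host name only."""

    def __init__(self, upstream: Callable[[str, str], str]) -> None:
        self.upstream = upstream
        self.cache: Dict[str, str] = {}

    def query(self, host: str, client_ip: str) -> str:
        if host not in self.cache:
            # Only the first query reaches the authoritative server.
            self.cache[host] = self.upstream(host, client_ip)
        return self.cache[host]


resolver = CachingResolver(authoritative)
near = resolver.query("www.example.com", "198.51.100.7")  # client close to 192.0.2.10
far = resolver.query("www.example.com", "203.0.113.42")   # client close to 192.0.2.20
```

The second client receives the replica chosen for the first one, although the authoritative server would have directed it to `192.0.2.20`.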
Akamai approach
The Akamai CDN uses a proprietary automatic algorithm to redirect traffic to its replicas without any intervention on DNS servers:
- the user types the address of a web page with its normal domain name (e.g. http://cnn.com/index.html);
- the server of the content provider (e.g. CNN) returns a web page where the address of every multimedia resource (e.g. image) has a special domain name corresponding to a specific replica on an Akamai cache (e.g. http://a128.g.akamai.net/7/23/cnn.com/a.gif instead of http://cnn.com/a.gif);
- while parsing the page, the user's web browser performs DNS queries for the new domain names and fetches the multimedia resources from the closest replicas.
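The content provider's side of this scheme can be sketched as a rewriting pass over the page before it is returned. This is a minimal sketch, assuming only `src` attributes holding absolute `http://` URLs are rewritten, and that the serial number (`a128`) and path prefix (`/7/23`) are fixed, hypothetical values copied from the example above; how Akamai actually assigns them is proprietary and not described here.

```python
import re
from urllib.parse import urlsplit

# Hypothetical values copied from the example URL http://a128.g.akamai.net/7/23/cnn.com/a.gif.
CDN_HOST_TEMPLATE = "a{serial}.g.akamai.net"
CDN_PATH_PREFIX = "/7/23"


def akamaize(url: str, serial: int) -> str:
    """Map http://origin/path to http://a<serial>.g.akamai.net/7/23/origin/path."""
    parts = urlsplit(url)
    query = f"?{parts.query}" if parts.query else ""
    cdn_host = CDN_HOST_TEMPLATE.format(serial=serial)
    return f"{parts.scheme}://{cdn_host}{CDN_PATH_PREFIX}/{parts.netloc}{parts.path}{query}"


def rewrite_page(html: str, origin: str, serial: int) -> str:
    """Point every embedded resource (src attribute) on the origin at the CDN.

    Links to other pages (href) are left alone, so pages themselves are still
    requested from the content provider.
    """
    pattern = re.compile(r'src="(http://' + re.escape(origin) + r'/[^"]*)"')
    return pattern.sub(lambda m: f'src="{akamaize(m.group(1), serial)}"', html)


page = '<p><img src="http://cnn.com/a.gif"> <a href="http://cnn.com/b.html">more</a></p>'
rewritten = rewrite_page(page, "cnn.com", serial=128)
```

The browser then resolves `a128.g.akamai.net` like any other name, and it is the answer to that query which points it at a nearby replica.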
URL-based CDNs
Server load balancing
The real servers holding the replicas are seen by clients as a single virtual server with a single IP address.
The traffic load destined to the virtual server is balanced among the several real servers by a Server Load Balancer (SLB):
- layer-4 switching: TCP connections are not terminated by the SLB (content-unaware):
  - one of the real servers answers the three-way handshake with the client;
  - all HTTP queries belonging to the same TCP session must always be served by the same real server;
  - load balancing can be based on the source IP address, the source TCP port, etc.;
- layer-7 switching: TCP connections are terminated by the SLB (content-aware), which acts as a proxy:
  - the SLB answers the three-way handshake with the client, so that it can read the URLs the client requests afterwards;
  - each HTTP query can be served by the currently least loaded real server, based on SLB decisions;
  - load balancing is based on the full URL.
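The two switching modes can be contrasted as two selection functions. This is a minimal sketch, assuming hypothetical server names, a hash of source address and port for layer 4, and, for layer 7, a longest-prefix URL table combined with a count of active requests per server; real SLBs offer many other policies.

```python
import hashlib
from typing import Dict, List


def l4_select(servers: List[str], src_ip: str, src_port: int) -> str:
    """Layer-4 switching: pick a real server from connection addresses only.

    Hashing (source IP, source port) maps every packet of a TCP connection,
    and therefore every HTTP query it carries, to the same real server.
    """
    key = f"{src_ip}:{src_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return servers[digest % len(servers)]


def l7_select(url_routes: Dict[str, List[str]], active: Dict[str, int], url: str) -> str:
    """Layer-7 switching: the SLB terminated TCP, so it sees each requested URL.

    The longest matching path prefix selects the candidate servers, then the
    one with the fewest active requests serves this particular query.
    Assumes url_routes contains the catch-all prefix "/".
    """
    prefix = max((p for p in url_routes if url.startswith(p)), key=len)
    return min(url_routes[prefix], key=lambda s: active[s])


servers = ["s1", "s2", "s3"]
routes = {"/": ["s1", "s2"], "/video/": ["s3"]}
active = {"s1": 5, "s2": 2, "s3": 9}
```

The layer-4 choice is fixed for the whole connection, while the layer-7 choice is made again for every query and can depend on the URL.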
Issues:
- encrypted connections (HTTPS): the SLB needs the server's private SSL key and must bear the processing load of encrypting/decrypting packets in transit;
- sticky connections: some applications require that TCP connections from the same client be redirected to the same server (e.g. shopping cart) → cookies should be considered too;
- geographical distribution: all replicas are close to each other and to the SLB, which is far away from the client.
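One common way for a layer-7 SLB to keep connections sticky is to record the chosen real server in a cookie. This is a minimal sketch, assuming a hypothetical cookie named `SLB_SERVER` that the SLB inserts itself and a least-loaded fallback for new clients; it only works if the SLB can read HTTP headers, which for encrypted connections again requires the server's key.

```python
from typing import Dict, List, Optional, Tuple

STICKY_COOKIE = "SLB_SERVER"  # hypothetical cookie name chosen by the SLB


def sticky_select(
    servers: List[str], active: Dict[str, int], cookies: Dict[str, str]
) -> Tuple[str, Optional[str]]:
    """Return (real server, Set-Cookie value to add to the response or None)."""
    pinned = cookies.get(STICKY_COOKIE)
    if pinned is not None and pinned in servers:
        # Returning client: keep it on the server holding its session (e.g. cart).
        return pinned, None
    # New client, or its server disappeared: least loaded server, then pin it.
    server = min(servers, key=lambda s: active[s])
    return server, f"{STICKY_COOKIE}={server}"
```

A returning client stays on its server even when that server is no longer the least loaded one, which is exactly the trade-off sticky connections impose on load balancing.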
Content routing
Content routers are routers that route traffic toward the best replica based on the URL:
- TCP: each content router along the path terminates a TCP connection with the next one → too much delay is introduced;
- content delivery control protocol: the URL is extracted by the first content router and propagated by a dedicated protocol.
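The content delivery control protocol variant can be sketched as follows: only the first content router parses the HTTP request, and the extracted URL is then handed from router to router, each choosing its next hop by longest URL-prefix match. This is a minimal sketch, assuming the control protocol is modelled as a plain function call and routing tables map URL prefixes to next hops; it does not reproduce any of the actually proposed protocols, and router names and addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ContentRouter:
    name: str
    # URL prefix -> next content router, or the address of the replica to use.
    routes: Dict[str, Union["ContentRouter", str]] = field(default_factory=dict)

    def next_hop(self, url: str) -> Union["ContentRouter", str]:
        matches = [p for p in self.routes if url.startswith(p)]
        if not matches:
            raise LookupError(f"{self.name}: no route for {url}")
        return self.routes[max(matches, key=len)]  # longest URL-prefix match


def route_request(first: ContentRouter, http_request: str) -> List[str]:
    """Only the first router parses the HTTP request (after terminating TCP);
    the extracted URL is then passed from router to router."""
    request_line = http_request.split("\r\n", 1)[0]  # e.g. "GET /video/a.mp4 HTTP/1.1"
    url = request_line.split(" ")[1]
    path = [first.name]
    hop = first.next_hop(url)
    while isinstance(hop, ContentRouter):
        path.append(hop.name)
        hop = hop.next_hop(url)
    path.append(hop)  # replica address reached
    return path


r3 = ContentRouter("R3", {"/video/": "192.0.2.30"})
r2 = ContentRouter("R2", {"/video/": r3, "/": "192.0.2.20"})
r1 = ContentRouter("R1", {"/": r2})
```

Only `R1` ever looks inside the HTTP request; the other routers work on the URL they receive, which is what spares them from terminating TCP themselves.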
Issues:
- stateful: the first content router needs to terminate the TCP connection to be able to read the URL the user requests;
- complexity of devices: parsing packets to extract the URL is complex → layer-7 switches are complex and expensive devices;
- complexity of protocols: the proposed content delivery control protocols are very complex;
- privacy: content routers read all the URLs requested by users.