CDN usage on high performance e-commerce sites

When building an e-commerce website, several design aspects need to be taken into account, and they differ somewhat from those of other types of sites (e.g. editorially focused or personal service sites). This article takes a deep dive into how such sites can be designed (i.e. architected and built technically) to use a CDN efficiently. This might seem straightforward, but it has some caveats.

Assumptions

To make this interesting we assume that we are dealing with a fairly major site...
  • Global operations (at least multiple parts of the world covered)
  • High traffic volumes (millions of visits)
  • Many products on sale (multiple thousands)
  • Personalization of the UX is required
  • Two main types of pages, product detail pages and others
Product detail page (the deep red part is unique to the product, e.g. name/price)
Other page (includes various pieces of product information, the red parts)
The deep red parts on both page types should stay consistent when updated.

Logical conclusions based on assumptions

A CDN (content delivery network) is beneficial for reducing latency. On a global site with millions of visits, using one seems like an easy conclusion that should result in great performance. Some basic CDN caching with a fixed time should be a good start?

But there are some problems with this approach...
  • Even if there is significant traffic, it is spread out over so many products that the traffic per page is fairly low. This ruins the cache hit ratio unless the fixed cache time is very long.
  • If the fixed cache time is very long, critical updates such as price changes will take too long to reach the user.
  • Long cache times will cause inconsistencies on pages displaying multiple products, since these pages need updating whenever any product on them changes.
  • Personalization becomes more complex.
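To put rough numbers on the first two problems, here is a back-of-the-envelope sketch. It rests on simplifying assumptions that are mine and not measurements: requests arrive randomly (Poisson), the cache timer starts when a page is fetched on a miss, and traffic is spread evenly over all products and edge locations (real traffic is far more skewed). The visit, product and edge counts are hypothetical.

```python
def ttl_hit_ratio(requests_per_hour: float, ttl_hours: float) -> float:
    """Expected hit ratio for one page on one edge cache.

    Assumes Poisson arrivals and a TTL that starts on each miss, so every
    cache cycle is one miss followed by, on average, rate * ttl hits.
    """
    expected_hits = requests_per_hour * ttl_hours
    return expected_hits / (1.0 + expected_hits)


# Hypothetical site: 2,000,000 visits per day, spread evenly over
# 50,000 product pages and 20 edge locations.
requests_per_hour = 2_000_000 / 24 / 50_000 / 20

for ttl in (0.25, 1, 24):
    ratio = ttl_hit_ratio(requests_per_hour, ttl)
    print(f"TTL {ttl:>5} h -> hit ratio {ratio:.0%}")
```

Under these assumptions each page sees about one request every 12 hours per edge location, so a 15 minute or 1 hour cache time gives a hit ratio in the single digits, and even a full day only reaches about two thirds.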

Initial approach

  • Set a fixed cache time on all product detail pages (display of a single product), e.g. 1 hour, up to a day.
  • Set a fixed but much shorter cache time on all pages that display multiple products (e.g. 15 minutes). This will decrease potential inconsistencies between these pages and the product detail pages.
  • Move all personalization to separate requests using AJAX.
The compromise to find is how to set the fixed cache times. Too long and changes will be slow to appear, with inconsistencies on some pages. Too short and the cache hit ratio will be low, so the CDN will bring less of a performance increase.
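As a minimal sketch of this approach, the origin could pick its caching headers per page type. The page type names and the `cache_headers` helper are hypothetical, and the cache times are simply the examples from the list above. `s-maxage` is the standard `Cache-Control` directive aimed at shared caches such as a CDN.

```python
# Hypothetical cache times in seconds, taken from the examples above.
CACHE_TTL_SECONDS = {
    "product_detail": 60 * 60,  # 1 hour
    "listing": 15 * 60,         # 15 minutes
}


def cache_headers(page_type: str) -> dict[str, str]:
    """Return the caching headers the origin sends for a page type."""
    if page_type == "personalized":
        # Fetched separately by the browser (AJAX); never stored by the CDN.
        return {"Cache-Control": "private, no-store"}
    ttl = CACHE_TTL_SECONDS[page_type]
    # s-maxage is for shared caches (the CDN); max-age=0 makes the
    # browser's own copy stale immediately so it checks back each time.
    return {"Cache-Control": f"public, max-age=0, s-maxage={ttl}"}


print(cache_headers("product_detail"))
print(cache_headers("listing"))
print(cache_headers("personalized"))
```

Keeping the browser cache time at zero is a design choice in this sketch: it keeps all caching at the CDN, where it can be controlled centrally.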

A better way

  • Set long cache times on all pages
  • Send purge commands to the CDN whenever a page or part of a page needs updating
This allows much higher cache hit ratios: there are so many product detail pages that each one is updated infrequently and will rarely be purged. The other pages, which include many products, might still be a bit more volatile and be purged frequently. To do this properly, some dependency tracking is needed to know which parts are included where (a bit of complexity).

The frequency of updates will eventually set the limit on how effectively the site can be delivered, in terms of both cache hit ratio and performance.
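The dependency tracking can be sketched as an index from product to the URLs that display it. Everything here is hypothetical: the `PurgeIndex` class, the URLs, and the `purge` callable, which stands in for whatever purge API your CDN supplier offers.

```python
from collections import defaultdict
from typing import Callable, Iterable


class PurgeIndex:
    """Tracks which cached URLs display which products (hypothetical helper)."""

    def __init__(self, purge: Callable[[str], None]) -> None:
        self._purge = purge  # stand-in for the CDN supplier's purge call
        self._urls_by_product: dict[str, set[str]] = defaultdict(set)

    def record_render(self, url: str, product_ids: Iterable[str]) -> None:
        """Call when the origin renders a page, with every product shown on it."""
        for product_id in product_ids:
            self._urls_by_product[product_id].add(url)

    def product_changed(self, product_id: str) -> list[str]:
        """Purge every URL that displays the product and return those URLs."""
        urls = sorted(self._urls_by_product.get(product_id, ()))
        for url in urls:
            self._purge(url)
        return urls


purged: list[str] = []
index = PurgeIndex(purge=purged.append)  # collect purges instead of calling a CDN
index.record_render("/product/42", ["42"])
index.record_render("/category/shoes", ["42", "43"])
index.record_render("/", ["43"])

index.product_changed("42")
print(purged)  # the detail page and the category page, but not the start page
```

The index never forgets a page, so a page that no longer shows a product may be purged unnecessarily. That costs an extra origin request but never serves stale content.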

An even better way (the holy grail)

  • Deliver sub page elements
  • Build the page at the CDN edge
This means lower load on the origin servers, since each part only needs to be updated and delivered once. Updating individual pieces is more efficient and faster since all other overhead is removed. Dependency tracking when purging is no longer needed (parts still need purging, though). Additional development (e.g. ESI or Lambda@Edge) will be needed to build the pages at the CDN edge (a bit of complexity). Since any execution (even simply putting page parts together) will slightly increase latency, the assembled page can itself be cached for a short time.

The complete site can now be cached optimally, piece by piece. The risk of inconsistencies depends only on the cache time of the resulting page, which can be very short since the page can be rebuilt completely at the CDN edge. Some limitations might apply depending on the CDN supplier, and of course there is some extra cost, but it's still the most efficient cache strategy I've seen.
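To make the idea concrete, here is a small simulation of edge assembly in plain Python. It is a sketch of the principle, not how any particular CDN implements it: the `EdgeAssembler` class and the fragment URLs are hypothetical, only the `<esi:include src="..."/>` tag syntax is borrowed from ESI, and the short cache of the assembled page is left out to keep it brief.

```python
import re
from typing import Callable

# Matches ESI-style include tags such as <esi:include src="/fragment/price/42"/>
INCLUDE_TAG = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


class EdgeAssembler:
    """Simulates a CDN edge that caches fragments and stitches pages together."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._fragments: dict[str, str] = {}  # the edge's fragment cache

    def fragment(self, src: str) -> str:
        """Return a fragment, going to the origin only on a cache miss."""
        if src not in self._fragments:
            self._fragments[src] = self._fetch(src)
        return self._fragments[src]

    def purge(self, src: str) -> None:
        """Drop one fragment; pages need no dependency tracking."""
        self._fragments.pop(src, None)

    def render(self, template: str) -> str:
        """Replace every include tag with the (cached) fragment body."""
        return INCLUDE_TAG.sub(lambda m: self.fragment(m.group(1)), template)


# Hypothetical origin content and a counter for origin requests.
origin = {"/fragment/name/42": "Red shoe", "/fragment/price/42": "49.00"}
origin_calls: list[str] = []


def fetch(src: str) -> str:
    origin_calls.append(src)
    return origin[src]


edge = EdgeAssembler(fetch)
detail = ('<h1><esi:include src="/fragment/name/42"/></h1>'
          '<p><esi:include src="/fragment/price/42"/></p>')
listing = ('<li><esi:include src="/fragment/name/42"/>: '
           '<esi:include src="/fragment/price/42"/></li>')

edge.render(detail)
edge.render(listing)  # served entirely from fragments already at the edge

origin["/fragment/price/42"] = "39.00"  # price change at the origin
edge.purge("/fragment/price/42")        # one purge, no list of affected pages

print(edge.render(detail))
print(edge.render(listing))
print(len(origin_calls), "origin requests in total")
```

Both page types pick up the new price after a single fragment purge, and the origin is only asked again for the one fragment that changed.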

Micro services

This piece-by-piece delivery is also a perfect structure when multiple teams build and deliver different parts or aspects of the site independently: a microservice architecture for front-ends.
