Here’s the abstract:
If you work on distributed systems, you probably try to design your system to keep running if any single machine fails. If you’re ambitious, you might extend this to entire racks, or even more inconvenient sets of machines. However, what if your entire datacenter falls off the face of the earth?
This talk will examine how current large-scale storage systems handle fault tolerance and consistency, with a particular focus on popular cloud computing platforms. We’ll cover techniques such as replication, sharding, two-phase commit, and consensus protocols (e.g., Paxos), then explore how they can be applied across datacenters.
Feedback is welcome!