Transactions Across Datacenters
(and other weekend projects)
Special Lecture Series in Computer Science
University of San Francisco
Feb. 12, 2009
Of three properties of distributed data systems - consistency, availability,
partition tolerance - choose two.
- Eric Brewer, CAP theorem, PODC 2000
Scaling is hard.
-Various
What's ahead
- Why transactions?
- Consistency
- Why across datacenters?
- Multihoming
- Techniques
- Tradeoffs
GMail
- Mail delivery: eventual, guaranteed
- Mailbox operations: immediate, guaranteed
- Web clip history: best effort
Why transactions?
- Correctness
- Consistency
- Enforce invariants
- ACID
Clichéd examples
- Transfer money from A to B
- What if something happens in between?
- another transaction on A or B
- machine crashes
- ...
- Gmail: archive an email
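The transfer example can be sketched with sqlite3's transaction support. The schema and amounts are illustrative, not from the talk; the point is that both updates commit or neither does.

```python
import sqlite3

# Toy accounts table (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` atomically: either both updates apply or neither."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # forces rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

transfer(conn, "A", "B", 30)   # succeeds
transfer(conn, "A", "B", 500)  # fails; the debit is rolled back
```

If "something happens in between", the rollback preserves the invariant: no money is created or destroyed.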
What's ahead
- Why transactions?
- Consistency
- Why across datacenters?
- Multihoming
- Techniques
- Tradeoffs
Weak consistency
- After a write, subsequent reads may or may not see the new data
- Best effort only
- "Message in a bottle"
- IP/UDP, SMS, JPEG
- GMail: Web Clip history
Eventual consistency
- After a write, subsequent reads will eventually see the new data
- Search engine indexing
- DNS, SMTP, snail mail
- Amazon S3, SimpleDB
- GMail: mail delivery
Strong consistency
- After a write, subsequent reads will see the new data
- File systems
- RDBMSes
- App Engine datastore, Azure tables
- GMail: mailbox operations
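The three models can be contrasted with a toy in-memory replica set. This is purely illustrative; no real system is this simple, and "weak" is omitted since it is just eventual with no delivery guarantee.

```python
import threading
import time

class Replicas:
    """Toy model of one value replicated across n stores."""
    def __init__(self, n=3):
        self.stores = [{} for _ in range(n)]

    def write_strong(self, key, value):
        # Strong: apply to every replica before acknowledging, so any
        # subsequent read, from any replica, sees the new data.
        for store in self.stores:
            store[key] = value

    def write_eventual(self, key, value):
        # Eventual: acknowledge after one replica; propagate later.
        # Until propagation finishes, reads elsewhere see stale data.
        self.stores[0][key] = value
        def propagate():
            time.sleep(0.01)  # simulated replication lag
            for store in self.stores[1:]:
                store[key] = value
        t = threading.Thread(target=propagate)
        t.start()
        return t  # a real system gives you no handle to wait on

    def read(self, key, replica=0):
        return self.stores[replica].get(key)
```

With `write_strong`, a read from any replica immediately returns the new value; with `write_eventual`, only the written replica does, and the rest catch up after the lag.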
What's ahead
- Why transactions?
- Consistency
- Why across datacenters?
- Multihoming
- Techniques
- Tradeoffs
Why across datacenters?
- Catastrophic failures
- Expected failures
- Routine maintenance
- Geolocality
Why not across datacenters?
- Within a datacenter
- High bandwidth: 1-100Gbps interconnects
- Low latency: < 1ms within a rack, < 5ms across
- Little to no cost
- Between datacenters
- Low bandwidth: 10Mbps-1Gbps
- High latency: expect 100s of ms
- $$$ for bandwidth or fiber
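A back-of-envelope calculation shows why those latency numbers matter. The figures below are the slide's rough estimates, not measurements of any particular network.

```python
# Cross-datacenter round trips dominate commit latency.
rtt_local_ms = 1       # within a datacenter (order of magnitude)
rtt_cross_dc_ms = 100  # between datacenters (100s of ms expected)

def commit_latency_ms(round_trips, rtt_ms):
    """Latency of a protocol that needs `round_trips` network round trips."""
    return round_trips * rtt_ms

# A two-round-trip protocol (e.g. a vote phase then a commit phase):
local = commit_latency_ms(2, rtt_local_ms)     # ~2 ms inside a DC
cross = commit_latency_ms(2, rtt_cross_dc_ms)  # ~200 ms across DCs

# Serialized writes to a single entity are capped by commit latency:
max_serial_commits_per_sec = 1000 / cross      # only a handful per second
```

The same protocol that is invisible inside a datacenter becomes a hard throughput ceiling across them.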
What's ahead
- Why transactions?
- Consistency
- Why across datacenters?
- Multihoming
- Techniques
- Tradeoffs
Multihoming
- Multihoming (n): operating from multiple datacenters simultaneously
- Hard problem.
- ...consistently? Harder.
- ...with real time writes? Hardest.
Option 1: Don't.
- ...instead, bunkerize.
- Most common
- Oracle, MySQL (even sharded)
- Microsoft, Cisco, telcos, US gov't
- Bad at catastrophic failure
- Not great for serving
Option 2: Primary with hot failover(s)
- Better, but not ideal
- Mediocre at catastrophic failure
- Window of lost data
- Failover data may be inconsistent
- Banks, brokerages, etc.
- Geolocated for reads, not for writes
Option 3: True multihoming
- Simultaneous writes in different DCs, maintaining consistency
- Two way
- N way
- Handles catastrophic failure, geolocality
What's ahead
- Why transactions?
- Consistency
- Why across datacenters?
- Multihoming
- Techniques
- Tradeoffs
Interested in...
- Replication
- Transactions
- distributed?
- decentralized?
Backups
- Make a copy
- Sledgehammer
- Weak consistency
- Replication, not transactions
Locking
- Sledgehammer
- Common in RDBMSes
- Good for consistency, bad for throughput
- Optimizations
- shared locks, read/write locks
- InnoDB (MySQL), Oracle
- Transactions, not replication
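The shared/exclusive ("read/write") lock optimization is small enough to sketch. This is a simplified, writer-unfair toy, not InnoDB's or Oracle's implementation.

```python
import threading

class ReadWriteLock:
    """Many readers may hold the lock at once; writers hold it exclusively."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def acquire_read(self):
        with self._cond:          # blocks while a writer holds the lock
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        self._cond.acquire()      # exclusive: keeps readers out
        while self._readers > 0:
            self._cond.wait()     # wait for in-flight readers to drain

    def release_write(self):
        self._cond.release()
```

Readers proceed concurrently (good for throughput); a writer waits for all readers to finish and then blocks everyone (good for consistency).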
Optimistic concurrency
- Opposite of "pessimistic" locking
- Journal writes, check for collisions before commit
- Useful variant: multi-version concurrency control (MVCC)
- Good for throughput, read-heavy workloads
- Transactions, not replication
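A minimal sketch of the journal-and-check idea, using a per-key version number as the collision check. Illustrative only; real implementations journal full read/write sets.

```python
import threading

class OptimisticStore:
    """Read a version, work without holding locks, then check at commit
    time that nobody else wrote in between."""
    def __init__(self):
        self._lock = threading.Lock()  # guards only the brief commit step
        self._data = {}                # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))  # (version, value)

    def commit(self, key, expected_version, new_value):
        # Compare-and-swap: succeed only if the version is unchanged.
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # collision detected: caller must retry
            self._data[key] = (current_version + 1, new_value)
            return True
```

No lock is held while the caller computes, so readers never wait; the cost is that a losing writer must redo its work, which is why this shines on read-heavy workloads.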
Master/slave replication
- Usually asynchronous
- Good for throughput, latency
- Most RDBMSes
- Weak/eventual consistency
- Replication, not transactions
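Asynchronous master/slave replication can be sketched as a master that acknowledges immediately and ships a log to slaves in the background. Toy model: a real slave tails a durable log, and a master crash loses whatever was still queued, which is exactly the weak/eventual window.

```python
import queue
import threading

class Master:
    """Acks writes right away (low latency); replicates asynchronously."""
    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves           # list of dicts standing in for slaves
        self.log = queue.Queue()
        threading.Thread(target=self._ship, daemon=True).start()

    def write(self, key, value):
        self.data[key] = value         # acknowledged immediately
        self.log.put((key, value))     # shipped to slaves later

    def _ship(self):
        while True:
            key, value = self.log.get()
            for slave in self.slaves:  # slaves lag behind the master
                slave[key] = value
            self.log.task_done()
```

Between `write` returning and `_ship` catching up, slaves serve stale reads: eventual consistency by construction.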
Multi-master replication
- Umbrella term for merging concurrent writes
- Usually asynchronous, eventual consistency
- Usually need serialization protocol
- e.g. timestamp oracle: monotonically increasing timestamps
- Either SPOF with master election...
- ...or distributed consensus protocol
- Replication and transactions
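A timestamp oracle is small enough to sketch directly. This assumes a single process; a real oracle must also survive restarts without ever handing out a timestamp twice or going backwards.

```python
import threading
import time

class TimestampOracle:
    """Hands out strictly increasing timestamps for serializing writes,
    even if the wall clock stalls or repeats."""
    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def next(self):
        with self._lock:
            now = int(time.time() * 1_000_000)       # microseconds
            self._last = max(self._last + 1, now)    # never repeat, never go back
            return self._last
```

Concurrent writes stamped by the oracle have a total order, so every replica that merges them in timestamp order converges to the same result.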
Two Phase Commit
- Centralized consensus protocol
- Single coordinator (SPOF)
- 1: propose, 2: vote, 3: commit/abort
- Heavyweight (high latency) and blocking
- 3PC buys non-blocking at the cost of an extra message round
- Replication and distributed transactions
- Supports true multihoming
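The propose/vote/commit steps can be sketched with in-memory participants. This omits the durable logs and timeouts a real 2PC needs, and shows the blocking problem: a yes-voting participant stays locked until the coordinator decides.

```python
class Participant:
    """One resource manager in a toy two-phase commit."""
    def __init__(self):
        self.prepared = None   # txn this participant has voted yes on
        self.committed = {}

    def prepare(self, txn):    # phase 1: vote
        if self.prepared is not None:
            return False       # busy with another txn: vote no
        self.prepared = txn    # vote yes, and block until told the outcome
        return True

    def commit(self):          # phase 2: commit
        self.committed.update(self.prepared)
        self.prepared = None

    def abort(self):           # phase 2: abort
        self.prepared = None

def two_phase_commit(txn, participants):
    """The single coordinator (the SPOF): commit only if every vote is yes."""
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:     # any no vote aborts everyone who said yes
        if p.prepared is txn:
            p.abort()
    return False
```

If the coordinator dies between the vote and the decision, every yes-voter is stuck holding its locks, which is why 2PC is called blocking.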
Paxos
- Decentralized, distributed consensus protocol
- "Either Paxos, or Paxos with cruft, or broken"
- Majority writes; survives minority failure
- Protocol similar to 2PC/3PC
- Lighter, but still high latency
- Replication and distributed transactions
- Supports true multihoming
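A minimal single-decree sketch of the protocol's two phases, in memory with no durable state or leader election. It shows the two properties the slide claims: a majority is enough, and a later proposer adopts (rather than overwrites) any value already chosen.

```python
class Acceptor:
    """One acceptor in a toy single-decree Paxos."""
    def __init__(self):
        self.promised = 0      # highest ballot promised so far
        self.accepted = None   # (ballot, value) accepted so far, or None

    def prepare(self, ballot):           # phase 1b: promise
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted   # report any prior accepted value
        return False, None

    def accept(self, ballot, value):     # phase 2b: accept
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    """Phase 1 then phase 2; each needs a majority to succeed."""
    majority = len(acceptors) // 2 + 1
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) < majority:
        return None
    # Safety rule: adopt the highest-ballot value already accepted, if any.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

Because only a majority is required, the protocol keeps making progress when a minority of acceptors (e.g. one datacenter) is unreachable.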
What's ahead
- Why transactions?
- Consistency
- Why across datacenters?
- Multihoming
- Techniques
- Tradeoffs
Tradeoffs (very approximate)
                 Paxos     2PC       MMR             MSR        Backups
  Consistency    Strong    Strong    Eventual        Eventual   Weak
  Latency        High      High      Low             Low        Low
  Throughput     Medium    Low       High            High       High
  Data loss      None      None      Some            Some       Lots
  Failover       N/A       N/A       Minimal impact  Read only  Read only
Tradeoffs (very approximate)
[Repeat of the table above, annotated with where the GMail features
(mail delivery, mailbox operations, Web Clip history) fall]
What's behind (phew!)
- Why transactions?
- Consistency
- Why across datacenters?
- Multihoming
- Techniques
- Tradeoffs