Scaling Data on the Cheap
Ryan Barrett
Inc.
why?
- growth and flash crowds: a good problem, but a problem
- bad for load, but...
- can duplicate frontends, file servers, indexes, caches
- worse for data volume
- even worse with common usage
- how to scale a traditional rdbms?
what?
- shard! scale dbs across machines
- not ORM, just a library and a shell
- simple partitioning across db machines
- mirrored schema
- global primary key namespace
- present a single-db abstraction to app code
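One way to picture a global primary key namespace is to have every shard mint its own ids while reserving the low bits of each id for the shard number. The slides do not say how the namespace is built, so the sketch below is only an assumption for illustration; `SHARD_BITS`, `ShardIdAllocator`, and `shard_of` are hypothetical names.

```python
import itertools

SHARD_BITS = 10  # hypothetical: leaves room for up to 1024 shards


class ShardIdAllocator:
    """Mints ids that are unique across all shards (illustrative only)."""

    def __init__(self, shard_id):
        if not 0 <= shard_id < (1 << SHARD_BITS):
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        self._counter = itertools.count(1)  # per-shard local sequence

    def next_id(self):
        # High bits: local sequence number. Low bits: shard number.
        return (next(self._counter) << SHARD_BITS) | self.shard_id


def shard_of(key):
    """Recover the shard number that minted a key."""
    return key & ((1 << SHARD_BITS) - 1)
```

Because the shard number is embedded in the key, two shards can never hand out the same id, and app code can treat ids as if they came from a single database.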
how?
- library takes sql queries from app
- connects to db(s) that have the data
- returns results to app
- don't distribute by hashing
- tables are sharded, replicated, or single-shard
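The routing steps above can be sketched as a small library that decides which database(s) a query should touch. This is a minimal illustration, not the library from the talk: it uses in-memory SQLite databases as stand-in shards, an explicit directory (rather than a hash) to place rows, and hypothetical names such as `ShardRouter`, `directory`, and `home_shard`.

```python
import sqlite3

SHARDED, REPLICATED, SINGLE = "sharded", "replicated", "single-shard"


class ShardRouter:
    def __init__(self, num_shards, table_kinds, home_shard=0):
        self.dbs = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        self.table_kinds = table_kinds  # table name -> SHARDED/REPLICATED/SINGLE
        self.home_shard = home_shard    # where single-shard tables live
        self.directory = {}             # owner -> shard index (lookup, not hash)

    def create_everywhere(self, ddl):
        # Mirrored schema: every shard gets the same tables.
        for db in self.dbs:
            db.execute(ddl)

    def _targets(self, table, owner, write):
        kind = self.table_kinds[table]
        everyone = list(range(len(self.dbs)))
        if kind == SINGLE:
            return [self.home_shard]
        if kind == REPLICATED:
            # Write to every copy, read from any one copy.
            return everyone if write else [self.home_shard]
        if owner is not None:
            return [self.directory[owner]]
        if write:
            raise ValueError("writes to a sharded table need an owner")
        return everyone  # no hint: fan the read out to all shards

    def query(self, table, sql, params=(), owner=None):
        rows = []
        for i in self._targets(table, owner, write=False):
            rows.extend(self.dbs[i].execute(sql, params).fetchall())
        return rows

    def write(self, table, sql, params=(), owner=None):
        for i in self._targets(table, owner, write=True):
            with self.dbs[i]:  # commit on success, roll back on error
                self.dbs[i].execute(sql, params)
```

Keeping placement in a directory means a row can be moved to another shard by updating one entry, which a pure hash of the key would not allow.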
how, faster?
- parallel queries
- shard hints
- bloom filters
- ask your ORM for help!
- parent/child relationships
- session pinning
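Two of the speedups above, bloom filters and parallel queries, combine naturally: a per-shard bloom filter rules out shards that cannot hold a key, and the remaining candidates are queried concurrently. The sketch below is an assumption about how that might look, using plain dictionaries as stand-in shards; `BloomFilter`, `Shard`, and `find` are hypothetical names.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Shard:
    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.filter = BloomFilter()
        self.lookups = 0  # counts how often this shard was actually queried

    def put(self, key, value):
        self.rows[key] = value
        self.filter.add(key)

    def get(self, key):
        self.lookups += 1
        return self.rows.get(key)


def find(shards, key):
    # Skip shards whose filter rules the key out, then query the rest in parallel.
    candidates = [s for s in shards if s.filter.might_contain(key)]
    with ThreadPoolExecutor(max_workers=max(1, len(candidates))) as pool:
        results = list(pool.map(lambda s: s.get(key), candidates))
    return next((r for r in results if r is not None), None)
```

A false positive only costs one wasted lookup; a bloom filter never reports a stored key as absent, so no shard holding the key is skipped.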
why not?
- can't join across shards (except histograms)
- need separate data warehouse
- cache incoherence
- your dba will hate you
- resharding
hibernate shards
- config per shard
- access, resolution, selection, ids
- all customizable
- virtual shards
- early days: minimal Criteria/HQL, no caching
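The virtual shards idea can be illustrated without any Hibernate Shards API: rows are assigned to one of many virtual shards, and a small map says which physical database currently hosts each virtual shard. The sketch below is a generic illustration under that assumption, not Hibernate Shards code; `VirtualShardMap` and its methods are hypothetical names, and copying the data during a move is out of scope.

```python
class VirtualShardMap:
    """Maps many virtual shard ids onto a few physical shards (illustrative)."""

    def __init__(self, virtual_to_physical):
        self.virtual_to_physical = dict(virtual_to_physical)

    def physical_for(self, virtual_id):
        return self.virtual_to_physical[virtual_id]

    def move(self, virtual_id, new_physical):
        # Resharding step: after the data has been copied (not shown),
        # repoint the virtual shard at its new physical home.
        self.virtual_to_physical[virtual_id] = new_physical


# Eight virtual shards spread over two physical databases.
shard_map = VirtualShardMap({v: v % 2 for v in range(8)})
shard_map.move(5, 2)  # a third database takes over virtual shard 5
```

Since rows only ever record their virtual shard, adding a physical database changes map entries rather than the rows themselves.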
sharding 201: cross-shard txes
- no two-phase commit?
- standard distributed tx
- Transactions table, with tx data, on all shards
- update shard 1, write tx row
- update shard 2, write tx row with same primary key
- recover on startup or in background
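The steps above can be sketched with two in-memory SQLite databases standing in for shards. The slide does not spell out the recovery rule, so this sketch assumes roll-forward: any tx row present on the first shard but missing on the second is re-applied from its stored tx data. The table layout, the JSON encoding, and the names `apply_step`, `transfer`, and `recover` are all hypothetical.

```python
import json
import sqlite3
import uuid


def make_shard():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    db.execute("CREATE TABLE transactions (tx_id TEXT PRIMARY KEY, data TEXT)")
    return db


def apply_step(db, tx_id, data, name, delta):
    # One local transaction: the update and the tx row commit together or not at all.
    with db:
        db.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                   (delta, name))
        db.execute("INSERT INTO transactions VALUES (?, ?)", (tx_id, data))


def transfer(src_db, dst_db, src, dst, amount, crash_between=False):
    tx_id = str(uuid.uuid4())  # same primary key is used on both shards
    data = json.dumps({"src": src, "dst": dst, "amount": amount})
    apply_step(src_db, tx_id, data, src, -amount)  # update shard 1, write tx row
    if crash_between:
        return tx_id  # simulate a crash before shard 2 is touched
    apply_step(dst_db, tx_id, data, dst, amount)   # update shard 2, write tx row
    return tx_id


def recover(src_db, dst_db):
    # Roll forward every tx that reached shard 1 but not shard 2.
    done = {row[0] for row in dst_db.execute("SELECT tx_id FROM transactions")}
    pending = src_db.execute("SELECT tx_id, data FROM transactions").fetchall()
    for tx_id, data in pending:
        if tx_id not in done:
            tx = json.loads(data)
            apply_step(dst_db, tx_id, data, tx["dst"], tx["amount"])
```

Because the tx row shares its primary key across shards, recovery is safe to run repeatedly: a tx already recorded on the second shard is skipped.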
sharding 201: cache coherence
- avoid stateful sessions
- no, really. just say no!
- ...
- fine, be that way
- use version numbers
- maintain versions in session state
- check version in write tx
- if changed, abort
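The version-number check above is a form of optimistic concurrency control, and can be sketched as follows. The example assumes a hypothetical `items` table with a `version` column and models session state as a plain dictionary; `load`, `save`, and `StaleVersionError` are illustrative names.

```python
import sqlite3


class StaleVersionError(Exception):
    """Raised when the row changed since this session last read it."""


def load(db, session, key):
    value, version = db.execute(
        "SELECT value, version FROM items WHERE key = ?", (key,)).fetchone()
    session[key] = version  # maintain the version in session state
    return value


def save(db, session, key, new_value):
    with db:  # write tx: rolled back if the version check fails
        cur = db.execute(
            "UPDATE items SET value = ?, version = version + 1 "
            "WHERE key = ? AND version = ?",
            (new_value, key, session[key]))
        if cur.rowcount != 1:
            raise StaleVersionError(key)  # someone else wrote first: abort
    session[key] += 1
```

Two sessions that read the same row can both attempt a write, but only the first succeeds; the second sees a version mismatch and must reload before retrying.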
Questions?
Ryan Barrett
http://snarfed.org/
hackfest@ryanb.org