javascript:void(0);

Under the covers of the
App Engine Datastore

Ryan Barrett
Google
May 28, 2008

Contents

Bigtable in one slide

(...ok maybe ten)

Bigtable

Bigtable

Bigtable

Bigtable

a...
b...
c...

Bigtable

a...
b...
c...

Bigtable

  • Delete
a...
Xb...
c...

Bigtable

  • Single-row transaction
a...
bX
c...
 
...
 
a...
bX'
c...

Bigtable

  • Scan: prefix b
a...
b1...
b2...
b3...
c...

Bigtable

  • Scan: range b to d
a...
b...
c...
d...
e...

Contents

  • Best practices
  • Bigtable in one slide
  • The Entities table
  • Queries and indexes
    • Kind index
    • Single-property index
    • Composite index
    • Merge join
  • Entity groups and transactions

The Entities table

  • Primary table
  • Stores all entities in all apps
  • Generic, schemaless
  • Row name is entity key
  • Only column is serialized entity

Entity keys

  • Entity key based on parent entities
  • Root entity, then its child, then its child, ...

Entity keys

class Grandparent(db.Model): pass class Parent(db.Model): pass class Child(db.Model): pass ethel = Grandparent() jane = Parent(parent=ethel) timmy = Child(parent=jane)

Entity keys

class Grandparent(db.Model): pass class Parent(db.Model): pass class Child(db.Model): pass ethel = Grandparent() jane = Parent(parent=ethel) timmy = Child(parent=jane)

/Grandparent:Ethel
/Grandparent:Ethel/Parent:Jane
/Grandparent:Ethel/Parent:Jane/Child:Timmy

Entity keys

/Grandparent:Alice
/Grandparent:Alice/Parent:Sam
/Grandparent:Ethel
/Grandparent:Ethel/Parent:Jane
/Grandparent:Ethel/Parent:Jane/Child:Timmy
/Grandparent:Ethel/Parent:Jane/Child:William
/Grandparent:Frank

Contents

  • Best practices
  • Bigtable in one slide
  • The Entities table
  • Queries and indexes
    • Kind index
    • Single-property index
    • Composite index
    • Merge join
  • Entity groups and transactions

Queries

  • kind
  • property filters
  • sort orders
  • ancestor

Example queries

SELECT * FROM Person ... WHERE name = 'John'; ORDER BY name DESC; WHERE city = 'Sonoma' AND state = 'CA' AND country = 'USA'; WHERE ANCESTOR IS :ethel ORDER BY name;

Indexes

  • Separate Bigtable tables
  • Map property values to entities
  • Each row includes index data and entity key

Scanning indexes

  • filtering in memory
  • sorting in memory
  • not even a little!

Query planning

  • Goal: convert query to dense index scan
  • Pick the index
  • Compute the prefix or range
  • Scan!

Contents

  • Best practices
  • Bigtable in one slide
  • The Entities table
  • Queries and indexes
    • Kind index
    • Single-property index
    • Composite index
    • Merge join
  • Entity groups and transactions

Kind index

  • Serves queries for all entities of a given kind
  • e.g. SELECT * FROM Grandparent

Kind index

  • Index data is kind
  • KindKey
    ChildJimmy
    ChildTimmy
    GrandparentEthel
    GrandparentFred
    ParentJane
    ParentJohn
    ParentTodd

Kind index

  • SELECT * FROM Grandparent
  • Scan with prefix Grandparent
  • KindKey
    ChildJimmy
    ChildTimmy
    GrandparentEthel
    GrandparentFred
    ParentJane
    ParentJohn
    ParentTodd

Contents

  • Best practices
  • Bigtable in one slide
  • The Entities table
  • Queries and indexes
    • Kind index
    • Single-property index
    • Composite index
    • Merge join
  • Entity groups and transactions

Single-property index

  • Serves queries on a single property:
    • WHERE name = 'John'
    • ORDER BY name DESC
    • WHERE name >= 'B' AND name < 'C' ORDER BY name
  • One ascending, one descending

Single-property index

  • Index data is kind, property name, value
  • KindNameValue
    Parentaddress1 Palm Dr.
    ParentnameAlice
    ParentnameBob
    ParentnameBrad
    ParentnameChelsea
    ParentnameJane
    ParentnameJohn
    ParenttitleNinja Pirate

Single-property index

  • ...or descending
  • KindNameValue DESC
    Parentaddress1 Palm Dr.
    ParentnameJohn
    ParentnameJane
    ParentnameChelsea
    ParentnameBrad
    ParentnameBob
    ParentnameAlice
    ParenttitleNinja Pirate

Single-property index

  • WHERE name = 'John'
  • Scan with prefix Parent name John
  • KindNameValue
    Parentaddress1 Palm Dr.
    ParentnameAlice
    ParentnameBob
    ParentnameBrad
    ParentnameChelsea
    ParentnameJane
    ParentnameJohn
    ParenttitleNinja Pirate

Single-property index

  • ORDER BY name DESC
  • Scan descending index with prefix Parent name:
  • KindNameValue DESC
    Parentaddress1 Palm Dr.
    ParentnameJohn
    ParentnameJane
    ParentnameChelsea
    ParentnameBrad
    ParentnameBob
    ParentnameAlice
    ParenttitleNinja Pirate

Single-property index

  • WHERE name >= 'B' AND name < 'C'
    ORDER BY name
  • Scan range [Parent name B, Parent name C):
  • KindNameValue
    Parentaddress1 Palm Dr.
    ParentnameAlice
    ParentnameBob
    ParentnameBrad
    ParentnameChelsea
    ParentnameJane
    ParentnameJohn
    ParenttitleNinja Pirate

Contents

  • Best practices
  • Bigtable in one slide
  • The Entities table
  • Queries and indexes
    • Kind index
    • Single-property index
    • Composite index
    • Merge join
  • Entity groups and transactions

Composite index

  • Defined in index.yaml
  • Serve queries on multiple properties or ancestor

  • WHERE firstname = 'Ryan' AND lastname = 'Barrett' WHERE firstname >= 'B' AND firstname < 'C' AND lastname = 'Smith' WHERE ANCESTOR IS :ethel ORDER BY firstname

Composite index

  • index.yaml:

  • - kind: Parent properties: - name: lastname - name: firstname

Composite index

  • index.yaml:

  • - kind: Parent properties: - name: lastname - name: firstname
  • Index data is kind, ancestor, property values
  • Kindlastnamefirstname
    ParentAndersonJane
    ParentBarrettRyan
    ParentSmithBob
    ParentThomasAlice

Query planning

  1. Add kind to prefix
  2. Add ancestor, if any
  3. Add = filters, if any
  4. If any inequality filters, convert to range scan
  5. Add sort orders, if any

Composite index

  • WHERE firstname = 'Ryan'
     AND lastname = 'Barrett'
  • Scan index with prefix Parent Barrett Ryan:
  • Kindlastnamefirstname
    ParentAndersonJane
    ParentBarrettRyan
    ParentSmithBob
    ParentSmithBrad
    ParentSmithChelsea
    ParentThomasAlice

Composite index

  • WHERE lastname = 'Smith'
     AND firstname >= 'B' AND firstname < 'C'
  • Scan range [Parent Smith B, Parent Smith C):
  • Kindlastnamefirstname
    ParentAndersonJane
    ParentBarrettRyan
    ParentSmithBob
    ParentSmithBrad
    ParentSmithChelsea
    ParentThomasAlice

Composite index

    Ancestor index:

    - kind: Parent ancestor: yes properties: - name: firstname
    KindAncestorfirstname
    Parent/Grandparent:DavidJane
    Parent/Grandparent:EthelBob
    Parent/Grandparent:EthelChelsea
    Parent/Grandparent:EthelRyan
    Parent/Grandparent:FredAlice

Composite index

  • WHERE ANCESTOR IS /Grandparent:Ethel
    ORDER BY firstname
  • Scan with prefix Parent /Grandparent:Ethel:
  • KindAncestorfirstname
    Parent/Grandparent:DavidJane
    Parent/Grandparent:EthelBob
    Parent/Grandparent:EthelChelsea
    Parent/Grandparent:EthelRyan
    Parent/Grandparent:FredAlice

Contents

  • Best practices
  • Bigtable in one slide
  • The Entities table
  • Queries and indexes
    • Kind index
    • Single-property index
    • Composite index
    • Merge join
  • Entity groups and transactions

Merge join

  • Serves multiple = filters without a composite index
  • Can include ancestor
  • e.g. WHERE firstname = 'John' AND lastname = 'Smith'
  • Scan single-property index once for each filter value
  • Intersect results

Merge join

  • WHERE firstname = 'John' AND lastname = 'Smith'
  • Scan with prefix Parent firstname John:
  • KindNameValue
    ParentfirstnameJohn
    ParentfirstnameJohn
    ParentfirstnameRyan
  • ...and with prefix Parent lastname Smith:
  • ParentlastnameBarrett
    ParentlastnameSmith
    ParentlastnameSmith

Merge join

  • Intersect results incrementally, using key:
  • KindNameValueKey
    ParentfirstnameJohnX
    ParentfirstnameJohnY
    ParentlastnameSmithY
    ParentlastnameSmithZ

Contents

  • Best practices
  • Bigtable in one slide
  • The Entities table
  • Queries and indexes
    • Kind index
    • Single-property index
    • Composite index
    • Merge join
  • Entity groups and transactions

Transaction model

  • All writes are transactional
  • Timestamped journals
  • Entity version history
  • Optimistic concurrency

Example: Put

  1. Read "committed" timestamp
  2.  
  3.  
  4.  
KeyEntityJournalCommitted
 ←AX (3:00) ...3:00

Example: Put

  1. Read "committed" timestamp
  2. Write journal
  3.  
  4.  
KeyEntityJournalCommitted
 →AX (3:00) 3:45 A←Y3:00

Example: Put

  1. Read "committed" timestamp
  2. Write journal
  3. Apply journal
  4.  
KeyEntityJournalCommitted
 →A Y (3:45)
X (3:00)
3:45 A←Y3:00

Example: Put

  1. Read "committed" timestamp
  2. Write journal
  3. Apply journal
  4. Update committed timestamp
KeyEntityJournalCommitted
←→AY (3:45)
X (3:00)
3:45 A←Y 3:00 3:45

Entity groups

  • Defined by root entities
  • Consists of root and all descendants
  • Journal, last committed timestamp co-located with root entity

Example: paying allowance

class Parent(db.Model): cash = db.IntegerProperty() class Child(db.Model): cash = db.IntegerProperty() parent = Parent(cash=1000) child = Child(parent=parent, cash=0)

Example: paying allowance

def pay(parent_key, child_key, amount): parent, child = db.get(parent_key, child_key) parent.cash -= amount child.cash += amount db.put(parent, child) db.run_in_transaction( pay, parent.key(), child.key(), 10)

Example: paying allowance

def pay(parent_key, child_key, amount): parent, child = db.get(parent_key, child_key) parent.cash -= amount child.cash += amount db.put(parent, child) db.run_in_transaction( pay, parent.key(), child.key(), 10)
  • Read entity group for parent
  • Note committed timestamp, 3:00pm
  • Read parent and child entities as of 3:00pm

Example: paying allowance

def pay(parent_key, child_key, amount): parent, child = db.get(parent_key, child_key) parent.cash -= amount child.cash += amount db.put(parent, child) db.run_in_transaction( pay, parent.key(), child.key(), 10)
  • Serialize parent and child
  • Write to journal in parent's entity group row

Example: paying allowance

def pay(parent_key, child_key, amount): parent, child = db.get(parent_key, child_key) parent.cash -= amount child.cash += amount db.put(parent, child) db.run_in_transaction( pay, parent.key(), child.key(), 10)
  • Apply journalled changes to parent and child
  • Write new committed timestamp to entity group
  • Update parent.cash and child.cash index rows

Recovery

  • Always check entity group first
  • Unfinished txes are rolled forward
  • Adds one Bigtable read to every datastore read/write

Contents, redux

  • Best practices
  • Bigtable in one slide
  • The Entities table
  • Queries and indexes
    • Kind index
    • Single-property index
    • Composite index
    • Merge join
  • Entity groups and transactions