Facebook Data Store API Thoughts

On Monday, Facebook employee Haiping Zhao quietly published a Data Store API on the Facebook developer wiki.

It’s preliminary, and still in closed beta, but it marks a milestone in the evolution of the Facebook platform. It could be an indication of the direction Facebook plans to take the platform in the future.

Here are some of my thoughts on this API, as well as a few responses from Haiping, in italics. Feel free to add your own!

Contents

Introduction
Schema definition
Data access
Transactions
Associations
Preferences

Introduction

The Facebook data store is a very simple hosted database, accessible by a RESTful CRUD API. It supports both XML and JSON. It requires users to define their schema, and has basic create, read, update, and delete operations, but no structured queries, full text search, or transactions.

The one truly interesting feature it does provide is associations, which are foreign keys, with all the usual constraints, but at the row level, not the table level. These may be powerful enough to provide an alternative to querying on property values.

So far, the API is very limited. Still, the sources of inspiration – SQL and social networking – are obvious even in this first draft. It will be interesting to see how it evolves.

Schema definition

As with most databases, Facebook apps must define their schema up front. A schema consists of object types, i.e. tables, and properties, i.e. columns. Object type and property names are strings, alphanumeric plus underscores, 32 characters or fewer. Property values can be small strings, large text blobs, or integers. It looks like any or all of an object’s properties can be unset, as if they were nullable SQL columns.
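The naming rule is easy to check on the client before making a call. Here's a minimal sketch in Python. It assumes that names must be non-empty and that "alphanumeric" means ASCII letters and digits, neither of which the draft docs confirm. The helper is_valid_name is hypothetical, not part of the API.

```python
import re

# Assumption: ASCII letters, digits, and underscores only, 1 to 32 characters.
_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def is_valid_name(name: str) -> bool:
    """Return True if name looks like a legal object type or property name."""
    return _NAME_RE.fullmatch(name) is not None
```

Under these assumptions, is_valid_name("book_review") passes, while an empty string, a 33-character name, or a name containing a space does not.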

Schemas can be modified at runtime. Object types and properties can be queried, added, removed, and renamed. These operations are similar to the SQL CREATE TABLE, ALTER TABLE, and DROP TABLE commands; in particular, dropObjectType and undefineObjectProperty delete existing objects and property values along with the corresponding object types and property definitions.

Questions:

The description of the properties parameter in createObject and updateObject seems to imply that any or all of an object’s properties may be undefined. Is that true?

In most calls that handle property values, like setObjectProperty and set/getHashValue, properties are sent and returned as strings. Does that mean developers need to serialize integer properties manually? If so, that’s unfortunate at best, and dangerous at worst.

We were facing a decision between a strongly typed model vs. a weakly typed model in handling return values.
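If properties really do travel as strings, a client would need a thin conversion layer. Here's a rough sketch in Python, assuming the app keeps its own record of which properties are integers. INT_PROPERTIES, encode_properties, and decode_properties are made-up names for illustration; none of them come from the API.

```python
# Hypothetical: the app's own record of which properties are integers.
INT_PROPERTIES = {"rating", "page_count"}


def encode_properties(props: dict) -> dict:
    """Convert every property value to a string before sending it."""
    return {name: str(value) for name, value in props.items()}


def decode_properties(props: dict) -> dict:
    """Convert integer properties back from strings; leave the rest alone."""
    decoded = {}
    for name, value in props.items():
        if name in INT_PROPERTIES:
            decoded[name] = int(value)  # raises ValueError on malformed data
        else:
            decoded[name] = value
    return decoded


wire = encode_properties({"title": "Dune", "rating": 5})
restored = decode_properties(wire)
```

The danger is in forgetting the decode step: compared as strings, "10" sorts before "9".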

Data access

The API includes the standard create, get, update, and delete operations. In addition, there are get and delete operations that can operate on multiple objects, and operations that get and set a subset of an object’s properties. There are even dedicated operations for getting and setting a single property. Whether that’s syntactic sugar or API bloat is left to the reader.

When an object is created, it’s assigned a 64-bit integer fbid, which serves as its primary key. The fbid namespace and characteristics are still unclear, but fbids are also used in other parts of the Facebook API, outside of the data store.

Objects can also be identified by hash keys. Hash keys are described as “string object identifiers,” but there’s not much information on them beyond that. They may just be a string encoding of the fbid, for use in URLs and other string contexts.

Questions:

How are fbids allocated? Is there a single global fbid namespace, or is it per app?

An fbid is a GUID, a globally unique identifier. Well, it’s actually a UUID, universally unique identifier, in case Mars has life and computers.

What are hash keys? When would they be used?

It’s an arbitrary string application defines. So, it could be a URL, if an application elects to do so.

General-purpose queries based on property values are an important feature in most data stores, but they’re conspicuously absent here. Will they ever be added?

In fact, you could do that through FQL. In FQL, any property can appear in WHERE clause.

What are the practical limits on scaling? How many objects could an app realistically store? How many object types? Properties per object type? Associations? etc.

Facebook Data Store API is designed to be scalable, although we will have resource limit imposed to make fair use between different applications. Quota will be documented when we have them.

Transactions

Despite the heading of this section, there are no general-purpose transactions in the Facebook data store API. Fortunately, there is an atomic increment operation. It can increment or decrement an integer property value by any amount. This is obviously no substitute for full-fledged transactions, but it can handle a surprising number of use cases.

One odd quirk is that the increment operation requires a hash key. Unlike the CRUD operations, there’s no corresponding increment operation that takes an fbid.
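To see why atomic increment covers so many cases, consider a vote counter. The sketch below is a local, in-memory stand-in written in Python, not the real API. ToyStore and its increment method are hypothetical; a lock plays the role of the server-side atomicity, and the string key mirrors the hash key requirement.

```python
import threading


class ToyStore:
    """In-memory stand-in for a store with an atomic increment."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def increment(self, key: str, amount: int = 1) -> int:
        """Atomically add amount (may be negative) and return the new value."""
        with self._lock:
            new_value = self._values.get(key, 0) + amount
            self._values[key] = new_value
            return new_value


store = ToyStore()


def cast_votes(n: int) -> None:
    for _ in range(n):
        store.increment("votes")


# Four concurrent writers, 1000 votes each.
threads = [threading.Thread(target=cast_votes, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = store.increment("votes", 0)  # adding 0 just reads the current value
```

No votes are lost, because each read-modify-write happens as a single step. Counters, quotas, and sequence numbers all fit this pattern without needing a full transaction.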

Questions:

Will more transactional capabilities be added? Or is atomic increment the best we can hope for?

Not in consideration currently, due to distributed nature of data objects.

Associations

Associations are by far the most intriguing part of the data store API. They’re foreign keys, but they can be one-way or bidirectional, and, unlike SQL, they’re set at the object level instead of the object type level.

Similar to object types and properties, associations are a kind of schema. Association types have string names, they must be defined ahead of time, and they can be queried and modified on the fly.

More importantly, associations have properties and constraints traditionally found in both foreign keys and graph theory. They can be one-way or two-way, and two-way associations can be symmetric, where A=>B is equivalent to B=>A, or asymmetric, where A=>B and B=>A mean different things, and can have different names. Also, each endpoint of an association may be classified as unique, which enforces a one-to-many or many-to-one relationship, as opposed to the default many-to-many. See the defineAssociation page for more.
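These constraints are easier to picture with a toy model. The Python sketch below is purely illustrative: ToyAssociation is a hypothetical in-memory class, not the Facebook API. It assumes that a "unique" endpoint means an object on that side can take part in at most one association of that type, which is my reading of the docs rather than a confirmed definition.

```python
class ToyAssociation:
    """In-memory model of one association type between integer object ids."""

    def __init__(self, symmetric=False, unique_source=False, unique_target=False):
        self.symmetric = symmetric
        self.unique_source = unique_source  # a source may have only one target
        self.unique_target = unique_target  # a target may have only one source
        self._pairs = set()                 # stored (source, target) pairs

    def _check(self, source, target):
        """Raise ValueError if adding (source, target) would break uniqueness."""
        for s, t in self._pairs:
            if self.unique_source and s == source and t != target:
                raise ValueError(f"{source} already has a target")
            if self.unique_target and t == target and s != source:
                raise ValueError(f"{target} already has a source")

    def set(self, source, target):
        """Add source=>target, plus target=>source if the type is symmetric."""
        pairs = [(source, target)]
        if self.symmetric:
            pairs.append((target, source))
        for pair in pairs:  # validate everything before changing anything
            self._check(*pair)
        self._pairs.update(pairs)

    def associated(self, source):
        """Return the set of ids that source points to."""
        return {t for s, t in self._pairs if s == source}

    def count(self, source):
        """Return how many ids source points to."""
        return len(self.associated(source))


# Symmetric: friendship reads the same from either side.
friend = ToyAssociation(symmetric=True)
friend.set(1, 2)

# Unique source: each pet (source) has at most one owner (target),
# while one owner may still have many pets.
owner = ToyAssociation(unique_source=True)
owner.set(101, 1)
owner.set(102, 1)
try:
    owner.set(101, 2)  # pet 101 already has an owner
    rejected = False
except ValueError:
    rejected = True
```

In this model, setting the friendship once makes it visible from both users, and the second owner for pet 101 is rejected. The toy count is a plain linear scan; it says nothing about how the real counting calls perform.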

Associations can be set, removed, and queried in the expected ways. In particular, there are operations to get all associations between two objects as well as all objects associated with a given object, along with corresponding operations to get the number of associations and associated objects.

It’s easy to see how these kinds of object-level relationships can be very powerful in a social networking application. Apps can use these to define custom relationships and connections between people beyond the existing ones that Facebook provides. Beyond that, apps could let users create objects of any type – classes, books, movies, pets – and then connect them to people, and to each other, in new ways.

Questions:

Even in this first draft, associations are very powerful and useful. It’s interesting that they’re handled separately from properties, though, and that they can’t be set automatically, similar to traditional foreign keys. If that were added, would it be a net gain?

What happens if I create an invalid association, e.g. one that violates a uniqueness constraint? The setAssociation calls don’t have return values, and there aren’t any association-specific error codes.

It returns an invalid operation error, although we may consider to special case the error code.

Are the counting methods efficient? Specifically, are they roughly constant time, or linear? I can see how they’d be very useful for large associations, but if they’re no faster than calling getAssociatedObjects, that makes them less useful.

Constant time. This should be documented. There is one more difference from calling getAssociatedObjects() then count, because we have an upper limit on how many associated object ids to return. This is missing from documentation, too.

Preferences

The API has separate, dedicated storage for per-app user preferences. A user preference is simply a string key/value pair. Unlike objects and properties, though, preference names don’t have to be defined up front, and different users can have different preference names as well as values.
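The schemaless part is the main difference from objects, and a toy model makes it concrete. This is a hypothetical in-memory sketch in Python, not the real preferences API; ToyPreferences and its method names are invented for illustration.

```python
class ToyPreferences:
    """In-memory stand-in for per-app, per-user preference storage."""

    def __init__(self):
        self._by_user = {}  # user id -> {preference name: value}

    def set(self, user_id: int, name: str, value: str) -> None:
        """Store a string value; the name needs no prior definition."""
        self._by_user.setdefault(user_id, {})[name] = value

    def get(self, user_id: int, name: str, default: str = "") -> str:
        """Return the stored value, or default if this user never set it."""
        return self._by_user.get(user_id, {}).get(name, default)


prefs = ToyPreferences()
prefs.set(1, "theme", "dark")               # user 1 has a "theme" preference
prefs.set(2, "digest_frequency", "weekly")  # user 2 uses a different name entirely
```

Neither name was declared anywhere beforehand, and each user ends up with a different set of preference names.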

Apart from that twist, the preferences API is fairly straightforward and unsurprising.

10 thoughts on “Facebook Data Store API Thoughts”

  1. Ah yes … yet another “REST”ful API that doesn’t really exhibit any of the properties of REST other than (mis)using HTTP.

  2. mind elaborating? do you mean that it’s not stateless? sure, the calls need an api key and a session id, but i expect the former is just for quotas and throttling, and the latter is just the facebook user who owns the data. i don’t know that those disqualify it from claiming RESTfulness.

  3. i wrote a php client lib for people too impatient to wait for the official one from facebook… it just extends the existing classes, shouldn’t break anything – http://mysticspiral.org/facebook_datastore.php.gz – also, your openid validator submitted a messed up url and wouldn’t let me authenticate :/

  4. Paul, thank you for client lib.
    The only issue I found is in the function data_setUserPreference as a parameters you used ‘pref_id’ and ‘string’ where it should be ‘pref_id’ and ‘value’
    Best regards..

  5. Does FB define a set of schemas for existing associations? Is there a kind of type hierarchy for associations? This may provide an extension mechanism for relationship between members.

  6. Regarding resource limits:
    <<Facebook Data Store API is designed to be scalable, although we will have resource limit imposed to make fair use between different applications. Quota will be documented when we have them.>>

    Before one makes the effort to integrate their data into the Facebook datastore it would be nice to know and understand the limits.  Has the quota been documented yet?

  7. what data is stored about a facebook group or
    individual
