It’s preliminary, and still in closed beta, but it marks a milestone in the evolution of the Facebook platform. It could be an indication of the direction Facebook plans to take the platform in the future.
Here are some of my thoughts on this API, as well as a few responses from Haiping, in italics. Feel free to add your own!
The Facebook data store is a very simple hosted database, accessible by a RESTful CRUD API. It supports both XML and JSON. It requires users to define their schema, and has basic create, read, update, and delete operations, but no structured queries, full text search, or transactions.
The one truly interesting feature it does provide is associations, which are foreign keys, with all the usual constraints, but at the row level, not the table level. These may be powerful enough to provide an alternative to querying on property values.
So far, the API is very limited. Still, the sources of inspiration – SQL and social networking – are obvious even in this first draft. It will be interesting to see how it evolves.
As with most databases, Facebook apps must define their schema up front. A schema consists of object types, i.e. tables, and properties, i.e. columns. Object types and property names are strings, alphanumeric plus underscores, 32 characters or less. Property values can be small strings, large text blobs, or integers. It looks like any or all of an object’s properties can be unset, as if they were NULLABLE SQL columns.
Schemas can be modified at runtime. Object types and properties can be queried, added, removed, and renamed. These are similar to the SQL ALTER TABLE and DROP TABLE commands; in particular, the removal operations delete existing objects and property values along with the corresponding object types and property definitions.
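To make the schema rules concrete, here is a minimal in-memory sketch in Python. It is not the Facebook API: the `Schema` class, its method names, and the kind labels are all hypothetical. Only the rules it enforces come from the description above: names are alphanumeric plus underscores and at most 32 characters, values come in three kinds, properties may be left unset, and removing a definition discards the stored data.

```python
import re

# Per the post: alphanumeric plus underscores, 32 characters or less.
NAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")
# Hypothetical labels for the three value kinds the post mentions.
KINDS = {"string", "text", "integer"}


def check_name(name):
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid name: {name!r}")


class Schema:
    """Hypothetical in-memory model; not the real data store API."""

    def __init__(self):
        self.types = {}    # object type -> {property name: kind}
        self.objects = {}  # object type -> list of {property name: value}

    def create_object_type(self, type_name):
        check_name(type_name)
        self.types[type_name] = {}
        self.objects[type_name] = []

    def define_property(self, type_name, prop, kind):
        check_name(prop)
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        self.types[type_name][prop] = kind

    def create_object(self, type_name, **values):
        unknown = set(values) - set(self.types[type_name])
        if unknown:
            raise KeyError(f"undefined properties: {sorted(unknown)}")
        obj = dict(values)  # properties left out are simply unset
        self.objects[type_name].append(obj)
        return obj

    def drop_property(self, type_name, prop):
        # Like dropping a column: stored values go with the definition.
        del self.types[type_name][prop]
        for obj in self.objects[type_name]:
            obj.pop(prop, None)

    def drop_object_type(self, type_name):
        # Like DROP TABLE: every object of the type is deleted.
        del self.types[type_name]
        del self.objects[type_name]
```

In this model an unset property is simply absent from the object, which is the closest in-memory analogue to a NULL column value.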
In most calls that handle property values, like setObjectProperty and set/getHashValue, properties are sent and returned as strings. Does that mean developers need to serialize integer properties manually? If so, that’s unfortunate at best, and dangerous at worst.
We were facing a decision between a strongly typed model vs. a weakly typed model in handling return values.
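If values really do cross the wire as strings, an application would need a thin conversion layer of its own. The sketch below shows one way to do that in Python. The helper names and the `kind` labels are hypothetical, and nothing here calls the actual API; it only illustrates the round trip and where it can fail.

```python
def encode_value(value):
    """Turn a property value into the string form sent to the store."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it to avoid "True".
        raise TypeError("booleans are not a supported property value")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return value
    raise TypeError(f"unsupported property value: {value!r}")


def decode_value(raw, kind):
    """Turn a returned string back into a typed value.

    `kind` is the application's own record of the property's type
    ("integer", "string", or "text"); the labels are assumptions.
    """
    if raw is None:
        return None  # property was unset
    if kind == "integer":
        return int(raw)  # raises ValueError on malformed input
    return raw
```

The danger mentioned above shows up in `decode_value`: if the application's idea of a property's kind drifts from the schema, an integer silently stays a string, or a string fails to parse.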
The API includes the standard create, get, update, and delete operations. In addition, there are get and delete operations that can operate on multiple objects, and operations that get and set a subset of an object’s properties. There are even dedicated operations for getting and setting a single property. Whether that’s syntactic sugar or API bloat is left to the reader.
When an object is created, it’s assigned a 64-bit integer fbid, which serves as its primary key. The fbid namespace and characteristics are still unclear, but they’re used in other parts of the Facebook API, outside of the data store.
Objects can also be identified by hash keys. Hash keys are described as “string object identifiers,” but there’s not much information on them beyond that. They may just be a string encoding of the fbid, for use in URLs and other string contexts.
How are fbids allocated? Is there a single global fbid namespace, or is it per app?
fbid is a GUID, a globally unique identifier. Well, it’s actually a UUID, universally unique identifier, in case Mars has life and computers.
What are hash keys? When would they be used?
It’s an arbitrary string application defines. So, it could be a URL, if an application elects to do so.
General-purpose queries based on property values are an important feature in most data stores, but they’re conspicuously absent here. Will they ever be added?
In fact, you could do that through FQL. In FQL, any property can appear in WHERE clause.
What are the practical limits on scaling? How many objects could an app realistically store? How many object types? Properties per object type? Associations? etc.
Facebook Data Store API is designed to be scalable, although we will have resource limit imposed to make fair use between different applications. Quota will be documented when we have them.
Despite the heading of this section, there are no general-purpose transactions in the Facebook data store API. Fortunately, there is an atomic increment operation. It can increment or decrement an integer property value by any amount. This is obviously no substitute for full-fledged transactions, but it can handle a surprising number of use cases.
One odd quirk is that the increment operation requires a hash key. Unlike the CRUD operations, there’s no corresponding increment operation that takes an fbid.
Will more transactional capabilities be added? Or is atomic increment the best we can hope for?
Not in consideration currently, due to distributed nature of data objects.
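To show why an atomic increment covers more ground than it first appears, here is a small Python sketch of the idea. It is an in-memory stand-in, not the data store: `CounterStore`, its `increment` method, the hash key `"page:home"`, and the choice to treat an unset property as zero and to return the new value are all assumptions made for illustration.

```python
import threading


class CounterStore:
    """Hypothetical store of integer properties addressed by hash key."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def increment(self, hash_key, prop, amount=1):
        """Atomically add `amount` (which may be negative); return the result."""
        with self._lock:  # read-modify-write happens as one step
            key = (hash_key, prop)
            new_value = self._values.get(key, 0) + amount
            self._values[key] = new_value
            return new_value


def run_demo(threads=8, per_thread=1000):
    """Hammer one counter from several threads and return the final value."""
    store = CounterStore()

    def worker():
        for _ in range(per_thread):
            store.increment("page:home", "views")

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return store.increment("page:home", "views", 0)
```

Because every update goes through the single locked step, no increments are lost no matter how the threads interleave. Counters, vote tallies, and simple quotas fit this pattern; anything that must update two values together does not.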
Associations are by far the most intriguing part of the data store API. They’re foreign keys, but they can be one-way or bidirectional, and, unlike SQL, they’re set at the object level instead of the object type level.
More importantly, associations have properties and constraints traditionally
found in both foreign keys and graph theory. They can be one-way or two-way,
and two-way associations can be symmetric, where A=>B is equivalent to B=>A,
or asymmetric, where A=>B and B=>A mean different things, and can have
different names. Also, each endpoint of an association may be classified as
unique, which enforces a one-to-many or many-to-one relationship, as opposed
to the default many-to-many. See the
page for more.
Associations can be set, removed, and queried in the expected ways. In particular, there are operations to get all associations between two objects as well as all objects associated with a given object, along with corresponding operations to get the number of associations and associated objects.
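The following Python sketch models a few of these rules in memory. It is not the Facebook API: the `Association` class and its methods are hypothetical, and the reading of a unique endpoint as "each source may point to at most one target" is an assumption about how the constraint behaves. Asymmetric two-way associations with separately named directions are left out to keep the sketch short.

```python
from collections import defaultdict


class Association:
    """Hypothetical object-level association between integer object ids."""

    def __init__(self, name, symmetric=False, unique_target=False):
        self.name = name
        self.symmetric = symmetric          # A=>B implies B=>A
        self.unique_target = unique_target  # each source has at most one target
        self._forward = defaultdict(set)    # source id -> set of target ids

    def _pairs(self, a, b):
        return [(a, b), (b, a)] if self.symmetric else [(a, b)]

    def set(self, a, b):
        pairs = self._pairs(a, b)
        if self.unique_target:
            # Validate every direction before changing anything.
            for src, dst in pairs:
                if self._forward.get(src, set()) - {dst}:
                    raise ValueError(f"{src} already has a {self.name} target")
        for src, dst in pairs:
            self._forward[src].add(dst)

    def remove(self, a, b):
        for src, dst in self._pairs(a, b):
            self._forward.get(src, set()).discard(dst)

    def associated(self, a):
        """Return the ids associated with `a`, in sorted order."""
        return sorted(self._forward.get(a, ()))

    def count(self, a):
        return len(self._forward.get(a, ()))
```

A symmetric `friend` association created with `set(1, 2)` is visible from both objects, while a one-way `likes` association is visible only from its source. Setting a second target on a `unique_target` association raises an error instead of silently replacing the first.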
It’s easy to see how these kinds of object-level relationships can be very powerful in a social networking application. Apps can use these to define custom relationships and connections between people beyond the existing ones that Facebook provides. Beyond that, apps could let users create objects of any type – classes, books, movies, pets – and then connect them to people, and to each other, in new ways.
Even in this first draft, associations are very powerful and useful. It’s interesting that they’re handled separately from properties, though, and that they can’t be set automatically, similar to traditional foreign keys. If that were added, would it be a net gain?
What happens if I create an invalid association, e.g. one that violates a uniqueness constraint? The setAssociation calls don’t have return values, and there aren’t any association-specific error codes.
It returns an invalid operation error, although we may consider to special case the error code.
Are the counting methods efficient? Specifically, are they roughly constant time, or linear? I can see how they’d be very useful for large associations, but if they’re no faster than calling getAssociatedObjects() and counting the results, that makes them less useful.
Constant time. This should be documented. There is one more difference from calling getAssociatedObjects() then count, because we have an upper limit on how many associated object ids to return. This is missing from documentation, too.
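The two points in that answer, a count kept in constant time and a cap on returned ids, can be illustrated with a short Python sketch. Everything in it is hypothetical: the `CountedLinks` class, its method names, and especially the `MAX_RETURNED` value, since the real limit is not documented.

```python
MAX_RETURNED = 3  # hypothetical cap; the real limit is undocumented


class CountedLinks:
    """Hypothetical store that maintains a per-object association count."""

    def __init__(self):
        self._targets = {}  # source id -> dict used as an ordered set of targets
        self._counts = {}   # source id -> number of associated objects

    def add(self, source, target):
        targets = self._targets.setdefault(source, {})
        if target not in targets:
            targets[target] = True
            # The count is updated at write time, so reads never scan.
            # (Python's len() is already O(1); the explicit counter stands
            # in for a stored count in a distributed store.)
            self._counts[source] = self._counts.get(source, 0) + 1

    def get_associated(self, source):
        """Return at most MAX_RETURNED associated ids, oldest first."""
        return list(self._targets.get(source, {}))[:MAX_RETURNED]

    def count(self, source):
        """Return the full count, even when the list would be truncated."""
        return self._counts.get(source, 0)
```

With five associated objects and a cap of three, `count` reports 5 while `get_associated` returns only three ids, which is why counting the returned list is not a substitute for the dedicated call.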
The API has separate, dedicated storage for per-app user preferences. A user preference is simply a string key/value pair. Unlike objects and properties, though, preference names don’t have to be defined up front, and different users can have different preference names as well as values.
Apart from that twist, the preferences API is fairly straightforward and unsurprising.
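As a rough picture of how this differs from objects and properties, here is a minimal Python sketch. The `Preferences` class and its methods are hypothetical and do not correspond to actual API calls; the only behavior taken from the description is that preferences are string pairs, scoped per user, with no names declared ahead of time.

```python
class Preferences:
    """Hypothetical per-app store of per-user string preferences."""

    def __init__(self):
        self._by_user = {}  # user id -> {preference name: value}

    def set(self, user_id, name, value):
        if not isinstance(name, str) or not isinstance(value, str):
            raise TypeError("preference names and values must be strings")
        # No schema check: any name is accepted for any user.
        self._by_user.setdefault(user_id, {})[name] = value

    def get(self, user_id, name, default=None):
        return self._by_user.get(user_id, {}).get(name, default)
```

Two users can hold entirely different sets of preference names, and reading a name a user never set simply yields the default.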