<?xml version="1.0"?>
<!DOCTYPE content [ <!ENTITY nbsp " "> ]>
<rdf:RDF xml:base="http://snarfed.org/rdf"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">

<rdf:Description rdf:about="http://snarfed.org">
  <dc:title> snarfed.org  </dc:title>
  <dc:description> draw group stream of consciousness </dc:description>
  <dc:creator> Ryan Barrett &lt;snarfed at ryanb dot org&gt; </dc:creator>
  <dc:language> en </dc:language>
  <dc:format> text/html </dc:format>
  <dc:rights> Copyright 2002-2007 Ryan Barrett </dc:rights>
</rdf:Description>

<rdf:Description rdf:about="http://snarfed.org/space/amazon%20simpledb%20thoughts">
  <dc:title> Amazon SimpleDB thoughts </dc:title>
  <dc:creator> Ryan Barrett &lt;snarfed at ryanb dot org&gt; </dc:creator>
  <dc:date> 2007-12-16T21:25:00Z </dc:date>
  <dc:language> en </dc:language>
  <dc:format> text/html </dc:format>
  <dc:rights> Copyright 2002-2007 Ryan Barrett </dc:rights>

  <content>
    <div style="float: right; margin-left: 10px">
  <a href="http://aws.amazon.com/">
   <img src="/space/amazon_web_services.gif" /></a>
</div>

<p><a href="http://amazon.com/">Amazon</a> recently announced their latest web service,
<a href="http://aws.amazon.com/simpledb">SimpleDB</a>, to a roar of
<a href="http://www.techcrunch.com/2007/12/14/amazon-takes-on-oracle-and-ibm-with-simple-db-beta/">buzz</a>
<a href="http://hardware.slashdot.org/hardware/07/12/16/0012213.shtml">and</a>
<a href="http://www.scripting.com/stories/2007/12/15/amazonRemovesTheDatabaseSc.html">hype</a>.
I finally got a chance to sit down and read through the docs and blog posts,
and, as I did
<a href="/space/facebook+data+store+api+thoughts">with the Facebook Data Store API</a>,
I've written up my thoughts.</p>

<p>Amazon's
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/">docs</a>
do a pretty good job of describing SimpleDB, so I won't try to reproduce them.
Instead, I'll focus on observations, and I'll emphasize a few important points
that are buried deep in the docs.</p>

<p>The executive summary is: I like it. It's solid, straightforward, and
eminently useful. Sure, it's limited. It includes design decisions that
clearly simplified the implementation at the cost of functionality and
usability. Still, as a result of those decisions, SimpleDB has
the potential to be very robust, scalable, and performant.</p>

<p>With SimpleDB alongside <a href="http://aws.amazon.com/s3">S3</a> and
<a href="http://aws.amazon.com/ec2">EC2</a>, Amazon's web services look more and more
like the <a href="http://www.catb.org/~esr/writings/taoup/">Unix philosophy</a>: small,
simple tools that do one job, do it well, and fit together in ways that
complement each other. Very, very cool.</p>

<p>Then again, I'm a sucker for anything
<a href="/space/amazon+simpledb+thoughts#tuplespaces">based on tuplespaces...</a></p>

<h3>Contents</h3>

<p><a href="/space/amazon+simpledb+thoughts#intro">Introduction</a> <br />
<a href="/space/amazon+simpledb+thoughts#tuplespaces">Tuplespaces!</a> <br />
<a href="/space/amazon+simpledb+thoughts#queries">Queries</a> <br />
<a href="/space/amazon+simpledb+thoughts#attributes">Attributes and ordering</a> <br />
<a href="/space/amazon+simpledb+thoughts#scaling">Scaling and Dynamo</a> <br />
<a href="/space/amazon+simpledb+thoughts#pricing">Usage-based pricing</a></p>

<p><a name="intro"></a></p>

<h3><a href="/space/amazon+simpledb+thoughts#intro"><img src="/Icon-Permalink.png" alt="Icon-Permalink.png" title="" /></a> Introduction</h3>

<p>SimpleDB is a simple, schemaless structured storage engine. It stores items,
which are bags of key/value pairs. Keys and values are always strings;
primitive data types like integers and floats are not natively supported.
Developers choose a unique string name for each item at creation time.</p>

<p>The primary operations are
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_PutAttributes.html">Put</a>,
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_GetAttributes.html">Get</a>, and
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_DeleteAttributes.html">Delete</a>,
which are self-explanatory, plus
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html">Query</a>,
which accepts attribute predicates and boolean operators in a
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html#SDB_API_Query_QueryExpressionSyntax_Example2">custom string query format</a>
and returns all matching items.</p>

<p>Items are partitioned into
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/Introduction.html#KeyConcepts">domains</a>.
An item's name only needs to be unique within its domain. Similarly, each query
returns items from a single domain only.</p>
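<p>To make the data model concrete, here's a toy in-memory sketch in Python. It
isn't Amazon's API or implementation; the class and method names are
hypothetical. Each domain maps item names to a bag of string key/value pairs,
so the same item name can exist independently in two domains.</p>

<div class='p-shadow'><pre><code>from collections import defaultdict


class ToyDomainStore:
    """Toy, in-memory model of SimpleDB's data model (hypothetical names).

    Each domain maps item names to a bag of (attribute, value) string pairs.
    Item names only need to be unique within their own domain.
    """

    def __init__(self):
        # domain name -> {item name -> list of (attribute, value) pairs}
        self._domains = defaultdict(dict)

    def put(self, domain, item_name, pairs):
        pairs = list(pairs)
        # Keys and values are always strings; reject anything else.
        for attr, value in pairs:
            if not (isinstance(attr, str) and isinstance(value, str)):
                raise TypeError("attribute names and values must be strings")
        # Appending (rather than replacing) keeps bag semantics.
        self._domains[domain].setdefault(item_name, []).extend(pairs)

    def get(self, domain, item_name):
        # Return a copy so callers can't modify the stored bag.
        return list(self._domains[domain].get(item_name, []))

    def delete(self, domain, item_name):
        self._domains[domain].pop(item_name, None)


store = ToyDomainStore()
store.put("books", "item1", [("Title", "Dune"), ("Price", "9.99")])
store.put("movies", "item1", [("Title", "Alien")])  # same name, other domain
print(store.get("books", "item1"))   # [('Title', 'Dune'), ('Price', '9.99')]
print(store.get("movies", "item1"))  # [('Title', 'Alien')]
</code></pre></div>

<p>Here <code>put</code> appends pairs to the bag rather than replacing them,
which is just one choice made for the sketch.</p>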

<p><a name="tuplespaces"></a></p>

<h3><a href="/space/amazon+simpledb+thoughts#tuplespaces"><img src="/Icon-Permalink.png" alt="Icon-Permalink.png" title="" /></a> Tuplespaces!</h3>

<p>One of the coolest things about SimpleDB is that its interface is pure
<a href="http://en.wikipedia.org/wiki/Tuple_space">tuplespaces</a>, also known as
<a href="http://en.wikipedia.org/wiki/Linda_%28coordination_language%29">Linda</a>.
(Thanks to <a href="http://www.somebits.com/weblog/">Nelson</a>, who was one of the first
people to point out this huge piece of SimpleDB's provenance.)</p>

<p>The tuplespaces concept never spread far beyond research, but I've
always loved it, and I've had <a href="/space/ideas#tuplespaces">"build tuplespaces on top of a
DHT"</a> on my <a href="/space/ideas">list of project ideas</a> for
years. As long as I get to play with it, I don't care that Amazon beat me to the
punch. After all, they can afford a few more servers and sysadmins than I can.</p>

<p>There are at least two noticeable differences between SimpleDB and
standard tuplespace interfaces. First, most tuplespace implementations only
support equality and wildcard query operators. SimpleDB also offers
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html#SDB_API_Query_QueryExpressionSyntax_Example2">inequality, prefix, and boolean operators</a>,
which it probably supports with additional secondary indices.</p>
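<p>Here's roughly what I mean by secondary indices: a hypothetical per-attribute
index that keeps distinct values in lexicographic order, so prefix and
inequality lookups start with a binary search instead of scanning every item.
This is my guess at the general technique, not anything Amazon has documented.</p>

<div class='p-shadow'><pre><code>import bisect


class SortedAttributeIndex:
    """Hypothetical secondary index over one attribute's string values."""

    def __init__(self):
        self._values = []  # distinct values, kept in lexicographic order
        self._items = {}   # value -> item names that have that value

    def add(self, value, item_name):
        if value not in self._items:
            bisect.insort(self._values, value)
            self._items[value] = []
        self._items[value].append(item_name)

    def _names_for(self, values):
        names = []
        for v in values:
            names.extend(self._items[v])
        return names

    def equals(self, value):
        return list(self._items.get(value, []))

    def greater_than(self, value):
        # bisect_right skips past `value` itself if it's present.
        start = bisect.bisect_right(self._values, value)
        return self._names_for(self._values[start:])

    def less_than(self, value):
        end = bisect.bisect_left(self._values, value)
        return self._names_for(self._values[:end])

    def starts_with(self, prefix):
        # Values sharing a prefix are contiguous in sorted order, starting at
        # the first value that isn't less than the prefix.
        start = bisect.bisect_left(self._values, prefix)
        matched = []
        for v in self._values[start:]:
            if not v.startswith(prefix):
                break
            matched.append(v)
        return self._names_for(matched)


colors = SortedAttributeIndex()
for name, color in [("i1", "Blue"), ("i2", "Black"), ("i3", "Red"), ("i4", "Blue")]:
    colors.add(color, name)
print(colors.starts_with("Bl"))     # ['i2', 'i1', 'i4']
print(colors.greater_than("Blue"))  # ['i3']
</code></pre></div>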

<p>Second, most tuplespace implementations offer at least limited support for
transactions, in the form of an atomic "update" operation that removes
existing tuples and adds new ones in a single step. SimpleDB has no such operation, nor any
other support for transactions.</p>
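<p>For comparison, here's a minimal Linda-style tuplespace in Python with
equality and wildcard matching only. <code>out</code>, <code>rdp</code>, and
<code>inp</code> follow the usual Linda names for write, non-blocking read, and
non-blocking take; <code>update</code> is a hypothetical name for the atomic
remove-and-add operation that SimpleDB lacks.</p>

<div class='p-shadow'><pre><code>import threading

WILDCARD = None  # hypothetical marker: matches any value in that position


class TupleSpace:
    """Minimal Linda-style tuplespace: equality and wildcard matching only."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    @staticmethod
    def _matches(template, tup):
        return len(template) == len(tup) and all(
            t is WILDCARD or t == v for t, v in zip(template, tup))

    def _find(self, template):
        # Caller must hold the lock.
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return i
        return None

    def out(self, tup):
        """Write a tuple."""
        with self._lock:
            self._tuples.append(tuple(tup))

    def rdp(self, template):
        """Non-blocking read: return a matching tuple, or None."""
        with self._lock:
            i = self._find(template)
            return None if i is None else self._tuples[i]

    def inp(self, template):
        """Non-blocking take: remove and return a matching tuple, or None."""
        with self._lock:
            i = self._find(template)
            return None if i is None else self._tuples.pop(i)

    def update(self, template, new_tuple):
        """Atomically take a tuple matching template and write new_tuple.

        Returns the removed tuple, or None (writing nothing) if none matched.
        """
        with self._lock:
            i = self._find(template)
            if i is None:
                return None
            old = self._tuples.pop(i)
            self._tuples.append(tuple(new_tuple))
            return old


space = TupleSpace()
space.out(("counter", 1))
print(space.update(("counter", 1), ("counter", 2)))  # ('counter', 1)
print(space.update(("counter", 1), ("counter", 3)))  # None: counter is already 2
print(space.rdp(("counter", WILDCARD)))              # ('counter', 2)
</code></pre></div>

<p>Because the template in <code>update</code> can match an exact tuple, it
doubles as a compare-and-swap, which is hard to get from separate, non-atomic
Put and Delete calls.</p>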

<p><a name="queries"></a></p>

<h3><a href="/space/amazon+simpledb+thoughts#queries"><img src="/Icon-Permalink.png" alt="Icon-Permalink.png" title="" /></a> Queries</h3>

<p>SimpleDB uses a minimal, string-based query language that's
best described by example. Here's one
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html#SDB_API_Query_QueryExpressionSyntax_Example2">from the docs</a>
that returns all blue items that cost less than 14.99:</p>

<div class='p-shadow'><pre><code>"['Price' &lt; '14.99'] intersection ['Color' = 'Blue']"
</code></pre></div>

<p><br class='clearing' /><br class="clearing" />
It's interesting that Amazon went with a custom, proprietary query language
instead of a subset of SQL like Facebook's
<a href="http://developers.facebook.com/documentation.php?doc=fql">FQL</a>. Again, this
almost certainly made development easier for them, but it steepens the
learning curve for developers and contributes somewhat to lock-in.</p>
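<p>To see what the example query actually does, here's a sketch that evaluates
the same two predicates over some made-up items and intersects the results.
The item data is hypothetical, and plain Python string comparison stands in
for SimpleDB's lexicographic ordering.</p>

<div class='p-shadow'><pre><code># Hypothetical items: item name -> attributes (single-valued for brevity).
ITEMS = {
    "item1": {"Color": "Blue", "Price": "12.50"},
    "item2": {"Color": "Blue", "Price": "9.99"},
    "item3": {"Color": "Red", "Price": "5.00"},
    "item4": {"Color": "Blue", "Price": "100.00"},
}


def matching(attr, predicate):
    """Names of items that have `attr` and whose value passes `predicate`."""
    return {name for name, attrs in ITEMS.items()
            if attr in attrs and predicate(attrs[attr])}


# The example query's two predicates, using plain string comparison.
cheap = matching("Price", lambda v: "14.99" > v)  # 'Price' less than '14.99'
blue = matching("Color", lambda v: v == "Blue")
print(sorted(cheap.intersection(blue)))  # ['item1', 'item4']
</code></pre></div>

<p>Note that the 9.99 item is excluded while the 100.00 item matches: as
strings, "9.99" sorts after "14.99" and "100.00" sorts before it.</p>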

<p>Luckily, since all attribute values are strings, SimpleDB avoids the issue of
serializing non-string values and operands. I've used a decent number of ORMs
and database libraries, and this is almost always a wart. It can definitely
be done safely, and somewhat cleanly, but it's always awkward.</p>

<p>The query language itself has no support for joins, full text search,
or sorting query results. I doubt I'd miss joins, but I'd definitely miss full
text search and sorting. I expect that the lack of sorting alone will be one of the
largest pain points for developers who try to use SimpleDB as a replacement
for a standard RDBMS.</p>
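<p>Without server-side sorting, the application has to sort results itself. A
minimal sketch, assuming each matching item's attributes have already been
fetched (for example with a Get per item name) and that the whole result set
fits in memory; the function name and parameters are hypothetical.</p>

<div class='p-shadow'><pre><code>def sort_results(items, attr, numeric=False, descending=False):
    """Sort (item_name, attributes) pairs by one attribute, client-side.

    Items without the attribute go last. With numeric=True, values are parsed
    with float() first, since SimpleDB itself only compares strings.
    """
    present = [pair for pair in items if attr in pair[1]]
    missing = [pair for pair in items if attr not in pair[1]]

    def sort_key(pair):
        value = pair[1][attr]
        return float(value) if numeric else value

    present.sort(key=sort_key, reverse=descending)
    return present + missing


results = [("a", {"Price": "9.99"}), ("b", {"Price": "100.00"}), ("c", {})]
print([name for name, _ in sort_results(results, "Price")])                # ['b', 'a', 'c']
print([name for name, _ in sort_results(results, "Price", numeric=True)])  # ['a', 'b', 'c']
</code></pre></div>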

<p>Finally, separate from the <a href="/space/amazon+simpledb+thoughts#pricing">utilization-based pricing</a>,
SimpleDB imposes a hard deadline on query execution time. If a query takes
longer than 5 seconds, it's
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html#SDB_API_Query_Description">cut off</a>.
Tough love, but reasonable.</p>

<p><a name="attributes"></a></p>

<h3><a href="/space/amazon+simpledb+thoughts#attributes"><img src="/Icon-Permalink.png" alt="Icon-Permalink.png" title="" /></a> Attributes and ordering</h3>

<p>As in tuplespaces, SimpleDB attribute names and values are untyped strings, so
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html#d0e3182">comparison is always lexicographic</a>.
That simplicity is endearing and attractive at first glance, and it almost
certainly made SimpleDB easier for Amazon to develop. Unfortunately, it causes
problems for numbers, dates, and composite types like points, whose natural
ordering isn't lexicographic. For example, the string "10" sorts before "9".</p>

<p>To their credit, Amazon does explain how to <a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html#d0e3182">zero-pad numbers and offset
negative numbers</a>,
and their libraries include code that handles these operations.
Still, no matter how you look at it, jumping through those kinds of hoops is
ugly and awkward for both data access and presentation. Worse, developers
will need to write custom code to map to and from lexicographic ordering for any
non-numeric types, such as points and dates. It doesn't help that
the SimpleDB docs themselves have lots of
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html#d0e3182">examples</a>
of numeric comparisons that aren't offset or zero-padded.</p>
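<p>Here's a minimal sketch of the zero-pad-and-offset trick for integers, plus
fixed-width UTC timestamps for dates. It isn't Amazon's library code, and the
<code>OFFSET</code> and <code>WIDTH</code> values are assumptions chosen for
illustration; they just need to cover the range of values you'll store.</p>

<div class='p-shadow'><pre><code>from datetime import datetime, timezone

# Assumed parameters: OFFSET must exceed the magnitude of the most negative
# value you'll store, and WIDTH must fit OFFSET plus the largest value.
OFFSET = 10**9
WIDTH = 10


def encode_int(n):
    """Encode an integer so lexicographic order matches numeric order."""
    shifted = n + OFFSET  # offset: make every value non-negative
    if 0 > shifted or shifted >= 10**WIDTH:
        raise ValueError(f"{n} is outside the encodable range")
    return str(shifted).zfill(WIDTH)  # zero-pad: make every value the same width


def decode_int(s):
    return int(s) - OFFSET


def encode_datetime(dt):
    """Fixed-width UTC timestamp; these sort correctly as strings
    (assuming four-digit years)."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


nums = [-42, 7, 10, 9, 100]
print(sorted(str(n) for n in nums))  # ['-42', '10', '100', '7', '9']
print([decode_int(s) for s in sorted(encode_int(n) for n in nums)])  # [-42, 7, 9, 10, 100]
print(encode_datetime(datetime(2007, 12, 16, 21, 25, tzinfo=timezone.utc)))  # 2007-12-16T21:25:00Z
</code></pre></div>

<p>Decimal values like prices could be stored as integer cents (1499 for 14.99)
and encoded the same way. Points and other composite types still need a mapping
of their own.</p>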

<p>Apart from ordering, attribute values are limited to 1024 characters, which is
way too low. I can understand that Amazon wants to encourage developers to use S3
for binary data, but articles, comments, and other text data are often much
longer than 1024 characters. For many apps it would be infeasible to store and
access that text separately from the rest of their data, so this limit could keep a
number of applications from using SimpleDB as their only structured storage
engine.</p>

<p>Finally, it's worth noting that
<a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_API_Query.html#d0e3182">all strings are UTF-8</a>,
including domains, item names, attribute names, and values. It's what you'd
expect, but it took me a fairly long time to find that tidbit in the docs.</p>
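<p>If you do need longer text in SimpleDB, one workaround is to split it across
several attributes. This sketch assumes the limit applies to UTF-8 bytes rather
than characters, which is the safe reading either way, and it never splits a
multi-byte character. The <code>Body_000</code> naming scheme is hypothetical;
the zero-padding keeps chunk names in lexicographic order.</p>

<div class='p-shadow'><pre><code>MAX_VALUE_BYTES = 1024  # limit from the post, treated here as UTF-8 bytes


def split_value(text, limit=MAX_VALUE_BYTES):
    """Split text into chunks of at most `limit` UTF-8 bytes each,
    never splitting a multi-byte character."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if current and size + n > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


def to_attributes(name, text, limit=MAX_VALUE_BYTES):
    """Hypothetical naming scheme: Body_000, Body_001, ... The zero-padding
    keeps chunk names in order under lexicographic sorting."""
    return {f"{name}_{i:03d}": chunk
            for i, chunk in enumerate(split_value(text, limit))}


def from_attributes(name, attrs):
    """Reassemble text written by to_attributes."""
    prefix = name + "_"
    return "".join(attrs[k] for k in sorted(attrs) if k.startswith(prefix))


body = "naïve café " * 200  # 2,200 characters, 2,600 UTF-8 bytes
attrs = to_attributes("Body", body)
print(sorted(attrs))  # ['Body_000', 'Body_001', 'Body_002']
print(from_attributes("Body", attrs) == body)  # True
</code></pre></div>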

<p><a name="scaling"></a></p>

<h3><a href="/space/amazon+simpledb+thoughts#scaling"><img src="/Icon-Permalink.png" alt="Icon-Permalink.png" title="" /></a> Scaling and Dynamo</h3>

<p><ins style="text-decoration: none"><em><strong>Update</strong>: After I originally wrote
this, I learned from a reliable source that SimpleDB is probably</em> not <em>based
on Dynamo.</em></ins></p>

<p>SimpleDB <del>is almost certainly</del> <ins>originally seemed to be</ins>
based on Amazon's
<a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">Dynamo</a>,
a distributed hash table that's highly replicated and available in exchange
for a relatively low churn tolerance. That link is from Amazon CTO Werner
Vogels's <a href="http://www.allthingsdistributed.com/">blog</a>, where he
<a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">said</a>:</p>

<blockquote>
  <p>Let me emphasize the internal technology part before it gets misunderstood:
  Dynamo is <strong>not</strong> directly exposed externally as a web service; however, Dynamo
  and similar Amazon technologies are used to power parts of our Amazon Web
  Services, such as S3.</p>
</blockquote>

<p>Dynamo's key characteristic is that it really is just a DHT, so its only
operations are put, get, and delete. In particular, it doesn't provide
secondary indices. So, <ins>if SimpleDB were</ins> based on Dynamo, how would
SimpleDB queries be executed? Maybe they'd use a modified full text
index...but then you'd expect SimpleDB to offer full text search, which it
doesn't. Hmm.</p>
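<p>One possible answer is to store index entries in the DHT itself, next to the
items. Here's a hypothetical sketch over a plain put/get/delete store: each
(attribute, value) pair gets its own key holding the set of matching item
names. It handles equality queries, but range and prefix queries would need an
ordered index, which a hash-partitioned store doesn't give you.</p>

<div class='p-shadow'><pre><code>class KVStore:
    """Stand-in for a Dynamo-style store: put, get, and delete by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def index_key(attr, value):
    # Hypothetical key scheme for index entries stored alongside items.
    return f"idx|{attr}|{value}"


def put_item(store, name, attrs):
    store.put(f"item|{name}", dict(attrs))
    for attr, value in attrs.items():
        key = index_key(attr, value)
        # Read-modify-write: without transactions, concurrent writers could
        # lose each other's updates. Stale entries for old values aren't
        # cleaned up here either.
        names = set(store.get(key) or ())
        names.add(name)
        store.put(key, names)


def query_equals(store, attr, value):
    """Equality lookup is one get; ranges and prefixes need an ordered index."""
    return set(store.get(index_key(attr, value)) or ())


store = KVStore()
put_item(store, "item1", {"Color": "Blue"})
put_item(store, "item2", {"Color": "Blue"})
put_item(store, "item3", {"Color": "Red"})
print(sorted(query_equals(store, "Color", "Blue")))  # ['item1', 'item2']
</code></pre></div>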

<p>One useful hint is that SimpleDB only guarantees <a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/SDB_Glossary.html#glossary_EventualConsistency"><em>eventual</em>
consistency</a>.
(Thanks to <a href="http://keeda.stanford.edu/~kash/">Ken</a> for pointing this out.)
Evidently, items and indices are replicated, and the replicas are updated
asynchronously. That's a big, big caveat for developers, but it helps us start
to reverse engineer the architecture of their storage and indexing engine.</p>
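<p>Here's a toy model of what eventual consistency means for clients: a write
lands on one replica and reaches the others only when replication catches up.
It's an illustration under simplified assumptions (a single write replica and
an explicit propagate step), not SimpleDB's actual replication scheme.</p>

<div class='p-shadow'><pre><code>class EventuallyConsistentStore:
    """Toy model of asynchronous replication (simplified, hypothetical).

    Writes land on replica 0 and reach the other replicas only when
    propagate() runs, standing in for background replication.
    """

    def __init__(self, replicas=3):
        self._replicas = [{} for _ in range(replicas)]
        self._pending = []  # writes not yet copied to the other replicas

    def put(self, key, value):
        self._replicas[0][key] = value
        self._pending.append((key, value))

    def get(self, key, replica):
        # Real clients don't pick the replica; the parameter makes it explicit.
        return self._replicas[replica].get(key)

    def propagate(self):
        for key, value in self._pending:
            for r in self._replicas[1:]:
                r[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.put("item1", "v1")
print(store.get("item1", replica=0))  # v1
print(store.get("item1", replica=2))  # None: not replicated yet
store.propagate()
print(store.get("item1", replica=2))  # v1
</code></pre></div>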

<p>Personally, I wonder if SimpleDB's indexing is based on a
<del>full text</del> <ins>conventional</ins> index that's augmented to support
structured data, similar to <a href="http://base.google.com/">Google Base</a> or <a href="http://labs.ebay.com/erlresearchfocus.html#search">eBay's
search engine</a>. If so, I'm
sure Amazon has its reasons for not (yet) offering full text search over
SimpleDB domains.</p>

<p><a name="pricing"></a></p>

<h3><a href="/space/amazon+simpledb+thoughts#pricing"><img src="/Icon-Permalink.png" alt="Icon-Permalink.png" title="" /></a> Usage-based pricing</h3>

<p>The <a href="http://aws.amazon.com/simpledb">pricing model</a> for SimpleDB is very
interesting. Similar to <a href="http://aws.amazon.com/s3">S3</a> and
<a href="http://aws.amazon.com/ec2">EC2</a>, SimpleDB charges for bandwidth and usage.
However, SimpleDB also charges for <em>machine utilization</em>, measured in
<a href="http://www.amazon.com/b/ref=sc_fe_c_0_342335011_4?ie=UTF8&amp;node=393699011&amp;no=342335011&amp;me=A36L942TSJ2AJA#sdb14">normalized CPU-hours</a>.</p>

<p>This makes sense from a cost modeling perspective, but it's surprisingly hard
to implement in the storage engine. The particularly impressive part is that
SimpleDB includes machine utilization in the response to <em>every API call</em>.
Wow. Measuring utilization can be hard in general, but it's even harder in
real time.</p>
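<p>As a rough illustration of per-call usage reporting, here's a decorator that
measures the CPU time spent in a handler and attaches it, in hours, to the
response. Amazon's normalization isn't modeled, the handler is a hypothetical
stand-in, and the hard parts of a real storage engine (work on other threads,
I/O, background replication) aren't captured.</p>

<div class='p-shadow'><pre><code>import time
from functools import wraps


def with_cpu_usage(handler):
    """Attach the CPU time spent in `handler` (in hours) to every response.

    time.thread_time() counts only the current thread, so this assumes each
    request runs on its own thread. No normalization is applied.
    """
    @wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.thread_time()
        result = handler(*args, **kwargs)
        cpu_seconds = time.thread_time() - start
        return {"result": result, "cpu_hours": cpu_seconds / 3600.0}
    return wrapper


@with_cpu_usage
def handle_query(n):
    # Hypothetical stand-in for query execution.
    return sum(i * i for i in range(n))


response = handle_query(100_000)
print(response["result"], response["cpu_hours"])
</code></pre></div>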

<p><strong>See also</strong>:</p>

<ul>
<li><a href="/space/facebook+data+store+api+thoughts">Facebook Data Store API Thoughts</a></li>
</ul>

  </content>

  <rdf:Seq>

<rdf:li>
<rdf:Description rdf:about="#1198159512.07">
  <dc:source> http://snarfed.org/ </dc:source>
  <dc:title> Amazon SimpleDB thoughts </dc:title>
  <dc:creator> Philipp Lenssen </dc:creator>
  <dc:format> text/html </dc:format>

  <content>
    Your tuplespaces link doesn't seem to go anywhere in specific...
  </content>
</rdf:Description>
</rdf:li>

<rdf:li>
<rdf:Description rdf:about="#1198174048.32">
  <dc:source> http://snarfed.org/ </dc:source>
  <dc:title> Amazon SimpleDB thoughts </dc:title>
  <dc:creator> ryan </dc:creator>
  <dc:format> text/html </dc:format>

  <content>
    thanks, fixed.
  </content>
</rdf:Description>
</rdf:li>

<rdf:li>
<rdf:Description rdf:about="#1198228646.14">
  <dc:source> http://snarfed.org/ </dc:source>
  <dc:title> Amazon SimpleDB thoughts </dc:title>
  <dc:creator> Pedro Marban </dc:creator>
  <dc:format> text/html </dc:format>

  <content>
    Good post. It helped clear up my doubts about SimpleDB.<br />
<br />
I think that the big value of SimpleDB is the concept (a limitlessly scalable DB accessible everywhere by webservices) not the implementation. I believe that soon other big players will come with similar proposals.
  </content>
</rdf:Description>
</rdf:li>

<rdf:li>
<rdf:Description rdf:about="#1198243020.15">
  <dc:source> http://snarfed.org/ </dc:source>
  <dc:title> Amazon SimpleDB thoughts </dc:title>
  <dc:creator> Eric Litman </dc:creator>
  <dc:format> text/html </dc:format>

  <content>
    Nice. Would love to see a follow-up with your thoughts after some usage.
  </content>
</rdf:Description>
</rdf:li>

<rdf:li>
<rdf:Description rdf:about="#1198511613.65">
  <dc:source> http://snarfed.org/ </dc:source>
  <dc:title> Amazon SimpleDB thoughts </dc:title>
  <dc:creator> Sam Figueroa </dc:creator>
  <dc:format> text/html </dc:format>

  <content>
    I can't see myself using the service anytime soon. The limitations it imposes feel way too strong to me. The lack of specific datatypes makes me feel cold, like it did going from Java to PHP. No ordering? What's up with that? If they can do EC2 and S3 they sure could have managed to implement ordering. This would make many situations very frustrating to code. If they don't want to stick to a SQL subset, I can live with that. But come on. I know they are calling it SimpleDB for a reason, but I can't see modern web apps being able to utilize this as their primary database. <br />
<br />
Could somebody point out to me some obvious applications of SimpleDB?
  </content>
</rdf:Description>
</rdf:li>

<rdf:li>
<rdf:Description rdf:about="#1198515668.57">
  <dc:source> http://snarfed.org/ </dc:source>
  <dc:title> Amazon SimpleDB thoughts </dc:title>
  <dc:creator> JB </dc:creator>
  <dc:format> text/html </dc:format>

  <content>
    Very useful analysis - thanks. One aspect of sdb which I haven't seen discussed is backup/data portability. What tools, if any, will Amazon provide for this? What happens when a site built on sdb hits it big and decides they can do a better job bringing the data in-house? What about sites that just want backup in case AWS suffers a catastrophic failure?
  </content>
</rdf:Description>
</rdf:li>

<rdf:li>
<rdf:Description rdf:about="#1201643024.72">
  <dc:source> http://snarfed.org/ </dc:source>
  <dc:title> Amazon SimpleDB thoughts </dc:title>
  <dc:creator> Ralf Westphal </dc:creator>
  <dc:format> text/html </dc:format>

  <content>
    @Sam: SimpleDB should not be confused with a "regular" database. Firstly it's in the cloud and thus less responsive (it just takes longer for the signals to travel). Secondly its API/data model is different from the relational data model. So if you're used to mass data handling with single SQL statements, then SimpleDB is different. It just sports a sort of "select".<br />
<br />
Now, what can you do with the features left in SimpleDB - or to put it differently: which make SimpleDB shine?<br />
<br />
It's a very simple API, so put your favorite higher level API on top of it, e.g. retrieve complete items from a query instead of just item names, if you like. Or model higher level data structures (like lists, trees) with the simple items. SimpleDB is good at serving several concurrent requests, e.g. retrieving all children of a root node in parallel.<br />
<br />
This already hints at how naturally you can map object models to items. No inverse references using foreign keys, but "forward pointers" from parents to children. Think "easy object graph serialization".<br />
<br />
This hints at object/data caching. SimpleDB is not supposed to replace your local long term storage, but to ease short term storage, to foster communication between collaborating parties.<br />
<br />
If you like (and are a .NET programmer), check out this implementation of the SimpleDB API for local and remote use: NSimpleDB, <a href="http://code.google.com/p/nsimpledb/">http://code.google.com/p/nsimpledb/</a>. It might help to clear things up for you.<br />
<br />
-Ralf
  </content>
</rdf:Description>
</rdf:li>

  </rdf:Seq>
</rdf:Description>
</rdf:RDF>
