DynamoDB vs Fauna: Terminology and features
Taro Woollet-Chiba | Aug 11th, 2020
Introduction
DynamoDB’s roots originate from an internal Amazon project called
"Dynamo", a database built to address the growing e-commerce needs of
Amazon's online shopping service. Inspired by Dynamo and its techniques,
DynamoDB provides a database with most operations and management
automated behind the scenes. Of the many NoSQL databases out there,
perhaps DynamoDB is the closest to Fauna, where both databases share a
similar value proposition as "serverless databases". While DynamoDB’s
on-demand pricing/scaling model lends itself to the "serverless"
philosophy, it misses the mark on developer experience when it comes to
multi-region transactions, schema flexibility, geo-distribution, burst
scaling, and required developer operations.
As for business viability: evolving a schema over time with DynamoDB can
be an arduous experience given the lack of support for relations and
joins. As applications grow, they are likely to accumulate significant
technical debt, which typically rears its head in features that cannot
be changed without recreating the entire table. DynamoDB better serves
mature and proven businesses, where all data and CRUD operations are well
understood ahead of time. Furthermore, DynamoDB’s query API doesn’t
provide extensive functionality for computations, aggregates, etc.,
requiring either long-term storage (and updating) of such calculated
results or a costly layer of server-side logic (along with an increased
surface area for bugs).
With the recent wave of serverless and GraphQL adoption, Fauna seeks
to be an uncompromising data API for client-serverless applications. To
further elaborate, Fauna offers an out-of-the-box GraphQL API and
functional query language, on top of a strongly consistent distributed
database engine capable of modeling relational, document, graph, and
time-series data. Fauna's value proposition improves on traditional
database offerings, by converting the underlying database management
infrastructure into a Data API that is well-suited for direct
client-side usage, allowing backend developers to focus on the
server-side logic which matters the most. The notion of database
developer operations does not exist with Fauna. Developers are allowed
to fully focus on application specific work, without the burden of
maintaining throughput and capacity on an API and database.
Terminology
For clarity, here is a rough mapping of the terminology that each
technology uses to describe itself:

Table (DynamoDB) ≈ Collection (Fauna)
Item (DynamoDB) ≈ Document (Fauna)
Attribute (DynamoDB) ≈ Field (Fauna)
Primary key (DynamoDB) ≈ Reference (Fauna)
Secondary index (DynamoDB) ≈ Index (Fauna)
Query APIs
Rather than replace SQL with another query language, the DynamoDB
creators opted for a simple API with a handful of
operations.
Specifically, the API lets developers create and manage tables along
with their indexes, perform CRUD operations, stream data
changes/mutations, and finally, execute CRUD operations within ACID
transactions. While DynamoDB doesn’t support complex querying, this
tradeoff leads to reduced latency on a broad set of operations, as the
database doesn’t need to process or interpret a query language.
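To make the shape of that API concrete, here is a hedged sketch of two request payloads as the low-level DynamoDB API accepts them; the table and attribute names ("Orders", "CustomerId", etc.) are hypothetical. Note how every attribute value carries an explicit type tag (`S` for string, `N` for number) rather than relying on a schema:

```python
# Shape of a PutItem request: the item is a map of typed attribute values.
put_item_request = {
    "TableName": "Orders",
    "Item": {
        "CustomerId": {"S": "cust-42"},      # partition key
        "OrderDate": {"S": "2020-08-11"},    # sort key
        "Total": {"N": "129.99"},            # numbers travel as strings
    },
}

# Shape of a Query request: key conditions are limited to the partition
# key (equality) plus a range condition on the sort key.
query_request = {
    "TableName": "Orders",
    "KeyConditionExpression": "CustomerId = :cid AND OrderDate >= :start",
    "ExpressionAttributeValues": {
        ":cid": {"S": "cust-42"},
        ":start": {"S": "2020-01-01"},
    },
}
```

Because requests are this explicit, the database can execute them without parsing or planning a query language, which is the latency tradeoff described above.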
Fauna offers its take on a complex query language in the form of the
Fauna Query Language (FQL): a flexible, highly composable, and expressive
query language. While FQL is distinctly different from SQL, developers
familiar with the functional programming paradigm will feel right at
home. Readers well-versed in SQL might be interested in this in-depth
explanation of FQL, written specifically for SQL users.
As mentioned earlier, Fauna offers an out-of-the-box GraphQL API
with the same technical promises originally made for Fauna: zero
maintenance, global distribution, and serverless operation. GraphQL’s
flexible abstraction over data enables smooth cohesion between frontend
and backend: clients and frontend developers query for exactly the data
they need, often without having to ask the backend team to write
additional server-side code.
Indexes
Both Fauna and DynamoDB support indexes which can store subsets of
data (i.e. projection), optionally, with a specified order and/or
uniqueness constraint. However, this is where the similarities end, as
Fauna indexes can perform and persist computations, combine data from
multiple collections, ensure strict
serializability for
reads/writes, and more. To further elaborate, a Fauna index can draw
from multiple source collections, and can match on, sort by, and return
multiple fields. This differs from DynamoDB, where an index is
constructed over a single table and can only match on one attribute,
with the ability to sort on one more.
Given its indexing flexibility and support for relational data, Fauna
is a powerful tool for the evolution of applications over time.
When using indexes in DynamoDB, careful consideration and forethought
are required ahead of time to avoid technical debt, unnecessary
expenses, or, in the worst case, throttling. When strongly consistent
queries are desired in an index, DynamoDB allows for a maximum of 5
Local Secondary Indexes (LSIs), each of which enables sorting on an
attribute specified at index creation (i.e. the sort key). Developers
should know that Local Secondary Indexes can only be created at the same
time as their table, and cannot be deleted afterwards (without deleting
the table); no such quantity or creation limits exist for Fauna indexes.
Should eventually consistent queries suffice, DynamoDB offers Global
Secondary Indexes (GSI), which allow for querying on a different primary
key (and optionally, a different sort key). As for billing, all write
operations to a DynamoDB table will be multiplied and applied to
relevant indexes, resulting in elevated expenses; Fauna doesn’t charge
for index entry updates.
Finally, while GSI throughput is separate from tables, LSI throughput is
not. Users of DynamoDB must keep in mind that LSIs are traffic
multipliers, resulting in more dramatic peaks. LSI usage can cripple
both Provisioned and On-Demand tables if not planned for by manually
elevating traffic peaks or adjusting an Auto Scaling plan. This differs
from Fauna, where accommodating traffic and throughput is not the
user’s concern; these factors are handled automatically behind the
scenes.
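For reference, here is a hedged sketch of how a Global Secondary Index is declared inside a DynamoDB CreateTable request; the index and attribute names are hypothetical. It illustrates the points above: a GSI defines its own key schema, a projection (the stored subset of data), and, under Provisioned Mode, its own throughput separate from the table's:

```python
# Shape of one entry in CreateTable's GlobalSecondaryIndexes list.
gsi = {
    "IndexName": "StatusIndex",
    "KeySchema": [
        {"AttributeName": "Status", "KeyType": "HASH"},      # new partition key
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},  # new sort key
    ],
    "Projection": {
        "ProjectionType": "INCLUDE",       # store only a subset of attributes
        "NonKeyAttributes": ["Total"],
    },
    "ProvisionedThroughput": {             # billed separately from the table
        "ReadCapacityUnits": 100,
        "WriteCapacityUnits": 50,
    },
}
```

Every write to the base table that touches "Status", "OrderDate", or "Total" is re-applied to this index and billed again, which is the write-multiplication cost noted above.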
Schema design
DynamoDB is largely schemaless, where the few predefined details are
the partition key and/or sort key, along with any Local Secondary
Indexes. Like many of its NoSQL siblings, DynamoDB lacks support for
relational data and is best designed with a denormalized schema in mind,
improving read and write performance over traditional relational
databases with numerous complex relationships. To satisfy relational
needs, DynamoDB places heavy emphasis on denormalization and
single-table design, where developers are responsible for maintaining
some form of referential integrity among shared/related data. While
DynamoDB and its best practices lend themselves well to mature
applications with proven scope and data needs, they do not lend
themselves to extensive schema evolution.
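The single-table pattern mentioned above can be sketched in a few lines; the key layout here (the "CUSTOMER#" and "ORDER#" prefixes) is a hypothetical but commonly taught convention, not something DynamoDB itself prescribes:

```python
# One table holds multiple entity types, distinguished by key prefixes,
# so a single Query on the partition key can fetch a customer and all
# of their orders together.
items = [
    {"PK": "CUSTOMER#42", "SK": "PROFILE",          "Name": "Ada"},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2020-08-01", "Total": 40},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2020-08-09", "Total": 25},
]

# A "fetch this customer's orders" query is a sort-key prefix match:
orders = [i for i in items
          if i["PK"] == "CUSTOMER#42" and i["SK"].startswith("ORDER#")]
```

If the access patterns change, these key layouts often must change too, which is exactly the schema-evolution friction the paragraph describes.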
DynamoDB’s value is best realized when migrating from a
deployed/overloaded database that already satisfies a product or
project’s needs. Even with extensive planning and design ahead of time,
it’s far from uncommon to completely iterate on a database model due to
a variety of external factors (e.g. a business pivot); if not a complete
redesign, then usually an iteration on partition keys to avoid "hot"
partitions. Developers risk significant technical debt if they build an
application and schema using DynamoDB without confidence in their
understanding (and longevity of their understanding) of an application’s
scope.
Fauna, in contrast, inherits much of the iterative relational data
modeling of traditional RDBMSs, while also meeting the scaling promises
of NoSQL (more on this later). While denormalization is a perfectly
viable approach, developers are encouraged to take advantage of
Fauna's first-class relational and referential support. In addition to
document and relational data, Fauna also accommodates graph-like
patterns and time-series data, along with advanced multi-tenant
capabilities allowing for parent-child relationships among databases.
Unlike DynamoDB, Fauna is very forgiving of schema iteration, and
provides the relational data capabilities which developers already know
and love.
Fauna also features built-in GraphQL support. A growing trend in
backend engineering is GraphQL schema-first development. Fauna
supports uploading GraphQL schemas which generate Fauna resources
behind the scenes. Describing relations, whether they be 1:1, 1:N, etc.,
with Fauna's GraphQL directives is a breeze, as described
here.
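As a sketch of what that looks like (the type and field names below are hypothetical), a 1:N relationship in a Fauna GraphQL schema can be declared with the @relation directive:

```graphql
type User {
  name: String!
  orders: [Order!] @relation
}

type Order {
  total: Float!
  owner: User! @relation
}
```

Uploading a schema like this causes Fauna to generate the corresponding collections and the bidirectional relationship behind the scenes.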
Transactional model
Support for serializable multi-item read and write transactions exists
in DynamoDB with the caveat that they’re only ACID-compliant within the
region they occur in. In particular, developers using multi-region
deployments of DynamoDB may witness dirty reads or phantom reads among
concurrent transactions in separate regions; writes affecting the same
data in separate regions are resolved using "last writer wins"
reconciliation, where DynamoDB makes a best effort to determine the last
writer. The region limitation is fine for applications which depend on a
single-region backend; however, when utilizing Global Tables (a
multi-region deployment of DynamoDB), successful transactions in one
replica table will not provide ACID guarantees for other replica tables,
due to the delay in replication/propagation of changes. Such a
limitation is not uncommon among applications today, where the solution
is usually to direct all transactions to a single region, or to store
data based on the location of its origin (e.g. a ride-sharing app might
store San Francisco trip/user data in a us-west-2 database, and nowhere
else). Keep in mind however, that DynamoDB Global Tables do not allow
for designating partial replication among regions (i.e. all replica
tables will eventually contain the same data); instead, developers
themselves must deploy individual DynamoDB instances in each desired
region.
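The "last writer wins" strategy can be sketched as a toy model; the regions, timestamps, and values below are invented purely for illustration:

```python
from datetime import datetime, timezone

# Two regions accept concurrent writes to the same item. Each write
# carries a timestamp; reconciliation keeps only the newest one.
writes = [
    {"region": "us-east-1",
     "ts": datetime(2020, 8, 11, 12, 0, 0, 500000, tzinfo=timezone.utc),
     "value": "A"},
    {"region": "eu-west-1",
     "ts": datetime(2020, 8, 11, 12, 0, 0, 900000, tzinfo=timezone.utc),
     "value": "B"},
]

# All replica tables converge on the write with the latest timestamp;
# the losing write is silently discarded.
winner = max(writes, key=lambda w: w["ts"])
```

The discarded write is the crux: an application that cannot tolerate silently losing one of two concurrent writes must route conflicting traffic to a single region itself.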
With global distribution in mind, Fauna offers strictly serializable
transactions, where the strictness provides the additional guarantee of
a real-time serial order for transactions. This is a critical
distinction for geo-distribution, where variance in the order of
propagated transactions can impact the final state between replicas.
Fauna achieves this degree of isolation and distribution with heavy
inspiration from
Calvin,
a cutting-edge approach to distributed transactions and replication.
Consistency models
DynamoDB uses eventually consistent reads unless specified otherwise.
Strong consistency is available to some degree, but only
within the context of a single region (i.e. strongly consistent reads
only consider writes from the region they read from). This may cause
confusion when working with Global Tables, as developers are allowed to
make read requests parameterized with "strong consistency" in one
region, while writes from another region will eventually be propagated
(often in a second or less). Additionally, strongly consistent reads can
result in throttling if developers aren’t careful, as only the leader
node can satisfy strongly consistent reads; the leader node is also the
only node responsible for writes in a partition (unlike Fauna, where
every node is a query coordinator and can perform writes), making it the
most trafficked node and critical to the function of a partition.
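Strong consistency in DynamoDB is opt-in per request. Here is a hedged sketch of the request payload (table and key names are hypothetical):

```python
# A strongly consistent GetItem: the ConsistentRead flag routes the read
# to the partition's leader node. It also doubles the read cost: 1 full
# RCU per 4 KB, versus 0.5 RCU for an eventually consistent read.
get_item_request = {
    "TableName": "Orders",
    "Key": {
        "CustomerId": {"S": "cust-42"},
        "OrderDate": {"S": "2020-08-11"},
    },
    "ConsistentRead": True,   # default is False (eventually consistent)
}
```

Because only the leader serves these reads, a hot item read this way concentrates traffic on a single node, which is the throttling risk described above.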
Fauna offers strong consistency and isolation across the board for all
operations, not only within individual regions but globally. By default,
indexes and read-only operations maintain snapshot or serializable
isolation (for lower latencies); however, developers are free to
re-introduce strict serializability should they desire it.
Additionally, indexes which aren’t serialized will
re-evaluate their relevant documents (providing strong consistency and
isolation) when used within a transaction. Essentially, Fauna requires
all writes to be strictly serializable for strong consistency across the
globe, while also letting applications utilize slightly weaker forms of
isolation and consistency for less-critical functionality, enabling
faster reads; should the client or developer need stronger consistency
in a read, they have the option to introduce strict serializability.
This approach protects against inconsistent write results/transactions,
which are far more consequential to a business than a stale read. Again,
strong consistency is available as an option for indexes and read-only
operations. For more information regarding consistency models and their
tradeoffs, read this piece written by one of the original Calvin
authors.
Storage
It remains unclear exactly what storage engines are utilized under the
hood of DynamoDB today. At the time of the Dynamo whitepaper’s
publishing,
Dynamo utilized a handful of storage engines (most notably the Berkeley
Database (BDB) Transactional Data Store and MySQL) through a "pluggable
persistence component". Many years have passed since the paper’s
publishing however, and there’s no public documentation guaranteeing
these storage engines are still in use. Compression, while an AWS
recommended
practice
for storing large attributes, is not natively implemented.
Fauna uses an LSM tree-based storage engine that provides LZ4
compression. By default, Fauna stores the last 30 days of history for
each collection (history can be retained as long as desired, or even
indefinitely), and temporal queries may use any point-in-time snapshot
within that history.
These temporal queries also offer valuable rollback capabilities for
applications and their backends, a luxury which often isn’t afforded
outside of a full blown database recovery. Finally, temporal storage
provides simple recovery after accidental data loss and streamlined
integration debugging.
Security
Like many AWS products, DynamoDB inherits the excellent AWS Identity and
Access Management (IAM) feature. With it, developers can specify coarse
and granular user permissions, applicable to the entire DynamoDB API.
Furthermore, developers can specify conditions which must be met before
granting permissions (e.g. an IAM policy only grants access to items
where the client’s UserId matches an item’s UserId). Authentication and
authorization aside, DynamoDB also offers encryption at rest using
256-bit Advanced Encryption Standard (AES-256) and three decryption key
types, each with varying customer control. Finally, DynamoDB’s security
and compliance is audited by several
third-parties,
as is standard for many AWS products.
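The UserId-matching condition described above might look like the following IAM policy sketch; the account ID, region, table name, and chosen actions are all hypothetical, while the `dynamodb:LeadingKeys` condition key is the documented mechanism for matching on an item's partition key:

```python
# Fine-grained access: allow reads only on items whose partition key
# equals the caller's Cognito identity.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
        "Condition": {
            "ForAllValues:StringEquals": {
                # Partition key must match the caller's identity.
                "dynamodb:LeadingKeys": [
                    "${cognito-identity.amazonaws.com:sub}"
                ]
            }
        },
    }],
}
```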
Fauna offers varying levels of access control, however the requirement
of authentication is constant; Fauna cannot be left accidentally
unprotected. Developers have the freedom to define coarse role-based
permissions, more specific identity-based permissions, and even finer
attribute-based access control (ABAC). With reserved FQL functions,
clients can easily authenticate (e.g. Login) and be provided secure
access tokens for database connection. Lastly, Fauna's multi-tenancy
features provide even further protection through the database
hierarchies that arise naturally (i.e. authenticated users of a database
cannot necessarily access a parent or sibling database). With
its out-of-the-box tooling, Fauna meets the authentication and
authorization requirements for a wide variety of applications,
eliminating the need for equivalent custom solutions.
Fault tolerance
DynamoDB relies on AWS Availability Zones (AZs), replication, and
long-term storage to protect against data loss or service failure. Table
partitions consist of three nodes stored in separate AZs, one of which
is a Leader Node capable of accepting writes and strongly-consistent
reads, while the remaining two nodes simply provide additional
replication storage and eventually consistent reads. Customers of
DynamoDB should know that Leader Nodes are potential bottlenecks in
their application, should they perform too many writes and/or strongly
consistent reads to a partition. This differs from Fauna where every
node’s read and write capabilities are equal, thus no single node can be
a bottleneck.
Fauna relies on its unique transactional protocol derived from
Calvin
and its multi-cloud topology to achieve fault tolerance.
Because Fauna is strongly consistent, all transactions are first made
durable in the transaction log before being applied to data nodes, so
hardware outages cannot compromise correctness or cause data loss. If a
node is down, the retention of the transaction log is extended so that
the node can apply the transactions it missed when it comes back online.
In Fauna, as long as you receive a commit for a transaction, you are
guaranteed safety against data loss.
Also, within Fauna's architecture, a functioning system must contain
at least three replicas. A logical replica contains numerous nodes. All
nodes can simultaneously serve as a query coordinator, data replica, and
log replica, with no individual node being a single point of failure.
Should a node fail or perform poorly as a data replica, temporarily or
permanently, Fauna will smartly redirect reads to a non-local copy
until that node becomes available again. Because Fauna's nodes are
distributed across multiple cloud platforms, users are shielded from
cloud provider outages as well as regional outages within a single
provider.
Scalability
Both DynamoDB and Fauna provide abstractions over traditional server
hardware specs with "serverless" pricing and consumption models. Along
with these new serverless concepts, both databases aim to absorb the
responsibility of scaling to customer needs; however, DynamoDB still
leaves significant operational work and overhead for customers. While
DynamoDB is a managed service, you remain responsible for the bulk of
the operational heavy lifting. If you’re using DynamoDB, you have to
think upfront about volume and scale, continuously manage these
parameters, and explicitly provision your capacity to meet your needs.
Understanding both the consumption model and data distribution concepts
is particularly critical when using DynamoDB, as even though there are
scaling features to better accommodate traffic, they all expose windows
where throttling or failure is possible; in particular, developers
should be familiar with DynamoDB’s Read Capacity Units (RCUs), Write
Capacity Units (WCUs), capacity modes, table partitioning behavior,
partition limits, and "hot" partitions.
In contrast, Fauna is built to auto-scale without any input from
customers. You are never responsible for provisioning capacity, or
tweaking parameters to achieve a desired level of throughput. It works
on a utility model, much like your electrical outlet: plug in and go,
and never worry about running out of resources at peak times. Fauna
achieves this by maintaining several consistent, full replicas of
customer data. A replica consists of several geographically-aware nodes,
each with a partition of the full dataset in a single local environment.
As mentioned earlier, all nodes share the same set of capabilities
(query coordinator, data replica, and log replica), each able to perform
reads and writes. Fauna scales its services behind the scenes by
adding more full-copy replicas or adding more nodes to a single replica,
which requires no additional downtime, manual configuration, or changes
to drivers. As a customer of Fauna, you can assume infinite capacity
and march on.
Fauna is multi-region by default (in fact, it’s global), with
uncompromising consistency and isolation, as was elaborated on earlier.
Provisioning throughput or capacity is neither a concern nor a reality
for customers; the only information of relevance is the
pricing/consumption model. Specifically, Fauna's consumption model
primarily focuses on read and write ops, which are almost identical to
DynamoDB’s RRUs and WRUs, where they’re simply a metric for representing
on-demand usage.
Operations
While the responsibilities of traditional database operations have been
abstracted away, DynamoDB customers still have a handful of
DynamoDB-specific responsibilities, with the two major items being (1)
designing an optimal partition key (including read/write sharding if
needed), and (2) specifying one of two capacity modes along with their
parameters. Developers can implement and tweak DynamoDB deployments
through the AWS CLI, AWS Management Console, AWS SDK, NoSQL Workbench,
or directly through the DynamoDB low-level API.
For the responsibilities which don’t fall under the customer’s
jurisdiction, many fundamental operations (e.g. partition splitting) are
performed by DynamoDB’s internally developed tool, Auto Admin.
Additionally, DynamoDB is known to rely on several AWS services to
achieve certain functionality (e.g. Auto Scaling uses CloudWatch, SNS,
etc.), though the exact scope of this is unknown.
Further elaborating on Auto Admin, the tool is akin to an automated
database administrator (DBA), responsible for managing DynamoDB’s
Partition Metadata System, Partition Repair, Table Provisioning, and
more. Although it isn’t consistently documented, it appears that Auto
Admin shares some partition splitting and provisioning functionality
with DynamoDB’s Adaptive Capacity, where the most obvious example of
this is Adaptive Capacity’s ability to isolate frequently accessed
items.
Much of Fauna's infrastructure management relies on the same
Calvin-inspired protocol and consistency mechanisms provided to
customers, with the addition of some internal process scheduling.
Changes to a deployment are performed within the safety of a single
transaction, where the final state is once again evaluated by Fauna,
before being applied. The internal transactions used for scaling Fauna
deployments allow for seamless, easy-to-reason-about migration of data
between nodes.
In conclusion, it’s worth highlighting that developer operations occur
seamlessly, with zero downtime and no user maintenance; developers are
free to focus on what matters most: building an excellent application.
Jepsen tests
Jepsen tests, along with their associated tools and writeups, are widely
respected among database engineers. The results of a Jepsen test aren’t
a simple pass/fail, but are more akin to a diagnosis and postmortem;
specifically, comparing a database’s performance to its value
propositions and elaborating on promises that aren’t sufficiently met.
Although DynamoDB lacks an official Jepsen test, it’s one of the most
popular NoSQL databases in use today and as an AWS product, is likely to
be heavily audited, tested, and scrutinized.
Fauna's goal with Jepsen has been to conduct an exhaustive
investigation to identify and fix any errors in the implementation,
integrate the resulting tests into continuous integration, and to have a
trusted third party verify both public consistency claims and the
effectiveness of the core architecture. The current Fauna Jepsen
report, which
covers versions 2.5.4 and 2.6.0 and represents three months of detailed
work, clearly shows Fauna's commitment to providing users with a
seamlessly-correct datastore.
"Fauna is based on peer-reviewed research into transactional systems, combining Calvin’s cross-shard transactional protocol with Raft’s consensus system for individual shards. We believe Fauna's approach is fundamentally sound… Calvin-based systems like Fauna could play an important future role in the distributed database landscape."
Summary
DynamoDB aims to provide a fully managed, multi-region, and
multi-master, serverless database for internet-scale applications.
However, each one of these value propositions has a caveat. While
traditional database resources are managed for customers, provisioning
and/or designing around DynamoDB’s abstract capacity units is still
required. Multi-region and multi-master deployment is available, but at
the cost of strong consistency and isolation. Serverless scaling is
achievable, but only if developers design an optimal partition key and
strategize throughput escalation.
DynamoDB schemas often have little room to grow given their lack of
support for relational data (an almost essential capability for evolving
applications); the heavy emphasis on single-table design to support
relational-like access patterns leaves customers with the responsibility
of maintaining the correctness of denormalized data. Finally, customers
are required to build their applications with numerous inconsistencies,
conflicts, and race conditions in mind, or risk odd and unpredictable
errors caused by DynamoDB.
On the other hand, Fauna promises a highly flexible, zero-maintenance,
globally-distributed datastore as an API; with first-class support for
GraphQL, and a data model that lets you work with both documents and
relations, Fauna simplifies your initial development as well as
ongoing evolution of your application; there is no need to write
application logic to handle odd bugs, errors, and race-conditions found
in many databases with poor consistency and isolation. Transactional by
design, Fauna ensures that you’re not locked into limitations when
using transactions — your data is always consistent, and there are no
constraints placed on you to shard your data in a specific way, or limit
the number of keys you use. With Fauna you never have to worry about
typical database tasks such as provisioning, sharding, correctness,
replication, etc. Consequently, developers find Fauna more flexible to
use, and are completely free from backend heavy lifting that is required
when using DynamoDB.
Appendix
Deep Dive into DynamoDB Scalability Architecture
Behind the scenes, DynamoDB distributes data and throughput for a table
among partitions, each outfitted with 10GB of storage. Data distribution
to partitions relies on a table’s partition key, and throughput is
specified with Read Capacity Units (RCUs) and Write Capacity Units
(WCUs), where RCUs and WCUs specify the upper limits of a partition’s
read and write capacity per second (for items up to 4KB, a single RCU
allows for either 2 eventually consistent reads or 1 strongly consistent
read). Note that
while RCUs and WCUs are specified at the table-level, in actuality,
DynamoDB does not directly limit throughput for tables. Instead, these
limits apply to a table’s underlying partitions, where RCUs and WCUs are
evenly distributed among all. This even distribution of throughput used
to be a common concern, as it often led to over-provisioning to meet the
needs of "hot" partitions (partitions with disproportionate traffic
compared to their peers).
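The arithmetic above can be sketched as follows. The partition-count formula used here is the one AWS historically documented, and may not reflect current internals; the table size and throughput figures are hypothetical:

```python
import math

# Hypothetical table: 24 GB of data, provisioned at 6000 RCUs / 2000 WCUs.
storage_gb, rcus, wcus = 24, 6000, 2000

# Partition count: max of the storage-driven count (10 GB per partition)
# and the throughput-driven count (3000 RCUs / 1000 WCUs per partition).
by_storage = math.ceil(storage_gb / 10)               # 3
by_throughput = math.ceil(rcus / 3000 + wcus / 1000)  # 4
partitions = max(by_storage, by_throughput)           # 4

# Table-level throughput is split evenly, whether or not traffic is even:
rcu_per_partition = rcus / partitions                 # 1500.0
wcu_per_partition = wcus / partitions                 # 500.0
```

The even split is why a "hot" partition is a problem: a partition receiving half the table's traffic still only gets a quarter of this table's throughput.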
DynamoDB has both Burst Capacity and Adaptive Capacity to address hot
partition traffic. Burst Capacity utilizes unused throughput from the
past 5 minutes to meet sudden spikes in traffic, and Adaptive Capacity
borrows throughput from partition peers for sustained increases in
traffic. DynamoDB has also extended Adaptive Capacity’s feature set with
the ability to isolate frequently accessed items in their own
partitions. Note that partitions have a hard limit of 3000 RCUs and 1000
WCUs, meaning a frequently accessed item which is isolated in its own
partition cannot satisfy an access pattern that exceeds the partition’s
hard limits. This is unlikely to be an issue for most applications,
however should it arise, Global Tables or a similar implementation can
resolve it (strongly consistent reads will still be limited however).
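Burst Capacity can be modeled as a credit bucket fed by unused throughput from the trailing five minutes; the numbers below are purely illustrative:

```python
# A table provisioned at 100 RCU/s that only consumes 60 RCU/s banks the
# unused 40 RCU/s as burst credit (retained for up to 300 seconds).
provisioned_rcu = 100
recent_usage = [60] * 300                    # last 5 minutes of traffic

burst_credit = sum(provisioned_rcu - u for u in recent_usage)  # 12000 RCUs

# A sudden 1-second spike of 400 reads is absorbed: the provisioned 100
# RCUs plus burst credit cover it without throttling.
spike = 400
throttled = max(0, spike - provisioned_rcu - burst_credit)     # 0
```

A table running near its provisioned limit banks little or no credit, which is why sustained traffic growth must instead be caught by Adaptive Capacity or a provisioning change.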
Despite DynamoDB releasing several features and improvements targeting
hot partitions, they still have a negative
impact
on a table’s performance, though this consequence is not as significant
as it once was. Essentially, the scaling performance of DynamoDB
revolves around the design quality of a partition
key.
To make throughput scaling on tables easier, DynamoDB offers On-Demand
Mode and Auto Scaling (under the Provisioned Mode). On-Demand Mode is
DynamoDB’s more recent and abstract take on hands-free scaling of
throughput, where traffic is well accommodated for, up to double the
table’s previously recorded peak. DynamoDB may require 30 minutes before
adjusting to a new peak, therefore developers should be wary of traffic
which surpasses twice their previous peak, as throttling can occur. One
final note regarding On-Demand Mode is the distinct pricing model, where
sustained traffic could result in costs up to 6.94x that of provisioned
capacity.
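The ~6.94x figure can be roughly reproduced from 2020 us-east-1 list prices (which may since have changed) by comparing a fully utilized provisioned WCU against the same write volume billed on demand:

```python
# Assumed 2020 us-east-1 prices; verify against current AWS pricing.
on_demand_per_million_writes = 1.25    # $ per million write request units
provisioned_per_wcu_hour = 0.00065     # $ per WCU-hour

hours = 730                            # roughly one month
writes = 1 * 3600 * hours              # one 1KB write/second, sustained

on_demand_cost = writes / 1_000_000 * on_demand_per_million_writes
provisioned_cost = 1 * hours * provisioned_per_wcu_hour
ratio = on_demand_cost / provisioned_cost   # ~6.9x
```

The ratio only reaches this worst case when capacity would have been fully utilized around the clock; bursty or idle workloads shift the math back in On-Demand's favor.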
Despite being DynamoDB’s best solution for rapid and automatic scaling,
the significantly higher cost suggests On-Demand Mode is best suited
only for applications which have unpredictable or unknown workloads.
Auto Scaling, which is only available under the Provisioned Mode, is
DynamoDB’s first iteration on convenient throughput scaling. It raises
or lowers read and write capacity based on sustained usage, leaving
spikes in traffic to be handled by a partition’s Burst and Adaptive
Capacity features.
Additionally, it’s the customer’s responsibility to specify upper and
lower bounds on throughput, along with a target utilization rate: a
consumption ratio which should consistently be met (e.g. 70% of
provisioned throughput should be in use as often as possible). While
Auto Scaling
and Provisioned Mode are more cost-efficient than DynamoDB’s On-Demand
Mode, they don’t handle unforeseen spikes in traffic (which surpass the
table’s current overall throughput capacity) as well as On-Demand Mode
does. This is due to the watermarks which trigger Auto Scaling’s
functionality requiring sustained increases or decreases in traffic. In
summary, developers have many parameters and provisioning ops they must
keep in mind while using DynamoDB, despite the layers of abstractions
(e.g. RCUs and WCUs).
Regarding geo-distribution, DynamoDB offers Global Tables: multi-region
deployments of replica tables. Each replica table is capable of
accepting both reads and writes, with writes eventually replicating to
sibling replica tables (usually under a second). Conflicts in Global
Tables are resolved with a "last writer wins" strategy, where all
replica tables agree on the latest write and update accordingly.
Customers ought to keep in mind that replication under Global Tables
will increase throughput traffic among the replica tables, with the
primary concern being WCUs as they’re the lower throughput limit (1000
WCUs vs 3000 RCUs).
If you enjoyed our blog, and want to work on systems and challenges related to globally distributed systems, serverless databases, GraphQL, and Jamstack, Fauna is hiring!