The DynamoDB Storage Backend for Titan is a plug-in that lets you use DynamoDB as the underlying storage layer for the Titan graph database.
It is a client-side solution that implements index-free adjacency for fast graph traversals on top of DynamoDB.
Using DynamoDB enables you to run graph workloads without having to manage your own cluster for graph storage.
The easiest way to get started is to launch an EC2 instance running Gremlin Server with the DynamoDB Storage Backend for Titan, using the CloudFormation templates.
The DynamoDB storage backend for Titan manages the storage layer for your Titan workload.
However, the plugin does not provision or manage the client side. The AWS CloudFormation template handles simple provisioning of the client side.
If you create a graph and a Gremlin Server instance with the DynamoDB Storage Backend for Titan installed, all you need to do to connect to DynamoDB is provide a principal/credential set to the default AWS credential provider chain. You can do this with an EC2 instance profile, environment variables, or the credentials file in your home folder. Finally, you need to choose a DynamoDB endpoint to connect to.
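The lookup described above can be pictured as a chain of providers that are tried in turn until one returns credentials. The following is a minimal Python sketch of that idea, not the AWS SDK's code. The function names are hypothetical, the instance profile provider is a placeholder that always returns nothing, and the order (environment variables, then credentials file, then instance profile) is an assumption modelled on the SDK's documented default order. The environment variable names, the credentials file keys, and the regional endpoint pattern are the standard AWS ones.

```python
import configparser
from pathlib import Path
from typing import Callable, Dict, List, Optional

Credentials = Dict[str, str]
Provider = Callable[[], Optional[Credentials]]


def from_environment(env: Dict[str, str]) -> Optional[Credentials]:
    """Use the standard AWS environment variables if both are set."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"access_key": key, "secret_key": secret, "source": "environment"}
    return None


def from_credentials_file(path: Path, profile: str = "default") -> Optional[Credentials]:
    """Read one profile from an INI-style credentials file, if it exists."""
    parser = configparser.ConfigParser()
    if not parser.read(path) or not parser.has_section(profile):
        return None
    section = parser[profile]
    key = section.get("aws_access_key_id")
    secret = section.get("aws_secret_access_key")
    if key and secret:
        return {"access_key": key, "secret_key": secret, "source": "credentials_file"}
    return None


def from_instance_profile() -> Optional[Credentials]:
    """Placeholder: a real client would ask the EC2 instance metadata service."""
    return None


def default_chain(env: Dict[str, str], credentials_path: Path) -> List[Provider]:
    """Build the (assumed) provider order used by this sketch."""
    return [
        lambda: from_environment(env),
        lambda: from_credentials_file(credentials_path),
        from_instance_profile,
    ]


def resolve(providers: List[Provider]) -> Credentials:
    """Return the credentials from the first provider that has any."""
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    raise RuntimeError("no AWS credentials found in any provider")


def regional_endpoint(region: str) -> str:
    """Build the public DynamoDB endpoint URL for a region."""
    return f"https://dynamodb.{region}.amazonaws.com"
```

On a real machine you would call `resolve(default_chain(dict(os.environ), Path.home() / ".aws" / "credentials"))` after importing `os`, and pass the result of `regional_endpoint(...)` for the region where your tables live.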
The DynamoDB Storage Backend for Titan supports batch graph operations with the Blueprints BatchGraph implementation and through Titan’s bulk loading configuration options.
The DynamoDB Storage Backend for Titan supports optimistic locking.
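Optimistic locking in general means that a writer does not hold a lock while it works. Instead, it makes its write conditional on the stored value still being the one it read, and retries if another writer got there first. The sketch below shows that general pattern against an in-memory dictionary. It is an illustration only: the class and function names are hypothetical, and it does not reproduce how the plugin talks to DynamoDB.

```python
from typing import Callable, Dict, Optional


class ConditionFailed(Exception):
    """Raised when the stored value no longer matches what the writer read."""


class ConditionalStore:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self) -> None:
        self._cells: Dict[str, bytes] = {}

    def read(self, key: str) -> Optional[bytes]:
        return self._cells.get(key)

    def write_if_unchanged(self, key: str, expected: Optional[bytes], new: bytes) -> None:
        # The write succeeds only if nobody changed the cell since it was read.
        if self._cells.get(key) != expected:
            raise ConditionFailed(key)
        self._cells[key] = new


def update_with_retry(
    store: ConditionalStore,
    key: str,
    transform: Callable[[Optional[bytes]], bytes],
    max_attempts: int = 3,
) -> bytes:
    """Read, compute a new value, and write it back; retry on conflict."""
    for _ in range(max_attempts):
        seen = store.read(key)
        new = transform(seen)
        try:
            store.write_if_unchanged(key, seen, new)
            return new
        except ConditionFailed:
            continue  # another writer won the race; re-read and try again
    raise ConditionFailed(f"gave up on {key!r} after {max_attempts} attempts")
```

If two clients both read the same value, the first conditional write succeeds and the second raises `ConditionFailed`, so the second client must re-read before it can write.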
You can also use DynamoDB’s Cross-Region Replication to create read-only replicas of your graph tables in other regions.
The DynamoDB Storage Backend for Titan implements the Titan KCV Store interface so you can switch from a different storage backend to DynamoDB with minimal changes. You can also use bulk loading to copy your graph from one storage backend to the DynamoDB Storage Backend for Titan. AWS has released DynamoDB storage backend plugins for Titan versions 0.5.4 and 1.0.0.
You are charged the regular DynamoDB throughput and storage costs. There is no additional cost for using DynamoDB as the storage backend for a Titan graph workload.
Limits & Recommendations
Titan limits a graph to a maximum of 2^60 edges and half as many vertices, as long as you use the multiple-item model for the edgestore. If you use the single-item model, the number of edges that you can store at a particular out-vertex key is limited by DynamoDB’s maximum item size, currently 400 KB.
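To put those limits in concrete numbers, the short sketch below computes them and checks a planned graph size against them. The constant and function names are hypothetical and exist only for this illustration.

```python
# Limits quoted above: 2^60 edges and half as many vertices.
MAX_EDGES = 2 ** 60              # 1,152,921,504,606,846,976
MAX_VERTICES = MAX_EDGES // 2    # 2^59 = 576,460,752,303,423,488


def within_titan_limits(vertices: int, edges: int) -> bool:
    """Return True if a graph of this size stays inside both limits."""
    return 0 <= vertices <= MAX_VERTICES and 0 <= edges <= MAX_EDGES
```

For example, `within_titan_limits(10**9, 10**11)` is true, so a graph with a billion vertices and a hundred billion edges is far below the ceiling.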
There are two storage models for the DynamoDB Storage Backend for Titan. In the single-item model, vertices, vertex properties, and edges are stored in one item. In the multiple-item model, vertices, vertex properties, and edges are stored in different items. In both cases, edge properties are stored in the same items as the edges they correspond to.
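The difference between the two models is easier to see with a small example. The sketch below takes one vertex row, expressed as a key with a set of columns, and lays it out both ways. The attribute names `hk`, `rk`, and `v` and the readable column names are assumptions made for illustration. The real backend stores binary keys and columns, and its exact item layout is not described here.

```python
from typing import Dict, List

# One row of a key-column-value store: column name -> value for a single key.
Row = Dict[str, bytes]


def to_single_item(key: str, columns: Row) -> Dict[str, object]:
    """Single-item model: every column of the key lives in one item."""
    item: Dict[str, object] = {"hk": key}
    item.update(columns)
    return item


def to_multiple_items(key: str, columns: Row) -> List[Dict[str, object]]:
    """Multiple-item model: one item per (key, column) pair."""
    return [
        {"hk": key, "rk": column, "v": value}
        for column, value in sorted(columns.items())
    ]


# Hypothetical vertex with one property and two outgoing edges.
# Edge properties travel inside the edge's own value in both layouts.
hercules: Row = {
    "prop:name": b"hercules",
    "edge:father:jupiter": b"",
    "edge:battled:nemean": b"time=1",
}

single = to_single_item("v:hercules", hercules)
multiple = to_multiple_items("v:hercules", hercules)
```

Here `single` is one item that grows with every edge added to the vertex, while `multiple` is a list of three small items, one per column, so no individual item grows as the vertex gains edges.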
In general, AWS recommends that you use the multiple-item data model for the edgestore and graphindex tables. Otherwise, you either limit the number of edges and vertex properties you can store for one out-vertex, or you limit the number of entities that can be indexed at a particular property name-value pair in the graph index. This is due to DynamoDB’s item size restriction of 400 KB.
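A rough calculation shows why the single-item model runs out of room. The sketch below estimates how many entries fit in one item. It assumes that 400 KB means 400 × 1024 bytes and that every entry has the same hypothetical size. Real entries vary in size, and attribute names also count toward the item size.

```python
from typing import Iterable

# Assumption: 400 KB is interpreted as 400 * 1024 bytes.
ITEM_LIMIT_BYTES = 400 * 1024


def max_entries_per_item(bytes_per_entry: int, fixed_overhead: int = 0) -> int:
    """Estimate how many equally sized entries fit in a single item."""
    if bytes_per_entry <= 0:
        raise ValueError("bytes_per_entry must be positive")
    return max(0, (ITEM_LIMIT_BYTES - fixed_overhead) // bytes_per_entry)


def fits_in_one_item(entry_sizes: Iterable[int]) -> bool:
    """Check whether entries of the given sizes stay within the item limit."""
    return sum(entry_sizes) <= ITEM_LIMIT_BYTES
```

With a hypothetical 100 bytes per edge, `max_entries_per_item(100)` gives 4096, so a vertex with a few thousand edges would already fill a single item.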
You can change the schema of a Titan graph database; however, you cannot change the schema of existing vertex/edge properties and labels.
Accessing a DynamoDB endpoint in a different region from the EC2 Titan instance is possible but not recommended.
AWS also recommends running the EC2 instance in a VPC to improve network performance. The CloudFormation template performs this entire configuration for you.
Graph Databases: A graph database is a store of vertices and directed edges that connect those vertices. Both vertices and edges can have properties stored as key-value pairs. A graph database uses adjacency lists for storing edges to allow simple traversal.
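As a concrete picture of that definition, here is a minimal in-memory property graph that keeps an adjacency list of outgoing edges per vertex. It is a sketch of the data structure only, not Titan's implementation, and the class and method names are hypothetical. The sample data borrows a few names from the Graph of the Gods example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Edge:
    label: str
    target: str
    properties: Dict[str, object] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.vertices: Dict[str, Dict[str, object]] = {}  # id -> properties
        self.out_edges: Dict[str, List[Edge]] = {}        # id -> adjacency list

    def add_vertex(self, vertex_id: str, **properties: object) -> None:
        self.vertices[vertex_id] = dict(properties)
        self.out_edges.setdefault(vertex_id, [])

    def add_edge(self, source: str, label: str, target: str, **properties: object) -> None:
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.out_edges[source].append(Edge(label, target, dict(properties)))

    def out(self, vertex_id: str, label: Optional[str] = None) -> List[str]:
        """Follow outgoing edges directly from the vertex's adjacency list."""
        return [
            edge.target
            for edge in self.out_edges.get(vertex_id, [])
            if label is None or edge.label == label
        ]


graph = PropertyGraph()
for name in ("hercules", "jupiter", "neptune", "pluto"):
    graph.add_vertex(name, name=name)
graph.add_edge("hercules", "father", "jupiter")
graph.add_edge("jupiter", "brother", "neptune")
graph.add_edge("jupiter", "brother", "pluto")

# Two-hop traversal: the brothers of Hercules' father.
uncles = [u for f in graph.out("hercules", "father") for u in graph.out(f, "brother")]
```

Each hop reads only the adjacency list of the vertex it is standing on, which is why traversal cost depends on the number of edges visited rather than on the size of the whole graph.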
Whenever there are connections or relationships between entities (e.g. social networks), a graph database is a natural choice.
To get started, you can also clone the project from the GitHub repository and start by following the Marvel and Graph-Of-The-Gods tutorials. When you’re ready to expand your testing or run in production, you can switch the backend to use the DynamoDB service.
If you define a vertex label as partitioned in the management system upon creation, you can key different subsets of the edges and vertex properties going out of a vertex at different partition keys, resulting in the virtual vertex label partitions being stored in different physical DynamoDB partitions.
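The effect of partitioning a vertex can be sketched as spreading its outgoing columns over several hash keys instead of one. The scheme below is purely hypothetical: the key format, the use of CRC32, and the function names are assumptions chosen to make the idea concrete, and Titan's own ID and partition assignment works differently in detail.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_key(vertex_id: str, column: bytes, partitions: int) -> str:
    """Pick one of `partitions` hash keys for a column of the given vertex."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    bucket = zlib.crc32(column) % partitions  # deterministic across runs
    return f"{vertex_id}#{bucket}"


def group_by_partition(
    vertex_id: str, columns: Iterable[bytes], partitions: int
) -> Dict[str, List[bytes]]:
    """Group a vertex's columns by the hash key they would be stored under."""
    groups: Dict[str, List[bytes]] = defaultdict(list)
    for column in columns:
        groups[partition_key(vertex_id, column, partitions)].append(column)
    return dict(groups)


# A hypothetical high-degree vertex with 1000 outgoing edges.
edges = [f"edge-{i}".encode() for i in range(1000)]
layout = group_by_partition("v:celebrity", edges, partitions=4)
```

With an unpartitioned vertex all 1000 columns would share one hash key. Here they are divided among at most four keys, which gives DynamoDB the opportunity to place them in different physical partitions.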