Related Technologies

JanusGraph

JanusGraph is an OLTP graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. It is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time. It can use Cassandra as backend storage. JanusGraph is a fork of TitanDB; as one commentator put it, "JanusGraph picks up where TitanDB left off". JanusGraph could become the de facto reference provider implementation for TinkerPop.

Apache TinkerPop

Apache TinkerPop is a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP). JanusGraph supports queries written with Apache TinkerPop. Below are some examples of TinkerPop queries.

// What are the names of the managers in the management chain going from Gremlin to the CEO?
gremlin> g.V().has("name","gremlin").repeat(in("manages")).until(has("title","ceo")).path().by("name")
// What is the distribution of job titles amongst Gremlin's collaborators?
gremlin> g.V().has("name","gremlin").as("a").out("created").in("created").where(neq("a")).groupCount().by("title")
// Get a ranking of the most relevant products for Gremlin given his purchase history.
gremlin> g.V().has("name","gremlin").out("bought").aggregate("stash").in("bought").out("bought").
    where(not(within("stash"))).groupCount().order(local).by(values,decr)
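
To make the semantics of these traversals concrete, here is a minimal plain-Python sketch of what the first and third queries compute. It does not use TinkerPop at all: the toy graph, the `TITLES` table, and the helper names `out`, `in_`, `management_chain` and `recommend` are all hypothetical, and the chain walk assumes each person has exactly one manager (a real Gremlin traversal would follow every branch).

```python
from collections import Counter

# Hypothetical toy graph: edges are (source, label, target) triples.
EDGES = [
    ("alice", "manages", "bob"),
    ("bob", "manages", "gremlin"),
    ("gremlin", "bought", "laptop"),
    ("gremlin", "bought", "mouse"),
    ("carol", "bought", "laptop"),
    ("carol", "bought", "monitor"),
    ("dave", "bought", "mouse"),
    ("dave", "bought", "monitor"),
    ("dave", "bought", "keyboard"),
]
# Hypothetical "title" property for the people in the graph.
TITLES = {"alice": "ceo", "bob": "director", "gremlin": "engineer"}


def out(vertex, label):
    """Targets of outgoing edges with this label, similar to out(label)."""
    return [dst for src, lbl, dst in EDGES if src == vertex and lbl == label]


def in_(vertex, label):
    """Sources of incoming edges with this label, similar to in(label)."""
    return [src for src, lbl, dst in EDGES if dst == vertex and lbl == label]


def management_chain(start):
    """Rough equivalent of repeat(in("manages")).until(has("title","ceo")).path()."""
    path = [start]
    while True:
        managers = in_(path[-1], "manages")
        if not managers:
            return None  # ran out of managers before reaching a CEO
        path.append(managers[0])  # simplification: one manager per person
        if TITLES.get(path[-1]) == "ceo":  # condition checked after each step
            return path


def recommend(start):
    """Rank products bought by co-buyers that `start` has not bought yet."""
    stash = set(out(start, "bought"))  # aggregate("stash")
    counts = Counter()
    for product in stash:
        for buyer in in_(product, "bought"):
            for other in out(buyer, "bought"):
                if other not in stash:  # where(not(within("stash")))
                    counts[other] += 1  # groupCount()
    return counts.most_common()  # order by count, descending


print(management_chain("gremlin"))
print(recommend("gremlin"))
```

In this toy data, "monitor" is reached twice (through carol and through dave) and "keyboard" once, which is why the ranking puts "monitor" first.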

Gremlin

Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop. Gremlin works for both OLTP-based graph databases and OLAP-based graph processors. Gremlin's automata and functional language foundation enable it to naturally support imperative and declarative querying, host language agnosticism, user-defined domain-specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, and hybrid depth- and breadth-first evaluation.
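
One way to picture "hybrid depth- and breadth-first evaluation" is as a chain of lazy steps, where a step that collects its whole input before passing anything on switches the evaluation order. The sketch below illustrates that idea with plain Python generators. It is an analogy, not Gremlin's implementation, and the names `source`, `double` and `barrier` are hypothetical.

```python
def source(items, log):
    """Emit items one at a time, recording when each is produced."""
    for item in items:
        log.append(f"emit {item}")
        yield item


def double(stream, log):
    """A simple per-item step, recording when each item is processed."""
    for x in stream:
        log.append(f"double {x}")
        yield x * 2


def barrier(stream):
    """Collect the whole upstream before passing anything downstream."""
    yield from list(stream)


# Lazy chaining: each item flows through every step before the next
# item is produced (depth-first style).
lazy_log = []
lazy_result = list(double(source([1, 2], lazy_log), lazy_log))

# With a barrier: all items are produced first, then processed
# (breadth-first style).
barrier_log = []
barrier_result = list(double(barrier(source([1, 2], barrier_log)), barrier_log))

print(lazy_log)     # ['emit 1', 'double 1', 'emit 2', 'double 2']
print(barrier_log)  # ['emit 1', 'emit 2', 'double 1', 'double 2']
```

Both pipelines produce the same result; only the order in which the work happens differs.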

Other graph database systems

  • Amazon Neptune - Fully-managed graph database service.
  • Bitsy - A small, fast, embeddable, durable in-memory graph database.
  • Blazegraph - RDF graph database with OLTP support.
  • CosmosDB - Microsoft's distributed OLTP graph database.
  • ChronoGraph - A versioned graph database.
  • DSEGraph - DataStax graph database with OLTP and OLAP support.
  • GRAKN.AI - Distributed OLTP/OLAP knowledge graph system.
  • Hadoop (Spark) - OLAP graph processor using Spark.
  • HGraphDB - OLTP graph database running on Apache HBase.
  • IBM Graph - OLTP graph database as a service.
  • JanusGraph - Distributed OLTP and OLAP graph database with BerkeleyDB, Apache Cassandra and Apache HBase support.
  • JanusGraph (Amazon) - The Amazon DynamoDB Storage Backend for JanusGraph.
  • Neo4j - OLTP graph database (embedded and high availability).
  • neo4j-gremlin-bolt - OLTP graph database (using Bolt Protocol).
  • OrientDB - OLTP graph database.
  • Apache S2Graph - OLTP graph database running on Apache HBase.
  • Sqlg - OLTP implementation on SQL databases.
  • Stardog - RDF graph database with OLTP and OLAP support.
  • TinkerGraph - In-memory OLTP and OLAP reference implementation.
  • Titan - Distributed OLTP and OLAP graph database with BerkeleyDB, Apache Cassandra and Apache HBase support.
  • Titan (Amazon) - The Amazon DynamoDB storage backend for Titan.
  • Titan (Tupl) - The Tupl storage backend for Titan.
  • Unipop - OLTP Elasticsearch and JDBC backed graph.

Scylla - Cassandra Killer?

Scylla is a next-generation NoSQL database that claims to give 10x the performance of Cassandra. It is written from the ground up in C++ and offers Redis-like performance. Find the roadmap of Scylla here. Scylla is a drop-in replacement for Cassandra 2.2, with support for:

  • All Apache Cassandra Drivers
  • Protocols: CQL, Thrift, JMX
  • Tooling: cqlsh, nodetool, cassandra-stress, and all of Cassandra 2.2 tools
  • SSTable format

C++ applications can extract maximum performance from the available hardware resources. The benchmark reports suggest as much: matching the performance of a 3-node Scylla cluster might require as many as 30 Cassandra nodes. There is a push in the industry for C++-based products that lower hardware requirements and energy bills at the data center level. One drawback of C++ is its significantly steeper learning curve compared to Java, along with the lack of the rich standard libraries that the Java ecosystem is blessed with.
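
The 3-versus-30 node comparison is simple capacity arithmetic: if one node of system A sustains roughly ten times the throughput of one node of system B, the same workload needs about a tenth as many nodes. The sketch below works this through. The per-node throughput figures and the target workload are made-up placeholders chosen only to reproduce the 10x ratio, not numbers from the benchmark reports.

```python
import math


def nodes_needed(target_ops_per_sec, ops_per_node):
    """Smallest node count whose combined throughput meets the target."""
    return math.ceil(target_ops_per_sec / ops_per_node)


# Hypothetical figures, chosen only to illustrate a 10x per-node ratio.
CASSANDRA_OPS_PER_NODE = 50_000
SCYLLA_OPS_PER_NODE = 10 * CASSANDRA_OPS_PER_NODE
TARGET_OPS_PER_SEC = 1_500_000

scylla_nodes = nodes_needed(TARGET_OPS_PER_SEC, SCYLLA_OPS_PER_NODE)
cassandra_nodes = nodes_needed(TARGET_OPS_PER_SEC, CASSANDRA_OPS_PER_NODE)

print(f"Scylla nodes:    {scylla_nodes}")     # 3
print(f"Cassandra nodes: {cassandra_nodes}")  # 30
```

This ignores replication factor, headroom for failures, and data volume per node, all of which would change the real cluster sizes.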

Benchmark reports: https://www.scylladb.com/product/benchmarks/

Although Scylla has higher throughput than Cassandra, the latter is more mature and battle-tested in numerous internet-scale applications, with commercial support from DataStax. For now, sticking with Cassandra is probably a good idea while Scylla gains more product maturity.

Pithos - build an S3-like object store using Cassandra