DEV Community

Saket

JanusGraph OLAP traversal not working with a Cassandra backend when client SSL is enabled

What is OLAP traversal in a graph database?

OLAP stands for Online Analytical Processing. It is one way to traverse a graph database in parallel, as a batch operation.
JanusGraph OLAP traversal uses distributed graph processing through TinkerPop's Gremlin plugins for Apache Hadoop and Apache Spark.
For more information on this topic, see:
JanusGraph with TinkerPop’s Hadoop-Gremlin - JanusGraph

The Problem

We had a working setup of JanusGraph 0.5.2 in which we could insert and query data (OLTP) as needed. We were exploring JanusGraph OLAP traversal for some reporting and analytical requirements. However, when we followed the instructions in the JanusGraph documentation, we could not connect to Cassandra with SSL enabled when traversing the graph in OLAP mode through Gremlin queries. The Cassandra database was set up to require SSL, and it expected client connection requests to use a truststore. OLTP queries, the regular way of working with the graph, worked fine and in line with the official documentation.

Below is the working OLTP config, janusgraph-cql-oltp.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=cassandra.cassandra.svc.cluster.local
storage.username=cassandra
storage.password=cassandra123
storage.cql.keyspace=janusgraph
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
storage.lock.wait-time = 60000
storage.cql.ssl.enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/truststore
storage.cql.ssl.truststore.password=secretpasswd

When we loaded this file in the Gremlin console and ran a simple traversal, we got the expected results.

Below is the OLAP config, which fails to connect to Cassandra with SSL enabled:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
# # JanusGraph Cassandra InputFormat configuration
# # These properties define the connection settings that were used when writing data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql
# This specifies the hostname & port for Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=cassandra.cassandra.svc.cluster.local
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.username=cassandra
janusgraphmr.ioformat.conf.storage.password=cassandra123
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
janusgraphmr.ioformat.conf.storage.lock.wait-time = 60000
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/etc/config/tls/truststore
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.password=cassandra123

janusgraphmr.ioformat.conf.storage.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.ssl.truststore.location=/etc/config/tls/truststore
janusgraphmr.ioformat.conf.storage.ssl.truststore.password=cassandra123

janusgraphmr.ioformat.conf.storage.cql.read-consistency-level=ONE

storage.lock.wait-time = 60000
storage.cql.ssl.enabled=true
storage.cql.ssl.client-authentication-enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/truststore
storage.cql.ssl.truststore.password=cassandra123

janusgraphmr.ioformat.conf.cache.db-cache = true
janusgraphmr.ioformat.conf.cache.db-cache-clean-wait = 20
janusgraphmr.ioformat.conf.cache.db-cache-time = 180000
janusgraphmr.ioformat.conf.cache.db-cache-size = 0.5

cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

# # SparkGraphComputer Configuration #
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
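
Most of the OLAP file above repeats the OLTP storage and cache settings under the janusgraphmr.ioformat.conf. prefix. The following is a minimal sketch of that mapping, assuming a plain dict of OLTP settings; the helper name to_input_format_keys is hypothetical and not part of JanusGraph.

```python
# Prefix used by the OLAP properties file in this post.
PREFIX = "janusgraphmr.ioformat.conf."


def to_input_format_keys(oltp_props):
    """Return storage.* and cache.* settings with the input-format prefix added.

    Hypothetical helper for illustration only; JanusGraph does not ship it.
    """
    return {
        PREFIX + key: value
        for key, value in oltp_props.items()
        if key.startswith(("storage.", "cache."))
    }


# A few of the OLTP settings shown earlier in this post.
oltp = {
    "gremlin.graph": "org.janusgraph.core.JanusGraphFactory",
    "storage.backend": "cql",
    "storage.cql.keyspace": "janusgraph",
    "storage.cql.ssl.enabled": "true",
    "cache.db-cache": "true",
}

for key, value in to_input_format_keys(oltp).items():
    print(f"{key}={value}")
```

In the setup described here, prefixing the SSL settings this way was not enough on its own, as the error further down shows.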



When we load the graph object in the Gremlin console, we can see that the properties are loaded correctly. But when we traverse the graph as described in the documentation, we get a Cassandra connection error related to the SSL config.

gremlin> graph=HadoopGraph.open('/janusgraph-full-0.5.2/conf/olap.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> graph.configuration()
// output elided: all the properties from the file are loaded here
gremlin> g.V().limit(1)
07:34:44 WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.cassandra.svc.cluster.local/10.0.165.158:9042 (com.datastax.driver.core.exceptions.TransportException: [cassandra.cassandra.svc.cluster.local/10.0.165.158:9042] Connection has been closed))
Type ':help' or ':h' for help.


We could verify from the Cassandra logs that a connection was attempted but the request was rejected for SSL reasons. Below are the logs from the Cassandra instance:

INFO  [epollEventLoopGroup-2-4] 2023-05-02 07:34:58,809 Message.java:826 - Unexpected exception during request; channel = [id: 0xeb0e017f, L:/10.12.0.224:9042 ! R:/10.12.0.135:60316]
io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0400000001000000500003000b43514c5f56455253494f4e0005332e302e30000e4452495645525f56455253494f4e0005332e392e30000b4452495645525f4e414d4500144461746153746178204a61766120447269766572
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1057) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411) [netty-all-4.0.44.Final.jar:4.0.44.Final]
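
The hex payload in the NotSslRecordException message can be decoded to see what the client actually sent. The sketch below is only an illustration, assuming the bytes follow the CQL native protocol v4 frame layout (a 9-byte header followed by a body holding a string map); the function name decode_cql_frame is hypothetical and not part of any driver.

```python
import struct

# Hex payload copied from the NotSslRecordException message in the log above.
PAYLOAD_HEX = (
    "0400000001000000500003000b43514c5f56455253494f4e0005332e302e30"
    "000e4452495645525f56455253494f4e0005332e392e30"
    "000b4452495645525f4e414d45"
    "00144461746153746178204a61766120447269766572"
)


def decode_cql_frame(payload_hex):
    """Decode a CQL frame header and, assuming a string-map body, its options."""
    data = bytes.fromhex(payload_hex)
    # Header: version (1 byte), flags (1), stream id (2), opcode (1), body length (4).
    version, _flags, _stream, opcode, length = struct.unpack(">BBhBi", data[:9])
    body = data[9:9 + length]
    # String map: a 2-byte pair count, then length-prefixed key and value strings.
    (count,) = struct.unpack(">H", body[:2])
    pos = 2
    strings = []
    for _ in range(count * 2):
        (size,) = struct.unpack(">H", body[pos:pos + 2])
        pos += 2
        strings.append(body[pos:pos + size].decode("utf-8"))
        pos += size
    options = dict(zip(strings[0::2], strings[1::2]))
    return version, opcode, options


version, opcode, options = decode_cql_frame(PAYLOAD_HEX)
print(version, opcode, options)
```

The payload decodes to protocol version 4, opcode 1 (STARTUP), with the options CQL_VERSION=3.0.0, DRIVER_VERSION=3.9.0 and DRIVER_NAME=DataStax Java Driver. A TLS handshake record would instead start with the byte 0x16, so this suggests the client opened the connection in plaintext and the SSL settings in the properties file never reached the driver used for the OLAP read.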


Finally found the missing piece

After trying several combinations to pass the SSL information in the connection configuration, we were still unable to connect to Cassandra and execute an OLAP query.
We posted this as a question on Stack Overflow, the Discord channel and Google Groups, hoping for some help from the community. We finally got a response from a member of the Discord community, and it worked. The Discord channel for JanusGraph and Gremlin users is quite active. The configuration parameters needed for the SSL connection were not mentioned in the documentation. They are there in the code, and below is the reference. These parameters, however, work with the latest versions of JanusGraph; we verified them with 0.6.0 and 1.0.0-rc2.

[Image: the configuration parameters as they appear in the code]

We updated the OLAP connection configuration with the following entries:

cassandra.input.native.ssl.trust.store.path=/tmp/security/truststore
cassandra.input.native.ssl.trust.store.password=cassandra123

The final, updated OLAP traversal configuration looks like this:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=cassandra-headless.cassandra.svc.cluster.local
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.username=cassandra
janusgraphmr.ioformat.conf.storage.password=cassa@2@2!
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
janusgraphmr.ioformat.conf.storage.cql.read-consistency-level=ONE
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/tmp/security/truststore
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.password=cassandra123
storage.cql.read-consistency-level=ONE
janusgraphmr.ioformat.conf.cache.db-cache = true
janusgraphmr.ioformat.conf.cache.db-cache-clean-wait = 20
janusgraphmr.ioformat.conf.cache.db-cache-time = 180000
janusgraphmr.ioformat.conf.cache.db-cache-size = 0.5
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.native.keep.alive=true
cassandra.input.native.ssl.trust.store.path=/tmp/security/truststore
cassandra.input.native.ssl.trust.store.password=cassa@2@2!
storage.cql.protocol-version=V4
spark.master=local[*]
spark.executor.memory=3g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
spark.cassandra.input.fetch.size_in_rows=500


With the above configuration we were able to traverse the graph using OLAP traversal and achieve our objective.
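
Because a missing cassandra.input.native.ssl.* entry only shows up at traversal time, it can help to check the properties file before opening the graph. The following is a minimal sketch of such a check, assuming a simple key=value file format. The function names are hypothetical, and treating these two keys as required reflects only what worked in this post, not an official rule.

```python
# Key names taken from the configuration in this post.
SSL_FLAG = "janusgraphmr.ioformat.conf.storage.cql.ssl.enabled"
NATIVE_SSL_KEYS = (
    "cassandra.input.native.ssl.trust.store.path",
    "cassandra.input.native.ssl.trust.store.password",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_native_ssl_keys(props):
    """Return the native SSL keys that are absent or empty although SSL is enabled."""
    if props.get(SSL_FLAG, "false").lower() != "true":
        return []
    return [key for key in NATIVE_SSL_KEYS if not props.get(key)]


# Trimmed-down version of the failing OLAP config from earlier in this post.
failing = parse_properties("""
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/etc/config/tls/truststore
""")
for key in missing_native_ssl_keys(failing):
    print("missing:", key)
```

To check a real file, pass its contents to parse_properties and make sure missing_native_ssl_keys returns an empty list before running HadoopGraph.open on it.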
