Cassandra Persistor Deployment

Introduction

Instructions to install Cassandra from .deb or .rpm are available at: https://cassandra.apache.org/download/

Disk Storage Configuration

Data managed by Cassandra is stored locally on disk at: /var/lib/cassandra.

On EC2 instances with attached ephemeral storage (such as an I3 type, or an e.g. M5 instance with a “d” in the name), you may want to mount the local SSD drive there.

Example setup:

Format and mount the data volume for Cassandra to use at /var/lib/cassandra

sudo mkfs.xfs -L cassandra /dev/nvme1n1 # Or whatever the name of the attached ephemeral storage device is

sudo mkdir /var/lib/cassandra

And then add the following line to /etc/fstab

LABEL=cassandra /var/lib/cassandra xfs defaults,nofail,noatime,noquota 0 2

sudo mount /var/lib/cassandra

sudo chown cassandra:cassandra /var/lib/cassandra

Network Configuration

Configuration is in /etc/cassandra (or /etc/cassandra/conf on Amazon Linux 2).

The main config file is cassandra.yml

The minimum required changes to this file required to deploy a Cassandra cluster are: comment out the rpc_address and listen_address settings in this file, and set the seed address setting to the address of one or more hosts in the cluster. Start the seed node(s) first, and then bring up successive nodes one at a time.

The default cassandra.yml has a couple addresses set to localhost: rpc_address, and listen_address.

  • rpc_address is the address which the application uses to communicate with Cassandra (default port 9042).
  • listen_address is the address where Cassandra will listen for communication from other members of the Cassandra cluster (default port 7000). You can leave this alone if you’re only running a single Cassandra instance.

If these settings are commented out, Cassandra will use the value of java.net.InetAddress.getLocalHost(). On EC2 instances, this returns the internal IP of the instance (e.g. 10.0.xxx.xxx), which works well for this case. Alternatively, you can explicitly set them to an address, or set their _interface-suffixed variants if you wish to specify network interface rather than address.

There is also broadcast_address / broadcast_interface if for some reason you need the address the cluster member listens on (binds to) to be different that the address by which it is known to other cluster members, e.g. behind a NAT. This setting defaults to use the value of listen_address.

Clustering

If you’re running a cluster of Cassandra instances, you’ll need to populate the seeds setting with the address of one or more Cassandra servers to use as seed nodes, for the members to use to discover the cluster.

For more info, refer to: https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/production/seedNodesForSingleDC.html and: https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/initialize/initSingleDS.html.

Performance Optimization

There are some recommended systctl and ulimit settings you adjust on the OS - see https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/install/installRecommendSettings.html

For our testing, we followed the recommendations on https://docs.datastax.com/en/docker/doc/docker/dockerRecommendedSettings.html#dockerRecommendedSettings__userRes (as we were deploying on ECS)

Ulimits were set, with soft and hard limits, respectively, to:

nofile=100000:100000

nproc=32768:100000

memlock=-1:-1

In /etc/sysctl.d/50-cassandra.conf, we added:

vm.max_map_count=104857

Then we ran sysctl --system to make it take effect, and the IPC_LOCK capability was added.

See: https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-cassandra-on-amazon-ec2/ and https://docs.datastax.com/en/dse-planning/doc/planning/planningEC2.html#GuidelinesforEC2productionclusters for more recommendations for running Cassandra on AWS.

Quine Configuration For Use With Cassandra

To use Cassandra as the persistence backend for Quine, at a minimum, you’ll need to set in config:

quine.store {
  type = cassandra
  endpoints = ["cassandraHostAddress"]
}

Where endpoints is a list of strings that are the address(es) of one or more Cassandra hosts in the cluster. If you need to specify a port other than 9042 (the default), you can use “host:portNum” strings.

Alternatively, you may specify the environment variable CASSANDRA_ENDPOINTS (as a comma-separated list of hostnames, or host:ports), which takes precedence over the above config.

Default Cassandra Config Reference

The full range of config options available for this store section is (with default values) is as follows:

quine.store {
  # store data in an Apache Cassandra instance
  type = cassandra

  # "host:port" strings at which Cassandra nodes can be accessed from
  # the application
  endpoints = [
    "localhost:9042"
  ]

  # the keyspace to use
  keyspace = quine

  # whether the application should create the keyspace if it does not
  # yet exist
  should-create-keyspace = true

  # whether the application should create tables in the keyspace if
  # they do not yet exist
  should-create-tables = true

  # how many copies of each datum the Cassandra cluster should retain
  replication-factor = 1

  # how many hosts must agree on a datum for Quine to consider that
  # datum written/read
  write-consistency = LOCAL_QUORUM
  read-consistency = LOCAL_QUORUM

  # passed through to Cassandra
  local-datacenter = "datacenter1"

  # how long to wait before considering a write operation failed
  write-timeout = "10s"

  # how long to wait before considering a read operation failed
  read-timeout = "10s"

  # if set, the number of nodes for which to optimize node creation
  # latency
  # bloom-filter-size =
}

Actual config used:

We’ve used the following config for that section. See this doc for available consistency levels on Cassandra.

For this performance testing, we’ve chosen to optimize for performance at the expense of cluster consistency. Because we have a decline-sleep-when-write-within setting defaulted to 100ms, meaning we shouldn’t require immediate reading back of data that’s written, we felt this was an acceptable risk for this case. \

store {
  type = cassandra
  # We use AWS CloudMap to register an internal DNS entry for Cassandra seed nodes:
  endpoints = ["seed.cassandra"]
  read-consistency = ONE
  write-consistency = ANY
}

Quine Persistence Event Configuration

Relatedly, in the persistence section of our config, we have settings related to when to save data. The default values are given here:

   # configuration for which data to save about nodes and when to do so
    persistence {
      # whether to save node journals. "true" uses more disk space and
      # enables more functionality, such as historical queries
      journal-enabled = true

      # one of [on-node-sleep, on-node-update, never]. When to save a
      # snapshot of a node's current state, including any SingleId Standing
      # Queries registered on the node
      snapshot-schedule = on-node-sleep

      # whether only a single snapshot should be retained per-node. If false,
      # one snapshot will be saved at each timestamp against which a
      # historical query is made
      snapshot-singleton = false

      # when to save Standing Query partial result (only applies for the
      # `MultipleValues` mode -- `SingleId` Standing Queries always save when
      # a node saves a snapshot, regardless of this setting)
      standing-query-schedule = on-node-sleep
    }

For a hosted deployment, we’ve had those set as follows:

persistence {
  journal-enabled = false
  snapshot-singleton = true
  standing-query-schedule = "on-node-sleep"
}

Automatic Creation of Keyspace and Tables

We have a couple settings in the Cassandra section of the config, should-create-keyspace and should-create-tables, that when enabled (they default to true), will have Quine automatically create the keyspace and/or tables at startup if they don’t already exist. However, in the clustered case, because Cassandra doesn’t currently support concurrent CREATE TABLE IF NOT EXISTS statements for the same table (see CASSANDRA-10699), this can lead to exceptions being thrown on the client at startup of the form: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found e9daecc0-15b7-11ec-a406-6d2c86545d91; expected e9d98d30-15b7-11ec-a406-6d2c86545d91)

You can restart the Quine cluster if this happens. To forestall this, you could boot one node first and let it create those tables, and then have the rest of the cluster join.

Or you could create them ahead of time by running the following CQL (feel free to customize the keyspace settings as desired):

Cassandra Schema Used:

CREATE KEYSPACE thatdot WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE thatdot.journals (
    quine_id blob,
    timestamp bigint,
    data blob,
    PRIMARY KEY (quine_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC)
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'};

CREATE TABLE thatdot.meta_data (
    key text PRIMARY KEY,
    value blob
);

CREATE TABLE thatdot.snapshots (
    quine_id blob,
    timestamp bigint,
    data blob,
    PRIMARY KEY (quine_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

CREATE TABLE thatdot.standing_queries (
    query_id uuid PRIMARY KEY,
    queries blob
);

CREATE TABLE thatdot.standing_query_states (
    quine_id blob,
    standing_query_id uuid,
    standing_query_part_id uuid,
    data blob,
    PRIMARY KEY (quine_id, standing_query_id, standing_query_part_id)
) WITH CLUSTERING ORDER BY (standing_query_id ASC, standing_query_part_id ASC);