Showing posts from June, 2012

Bulk-Loading Data to Cassandra with sstable or JMX

The 'sstableloader', introduced in Apache Cassandra 0.8.1, provides a powerful way to load huge volumes of data into a Cassandra cluster. If you are moving from a cloud cluster to a dedicated cluster or vice versa, or from a different database to Cassandra, you will be interested in this tool. In any of these cases, if you can generate 'sstables' from the data to be loaded into Cassandra, you can bulk-load them into the cluster using 'sstableloader'. I have tried it here with version 1.1.2.
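
Before any sstables can be written, each source record has to be reshaped into Cassandra's row model: a row key mapping to column name/value pairs. The Java code below is a minimal, hypothetical sketch of that step for CSV input. It is not taken from the post, and it assumes a header line, the row key in the first column, and no quoted commas inside fields. The resulting rows would then be fed to an sstable writer such as SSTableSimpleUnsortedWriter, which is not shown here because its exact API differs between Cassandra versions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reshapes CSV records into Cassandra-style rows
// (row key -> column name -> value) before they are written out as sstables.
// Assumes a header line, the row key in the first column, and no quoted commas.
final class CsvToRows {

    static Map<String, Map<String, String>> toRows(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("expected a header line");
        }
        String[] header = lines.get(0).split(",", -1);
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            String[] fields = line.split(",", -1);
            String rowKey = fields[0];
            // Later records with the same key add to (or overwrite) columns of the same row.
            Map<String, String> columns = rows.computeIfAbsent(rowKey, k -> new LinkedHashMap<>());
            for (int i = 1; i < header.length && i < fields.length; i++) {
                columns.put(header[i], fields[i]);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "id,name,age",
                "u1,Alice,30",
                "u2,Bob,25");
        // prints {u1={name=Alice, age=30}, u2={name=Bob, age=25}}
        System.out.println(toRows(csv));
    }
}
```

Keeping insertion order (LinkedHashMap) is only for readability; since the writer in the post is the "unsorted" variant, the rows should not need to be pre-sorted by key.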

In this post I'll share my experience of creating sstables from a .csv file and loading them into a Cassandra instance running on the same machine, which acts as the cluster here.

sstable generation
Bulk loading Cassandra using sstableloader
Using JMX

'sstable' generation
To generate sstables with 'SSTableSimpleUnsortedWriter', the 'cassandra.yaml' file must be present on the classpath. In IntelliJ IDEA you can do it in Run-->Edit Config…
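
Because a missing 'cassandra.yaml' is an easy way for the generator to fail, a small pre-flight check can help. The Java sketch below is hypothetical; it assumes Cassandra 1.1 looks up its configuration via the 'cassandra.config' system property (default 'cassandra.yaml'), trying it first as a URL and then as a classpath resource. Check this against the Cassandra version you actually run.

```java
import java.io.InputStream;
import java.net.URL;
import java.util.Optional;

// Hypothetical pre-flight check: can cassandra.yaml be found the way Cassandra would look for it?
// Assumption: the config name comes from the "cassandra.config" system property
// (default "cassandra.yaml"), tried first as a URL, then as a classpath resource.
final class CassandraYamlCheck {

    static Optional<URL> locate(String configUrl, ClassLoader loader) {
        try {
            URL url = new URL(configUrl);
            try (InputStream in = url.openStream()) {
                return Optional.of(url); // a readable URL such as file:///path/to/cassandra.yaml
            }
        } catch (Exception notAReadableUrl) {
            // Not a usable URL: fall back to a classpath lookup (e.g. conf/ added in the run configuration).
            return Optional.ofNullable(loader.getResource(configUrl));
        }
    }

    public static void main(String[] args) {
        String configUrl = System.getProperty("cassandra.config", "cassandra.yaml");
        Optional<URL> found = locate(configUrl, CassandraYamlCheck.class.getClassLoader());
        System.out.println(found.map(u -> "Found config at " + u)
                .orElse("Cannot locate " + configUrl + " (add its directory to the classpath)"));
    }
}
```

Running it with the same classpath as the sstable generator (for example from the same IDE run configuration) shows whether the file will be picked up.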

Running Cassandra in a Multi-node Cluster

This post gathers the steps I followed to set up a multi-node Apache Cassandra cluster. I referred to the Cassandra wiki and the DataStax documentation while setting up my cluster. The procedure is described in detail, based on my own experience of setting up the cluster.

Setting up the first node
Adding other nodes
Monitoring the cluster - nodetool, JConsole, Cassandra GUI
I used Cassandra 1.1.0 and Cassandra GUI version cassandra-gui-0.8.0-beta1 (as the older release had problems displaying data) on Ubuntu.
Setting up the first node
Open cassandra.yaml, which is in 'apache-cassandra-1.1.0/conf', and change:
listen_address: localhost --> listen_address: <node IP address>
rpc_address: localhost --> rpc_address: <node IP address>
- seeds: "" --> - seeds: "node IP address"
The listen address defines the address that the other nodes in the cluster use to connect to this node. So in a multi-node cluster it should be changed to the node's own address …
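
The three edits above can also be scripted when preparing several machines. The following Java sketch is a hypothetical helper, not part of Cassandra: it rewrites those lines in the text of cassandra.yaml for a given node IP, assuming each setting sits on its own line as in the stock file and that this first node is its own seed. The IP address in main is only an example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies the first-node edits to the text of cassandra.yaml.
// Assumes each setting is on its own line; commented-out lines ("# ...") are not matched.
final class FirstNodeConfig {

    // Matches e.g. "listen_address: localhost" and captures the key.
    private static final Pattern ADDRESS =
            Pattern.compile("(?m)^(listen_address|rpc_address):.*$");
    // Matches e.g. '          - seeds: ""' and captures the indentation plus "- seeds:".
    private static final Pattern SEEDS =
            Pattern.compile("(?m)^([ \\t]*- seeds:).*$");

    static String apply(String yaml, String nodeIp) {
        String ip = Matcher.quoteReplacement(nodeIp);
        String out = ADDRESS.matcher(yaml).replaceAll("$1: " + ip);
        return SEEDS.matcher(out).replaceAll("$1 \"" + ip + "\"");
    }

    public static void main(String[] args) {
        String yaml = String.join("\n",
                "listen_address: localhost",
                "rpc_address: localhost",
                "seed_provider:",
                "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
                "      parameters:",
                "          - seeds: \"\"");
        System.out.println(apply(yaml, "192.168.1.10"));
    }
}
```

Working on the file as text (rather than parsing YAML) keeps the comments and layout of the original cassandra.yaml intact, at the cost of relying on the line-per-setting format.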