How to dump Cassandra DB keyspace

Overview

If you have some experience with SQL databases you know tools like mysqldump or pg_dump that allow to extract a database into a SQL script file. Unfortunatelly Apache Cassandra DB doesn't provide any similar tool. But there are several ways how to make a keyspace dump using free native and commercial software.
In our small research we'll measure dumping time, point advantages and disadvantages, compare usability for the each way.
We hope it will help to find the most appropriate way for you.

We'll test performance on Cassandra Apache cluster in a local network. The test keyspace consists of 8 tables containing from 1K to about 500K rows each.

We'll consider two different ways:

  1. DESCRIBE and COPY commands
  2. CQL Dump tool

The results of our comparison you can find here.

Test cluster and keyspace

Host: host1
Port: 9049
Login: cassandra
Password: cassandra
Keyspace: employees

Note: don't forget to use your connection parameters in the commands below.

1. DESCRIBE and COPY commands

You need cqlsh for this way.
cqlsh is a command line shell for running CQL (Cassandra Query Language) commands. cqlsh is a part of every Cassandra package, and can be found in the bin folder. You can find more information about this tool here.

To dump a keyspace structure we'll use DESCRIBE data definition command.
DESCRIBE command prints a description (DDL statements like create table) of a keyspace object.

cqlsh.bat -k employees -u cassandra -p cassandra host1 9049 -e "describe keyspace employees" > schema.cql

To dump table data we'll use COPY TO command with default options.
COPY TO command copies data from a table to a CSV (Comma-Separated Values) file.

cqlsh.bat -k employees -u cassandra -p cassandra host1 9049 -e "copy table-name to 'table-name.csv' with header=true"

To dump all keyspace tables data at once we suggest to prepare command file copy.txt:

copy current_dept_emp to 'current_dept_emp.csv' with header=true;
copy departments to 'departments.csv' with header=true;
copy dept_emp to 'dept_emp.csv' with header=true;
copy dept_manager to 'dept_manager.csv' with header=true;
copy employees to 'employees.csv' with header=true;
copy salaries to 'salaries.csv' with header=true;
copy titles to 'titles.csv' with header=true;

Execute the prepared command file with -f cqlsh option:

cqlsh.bat -k employees -u cassandra -p cassandra host1 9049 -f "copy.txt" > 1

Note: we recommend to redirect the command output to a file with > 1 redirection parameter. This improves dumping rate for large tables.

Execution time: 49s
Pros:
+ Require only native free tool.
+ Good performance.
Cons:
- Poor usability.

2. CQL Dump tool of NoSQL Manager for Cassandra

CQL Dump Tool is a feature of NoSQL Manager for Cassandra. It allows you to make a keyspace backup by generating a text file that contains CQL statements.

2.1. Right-click on employees keyspace in DB Explorer and select Dump 'employees' keyspace... item in the popup menu.

DB Explorer: CQL Dump tool

2.2. Specify the dump options and click Start Dump to strart.

CQL Dump Options

2.3. Dumping log.

CQL Dump Log

Execution time: 21s
Pros:
+ More than twice faster then the way #1.
+ Good usability.
+ Easy to restore.
Cons:
- Third-party paid tool.
- Dump file takes more disk space.

3. Summary

File format Dump time Files count Dump size
cqlsh, DESCRIBE + COPY TO CQL (structure), CSV (data) 49s Multiple 176MB
CQL Dump tool CQL (structure and data) 21s Single 350MB