This document describes the Cassandra Query Language (CQL) version 3. CQL v3 is not backward compatible with CQL v2 and differs from it in numerous ways.
CQL v3 offers a model very close to SQL in the sense that data is put in tables containing rows of columns. For that reason, when used in this document, these terms (tables, rows and columns) have the same definition as they have in SQL. Note, however, that they do not refer to the concept of rows and columns found in the internal implementation of Cassandra and in the Thrift and CQL v2 APIs.
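To aid in specifying the CQL syntax, we will use the following conventions in this document:
Language rules are given in a BNF-like notation:
<start> ::= TERMINAL <non-terminal1> <non-terminal1>
Non-terminal symbols are written in <angle brackets>.
As additional shortcut notations to BNF, we use the traditional regular expression symbols (?, + and *) to signify that a given symbol is optional and/or can be repeated. We also allow parentheses to group symbols and the [<characters>] notation to represent any one of <characters>.
The grammar is provided for documentation purposes and leaves some minor details out. For instance, an element of the CREATE TABLE statement is optional but supported if present, even though the grammar provided in this document suggests it is not supported.
Sample code is provided in a code block:
SELECT sample_usage FROM cql;
References to keywords or pieces of CQL code in running text are shown in a fixed-width font.
The CQL language uses identifiers (or names) to identify tables, columns and other objects. An identifier is a token matching the regular expression [a-zA-Z0-9_]*.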
A number of such identifiers, like SELECT or WITH, are keywords. They have a fixed meaning for the language and most are reserved. The list of those keywords can be found in Appendix A.
Identifiers and (unquoted) keywords are case insensitive. Thus SELECT is the same as select or sElEcT, and myId is the same as myid or MYID, for instance. A convention often used (in particular by the samples in this documentation) is to use upper case for keywords and lower case for other identifiers.
There is a second kind of identifier, called quoted identifiers, defined by enclosing an arbitrary sequence of characters in double-quotes ("). Quoted identifiers are never keywords. Thus "select" is not a reserved keyword and can be used to refer to a column, while select would raise a parse error. Also, contrary to unquoted identifiers and keywords, quoted identifiers are case sensitive ("My Quoted Id" is different from "my quoted id"). A fully lowercase quoted identifier that matches [a-zA-Z0-9_]* is equivalent to the unquoted identifier obtained by removing the double-quotes (so "myid" is equivalent to myid and to myId but different from "myId"). Inside a quoted identifier, the double-quote character can be repeated to escape it, so "foo "" bar" is a valid identifier.
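The identifier rules above can be illustrated with a small Python sketch. The helper name normalize_identifier is hypothetical and is not part of any Cassandra driver; it only models case folding and quote handling as described in the text, and it does not check for reserved keywords.

```python
import re

# Pattern for unquoted identifiers, taken as stated in the text above.
UNQUOTED = re.compile(r"[a-zA-Z0-9_]*")


def normalize_identifier(raw: str) -> str:
    """Return the name an identifier refers to (hypothetical helper)."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # Quoted: case is preserved; a doubled double-quote is an escaped one.
        return raw[1:-1].replace('""', '"')
    if not UNQUOTED.fullmatch(raw):
        raise ValueError(f"not a valid unquoted identifier: {raw!r}")
    # Unquoted: case insensitive, so fold to lower case.
    return raw.lower()


# myId, MYID and "myid" all name the same object; "myId" does not.
same = {normalize_identifier(s) for s in ("myId", "MYID", '"myid"')}
different = normalize_identifier('"myId"')
```

Two identifiers refer to the same object, under this model, exactly when their normalized forms are equal.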
CQL defines 3 kinds of implicitly-typed constants: strings, numbers and uuids:
A string constant is an arbitrary sequence of characters enclosed in single-quotes ('). One can include a single-quote in a string by repeating it, e.g. 'It''s raining today'. Those are not to be confused with quoted identifiers, which use double-quotes.
A numeric constant is either an integer constant defined by -?[0-9]+ or a float constant defined by -?[0-9]+\.[0-9]*.
A uuid constant is defined by hex{8}-hex{4}-hex{4}-hex{4}-hex{12}, where hex is a hexadecimal character, i.e. [0-9a-fA-F], and {4} is the number of such characters.
A comment in CQL is a line beginning with either double dashes (--) or double slash (//).
-- This is a comment
// This is a comment too
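The constant forms described earlier in this section (strings, numbers and uuids) can be sketched in Python as follows. The function names classify_constant and quote_string are hypothetical, and the float pattern assumes that the dot in the stated expression is meant literally.

```python
import re

INTEGER = re.compile(r"-?[0-9]+")
FLOAT = re.compile(r"-?[0-9]+\.[0-9]*")  # assumes a literal dot
HEX = "[0-9a-fA-F]"
UUID = re.compile(f"{HEX}{{8}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{12}}")


def quote_string(value: str) -> str:
    """Build a string constant, doubling any embedded single-quote."""
    return "'" + value.replace("'", "''") + "'"


def classify_constant(token: str):
    """Return 'string', 'integer', 'float', 'uuid', or None (hypothetical helper)."""
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        inner = token[1:-1]
        # Every single-quote inside the literal must appear doubled.
        if "'" not in inner.replace("''", ""):
            return "string"
        return None
    if UUID.fullmatch(token):
        return "uuid"
    if INTEGER.fullmatch(token):
        return "integer"
    if FLOAT.fullmatch(token):
        return "float"
    return None
```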
CQL consists of statements. As in SQL, these statements can be divided into 3 categories: data definition statements, data manipulation statements and queries.
All statements end with a semicolon (;), but that semicolon can be omitted when dealing with a single statement. The supported statements are described in the following sections. When describing the grammar of these statements, we will reuse the non-terminal symbols defined below:
<identifier> ::= any quoted or unquoted identifier, excluding reserved keywords
<tablename> ::= (<identifier> '.')? <identifier>
<string> ::= a string constant
<integer> ::= an integer constant
<float> ::= a float constant
<number> ::= <integer> | <float>
<uuid> ::= a uuid constant
<term> ::= <identifier> | <string> | <number> | <uuid> | '?'
<int-term> ::= <identifier> | '?'
The question mark (?) in the syntax above is a bind variable for prepared statements.
A <tablename> is used to identify a table. It is an identifier representing the table name, optionally preceded by a keyspace name. The keyspace name, if provided, identifies a table in a keyspace other than the currently active one (the currently active keyspace is set through the USE statement).
CQL supports prepared statements. A prepared statement is an optimization that allows a query to be parsed only once but executed multiple times with different concrete values.
In a statement, each time a column value is expected (in the data manipulation and query statements), a bind variable marker (denoted by a ? symbol) can be used instead. A statement with bind variables must then be prepared. Once it has been prepared, it can be executed by providing concrete values for the bind variables (values must be provided in the order in which the bind variables appear in the query string). The exact procedure to prepare a statement and execute a prepared statement depends on the CQL driver used and is beyond the scope of this document.
Data manipulation statements and queries can optionally specify the consistency level of the operation. Consistency levels are specified through the following syntax:
<consistency-level> ::= ANY | ONE | TWO | THREE | QUORUM | ALL | LOCAL_QUORUM | EACH_QUORUM
When not user-specified, the default consistency level is ONE. Consult your Cassandra documentation for information about how consistency levels work.
Syntax:
<create-keyspace-stmt> ::= CREATE KEYSPACE <identifier> WITH replication = <map>
<map> ::= '{' <identifier> ':' <value> ( ',' <identifier> ':' <value> )* '}'
<value> ::= <identifier> | <string> | <number>
Sample:
CREATE KEYSPACE Excelsior WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
CREATE KEYSPACE Excalibur WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 3};
The CREATE KEYSPACE statement creates a new top-level keyspace. A keyspace is a namespace that defines a replication strategy for a set of tables. Valid keyspace names are identifiers composed exclusively of alphanumerical characters and whose length is less than or equal to 32. Note that, as identifiers, keyspace names are case insensitive: use a quoted identifier for case-sensitive keyspace names.
Syntax:
<use-stmt> ::= USE <identifier>
Sample:
USE myApp;
The USE statement takes an existing keyspace name as argument and sets it as the per-connection current working keyspace. All subsequent keyspace-specific actions will be performed in the context of the selected keyspace, unless otherwise specified, until another USE statement is issued or the connection terminates.
Syntax:
<drop-keyspace-stmt> ::= DROP KEYSPACE <identifier>
Sample:
DROP KEYSPACE myApp;
A DROP KEYSPACE statement results in the immediate, irreversible removal of an existing keyspace, including all column families in it, and all data contained in those column families.
Syntax:
<create-table-stmt> ::= CREATE (TABLE | COLUMNFAMILY) <tablename>
                        '(' <column-definition> ( ',' <column-definition> )* ')'
                        ( WITH <option> ( AND <option>)* )?
<column-definition> ::= <identifier> <type> ( PRIMARY KEY )?
                      | PRIMARY KEY '(' <identifier> ( ',' <identifier> )* ')'
<option> ::= <identifier> ( ':' ( <identifier> | <identifier> ) )? '=' <value>
           | COMPACT STORAGE
           | CLUSTERING ORDER
<value> ::= <identifier> | <string> | <number>
Sample:
CREATE TABLE monkeySpecies (
    species text PRIMARY KEY,
    common_name text,
    population varint,
    average_size int
) WITH comment='Important biological records'
   AND read_repair_chance = 1.0;

CREATE TABLE timeline (
    userid uuid,
    posted_month int,
    posted_time uuid,
    body text,
    posted_by text,
    PRIMARY KEY (userid, posted_month, posted_time)
);
The CREATE TABLE statement creates a new table. Each such table is a set of rows (usually representing related entities) for which it defines a number of properties. A table is defined by a name, the columns that compose its rows, and a number of options. The CREATE COLUMNFAMILY syntax is supported as an alias for CREATE TABLE (for historical reasons).
<tablename>
Valid table names are the same as valid keyspace names (alphanumerical identifiers up to 32 characters long). If the table name is provided alone, the table is created within the current keyspace (see USE), but if it is prefixed by an existing keyspace name (see the <tablename> grammar), it is created in the specified keyspace (without changing the current keyspace).
<column-definition>
A CREATE TABLE statement defines the columns that rows of the table can have. A column is defined by its name (an identifier) and its type (see the data types section for more details on allowed types and their properties).
Within a table, a row is uniquely identified by its PRIMARY KEY (or more simply the key), and hence all table definitions must define a PRIMARY KEY (and only one). A PRIMARY KEY is composed of one or more of the columns defined in the table. If the PRIMARY KEY is only one column, this can be specified directly after the column definition. Otherwise, it must be specified by following PRIMARY KEY with the comma-separated list of column names composing the key, within parentheses. Note that:
CREATE TABLE t ( k int PRIMARY KEY, other text )
is equivalent to
CREATE TABLE t ( k int, other text, PRIMARY KEY (k) )
Moreover, a table must define at least one column that is not part of the PRIMARY KEY as a row exists in Cassandra only if it contains at least one value for one such column.
In CQL, the order in which columns are defined for the PRIMARY KEY matters. The first column of the key is called the partition key. It has the property that all the rows sharing the same partition key (even across tables, in fact) are stored on the same physical node. Also, insertions, updates and deletions on rows sharing the same partition key for a given table are performed atomically and in isolation.
The remaining columns of the PRIMARY KEY definition, if any, are called clustering keys. On a given physical node, rows for a given partition key are stored in the order induced by the clustering keys, making the retrieval of rows in that clustering order particularly efficient (see SELECT).
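The roles of the partition key and the clustering keys can be pictured with a toy Python model. It is only a sketch of the grouping and ordering described above, not of Cassandra's storage engine; the layout function and the simplified row values (strings and integers instead of uuids) are assumptions made for illustration.

```python
from collections import defaultdict


def layout(rows, primary_key):
    """Group rows by partition key, then sort each group by the clustering keys."""
    partition_col, *clustering_cols = primary_key
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_col]].append(row)
    for group in partitions.values():
        group.sort(key=lambda r: tuple(r[c] for c in clustering_cols))
    return dict(partitions)


# Rows shaped like the timeline sample table, with simplified values.
rows = [
    {"userid": "u1", "posted_month": 2, "posted_time": 20, "body": "c"},
    {"userid": "u2", "posted_month": 1, "posted_time": 5, "body": "x"},
    {"userid": "u1", "posted_month": 1, "posted_time": 30, "body": "b"},
    {"userid": "u1", "posted_month": 1, "posted_time": 10, "body": "a"},
]
stored = layout(rows, ("userid", "posted_month", "posted_time"))
```

In this model, reading the rows of one partition in clustering order is a plain scan of an already sorted list, which mirrors why such reads are efficient.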
<option>
The CREATE TABLE statement supports a number of options that control the configuration of a new table. These options can be specified after the WITH keyword.
The first of these options is COMPACT STORAGE. This option is mainly targeted at backward compatibility with table definitions created before CQL3. It also provides a slightly more compact layout of data on disk, but at the price of flexibility and extensibility, and for that reason it is not recommended except for backward compatibility. The restriction on tables with COMPACT STORAGE is that they support one and only one column outside of those that are part of the PRIMARY KEY. It also follows that columns can be neither added nor removed after creation. A table with COMPACT STORAGE must also define at least one clustering key.
Another option is CLUSTERING ORDER. It defines the ordering of rows on disk. It takes the list of the clustering key names with, for each of them, the on-disk order (ascending or descending). Note that this option affects which ORDER BY clauses are allowed during SELECT.
Table creation supports the following other options:
option | default | description |
---|---|---|
comment | none | A free-form, human-readable comment. |
read_repair_chance | 0.1 | The probability with which to query extra nodes (i.e. more nodes than required by the consistency level) for the purpose of read repairs. |
dclocal_read_repair_chance | 0 | The probability with which to query extra nodes (i.e. more nodes than required by the consistency level) belonging to the same data center as the read coordinator for the purpose of read repairs. |
gc_grace_seconds | 864000 | Time to wait before garbage collecting tombstones (deletion markers). |
bloom_filter_fp_chance | 0.00075 | The target probability of false positives for the sstable bloom filters. The bloom filters are sized to achieve this probability, so lowering the value increases their size in memory and on disk. |
compaction_strategy_class | SizeTieredCompactionStrategy | The compaction strategy to use. The strategies provided by default are ‘SizeTieredCompactionStrategy’ and ‘LeveledCompactionStrategy’. A custom strategy can be provided by specifying the full class name as a string constant. |
compaction_strategy_options | none | Options for the compaction strategy. If opt is the name of an option to set, it can be set using compaction_strategy_options:opt = <value> . See below for the default options. |
compression_parameters | see below | Compression options. If opt is the name of an option to set, it can be set using compression_parameters:opt = <value> . See below for the default options. |
replicate_on_write | true | Whether to replicate data on write. This can only be set to false for tables with counter values. Disabling this is dangerous and can result in random loss of counters; do not disable it unless you are sure you know what you are doing. |
caching | keys_only | Whether to cache keys (“key cache”) and/or rows (“row cache”) for this table. Valid values are: all , keys_only , rows_only and none . |
The following default sub-options are available for compaction_strategy_options:
option | supported compaction strategy | default | description |
---|---|---|---|
min_sstable_size | SizeTieredCompactionStrategy | 50MB | The size tiered strategy groups SSTables to compact in buckets. A bucket groups SSTables that differ by less than 50% in size. However, for small sizes, this would result in a bucketing that is too fine grained. min_sstable_size defines a size threshold (in bytes) below which all SSTables belong to one unique bucket. |
min_compaction_threshold | SizeTieredCompactionStrategy | 4 | Minimum number of SSTables needed to start a minor compaction. |
max_compaction_threshold | SizeTieredCompactionStrategy | 32 | Maximum number of SSTables processed by one minor compaction. |
sstable_size_in_mb | LeveledCompactionStrategy | 5MB | The target size (in MB) for sstables in the leveled strategy. Note that while sstable sizes should stay less than or equal to sstable_size_in_mb , it is possible to exceptionally have a larger sstable because, during compaction, data for a given partition key is never split into 2 sstables. |
For the compression_parameters options, the following default sub-options are available:
option | default | description |
---|---|---|
sstable_compression | SnappyCompressor | The compression algorithm to use. The compressors provided by default are SnappyCompressor and DeflateCompressor. Use an empty string ('' ) to disable compression. A custom compressor can be provided by specifying the full class name as a string constant. |
chunk_length_kb | 64KB | On disk, SSTables are compressed by block (to allow random reads). This defines the size (in KB) of each block. Bigger values may improve the compression rate, but increase the minimum size of data to be read from disk for a read. |
crc_check_chance | 1.0 | When compression is enabled, each compressed block includes a checksum of that block for the purpose of detecting disk bitrot and avoiding the propagation of corruption to other replicas. This option defines the probability with which those checksums are checked during a read. By default they are always checked. Set to 0 to disable checksum checking, or to 0.5, for instance, to check them on about half of the reads. |
Syntax:
<alter-table-stmt> ::= ALTER (TABLE | COLUMNFAMILY) <tablename> <instruction>
<instruction> ::= ALTER <identifier> TYPE <type>
                | ADD <identifier> <type>
                | DROP <identifier>
                | WITH <option> ( AND <option> )*
<option> ::= <identifier> ( ':' ( <identifier> | <identifier> ) )? '=' <value>
<value> ::= <identifier> | <string> | <number>
Sample:
ALTER TABLE addamsFamily ALTER lastKnownLocation TYPE uuid;
ALTER TABLE addamsFamily ADD gravesite varchar;
ALTER TABLE addamsFamily DROP gender;
ALTER TABLE addamsFamily WITH comment = 'A most excellent and useful column family' AND read_repair_chance = 0.2;
The ALTER statement is used to manipulate table definitions. It can add new columns, drop existing ones, change the type of existing columns, or update the table options. As for table creation, ALTER COLUMNFAMILY is allowed as an alias for ALTER TABLE.
The <tablename> is the table name optionally preceded by the keyspace name. The <instruction> defines the alteration to perform:
ALTER: Updates the type of a given defined column. Note that the type of the clustering keys cannot be modified, as it induces the on-disk ordering of rows. Columns on which a secondary index is defined have the same restriction. Other columns are free from those restrictions (no validation of existing data is performed), but it is usually a bad idea to change the type to a non-compatible one, unless no data has been inserted for that column yet, as this could confuse CQL drivers/tools.
ADD: Adds a new column to the table. The <identifier> for the new column must not conflict with an existing column. Moreover, columns cannot be added to tables defined with the COMPACT STORAGE option.
DROP: TODO (pending #3919)
WITH: Updates the options of the table. The supported options (and syntax) are the same as for the CREATE TABLE statement, except that COMPACT STORAGE is not supported. Note that setting any compaction_strategy_options:* parameter has the effect of erasing all previous compaction_strategy_options:* parameters, so you will need to re-specify any such parameters that have already been set if you want to keep them. The same note applies to the set of compression_parameters:* parameters.
Syntax:
<drop-table-stmt> ::= DROP TABLE <tablename>
Sample:
DROP TABLE worldSeriesAttendees;
The DROP TABLE statement results in the immediate, irreversible removal of a table, including all data contained in it. As for table creation, DROP COLUMNFAMILY is allowed as an alias for DROP TABLE.
Syntax:
<truncate-stmt> ::= TRUNCATE <tablename>
Sample:
TRUNCATE superImportantData;
The TRUNCATE statement permanently removes all data from a table.
Syntax:
<create-index-stmt> ::= CREATE INDEX <identifier>? ON <tablename> '(' <identifier> ')'
Sample:
CREATE INDEX userIndex ON NerdMovies (user);
CREATE INDEX ON Mutants (abilityId);
The CREATE INDEX statement is used to create a new (automatic) secondary index for a given (existing) column in a given table. A name for the index itself can be specified before the ON keyword, if desired. If data already exists for the column, it will be indexed during the execution of this statement. After the index is created, new data for the column is indexed automatically at insertion time.
Syntax:
<drop-index-stmt> ::= DROP INDEX <identifier>
Sample:
DROP INDEX userIndex;
The DROP INDEX statement is used to drop an existing secondary index. The argument of the statement is the index name.
Syntax:
<insert-stmt> ::= INSERT INTO <tablename>
                  '(' <identifier> ( ',' <identifier> )* ')'
                  VALUES '(' <term> ( ',' <term> )* ')'
                  ( USING <option> ( AND <option> )* )?
<option> ::= CONSISTENCY <consistency-level>
           | TIMESTAMP <integer>
           | TTL <integer>
Sample:
INSERT INTO NerdMovies (movie, director, main_actor, year)
VALUES ('Serenity', 'Joss Whedon', 'Nathan Fillion', 2005)
USING CONSISTENCY LOCAL_QUORUM AND TTL 86400;
The INSERT statement writes one or more columns for a given row in a table. Note that since a row is identified by its PRIMARY KEY, the columns that compose it must be specified. Also, since a row only exists when it contains one value for a column not part of the PRIMARY KEY, one such value must be specified too.
Note that unlike in SQL, INSERT does not check the prior existence of the row: the row is created if none existed before, and updated otherwise. Furthermore, there is no means of knowing whether a creation or an update happened. In fact, the semantics of INSERT and UPDATE are identical.
All updates for an INSERT are applied atomically and in isolation.
Please refer to the UPDATE section for information on the available <option>. Also note that INSERT does not support counters, while UPDATE does.
Syntax:
<update-stmt> ::= UPDATE <tablename>
                  ( USING <option> ( AND <option> )* )?
                  SET <assignment> ( ',' <assignment> )*
                  WHERE <where-clause>
<assignment> ::= <identifier> '=' <term>
               | <identifier> '=' <identifier> ('+' | '-') <int-term>
<where-clause> ::= <identifier> '=' <term>
                 | <identifier> IN '(' <term> ( ',' <term> )* ')'
<option> ::= CONSISTENCY <consistency-level>
           | TIMESTAMP <integer>
           | TTL <integer>
Sample:
UPDATE NerdMovies USING CONSISTENCY ALL AND TTL 400
SET director = 'Joss Whedon', main_actor = 'Nathan Fillion', year = 2005
WHERE movie = 'Serenity';

UPDATE UserActions SET total = total + 2
WHERE user = B70DE1D0-9908-4AE3-BE34-5573E5B09F14 AND action = 'click';
The UPDATE statement writes one or more columns for a given row in a table. The <where-clause> is used to select the row to update and must include all columns composing the PRIMARY KEY. Other column values are specified through <assignment> after the SET keyword.
Note that unlike in SQL, UPDATE does not check the prior existence of the row: the row is created if none existed before, and updated otherwise. Furthermore, there is no means of knowing whether a creation or an update happened. In fact, the semantics of INSERT and UPDATE are identical.
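The "create if absent, update otherwise" behaviour described above can be sketched with a Python dictionary standing in for a table. The upsert function is a hypothetical model written for this illustration, not a driver call, and the year 2004 in the first write is a placeholder value chosen so that the second write visibly overwrites it.

```python
def upsert(table, key, **columns):
    """Write columns for the row identified by key, creating the row if needed.

    Nothing is returned, mirroring the fact that the caller cannot tell
    whether the row was created or updated.
    """
    row = table.setdefault(key, {})
    row.update(columns)


table = {}
# First write: the row does not exist yet, so it is created.
upsert(table, "Serenity", director="Joss Whedon", year=2004)
# Second write: the row exists, so the given column is overwritten.
upsert(table, "Serenity", year=2005)
```

Both calls go through the same code path, which is the point of the model: the write itself does not distinguish insertion from update.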
In an UPDATE statement, all updates within the same partition key are applied atomically and in isolation.
The c = c + 3 form of <assignment> is used to increment/decrement counters. The identifier after the ‘=’ sign must be the same as the one before the ‘=’ sign (only increment/decrement is supported on counters, not the assignment of a specific value).
<options>
The UPDATE and INSERT statements allow the following options to be specified for the insertion:
CONSISTENCY: sets the consistency level for the operation. The default consistency level is ONE.
TIMESTAMP: sets the timestamp for the operation. If not specified, the current time of the insertion (in microseconds) is used. This is usually a suitable default.
TTL: specifies an optional Time To Live (in seconds) for the inserted values. If set, the inserted values are automatically removed from the database after the specified time. Note that the TTL concerns the inserted values, not the columns themselves. This means that any subsequent update of the column will also reset the TTL (to whatever TTL is specified in that update). By default, values never expire.
Syntax:
<delete-stmt> ::= DELETE ( <identifier> ( ',' <identifier> )* )?
                  FROM <tablename>
                  ( USING <option> ( AND <option> )* )?
                  WHERE <where-clause>
<where-clause> ::= <identifier> '=' <term>
                 | <identifier> IN '(' <term> ( ',' <term> )* ')'
<option> ::= CONSISTENCY <consistency-level>
           | TIMESTAMP <integer>
Sample:
DELETE FROM NerdMovies USING CONSISTENCY QUORUM WHERE movie = 'Serenity';
DELETE phone FROM Users WHERE userid IN (C73DE1D3-AF08-40F3-B124-3FF3E5109F22, B70DE1D0-9908-4AE3-BE34-5573E5B09F14);
The DELETE statement deletes columns and rows. If column names are provided directly after the DELETE keyword, only those columns are deleted from the row indicated by the <where-clause>. Otherwise whole rows are removed. The <where-clause> specifies the key of the row(s) to delete.
DELETE supports both the CONSISTENCY and TIMESTAMP options, with the same semantics as in the UPDATE statement.
In a DELETE statement, all deletions within the same partition key are applied atomically and in isolation.
Syntax:
<batch-stmt> ::= BEGIN BATCH
                 ( USING <option> ( AND <option> )* )?
                 <modification-stmt> ( ';' <modification-stmt> )*
                 APPLY BATCH
<modification-stmt> ::= <insert-stmt>
                      | <update-stmt>
                      | <delete-stmt>
<option> ::= CONSISTENCY <consistency-level>
           | TIMESTAMP <integer>
Sample:
BEGIN BATCH USING CONSISTENCY QUORUM
  INSERT INTO users (userid, password, name) VALUES ('user2', 'ch@ngem3b', 'second user');
  UPDATE users SET password = 'ps22dhds' WHERE userid = 'user3';
  INSERT INTO users (userid, password) VALUES ('user4', 'ch@ngem3c');
  DELETE name FROM users WHERE userid = 'user1';
APPLY BATCH;
The BATCH statement groups multiple modification statements (insertions/updates and deletions) into a single statement. It mainly serves two purposes:
It saves network round-trips between the client and the server when batching multiple updates.
All updates in a BATCH belonging to a given partition key are performed atomically and in isolation.
Note however that the BATCH statement only allows UPDATE, INSERT and DELETE statements and is not a full analogue for SQL transactions.
<option>
BATCH supports both the CONSISTENCY and TIMESTAMP options, with semantics similar to those described in the UPDATE statement. However:
The consistency level applies to the whole batch: statements within a BATCH must not specify a consistency level.
The TIMESTAMP option can be used to set the timestamp for all the statements included in the BATCH. If used, TIMESTAMP must not be used in the statements within the batch.
Syntax:
<select-stmt> ::= SELECT <select-clause>
                  FROM <tablename>
                  ( USING CONSISTENCY <consistency-level> )?
                  ( WHERE <where-clause> )?
                  ( ORDER BY <order-by> )?
                  ( LIMIT <integer> )?
<select-clause> ::= <column-list>
                  | COUNT '(' ( '*' | '1' ) ')'
<column-list> ::= <selected_id> ( ',' <selected_id> )*
                | '*'
<selected_id> ::= <identifier>
                | WRITETIME '(' <identifier> ')'
                | TTL '(' <identifier> ')'
<where-clause> ::= <relation> ( "AND" <relation> )*
<relation> ::= <identifier> ("=" | "<" | ">" | "<=" | ">=") <term>
             | <identifier> IN '(' <term> ( ',' <term>)* ')'
             | TOKEN '(' <identifier> ')' ("=" | "<" | ">" | "<=" | ">=") (<term> | TOKEN '(' <term> ')' )
<order-by> ::= <ordering> ( ',' <ordering> )*
<ordering> ::= <identifier> ( ASC | DESC )?
Sample:
SELECT name, occupation FROM users WHERE userid IN (199, 200, 207);
SELECT time, value FROM events WHERE event_type = 'myEvent' AND time > '2011-02-03' AND time <= '2012-01-01';
SELECT COUNT(*) FROM users;
The SELECT statement reads one or more columns for one or more rows in a table. It returns a result-set of rows, where each row contains the collection of columns corresponding to the query.
<select-clause>
The <select-clause> determines which columns need to be queried and returned in the result-set. It consists of either the comma-separated list of column names to query, or the wildcard character (*) to select all the columns defined for the table.
In addition to selecting columns, the WRITETIME function selects the timestamp at which the column was inserted, and the TTL function selects the remaining time to live (in seconds) for the column (or null if the column has no expiration set).
The COUNT keyword can be used with parentheses enclosing *. If so, the query will return a single result: the number of rows matching the query. Note that COUNT(1) is supported as an alias.
<where-clause>
The <where-clause> specifies which rows must be queried. It is composed of relations on the columns that are part of the PRIMARY KEY and/or have a secondary index defined on them.
Not all relations are allowed in a query. For instance, non-equal relations (where IN is considered an equal relation) on a partition key are only supported if the partitioner for the keyspace is an ordered one. Moreover, for a given partition key, the clustering keys induce an ordering of rows, and relations on them are restricted to those that select a contiguous (for the ordering) set of rows. For instance, given
CREATE TABLE posts (
    userid text,
    blog_title text,
    posted_at timestamp,
    entry_title text,
    content text,
    category int,
    PRIMARY KEY (userid, blog_title, posted_at)
)
The following query is allowed:
SELECT entry_title, content FROM posts
WHERE userid='john doe' AND blog_title='John''s Blog'
  AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
But the following one is not, as it does not select a contiguous set of rows (and we suppose no secondary indexes are set):
// Needs a blog_title to be set to select ranges of posted_at
SELECT entry_title, content FROM posts
WHERE userid='john doe'
  AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
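The contiguity rule illustrated by these two queries can be approximated in Python. This is a simplified reading of the rule stated above, not Cassandra's actual validation logic; the function name and the "eq"/"range" labels are assumptions made for the sketch.

```python
def selects_contiguous_slice(clustering_keys, relations):
    """Check that relations on clustering keys select a contiguous set of rows.

    clustering_keys: clustering key names, in PRIMARY KEY order.
    relations: maps a restricted column name to "eq" or "range".
    Simplified rule: equality on a leading run of keys, optionally followed
    by one range relation, and no restriction after that.
    """
    gap_seen = False
    range_seen = False
    for column in clustering_keys:
        kind = relations.get(column)
        if kind is None:
            gap_seen = True
        elif gap_seen or range_seen:
            # Restricted after an unrestricted or range-restricted key.
            return False
        elif kind == "range":
            range_seen = True
    return True


posts_clustering = ["blog_title", "posted_at"]
allowed = selects_contiguous_slice(
    posts_clustering, {"blog_title": "eq", "posted_at": "range"}
)
rejected = selects_contiguous_slice(posts_clustering, {"posted_at": "range"})
```

The first call models the allowed query (blog_title fixed, range on posted_at); the second models the rejected one, where posted_at is restricted without blog_title.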
When specifying relations, the TOKEN function can be used on the partition key column of the query. In that case, rows will be selected based on the token of their partition key rather than on its value (note that the token of a key depends on the partitioner in use, and that in particular the RandomPartitioner won’t yield a meaningful order). Example:
SELECT * FROM posts WHERE token(userid) > token('tom') AND token(userid) < token('bob')
<order-by>
The ORDER BY option selects the order of the returned results. It takes as argument a list of column names along with the order for each column (ASC for ascending and DESC for descending; omitting the order is equivalent to ASC). Currently the possible orderings are limited, and depend on the table's CLUSTERING ORDER:
If the table has been defined without any specific CLUSTERING ORDER, then the allowed orderings are the order induced by the clustering key and the reverse of that one.
Otherwise, the allowed orderings are the order of the CLUSTERING ORDER option and the reversed one.
The consistency level of a query can be set, as for data manipulation statements, using the USING CONSISTENCY keywords.
The LIMIT option to a SELECT statement limits the number of rows returned by a query. LIMIT defaults to 10,000 when left unset.
CQL supports a rich set of native data types for columns defined in a table. On top of those native types, users can also provide custom types (through a Java class extending AbstractType, loadable by Cassandra). The syntax of types is thus:
<type> ::= <native_type>
         | <collection_type>
         | <string> // Used for custom types. The fully-qualified name of a Java class
<native_type> ::= ascii
                | bigint
                | blob
                | boolean
                | counter
                | decimal
                | double
                | float
                | inet
                | int
                | text
                | timestamp
                | timeuuid
                | uuid
                | varchar
                | varint
<collection_type> ::= list '<' <native_type> '>'
                    | set '<' <native_type> '>'
                    | map '<' <native_type> ( ',' <native_type> )* '>'
Note that the native types are keywords and as such are case-insensitive. They are, however, not reserved.
The following table gives additional information on the native data types:
type | description |
---|---|
ascii | ASCII character string |
bigint | 64-bit signed long |
blob | Arbitrary bytes (no validation) |
boolean | true or false |
counter | Counter column (64-bit signed value). See Counters for details |
decimal | Variable-precision decimal |
double | 64-bit IEEE-754 floating point |
float | 32-bit IEEE-754 floating point |
inet | An IP address. It can be either 4 bytes long (IPv4) or 16 bytes long (IPv6) |
int | 32-bit signed int |
text | UTF8 encoded string |
timestamp | A timestamp. See Working with dates below for more information. |
timeuuid | Type 1 UUID. This is a “conflict-free” timestamp and, like timestamp , it allows date notation: see Working with dates below. |
uuid | Type 1 or type 4 UUID |
varchar | UTF8 encoded string |
varint | Arbitrary-precision integer |
Values of the timestamp type are encoded as 64-bit signed integers representing a number of milliseconds since the standard base time known as “the epoch”: January 1 1970 at 00:00:00 GMT. Values of the timeuuid type also include such a timestamp and sort according to it.
Timestamp and timeuuid types can be input in CQL as simple long integers, giving the number of milliseconds since the epoch, as defined above.
They can also be input as string literals in any of the following ISO 8601 formats, each representing the date and time February 3, 2011, at 04:05:00 AM, GMT:
2011-02-03 04:05+0000
2011-02-03 04:05:00+0000
2011-02-03T04:05+0000
2011-02-03T04:05:00+0000
The +0000 above is an RFC 822 4-digit time zone specification; +0000 refers to GMT. US Pacific Standard Time is -0800. The time zone may be omitted if desired; the date will then be interpreted as being in the time zone under which the coordinating Cassandra node is configured.
2011-02-03 04:05
2011-02-03 04:05:00
2011-02-03T04:05
2011-02-03T04:05:00
There are clear difficulties inherent in relying on the time zone configuration being as expected, though, so it is recommended that the time zone always be specified for timestamps when feasible.
The time of day may also be omitted, if the date is the only piece that matters:
2011-02-03
2011-02-03+0000
In that case, the time of day will default to 00:00:00, in the specified or default time zone.
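To make the accepted input forms concrete, the following sketch writes a timestamp in three ways. The events table and its column names are hypothetical, introduced only for illustration. The integer 1296705900000 is the number of milliseconds between the epoch and 2011-02-03 04:05:00 GMT.

```cql
// Hypothetical table, used only for this illustration.
CREATE TABLE events (
    id text PRIMARY KEY,
    occurred_at timestamp
);

// As a long integer: milliseconds since the epoch.
INSERT INTO events (id, occurred_at) VALUES ('as-integer', 1296705900000);

// The same instant as a string literal with an explicit time zone.
INSERT INTO events (id, occurred_at) VALUES ('as-string', '2011-02-03 04:05:00+0000');

// Date only: the time of day defaults to 00:00:00 in the given time zone.
INSERT INTO events (id, occurred_at) VALUES ('date-only', '2011-02-03+0000');
```

The first two statements store the same value. The third stores midnight GMT of the same day.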
The counter
type is used to define counter columns. A counter column is a column whose value is a 64-bit signed integer and on which two operations are supported: incrementation and decrementation (see UPDATE
for syntax). Note that the value of a counter cannot be set directly. A counter does not exist until it is first incremented or decremented, and that first operation is applied as if the previous value were 0. Deletion of counter columns is supported but has some limitations (see the Cassandra Wiki for more information).
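As a minimal sketch of the two supported operations, the statements below define a counter column and then increment and decrement it. The page_views table and its column names are hypothetical.

```cql
// Hypothetical table: every column outside the PRIMARY KEY is a counter.
CREATE TABLE page_views (
    page text PRIMARY KEY,
    views counter
);

// First increment: the counter is created as if its previous value was 0.
UPDATE page_views SET views = views + 1 WHERE page = '/home';

// Decrement by an arbitrary amount.
UPDATE page_views SET views = views - 2 WHERE page = '/home';
```

There is no INSERT here, since a counter value cannot be set, only incremented or decremented.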
The use of the counter type is limited in the following ways:
- A counter column cannot be part of the PRIMARY KEY of a table.
- Either all the columns of a table outside the PRIMARY KEY have the counter type, or none of them have it.
A map
is a typed set of key-value pairs, where keys are unique. To create a column of type map
, use the map
keyword suffixed with comma-separated key and value types, enclosed in angle brackets. For example:
CREATE TABLE users (
    id text PRIMARY KEY,
    given text,
    surname text,
    favs map<text, text>   // A map of text keys, and text values
)
Writing map
data is accomplished with a JSON-inspired syntax. To write a record using INSERT
, specify the entire map as a JSON-style associative array. Note: This form will always replace the entire map.
// Inserting (or Updating)
INSERT INTO users (id, given, surname, favs)
VALUES ('jsmith', 'John', 'Smith', { 'fruit' : 'apple', 'band' : 'Beatles' });
Adding key-values to the map of an existing record can be accomplished by subscripting the map column in an UPDATE
statement.
// Updating (or inserting)
UPDATE users SET favs['author'] = 'Ed Poe' WHERE id = 'jsmith';
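Because map keys are unique, the same subscript form can also overwrite the value of an existing key. The sketch below shows that, along with removing a single key using a subscripted DELETE. The DELETE form is not described in the text above; it is assumed to be available in Cassandra releases that support CQL 3 collections.

```cql
// Overwrite the value stored under an existing key.
UPDATE users SET favs['band'] = 'Rolling Stones' WHERE id = 'jsmith';

// Remove a single key-value pair (assumed supported, see note above).
DELETE favs['author'] FROM users WHERE id = 'jsmith';
```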
A list
is a typed, ordered collection of non-unique values. To create a column of type list
, use the list
keyword suffixed with the value type enclosed in angle brackets. For example:
CREATE TABLE plays ( id text PRIMARY KEY, game text, players int, scores list<int> )
Writing list
data is accomplished with a JSON-style syntax. To write a record using INSERT
, specify the entire list as a JSON array. Note: An INSERT
will always replace the entire list.
INSERT INTO plays (id, game, players, scores) VALUES ('123-afde', 'quake', 3, [17, 4, 2]);
Appending values to a list can be accomplished by adding a new JSON-style array to an existing list
column.
UPDATE plays SET players = 5, scores = scores + [ 14, 21 ] WHERE id = '123-afde';
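Since a list is ordered, other position-aware updates are natural companions to appending. The following sketch shows prepending, replacing an element by index, and removing elements by value. None of these forms is described in the text above; they are assumed to be available in Cassandra releases that support CQL 3 collections.

```cql
// Prepend: the new array goes on the left-hand side of the +.
UPDATE plays SET scores = [ 12 ] + scores WHERE id = '123-afde';

// Replace the element at index 1 (indexes start at 0).
UPDATE plays SET scores[1] = 7 WHERE id = '123-afde';

// Remove every occurrence of the listed values.
UPDATE plays SET scores = scores - [ 12, 21 ] WHERE id = '123-afde';
```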
A set
is a typed collection of non-ordered unique values. To create a column of type set
, use the set
keyword suffixed with the value type enclosed in angle brackets. For example:
CREATE TABLE images ( name text PRIMARY KEY, owner text, date timestamp, tags set<text> );
Writing a set
is accomplished by comma separating the set values, and enclosing them in curly braces. Note: An INSERT
will always replace the entire set.
INSERT INTO images (name, owner, date, tags) VALUES ('cat.jpg', 'jsmith', 'now', { 'kitten', 'cat', 'pet' });
Adding values to a set can be accomplished with an UPDATE
by adding new set values to an existing set
column.
UPDATE images SET tags = tags + { 'cute', 'cuddly' } WHERE name = 'cat.jpg';
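Removing values mirrors the addition syntax. The sketch below is not taken from the text above and assumes a Cassandra release that supports CQL 3 collections.

```cql
// Remove values from the set.
UPDATE images SET tags = tags - { 'cuddly' } WHERE name = 'cat.jpg';

// Adding a value that is already present leaves the set unchanged,
// because set values are unique.
UPDATE images SET tags = tags + { 'cat' } WHERE name = 'cat.jpg';
```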
CQL distinguishes between reserved and non-reserved keywords. Reserved keywords cannot be used as identifiers; they are truly reserved for the language (but one can enclose a reserved keyword in double-quotes to use it as an identifier). Non-reserved keywords, however, only have a specific meaning in certain contexts and can be used as identifiers otherwise. The only raison d'être of these non-reserved keywords is convenience: a keyword is non-reserved when it was always easy for the parser to decide whether it was used as a keyword or not.
Keyword | Reserved? |
---|---|
ADD | yes |
ALL | yes |
ALTER | yes |
AND | yes |
ANY | yes |
APPLY | yes |
ASC | yes |
ASCII | no |
BATCH | yes |
BEGIN | yes |
BIGINT | no |
BLOB | no |
BOOLEAN | no |
BY | yes |
CLUSTERING | no |
COLUMNFAMILY | yes |
COMPACT | no |
CONSISTENCY | no |
COUNT | no |
COUNTER | no |
CREATE | yes |
DECIMAL | no |
DELETE | yes |
DESC | yes |
DOUBLE | no |
DROP | yes |
EACH_QUORUM | yes |
FLOAT | no |
FROM | yes |
IN | yes |
INDEX | yes |
INSERT | yes |
INT | no |
INTO | yes |
KEY | no |
KEYSPACE | yes |
LEVEL | no |
LIMIT | yes |
LOCAL_QUORUM | yes |
ON | yes |
ONE | yes |
ORDER | yes |
PRIMARY | yes |
QUORUM | yes |
SCHEMA | yes |
SELECT | yes |
SET | yes |
STORAGE | no |
TABLE | yes |
TEXT | no |
TIMESTAMP | no |
TIMEUUID | no |
THREE | yes |
TOKEN | yes |
TRUNCATE | yes |
TTL | no |
TWO | yes |
TYPE | no |
UPDATE | yes |
USE | yes |
USING | yes |
UUID | no |
VALUES | no |
VARCHAR | no |
VARINT | no |
WHERE | yes |
WITH | yes |
WRITETIME | no |
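The difference between the two kinds of keywords in the table above can be sketched as follows. The purchases table is hypothetical. ORDER is listed as reserved, so it is double-quoted when used as a column name, while TYPE is listed as non-reserved and is used unquoted.

```cql
// Hypothetical table, used only for this illustration.
CREATE TABLE purchases (
    id text PRIMARY KEY,
    "order" int,   // reserved keyword: must be double-quoted
    type text      // non-reserved keyword: usable as a plain identifier
);

SELECT "order", type FROM purchases WHERE id = 'p1';
```

Note that a quoted identifier is case sensitive, so "order" and "ORDER" would name different columns.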
Versioning of the CQL language adheres to the Semantic Versioning guidelines. Versions take the form X.Y.Z where X, Y, and Z are integer values representing major, minor, and patch level respectively. There is no correlation between Cassandra release versions and the CQL language version.
version | description |
---|---|
Major | The major version must be bumped when backward incompatible changes are introduced. This should rarely occur. |
Minor | Minor version increments occur when new, but backward compatible, functionality is introduced. |
Patch | The patch version is incremented when bugs are fixed. |
Tue, 24 Apr 2012 15:12:36 +0200 - Sylvain Lebresne
* Rework whole doc to target CQL 3
Wed, 12 Oct 2011 16:53:00 -0500 - Paul Cannon
* Rework whole doc, adding syntax specifics and additional explanations
Fri, 09 Sep 2011 11:43:00 -0500 - Jonathan Ellis
* add int data type
Wed, 07 Sep 2011 09:01:00 -0500 - Jonathan Ellis
* Updated version to 2.0; Documented row-based count()
* Updated list of supported data types
Wed, 10 Aug 2011 11:22:00 -0500 - Eric Evans
* Improved INSERT vs. UPDATE wording.
* Documented counter column incr/descr.
Sat, 01 Jun 2011 15:58:00 -0600 - Pavel Yaskevich
* Updated to support ALTER (CASSANDRA-1709)
Tue, 22 Mar 2011 18:10:28 -0700 - Eric Evans <eevans@rackspace.com>
* Initial version, 1.0.0