I CQL Statements

This interpreter is compatible with any CQL statement supported by Cassandra. Ex:


    INSERT INTO users(login,name) VALUES('jdoe','John DOE');
    SELECT * FROM users WHERE login='jdoe';
                                

Each statement should be separated by a semi-colon (;).
Multi-line statements as well as multiple statements on the same line are also supported as long as they are separated by a semi-colon. Ex:


    USE spark_demo;

    SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1;

    SELECT *
    FROM artists
    WHERE login='jlennon';
                                

Batch statements are supported and can span multiple lines, as can DDL (CREATE/ALTER/DROP) statements:


    BEGIN BATCH
        INSERT INTO users(login,name) VALUES('jdoe','John DOE');
        INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC');
    APPLY BATCH;

    CREATE TABLE IF NOT EXISTS test(
        key int PRIMARY KEY,
        value text
    );
                                

CQL statements are case-insensitive (except for string values and double-quoted identifiers). This means that the following statements are equally valid:


    INSERT INTO users(login,name) VALUES('jdoe','John DOE');
    Insert into users(login,name) vAlues('hsue','Helen SUE');
                                

The complete list of all CQL statements and versions can be found below:

II Comments

It is possible to add comments between statements. Single line comments start with the hash sign (#). Multi-line comments are enclosed between /** and **/. Ex:


    #First comment
    INSERT INTO users(login,name) VALUES('jdoe','John DOE');

    /**
     Multi line
     comments
     **/
    Insert into users(login,name) vAlues('hsue','Helen SUE');
                                

III Syntax Validation

The interpreter ships with a built-in syntax validator. This validator only checks for basic syntax errors. All CQL-related syntax validation is delegated directly to Cassandra.

Most of the time, syntax errors are due to missing semi-colons between statements or to typos.

I Commands For Discovery

To make schema discovery easier and more interactive, the following commands are supported:

Command | Description
DESCRIBE CLUSTER; | Show the current cluster name and its partitioner
DESCRIBE KEYSPACES; | List all existing keyspaces in the cluster and their configuration (replication factor, durable write ...)
DESCRIBE TABLES; | List all existing keyspaces in the cluster and, for each, all the table names
DESCRIBE KEYSPACE <keyspace name>; | Describe the given keyspace configuration and all its table details (name, columns, ...)
DESCRIBE TABLE (<keyspace name>).<table name>; | Describe the given table. If the keyspace is not provided, the current logged in keyspace is used. If there is no logged in keyspace, the default system keyspace is used. If no table is found, an error message is raised
DESCRIBE TYPE (<keyspace name>).<type name>; | Describe the given type (UDT). If the keyspace is not provided, the current logged in keyspace is used. If there is no logged in keyspace, the default system keyspace is used. If no type is found, an error message is raised
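
As an illustration, the commands above could be used as follows. This is only a sketch: it assumes that the spark_demo keyspace and the albums and artists tables used in the other examples of this document exist in your cluster, and that USE sets the logged in keyspace against which an unqualified table name is resolved. Each command can also be run in its own paragraph.

```cql
# Cluster-level information
DESCRIBE CLUSTER;

# One keyspace and all of its tables
DESCRIBE KEYSPACE spark_demo;

# Fully qualified table name
DESCRIBE TABLE spark_demo.albums;

# Unqualified table name, resolved against the logged in keyspace (assumption)
USE spark_demo;
DESCRIBE TABLE artists;
```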

II Schema Display

The schema objects (cluster, keyspace, table & type) are displayed in a tabular format. A drop-down menu in the top left corner expands object details. The icon legend is shown in the top right menu.

Sometimes you want to be able to pass runtime query parameters to your statements. Those parameters are not part of the CQL specs and are specific to the interpreter. Below is the list of all parameters:

Query Parameters

Parameter | Syntax | Description
Consistency Level | @consistency=value | Apply the given consistency level to all queries in the paragraph
Serial Consistency Level | @serialConsistency=value | Apply the given serial consistency level to all queries in the paragraph
Timestamp | @timestamp=long value | Apply the given timestamp to all queries in the paragraph. Please note that timestamp value passed directly in CQL statement will override this value
Retry Policy | @retryPolicy=value | Apply the given retry policy to all queries in the paragraph
Fetch Size | @fetchSize=int value | Apply the given fetch size to all queries in the paragraph

Some parameters only accept restricted values:

Allowed Values

Parameter | Possible Values
Consistency Level | ALL, ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM
Serial Consistency Level | SERIAL, LOCAL_SERIAL
Timestamp | Any long value
Retry Policy | DEFAULT, DOWNGRADING_CONSISTENCY, FALLTHROUGH, LOGGING_DEFAULT, LOGGING_DOWNGRADING, LOGGING_FALLTHROUGH
Fetch Size | Any integer value

Example:


    CREATE TABLE IF NOT EXISTS spark_demo.ts(
        key int PRIMARY KEY,
        value text
    );
    TRUNCATE spark_demo.ts;

    # Timestamp in the past
    @timestamp=10

    # Force timestamp directly in the first insert
    INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100;

    # Select some data to make the clock turn
    SELECT * FROM spark_demo.albums LIMIT 100;

    # Now insert using the timestamp parameter set at the beginning(10)
    INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert');

    # Check for the result. You should see 'first insert'
    SELECT value FROM spark_demo.ts WHERE key=1;
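
The other query parameters follow the same pattern. The sketch below is an illustration rather than a reference: it assumes that several parameters can be combined in one paragraph, and it reuses the users and spark_demo.albums tables from the earlier examples. The IF NOT EXISTS clause makes the insert a lightweight transaction, which is the kind of statement a serial consistency level applies to.

```cql
# Parameters for every query in this paragraph
@consistency=QUORUM
@serialConsistency=LOCAL_SERIAL
@fetchSize=100

# Conditional insert (lightweight transaction)
INSERT INTO users(login,name) VALUES('jdoe','John DOE') IF NOT EXISTS;

# Read query, fetched with the fetch size set above
SELECT * FROM spark_demo.albums LIMIT 1000;
```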
                                

Some remarks about query parameters:

I Syntax


For performance reasons, it is better to prepare statements beforehand and reuse them later by providing bound values. This interpreter provides 3 commands to handle prepared and bound statements:

  1. @prepare
  2. @bind
  3. @remove_prepare

Example:

    @prepare[statement_name]=...

    @bind[statement_name]='text', 1223, '2015-07-30 12:00:01', null, true, ['list_item1', 'list_item2']

    @bind[statement_name_with_no_bound_value]

    @remove_prepare[statement_name]

                                

II @prepare


You can use the syntax "@prepare[statement_name]=SELECT ..." to create a prepared statement. The statement_name is mandatory because the interpreter prepares the given statement with the Java driver and saves the generated prepared statement in an internal map, using the provided statement_name as search key.

Please note that this internal prepared statement map is shared with all notebooks and all paragraphs because there is only one instance of the interpreter for Cassandra.

If the interpreter encounters several @prepare statements for the same statement_name (key), only the first statement will be taken into account.

Example:

    @prepare[select]=SELECT * FROM spark_demo.albums LIMIT ?

    @prepare[select]=SELECT * FROM spark_demo.artists LIMIT ?
                                

For the above example, the prepared statement is "SELECT * FROM spark_demo.albums LIMIT ?". "SELECT * FROM spark_demo.artists LIMIT ?" is ignored because an entry already exists in the prepared statements map with the key select.

In the context of Zeppelin, a notebook can be scheduled to run at regular intervals, so it is necessary to avoid re-preparing the same statement many times (which is considered an anti-pattern).

III @bind


Once the statement is prepared (possibly in a separate notebook/paragraph), you can bind values to it:


    @bind[select_first]=10
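
Putting the two steps together, a minimal sketch might look as follows. The statement name select_first is only an illustrative key, and the two lines may live in different paragraphs or notebooks, since the prepared statement map is shared.

```cql
# Prepare once, under the key select_first
@prepare[select_first]=SELECT * FROM spark_demo.albums LIMIT ?

# Bind 10 to the single ? marker
@bind[select_first]=10
```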
                                

Bound values are not mandatory for the @bind statement. However, if you provide bound values, they must comply with the following syntax:
  • String values should be enclosed in single quotes (')
  • Date values should be enclosed in single quotes (') and follow one of these formats:
    1. yyyy-MM-dd HH:mm:ss
    2. yyyy-MM-dd HH:mm:ss.SSS
  • null is parsed as-is
  • boolean values (true|false) are parsed as-is
  • collection values must follow the standard CQL syntax:
    • list: ['list_item1', 'list_item2', ...]
    • set: {'set_item1', 'set_item2', …}
    • map: {'key1': 'val1', 'key2': 'val2', …}
  • tuple values should be enclosed in parentheses (see tuple CQL syntax): ('text', 123, true)
  • udt values should be enclosed in braces (see udt CQL syntax): {street_name: 'Beverly Hills', number: 104, zip_code: 90020, state: 'California', …}
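
To make these rules concrete, here is a sketch that binds several value types at once. The user_events table and its columns are hypothetical, invented only for this illustration: login text, created timestamp, active boolean, tags list<text> and note text.

```cql
# Hypothetical table: user_events(login text, created timestamp, active boolean, tags list<text>, note text)
@prepare[insert_event]=INSERT INTO user_events(login, created, active, tags, note) VALUES(?, ?, ?, ?, ?)

# Bound values, in order: string, date, boolean, list, null
@bind[insert_event]='jdoe', '2015-07-30 12:00:01', true, ['tag1', 'tag2'], null
```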

It is possible to use the @bind statement inside a batch:
    BEGIN BATCH
        @bind[insert_user]='jdoe','John DOE'
        UPDATE users SET age = 27 WHERE login='hsue';
    APPLY BATCH;
                            

IV @remove_prepare


To prevent a prepared statement from staying in the prepared statement map forever, you can use the @remove_prepare[statement_name] syntax to remove it. Removing a non-existing prepared statement yields no error.
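
Because only the first @prepare for a given name is taken into account, replacing a prepared statement means removing the old entry first. A sketch, assuming the removal has run before the new @prepare (for instance in an earlier paragraph):

```cql
# Paragraph 1: drop the entry stored under the key select
@remove_prepare[select]

# Paragraph 2: the key is free again, so this statement is now prepared
@prepare[select]=SELECT * FROM spark_demo.artists LIMIT ?
```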

Instead of hard-coding your CQL queries, you can use the mustache syntax ({{ }}) to inject simple-value or multiple-choice forms.

The syntax for a simple parameter is: {{input_Label=default value}}. The default value is mandatory because, the first time the paragraph is executed, the CQL query is launched before the form is rendered, so at least one value must be provided.

The syntax for a multiple-choice parameter is: {{input_Label=value1 | value2 | … | valueN }}. By default, the first choice is used in the CQL query the first time the paragraph is executed.

Example:


    #Secondary index on performer style
    SELECT name, country, performer
    FROM spark_demo.performers
    WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}'
    AND styles CONTAINS '{{style=Rock}}';

                                

In the above example, the first CQL query will be executed for performer='Sheryl Crow' AND style='Rock'. For subsequent queries, you can change the value directly using the form. Please note that we enclosed the {{ }} block in single quotes (') because Cassandra expects a String here. We could also have used the {{style='Rock'}} syntax, but then the value displayed on the form would be 'Rock' and not Rock.

It is also possible to use dynamic forms for prepared statements:

    @bind[select]='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}', '{{style=Rock}}'
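
For such a @bind to work, a statement must first have been prepared under the same key. A sketch of what that could look like, reusing the query from the earlier dynamic form example and assuming no other statement is already registered under the key select; the two ? markers are filled, in order, by the performer and style form values:

```cql
# Prepare the query with two bind markers
@prepare[select]=SELECT name, country, performer FROM spark_demo.performers WHERE name=? AND styles CONTAINS ?

# Bind the form values to the markers
@bind[select]='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}', '{{style=Rock}}'
```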

The Cassandra interpreter comes with some configuration values for the Java driver:

Interpreter Configuration

Parameter | Default Value
cassandra.cluster | Test Cluster
cassandra.compression.protocol | NONE (possible values: LZ4, SNAPPY)
cassandra.credentials.password | none
cassandra.credentials.username | none
cassandra.hosts | localhost
cassandra.interpreter.parallelism | 10
cassandra.keyspace | system
cassandra.load.balancing.policy | DEFAULT, or a FQCN of a custom class
cassandra.max.schema.agreement.wait.second | 10
cassandra.native.port | 9042
cassandra.pooling.core.connection.per.host.local | Protocol V2 and below: 2, V3 and above: 1
cassandra.pooling.core.connection.per.host.remote | Protocol V2 and below: 1, V3 and above: 1
cassandra.pooling.heartbeat.interval.seconds | 30
cassandra.pooling.idle.timeout.seconds | 120
cassandra.pooling.max.connection.per.host.local | Protocol V2 and below: 8, V3 and above: 1
cassandra.pooling.max.connection.per.host.remote | Protocol V2 and below: 2, V3 and above: 1
cassandra.pooling.max.request.per.connection.local | Protocol V2 and below: 128, V3 and above: 1024
cassandra.pooling.max.request.per.connection.remote | Protocol V2 and below: 128, V3 and above: 256
cassandra.pooling.new.connection.threshold.local | Protocol V2 and below: 100, V3 and above: 800
cassandra.pooling.new.connection.threshold.remote | Protocol V2 and below: 100, V3 and above: 200
cassandra.pooling.pool.timeout.millisecs | 5000
cassandra.protocol.version | 3
cassandra.query.default.consistency | ONE
cassandra.query.default.fetchSize | 5000
cassandra.query.default.serial.consistency | SERIAL
cassandra.reconnection.policy | DEFAULT, or a FQCN of a custom class
cassandra.retry.policy | DEFAULT, or a FQCN of a custom class
cassandra.socket.connection.timeout.millisecs | 500
cassandra.socket.read.timeout.millisecs | 12000
cassandra.socket.tcp.no_delay | true
cassandra.speculative.execution.policy | DEFAULT, or a FQCN of a custom class

Execution parallelism

It is possible to execute many paragraphs in parallel. However, on the back-end side, we’re still using synchronous queries. Asynchronous execution would only be possible if the InterpreterResult could return a Future value. It may be an interesting proposal for the Zeppelin project.