For a variety of components (threat intelligence triage and field transformations), we need to perform simple computations and transformations using the data from messages as variables.
For this purpose, there is a simple, scaled-down DSL built for exactly this kind of computation and transformation.
The query language supports the following:
The following keywords need to be single quote escaped in order to be used in Stellar expressions:
not | else | exists | if | then
and | or | in | == | !=
<= | > | >= | + | -
< | ? | * | / | ,
Using such characters in a value, for example the angle brackets in "foo" : "<ok>", requires escaping: "foo" : "'<ok>'"
Below is how the == operator is expected to work:
The != operator is the negation of the above.
The following is an example query (i.e., a function which returns a boolean) of the kind that might be seen in threat triage:
IN_SUBNET( ip, '192.168.0.0/24') or ip in [ '10.0.0.1', '10.0.0.2' ] or exists(is_local)
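This evaluates to true precisely when at least one of the following is true: the value of the ip field is in the 192.168.0.0/24 subnet; the value of the ip field is 10.0.0.1 or 10.0.0.2; the field is_local exists.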
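To make the logic of the rule concrete, here is a minimal Python sketch of the same predicate. It is not how Stellar evaluates the expression; the function name triage_rule is hypothetical, the message is assumed to be a plain dict, and exists(is_local) is assumed to mean "the key is present in the message".

```python
import ipaddress


def triage_rule(message: dict) -> bool:
    """Python mirror of the Stellar rule, for a message given as a dict."""
    ip = message.get("ip")

    # IN_SUBNET(ip, '192.168.0.0/24')
    in_subnet = False
    if ip is not None:
        # ip_address raises ValueError on a malformed address; a real
        # implementation would need to decide how to handle that case.
        in_subnet = ipaddress.ip_address(ip) in ipaddress.ip_network("192.168.0.0/24")

    # ip in ['10.0.0.1', '10.0.0.2']
    in_list = ip in ("10.0.0.1", "10.0.0.2")

    # exists(is_local): assumed here to mean the field is present.
    has_is_local = "is_local" in message

    return in_subnet or in_list or has_is_local
```

Because the three conditions are joined with or, a message matches as soon as any single one holds.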
The following is an example transformation which might be seen in a field transformation:
TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC'))
For a message with a timestamp and dc field, we want to transform the timestamp to an epoch timestamp given a timezone which we will look up in a separate map, called dc2tz.
This will convert the timestamp field to an epoch timestamp based on the format yyyy-MM-dd HH:mm:ss and the timezone that dc2tz associates with the value of the dc field, defaulting to UTC.
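The lookup-then-convert behaviour can be sketched in Python as follows. This is an illustration only: the function name to_epoch_millis is hypothetical, the result is assumed to be in milliseconds, and the zoneinfo module (Python 3.9+, with system timezone data available) stands in for whatever timezone handling Stellar uses internally.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_epoch_millis(message: dict, dc2tz: dict) -> int:
    """Rough equivalent of
    TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC'))."""
    # MAP_GET(dc, dc2tz, 'UTC'): look up the data center, default to UTC.
    tz_name = dc2tz.get(message.get("dc"), "UTC")

    # 'yyyy-MM-dd HH:mm:ss' corresponds to this strptime pattern.
    local = datetime.strptime(message["timestamp"], "%Y-%m-%d %H:%M:%S")

    # Interpret the wall-clock time in the looked-up timezone.
    aware = local.replace(tzinfo=ZoneInfo(tz_name))

    # Assumption: the epoch timestamp is expressed in milliseconds.
    return int(aware.timestamp()) * 1000
```

A message whose dc value is missing from dc2tz is interpreted as UTC, which is the role of the third argument to MAP_GET.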
A REPL (Read Eval Print Loop) for the Stellar language is provided to help with debugging, troubleshooting, and learning Stellar. The Stellar DSL (domain specific language) is used to act upon streaming data within Apache Storm, and it is difficult to troubleshoot Stellar when it can only be executed within a Storm topology. The REPL is intended to mitigate that problem by allowing a user to replicate data encountered in production, isolate initialization errors, or understand function resolution problems.
The shell supports customization via ~/.inputrc as it is backed by a proper readline implementation.
Shell-like operations are supported such as
$ $METRON_HOME/bin/stellar
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, ...
[Stellar]>>> ?PROTOCOL_TO_NAME
PROTOCOL_TO_NAME
 desc: Convert the IANA protocol number to the protocol name
 args: IANA Number
  ret: The protocol name associated with the IANA number.
[Stellar]>>> ip.protocol := 6
6
[Stellar]>>> PROTOCOL_TO_NAME(ip.protocol)
TCP
$ $METRON_HOME/bin/stellar -h
usage: stellar
 -h,--help              Print help
 -irc,--inputrc <arg>   File containing the inputrc if not the default ~/.inputrc
 -v,--variables <arg>   File containing a JSON Map of variables
 -z,--zookeeper <arg>   Zookeeper URL
 -na,--no_ansi          Make the input prompt not use ANSI colors.
-v, --variables <arg> (optional): Loads a JSON map which contains variable assignments. This is intended to let you save off a message from Metron and work on it via the REPL.
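As an illustration of what such a variables file might contain, the following Python sketch serializes a captured message as a JSON map. The field names, their values, and the helper names are hypothetical; only the -v flag itself comes from the help output above.

```python
import json

# Hypothetical message captured from Metron. The field names and values
# are illustrative assumptions, not taken from a real sensor.
message = {
    "ip": "10.0.0.1",
    "timestamp": "2016-01-05 17:04:34",
    "dc": "nyc",
}


def to_variables_json(variables: dict) -> str:
    """Serialize a map of variable names to values as a JSON object."""
    return json.dumps(variables, indent=2, sort_keys=True)


def save_variables(variables: dict, path: str) -> None:
    """Write the JSON map to a file that can be passed to the REPL with -v."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_variables_json(variables))
```

After calling save_variables(message, "message.json"), where message.json is a hypothetical file name, the REPL could be started with $METRON_HOME/bin/stellar -v message.json so that ip, timestamp, and dc are available as variables.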
-z, --zookeeper <arg> (optional): Attempts to connect to Zookeeper and read the Metron global configuration. Stellar functions may require the global configuration to work properly. If found, the global configuration values are printed to the console.
$ $METRON_HOME/bin/stellar -z node1:2181
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>>
Stellar has no concept of variable assignment. For testing and debugging purposes, it is important to be able to create variables that simulate data contained within incoming messages. The REPL therefore provides a means to perform variable assignment outside of the core Stellar language. This is done via the := operator; for example, foo := 1 + 1 assigns the result of the Stellar expression 1 + 1 to the variable foo.
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> 2 + 2
4.0
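One way to picture assignment living outside the expression language is a thin wrapper that peels off the := prefix and hands only the right-hand side to the evaluator. The Python sketch below is a guess at the general shape, not the REPL's actual implementation; execute, ASSIGNMENT, and toy_evaluate are all hypothetical names, and toy_evaluate only understands sums of numbers.

```python
import re

# Variable name, then ':=', then the expression to evaluate.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*:=\s*(.+)$")


def toy_evaluate(expression: str, variables: dict) -> float:
    """Stand-in for a real Stellar evaluator: adds '+'-separated numbers."""
    return sum(float(term) for term in expression.split("+"))


def execute(line: str, variables: dict, evaluate=toy_evaluate):
    """Handle 'name := expr' in the shell; pass anything else straight through."""
    match = ASSIGNMENT.match(line)
    if match:
        name, expression = match.groups()
        value = evaluate(expression, variables)
        variables[name] = value  # the assignment happens in the shell layer
        return value
    return evaluate(line, variables)
```

The evaluator never sees the := operator, which is consistent with the statement that the core language has no assignment.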
The REPL has a set of magic commands that provide the REPL user with information about the Stellar execution environment. The following magic commands are supported.
The %functions command lists all functions resolvable in the Stellar environment. Stellar searches the classpath for Stellar functions, which can make it difficult in some cases to understand which functions are resolvable.
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, GET, GET_FIRST, GET_LAST, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP_EXISTS, MAP_GET, MONTH, PROTOCOL_TO_NAME, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
[Stellar]>>>
The %vars command lists all variables in the Stellar environment.
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %vars
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> %vars
foo = 4.0
Prefixing a function name with ? returns formatted documentation for that Stellar function: its description, the expected arguments, and the return value.
[Stellar]>>> ?BLOOM_ADD
BLOOM_ADD
 desc: Adds an element to the bloom filter passed in
 args: bloom - The bloom filter, value* - The values to add
  ret: Bloom Filter
[Stellar]>>> ?IS_EMAIL
IS_EMAIL
 desc: Tests if a string is a valid email address
 args: address - The String to test
  ret: True if the string is a valid email address and false otherwise.
[Stellar]>>>
The global configuration is a JSON String to Object map. It is intended for configuration that is not sensor specific.
This configuration is stored in Zookeeper and looks something like:
{
  "es.clustername": "metron",
  "es.ip": "node1",
  "es.port": "9300",
  "es.date.format": "yyyy.MM.dd.HH",
  "fieldValidations" : [
    {
      "input" : [ "ip_src_addr", "ip_dst_addr" ],
      "validation" : "IP",
      "config" : {
        "type" : "IPV4"
      }
    }
  ]
}
Inside the global configuration, there is a validation framework that checks that messages coming from all parsers are valid. This is done in the form of validation plugins where assertions about fields or whole messages can be made.
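To show how a field validation entry like the IP example in the global configuration might be applied to a message, here is a minimal Python sketch. It is not Metron's validation plugin code; the function names are hypothetical, only the IP/IPV4 case is handled, a missing field is assumed to fail validation, and unrecognized validation types are assumed to be skipped.

```python
import ipaddress


def is_valid_ipv4(value) -> bool:
    """True only for strings that parse as IPv4 addresses."""
    if not isinstance(value, str):
        return False
    try:
        return isinstance(ipaddress.ip_address(value), ipaddress.IPv4Address)
    except ValueError:
        return False


def validate(message: dict, field_validations: list) -> bool:
    """Return True when every configured IP/IPV4 validation passes."""
    for rule in field_validations:
        is_ipv4_rule = (
            rule.get("validation") == "IP"
            and rule.get("config", {}).get("type") == "IPV4"
        )
        if not is_ipv4_rule:
            continue  # assumption: types this sketch does not know are skipped
        for field in rule.get("input", []):
            # Assumption: a field that is absent from the message is invalid.
            if not is_valid_ipv4(message.get(field)):
                return False
    return True
```

With the fieldValidations entry from the example configuration, a message is accepted only if both ip_src_addr and ip_dst_addr hold IPv4 addresses.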
The format for this is a fieldValidations field inside of global config. This is associated with an array of field validation objects structured like so:
Configurations should be stored on disk in the following structure starting at $BASE_DIR:
By default, this directory, as deployed by the Ansible infrastructure, is at $METRON_HOME/config/zookeeper.
While the configs are stored on disk, they must be loaded into Zookeeper to be used. To this end, there is a utility program to assist with this: $METRON_HOME/bin/zk_load_config.sh
This has the following options:
-f,--force                                Force operation
-h,--help                                 Generate Help screen
-i,--input_dir <DIR>                      The input directory containing the configuration files named like "$source.json"
-m,--mode <MODE>                          The mode of operation: DUMP, PULL, PUSH
-o,--output_dir <DIR>                     The output directory which will store the JSON configuration from Zookeeper
-z,--zk_quorum <host:port,[host:port]*>   Zookeeper Quorum URL (zk1:port,zk2:port,...)
Usage examples: