Several components (threat intelligence triage and field transformations) need to perform simple computations and transformations using the data from messages as variables.
For those purposes, Stellar provides a simple, scaled-down DSL.
The query language supports the following:
The following keywords need to be single-quote escaped in order to be used in Stellar expressions:
not | else | exists | if | then
and | or | in | == | !=
<= | > | >= | + | -
< | ? | * | / | ,
Using these characters inside a value, such as "foo" : "<ok>", requires escaping: "foo" : "'<ok>'"
Below is how the == operator is expected to work:
The != operator is the negation of the above.
Stellar provides the capability to pass lambda expressions to functions which wish to support that layer of indirection. The syntax is:
where
In the core language functions, we support basic functional programming primitives such as
The following is an example query (i.e. a function which returns a boolean) which would be seen possibly in threat triage:
IN_SUBNET( ip, '192.168.0.0/24') or ip in [ '10.0.0.1', '10.0.0.2' ] or exists(is_local)
This evaluates to true precisely when one of the following is true:
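As a rough illustration, the query above can be mirrored in Python. This is only a sketch of the three clauses as they read in the expression, not Stellar's implementation: the function name `is_triage_match` and the sample messages are hypothetical, and `exists(is_local)` is assumed here to mean that the message carries an `is_local` field.

```python
import ipaddress


def is_triage_match(message):
    """Mirror of the example query; `message` is a dict of field name to value."""
    ip = message.get("ip")
    # IN_SUBNET(ip, '192.168.0.0/24')
    in_subnet = (
        ip is not None
        and ipaddress.ip_address(ip) in ipaddress.ip_network("192.168.0.0/24")
    )
    # ip in ['10.0.0.1', '10.0.0.2']
    in_list = ip in ("10.0.0.1", "10.0.0.2")
    # exists(is_local): assumed to mean the field is present in the message
    has_is_local = "is_local" in message
    return in_subnet or in_list or has_is_local


print(is_triage_match({"ip": "192.168.0.17"}))                # True
print(is_triage_match({"ip": "10.0.0.2"}))                    # True
print(is_triage_match({"ip": "8.8.8.8", "is_local": False}))  # True
print(is_triage_match({"ip": "8.8.8.8"}))                     # False
```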
The following is an example transformation which might be seen in a field transformation:
TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC'))
For a message with a timestamp and dc field, we want to transform the timestamp to an epoch timestamp given a timezone, which we look up in a separate map called dc2tz.
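This will convert the timestamp field to an epoch timestamp based on the format yyyy-MM-dd HH:mm:ss and the timezone that dc2tz associates with the value of the dc field, defaulting to UTC.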
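The same lookup-then-convert logic can be sketched in Python. This is an illustration under assumptions, not the Stellar implementation: the contents of `DC2TZ` are hypothetical, fixed UTC offsets stand in for named timezones to keep the sketch dependency-free, and the result is assumed to be in milliseconds since the epoch.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the dc2tz map: data center name -> timezone.
# Fixed UTC offsets are used instead of named zones for simplicity.
DC2TZ = {
    "nyc": timezone(timedelta(hours=-5)),
    "london": timezone.utc,
}


def to_epoch_millis(timestamp, fmt, tz):
    """Parse a wall-clock string in the given timezone, return epoch milliseconds."""
    naive = datetime.strptime(timestamp, fmt)
    return int(naive.replace(tzinfo=tz).timestamp() * 1000)


def transform(message):
    # MAP_GET(dc, dc2tz, 'UTC'): look up the timezone by dc, defaulting to UTC
    tz = DC2TZ.get(message.get("dc"), timezone.utc)
    # 'yyyy-MM-dd HH:mm:ss' corresponds to '%Y-%m-%d %H:%M:%S' in Python
    return to_epoch_millis(message["timestamp"], "%Y-%m-%d %H:%M:%S", tz)


print(transform({"timestamp": "1970-01-01 00:00:01", "dc": "unknown"}))  # 1000
print(transform({"timestamp": "1970-01-01 00:00:00", "dc": "nyc"}))      # 18000000
```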
A microbenchmarking utility is included to assist in executing microbenchmarks for Stellar functions. The utility can be executed via maven using the exec plugin, like so, from the metron-common directory:
mvn -DskipTests clean package && \
mvn exec:java -Dexec.mainClass="org.apache.metron.common.stellar.benchmark.StellarMicrobenchmark" -Dexec.args="..."
where exec.args can be one of the following:
-e,--expressions <FILE>   Stellar expressions
-h,--help                 Generate Help screen
-n,--num_times <NUM>      Number of times to run per expression (after warmup). Default: 1000
-o,--output <FILE>        File to write output.
-p,--percentiles <NUM>    Percentiles to calculate per run. Default: 50.0,75.0,95.0,99.0
-v,--variables <FILE>     File containing a JSON Map of variables to use
-w,--warmup <NUM>         Number of times for warmup per expression. Default: 100
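To make the warmup, run count, and percentile options concrete, here is a minimal Python sketch of a benchmark loop with the same defaults. It is not the utility's actual implementation: the function names are hypothetical, and the nearest-rank percentile method is an assumption, since the method the utility uses is not stated here.

```python
import math
import time


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100.0))
    return sorted_samples[rank - 1]


def benchmark(fn, warmup=100, num_times=1000,
              percentiles=(50.0, 75.0, 95.0, 99.0)):
    """Run fn `warmup` times untimed, then time `num_times` runs in nanoseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(num_times):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    samples.sort()
    return {p: percentile(samples, p) for p in percentiles}


# Stand-in workload for an expression such as TO_UPPER('casey')
print(benchmark(lambda: "casey".upper()))
```

Discarding the warmup runs keeps one-time costs, such as lazy initialization, out of the reported percentiles.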
For instance, to run with a set of Stellar expressions in file /tmp/expressions.txt:
# simple functions
TO_UPPER('casey')
TO_LOWER(name)
# math functions
1 + 2*(3 + int_num) / 10.0
1.5 + 2*(3 + double_num) / 10.0
# conditionals
if ('foo' in ['foo']) OR one == very_nearly_one then 'one' else 'two'
1 + 2*(3 + int_num) / 10.0
#Network funcs
DOMAIN_TO_TLD(domain)
DOMAIN_REMOVE_SUBDOMAINS(domain)
And variables in file /tmp/variables.json:
{
  "name" : "casey",
  "int_num" : 1,
  "double_num" : 17.5,
  "one" : 1,
  "very_nearly_one" : 1.000001,
  "domain" : "www.google.com"
}
The following command runs the benchmark and writes the results to ./output.json:
mvn -DskipTests clean package && \
mvn exec:java -Dexec.mainClass="org.apache.metron.common.stellar.benchmark.StellarMicrobenchmark" \
  -Dexec.args="-e /tmp/expressions.txt -v /tmp/variables.json -o ./output.json"
The Stellar Shell is a REPL (Read Eval Print Loop) for the Stellar language that helps with troubleshooting, learning Stellar, or even interacting with a live Metron cluster.
The Stellar DSL (domain specific language) is used to act upon streaming data within Apache Storm. It is difficult to troubleshoot Stellar when it can only be executed within a Storm topology. This REPL is intended to help mitigate that problem by allowing a user to replicate data encountered in production, isolate initialization errors, or understand function resolution problems.
The shell supports customization via ~/.inputrc as it is backed by a proper readline implementation.
Shell-like operations are supported such as
Note: Stellar classpath configuration from the global config is honored here if the REPL knows about Zookeeper.
To run the Stellar Shell from within a deployed Metron cluster, run the following command on the host where Metron is installed.
$ $METRON_HOME/bin/stellar
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, ...
[Stellar]>>> ?PROTOCOL_TO_NAME
PROTOCOL_TO_NAME
 desc: Convert the IANA protocol number to the protocol name
 args: IANA Number
 ret: The protocol name associated with the IANA number.
[Stellar]>>> ip.protocol := 6
6
[Stellar]>>> PROTOCOL_TO_NAME(ip.protocol)
TCP
$ $METRON_HOME/bin/stellar -h
usage: stellar
 -h,--help              Print help
 -irc,--inputrc <arg>   File containing the inputrc if not the default ~/.inputrc
 -v,--variables <arg>   File containing a JSON Map of variables
 -z,--zookeeper <arg>   Zookeeper URL
 -na,--no_ansi          Make the input prompt not use ANSI colors.
Optional
Optionally load a JSON map which contains variable assignments. This is intended to give you the ability to save off a message from Metron and work on it via the REPL.
Optional
Attempts to connect to Zookeeper and read the Metron global configuration. Stellar functions may require the global configuration to work properly. If found, the global configuration values are printed to the console. If specified, then the classpath may be augmented by the paths specified in the stellar config in the global config.
$ $METRON_HOME/bin/stellar -z node1:2181
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>>
Stellar has no concept of variable assignment. For testing and debugging purposes, it is important to be able to create variables that simulate data contained within incoming messages. The REPL therefore provides a means to perform variable assignment outside of the core Stellar language, via the := operator. For example, foo := 1 + 1 assigns the result of the Stellar expression 1 + 1 to the variable foo.
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> 2 + 2
4.0
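One way to picture assignment living outside the language is a REPL loop that strips off the `:=` part before handing the remaining expression to the evaluator. The Python sketch below illustrates that idea only: `handle_line` is a hypothetical name, and `evaluate` is a toy stand-in that merely sums `+`-separated numbers and variable names, not a Stellar evaluator.

```python
def evaluate(expression, variables):
    """Toy stand-in for the evaluator: sums '+'-separated numbers or variables."""
    total = 0.0
    for term in expression.split("+"):
        term = term.strip()
        total += variables[term] if term in variables else float(term)
    return total


def handle_line(line, variables):
    """Handle one REPL line; ':=' is processed here, not by the evaluator."""
    if ":=" in line:
        name, expression = line.split(":=", 1)
        value = evaluate(expression, variables)
        variables[name.strip()] = value  # remember the result under the name
        return value
    return evaluate(line, variables)


variables = {}
print(handle_line("foo := 2 + 2", variables))  # 4.0
print(handle_line("foo + 1", variables))       # 5.0
print(variables)                               # {'foo': 4.0}
```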
The REPL has a set of magic commands that provide the REPL user with information about the Stellar execution environment. The following magic commands are supported.
This command lists all functions resolvable in the Stellar environment. Stellar searches the classpath for Stellar functions. This can make it difficult in some cases to understand which functions are resolvable.
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, GET, GET_FIRST, GET_LAST, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP_EXISTS, MAP_GET, MONTH, PROTOCOL_TO_NAME, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
[Stellar]>>>
Lists all variables in the Stellar environment.
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %vars
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> %vars
foo = 4.0
Returns formatted documentation of the Stellar function. Provides the description of the function along with the expected arguments.
[Stellar]>>> ?BLOOM_ADD
BLOOM_ADD
 desc: Adds an element to the bloom filter passed in
 args: bloom - The bloom filter, value* - The values to add
 ret: Bloom Filter
[Stellar]>>> ?IS_EMAIL
IS_EMAIL
 desc: Tests if a string is a valid email address
 args: address - The String to test
 ret: True if the string is a valid email address and false otherwise.
[Stellar]>>>
To run the Stellar Shell directly from the Metron source code, run a command like the following. Ensure that Metron has already been built and installed with mvn clean install -DskipTests.
$ mvn exec:java \
   -Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
   -pl metron-platform/metron-enrichment
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
[Stellar]>>> %functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, ENRICHMENT_EXISTS, ENRICHMENT_GET, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GEO_GET, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
Changing the project passed to the -pl argument will define which dependencies are included and ultimately which Stellar functions are available within the shell environment.
This can be useful for troubleshooting function resolution problems. The previous example defines which functions are available during Enrichment. For example, to determine which functions are available within the Profiler, run the following.
$ mvn exec:java \
   -Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
   -pl metron-analytics/metron-profiler
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
%functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
The global configuration is a JSON String to Object map. It is intended for configuration that is not specific to any one sensor.
This configuration is stored in Zookeeper and looks something like this:
{
  "es.clustername": "metron",
  "es.ip": "node1",
  "es.port": "9300",
  "es.date.format": "yyyy.MM.dd.HH",
  "parser.error.topic": "indexing",
  "fieldValidations" : [
    {
      "input" : [ "ip_src_addr", "ip_dst_addr" ],
      "validation" : "IP",
      "config" : {
        "type" : "IPV4"
      }
    }
  ]
}
Stellar can be configured in a variety of ways from the global config. In particular, there are three main configuration parameters around configuring Stellar:
If specified, Stellar will use a custom classloader which will wrap the context classloader and allow for the resolution of classes stored in jars not shipped with Metron and stored in a variety of mediums:
This path is a comma separated list of
{
  ...
  "stellar.function.paths" : "hdfs://node1:8020/apps/metron/stellar/metron-management-0.4.0.jar, hdfs://node1:8020/apps/metron/3rdparty/.*.jar"
}
Please be aware that this classloader does not reload functions dynamically; the classpath specified in the global config is read on topology start. For now, picking up a change to the classpath requires a topology restart.
If specified, this defines one or more regular expressions applied to the classes implementing the Stellar function that specify what should be included when searching for Stellar functions.
{
  ...
  "stellar.function.resolver.includes" : "org.apache.metron.*,com.myorg.stellar.*"
}
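The effect of an includes setting like this one can be sketched as a filter over class names. The Python below is an illustration under assumptions rather than the resolver's actual logic: it assumes the value is split on commas and that each pattern must match the whole fully qualified class name, and the class names used are hypothetical.

```python
import re


def is_included(class_name, includes):
    """True if the class name fully matches any comma-separated regex."""
    patterns = [p.strip() for p in includes.split(",") if p.strip()]
    return any(re.fullmatch(p, class_name) for p in patterns)


includes = "org.apache.metron.*,com.myorg.stellar.*"
# Hypothetical class names, used only to exercise the patterns
print(is_included("org.apache.metron.example.StringFunctions", includes))  # True
print(is_included("com.myorg.stellar.MyFunction", includes))               # True
print(is_included("net.other.SomeFunction", includes))                     # False
```

Note that an unescaped `.` in these patterns matches any character, so they are slightly looser than a literal package prefix.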
The global configuration also hosts a validation framework that checks that messages coming from all parsers are valid. Validation is implemented as plugins that make assertions about individual fields or whole messages.
The format for this is a fieldValidations field inside of global config. This is associated with an array of field validation objects structured like so:
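To show how such an entry might be applied, the Python sketch below checks a message against a fieldValidations array whose single entry uses the keys `input`, `validation`, and `config` with an `IP` validation of type `IPV4`. It is not Metron's validation plugin: the function names are hypothetical, only this one validation is handled, and treating a missing field as invalid is an assumption.

```python
import ipaddress


def is_ipv4(value):
    """True only for strings that parse as IPv4 addresses."""
    if not isinstance(value, str):
        return False
    try:
        return isinstance(ipaddress.ip_address(value), ipaddress.IPv4Address)
    except ValueError:
        return False


def is_valid(message, field_validations):
    """Apply each rule to its input fields; only IP/IPV4 is sketched here."""
    for rule in field_validations:
        if rule["validation"] == "IP" and rule.get("config", {}).get("type") == "IPV4":
            for field in rule["input"]:
                # Assumption: a missing field fails validation
                if not is_ipv4(message.get(field)):
                    return False
    return True


field_validations = [
    {
        "input": ["ip_src_addr", "ip_dst_addr"],
        "validation": "IP",
        "config": {"type": "IPV4"},
    }
]
print(is_valid({"ip_src_addr": "10.0.0.1", "ip_dst_addr": "10.0.0.2"}, field_validations))  # True
print(is_valid({"ip_src_addr": "10.0.0.1", "ip_dst_addr": "::1"}, field_validations))       # False
```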
Configurations should be stored on disk in the following structure starting at $BASE_DIR:
By default, this directory as deployed by the Ansible infrastructure is at $METRON_HOME/config/zookeeper.
While the configs are stored on disk, they must be loaded into Zookeeper to be used. The utility program $METRON_HOME/bin/zk_load_config.sh assists with this.
This has the following options:
-f,--force                                Force operation
-h,--help                                 Generate Help screen
-i,--input_dir <DIR>                      The input directory containing the configuration files named like "$source.json"
-m,--mode <MODE>                          The mode of operation: DUMP, PULL, PUSH
-o,--output_dir <DIR>                     The output directory which will store the JSON configuration from Zookeeper
-z,--zk_quorum <host:port,[host:port]*>   Zookeeper Quorum URL (zk1:port,zk2:port,...)
Usage examples:
Errors generated in Metron topologies are transformed into JSON format and follow this structure:
{
  "exception": "java.lang.IllegalStateException: Unable to parse Message: ...",
  "failed_sensor_type": "bro",
  "stack": "java.lang.IllegalStateException: Unable to parse Message: ...",
  "hostname": "node1",
  "source:type": "error",
  "raw_message": "{\"http\": {\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
  "error_hash": "f7baf053f2d3c801a01d196f40f3468e87eea81788b2567423030100865c5061",
  "error_type": "parser_error",
  "message": "Unable to parse Message: {\"http\": {\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
  "timestamp": 1488809630698
}
Each topology can be configured to send error messages to a specific Kafka topic. The parser topologies retrieve this setting from the parser.error.topic setting in the global config:
{
  "es.clustername": "metron",
  "es.ip": "node1",
  "es.port": "9300",
  "es.date.format": "yyyy.MM.dd.HH",
  "parser.error.topic": "indexing"
}
Error topics for enrichment and threat intel errors are passed into the enrichment topology as flux properties named enrichment.error.topic and threat.intel.error.topic. These properties can be found in $METRON_HOME/config/enrichment.properties.
The error topic for indexing errors is passed into the indexing topology as a flux property named index.error.topic. This property can be found in either $METRON_HOME/config/elasticsearch.properties or $METRON_HOME/config/solr.properties depending on the search engine selected.
By default all error messages are sent to the indexing topic so that they are indexed and archived, just like other messages. The indexing config for error messages can be found at $METRON_HOME/config/zookeeper/indexing/error.json.