The purpose of the Metron PCAP backend is to create a Storm topology capable of rapidly ingesting raw packet capture data directly into HDFS from Kafka.
This component must be fed by fast packet capture components upstream via Kafka. The two supported components shipped with Metron are as follows:
Both of these sensors feed raw packet data directly into Kafka. This component expects records in the following format:
The structure of the topology is extremely simple: it is a spout-only topology. It uses the Storm Kafka spout, extended to accept a callback so that no separate bolt is needed.
The following happens as part of this spout for each packet:
The sequence files on HDFS fit the following pattern: $BASE_PATH/pcap_$TOPIC_$TS_$PARTITION_$UUID
where
Each of these sequence files contains a set of packets, stored with their headers.
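As an illustration of the naming pattern above, here is a minimal Python sketch that splits a file name into its four components. It is not part of Metron: the function name `parse_pcap_file_name` and the `PcapFileName` type are hypothetical, and the sketch assumes that `$TS`, `$PARTITION` and `$UUID` contain no underscores, so that a topic name with underscores can still be recovered by splitting from the right.

```python
from typing import NamedTuple


class PcapFileName(NamedTuple):
    """Components of a name shaped like pcap_$TOPIC_$TS_$PARTITION_$UUID."""
    topic: str
    ts: str
    partition: str
    uuid: str


def parse_pcap_file_name(name: str) -> PcapFileName:
    """Split a sequence file name into topic, ts, partition and uuid (all kept as strings)."""
    prefix = "pcap_"
    if not name.startswith(prefix):
        raise ValueError(f"not a pcap sequence file name: {name!r}")
    # Split from the right so that underscores inside the topic are preserved.
    # Assumption: $TS, $PARTITION and $UUID themselves contain no underscores.
    parts = name[len(prefix):].rsplit("_", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected pcap_$TOPIC_$TS_$PARTITION_$UUID, got {name!r}")
    return PcapFileName(*parts)


# Hypothetical file name, used only to show the call.
example = parse_pcap_file_name("pcap_my_topic_1474583940156_0_abc123")
print(example.topic, example.partition)
```

Splitting from the right is a design choice for this sketch; if the trailing components could contain underscores, a stricter pattern would be needed.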
The configuration file for the Flux topology is located at $METRON_HOME/config/etc/env/pcap.properties and the possible options are as follows:
To start the topology, run the utility script $METRON_HOME/bin/start_pcap_topology.sh, which takes no arguments.
To verify that data can be read back out, the utility $METRON_HOME/bin/pcap_inspector.sh reads portions of the sequence files.
usage: PcapInspector
 -h,--help                  Generate Help screen
 -i,--input <SEQ_FILE>      Input sequence file on HDFS
 -n,--num_packets <N>       Number of packets to dump
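The options above can be assembled programmatically. The following Python sketch only builds the argument list and prints it; it does not run anything. The helper name `build_inspector_command`, the install directory `/usr/metron` and the sequence file path are hypothetical placeholders, and passing both `--input` and `--num_packets` is a choice made for this sketch rather than a documented requirement.

```python
import shlex
from typing import List


def build_inspector_command(metron_home: str, seq_file: str, num_packets: int) -> List[str]:
    """Build an argument list for pcap_inspector.sh using the options from its usage text."""
    if num_packets <= 0:
        raise ValueError("num_packets must be a positive integer")
    return [
        f"{metron_home}/bin/pcap_inspector.sh",
        "--input", seq_file,                # sequence file on HDFS
        "--num_packets", str(num_packets),  # number of packets to dump
    ]


# Hypothetical install directory and file path, shown for illustration only.
command = build_inspector_command("/usr/metron", "/apps/metron/pcap/pcap_example", 10)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run` on a host where Metron is installed.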
This command line tool exposes the two methods for filtering PCAP data:
The tool is executed via
$METRON_HOME/bin/pcap_query.sh [fixed|query]
usage: Fixed filter options
 -bop,--base_output_path <arg>   Query result output path. Default is '/tmp'
 -bp,--base_path <arg>           Base PCAP data path. Default is '/apps/metron/pcap'
 -da,--ip_dst_addr <arg>         Destination IP address
 -df,--date_format <arg>         Date format to use for parsing start_time and end_time. Default is to use time in millis since the epoch.
 -dp,--ip_dst_port <arg>         Destination port
 -et,--end_time <arg>            Packet end time range. Default is current system time.
 -nr,--num_reducers <arg>        The number of reducers to use. Default is 10.
 -h,--help                       Display help
 -ir,--include_reverse           Indicates if filter should check swapped src/dest addresses and IPs
 -p,--protocol <arg>             IP Protocol
 -sa,--ip_src_addr <arg>         Source IP address
 -sp,--ip_src_port <arg>         Source port
 -st,--start_time <arg>          (required) Packet start time range.
usage: Query filter options
 -bop,--base_output_path <arg>   Query result output path. Default is '/tmp'
 -bp,--base_path <arg>           Base PCAP data path. Default is '/apps/metron/pcap'
 -df,--date_format <arg>         Date format to use for parsing start_time and end_time. Default is to use time in millis since the epoch.
 -et,--end_time <arg>            Packet end time range. Default is current system time.
 -nr,--num_reducers <arg>        The number of reducers to use. Default is 10.
 -h,--help                       Display help
 -q,--query <arg>                Query string to use as a filter
 -st,--start_time <arg>          (required) Packet start time range.
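To make the fixed filter options concrete, here is a hedged Python sketch that assembles a `pcap_query.sh fixed` invocation from the documented flags. It only builds the argument list and does not execute it. The helper names `to_epoch_millis` and `build_fixed_query_command`, the install directory `/usr/metron` and the sample addresses are hypothetical. Times are passed as milliseconds since the epoch, which the usage text gives as the default when no date format is supplied.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def build_fixed_query_command(
    metron_home: str,
    start: datetime,
    end: Optional[datetime] = None,
    src_addr: Optional[str] = None,
    src_port: Optional[int] = None,
    dst_addr: Optional[str] = None,
    dst_port: Optional[int] = None,
    include_reverse: bool = False,
) -> List[str]:
    """Build an argument list for the fixed filter; only --start_time is required."""
    command = [
        f"{metron_home}/bin/pcap_query.sh", "fixed",
        "--start_time", str(to_epoch_millis(start)),
    ]
    optional = [
        ("--end_time", None if end is None else to_epoch_millis(end)),
        ("--ip_src_addr", src_addr),
        ("--ip_src_port", src_port),
        ("--ip_dst_addr", dst_addr),
        ("--ip_dst_port", dst_port),
    ]
    # Emit only the filters the caller actually supplied.
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    if include_reverse:
        command.append("--include_reverse")
    return command


# Hypothetical values, shown for illustration only.
command = build_fixed_query_command(
    "/usr/metron",
    start=datetime(2016, 9, 1, tzinfo=timezone.utc),
    src_addr="192.168.1.10",
    dst_port=80,
    include_reverse=True,
)
print(" ".join(command))
```

Omitted options fall back to the defaults listed in the usage text, for example the current system time for the end of the range.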