Appenders

Decanter appenders receive the data from the collectors, and store the data into a storage backend.

Log

The Decanter Log Appender creates a log message for each event received from the collectors.

The decanter-appender-log feature installs the log appender:

karaf@root()> feature:install decanter-appender-log

The log appender doesn’t require any configuration.

Elasticsearch

The Decanter Elasticsearch Appender stores the data (coming from the collectors) into an Elasticsearch instance.

This appender transforms the data as a json object. This json object is stored in the Elasticsearch instance.

It’s probably one of the most interesting way to use Decanter.

The decanter-appender-elasticsearch feature installs the elasticsearch appender:

karaf@root()> feature:install decanter-appender-elasticsearch

This feature installs the elasticsearch appender, especially the etc/org.apache.karaf.decanter.appender.elasticsearch.cfg configuration file containing:

################################################
# Decanter Elasticsearch Appender Configuration
################################################

# Hostname of the elasticsearch instance
host=localhost
# Port number of the elasticsearch instance
port=9300
# Name of the elasticsearch cluster
clusterName=elasticsearch

This file contains the elasticsearch instance connection properties:

  • the host property contains the hostname (or IP address) of the Elasticsearch instance

  • the port property contains the port number of the Elasticsearch instance

  • the clusterName property contains the name of the Elasticsearch cluster where to send the data

Embedding Decanter Elasticsearch

For convenience, Decanter provides an elasticsearch feature starting an embedded Elasticsearch instance:

karaf@root()> feature:install elasticsearch

Thanks to this elasticsearch instance, by default, the decanter-appender-elasticsearch will send the data to this instance.

The feature also installs the etc/elasticsearch.yml configuration file:

###############################################################################
##################### Elasticsearch Decanter Configuration ####################
###############################################################################

# WARNING: change in this configuration file requires a refresh or restart of
# the elasticsearch bundle

################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elasticsearch
cluster.routing.schedule: 50ms


#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: decanter

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
#node.master: true
#
# Allow this node to store data (enabled by default):
#
node.data: true

# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
#    This will be the "workhorse" of your cluster.
#
#node.master: false
#node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
#node.master: true
#node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
#node.master: false
#node.data: false

# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_nodes] or GUI tools
# such as <http://www.elasticsearch.org/overview/marvel/>,
# <http://github.com/karmi/elasticsearch-paramedic>,
# <http://github.com/lukas-vlcek/bigdesk> and
# <http://mobz.github.com/elasticsearch-head> to inspect the cluster state.

# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
#node.rack: rack314

# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
#node.max_local_storage_nodes: 1


#################################### Index ####################################

# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See <http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html> and
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html>
# for more information.

# Set the number of shards (splits) of an index (5 by default):
#
#index.number_of_shards: 5

# Set the number of replicas (additional copies) of an index (1 by default):
#
#index.number_of_replicas: 1

# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
#index.number_of_shards: 1
#index.number_of_replicas: 0

# These settings directly affect the performance of index and search operations
# in your cluster. Assuming you have enough machines to hold shards and
# replicas, the rule of thumb is:
#
# 1. Having more *shards* enhances the _indexing_ performance and allows to
#    _distribute_ a big index across machines.
# 2. Having more *replicas* enhances the _search_ performance and improves the
#    cluster _availability_.
#
# The "number_of_shards" is a one-time setting for an index.
#
# The "number_of_replicas" can be increased or decreased anytime,
# by using the Index Update Settings API.
#
# Elasticsearch takes care about load balancing, relocating, gathering the
# results from nodes, etc. Experiment with different settings to fine-tune
# your setup.

# Use the Index Status API (<http://localhost:9200/A/_status>) to inspect
# the index status.


#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):
#
#path.conf: /path/to/conf

# Path to directory where to store index data allocated for this node.
#
#path.data: /path/to/data
#
# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. For example:
#
#path.data: /path/to/data1,/path/to/data2
path.data: data

# Path to temporary files:
#
#path.work: /path/to/work

# Path to log files:
#
#path.logs: /path/to/logs

# Path to where plugins are installed:
#
#path.plugins: /path/to/plugins
path.plugins: ${karaf.home}/elasticsearch/plugins

#################################### Plugin ###################################

# If a plugin listed here is not installed for current node, the node will not start.
#
#plugin.mandatory: mapper-attachments,lang-groovy


################################### Memory ####################################

# Elasticsearch performs poorly when JVM starts swapping: you should ensure that
# it _never_ swaps.
#
# Set this property to true to lock the memory:
#
#bootstrap.mlockall: true

# Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set
# to the same value, and that the machine has enough memory to allocate
# for Elasticsearch, leaving enough memory for the operating system itself.
#
# You should also make sure that the Elasticsearch process is allowed to lock
# the memory, eg. by using `ulimit -l unlimited`.


############################## Network And HTTP ###############################

# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).

# Set the bind address specifically (IPv4 or IPv6):
#
#network.bind_host: 192.168.0.1

# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
#network.publish_host: 192.168.0.1

# Set both 'bind_host' and 'publish_host':
#
#network.host: 192.168.0.1
network.host: 127.0.0.1

# Set a custom port for the node to node communication (9300 by default):
#
#transport.tcp.port: 9300

# Enable compression for all communication between nodes (disabled by default):
#
#transport.tcp.compress: true

# Set a custom port to listen for HTTP traffic:
#
#http.port: 9200

# Set a custom allowed content length:
#
#http.max_content_length: 100mb

# Enable HTTP:
#
http.enabled: true
http.cors.enabled: true
http.cors.allow-origin: /.*/


################################### Gateway ###################################

# The gateway allows for persisting the cluster state between full cluster
# restarts. Every change to the state (such as adding an index) will be stored
# in the gateway, and when the cluster starts up for the first time,
# it will read its state from the gateway.

# There are several types of gateway implementations. For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html>.

# The default gateway type is the "local" gateway (recommended):
#
#gateway.type: local

# Settings below control how and when to start the initial recovery process on
# a full cluster restart (to reuse as much local data as possible when using shared
# gateway).

# Allow recovery process after N nodes in a cluster are up:
#
#gateway.recover_after_nodes: 1

# Set the timeout to initiate the recovery process, once the N nodes
# from previous setting are up (accepts time value):
#
#gateway.recover_after_time: 5m

# Set how many nodes are expected in this cluster. Once these N nodes
# are up (and recover_after_nodes is met), begin recovery process immediately
# (without waiting for recover_after_time to expire):
#
#gateway.expected_nodes: 2


############################# Recovery Throttling #############################

# These settings allow to control the process of shards allocation between
# nodes during initial recovery, replica allocation, rebalancing,
# or when adding and removing nodes.

# Set the number of concurrent recoveries happening on a node:
#
# 1. During the initial recovery
#
#cluster.routing.allocation.node_initial_primaries_recoveries: 4
#
# 2. During adding/removing nodes, rebalancing, etc
#
#cluster.routing.allocation.node_concurrent_recoveries: 2

# Set to throttle throughput when recovering (eg. 100mb, by default 20mb):
#
#indices.recovery.max_bytes_per_sec: 20mb

# Set to limit the number of open concurrent streams when
# recovering a shard from a peer:
#
#indices.recovery.concurrent_streams: 5


################################## Discovery ##################################

# Discovery infrastructure ensures nodes can be found within a cluster
# and master node is elected. Multicast discovery is the default.

# Set to ensure a node sees N other master eligible nodes to be considered
# operational within the cluster. This should be set to a quorum/majority of
# the master-eligible nodes in the cluster.
#
#discovery.zen.minimum_master_nodes: 1

# Set the time to wait for ping responses from other nodes when discovering.
# Set this option to a higher value on a slow or congested network
# to minimize discovery failures:
#
#discovery.zen.ping.timeout: 3s

# For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html>

# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
#discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
#    to perform discovery when new nodes (master or data) are started:
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

# EC2 discovery allows to use AWS EC2 API in order to perform discovery.
#
# You have to install the cloud-aws plugin for enabling the EC2 discovery.
#
# For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html>
#
# See <http://elasticsearch.org/tutorials/elasticsearch-on-ec2/>
# for a step-by-step tutorial.

# GCE discovery allows to use Google Compute Engine API in order to perform discovery.
#
# You have to install the cloud-gce plugin for enabling the GCE discovery.
#
# For more information, see <https://github.com/elasticsearch/elasticsearch-cloud-gce>.

# Azure discovery allows to use Azure API in order to perform discovery.
#
# You have to install the cloud-azure plugin for enabling the Azure discovery.
#
# For more information, see <https://github.com/elasticsearch/elasticsearch-cloud-azure>.

################################## Slow Log ##################################

# Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms

################################## GC Logging ################################

#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms

#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s

################################## Security ################################

# Uncomment if you want to enable JSONP as a valid return transport on the
# http server. With this enabled, it may pose a security risk, so disabling
# it unless you need it is recommended (it is disabled by default).
#
#http.jsonp.enable: true

It’s a "standard" elasticsearch configuration file, allowing you to configure the embedded elasticsearch instance.

Warning: if you change the etc/elasticsearch.yml file, you have to restart (with the bundle:restart command) the Decanter elasticsearch bundle in order to load the changes.

The Decanter elasticsearch node also supports loading and override of the settings using a etc/org.apache.karaf.decanter.elasticsearch.cfg configuration file. This file is not provided by default, as it’s used for override of the default settings.

You can override the following elasticsearch properties in this configuration file:

  • cluster.name

  • http.enabled

  • node.data

  • node.name

  • node.master

  • path.data

  • network.host

  • cluster.routing.schedule

  • path.plugins

  • http.cors.enabled

  • http.cors.allow-origin

The advantage of using this file is that the elasticsearch node is automatically restarted in order to reload the settings as soon as you change the cfg file.

Embedding Decanter Kibana

In addition of the embedded elasticsearch instance, Decanter also provides an embedded Kibana instance, containing ready to use Decanter dashboards.

The kibana feature installs the embedded kibana instance:

karaf@root()> feature:install kibana

By default, the kibana instance is available on http://host:8181/kibana.

The Decanter Kibana instance provides ready to use dashboards:

  • Karaf dashboard uses the data harvested by the default JMX collector, and the log collector. Especially, it provides details about the threads, memory, garbage collection, etc.

  • Camel dashboard uses the data harvested by the default JMX collector, or the Camel (JMX) collector. It can also leverage the Camel Tracer collector. It provides details about routes processing time, the failed exchanges, etc. This dashboard requires some tuning (updating the queries to match the route IDs).

  • ActiveMQ dashboard uses the data harvested by the default JMX collector, or the ActiveMQ (JMX) collector. It provides details about the pending queue, the system usage, etc.

  • OperatingSystem dashboard uses the data harvested by the system collector. The default dashboard expects data containing the filesystem usage, and temperature data. It’s just a sample, you have to tune the system collector and adapt this dashboard accordingly.

You can change these dashboards to add new panels, change the existing panels, etc.

Of course, you can create your own dashboards, starting from blank or simple dashboards.

JDBC

The Decanter JDBC appender allows your to store the data (coming from the collectors) into a database.

The Decanter JDBC appender transforms the data as a json string. The appender stores the json string and the timestamp into the database.

The decanter-appender-jdbc feature installs the jdbc appender:

karaf@root()> feature:install decanter-appender-jdbc

This feature also installs the etc/org.apache.karaf.decanter.appender.jdbc.cfg configuration file:

#######################################
# Decanter JDBC Appender Configuration
#######################################

# Name of the JDBC datasource
datasource.name=jdbc/decanter

# Name of the table storing the collected data
table.name=decanter

# Dialect (type of the database)
# The dialect is used to create the table
# Supported dialects are: generic, derby, mysql
# Instead of letting Decanter created the table, you can create the table by your own
dialect=generic

This configuration file allows you to specify the connection to the database:

  • the datasource.name property contains the name of the JDBC datasource to use to connect to the database. You can create this datasource using the Karaf jdbc:create command (provided by the jdbc feature).

  • the table.name property contains the table name in the database. The Decanter JDBC appender automatically creates the table for you, but you can create the table by yourself. The table is simple and contains just two column:

    • timestamp as INTEGER

    • content as VARCHAR or CLOB

  • the dialect property allows you to specify the database type (generic, mysql, derby). This property is only used for the table creation.

JMS

The Decanter JMS appender "forwards" the data (collected by the collectors) to a JMS broker.

The appender sends a JMS Map message to the broker. The Map message contains the harvested data.

The decanter-appender-jms feature installs the JMS appender:

karaf@root()> feature:install decanter-appender-jms

This feature also installs the etc/org.apache.karaf.decanter.appender.jms.cfg configuration file containing:

#####################################
# Decanter JMS Appender Configuration
#####################################

# Name of the JMS connection factory
connection.factory.name=jms/decanter

# Name of the destination
destination.name=decanter

# Type of the destination (queue or topic)
destination.type=queue

# Connection username
# username=

# Connection password
# password=

This configuration file allows you to specify the connection properties to the JMS broker:

  • the connection.factory.name property specifies the JMS connection factory to use. You can create this JMS connection factory using the jms:create command (provided by the jms feature).

  • the destination.name property specifies the JMS destination name where to send the data.

  • the destination.type property specifies the JMS destination type (queue or topic).

  • the username property is optional and specifies the username to connect to the destination.

  • the password property is optional and specifies the username to connect to the destination.

Camel

The Decanter Camel appender sends the data (collected by the collectors) to a Camel endpoint.

It’s a very flexible appender, allowing you to use any Camel route to transform and forward the harvested data.

The Camel appender creates a Camel exchange and set the "in" message body with a Map of the harvested data. The exchange is send to a Camel endpoint.

The decanter-appender-camel feature installs the Camel appender:

karaf@root()> feature:install decanter-appender-camel

This feature also installs the etc/org.apache.karaf.decanter.appender.camel.cfg configuration file containing:

#
# Decanter Camel appender configuration
#

# The destination.uri contains the URI of the Camel endpoint
# where Decanter sends the collected data
destination.uri=direct-vm:decanter

This file allows you to specify the Camel endpoint where to send the data:

  • the destination.uri property specifies the URI of the Camel endpoint where to send the data.

The Camel appender send an exchange. The "in" message body contains a Map of the harvested data.

For instance, in this configuration file, you can specify:

destination.uri=direct-vm:decanter

And you can deploy the following Camel route definition:

<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">

  <camelContext xmlns="http://camel.apache.org/schema/blueprint">
    <route id="decanter">
      <from uri="direct-vm:decanter"/>
      ...
      ANYTHING
      ...
    </route>
  </camelContext>

</blueprint>

This route will receive the Map of harvested data. Using the body of the "in" message, you can do what you want:

  • transform and convert to another data format

  • use any Camel EIPs (Enterprise Integration Patterns)

  • send to any Camel endpoint