Collection Agent configuration reference

Overview

The Collection Agent configuration reference describes how to set up collectors, reporters, workflows, and plugins.

Caution
Before you upgrade to the latest version of Geneos and Collection Agent, read Upgrading Collection Agent, which outlines the breaking changes that may affect your upgrade.

Configuration reference

Configure the Collection Agent using the following YAML configuration reference file:

# Collection Agent Configuration Reference

# Directory containing plugin artifacts. Required.
pluginDirectory: /usr/local/lib/geneos/plugins

# Agent monitoring and self-metrics.
# This section is optional.
monitoring:

  # Optional. Defaults to true.
  enabled: true

  # Health and metrics reporting interval in milliseconds. Defaults to 10 seconds.
  reportingInterval: 10000

  # The agent will listen on an HTTP port so that an external system can probe its health.
  # In Kubernetes, this can be used in conjunction with the readiness/liveness probes.
  # 200 is returned if the agent is started, 500 otherwise.
  healthProbe:

    # Optional. Defaults to true.
    enabled: true

    # HTTP listen port, defaults to 8080.
    listenPort: 8080

  # Agent self metrics.
  selfMetrics:

    # Whether to enable self metric collection (optional, defaults to true).
    enabled: true

    # Dimensions to add to all self metrics from this agent (optional).
    dimensions:
      custom: value

    # Properties to add to all self metrics from this agent (optional).
    properties:
      custom: value

#
# Custom singleton service components for shared use by other components.
#
# Services are:
# - Packaged in plugins.
# - Started in the order defined in the config, and stopped in the reverse order.
# - Started before all collectors, processors and reporters are started, and stopped after all those components.
#
services:

  # Optional name used in logging.  If omitted, an auto-generated name will be assigned.
  - name: myService

    # Fully qualified or simple class name of the service in a plugin jar.
    className: MyServiceClass

    # Whether the service should be bootstrapped and started.
    enabled: true

    # Services can have custom configuration.
    someSetting: true

#
# A collector creates data points and submits them to a workflow.
#
collectors:

  # Collector type (all collectors are of type 'plugin').
  - type: plugin

    # Optional. Defaults to true.
    enabled: true

    # Optional name used in logging.  If omitted, an auto-generated name will be assigned.
    name: statsd

    # Fully qualified or simple class name of the collector in the plugin jar.
    className: StatsdServer

    # Data point processors applied to data points published from this collector.
    # This optional processing chain allows for manipulating and/or filtering data points prior
    # to workflow publication.  This is the recommended way to perform edge processing, when applicable, so that
    # unneeded data can be dropped before incurring workflow overhead.
    processors:
      # For example, drop all events collected from statsd.  See "workflow -> common -> processors" section for
      # details on each type of processor.
      - type: drop-filter
        matchers:
          - type: kind
            kind: generic-event

    # Additional properties are specific to each collector type.  See plugin's configuration reference for details.
    listenPort: 8125
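
    # Example (illustrative only): to keep just the counters from this collector, a 'forward-filter'
    # can be used in place of the 'drop-filter' above. Every data point that does not match is dropped
    # at the edge, before it reaches the workflow.
    #
    #   processors:
    #     - type: forward-filter
    #       matchers:
    #         - type: kind
    #           kind: counter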

#
# A reporter receives data points from the workflow and sends them to a remote target.
# At least one reporter must be configured.
#
reporters:

  # Each reporter has these common configuration settings:
  - type: [ logging | tcp | routing | plugin ]

    # Optional. Defaults to true.
    enabled: true

    # Reporter name.  Referenced from a pipeline's 'reporter' setting.
    name: myReporterName

    # Persist all data points sent to this reporter.  It's intended for testing purposes only
    # and is disabled by default.
    recording:
      enabled: false

      # Directory where the recording is saved.  If undefined, a directory with the name of the reporter
      # will be created in the current working directory.
      directory: /var/lib/geneos/recording

      # Maximum number of data points to record.  Recording will stop when capacity is reached.
      # Default value shown.
      capacity: 1000000

    # Optional. Enables store-and-forward reporting.
    # This is typically only used when a reporter is being used as a routing reporter destination
    # or by workflow pipeline(s) in passThrough mode.
    storeAndForward:
      # Mandatory. Root directory for store and forward persisted messages.
      directory: /var/lib/geneos/collection-agent
      # Optional. Store capacity. Defaults to 8192 (8 Ki messages).
      capacity: 8192
      # Optional. Max store file length in bytes. Defaults to 16777216 (16 MiB).
      maxFileLength: 16777216
      # Optional. The maximum number of times to retry a failed data point. Defaults to 3.
      # A negative value (e.g. -1) means retry forever.
      maxRetries: 3
      # Optional. The maximum age in milliseconds (defined as now minus the data point timestamp)
      # of data points to be forwarded.
      # By default, no age limits are applied. A negative age limit implies no limit.
      # An age limit > 0 milliseconds is enforced both on the initial forward attempt
      # and on all subsequent retries.
      maxAges:
        # Age limit of all metrics data points.
        metrics: -1
        # Age limit of all log and event data points.
        logs: -1
        # Age limit of all trace data points.
        traces: -1
        # Age limit of all attribute data points.
        attributes: -1
      # Optional. Switch on advisory file locking. Defaults to false.
      fileLocking: false
      # Optional. Switch on flush-on-write intervals. Defaults to -1.
      # Value has the following meanings:
      #   -1: disable force flush
      #    0: force flush on each write
      #  X>0: force flush at approximately X ms intervals (X can be any positive integer).
      flushOnWrite: -1
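      # Example (illustrative values): retry failed data points forever, but stop forwarding metrics
      # once they are more than 5 minutes old, and force a flush at roughly 1 second intervals.
      # Because age limits are also enforced on retries, a metric that is still failing after
      # 5 minutes is no longer forwarded, even though 'maxRetries' is unlimited.
      #
      #   maxRetries: -1
      #   maxAges:
      #     metrics: 300000
      #   flushOnWrite: 1000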

  # Logging reporter that simply logs each data point to stdout.  This is intended for testing purposes only.
  - type: logging
    name: logging

    # Log level at which each data point is logged.  Can be: error, info (default), warn, debug or trace.
    level: info

  # TCP reporter that sends data points over a TCP connection.
  - type: tcp
    name: myTcpReporter

    # The TCP server hostname. Default value shown.
    hostname: localhost

    # The TCP server port. Default value shown.
    port: 7137

    # The TCP server connection timeout in milliseconds. Default value shown.
    connectionTimeoutMillis: 10000

    # The TCP server write timeout in milliseconds. Default value shown.
    writeTimeoutMillis: 10000

    # Optional TLS configuration.
    tlsConfig:
      # Whether to skip server authentication. Optional. Defaults to false (server authentication is enforced).
      # If set to true, the trust chain file setting is ignored.
      insecure: false
      # Client trust chain. Optional. Used for authenticating server certificates.
      # The system default is used when not set.
      trustChainFile: /path/to/trust_chain.pem
      # The client key. Optional. Used only for mTLS (client authentication).
      keyFile: /path/to/private_key.pem
      # The client certificate. Optional. Used only for mTLS (client authentication).
      certFile: /path/to/cert_file.pem
      # The list of TLS protocols to enable. Optional. Defaults to TLSv1.3 and TLSv1.2 only.
      protocols: [ TLSv1.3, TLSv1.2 ]
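    # Example (illustrative paths): mutual TLS, where the server certificate is verified against a
    # custom trust chain and the agent presents its own client certificate. 'keyFile' and 'certFile'
    # are only needed when the server requires client authentication (mTLS).
    #
    #   tlsConfig:
    #     insecure: false
    #     trustChainFile: /etc/geneos/tls/ca_chain.pem
    #     keyFile: /etc/geneos/tls/agent_key.pem
    #     certFile: /etc/geneos/tls/agent_cert.pem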

  # Routing reporter, which routes each data point via one (or, with 'routeType: all', several) of N other reporters
  # based on matching criteria.
  # Note that for other reporters to be valid routing destinations they must:
  # - appear in the configuration before this reporter
  # - have a 'name' attribute by which they can be referenced.
  - type: routing
    # Optional but recommended component name.
    name: router
    # Optional. Route eviction timeout in milliseconds. Defaults to 2000 (2 seconds).
    # When routing messages, the router automatically evicts slow routes from the list of alternatives for the
    # duration of 'routeRestorationTimeout'. This minimises the impact of slow routes on the system as a whole.
    # Individual routes can be exempted from eviction, regardless of their timing characteristics, by setting
    # 'doNotEvict: true' on the route.
    # A value of 0 disables automatic route eviction completely.
    routeEvictionTimeout: 2000
    # Optional. Timeout for automatically restoring an evicted route. Defaults to 60 seconds.
    routeRestorationTimeout: 60000
    # Optional. Do not reject data points that don't match any route. Defaults to false.
    ignoreNoMatchingRoute: false
    # Optional. Routing type: [ first (route only via first matching route) | all (route via all matching routes)].
    # Defaults to 'first'.
    routeType: first
    # List of possible routes.
    # Routes are evaluated in the order specified; with 'routeType: first', only the first matching route is used.
    routes:
      # Mandatory. Destination reporter name.
      - reporter: destination-reporter-1
        # Optional. Manually enable/disable this route. Defaults to true.
        enabled: true
        # Optional. Specify that this route is never to be evicted. Defaults to false.
        doNotEvict: false
        # Optional. Sets the 'ca_deliver_to_name' property on all data points routed via this route. The property
        # is carried on the wire and is intended for the multi-hop routing case.
        deliverTo: multihop-reporter
        # Optional. Match condition type: 'any' (logical OR) | 'all' (logical AND) over the list of matchers.
        # Defaults to 'any'.
        match: any
        # Mandatory. List of matchers.
        matchers:
          # Mandatory. The data point field to be matched: [ name | namespace | dimension | property | data-type ].
          # If the match type is 'dimension' or 'property', the 'key' attribute is also required.
          - type: name
            # Matching regular expression.
            pattern: promtest.*
          - type: dimension
            # Mandatory when type is 'dimension' or 'property'. Specifies dimension key containing value to match.
            key: dimension_key
            pattern: dimension_value
      - reporter: destination-reporter-2
        match: any
        matchers:
          - type: name
            pattern: jvm.*
      - reporter: destination-reporter-3
        match: all
        matchers:
          # Matches any data point, so this can be used as a catch-all for any previously unmatched data points.
          - type: name
            pattern: .*
          # Except any that are explicitly excluded (note that match type above is 'all', i.e. logical AND).
          - type: name
            pattern: not_me
            # Any matcher can be made exclusive (it excludes matching data points) instead of inclusive (the default).
            exclude: true
      - reporter: destination-reporter-4
        match: all
        matchers:
          # Match all data points collected by the 'statsd' collector ...
          - type: property
            # The special property 'ca_collector_name' is guaranteed to be available and populated with the name of
            # the CA collector.
            key: ca_collector_name
            pattern: statsd
          # ... Except for internal self-monitoring data points.
          - type: property
            key: ca_collector_name
            # The special collector name 'ca_internal' applies to data points generated internally by the CA itself
            # for self-monitoring.
            pattern: ca_internal
            exclude: true
      - reporter: destination-reporter-5
        match: any
        matchers:
          # Match any data point of the specified types.
          # Valid data types are:
          #   [entity_attribute | counter | gauge | generic_event | generic_histogram | log_event | status_metric |
          #    signal_event | snooze_event | entity_attribute_group_snapshot | open_telemetry_span_group]
          - type: data-type
            pattern: status_metric
          - type: data-type
            pattern: counter
          - type: data-type
            pattern: gauge
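
      # Worked example (illustrative, assuming all routes are enabled and none has been evicted):
      # consider a gauge named 'jvm.heap.used', collected by the 'statsd' collector, with no
      # 'dimension_key' dimension. Evaluating the routes above in order:
      #   destination-reporter-1: no match ('promtest.*' does not match the name; the dimension is absent).
      #   destination-reporter-2: match ('jvm.*').
      #   destination-reporter-3: match ('.*' matches and the name is not 'not_me').
      #   destination-reporter-4: match (collector is 'statsd', not 'ca_internal').
      #   destination-reporter-5: match (data type 'gauge').
      # With 'routeType: first', the data point is sent only to destination-reporter-2.
      # With 'routeType: all', it is sent to destination-reporter-2, -3, -4 and -5.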

  # Example configuration of a multihop router (i.e. the intermediate link) which matches the 'ca_deliver_to_name'
  # property to route.
  - type: routing
    name: multihop-router
    routeType: first
    routes:
      # By convention, if the reporter name matches the value of the 'ca_deliver_to_name' property, the property is
      # erased before the data point is delivered to the target reporter and therefore does not appear on the wire.
      - reporter: multihop-reporter
        match: all
        matchers:
          - type: property
            key: ca_deliver_to_name
            pattern: multihop-reporter

  # Target reporter corresponding to the final link in a multihop route (can be any type).
  - type: logging
    name: multihop-reporter

  # External/custom reporters are defined using the 'plugin' type.
  - type: plugin
    name: myCustomReporter

    # Fully qualified or simple class name of the reporter in the plugin jar.
    className: CustomReporter

    # Additional properties are specific to each reporter type.  See plugin's configuration reference for details.
    customProp: asdf

#
# Standard workflow settings for controlling the flow of data points from plugins to reporters.
# This section is optional - default settings are used if omitted.
#
# 'workflow' is an alias for 'standardWorkflow', which may also be used.
workflow:
  # Directory to store pipeline persistence.
  # Required only if at least one pipeline uses 'disk' store type.
  # The directory must be writable.
  storeDirectory: /var/lib/geneos/collection-agent

  # Optionally override the default validations applied to all data points before ingesting.
  #
  # Validation runs after all configured processors have been applied and just before writing the data point to the
  # pipeline store.
  #
  # The class must implement the WorkflowDataPointValidator interface and may extend DefaultWorkflowDataPointValidator
  # in order to retain the defaults.
  #
  # The value can be the fully qualified or a simple class name,
  # and must be present in a plugin jar residing in `pluginDirectory`.
  validatorClass: com.example.MyValidator

  # Pipelines.
  #
  # A pipeline exists for each class of data (metrics/logs/events/attributes/traces)
  #
  # Each pipeline is enabled by default if omitted from the configuration.
  #
  # At least one pipeline must be enabled.  A runtime error will occur if a plugin attempts delivery to a pipeline
  # that is not configured.
  #

  # Metrics pipeline.
  metrics:

    # Reporter to which all data points on this pipeline are sent.
    # This property is optional if there is only one reporter configured.  Otherwise the value is required and
    # must correspond to the 'name' of a reporter defined above.
    reporter: logging

    # Optional. Defaults to true.
    enabled: true

    # Optional. Whether internal resources are pooled or not. Defaults to false. Does not apply in pass-through mode.
    # Resource pools consume more static memory but result in less garbage collection and therefore less CPU load.
    pooling: false

    # Number of retries after initial delivery fails.  Defaults to 3.  For infinite retries set to -1.
    # The interval between consecutive retries for the same message increases from 1 second up to 120 seconds.
    maxRetries: 3

    # Optional pass through mode configuration (disabled by default).
    #
    # In pass through mode, there is no buffering between collectors and the pipeline - data points pass through the
    # pipeline on the thread of the collector. This means that collector threads are directly coupled to the behavior of
    # the eventual reporter. The nature of the coupling is determined by whether the reporter is synchronous or
    # asynchronous and whether pipeline retries are enabled. It is therefore possible that a collector thread becomes
    # blocked awaiting reporter completion.
    passThrough:
      # Optional. Defaults to false.
      enabled: false

      # Optional. Enable fire-and-forget mode (disabled by default) on this pipeline.
      # Only applicable when pass-through mode is enabled and 'maxRetries' is 0.
      #
      # This option can be used to achieve higher throughput when best effort reporting (i.e. no failure notifications
      # or retries) is an acceptable tradeoff.
      fireAndForget: false

      # Optional. Only applies when the workflow is in 'pass-through' mode. Defaults to 'parallel'.
      #
      # Defines whether multiple threads are allowed to enter the pipeline concurrently or not.
      # Serial mode can be used in the case of stateful pipeline processors.
      concurrency: [ parallel | serial ]
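
    # Example (illustrative): best-effort reporting on this pipeline. 'fireAndForget' only takes effect
    # when pass-through mode is enabled and 'maxRetries' is 0, so all three settings are needed together:
    #
    #   maxRetries: 0
    #   passThrough:
    #     enabled: true
    #     fireAndForget: true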

    # Store settings.
    #
    # Data points are stored either in memory or on disk before delivery to a reporter.
    #
    # If a reporter's target becomes unavailable, data points are queued until either the store is full or
    # the reporter target becomes available again.
    #
    # Plugins are informed when a store becomes full and are free to handle the situation in a way that makes
    # sense for that plugin (e.g. dropping the message if it is not critical, or waiting for the store to re-open
    # before collecting any more data).
    store:

      # Store type.
      #
      # Permitted values:
      # 'memory':  A circular, fixed-size, in-memory store that provides no persistence.  The oldest data point
      #            is removed when adding to a full store, therefore this store never rejects new data points
      #            and will begin to drop data if a slow reporter cannot keep up.
      #
      # 'disk':    A fixed-size store that is persisted to disk.  Requires the workflow 'storeDirectory' setting
      #            to be configured.
      #
      # For the metrics pipeline, it is recommended (and the default) to use a memory store, as metric data is
      # generally non-critical and loses relevance if delayed.
      #
      type: memory

      # Maximum number of data points to hold before the store is considered full.  When full, a disk store rejects
      # new data points, while a memory store drops its oldest data point instead.
      # The default capacity is 8192 data points for a memory store and 10,000,000 data points for a disk store.
      capacity: 8192
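
    # Example (illustrative): a persistent store for this pipeline instead of the default memory store.
    # This requires the workflow-level 'storeDirectory' setting; the capacity shown is the disk store default.
    #
    #   store:
    #     type: disk
    #     capacity: 10000000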

    # Custom processing of data points on this pipeline.  Processors can manipulate, enrich and/or filter
    # data points before reporting.
    #
    # See the 'common' pipeline for more details.
    processors:
      - type: enrichment
        name: metrics-enricher
        dimensions:
          custom_dimension: value

  # Logs pipeline.
  logs:
    reporter: logging
    store:
      # For logs, it is recommended (and the default) to use a disk store if data loss is not tolerable.
      type: disk

      # Maximum size (in bytes) of one store file. Only applicable when store type is "disk".
      # The value must be a multiple of 4096.
      # Optional - default value is 128MB for logs and 16MB for events.
      maxFileLength: 134217728

      # Optional. Whether to use file level locking to prevent multiple processes operating over a single persistence
      # store. Defaults to false. Only applicable when the store type is "disk".
      # When this option is set, a second Collection Agent that is misconfigured to load the same persistence store as
      # an already running Collection Agent will fail to start.
      # Lock files are cleared on termination except in the case of abrupt termination (e.g. via SIGKILL) in which case
      # there is a possibility of stale lock files remaining beyond process exit. This situation requires manual
      # intervention to remove the lock files, or the disabling of this option, prior to the next instance being started.
      fileLocking: false

      # Optional. Switch on flush-on-write intervals. Defaults to -1. Only applicable when the store type is "disk".
      # Value has the following meanings:
      #   -1: disable force flush
      #    0: force flush on each write
      #  X>0: force flush at approximately X ms intervals (X can be any positive integer).
      flushOnWrite: -1

    # For logs, it is recommended (and the default) to retry infinitely if data loss is not tolerable.
    maxRetries: -1

  # Events pipeline.
  events:
    reporter: logging
    store:
      # For events, it is recommended (and the default) to use a disk store if data loss is not tolerable.
      type: disk

    # For events, it is recommended (and the default) to retry infinitely if data loss is not tolerable.
    maxRetries: -1

  # Attributes pipeline.
  attributes:
    reporter: logging
    store:
      # For attributes, it is recommended (and the default) to use a disk store if data loss is not tolerable.
      type: disk

    # For attributes, it is recommended (and the default) to retry infinitely if data loss is not tolerable.
    maxRetries: -1

  # Open Telemetry traces pipeline.
  # Note: This is a special pipeline for use exclusively by the Open Telemetry collector and reporter
  # either directly or via a 'routing' reporter.
  # Compatibility: This pipeline was formerly known as the 'traces' pipeline and that name remains a
  # valid alias for the same pipeline.
  otelTraces:
    reporter: otel-reporter
    store:
      # For traces, it is recommended (and the default) to use a memory store, as data is generally non-critical
      # and loses relevance if delayed.
      type: memory

    # For traces, it is recommended (and the default) to set retries to 3.
    maxRetries: 3

  # Open Telemetry logs pipeline.
  # Note: This is a special pipeline for use exclusively by the Open Telemetry collector and reporter
  # either directly or via a 'routing' reporter.
  # DEPRECATED: Prefer using the routing-workflow instead.
  otelLogs:
    reporter: otel-reporter
    store:
      type: disk
    maxRetries: -1

  # Open Telemetry metrics pipeline.
  # Note: This is a special pipeline for use exclusively by the Open Telemetry collector and reporter
  # either directly or via a 'routing' reporter.
  # DEPRECATED: Prefer using the routing-workflow instead.
  otelMetrics:
    reporter: otel-reporter
    store:
      type: memory
    maxRetries: 3


  # Common pipeline.
  #
  # This is a unique pipeline that only has data-point processors (there is no reporter). The processors are applied
  # to data points on all pipelines, before any pipeline-specific processors are applied.
  common:
    # Optional. Same as the 'pooling' option of the individual pipelines.
    # Can be used as a shortcut to apply to all pipelines.
    # Can also be overridden by individual pipelines.
    pooling: false

    # Data-point processors.
    #
    # Processors can manipulate, enrich and/or filter data points before reporting.  They are applied before
    # a data point is saved in the pipeline's store.
    #
    processors:

      # Enrichment processor.  Adds dimensions and/or properties to all data points.
      - type: enrichment

        # Optional. Defaults to true.
        enabled: true

        # Optional name used in logging.  If omitted, an auto-generated name will be assigned.
        name: enricher

        # Whether to overwrite an existing dimension or property with the same name (defaults to false)
        overwrite: false

        # Dimensions to add
        dimensions:
          node_name: ${env:NODE_NAME}

        # Properties to add
        properties:
          prop: value
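
        # Worked example (illustrative, assuming '${env:NODE_NAME}' resolves to the NODE_NAME environment
        # variable, here set to node-b): with 'overwrite: false', a data point that already has the dimension
        # node_name=node-a keeps node-a, while a data point without a 'node_name' dimension gets
        # node_name=node-b. With 'overwrite: true', both data points end up with node_name=node-b.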

      # Translation processor.
      #
      # Translates:
      # - data point names
      # - dimension and/or property key/values
      #
      - type: translation

        # Translate data point name via a search and replace operation (optional).
        nameTranslation:
          # The search regular expression.
          # The name is not modified unless it matches this pattern.
          # The pattern may contain group captures, which may be referenced in the 'replace' pattern.
          search: search-pattern

          # The replace regular expression.
          # May contain group references from the 'search' pattern.
          replace: replace-pattern

        # List of dimension and/or property key/value translators (optional).
        keyValueTranslations:

          # First translator.
          #
          # The source key/value.
          - from:
              # Either 'dimension' or 'property'.
              type: dimension

              # Source dimension or property name.
              name: dim1

              # Optional source value search pattern.
              search: search-pattern

              # Whether or not to delete the source key/value (default is true).
              delete: true

            # The target key/value.
            # If the 'from' specifies 'delete' then the 'to' section may be omitted.
            to:
              # Either 'dimension' or 'property'.
              type: property

              # Target dimension or property name.
              name: prop1

              # Optional target value replace pattern
              replace: replace-pattern

              # Whether or not to overwrite the target if it already exists (default is true).
              overwrite: true

          # Second translator.
          - from:
              type: dimension
              name: dim2
              delete: true
            to:
              type: property
              name: prop2
              overwrite: true

      # Drop filter processor.  Drops data points that match the configured criteria.
      - type: drop-filter

        # One or more match criteria.
        # For a data point to be dropped, all configured criteria must match, otherwise the data point
        # will be forwarded.  If no matchers are configured, all data points will be forwarded.
        matchers:

          # Match by data point name, either exactly or via regex.
          - type: name

            # Exact match
            name: kubernetes_node_cpu_usage

            # Regex match, as an alternative to 'name' (only one of 'name' or 'namePattern' can be configured)
            # namePattern: kubernetes_.*

          # Match by data point dimension key and either an exact value or a regex pattern.
          - type: dimension
            key: namespace

            # Exact value match
            value: finance

            # Regex match, as an alternative to 'value' (only one of 'value' or 'valuePattern' can be configured)
            # valuePattern: ns.*

          # Match by data point property key and either an exact value or a regex pattern.
          - type: property
            key: someProperty

            # Exact value match
            value: someValue

            # Regex match, as an alternative to 'value' (only one of 'value' or 'valuePattern' can be configured)
            # valuePattern: value.*

          # Match by data point type. Value kinds are: [attribute|counter|gauge|generic-event|log-event|histogram]
          - type: kind
            kind: counter
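
      # Example (illustrative): drop only counters whose 'namespace' dimension is 'finance'.
      # Both matchers must match for a data point to be dropped, so a gauge in 'finance', or a counter
      # in any other namespace, is forwarded.
      #
      #   - type: drop-filter
      #     matchers:
      #       - type: dimension
      #         key: namespace
      #         value: finance
      #       - type: kind
      #         kind: counter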

      # Forward filter processor.  Forwards data points that match the configured criteria.
      # This behaves inversely to "drop-filter" above but is configured identically.
      - type: forward-filter

        # One or more match criteria.
        # For a data point to be forwarded, all configured criteria must match, otherwise the data point
        # will be dropped.  If no matchers are configured, all data points will be dropped.
        # See "drop-filter" for details on each type of matcher.
        matchers:
          - type: name
            name: myCounter

      # Normalize processor.  Normalizes dimension names for consistency in subsequent processing and reporting.
      - type: normalize

        # Optional name used in logging.  If omitted, an auto-generated name will be assigned.
        name: normalize

        # Dimension normalization settings.
        dimensions:

          # Default overwrite behavior, can be overridden per mapping.  Defaults to false.
          overwrite: false

          # Dimension mappings.
          mappings:

            # Old dimension name.
            - from: project

              # New dimension name.
              to: namespace

              # Whether to overwrite if a dimension already exists with the same name.  Defaults to parent setting.
              overwrite: false

      # Simple statistics processor. Logs simple statistics on data points received on this workflow pipeline.
      - type: statistics

        # Processor name. Recommended to use a name representative of the pipeline on which the processor is configured.
        name: metric-stats

        # Optional. Forward data points as normal or drop them. Default is false (drop).
        forward: false

        # Optional. Reporting interval in milliseconds. Default is 10000ms.
        reportingInterval: 10000
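
      # Example (illustrative): because 'forward' defaults to false, a statistics processor drops every
      # data point it receives. To gather statistics on the metrics pipeline without losing data,
      # configure the processor on that pipeline and forward explicitly:
      #
      #   metrics:
      #     processors:
      #       - type: statistics
      #         name: metric-stats
      #         forward: true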

      # External/custom processors are defined using the 'plugin' type.
      - type: plugin

        # Optional name used in logging.  If omitted, an auto-generated name will be assigned.
        name: kube-enricher

        # Fully qualified or simple class name of the processor in the plugin jar.
        className: KubernetesEnricher

        # Additional properties are specific to each processor type.  See plugin's configuration reference for details.
        customProp: abc

# Routing workflow settings for controlling the routing of telemetry data.
# This section is optional and is only required when telemetry routing is in operation.
# Note that even when this workflow is enabled, the standard workflow is *also* required
# when self monitoring is enabled, because self-monitoring metrics are not supported by
# this workflow.
#
# The routing workflow is an alternate path through the Collection Agent specifically
# designed for routing telemetry data received via the Internal Ingestion service and/or
# the OpenTelemetry Metrics/Logs/Traces services. This workflow differs from the standard
# workflow in (at least) the following ways:
# - The only processing performed is the evaluation of routing rules.
# - Processes telemetry batches in their protobuf formats - no collection normalization is
#   needed.
# - Handles the store and forward aspects for Reporters on their behalf.
# - Only delivers to 'compliant' Reporters (i.e. they have to be explicitly written to be
#   compliant). If non-compliant reporters are set in routing destinations the application
#   will not start.
# - Emits additional self metrics regarding router performance.
routingWorkflow:
  # All destinations are protected by a circuit breaker.
  # Data points are only routed to a destination while its circuit breaker is closed.
  # When a failure is detected routing to a destination, the breaker trips open
  # and remains open until the timeout expires.
  # Note that reporters with 'storeAndForward' enabled do not have circuit
  # breakers, as they are always available for routing regardless of the
  # state of the connection. The only limiting factor is the capacity of
  # their underlying store.
  # For non-storeAndForward destinations, an open circuit breaker results in
  # data points not being directed to them for the duration of the open
  # circuit.
  # Defaults to 10000 ms (10 s).
  autoCloseTimeout: 10000
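  # Worked example (illustrative times): with the default of 10000 ms, if routing to a destination
  # without 'storeAndForward' fails at 12:00:00, its breaker opens and no data points are directed
  # to it until about 12:00:10, when the timeout expires and the breaker closes again.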
  # Define all the available routes.
  routes:
    # The reporters for this route.
    # Must correspond to reporters that are compliant with the routing workflow.
    # All telemetry matching this route is sent to all of these reporters.
    - reporters: [ reporter-1, reporter-2, reporter-3 ]
      # Data passes through in protobuf batches (i.e. as it is received).
      # When all data points in a batch are known to come from the same source
      # (e.g. from a particular Geneos gateway), then for some conditions it is
      # adequate to test a single data point, because the result is expected to be
      # the same for all of them. This allows an optimisation: only one batch
      # element needs to be inspected in order to route the entire batch.
      # The 'scope' setting controls whether 'all' (default) data points in the
      # batch, or just the 'first' data point in the batch, are inspected.
      # With the scope set to 'first', either the entire batch is routed to all
      # destinations or none of it is.
      # With the scope set to 'all', only the selected data points (if any) are routed.
      # Defaults to 'all': apply routing conditions to each data point in the batch
      # and only route matching ones (potentially dropping non-matching ones).
      scope: all
      # Conditions for this route.
      # - 'any' implies a logical OR of all conditions (routed if any condition matches).
      # - 'all' implies a logical AND of all conditions (routed if all conditions match).
      any:
          # Required. Specifies which data point field is used for this condition.
        - field: dimensions
          # Required. Operator to apply in rule condition.
          operator: contains
          # Optional (required for 'dimensions' and 'properties' fields).
          key: service.namespace
          # Required (optional for 'dimensions' and 'properties' fields).
          value: some-namespace

          # Second condition.
          # Data points will be routed to all destinations if they match either condition.
        - field: name
          operator: matches
          value: my-app.*

          # Field reference.
          #
          # The following fields are available:
          # - type: accesses data point type.
          # - name: accesses data point name.
          # - dimensions: accesses data point dimensions / semantic resource attributes (OpenTelemetry).
          # - properties: accesses data point properties / data point attributes (OpenTelemetry).
          # - message: accesses data point message (when available, e.g. log_event).
          # - severity: accesses data point severity (when available, e.g. log_event).
          # - value: accesses data point numeric value (when available, e.g. gauge).
          # - status_value: accesses data point string value (when available, e.g. status_metric).
          #
          # The available 'type' values are:
          # - gauge
          # - counter
          # - status_metric
          # - log_event
          # - generic_event
          # - signal_event
          # - snooze_event
          # - audit_event
          # - entity_attribute
          # - entity_group_snapshot
          # - entity_attribute_group_snapshot
          # - otel_gauge
          # - otel_sum
          # - otel_summary
          # - otel_histogram
          # - otel_exp_histogram
          # - otel_log
          # - otel_event
          # - otel_span
          #
          # For the 'log_event' and 'generic_event' types, the available severities are:
          # - none
          # - trace
          # - debug
          # - info
          # - warn
          # - error
          # - critical
          #
          # For the 'signal_event' type, the available severities are:
          # - none
          # - warning
          # - critical
          # - ok
          #
          # For the 'otel_log' type, the available severities correspond to all those defined by OpenTelemetry.
        - field: type
          # Operator reference.
          #
          # The following operators are available:
          # - eq
          # - ne
          # - eq_ignore_case
          # - ne_ignore_case
          # - contains
          # - does_not_contain
          # - starts_with
          # - starts_with_ignore_case
          # - ends_with
          # - ends_with_ignore_case
          # - lt
          # - le
          # - gt
          # - ge
          # - matches ('value' is a regular expression)
          #
          # Each operator's meaning follows from its name (for example, 'ne' is
          # "not equal" and 'ge' is "greater than or equal").
          # The exact application of each operator depends on the types presented to it,
          # and not all operators apply to all field types: an operator applies only
          # where it makes sense for the field type.
          #
          # For the 'matches' operator, 'value' is a regular expression that the
          # field value is matched against.
          operator: eq
          # Key semantics:
          #
          # 'key' need only be specified when the field corresponds to a map value:
          # - dimensions
          # - properties
          #
          # When used with the 'contains' or 'does_not_contain' operators, it may be used
          # on its own to test whether the key exists (or does not exist) in the specified map.
          # When used in conjunction with 'value', it selects the map entry whose value
          # the operator then evaluates against 'value'.
          key: key
          # Value semantics:
          #
          # 'value' is always required (except in the 'contains' and 'does_not_contain'
          # special cases above) and is used to specify the test value.
          # When the operator is 'matches', 'value' is a regular expression.
          value: value
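
To illustrate the rule semantics above, the following fragment sketches a routingWorkflow with two routes, built only from keys that appear in the reference. The reporter names (payments-reporter, alerts-reporter) and the namespace value (payments) are hypothetical. Both reporters would have to be compliant routing reporters defined in the reporters section, or the application will not start.

```yaml
routingWorkflow:
  # Open circuit breakers stay open for 10 s (the default) before closing.
  autoCloseTimeout: 10000
  routes:
    # Route 1: whole batches from the hypothetical 'payments' namespace.
    # 'scope: first' inspects only the first data point, assuming every batch
    # comes from a single source, so a batch is routed in full or not at all.
    - reporters: [ payments-reporter ]
      scope: first
      all:
        - field: dimensions
          operator: eq
          key: service.namespace
          value: payments
    # Route 2: every data point is tested individually ('scope: all'), and
    # only log events with 'error' severity are routed.
    - reporters: [ alerts-reporter ]
      scope: all
      all:
        - field: type
          operator: eq
          value: log_event
        - field: severity
          operator: eq
          value: error
```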

The table below summarizes each of the sections in the configuration reference:

| YAML key | Purpose | Required |
|---|---|---|
| `pluginDirectory` | Filesystem directory that contains plugin artifacts (collectors, reporters, processors, services). | Yes |
| `monitoring` | Health probe HTTP endpoint, reporting interval, and self-metrics dimensions and properties. | No |
| `services` | Plugin-backed singleton services that start before, and stop after, collectors, processors, and reporters. | No |
| `collectors` | Collection Agent plugin collectors that create datapoints and submit them to the standard workflow (optionally through per-collector processors). | Yes |
| `reporters` | Reporters that receive datapoints from workflows and send them to a target (TCP, logging, routing, or plugin). At least one enabled reporter must be configured. | Yes |
| `workflow` (`standardWorkflow`) | Standard workflow: per-pipeline stores, retries, pass-through options, `common` processors, and reporter wiring (`workflow` is an alias for `standardWorkflow`). The standard workflow is always used; if this section is omitted, defaults are applied. | Yes |
| `routingWorkflow` | Alternate routing path for telemetry received via internal ingestion or OpenTelemetry. Configure it only when using that mode. If self-monitoring is enabled, the standard `workflow` is still required as well. | No |

When you set up the Collection Agent for basic monitoring, the YAML configuration should specify where plugins are loaded from (pluginDirectory), what collects metrics and events (collectors), how datapoints are sent onward (reporters), and how the standard workflow ties those pieces together (workflow).

Plugin directory

The pluginDirectory defines the filesystem path to the directory that contains Collection Agent plugin binaries (collectors, reporters, processors, services implemented as plugins).

The Collection Agent loads plugin classes from this location. Without it, the Collection Agent cannot resolve plugin-based collectors and reporters.

You can use environment variable substitution where supported, for example:

pluginDirectory: ${env:CA_PLUGIN_DIR}

Ensure the variable is set in the environment that starts the Java process.

Collectors

Collectors create datapoints and publish them to the standard workflow, optionally passing them through per-collector processors first. For each collector, you can configure type (plugin), className (a simple or fully qualified class name), an optional name, enabled, and plugin-specific settings (for example, listenPort for the StatsD collector).

Each collector can also define a processors chain that filters or transforms datapoints before they are published to the workflow (for example, a drop-filter).
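
A collector entry might look like the following sketch. The StatsD collector and its listenPort setting come from the description above. The className value and the shape of the processors entry are assumptions for illustration, so check the documentation for your plugin before using them.

```yaml
collectors:
  # Plugin-based StatsD collector.
  - name: statsd
    type: plugin
    # Hypothetical placeholder: use the class name documented for your plugin
    # (simple or fully qualified).
    className: StatsdServer
    enabled: true
    # Plugin-specific setting: the port the collector listens on
    # (8125 is the conventional StatsD port).
    listenPort: 8125
    # Optional per-collector processor chain, applied before datapoints are
    # published to the standard workflow. The 'type' key is an assumption,
    # and the drop-filter's own settings are not covered in this section.
    processors:
      - type: drop-filter
```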

Reporters

The reporters receive datapoints from workflows and send them to a remote target or a local sink. At least one reporter must be configured.

The common type values are:

| Value | Role |
|---|---|
| `tcp` | Sends datapoints to a TCP server (typically a Geneos Netprobe). Set `hostname`, `port`, and timeouts; optionally set `tlsConfig` for TLS or mTLS. |
| `logging` | Logs each datapoint. |
| `routing` | Routes datapoints to other reporters selected by matchers (name, namespace, dimension, property, data type, and so on). Destination reporters must be defined earlier in the file and must have a `name`. `routeType` selects `first` (the first matching destination) or `all` (every matching destination). |
| `plugin` | Custom reporter class in a plugin JAR, set with `className`. |

When more than one reporter is configured, give each one a unique name, because workflow pipelines reference reporters by name.

A reporter can optionally enable recording, which persists datapoints for debugging, and storeAndForward, which provides disk-backed queuing for that reporter.
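
The following sketch defines two named reporters: a tcp reporter and a logging reporter. The names, hostname, and port values are examples only. Timeout, tlsConfig, recording, and storeAndForward settings are left out because this section does not give their exact keys.

```yaml
reporters:
  # Sends datapoints to a Geneos Netprobe over TCP.
  - name: netprobe
    type: tcp
    hostname: localhost   # example value
    port: 7137            # example value; use the port your Netprobe expects
  # Logs each datapoint, which can help when validating a new configuration.
  - name: debug-log
    type: logging
```

Because there are two reporters, each has a unique name that workflow pipelines can reference.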

Workflow (standard workflow)

The standardWorkflow (alias workflow) defines how each class of data (metrics, logs, events, attributes, and OpenTelemetry data, each carried on its own pipeline) flows from the collectors to a named reporter. For each pipeline, you can set the store type (memory or disk), capacity, retries, and optional passThrough or fireAndForget behavior. The workflow also defines the common processor chain.

storeDirectory: required when any pipeline uses a disk store. The directory must be writable.
Pipelines: each pipeline sets reporter to the name of a reporter. If only one reporter is configured, it can be inferred. If there are several, set reporter explicitly on each pipeline you use.

The common subsection applies processors to datapoints on all pipelines before pipeline-specific processors.
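
A minimal sketch of the standard workflow follows. It assumes that two hypothetical reporters, netprobe and debug-log, are defined in the reporters section, and that pipelines are keyed by data class (metrics, logs, events) with a nested store block. These key names, the storeDirectory path, and the capacity value are assumptions to check against the full reference, not documented defaults. Retries, passThrough, fireAndForget, and common processors are omitted.

```yaml
workflow:
  # Required here because the metrics pipeline uses a disk store.
  # Example path; the directory must be writable.
  storeDirectory: /var/lib/geneos/collection-agent
  # Assumed pipeline keys. Several reporters are defined, so each
  # pipeline names its reporter explicitly.
  metrics:
    reporter: netprobe
    store:
      type: disk
      capacity: 8192   # example value
  logs:
    reporter: netprobe
    store:
      type: memory
  events:
    reporter: debug-log
    store:
      type: memory
```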

Enabling self-monitoring using the configuration file

The monitoring block in the reference configuration file is optional. When enabled, it controls Collection Agent health probing and self-metrics. You can configure the following settings:

enabled: defaults to true.
reportingInterval: health and metrics reporting interval, in milliseconds. Defaults to 10000 (10 seconds).
healthProbe: HTTP endpoint for readiness and liveness checks (for example, Kubernetes probes). Returns 200 when the Collection Agent is started and 500 otherwise. Configure the listenPort (default 8080).
selfMetrics: whether the Collection Agent publishes metrics about itself (enabled, default true), plus optional dimensions and properties added to all of those metrics.
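
The health probe returns 200 only after the Collection Agent has started, so it can back Kubernetes readiness and liveness probes. The following container-spec fragment is a sketch that assumes the default listenPort of 8080. The request path / is an assumption, because this reference documents only the port. The timing values are examples.

```yaml
# Fragment of a Kubernetes container spec.
readinessProbe:
  httpGet:
    path: /        # assumed path
    port: 8080     # monitoring.healthProbe.listenPort
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /        # assumed path
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
```

Kubernetes treats any HTTP status from 200 to 399 as success, so the agent's 500 response counts as a probe failure.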

By default, self-metrics are produced internally and follow the same reporting path as other datapoints, through the configured workflow and its reporters. You can also export self-metrics via OpenTelemetry by:

Setting an environment variable USE_OTEL_INST to any value.
Configuring OpenTelemetry auto instrumentation using its Java Agent.
Ensuring an OpenTelemetry collector is available in the environment to receive the data.
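
In a Kubernetes deployment, the steps above might be expressed as container environment variables, as in the following sketch. USE_OTEL_INST comes from this page. JAVA_TOOL_OPTIONS is the standard JVM mechanism for adding the -javaagent flag, and the OTEL_EXPORTER_OTLP_* variables are standard OpenTelemetry SDK settings. The agent JAR path and collector address are hypothetical, and the JAR must already be present in the container image.

```yaml
# Fragment of a Kubernetes container spec.
env:
  # Any value enables OpenTelemetry export of self-metrics.
  - name: USE_OTEL_INST
    value: "true"
  # Attaches the OpenTelemetry Java agent (hypothetical JAR path).
  - name: JAVA_TOOL_OPTIONS
    value: "-javaagent:/otel/opentelemetry-javaagent.jar"
  # Protocol and address of the OpenTelemetry collector (example address).
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector:4318"
```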

The Collection Agent records the following metrics about itself:

| Metric | Type | Unit | Dimensions | Description |
|---|---|---|---|---|
| `ca_uptime` | gauge | milliseconds | - | Agent uptime. |
| `ca_healthy` | gauge | - | - | `1` if all components are healthy, otherwise `0`. |
| `ca_workflow_collected` | counter | - | `pipeline` | Datapoints collected on the pipeline. |
| `ca_workflow_reported` | counter | - | `pipeline` | Datapoints successfully reported. |
| `ca_workflow_dropped` | counter | - | `pipeline`, `reason` | Datapoints dropped due to filtering, a full store, or invalid data. |
| `ca_workflow_failed` | counter | - | `pipeline` | Processing or reporting failures. |
| `ca_workflow_buffered` | gauge | - | `pipeline` | Datapoints currently buffered in the pipeline store. |
| `ca_workflow_latency` | histogram | - | `pipeline` | Collection-to-reporting latency. |
