ZEDEDA and Splunk Integration

Introduction and Foundational Concepts

This article provides a step-by-step procedure for integrating your ZEDEDA enterprise infrastructure with a Splunk data platform. The purpose of this integration is to establish an automated pipeline for forwarding critical operational data from your ZEDEDA environment directly into your Splunk instance for advanced monitoring, alerting, searching, and dashboarding.

Prerequisites

Supported data for forwarding

The ZEDEDA platform natively supports the forwarding of two primary types of data to a third-party endpoint like Splunk:

  1. Enterprise Events: These are audit and activity logs generated within the ZEDEDA Controller. This includes actions such as user logins, configuration changes, device provisioning, and other significant events that occur at the enterprise level. Forwarding these events provides a comprehensive audit trail and security overview within Splunk.
  2. Edge Application Logs: These are the standard output (stdout) and standard error (stderr) logs generated by the applications running within virtual machines or containers on your deployed edge nodes. Centralizing these logs in Splunk is invaluable for debugging application behavior, monitoring performance across your entire fleet of devices, and correlating application issues with system-level events.

Device-level logs from EVE-OS itself are not forwarded directly through this mechanism. However, a workaround exists: specific, critical device logs can be converted into events, which can then be forwarded.

High-level system architecture

To effectively configure and troubleshoot this integration, it is crucial to understand the path your data takes from its source to its destination.

  1. Data Generation: An event is generated in the ZEDEDA Controller (for example, a user logs in), or an application on an edge node writes to its log.
  2. Initial Processing (ZEDEDA Controller): The ZEDEDA Controller's internal services (named Ganges and Tigris) receive the raw data. They check if a data stream for that data type (Event or Log) has been configured for forwarding.
  3. Queuing (Apache Kafka): The data is placed onto a specific topic within a message queuing system called Apache Kafka. This acts as a reliable buffer to handle the flow of data.
  4. Streaming Processor (Benthos): An open-source streaming processor named Benthos is subscribed to the Kafka topic. It pulls the data from the queue, formats it, and is responsible for making the outbound connection to your Splunk server.
  5. Data Ingestion (Splunk): The Benthos processor sends the data to your Splunk instance's HTTP Event Collector (HEC), which is the designated endpoint for receiving data over HTTP/S.

Understanding this flow is key because a failure can occur at any stage. This guide will provide tools to verify and debug each major stage of the process.
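
For reference, the handoff in steps 4 and 5 is an ordinary HTTP/S POST of a JSON envelope to the HEC endpoint. A minimal sketch of the envelope shape follows; the exact fields and metadata that ZEDEDA populates are internal details and may differ:
    #JSON
    {
      "event": "the forwarded payload (an enterprise event or an application log line)"
    }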

Configure the Splunk Instance

Before you can configure ZEDEDA to send data, you must first prepare your Splunk instance to receive it. This involves creating a specific data input known as an HTTP Event Collector (HEC) and generating an authentication token.

Note: The Splunk navigation described below is third-party UI and may change over time; if the menus differ, consult the current Splunk documentation.

Navigate to data inputs

  1. Log in to your Splunk Web interface with administrative privileges.
  2. In the main navigation bar, locate and click Settings.
  3. Under the DATA category, click Data Inputs.

Create a new HTTP event collector

  1. On the Data Inputs page, locate and click HTTP Event Collector in the list of input types. It is often the first item. 
  2. This will take you to the HTTP Event Collector management page. Click the green New Token button in the upper-right corner to begin the creation wizard.

Configure the HEC token

You will now be presented with a form to configure the new data input. Follow these instructions carefully.

  1. Name: Provide a clear and descriptive name for this input. For example, zededa_enterprise_data. This is for your reference within Splunk.
  2. Source name override (optional): You can leave this blank.
  3. Description (optional): It is highly recommended to add a description, such as HEC for receiving Events and Application Logs from the ZEDEDA production enterprise.
  4. Output Group (optional): You can leave this on the default setting unless you have advanced Splunk routing configurations.
  5. Enable indexer acknowledgment: This is a critical step. DO NOT check this box. The ZEDEDA Benthos streaming processor is not configured to work with Splunk's indexer acknowledgment feature, and enabling it will cause the connection to fail (see the note after this list).
  6. Click Next.
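
Why this matters: when indexer acknowledgment is enabled on a token, HEC rejects every request that does not carry an X-Splunk-Request-Channel header, which the ZEDEDA pipeline is not configured to send. A sketch of the resulting failure, assuming a local test instance on port 8088 and a placeholder token:
    #bash
    # POST without a channel header against an ack-enabled token (placeholder values).
    curl -k "https://localhost:8088/services/collector" \
      -H "Authorization: Splunk YOUR-HEC-TOKEN" \
      -d '{"event": "test"}'
    # Typical response: {"text":"Data channel is missing","code":10}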

Select the destination index

  1. On the Input Settings page, you will be asked to select which Splunk index the incoming data should be stored in.
  2. From the Allowed Indexes list, select the index you wish to use (for example, main, or a custom index you created like zededa_data).
  3. Move your desired index to the Selected Indexes list.
  4. Choose a Default Index from the dropdown. This is the index where the incoming ZEDEDA data will be stored.
  5. Click Review.

Review and create the token

  1. Review all your settings to ensure they are correct.
  2. Click Submit.

Copy the token value

  1. Splunk will now display a screen titled Token created successfully.
  2. A field named Token Value will contain a long string of letters and numbers. This is your unique authentication token.
  3. Copy this entire token value to your clipboard. You will need it in the next major section when configuring the ZEDEDA side of the integration. This token is a secret; treat it like a password.

Your Splunk instance is now ready to receive data from ZEDEDA.
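
Optionally, you can smoke-test the new token from any machine that can reach the endpoint before touching the ZEDEDA side. A sketch, assuming a local test instance on port 8088 with a self-signed certificate (hence the -k flag); substitute your own host, port, and token:
    #bash
    curl -k "https://localhost:8088/services/collector" \
      -H "Authorization: Splunk YOUR-HEC-TOKEN" \
      -d '{"event": "manual HEC smoke test"}'
    # Expected on success: {"text":"Success","code":0}
A response such as {"text":"Invalid token","code":4} means the pasted token value is wrong.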

Configure the ZEDEDA Controller

This part of the process involves two distinct but related steps within the ZEDEDA user interface. You must complete both steps for the integration to function correctly.

Create the third-party integration endpoint

This step defines where ZEDEDA should send the data (your Splunk server) and how it should authenticate (using the token you just created).

  1. Log in to the ZEDEDA Cloud GUI (such as https://zedcontrol.YOUR_CLUSTER_NAME.zededa.net).
  2. In the top-right corner, click your profile > Enterprise > Third Party Integrations.
  3. Click Add Integration.
  4. Fill out the form with the following details:
    • Name: Give this integration a unique, descriptive name. This name will be used later, so make it easy to identify. For example: Splunk-Production-Server.
    • Category: Select Data from the dropdown menu.
    • Type: After selecting Data, choose Splunk as the integration type.
    • Token: Paste the Token Value that you copied from the Splunk HEC configuration in the previous section.
    • Host: Enter the fully qualified domain name (FQDN) or the IP address of your Splunk HEC endpoint. For example, http-inputs-mycompany.splunkcloud.com or 203.0.113.100.
    • Port: This is a crucial field that depends on your Splunk setup.
      • For Splunk Cloud, the standard port is 443.
      • For a self-hosted Splunk instance (for example, a local Docker container used for testing), the default HEC port is 8088.
      • Consult your Splunk administrator for the correct port if you are unsure.
    • Enable TLS: Transport Layer Security encrypts the data in transit.
      • For Splunk Cloud, you must select Yes.
      • For a local test instance, you might select No if you have not configured TLS. Production environments should always use TLS.
    • Skip Verify: This option skips server certificate verification and should generally be left unchecked. Select Yes only in a lab environment where your Splunk HEC endpoint uses a self-signed certificate that the ZEDEDA controller does not trust. For Splunk Cloud this is not necessary, as its certificates are signed by a well-known Certificate Authority (CA).
    • URL Extension: This is the specific path on the Splunk server for the HEC service.
      • The standard value is /services/collector.
      • Some configurations might also work with /services/collector/event.
      • It is critical not to use a path like /services/collector/log, as the ZEDEDA integration sends both events and logs to the same endpoint, and the routing is handled internally.
    • Click Add.

Configure the data streams

Now that you have defined the Splunk endpoint, you must explicitly tell ZEDEDA which data streams (Events and/or Logs) to send to it.

  1. In the ZEDEDA GUI, click your profile > Enterprise > Data Streams.
  2. Click Add Data Streams.
  3. You will see two predefined data stream types in the drop-down menu: Events Streaming and Application Logs Streaming.
  4. To enable events streaming:
    • Select Events Streaming.
    • Enter a Name for your data stream.
    • In the Third-Party Integration dropdown, select the name of the Third-Party Integration Endpoint you created (for example, Splunk-Production-Server).
    • Click Add.
  5. To enable application logs streaming:
    • Select Application Logs Streaming.
    • In the Third-Party Integration dropdown, select the name of the Third-Party Integration Endpoint you created (for example, Splunk-Production-Server).
    • Click Add.

CRITICAL NOTE: A common point of failure is creating the Third-Party Integration Endpoint but forgetting to configure the Data Streams. If you do not associate a Data Stream with your configured Third-Party Integration, no data will be sent.

After saving, you might see a status of Verified in the UI. This status is misleading. It only confirms that the configuration was saved correctly within the ZEDEDA database; it does not confirm that a successful connection to your Splunk server can be made. The next section details how to perform a true verification.

Verify the Connection Between ZEDEDA and Splunk

To truly verify that your configuration is correct and that ZEDEDA can reach your Splunk server, you must use the ZEDEDA Command-Line Interface (ZCLI). This tool allows you to send a live test probe from your machine, through the ZEDEDA controller, to the Splunk endpoint.

Access and log in to ZCLI

You must have ZCLI installed, configured for your enterprise, and logged in before you can run the probe command.

Run the plugin probe command

The command to perform the test is zcli plugin probe.

  1. Open your terminal or command prompt.
  2. Execute the following command, replacing <plugin-name> with the exact name you gave your Third-Party Integration Endpoint (for example, Splunk-Production-Server):
    #bash
    zcli plugin probe <plugin-name>
    For example:
    #bash
    zcli plugin probe Splunk-Production-Server

Interpret the probe results

  • A Successful Probe: If the configuration is correct, the command will return a success message and list the data streams associated with that plugin. This confirms that the Host, Port, Token, and TLS settings are all working correctly.
    Response: ok
    Data Streams configured for this plugin:
    - Events Data Stream
    - Application Logs Stream
  • A Failed Probe: If there is a problem with your configuration, the command will return an error. Common errors follow:
    Error: Post "https://your-splunk-host:443/services/collector": remote error: tls: bad certificate
    Error: ... connection refused

    This indicates a problem with the Host, Port, or TLS configuration.
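
    If you see a TLS or certificate error, inspect the certificate the endpoint actually presents. A sketch, with placeholder host and port:
    #bash
    # Print the subject, issuer, and validity dates of the server certificate.
    openssl s_client -connect your-splunk-host:443 -servername your-splunk-host </dev/null 2>/dev/null \
      | openssl x509 -noout -subject -issuer -dates
    A self-signed or expired certificate explains a "bad certificate" failure; Skip Verify can work around it in a lab, but production endpoints should present a CA-signed certificate.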

Confirm the probe in Splunk

A successful zcli plugin probe command sends a distinct test message to Splunk. Confirming its arrival there proves that the end-to-end path is open.

  1. Go to your Splunk instance and open the Search & Reporting app.
  2. In the search bar, enter the following query, ensuring you are searching in the correct index and time range (see the tip after this list if nothing appears):
    "ZCLI plugin probe test"
  3. You should see an event that looks like this:
    #JSON
    {
      "event": "ZCLI plugin probe test by user 'user@company.com' in enterprise 'Your-Enterprise-Name' from IP 'your_ip_address' for third-party splunk plugin 'Splunk-Production-Server'."
    }
  4. Seeing this event in Splunk is the ultimate confirmation that your configuration is correct and the communication channel is open.
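
If the probe event does not appear, widen the time range and search across every index visible to you; the event may have landed in an index other than the one you expected:
    index=* "ZCLI plugin probe test"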

Advanced Troubleshooting Guide

This section covers what to do when things do not work as expected.

Scenario 1: The ZCLI plugin probe command fails

If the probe command returns an error, ZCLI includes an automated diagnostic feature.

  1. Run the probe command again, but this time add the --try-fix (or -T) flag.
    #bash
    zcli plugin probe Splunk-Production-Server --try-fix
  2. This command will attempt to connect using common alternative settings. For example, if you configured port 8088 but TLS is required, it will try port 443. If it finds a working combination, it will tell you what the correct settings are.
    [TRYING-FIX] Initial probe failed...
    [TRYING-FIX] Attempting with port 443 and TLS enabled...
    [TRYING-FIX] Success! The following configuration is recommended:
    Host: http-inputs-mycompany.splunkcloud.com
    Port: 443
    Enable TLS: Yes
  3. Use the recommended settings to update your Third-Party Integration configuration in the ZEDEDA UI.

Scenario 2: Probe succeeds, but no data arrives in Splunk

This is a more complex scenario that indicates the problem lies within the ZEDEDA cloud's internal data pipeline (Kafka to Benthos). Troubleshooting this requires access that is typically limited to ZEDEDA Site Reliability Engineers (SREs) or cluster administrators.

  1. Check for Benthos Errors in Kibana: The first step is to check the logs of the z-benthos Docker container, which are available in the ZEDEDA cluster's Kibana instance. Search for errors related to the Splunk output; repeated connection timeouts or HTTP error codes returned by the Splunk server are typical clues.
  2. Use the benthos-stream.sh Script for Live Stats: For live, definitive proof of data flow, an SRE can use a special script on the ZEDEDA cluster.
    • Action: The SRE runs a command like ./benthos-stream.sh -c <cluster_name> -s stats.
    • Output: The script queries the Benthos processor and returns live statistics for each data stream. The key metrics to watch are:
      1. input.received: This counter increments every time Benthos pulls a message (an event or a log bundle) from the Kafka queue.
      2. output.sent: This counter increments every time Benthos successfully sends a message to the Splunk endpoint.
      3. output.error: This counter increments on failed send attempts.
      4. output.codes.2xx: This counter specifically tracks successful HTTP 2xx responses from Splunk.
    • Debugging Steps:
      1. The SRE runs the script to get a baseline of the counters.
      2. You generate a test event (for example, by logging out and logging back into the ZEDEDA UI).
      3. The SRE runs the script again.
      4. If input.received and output.sent (or output.codes.2xx) both increase by one, the ZEDEDA cloud has successfully processed and sent the data. If the data is still not visible in Splunk, the issue is almost certainly on the Splunk side (for example, an incorrect index, a routing rule, or an ingestion issue).
      5. If input.received increases but output.sent does not (and output.error does), the issue lies in the final connection from Benthos to Splunk.

This level of debugging provides evidence of where the data flow is stopping. If you encounter this scenario, you will need to engage with ZEDEDA support and provide them with this information.
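
If the evidence points at the Splunk side, a quick first check is HEC's health endpoint, which requires no token. A sketch, with placeholder host and port:
    #bash
    curl -k "https://your-splunk-host:8088/services/collector/health"
    # A healthy collector typically answers: {"text":"HEC is healthy","code":17}
If the collector reports healthy but data is still missing, review the token's allowed indexes and the index and time range of your Splunk search.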
