Troubleshoot ZEDEDA Edge Kubernetes Service and ZEDEDA Edge Kubernetes App Flows

Introduction

This article contains commands and tips for troubleshooting the ZEDEDA Edge Kubernetes solution. The solution is based on Rancher Fleet, but it is not exactly the same. Specifically, the default namespace is zks-fleet-system.

This is a series of articles. You will likely follow them in this order.

  1. ZEDEDA Edge Kubernetes Service and ZEDEDA Edge Kubernetes App Flows Overview 
  2. Create and Manage ZEDEDA Edge Kubernetes Service and App Flows using the API
  3. Create and Manage a ZEDEDA Edge Kubernetes Service Cluster Using the GUI
  4. Manage an App from the ZEDEDA Edge Kubernetes App Flows Marketplace Using the GUI
  5. Manage ZEDEDA Edge Kubernetes App Flows Installed Applications Using the GUI
  6. Create and Manage ZEDEDA Edge Kubernetes App Flows Cluster Groupings Using the GUI 
  7. Create ZEDEDA Edge Kubernetes App Flows GitOps Repositories Using the GUI 
  8. Troubleshoot ZEDEDA Edge Kubernetes Service and App Flows - You are here!

Prerequisites

Options for running kubectl commands 

You have several options for running kubectl commands on your downstream clusters, which are the clusters where your applications and services are actually deployed and running.

  1. KubeCtl Shell from ZEDEDA Cloud.
  2. Download the kubeconfig file from ZEDEDA Cloud, then point kubectl at it with the --kubeconfig flag, the same way as for the EdgeView download:
    kubectl --kubeconfig=/tmp/download/kube-config.yaml get nodes
  3. Download the EdgeView script and run it with the tcp/kube option:
    ./run.edge_node.<1734127774>.edgeview.sh tcp/kube
    This downloads the kube-config.yaml file to your local system, where you can run kubectl commands from your bash shell, such as:
    kubectl --kubeconfig=/tmp/download/kube-config.yaml get nodes
  4. SSH to the edge node and run the following command to use kubectl interactively on the node:
    eve enter kube
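If you work with a downloaded kubeconfig (options 2 and 3), repeating --kubeconfig on every command gets tedious. The following is a minimal sketch, assuming a bash-compatible shell and the /tmp/download/kube-config.yaml path used in the examples above. The function name kz and the variable KUBECONFIG_PATH are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: KUBECONFIG_PATH and kz are illustrative names.
# Defaults to the download path used in the examples above; set
# KUBECONFIG_PATH before sourcing this if you saved the file elsewhere.
KUBECONFIG_PATH="${KUBECONFIG_PATH:-/tmp/download/kube-config.yaml}"

# Run kubectl against the downstream cluster described by KUBECONFIG_PATH,
# forwarding all arguments unchanged.
kz() {
  kubectl --kubeconfig="$KUBECONFIG_PATH" "$@"
}
```

For example, kz get nodes is equivalent to kubectl --kubeconfig=/tmp/download/kube-config.yaml get nodes. Alternatively, kubectl honors the standard KUBECONFIG environment variable, so export KUBECONFIG=/tmp/download/kube-config.yaml has the same effect for the rest of the shell session.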

The following describes how to use KubeCtl Shell from ZEDEDA Cloud.

  1. Log in to the ZEDEDA Cloud GUI (such as https://zedcontrol.YOUR_INSTANCE_NAME.zededa.net).
  2. Go to EDGE Kubernetes > Clusters > YOUR_CLUSTER.
  3. Click KubeCtl Shell in the upper-right corner.
  4. After you see the “Connected” message, run kubectl commands in the shell that opens in the window. The commands run on YOUR_CLUSTER.

Look for Root Causes of Issues

When the Kubernetes solution is in an abnormal state, check the following first.

Get nodes

Get a summary of the nodes in your cluster.

Example: 

kubectl get nodes

Example Response:

NAME              STATUS   ROLES                       AGE    VERSION
cshari-asus-nuc   Ready    control-plane,etcd,master   111d   v1.28.5+k3s1

Example with additional information: 

kubectl get nodes -o wide

Example Response:

NAME              STATUS   ROLES                       AGE    VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                  KERNEL-VERSION                  CONTAINER-RUNTIME
cshari-asus-nuc   Ready    control-plane,etcd,master   111d   v1.28.5+k3s1   10.244.244.1   <none>        0.0.0-naiming-terminate-k3s-fix-5708a05c-kubevirt-amd64   6.1.112-linuxkit-63f4d774fbc8   containerd://1.7.11-k3s2

The status can indicate the following:

  • Ready: The node has successfully joined the cluster, all of its necessary services are running, and it is ready to accept and run pods.
  • NotReady: The node is not healthy and cannot accept new pods. For example, the cause could be a network problem, a Kubernetes service failure, or resource exhaustion.
  • SchedulingDisabled: The node is temporarily marked as unschedulable (cordoned), so no new pods are placed on it. For example, the node may be prepared for a reboot or an upgrade. kubectl shows this together with the health status, such as Ready,SchedulingDisabled.
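On a multi-node cluster, a quick filter helps you spot nodes that need attention. The following is a minimal sketch, assuming a POSIX shell with awk; not_ready_nodes is a hypothetical name. It reads the default kubectl get nodes output (header included) and prints every node whose STATUS is anything other than plain Ready, which also catches combined values such as Ready,SchedulingDisabled.

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints
# "<name> <status>" for each node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  # NR > 1 skips the header row; $1 is NAME and $2 is STATUS.
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}
```

Usage: kubectl get nodes | not_ready_nodes prints nothing when every node is Ready.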

Find all the pods 

Fetch all the pods from every namespace in your cluster.

Example: 

kubectl get pod -A

Example Response: 

NAMESPACE            NAME                                                READY   STATUS      RESTARTS       AGE
kube-system          coredns-6799fbcd5-x7ttz                             1/1     Running     16 (23h ago)   111d
zks-fleet-system     fleet-agent-5d86d659f7-4pbd9                        1/1     Running     2 (23h ago)    20d
zks-system           dashboard-shell-cvh59                               2/2     Running     0              74s

Take note of the NAMESPACE and pod NAME columns, as you will use them in later commands.

Example with additional information (without -A or -n, kubectl lists only the pods in the current namespace, which is default here):

kubectl get pods -o wide

Example Response:

NAME                                                       READY   STATUS    RESTARTS      AGE     IP            NODE              NOMINATED NODE   READINESS GATES
z9ntw86-grafana-5c7d9979c6-tsjqc                           1/1     Running   2 (28h ago)   21d     10.42.0.190   cshari-asus-nuc   <none>           <none>
z9ntw86-grafana-test                                       0/1     Error     0             21d     <none>        cshari-asus-nuc   <none>           <none>
zflskkg-grafana-test                                       0/1     Error     0             22d     <none>        cshari-asus-nuc   <none>           <none>
zpv7ltl-edgeai-inference-business-logic-7c9859565c-vk7j8   1/1     Running   0             3h53m   10.42.0.84    cshari-asus-nuc   <none>           <none>
zpv7ltl-edgeai-inference-openvino-server-d446bbc58-plgrk   1/1     Running   0             3h53m   10.42.0.85    cshari-asus-nuc   <none>           <none>
zrole5i-grafana-test                                       0/1     Error     0             47d     <none>        cshari-asus-nuc   <none>           <none>
ztggro7-grafana-test                                       0/1     Error     0             22d     <none>        cshari-asus-nuc   <none>           <none>

The NODE column shows which node each pod runs on. For example, during a node failover you can see a pod move from one node to another.
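Because the RESTARTS column can contain spaces (for example, 2 (28h ago)), splitting the -o wide output on whitespace does not reliably find the NODE column. To see how pods are spread across nodes, for example before and after a failover, one option is to ask kubectl for only the columns you need with -o custom-columns and count them. The following is a minimal sketch, assuming a POSIX shell with awk and sort; pods_per_node is a hypothetical name.

```shell
# Hypothetical helper: reads the output of
#   kubectl get pods -A -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
# on stdin and prints "<node> <pod-count>", one line per node.
pods_per_node() {
  # NR > 1 skips the header row; $2 is the NODE column.
  # LC_ALL=C keeps the sort order identical across locales.
  awk 'NR > 1 { count[$2]++ } END { for (node in count) print node, count[node] }' |
    LC_ALL=C sort
}
```

Usage: kubectl get pods -A -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName | pods_per_node. Pods that are not yet scheduled have no node, so kubectl prints <none> for them and they are counted under that name.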

The status can indicate the following:

  • Pending: The pod has been accepted by the Kubernetes cluster, but one or more of its containers has not been created yet. For example, the image may still be downloading over a slow network, or the cluster may lack the resources to schedule the pod.
  • Running: The pod has been bound to a node, all of its containers have been created, and at least one container is running or is in the process of starting or restarting.
  • Succeeded: All the containers within the pod have completed their tasks successfully and have terminated. This is the expected final state for pods that run a specific job or task to completion (for example, a batch process or a database migration script). kubectl typically shows these pods as Completed in the STATUS column.
  • Error: At least one container in the pod has terminated with an error.
  • CrashLoopBackOff: A container in the pod is repeatedly starting, crashing, and being restarted by Kubernetes. For example, the cause could be an application or configuration error.
  • Terminating: The pod is in the process of being shut down. For example, the cause could be a deployment update or a manually deleted pod.
  • Unknown: The state of the pod could not be determined. For example, the cause could be a network issue or a problem with the node itself.
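To find the pods that need attention without scanning the whole list, the following is a minimal sketch, assuming a POSIX shell with awk; unhealthy_pods is a hypothetical name. It reads kubectl get pods -A output and flags pods whose STATUS is neither Running nor Completed, plus Running pods whose READY count shows that not every container is ready. STATUS is the fourth column in that output, and the RESTARTS values that contain spaces come after it, so they do not shift it.

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and prints
# "<namespace>/<name> <ready> <status>" for each pod that needs attention:
#   - STATUS is neither Running nor Completed, or
#   - STATUS is Running but not all containers are ready (for example 0/1).
unhealthy_pods() {
  awk 'NR > 1 {
    # $1 NAMESPACE, $2 NAME, $3 READY (for example "1/2"), $4 STATUS
    split($3, ready, "/")
    if (($4 != "Running" && $4 != "Completed") ||
        ($4 == "Running" && ready[1] != ready[2]))
      print $1 "/" $2, $3, $4
  }'
}
```

Usage: kubectl get pods -A | unhealthy_pods. Against the -o wide example above, the grafana-test pods in Error state would be reported; you can then pass each namespace and pod name to kubectl describe pod.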

Get detailed information about a pod

Get detailed information about a specific pod within a particular namespace, including the pod's current state and history. The Events section is often helpful in diagnosing a problem. 

Example Syntax: 

kubectl describe pod -n <namespace> <pod-name> 

Example Command: 

kubectl describe pod -n zks-fleet-system fleet-agent-5d86d659f7-4pbd9 

Example Response:

Name:             fleet-agent-5d86d659f7-4pbd9
Namespace:        zks-fleet-system
Priority:         0
Service Account:  fleet-agent
Node:             cshari-asus-nuc/10.244.244.1
Start Time:       Wed, 24 Sep 2025 08:18:05 +0000
Labels:           app=fleet-agent
                  pod-template-hash=5d86d659f7
Annotations:      k8s.v1.cni.cncf.io/network-status:
                    [{
                        "name": "cbr0",
                        "interface": "eth0",
                        "ips": [
                            "10.42.0.206"
                        ],
                        "mac": "6e:93:0a:21:5e:fa",
                        "default": true,
                        "dns": {}
                    }]
                  k8s.v1.cni.cncf.io/networks-status:
                    [{
                        "name": "cbr0",
                        "interface": "eth0",
                        "ips": [
                            "10.42.0.206"
                        ],
                        "mac": "6e:93:0a:21:5e:fa",
                        "default": true,
                        "dns": {}
                    }]
Status:           Running
IP:               10.42.0.206
IPs:
  IP:           10.42.0.206
Controlled By:  ReplicaSet/fleet-agent-5d86d659f7
Containers:
  fleet-agent:
    Container ID:  containerd://4d61bd3bc6cabc132bc20100045cd07b745e4b3351c11d615716073001b69366
    Image:         zededa/zks-fleet-agent:v0.12.4
    Image ID:      docker.io/zededa/zks-fleet-agent@sha256:07eebc520444a9ac0afd39637fe42c75c638e66c08657954ef5d9328005b3cdb
    Port:          <none>
    Host Port:     <none>
    Command:
      fleetagent
    State:          Running
      Started:      Mon, 13 Oct 2025 17:29:36 +0000
    Last State:     Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Thu, 09 Oct 2025 23:35:06 +0000
      Finished:     Mon, 13 Oct 2025 17:28:41 +0000
    Ready:          True
    Restart Count:  2
    Environment:
      BUNDLEDEPLOYMENT_RECONCILER_WORKERS:  50
      DRIFT_RECONCILER_WORKERS:             50
      NAMESPACE:                            zks-fleet-system (v1:metadata.namespace)
      AGENT_SCOPE:                          
      CHECKIN_INTERVAL:                     15m0s
      CATTLE_ELECTION_LEASE_DURATION:       30s
      CATTLE_ELECTION_RETRY_PERIOD:         10s
      CATTLE_ELECTION_RENEW_DEADLINE:       25s
    Mounts:
      /.kube from kube (rw)
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p9z5v (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-p9z5v:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 cattle.io/os=linux:NoSchedule
                             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                From     Message
  ----     ------          ----               ----     -------
  Warning  FailedMount     24h (x4 over 24h)  kubelet  MountVolume.SetUp failed for volume "kube-api-access-p9z5v" : object "zks-fleet-system"/"kube-root-ca.crt" not registered
  Normal   SandboxChanged  24h                kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   AddedInterface  24h                multus   Add eth0 [10.42.0.206/24] from cbr0
  Normal   Pulling         24h                kubelet  Pulling image "zededa/zks-fleet-agent:v0.12.4"
  Normal   Pulled          24h                kubelet  Successfully pulled image "zededa/zks-fleet-agent:v0.12.4" in 1.561s (1.561s including waiting)
  Normal   Created         24h                kubelet  Created container fleet-agent
  Normal   Started         24h                kubelet  Started container fleet-agent
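Because the Events section comes at the end of a long describe output, it can help to print only that part. The following is a minimal sketch, assuming a POSIX shell with sed; events_section is a hypothetical name.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and prints
# everything from the "Events:" line through the end of the output.
events_section() {
  sed -n '/^Events:/,$p'
}
```

Usage: kubectl describe pod -n zks-fleet-system fleet-agent-5d86d659f7-4pbd9 | events_section. kubectl can also list a pod's events directly with kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>.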

Get the logs of a specific pod

Check the logs of a problematic pod. This command targets one specific pod in a specific namespace. Add the -f flag to stream the logs in real time.

Example Syntax: 

kubectl logs -n <namespace> <pod-name> -f

Example Command:

kubectl logs -n zks-fleet-system fleet-agent-5d86d659f7-4pbd9 -f

Example Response:

{"level":"info","ts":"2025-10-14T14:12:12Z","logger":"bundledeployment.update-status","msg":"Status not ready according to nonModified and nonReady","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"z9ntw86","namespace":"cluster-4422adc422c4ddf8-113bf54fcabd86b0-c-gknjd-dd7925507f33"},"namespace":"cluster-4422adc422c4ddf8-113bf54fcabd86b0-c-gknjd-dd7925507f33","name":"z9ntw86","reconcileID":"037b07be-5ae8-442a-8cba-24db0862b99e","nonModified":true,"nonReady":[{"uid":"19c898da-b80f-4df7-b64b-77014f26d22b","kind":"Pod","apiVersion":"v1","namespace":"default","name":"z9ntw86-grafana-test","summary":{"state":"unavailable","transitioning":true,"message":["Pod has completed, but not successfully"]}}]}

I1014 14:14:12.084027       1 reflector.go:556] "Warning: watch ended with error" reflector="pkg/mod/k8s.io/client-go@v0.33.1/tools/cache/reflector.go:285" type="*v1alpha1.BundleDeployment" err="an error on the server (\"unable to decode an event from the watch stream: stream error: stream ID 311; INTERNAL_ERROR; received from peer\") has prevented the request from succeeding"
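Pod logs can be long, and with -f they keep growing. To keep only lines that mention an error, a failure, or a warning, you can pipe kubectl logs through grep. The following is a minimal sketch, assuming GNU or BSD grep; problem_lines is a hypothetical name, and the keyword list is a starting point rather than a complete definition of a problem.

```shell
# Hypothetical helper: reads log lines on stdin and prints only those that
# contain "error", "fail", or "warn" in any letter case.
# --line-buffered flushes each match immediately, which matters with -f.
problem_lines() {
  grep -iE --line-buffered 'error|fail|warn'
}
```

Usage: kubectl logs -n zks-fleet-system fleet-agent-5d86d659f7-4pbd9 -f | problem_lines. The filter is purely textual: with the example response above, it keeps the watch-error line but drops the "Status not ready" entry, even though that entry reports a pod that did not complete successfully. Use it to narrow a search, not as the only check.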

Get the log from the fleet-agent

Use a label selector when you want the logs of an application or service (such as fleet-agent) regardless of which pod instance is currently running it. Add the -f flag to stream the logs in real time. When you use a selector, kubectl shows only the last 10 lines of each pod's log by default; add --tail=-1 to see the full log.

Example Syntax:

kubectl logs -l app=fleet-agent -n <namespace> -f

Example Command:

kubectl logs -l app=fleet-agent -n zks-fleet-system -f

Example Response:

{"level":"info","ts":"2025-10-14T17:58:36Z","logger":"bundledeployment.update-status","msg":"Status not ready according to nonModified and nonReady","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"z9ntw86","namespace":"cluster-4422adc422c4ddf8-113bf54fcabd86b0-c-gknjd-dd7925507f33"},"namespace":"cluster-4422adc422c4ddf8-113bf54fcabd86b0-c-gknjd-dd7925507f33","name":"z9ntw86","reconcileID":"058e2218-5256-4290-8fb4-bfe2957410ad","nonModified":true,"nonReady":[{"uid":"19c898da-b80f-4df7-b64b-77014f26d22b","kind":"Pod","apiVersion":"v1","namespace":"default","name":"z9ntw86-grafana-test","summary":{"state":"unavailable","transitioning":true,"message":["Pod has completed, but not successfully"]}}]}

I1014 18:01:35.926255       1 reflector.go:556] "Warning: watch ended with error" reflector="pkg/mod/k8s.io/client-go@v0.33.1/tools/cache/reflector.go:285" type="*v1alpha1.BundleDeployment" err="an error on the server (\"unable to decode an event from the watch stream: stream error: stream ID 351; INTERNAL_ERROR; received from peer\") has prevented the request from succeeding"
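The fleet-agent's "Status not ready" entries list, in their nonReady array, the resources that keep a BundleDeployment from becoming ready. In the example above, that is the Pod default/z9ntw86-grafana-test, one of the grafana-test pods shown in Error state earlier. The following is a minimal sketch that extracts those resources from a stream of log lines, assuming a POSIX shell with grep -o and sed -E, and assuming the field order seen in the example (kind, apiVersion, namespace, name); non_ready_resources is a hypothetical name. A JSON-aware tool such as jq would be more robust if it is available.

```shell
# Hypothetical helper: reads fleet-agent log lines on stdin and prints
# "<kind> <namespace>/<name>" once for each resource listed in a nonReady
# array. Assumes the field order kind, apiVersion, namespace, name as in the
# example log; cluster-scoped entries without a namespace field are skipped.
non_ready_resources() {
  grep -o '"kind":"[^"]*","apiVersion":"[^"]*","namespace":"[^"]*","name":"[^"]*"' |
    sed -E 's|"kind":"([^"]*)","apiVersion":"[^"]*","namespace":"([^"]*)","name":"([^"]*)"|\1 \2/\3|' |
    LC_ALL=C sort -u
}
```

Usage: kubectl logs -l app=fleet-agent -n zks-fleet-system --tail=-1 | non_ready_resources. Run it without -f, because sort waits for the end of its input before printing. Each resource it reports can then be inspected with kubectl describe.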

Collect-info 

You can also run collect-info.sh on each node in the cluster to capture hardware component logs, network logs, operating system logs, and Kubernetes pod logs (persist-kubelog/pods/).
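After you retrieve the collect-info archive from a node, you may want to go straight to the Kubernetes pod logs. The following is a minimal sketch, assuming the archive is a gzip-compressed tar file and that the pod logs sit under a persist-kubelog/pods/ directory inside it, as named above; list_kube_pod_logs is a hypothetical name, and the exact archive layout may differ between versions.

```shell
# Hypothetical helper: lists the entries under persist-kubelog/pods/ inside a
# collect-info archive, assumed here to be a gzip-compressed tar file.
# Usage: list_kube_pod_logs <archive.tar.gz>
list_kube_pod_logs() {
  tar -tzf "$1" | grep 'persist-kubelog/pods/'
}
```

To read one of the listed files without unpacking the whole archive, tar -xzOf <archive.tar.gz> <entry> writes that entry to standard output.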
