Introduction
This section contains commands and tips for troubleshooting the Kubernetes solution. The solution is based on Rancher/Fleet but is not identical to it. In particular, the default namespace is zks-fleet-system.
This is a series of articles. You will likely follow them in this order.
- ZEDEDA Edge Kubernetes Service and ZEDEDA Edge Kubernetes App Flows Overview
- Create and Manage ZEDEDA Edge Kubernetes Service and App Flows using the API
- Create and Manage a ZEDEDA Edge Kubernetes Service Cluster Using the GUI
- Manage an App from the ZEDEDA Edge Kubernetes App Flows Marketplace Using the GUI
- Manage ZEDEDA Edge Kubernetes App Flows Installed Applications Using the GUI
- Create and Manage ZEDEDA Edge Kubernetes App Flows Cluster Groupings Using the GUI
- Create ZEDEDA Edge Kubernetes App Flows GitOps Repositories Using the GUI
- Troubleshoot ZEDEDA Edge Kubernetes Service and App Flows - You are here!
Prerequisites
- You have onboarded one or more edge nodes.
- Your edge nodes are online.
- Your edge nodes are running EVE-k v16.0.0 or later (EVE-OS KVM is not supported).
- You have either the SysManager or SysAdmin role in your ZEDEDA Cloud enterprise.
- You have already created a ZEDEDA Edge Kubernetes Service cluster.
Options for running kubectl commands
You can run kubectl commands on your downstream clusters in several ways. Downstream clusters are where your applications and services are actually deployed and running.
- KubeCtl Shell from ZEDEDA Cloud.
- Download the kubeconfig from ZEDEDA Cloud and pass it to kubectl, as you would with the EdgeView download:
kubectl --kubeconfig=/tmp/download/kube-config.yaml get nodes
- Download the EdgeView script and run it with the tcp/kube command:
./run.edge_node.<1734127774>.edgeview.sh tcp/kube
This downloads the kube-config.yaml file to your local system, where you can run kubectl commands from your bash shell, such as:
kubectl --kubeconfig=/tmp/download/kube-config.yaml get nodes
- SSH to the edge node and use interactive kubectl directly by running the following command:
eve enter kube
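If you run more than one command against a downloaded kubeconfig, it may be convenient to set the standard KUBECONFIG environment variable once instead of repeating --kubeconfig. The following is a minimal sketch under that assumption. The function name use_kubeconfig is hypothetical, and the default path is simply the one used in the examples above.

```shell
# Point kubectl at a downloaded kubeconfig for the rest of this shell session.
# use_kubeconfig is a hypothetical helper name, not a ZEDEDA or kubectl command.
use_kubeconfig() {
  local path="${1:-/tmp/download/kube-config.yaml}"
  if [ ! -r "$path" ]; then
    echo "kubeconfig not found or not readable: $path" >&2
    return 1
  fi
  # kubectl reads KUBECONFIG when --kubeconfig is not given on the command line.
  export KUBECONFIG="$path"
}
```

After running use_kubeconfig in your bash shell, a plain kubectl get nodes should target the downstream cluster described by that file.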
The following describes how to use KubeCtl Shell from ZEDEDA Cloud.
- Log in to the ZEDEDA Cloud GUI (such as https://zedcontrol.YOUR_INSTANCE_NAME.zededa.net).
- Go to EDGE Kubernetes > Clusters > YOUR_CLUSTER.
- Click KubeCtl Shell in the upper-right corner.
- After you see the “Connected” message, run kubectl commands in the shell that displays in the window.
- Kubectl commands will be executed on YOUR_CLUSTER.
Look for Root Causes of Issues
When the state of the Kubernetes solution is abnormal, check the following first.
Get nodes
Get a summary of the nodes in your cluster.
Example:
kubectl get nodes
Example Response:
NAME STATUS ROLES AGE VERSION
cshari-asus-nuc Ready control-plane,etcd,master 111d v1.28.5+k3s1
Example with additional information:
kubectl get nodes -o wide
Example Response:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
cshari-asus-nuc Ready control-plane,etcd,master 111d v1.28.5+k3s1 10.244.244.1 <none> 0.0.0-naiming-terminate-k3s-fix-5708a05c-kubevirt-amd64 6.1.112-linuxkit-63f4d774fbc8 containerd://1.7.11-k3s2
The status can indicate the following:
- Ready: Node has successfully joined the cluster, all of its necessary services are running, and it is ready to accept and run pods.
- NotReady: Node is not healthy and cannot accept new pods. For example, the cause could be a network problem, a stopped Kubernetes component (such as the kubelet), or resource exhaustion.
- SchedulingDisabled: Node has been marked as unschedulable, so no new pods are placed on it. For example, the cause could be a planned reboot or an upgrade.
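On a cluster with several nodes, it can help to list only the nodes that need attention. The following sketch filters the output of kubectl get nodes by the STATUS column. The function name not_ready_nodes is hypothetical, and the filter assumes the default column layout shown in the example response above.

```shell
# Print the name and status of every node whose STATUS is not exactly "Ready".
# Reads the output of `kubectl get nodes --no-headers` on standard input.
# not_ready_nodes is a hypothetical helper name.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}
```

Usage: kubectl get nodes --no-headers | not_ready_nodes. A cordoned node is reported as Ready,SchedulingDisabled, so it is also listed. Empty output means every node reports Ready.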
Find all the pods
Fetch all the pods from every namespace in your cluster.
Example:
kubectl get pod -A
Example Response:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6799fbcd5-x7ttz 1/1 Running 16 (23h ago) 111d
zks-fleet-system fleet-agent-5d86d659f7-4pbd9 1/1 Running 2 (23h ago) 20d
zks-system dashboard-shell-cvh59 2/2 Running 0 74s
Take note of the NAMESPACE and pod NAME columns, as you will use them in later commands.
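The RESTARTS column is often a quick hint about which pods are unstable. As a rough sketch, the following filter keeps only the pods that have restarted at least once. The function name restarted_pods is hypothetical, and the filter assumes the column layout of kubectl get pod -A shown above.

```shell
# Print namespace, pod name, and restart count for pods that restarted at least once.
# Reads the output of `kubectl get pod -A --no-headers` on standard input.
# restarted_pods is a hypothetical helper name.
restarted_pods() {
  # Field 5 is RESTARTS; "+ 0" forces a numeric comparison.
  awk '$5 + 0 > 0 { print $1, $2, $5 }'
}
```

Usage: kubectl get pod -A --no-headers | restarted_pods. A high restart count is not necessarily a fault, but it indicates which pods to describe first.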
Example with additional information:
kubectl get pods -o wide
Example Response:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
z9ntw86-grafana-5c7d9979c6-tsjqc 1/1 Running 2 (28h ago) 21d 10.42.0.190 cshari-asus-nuc <none> <none>
z9ntw86-grafana-test 0/1 Error 0 21d <none> cshari-asus-nuc <none> <none>
zflskkg-grafana-test 0/1 Error 0 22d <none> cshari-asus-nuc <none> <none>
zpv7ltl-edgeai-inference-business-logic-7c9859565c-vk7j8 1/1 Running 0 3h53m 10.42.0.84 cshari-asus-nuc <none> <none>
zpv7ltl-edgeai-inference-openvino-server-d446bbc58-plgrk 1/1 Running 0 3h53m 10.42.0.85 cshari-asus-nuc <none> <none>
zrole5i-grafana-test 0/1 Error 0 47d <none> cshari-asus-nuc <none> <none>
ztggro7-grafana-test 0/1 Error 0 22d <none> cshari-asus-nuc <none> <none>
The NODE column shows where each pod is running. For example, during a node failover you can see a pod move from one node to another.
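To follow pods on one particular node, for instance before and after a failover, you can restrict the listing with a field selector on spec.nodeName. This is a minimal sketch. The function name pods_on_node is hypothetical, and the node name in the usage note is the one from the example output above.

```shell
# List every pod scheduled on one node, across all namespaces.
# pods_on_node is a hypothetical helper name.
pods_on_node() {
  local node="$1"
  # spec.nodeName is a field selector that kubectl supports for pods.
  kubectl get pods -A -o wide --field-selector "spec.nodeName=${node}"
}
```

Usage: pods_on_node cshari-asus-nuc. Running it for each node in turn shows where the workloads ended up.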
The status can indicate the following:
- Pending: Pod has been accepted by the Kubernetes cluster, but one or more of its containers has not been created yet. For example, the cause could be a slow network connection, resource constraints, etc.
- Running: Pod has been successfully scheduled to a node, and all of its containers have been created and are running without any fatal errors.
- Succeeded: All the containers within the pod have completed their tasks successfully and have terminated. This is the expected final state for pods that run a specific job or task to completion (for example, a batch process or a database migration script).
- Error: At least one container in the pod has terminated with an error.
- CrashLoopBackOff: A container in the pod is repeatedly starting, crashing, and being restarted by Kubernetes. For example, the issue could be an application or configuration error.
- Terminating: A pod is in the process of being shut down. For example, the cause could be a deployment update or a manually deleted pod.
- Unknown: The state of the pod could not be determined. For example, the cause could be a network issue or an issue with the node itself.
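Based on the statuses above, the following sketch lists only the pods that are neither Running nor finished successfully. The function name unhealthy_pods is hypothetical. It assumes the column layout of kubectl get pods -A, where STATUS is the fourth column and successfully finished pods are displayed as Completed.

```shell
# Print namespace, pod name, and status for pods that are not Running or Completed.
# Reads the output of `kubectl get pods -A --no-headers` on standard input.
# unhealthy_pods is a hypothetical helper name.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1, $2, $4 }'
}
```

Usage: kubectl get pods -A --no-headers | unhealthy_pods. This is only a first pass: a pod can be Running while its READY column still shows containers that are not ready, so check that column as well.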
Get detailed information about a pod
Get detailed information about a specific pod within a particular namespace, including the pod's current state and history. The Events section is often helpful in diagnosing a problem.
Example Syntax:
kubectl describe pod -n <namespace> <pod-name>
Example Command:
kubectl describe pod -n zks-fleet-system fleet-agent-5d86d659f7-4pbd9
Example Response:
Name: fleet-agent-5d86d659f7-4pbd9
Namespace: zks-fleet-system
Priority: 0
Service Account: fleet-agent
Node: cshari-asus-nuc/10.244.244.1
Start Time: Wed, 24 Sep 2025 08:18:05 +0000
Labels: app=fleet-agent
pod-template-hash=5d86d659f7
Annotations: k8s.v1.cni.cncf.io/network-status:
[{
"name": "cbr0",
"interface": "eth0",
"ips": [
"10.42.0.206"
],
"mac": "6e:93:0a:21:5e:fa",
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "cbr0",
"interface": "eth0",
"ips": [
"10.42.0.206"
],
"mac": "6e:93:0a:21:5e:fa",
"default": true,
"dns": {}
}]
Status: Running
IP: 10.42.0.206
IPs:
IP: 10.42.0.206
Controlled By: ReplicaSet/fleet-agent-5d86d659f7
Containers:
fleet-agent:
Container ID: containerd://4d61bd3bc6cabc132bc20100045cd07b745e4b3351c11d615716073001b69366
Image: zededa/zks-fleet-agent:v0.12.4
Image ID: docker.io/zededa/zks-fleet-agent@sha256:07eebc520444a9ac0afd39637fe42c75c638e66c08657954ef5d9328005b3cdb
Port: <none>
Host Port: <none>
Command:
fleetagent
State: Running
Started: Mon, 13 Oct 2025 17:29:36 +0000
Last State: Terminated
Reason: Unknown
Exit Code: 255
Started: Thu, 09 Oct 2025 23:35:06 +0000
Finished: Mon, 13 Oct 2025 17:28:41 +0000
Ready: True
Restart Count: 2
Environment:
BUNDLEDEPLOYMENT_RECONCILER_WORKERS: 50
DRIFT_RECONCILER_WORKERS: 50
NAMESPACE: zks-fleet-system (v1:metadata.namespace)
AGENT_SCOPE:
CHECKIN_INTERVAL: 15m0s
CATTLE_ELECTION_LEASE_DURATION: 30s
CATTLE_ELECTION_RETRY_PERIOD: 10s
CATTLE_ELECTION_RENEW_DEADLINE: 25s
Mounts:
/.kube from kube (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p9z5v (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-p9z5v:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: cattle.io/os=linux:NoSchedule
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 24h (x4 over 24h) kubelet MountVolume.SetUp failed for volume "kube-api-access-p9z5v" : object "zks-fleet-system"/"kube-root-ca.crt" not registered
Normal SandboxChanged 24h kubelet Pod sandbox changed, it will be killed and re-created.
Normal AddedInterface 24h multus Add eth0 [10.42.0.206/24] from cbr0
Normal Pulling 24h kubelet Pulling image "zededa/zks-fleet-agent:v0.12.4"
Normal Pulled 24h kubelet Successfully pulled image "zededa/zks-fleet-agent:v0.12.4" in 1.561s (1.561s including waiting)
Normal Created 24h kubelet Created container fleet-agent
Normal Started 24h kubelet Started container fleet-agent
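Because the describe output is long, it can be useful to pull out only the Warning rows of the Events section. The following is a small sketch of such a filter. The function name pod_warning_events is hypothetical, and it assumes the Events layout shown in the example response above.

```shell
# Print only the Warning rows from the Events section of `kubectl describe pod`.
# Reads the describe output on standard input.
# pod_warning_events is a hypothetical helper name.
pod_warning_events() {
  # Start matching after the "Events:" line, then keep rows whose Type is Warning.
  awk '/^Events:/ { in_events = 1; next } in_events && $1 == "Warning"'
}
```

Usage: kubectl describe pod -n zks-fleet-system fleet-agent-5d86d659f7-4pbd9 | pod_warning_events. For the example above, this would leave only the FailedMount row.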
Get the logs of a specific pod
Check the logs of a problematic pod. This command targets one specific pod in a specific namespace. Add the -f flag, as in the examples below, to stream new log lines in real time.
Example Syntax:
kubectl logs -n <namespace> <pod-name> -f
Example Command:
kubectl logs -n zks-fleet-system fleet-agent-5d86d659f7-4pbd9 -f
Example Response:
{"level":"info","ts":"2025-10-14T14:12:12Z","logger":"bundledeployment.update-status","msg":"Status not ready according to nonModified and nonReady","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"z9ntw86","namespace":"cluster-4422adc422c4ddf8-113bf54fcabd86b0-c-gknjd-dd7925507f33"},"namespace":"cluster-4422adc422c4ddf8-113bf54fcabd86b0-c-gknjd-dd7925507f33","name":"z9ntw86","reconcileID":"037b07be-5ae8-442a-8cba-24db0862b99e","nonModified":true,"nonReady":[{"uid":"19c898da-b80f-4df7-b64b-77014f26d22b","kind":"Pod","apiVersion":"v1","namespace":"default","name":"z9ntw86-grafana-test","summary":{"state":"unavailable","transitioning":true,"message":["Pod has completed, but not successfully"]}}]}
I1014 14:14:12.084027 1 reflector.go:556] "Warning: watch ended with error" reflector="pkg/mod/k8s.io/client-go@v0.33.1/tools/cache/reflector.go:285" type="*v1alpha1.BundleDeployment" err="an error on the server (\"unable to decode an event from the watch stream: stream error: stream ID 311; INTERNAL_ERROR; received from peer\") has prevented the request from succeeding"
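When a pod has restarted (Restart Count greater than 0 in the describe output), the current logs may not show why the previous container instance ended. As a sketch, the helper below fetches the tail of the previous instance's logs with the --previous flag. The function name previous_logs and the default of 100 lines are assumptions chosen for illustration.

```shell
# Show the last lines logged by the previous (terminated) container instance of a pod.
# previous_logs is a hypothetical helper name; the 100-line default is arbitrary.
previous_logs() {
  local namespace="$1"
  local pod="$2"
  local lines="${3:-100}"
  kubectl logs -n "$namespace" "$pod" --previous --tail="$lines"
}
```

Usage: previous_logs zks-fleet-system fleet-agent-5d86d659f7-4pbd9. If the container has never restarted, kubectl reports an error instead of logs.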
Get the log from the fleet-agent
Use the label selector approach when you want to check the logs of an application or service (such as fleet-agent) regardless of which pod instance is currently running it. Add the -f flag to stream new log lines in real time.
Example syntax:
kubectl logs -l app=fleet-agent -n <namespace> -f
Example:
kubectl logs -l app=fleet-agent -n zks-fleet-system -f
Example Response:
{"level":"info","ts":"2025-10-14T17:58:36Z","logger":"bundledeployment.update-status","msg":"Status not ready according to nonModified and nonReady","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"z9ntw86","namespace":"cluster-4422adc422c4ddf8-113bf54fcabd86b0-c-gknjd-dd7925507f33"},"namespace":"cluster-4422adc422c4ddf8-113bf54fcabd86b0-c-gknjd-dd7925507f33","name":"z9ntw86","reconcileID":"058e2218-5256-4290-8fb4-bfe2957410ad","nonModified":true,"nonReady":[{"uid":"19c898da-b80f-4df7-b64b-77014f26d22b","kind":"Pod","apiVersion":"v1","namespace":"default","name":"z9ntw86-grafana-test","summary":{"state":"unavailable","transitioning":true,"message":["Pod has completed, but not successfully"]}}]}
I1014 18:01:35.926255 1 reflector.go:556] "Warning: watch ended with error" reflector="pkg/mod/k8s.io/client-go@v0.33.1/tools/cache/reflector.go:285" type="*v1alpha1.BundleDeployment" err="an error on the server (\"unable to decode an event from the watch stream: stream error: stream ID 351; INTERNAL_ERROR; received from peer\") has prevented the request from succeeding"
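When you select pods by label, kubectl logs returns only the last 10 lines per pod by default, which can hide the relevant entries. The sketch below widens the window with --tail=-1, limits it by time with --since, and adds --prefix so each line shows which pod produced it. The function name fleet_agent_logs and the one-hour default are assumptions for illustration.

```shell
# Print recent fleet-agent logs from every matching pod, prefixed with the pod name.
# fleet_agent_logs is a hypothetical helper name; the 1h default is arbitrary.
fleet_agent_logs() {
  local namespace="${1:-zks-fleet-system}"
  local since="${2:-1h}"
  kubectl logs -l app=fleet-agent -n "$namespace" --since="$since" --tail=-1 --prefix
}
```

Usage: fleet_agent_logs, or fleet_agent_logs zks-fleet-system 24h to look further back.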
Collect-info
You can also run collect-info.sh on each node in the cluster to capture hardware component logs, network logs, operating system logs, and Kubernetes pod logs (persist-kubelog/pods/).