Longhorn storage problems can look like application problems at first. Pods stay in ContainerCreating, the deployment never becomes healthy, and the application team naturally assumes the application is broken.

This note is about proving when the real problem is volume attachment instead.

1. Reproduce the Problem with a Minimal PVC

The useful part of the original note was not the failing workload name. It was the decision to reproduce the behavior with a minimal PVC and a simple BusyBox pod.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: my-container
      image: busybox
      command: ["/bin/sh", "-c", "while true; do echo $(date) >> /mnt/data/log; sleep 5; done"]
      volumeMounts:
        - name: my-pvc-volume
          mountPath: /mnt/data
  volumes:
    - name: my-pvc-volume
      persistentVolumeClaim:
        claimName: my-pvc

That is a good troubleshooting habit. Strip the workload down until the storage behavior is the only thing left.
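A quick way to run that reproduction, assuming the manifest above is saved as repro.yaml (the filename is mine, not from the original note):

```shell
# Apply the minimal PVC + pod
kubectl apply -f repro.yaml

# Confirm the PVC actually bound before suspecting attachment
kubectl get pvc my-pvc

# Watch the pod; if it sits in ContainerCreating, the problem is
# below the container runtime, not inside the application
kubectl get pod my-pod -w
```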

2. Check Events Instead of Guessing

When the pod was stuck in ContainerCreating, the next useful command was:

kubectl get events -n <namespace>

The key signal in the note was:

AttachVolume.Attach failed ... rpc error: code = DeadlineExceeded

That tells you the application is not even at the stage where its own logs matter yet. The volume is not attaching cleanly to the node.
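To cut straight to that signal instead of scrolling the whole event stream, a filtered variant can help (my sketch, not from the original note):

```shell
# Sort events by time and keep only attach/mount failures
kubectl get events -n <namespace> --sort-by=.lastTimestamp | grep -iE 'attach|mount'

# Or read just the Events section of the stuck pod
kubectl describe pod my-pod -n <namespace> | sed -n '/^Events:/,$p'
```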

3. Check the Longhorn Volume State Directly

The other good move in the note was checking the Longhorn objects themselves:

kubectl get volumes.longhorn.io -A

(The fully qualified resource name avoids ambiguity, since Longhorn volumes are a custom resource rather than a core Kubernetes object.)

That made it possible to see states like:

  • attaching
  • detached
  • detaching
  • faulted

Once you can line up the stuck pod with a volume stuck in attaching, the story becomes much clearer.
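A sketch for lining those up, assuming Longhorn is installed in its usual longhorn-system namespace; the volume's state and robustness live under `.status` on the CRD:

```shell
# List every Longhorn volume with its state and robustness
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness

# Show only volumes that are not cleanly attached or detached
# (this passes "attaching", "detaching", and "faulted" through)
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,STATE:.status.state | grep -vE 'attached|detached'
```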

4. Use Force Deletes Carefully

The note also included force deletion of stuck PVC and namespace resources:

kubectl delete pvc <name> -n <namespace> --grace-period=0 --force
kubectl delete namespace <name> --grace-period=0 --force

That can be useful during cleanup, but I treat it as cleanup, not diagnosis. I still want the evidence first.
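When even a force delete hangs, the usual culprit is a finalizer, not Kubernetes ignoring the command. A hedged cleanup sketch (clearing a finalizer skips whatever cleanup it was waiting on, so do this only after the evidence above has been collected):

```shell
# A PVC stuck in Terminating typically still carries a protection finalizer
kubectl get pvc <name> -n <namespace> -o jsonpath='{.metadata.finalizers}'

# Clearing the finalizers lets the delete complete, at the cost of
# bypassing the controller's own cleanup -- last resort, not diagnosis
kubectl patch pvc <name> -n <namespace> --type=merge -p '{"metadata":{"finalizers":null}}'
```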

Closing Thought

The best part of this note is the troubleshooting shape:

  1. reproduce with a tiny PVC and pod
  2. inspect the pod events
  3. inspect the Longhorn volume state
  4. only then decide whether the problem is application-side or storage-side

That pattern is worth keeping.