This draft comes from a troubleshooting note about an NGINX ingress returning 504 Gateway Timeout for a Kubernetes application. The original note was short, so I treated this as a first-pass draft built around the debugging workflow I followed, rather than a fully polished post with every final environment detail.

Even so, the pattern is useful because the core question is common: how do you debug a timeout when the application looks healthy from inside the cluster?

The Symptom

The public endpoint timed out through ingress, even though the application pods seemed to be running.

That immediately narrowed the possible fault lines to:

  • ingress rule configuration
  • service wiring
  • pod readiness
  • connectivity between ingress and backend
  • application behavior behind the service

1. Check the Ingress Rule

Start with the ingress object itself:

kubectl edit ingress -n develop-env app-ingress
kubectl get ingress -n develop-env app-ingress -o yaml

A representative rule looked like this:

spec:
  ingressClassName: nginx
  rules:
    - host: tenant1-dev.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: admin-ui
                port:
                  number: 8080

At this stage I wanted to confirm that:

  • the host matched the request I was testing
  • the backend service name was correct
  • the port matched the service definition

2. Check the Service

Then inspect the backend service:

kubectl get svc -n develop-env

The goal here is to verify that the service exists, exposes the expected port, and points to the right workload.
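A quick way to verify all three at once is to look at the service together with its endpoints. This is a sketch; the service name `admin-ui` and namespace `develop-env` are taken from the ingress rule above:

```shell
# Confirm the service exists and exposes the expected port
kubectl get svc -n develop-env admin-ui -o wide

# Confirm the service actually selects ready pods: an empty
# ENDPOINTS column means the selector matches nothing, which
# also surfaces as gateway errors at the ingress
kubectl get endpoints -n develop-env admin-ui
```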

3. Check the Pods

Next, confirm the backend pods are actually healthy:

kubectl get pods -n develop-env
kubectl logs -n develop-env admin-ui-<pod-id> -f

If the pods are not ready, the ingress is only showing you the symptom. The real problem is deeper in the stack.

In the original note, the pods were running, which made the timeout more interesting.
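Worth remembering here: "Running" is not the same as "Ready". A pod can be running while its readiness probe fails, which silently removes it from the service's endpoints. A sketch for checking that, reusing the pod-name placeholder from the note:

```shell
# Check READY column and pod IPs, not just STATUS
kubectl get pods -n develop-env -o wide

# Look at the Conditions and Events sections for failed
# readiness or liveness probes
kubectl describe pod -n develop-env admin-ui-<pod-id>
```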

4. Test From the Ingress Controller

This is the step I value most in notes like this. Instead of guessing, I check from inside the ingress controller pod:

kubectl exec -it -n ingress-nginx ingress-nginx-controller-<pod-id> -- bash

Then curl the backend service directly:

curl -I http://10.233.42.115:8080

In the original note, that returned HTTP/1.1 200 OK.
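Note that this curl bypasses NGINX's own routing: it talks to the backend IP directly. To exercise the same path an external request takes, you can also curl the controller's local listener with the host header from the ingress rule. A sketch, assuming the controller serves plain HTTP on port 80 inside its pod:

```shell
# From inside the ingress-nginx controller pod: hit the local NGINX
# listener with the Host header from the ingress rule, so the request
# goes through the same server/location blocks as external traffic
curl -I -H "Host: tenant1-dev.example.com" http://127.0.0.1/
```

If the direct curl succeeds but this one times out or 404s, the problem is in the generated NGINX routing rather than the backend.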

That one test tells you a lot. If the ingress controller can reach the backend directly and the backend responds successfully, then the timeout is less likely to be basic network reachability and more likely to be one of:

  • ingress config mismatch
  • host or path mismatch
  • upstream timeout settings
  • request routing inconsistency
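On the timeout-settings point: if the backend is healthy but slow, ingress-nginx's default proxy timeouts (60 seconds) can produce a 504 before the application finishes answering. They can be raised per ingress with annotations; the values below are illustrative, not ones from the original note:

```yaml
metadata:
  annotations:
    # Wait up to 120s for the upstream to respond before returning 504
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    # Fail faster on connection establishment
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
```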

5. Narrow the Fault Line

At this point, the problem space becomes much smaller:

  • the backend service is alive
  • the application is responding
  • ingress can reach the backend IP

So the next place to focus is the actual ingress definition and generated NGINX behavior rather than the application process itself.
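One concrete way to examine the generated NGINX behavior is to dump the rendered configuration from the controller and find the server block for the host. A sketch, reusing the controller pod placeholder from step 4:

```shell
# Dump the full rendered NGINX configuration and locate the
# server block generated for the ingress host
kubectl exec -n ingress-nginx ingress-nginx-controller-<pod-id> -- nginx -T \
  | grep -n -A 5 "tenant1-dev.example.com"
```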

This is the kind of transition point I like to record in a work log because it prevents wasted time. Once you know ingress can curl the backend successfully, you stop blaming the wrong layer.

Why This Draft Is Still Useful

The original note did not capture every final conclusion, so I want to be honest about that. But it still contains a strong troubleshooting structure, and that alone is worth documenting:

  1. inspect the ingress
  2. inspect the service
  3. inspect the pods
  4. test from inside the ingress controller
  5. narrow the fault to routing or ingress behavior

That sequence is reusable even when the exact root cause differs from one incident to another.

Closing Thought

Some of the best technical posts start as incomplete notes. What makes them useful is not perfect recall of every environment detail. It is capturing the diagnostic path clearly enough that the next person can reason through the same class of problem faster.