This draft comes from a real recovery note I kept while bringing back a broken FusionAuth environment. The original version contained live database credentials, internal service names, and environment-specific details, so I rewrote it here using placeholders and a cleaner sequence.

This was not a greenfield deployment. It was recovery work under pressure, which is exactly why I wanted the note preserved.

What Went Wrong

The service had to be recovered from database dumps, and Elasticsearch instability was part of the problem: in the original logs, Elasticsearch components were crash-looping and the application could not come back cleanly against the existing state.

That meant the recovery plan had to cover both:

  • the application database state
  • the search layer state

Recovery Strategy

The steps I followed were roughly:

  1. Stop the application
  2. Preserve or locate the database dumps
  3. Clean or rebuild Elasticsearch state
  4. Recreate the database state from schema and data dumps
  5. Start FusionAuth again
  6. Watch logs closely until the application fully stabilized

1. Scale Down FusionAuth

Before touching the database or search layer, stop the application:

kubectl scale deployment/fusionauth-app -n identity-prod --replicas=0

This avoids the application writing into a half-restored environment.
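If you want to block until the scale-down has actually taken effect before touching state, a small wait loop works. This is a sketch I would add today, not part of the original note; it relies on kubectl omitting .status.replicas from the deployment status once the count reaches zero, and it reuses the deployment and namespace names from this note.

```shell
# Hypothetical helper: return once the deployment reports no replicas.
# kubectl drops .status.replicas when it reaches zero, so an empty
# jsonpath result means the scale-down has completed.
wait_for_scale_down() {
  local dep="$1" ns="$2"
  until [ -z "$(kubectl get deployment "$dep" -n "$ns" -o jsonpath='{.status.replicas}')" ]; do
    sleep 5
  done
}

# wait_for_scale_down fusionauth-app identity-prod
```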

2. Keep Schema and Data Dumps Separate

One useful detail in the original note was keeping schema and data as separate dump files. That makes the restore process easier to reason about and often easier to repeat.

If you are taking the dumps as part of migration or recovery, a pattern like this works:

kubectl exec -n identity-prod deployment/postgres-prod -- bash -c \
  "PGPASSWORD='<postgres-admin-password>' pg_dump -p 3001 -U postgres_admin \
  -h postgres-prod.identity-prod.svc.cluster.local --schema-only fusionauth" \
  > fusionauth_schemaonly.dmp

kubectl exec -n identity-prod deployment/postgres-prod -- bash -c \
  "PGPASSWORD='<postgres-admin-password>' pg_dump -p 3001 -U postgres_admin \
  -h postgres-prod.identity-prod.svc.cluster.local --data-only fusionauth" \
  > fusionauth_dataonly.dmp
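However the dumps were produced, it is worth sanity-checking each file before deleting anything downstream. This helper is not from the original note; it is a local convenience function I would add today:

```shell
# Hypothetical helper: fail fast on empty or corrupt dump files.
verify_dump() {
  local f="$1"
  [ -s "$f" ] || { echo "dump $f is empty or missing" >&2; return 1; }
  case "$f" in
    # Only gzip-compressed files get the integrity test.
    *.gz|*.tgz) gzip -t "$f" || { echo "dump $f failed gzip check" >&2; return 1; } ;;
  esac
  echo "dump $f looks usable"
}

# verify_dump <dump-file>
```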

3. Clean Up Elasticsearch State

In the original recovery, Elasticsearch was part of the failure pattern. I first checked pod status and logs to understand whether the cluster was salvageable or needed a clean rebuild:

kubectl get pods -n identity-prod | grep elasticsearch
kubectl logs -n identity-prod elasticsearch-master-0

One of the recorded problems was memory pressure and repeated crash loops. When search indices are disposable or rebuildable from the source application, clearing them can be part of the recovery path.
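Before deciding between repair and rebuild, I would pull the cluster status with a quick check like this (a sketch, assuming the in-cluster service name used elsewhere in this note):

```shell
# Prints the "status" field (green / yellow / red) from _cluster/health.
# Not from the original note; just a convenience wrapper around curl.
es_health() {
  curl -s "$1/_cluster/health" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# es_health http://elasticsearch.identity-prod.svc.cluster.local:9200
```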

If destructive index operations are justified, the workflow looks like this:

curl -XPUT "http://elasticsearch.identity-prod.svc.cluster.local:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "action.destructive_requires_name": false
    }
  }'

curl -XDELETE "http://elasticsearch.identity-prod.svc.cluster.local:9200/_all"

That step should only be used when you understand the data impact and know the application can rebuild the missing indices. If you disabled action.destructive_requires_name to make it possible, set it back to true once the cleanup is done.
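A quick way to confirm the delete actually emptied the cluster is to count the rows of _cat/indices, which prints one line per index. Again a sketch using the same assumed service URL:

```shell
# Returns the number of indices; an empty cluster prints no rows.
count_indices() {
  curl -s "$1/_cat/indices" | wc -l
}

# count_indices http://elasticsearch.identity-prod.svc.cluster.local:9200
```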

4. Reinstall or Reconfigure Elasticsearch If Needed

In this case, the recovery note also captured a scenario where Elasticsearch needed to be reinstalled with more realistic memory settings.

Example pattern:

helm -n identity-prod delete elasticsearch

helm install -n identity-prod elasticsearch bitnami/elasticsearch \
  --set data.replicaCount=1 \
  --set master.replicaCount=1 \
  --set coordinating.replicaCount=1 \
  --set data.heapSize=5120m \
  --set data.resources.requests.memory=10240Mi \
  --set master.heapSize=2048m \
  --set master.resources.requests.memory=4096Mi \
  --set coordinating.heapSize=2048m \
  --set coordinating.resources.requests.memory=4096Mi

If the cluster requires host-level sysctl tuning, apply that before expecting the pods to stay healthy:

sudo tee -a /etc/sysctl.conf >/dev/null <<'EOF'
vm.max_map_count=262144
fs.file-max=65536
EOF

sudo sysctl -p
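To confirm the kernel actually picked the values up, read them back from /proc (no root needed):

```shell
# Effective values as the kernel currently sees them.
cat /proc/sys/vm/max_map_count
cat /proc/sys/fs/file-max
```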

5. Recreate the Database

Once the search layer is either clean or stable enough, restore the application database.

A representative sequence is:

kubectl exec -n identity-prod -it deployment/postgres-prod -- bash

Inside the database container or against the database service:

PGPASSWORD='<postgres-admin-password>' dropdb -p 3001 -U postgres_admin -h postgres-prod fusionauth
PGPASSWORD='<postgres-admin-password>' psql -p 3001 -U postgres_admin -h postgres-prod

Then recreate the database and role as needed:

CREATE DATABASE fusionauth;
CREATE ROLE fusionauth_user WITH LOGIN PASSWORD '<fusionauth-db-password>';
GRANT ALL PRIVILEGES ON DATABASE fusionauth TO fusionauth_user;
ALTER DATABASE fusionauth OWNER TO fusionauth_user;

Restore schema and data separately:

PGPASSWORD='<postgres-admin-password>' psql -p 3001 -U postgres_admin -h postgres-prod -d fusionauth < fusionauth_schemaonly.dmp
PGPASSWORD='<postgres-admin-password>' psql -p 3001 -U postgres_admin -h postgres-prod -d fusionauth < fusionauth_dataonly.dmp

6. Start FusionAuth Again

After the database and search layer are ready, start the application back up:

kubectl scale deployment/fusionauth-app -n identity-prod --replicas=1

Then watch the logs closely:

kubectl logs -f deployment/fusionauth-app -n identity-prod

What I wanted to see during recovery was:

  • the application loading its config cleanly
  • database version checks passing
  • search connections succeeding
  • missing indices being recreated instead of hard failing

Why This Was Worth Writing Down

This kind of note is useful because outage recovery is rarely one command. It is sequencing work. If you do the right actions in the wrong order, recovery takes longer or gets messier.

For me, the value of the note was not just the commands themselves. It was the operational pattern:

  • stop writes first
  • understand which layers are corrupted or disposable
  • restore the durable state
  • bring services back in a controlled order

Closing Thought

This is exactly the kind of work log I like to keep. It captures what happened, what failed, what I changed, and how I got the service back. Once sanitized, it also becomes a useful reference for other admins dealing with stateful application recovery in Kubernetes.