There is a difference between installing monitoring and actually making it useful for an operations team.

This note leaned toward the latter: not just getting Prometheus into the cluster, but wiring up alerting and service exposure so the stack is usable from outside the cluster too.

1. Install the Base Stack

The initial deployment was the usual kube-prometheus flow:

git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus

kubectl apply --server-side -f manifests/setup
kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring
kubectl apply -f manifests/
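
Before exposing anything, it can help to confirm the stack actually came up. A minimal sketch, assuming the default `monitoring` namespace; the function name and the 300-second timeout are illustrative choices, not part of the original note:

```shell
# Block until every pod in the namespace reports Ready, or time out.
# The namespace defaults to "monitoring"; the timeout is an arbitrary value.
wait_for_monitoring() {
  local ns="${1:-monitoring}"
  kubectl wait --for=condition=Ready pods --all --namespace="$ns" --timeout=300s
}
```

Calling `wait_for_monitoring` with no arguments checks the `monitoring` namespace and returns non-zero if any pod is still not Ready when the timeout expires.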

Then the monitoring services were exposed:

kubectl patch svc -n monitoring prometheus-k8s -p '{"spec":{"type":"LoadBalancer"}}'
kubectl patch svc -n monitoring grafana -p '{"spec":{"type":"LoadBalancer"}}'
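
After the patch, the cloud provider (or whatever implements LoadBalancer services in the cluster) still has to assign an address. The helper below is a sketch for reading that address back; `lb_address` is a hypothetical name, and the jsonpath prints whichever of `ip` or `hostname` the provider filled in:

```shell
# Print the external address assigned to a LoadBalancer service.
# Some providers populate .ip, others .hostname; the unused one prints as empty.
lb_address() {
  local svc="$1" ns="${2:-monitoring}"
  kubectl get svc "$svc" --namespace="$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
}
```

For example, `lb_address prometheus-k8s` and `lb_address grafana`. In the kube-prometheus manifests the Prometheus service listens on port 9090 and Grafana on 3000. Recent kube-prometheus releases also ship NetworkPolicy manifests for these pods, so on clusters that enforce network policies the address may be assigned but still unreachable until those policies are adjusted.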

2. Add Alertmanager Configuration

The note also included the operational step that often gets delayed: actually wiring alert delivery.

The generalized shape looked like this:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alerts-config
  namespace: monitoring
spec:
  receivers:
    - name: slack
      slackConfigs:
        - apiURL:
            key: webhook
            name: alerts-secret
          channel: "#alerts"
          sendResolved: true
  route:
    receiver: slack

The original note contained a live webhook secret and internal channel details, so those are intentionally replaced here.
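
The manifest refers to a Secret named `alerts-secret` with a key `webhook`, so that Secret has to exist in the same namespace. A sketch of creating it and of checking whether the Alertmanager resource selects any AlertmanagerConfig objects at all; the function names are hypothetical, the webhook URL is supplied by the caller, and `main` is assumed to be the Alertmanager name from the stock kube-prometheus manifests:

```shell
# Create the Secret that the AlertmanagerConfig's apiURL reference points at.
# Secret name and key match the manifest; the URL is passed in by the caller.
create_slack_secret() {
  local webhook_url="$1" ns="${2:-monitoring}"
  kubectl create secret generic alerts-secret \
    --namespace="$ns" \
    --from-literal=webhook="$webhook_url"
}

# Show the selector the Alertmanager resource uses to pick up
# AlertmanagerConfig objects. Assumes the Alertmanager is named "main".
show_config_selector() {
  kubectl get alertmanager main --namespace="${1:-monitoring}" \
    -o jsonpath='{.spec.alertmanagerConfigSelector}'
}
```

If `show_config_selector` prints nothing, the operator may not pick up the AlertmanagerConfig until a selector is set on the Alertmanager resource. It is also worth knowing that, by default, the operator scopes a namespaced AlertmanagerConfig to alerts carrying a matching `namespace` label, so alerts from other namespaces may not reach this route.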

3. Extend the Prometheus RBAC

Like the other monitoring note in this batch, this setup needed more permissions than the default Prometheus cluster role grants.

The useful part to preserve is the pattern:

  • check what Prometheus is trying to scrape
  • compare that to the current role
  • add the missing get, list, and sometimes watch verbs for the relevant resources

That is not glamorous work, but it is the sort of thing you end up doing in real clusters.
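
The comparison step can be sketched with `kubectl auth can-i`, impersonating the Prometheus service account. This assumes the service account is `prometheus-k8s` in the `monitoring` namespace, as in the stock kube-prometheus manifests; the function names are hypothetical:

```shell
# Ask the API server whether the Prometheus service account may perform
# a verb on a resource in a namespace. Prints "yes" or "no".
prom_can_i() {
  local verb="$1" resource="$2" ns="${3:-default}"
  kubectl auth can-i "$verb" "$resource" \
    --namespace="$ns" \
    --as=system:serviceaccount:monitoring:prometheus-k8s
}

# Print the verbs (out of get, list, watch) that are still missing
# for one resource in one namespace, separated by spaces.
missing_verbs() {
  local resource="$1" ns="${2:-default}" verb missing=""
  for verb in get list watch; do
    if [ "$(prom_can_i "$verb" "$resource" "$ns")" != "yes" ]; then
      missing="$missing $verb"
    fi
  done
  echo "${missing# }"
}
```

Running `missing_verbs pods some-namespace` (with a real namespace in place of the placeholder) prints whichever verbs need to be added to a Role or ClusterRole bound to that service account; an empty line means nothing is missing. Impersonation with `--as` requires that the caller is itself allowed to impersonate service accounts.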

4. Treat Monitoring as a Real Service

The practical lesson from this note is that monitoring is not “done” when the pods are Running.

It is done when:

  • Prometheus can actually see the resources you care about
  • Grafana is reachable where operators need it
  • Alertmanager is wired to something a human will actually see

That is a different definition of success, and it is usually the more useful one.
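
Those three checks can be turned into a rough smoke test. The sketch below assumes `curl` and `grep` are available and that each function is given a base URL such as `http://ADDRESS:PORT`; the function names, the `SmokeTest` alert name, and the label values are all illustrative:

```shell
# Count active scrape targets that Prometheus does not report as "up".
unhealthy_targets() {
  curl -fsS "$1/api/v1/targets?state=active" \
    | grep -o '"health":"[a-z]*"' \
    | grep -vc '"health":"up"'
}

# Succeed only if Grafana answers its health endpoint.
grafana_healthy() {
  curl -fsS -o /dev/null "$1/api/health"
}

# Post a synthetic alert to Alertmanager so a human can confirm it arrives.
# The namespace label is an assumption: it matches the namespace of the
# AlertmanagerConfig so that the namespaced route applies to this alert.
send_test_alert() {
  curl -fsS -X POST "$1/api/v2/alerts" \
    -H 'Content-Type: application/json' \
    -d '[{"labels":{"alertname":"SmokeTest","severity":"info","namespace":"monitoring"}}]'
}
```

Prometheus and Grafana can be reached through their LoadBalancer addresses. Alertmanager was not exposed in this note, so one option is `kubectl port-forward -n monitoring svc/alertmanager-main 9093` and then `send_test_alert http://localhost:9093`. The last check only passes when the message actually shows up in the Slack channel.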