# Postmortem: Stale App Paths and Gateway Resource Conflicts
Date: 2026-03-02
Cluster: eu-dev-rancher (Rancher Desktop / k3s)
Trigger: Multiple ArgoCD apps showing Unknown sync or Degraded health after the platform/ → infra/ rename refactor
## Summary
After renaming `platform/` to `infra/` and `environments/*/platform/` to `environments/*/apps/` (commit cc7bdf3), issues emerged across six ArgoCD apps:
| App | Symptom | Root Cause |
|---|---|---|
| root-eu-dev-rancher | Unknown sync | Bootstrap not re-applied; cluster still had old path |
| cert-manager-config | Unknown sync | Child of root; inherited stale state |
| gateway-api-crds | Unknown sync | Kustomize remote base unsupported in ArgoCD build env |
| gateway-api-config | Unknown sync + Degraded | Old paths + Gateway listener port mismatch |
| alpha-svc3 | Degraded | HTTPRoute attached to unprogrammed Gateway |
| traefik | OutOfSync cycling | GatewayClass owned by both Helm app and infra kustomization |
## Root Cause Detail

### 1. Stale Root App Path (Unknown sync cascade)
The root app (root-eu-dev-rancher) is bootstrapped manually via kubectl apply. When bootstrap.yaml in Git changed path: environments/eu-dev-rancher/platform → path: environments/eu-dev-rancher/apps, the cluster object was never updated because no ArgoCD Application manages the root app itself.
ArgoCD tried to compare live child Applications against the old, now-deleted platform/ path. Since the path didn’t exist, all child apps became Unknown.
Key lesson: The bootstrap/root Application lives outside ArgoCD’s control. Any path change in bootstrap.yaml must be manually re-applied with kubectl apply.
Fix: `kubectl apply -f environments/eu-dev-rancher/bootstrap.yaml`
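For reference, a minimal sketch of what the root Application in bootstrap.yaml looks like after the rename. Only the `path` value comes from this incident; the repoURL, revision, and sync policy are placeholders:

```yaml
# Sketch of the root Application (illustrative; repoURL and other
# fields are placeholders - only the path is from this postmortem).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-eu-dev-rancher
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/gitops-repo.git   # placeholder
    targetRevision: main                           # placeholder
    path: environments/eu-dev-rancher/apps         # was: environments/eu-dev-rancher/platform
  destination:
    server: https://kubernetes.default.svc
```

Because no Application manages this object, the only way this spec change reaches the cluster is the manual `kubectl apply` above.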
### 2. Gateway Listener Port Mismatch (gateway-api-config Degraded + svc3 Degraded)

The Gateway resource in `environments/eu-dev-rancher/gateway-api/gateway.yaml` had `port: 80`. Traefik's internal entryPoint `web` listens on port 8000 (the LoadBalancer maps external 80 → internal 8000). Traefik rejected the Gateway listener because no internal entryPoint matches port 80, and the Gateway was never programmed.
Because the Gateway was not programmed, the HTTPRoute for svc3 got Accepted: False / NotAllowedByListeners → Degraded health.
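That rejection surfaces in the HTTPRoute status roughly as follows. This is a sketch based on Gateway API status conventions, not captured output, and the parent Gateway name is a placeholder:

```yaml
# Illustrative HTTPRoute status fragment (not actual captured output).
status:
  parents:
    - parentRef:
        name: traefik-gateway        # placeholder Gateway name
      conditions:
        - type: Accepted
          status: "False"
          reason: NotAllowedByListeners
```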
The correct port to specify in a Gateway resource is Traefik’s internal entryPoint port, not the external LoadBalancer port.
Fix: Changed `port: 80` → `port: 8000` in gateway.yaml.
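A sketch of the corrected listener; the Gateway name and namespace are placeholders, the class and port are from this incident:

```yaml
# gateway.yaml (sketch) - the listener port must be Traefik's internal
# entryPoint port, not the external LoadBalancer port.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: traefik-gateway        # placeholder name
  namespace: default           # placeholder namespace
spec:
  gatewayClassName: traefik
  listeners:
    - name: web
      protocol: HTTP
      port: 8000               # internal entryPoint port (was 80)
```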
### 3. Kustomize Remote Base Unsupported (gateway-api-crds Unknown)

`infra/gateway-api/crds/kustomization.yaml` referenced a remote base (a github.com/... URL in its resources list). ArgoCD's bundled kustomize treated the URL as a local path rather than fetching it remotely, so the manifest build failed and the app reported Unknown.
The Gateway API CRDs are already installed in the cluster (Rancher Desktop / Traefik Helm install them). The Application was both broken and redundant.
Fix: Deleted environments/eu-dev-rancher/apps/gateway-api-crds.yaml. CRDs remain installed and are not managed by ArgoCD.
Key lesson: ArgoCD’s kustomize build does not support github.com/... remote bases in this environment. Use local YAML files, Helm charts, or pre-install CRDs outside ArgoCD when remote bases are needed.
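For illustration, the unsupported pattern looks like the snippet below. The URL is hypothetical, not the one from this repo:

```yaml
# kustomization.yaml with a remote base (hypothetical URL) -
# this is the form that ArgoCD's bundled kustomize failed to fetch:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/example-org/example-crds//config/crd?ref=v1.0.0
```

The workaround is to vendor the manifests into the repo as plain local YAML files, use a Helm chart, or install the CRDs outside ArgoCD entirely, as was done here.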
### 4. Shared Resource Conflict - GatewayClass (traefik OutOfSync cycling)
GatewayClass/traefik was managed by both:
- The `traefik` Helm Application (via the Helm chart)
- The `gateway-api-config` Application (via `infra/gateway-api/kustomization.yaml` → `gatewayclass.yaml`)
This created an ownership tug-of-war: each app synced its own version of the resource, immediately making the other OutOfSync. The traefik app cycled between Synced and OutOfSync roughly every 30 seconds.
ArgoCD reported: GatewayClass/traefik is part of applications argocd/traefik and gateway-api-config
Fix: Removed gatewayclass.yaml from infra/gateway-api/kustomization.yaml. The Traefik Helm chart is now the sole owner of GatewayClass/traefik.
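After the fix, `infra/gateway-api/kustomization.yaml` no longer lists `gatewayclass.yaml`. A sketch, assuming the remaining resources include the Gateway manifest (the exact file list is not recorded in this postmortem):

```yaml
# infra/gateway-api/kustomization.yaml (sketch; file list is assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gateway.yaml
  # gatewayclass.yaml removed - the Traefik Helm chart is the sole
  # owner of GatewayClass/traefik
```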
Key lesson: A single Kubernetes resource must have a single ArgoCD Application as its owner. When a Helm chart creates a resource (GatewayClass, CRDs, etc.), do not also declare that resource in a separate Kustomize or raw-yaml Application.
## Timeline
| Time (UTC) | Event |
|---|---|
| 2026-03-01 ~23:30 | Refactor commit cc7bdf3 renames platform/ → infra/ and apps/; Git paths updated |
| 2026-03-02 15:30 | Issue discovered: 5 apps in Unknown/Degraded state |
| 15:35 | Diagnosed: root app stale, gateway port wrong, kustomize remote base broken, shared GatewayClass |
| 15:36 | Fix 1: gateway.yaml port 80 → 8000, committed and pushed |
| 15:36 | Fix 2: kubectl apply bootstrap.yaml to update root app path in cluster |
| 15:36 | ArgoCD cascade syncs; cert-manager-config, gateway-api-config, svc3 all recover |
| 15:38 | Fix 3 + 4: Remove gateway-api-crds.yaml; remove gatewayclass.yaml from infra kustomization |
| 15:40 | Force sync traefik app to recreate GatewayClass; all apps green |
## Commits Applied
| SHA | Description |
|---|---|
| 327fca7 | fix: gateway listener port 80 → 8000 to match Traefik internal entryPoint |
| e5c24fb | fix: remove broken gateway-api-crds app; stop managing GatewayClass via infra kustomization |
## Incident 2: argocd-config Stale “Sync Failed” Operation State

### Symptom
argocd-config showed a “Sync Failed” badge in the UI. Actual sync and health status were both green (Synced, Healthy).
### Root Cause
At commit e1b7d6b, `environments/eu-dev-rancher/argocd/` contained a file named `argocd-server-deployment-patch.yaml`.
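The file itself was not preserved in this writeup; based on the description below (a metadata-only strategic merge patch adding the Stakater Reloader annotation), it looked roughly like:

```yaml
# argocd-server-deployment-patch.yaml (reconstruction, not the original file)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
  annotations:
    reloader.stakater.com/auto: "true"
# no spec: - valid as a Kustomize patch, invalid as a standalone manifest
```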
This was a Kustomize strategic merge patch (metadata-only, no spec) left over from an experiment adding a Stakater Reloader annotation. The argocd-config Application sources the directory as plain YAML (no kustomization.yaml), so ArgoCD applied every file as a standalone manifest, and Kubernetes rejected the skeleton Deployment as invalid (a Deployment without a spec fails API validation).
The file was introduced in 185c3e3 and deleted in cc7bdf3 (Reloader removal). The sync failure happened at e1b7d6b, which sat between those two commits. After cc7bdf3 removed the file, subsequent syncs succeeded - but ArgoCD retained the last failed operation state in its status, causing the “Sync Failed” badge to persist in the UI even though the app was healthy.
### Fix
A fresh sync overwrites the stale last-operation state with a successful result, e.g. `argocd app sync argocd-config` from the CLI, or the Sync button in the UI.
### Key Lessons

- Kustomize patch files must live inside a directory with a `kustomization.yaml` that declares them as `patches:`. In a plain-directory ArgoCD source, every `.yaml` is treated as a standalone resource - a metadata-only Deployment will be rejected.
- ArgoCD retains the last operation state even after the app recovers. A stale “Sync Failed” badge does not always mean the app is currently broken - always check Sync Status and Health Status alongside the operation state.
- To clear a stale failed operation badge, trigger a fresh sync. The new (successful) operation result replaces the old one.
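For contrast, the same patch file is harmless when a kustomization.yaml declares it explicitly. A sketch, reusing the patch file name from this incident with a hypothetical base resource:

```yaml
# kustomization.yaml (sketch) - the patch is applied to an existing
# Deployment instead of being treated as a standalone manifest.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - argocd-install.yaml                          # hypothetical base manifest
patches:
  - path: argocd-server-deployment-patch.yaml
```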
## Prevention Checklist

After any repo rename / restructure:

- Always re-apply `bootstrap.yaml` after changing the root app's `path` - the cluster object does not self-update.
- Check for split resource ownership - if a Helm chart creates a resource, do not declare it again in a Kustomize manifest. Use `argocd app diff` on all apps after structural changes.
- Avoid kustomize remote bases in this ArgoCD setup - use local files or Helm charts instead.
- Verify Gateway ports match Traefik's internal entryPoints - Traefik's `web` entryPoint is `8000` internally; `80` is only the external LoadBalancer port.
- Run `kubectl get applications -n argocd -o wide` after every structural commit as a quick sanity check.
- Never leave Kustomize patch files in a plain-directory ArgoCD source - either add a `kustomization.yaml` or remove them when the feature they support is removed.