Sync Waves: Cluster-Complete Bootstrap

Ravi Singh
Software engineer with 15+ years building backend systems and cloud platforms across fintech, automotive, and academia. I write about the things I build, debug, and learn — so I don’t forget them.

Learning ArgoCD - This article is part of a series. Part 2: This Article


What This Covers

How to bootstrap a fully self-contained cluster environment - cert-manager, Traefik, ArgoCD ingress, and services - using ArgoCD sync waves, with a single kubectl apply as the only manual step after ArgoCD itself is installed.


What Are Sync Waves?

Within a sync operation, ArgoCD processes resources in wave order. Each wave must reach Healthy before the next wave starts.

You set a wave with an annotation:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "2"

Waves are integers. Lower = earlier. Default = wave 0. Negative values are valid (useful for CRD-installing resources that must precede wave 0).

Key rule: ArgoCD waits for all resources in wave N to be Synced + Healthy before processing wave N+1. For Application CRDs (which are themselves ArgoCD resources), an Application is Healthy when its child resources are all healthy.

This means sync waves on Application manifests give you cluster-level dependency ordering - not just resource ordering within a single app.


Folder Structure Philosophy

environments/eu-dev-rancher/
  bootstrap.yaml          # Root Application - the single kubectl apply
  platform/               # Orchestration layer: Application + ApplicationSet CRDs only
    cert-manager.yaml         # Wave 0
    cert-manager-config.yaml  # Wave 1
    traefik.yaml              # Wave 2
    argocd-config.yaml        # Wave 3
    appset.yaml               # Wave 4
  argocd/                 # Cluster-specific ArgoCD manifests
    argocd-ingress.yaml
  services/               # Service values files, auto-discovered by AppSet
    svc1.yaml
    svc2.yaml
  observability/          # Future: OTel, Prometheus manifests

platform/ = orchestration only. It contains Application and ApplicationSet CRDs - nothing else. Each Application CRD points to a subfolder (config/cert-manager, environments/eu-dev-rancher/argocd, etc.) that holds the actual manifests.

This separation means:

  • You can read platform/ to understand the full cluster topology at a glance.
  • Adding a new component = one new platform/otel.yaml (the CRD) + one new subfolder (the manifests). The two concerns never mix.

Extensibility example: Adding OTel later means adding platform/otel.yaml (wave 5, points to environments/eu-dev-rancher/observability/) and the actual OTel manifests in observability/. The bootstrap.yaml never changes.
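
As a sketch of the pattern, a wave 0 platform Application for cert-manager could look like the following. The chart version, repo URL, and sync options here are illustrative assumptions, not values from this repo:

```yaml
# Hypothetical platform/cert-manager.yaml - an Application CRD in the
# orchestration layer. repoURL and targetRevision are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"    # first wave in the bootstrap sequence
spec:
  project: default
  source:
    repoURL: https://charts.jetstack.io  # upstream cert-manager chart repo
    chart: cert-manager
    targetRevision: v1.14.4              # illustrative - pin your own version
    helm:
      values: |
        installCRDs: true                # CRDs must ship with the operator
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

Every other file in platform/ follows the same shape; only the wave number, source, and destination namespace change.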


Why AppSet Lives in platform/

The ApplicationSet is platform team policy, not a dev team concern:

  1. Ownership: The AppSet defines what Helm chart all services use, what namespace they land in, what sync policy they get, what labels they carry. Dev teams only add a values file to services/ - they never touch the AppSet.

  2. Policy enforcement: The AppSet is the contract between platform and dev teams. “Bring us a values file and we deploy it according to this template.” Keeping it in platform/ makes it as governed as cert-manager or Traefik.

  3. Wave ordering: The AppSet (wave 4) must run after cert-manager (waves 0–1) and Traefik (wave 2) are healthy. Services that boot before their ingress controller or CA issuer exists will fail TLS issuance and endpoint health checks.
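
To make the contract concrete, here is a sketch of what platform/appset.yaml could look like, using a git files generator. The repo URL, chart location, and value-file wiring are assumptions about the layout, not copied from this repo. Note that the files generator flattens each discovered values file into template parameters, which is why {{nameOverride}} is available:

```yaml
# Hypothetical platform/appset.yaml. repoURL and chart path are placeholders;
# each discovered services/*.yaml file supplies {{nameOverride}} as a parameter.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: alpha-services
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "4"      # after cert-manager and Traefik
spec:
  generators:
    - git:
        repoURL: https://example.com/org/gitops-repo.git
        revision: main
        files:
          - path: environments/eu-dev-rancher/services/*.yaml
  template:
    metadata:
      name: 'alpha-{{nameOverride}}-eu-dev-rancher'
      labels:
        cluster: rancher                   # enables: argocd app list -l cluster=rancher
    spec:
      project: default
      source:
        repoURL: https://example.com/org/gitops-repo.git
        targetRevision: main
        path: charts/alpha-service         # shared chart location - an assumption
        helm:
          valueFiles:
            - '../../{{path}}/{{path.filename}}'  # relative to source.path; layout-dependent
      destination:
        server: https://kubernetes.default.svc
        namespace: alpha-dev
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Dev teams only ever touch the files matched by the generator; everything in the template block is platform policy.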


Wave Ordering

kubectl apply -f environments/eu-dev-rancher/bootstrap.yaml
root-eu-dev-rancher watches environments/eu-dev-rancher/platform/
       ├── Wave 0: cert-manager
       │     Helm installs operator + CRDs → namespace: cert-manager
       │     Waits: all cert-manager pods Running + Healthy
       ├── Wave 1: cert-manager-config
       │     Points to config/cert-manager/ → selfsigned issuer → CA cert → CA issuer
       │     May retry briefly (cert-manager webhook warmup); selfHeal handles it
       │     Waits: ClusterIssuers READY=True
       ├── Wave 2: traefik
       │     Helm installs Traefik v3 → namespace: ingress
       │     Waits: Traefik pods Running, LoadBalancer IP/port assigned
       ├── Wave 3: argocd-config
       │     Applies environments/eu-dev-rancher/argocd/ → ArgoCD ingress with TLS
       │     Hostname: argocd.eu-dev-rancher.ravikrs.local
       │     Waits: argocd-config Application Synced + Healthy
       └── Wave 4: appset
             Generates alpha-svc1-eu-dev-rancher and alpha-svc2-eu-dev-rancher
             Services deploy with TLS via local-ca-issuer → namespace: alpha-dev

Namespace Decisions

Component         | Namespace    | Rationale
------------------|--------------|---------------------------------------
ArgoCD            | argocd       | Standard
cert-manager      | cert-manager | Standard
Traefik           | ingress      | Groups all ingress infra; descriptive
Services (svc1/2) | alpha-dev    | Team/env scoped

Why bootstrap.yaml sets destination.namespace: argocd: The resources it deploys are Application and ApplicationSet CRDs, which must live in the argocd namespace by ArgoCD convention. The child Applications each deploy their workloads into their own namespaces (cert-manager, ingress, alpha-dev, etc.).
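
A minimal sketch of that root Application (the repo URL is a placeholder):

```yaml
# Hypothetical environments/eu-dev-rancher/bootstrap.yaml - the root
# Application from the wave diagram above.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-eu-dev-rancher
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/gitops-repo.git
    targetRevision: main
    path: environments/eu-dev-rancher/platform
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd   # child Application/ApplicationSet CRDs land here
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```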


One-Time Manual Steps

These run once per cluster. After step 3, ArgoCD takes over - no further kubectl apply is needed for platform components or services.

# 1. Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
kubectl wait --for=condition=Ready pods --all -n argocd --timeout=180s

# 2. Apply the GitHub PAT credential
#    (bootstrap/ is gitignored - fill in the PAT in the file first)
kubectl apply -f bootstrap/repo-secret-eu-dev-rancher.yaml

# 3. Bootstrap - everything else is GitOps from here
kubectl apply -f environments/eu-dev-rancher/bootstrap.yaml

Why the PAT can’t be GitOps-managed

ArgoCD needs the repo credential to read the repo - but the credential would be inside that same repo. ArgoCD cannot sync a file it needs to read before it can sync. This is a fundamental chicken-and-egg: the PAT must exist before bootstrap.

bootstrap/ is gitignored for exactly this reason. You fill in the PAT locally and apply it once. It never hits the git history.
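
For reference, a declarative ArgoCD repo credential is a plain Secret carrying the argocd.argoproj.io/secret-type: repository label, which is how ArgoCD discovers it. A sketch (the name and URL are placeholders):

```yaml
# Sketch of bootstrap/repo-secret-eu-dev-rancher.yaml (gitignored).
# The secret-type label makes ArgoCD treat this Secret as a repo credential.
apiVersion: v1
kind: Secret
metadata:
  name: repo-eu-dev-rancher
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://example.com/org/gitops-repo.git
  username: git
  password: REPLACE_WITH_PAT   # filled in locally, never committed
```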

Why the admin password hash can be committed

A bcrypt hash is one-way - knowing the hash doesn’t reveal the password. Committing $2a$10$... is safe. Generate it once locally, commit it to environments/eu-dev-rancher/argocd/argocd-admin-password.yaml, and ArgoCD applies it on wave 3. No manual patching ever needed after that.
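
A sketch of what that committed manifest could look like; ArgoCD stores the admin hash under the admin.password key of the argocd-secret Secret, and the hash shown here is a placeholder:

```yaml
# Hypothetical argocd/argocd-admin-password.yaml, applied at wave 3.
# Generate the hash locally, e.g.:
#   htpasswd -nbBC 10 "" 'your-password' | tr -d ':\n'
apiVersion: v1
kind: Secret
metadata:
  name: argocd-secret
  namespace: argocd
stringData:
  admin.password: $2a$10$REPLACE_WITH_YOUR_BCRYPT_HASH
  admin.passwordMtime: "2024-01-01T00:00:00Z"   # nudges ArgoCD to pick up the change
```

Note that argocd-secret also holds keys ArgoCD manages itself (e.g. server.secretkey); an apply merges fields rather than replacing the whole Secret, but it is worth verifying that in your setup.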


Wave-by-Wave Verification

# Watch all applications
watch kubectl get applications -n argocd

# Wave 0: cert-manager operator
kubectl get pods -n cert-manager

# Wave 1: CA chain
kubectl get clusterissuers
kubectl get certificate local-ca -n cert-manager

# Wave 2: Traefik
kubectl get pods -n ingress
kubectl get svc -n ingress          # look for EXTERNAL-IP (127.0.0.1 on Rancher Desktop)

# Wave 3: ArgoCD ingress
kubectl get ingress -n argocd

# Wave 4: services + TLS certs
argocd app list -l cluster=rancher  # requires argocd CLI + port-forward or ingress
kubectl get certificate -n alpha-dev

# HTTPS smoke test (after adding to /etc/hosts: 127.0.0.1 svc1.eu-dev-rancher.ravikrs.local)
kubectl get secret local-ca-secret -n cert-manager \
  -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/local-ca.crt
curl --cacert /tmp/local-ca.crt https://svc1.eu-dev-rancher.ravikrs.local

Adding a New Service

Drop a values file into services/:

# environments/eu-dev-rancher/services/svc3.yaml
nameOverride: svc3
fullnameOverride: svc3
replicaCount: 1
image:
  repository: nginx
  tag: "1.27"
ingress:
  enabled: true
  className: traefik
  annotations:
    cert-manager.io/cluster-issuer: local-ca-issuer
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
  hosts:
    - host: svc3.eu-dev-rancher.ravikrs.local
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: svc3-eu-dev-rancher-tls
      hosts:
        - svc3.eu-dev-rancher.ravikrs.local

Commit and push. The ApplicationSet detects the new file and creates alpha-svc3-eu-dev-rancher automatically. No changes to platform/appset.yaml, no new Application manifest, no ArgoCD UI interaction.


Extending to eu-staging

To add a staging cluster (e.g. Minikube), create a parallel folder:

environments/eu-staging-minikube/
  bootstrap.yaml          # Same structure; different repoURL destination server
  platform/               # Same wave pattern; pin different chart versions if needed
  argocd/
  services/               # Staging values: higher replicas, different image tags

The config/cert-manager/ manifests are reused as-is - ClusterIssuer manifests are cluster-agnostic (they reference no cluster-specific hostname or secret name).
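
Based on the wave 1 description earlier (selfsigned issuer → CA cert → CA issuer), the chain could be sketched as below. It reuses the resource names that appear elsewhere in this setup (local-ca, local-ca-secret, local-ca-issuer); the selfsigned-issuer name and field values are illustrative, and the intra-app sync-wave annotations order the three resources within the cert-manager-config Application:

```yaml
# Sketch of config/cert-manager/ - a cluster-agnostic CA chain.
# No cluster-specific hostname or secret name appears anywhere.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # within cert-manager-config only
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: local-ca
  namespace: cert-manager
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  isCA: true
  commonName: local-ca
  secretName: local-ca-secret
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: local-ca-issuer
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  ca:
    secretName: local-ca-secret
```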


Future: Adding Observability

# 1. Add manifests
environments/eu-dev-rancher/observability/
  otel-collector.yaml     # Helm values or plain manifest

# 2. Add Application CRD to platform/ - that's the only file that changes
environments/eu-dev-rancher/platform/otel.yaml   # wave 5, points to observability/

The bootstrap.yaml never changes. ArgoCD picks up the new platform/otel.yaml on the next sync and adds OTel to the wave sequence.


Gotchas

ArgoCD UI returns 500 Internal Server Error through Traefik ingress

This took two rounds of debugging. Both root causes stem from argocd-server’s default TLS behaviour.

Round 1 - protocol mismatch

argocd-server runs TLS on port 443 by default. Traefik terminates TLS at the ingress, then forwards plain HTTP to the backend - but argocd-server:443 expects TLS. Traefik logs show a generic 500 with no further detail.

Diagnosed by checking kubectl describe ingress argocd-server -n argocd (backend port 443) and kubectl get configmap argocd-cmd-params-cm -n argocd (no server.insecure set, so TLS mode is active).

Round 2 - x509 IP SAN validation failure

Fixing the protocol mismatch by adding Traefik’s serversscheme: https annotation (so Traefik re-encrypts to the backend) exposed the next problem: Traefik validates the backend TLS cert, and argocd-server’s self-signed cert has no IP SAN for the pod IP. Traefik rejects it.

Diagnosed via kubectl logs -n ingress -l app.kubernetes.io/name=traefik:

tls: failed to verify certificate: x509: cannot validate certificate for
10.42.0.90 because it doesn't contain any IP SANs

Root fix - run argocd-server in insecure mode

The clean solution for both: set server.insecure: true so argocd-server serves plain HTTP on port 80. Traefik connects with HTTP - no cert validation, no protocol mismatch. External traffic (browser → Traefik) is still TLS-encrypted via the cert-manager cert.

# environments/eu-dev-rancher/argocd/argocd-server-config.yaml
data:
  server.insecure: "true"
# environments/eu-dev-rancher/argocd/argocd-ingress.yaml
# backend port → 80, serversscheme annotation removed
backend:
  service:
    name: argocd-server
    port:
      number: 80

Final traffic flow:

Browser → HTTPS → Traefik (TLS via cert-manager) → HTTP → argocd-server:80

Regular services (nginx) are not affected - they already serve plain HTTP on port 80. No config changes needed for svc1, svc2, or any standard HTTP backend.
