Problem statement: A cronjob refreshes an image-pull secret in every namespace so nodes can authenticate to a private registry. It does the obvious™ thing: delete the old secret, then create the new one right away:

    kubectl delete secret --ignore-not-found aws-ecr-token -n "$namespace"
    kubectl create secret docker-registry aws-ecr-token -n "$namespace" \
      --docker-server="https://$AWS_ACCOUNT.dkr.ecr.$AWS_REGION.amazonaws.com" \
      --docker-username="AWS" \
      --docker-password="$(aws ecr get-login-password --region "$AWS_REGION")"

Every couple of hours, for a fraction of a second, that secret does not exist in the cluster (booo!). Most of the time nobody notices. Occasionally a deploy step that copies the secret into a tenant namespace runs into exactly that gap, its kubectl get returns non-zero, and the whole deployment fails with ImagePullError. Occasional flakiness, our favorite kind.

The idiomatic fix is to never delete. The solution is to employ kubectl create with a client-side dry-run…
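A client-side dry-run is commonly paired with kubectl apply, so the secret is created or updated in place and never deleted. The following is a minimal sketch of that pattern, not necessarily the exact command the post goes on to use. The function name refresh_ecr_secret is hypothetical, and AWS_ACCOUNT and AWS_REGION are assumed to be set in the environment, as in the script above.

```shell
# Hypothetical helper: refresh the ECR pull secret in one namespace
# without a delete step, so the secret never disappears.
refresh_ecr_secret() {
  namespace="$1"

  # Fetch the token first; if this fails, the cluster is left untouched.
  password="$(aws ecr get-login-password --region "$AWS_REGION")" || return 1

  # --dry-run=client -o yaml only prints the manifest, it creates nothing.
  # kubectl apply then creates the secret or updates the existing one.
  kubectl create secret docker-registry aws-ecr-token -n "$namespace" \
    --docker-server="https://$AWS_ACCOUNT.dkr.ecr.$AWS_REGION.amazonaws.com" \
    --docker-username="AWS" \
    --docker-password="$password" \
    --dry-run=client -o yaml \
    | kubectl apply -f -
}

# Example call from the cronjob loop (assumes $namespace is set by the loop):
#   refresh_ecr_secret "$namespace"
```

Because the namespace is baked into the generated manifest, kubectl apply needs no extra -n flag. The first apply against a secret that was originally made with kubectl create may print a one-time warning about a missing last-applied-configuration annotation; kubectl adds the annotation itself and later runs stay quiet.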