(Almost) Every infrastructure decision I endorse or regret after 4 years running infrastructure at a startup

canpolat@programming.dev · 1 year ago

pearable@lemmy.ml · 1 year ago

Super useful write up. I’ve taken some notes I’ll use moving forward on a personal project. Thanks for sharing

procesd@lemmy.world · 1 year ago

If I could be bothered to sit and write down a distilled version of my last decade at work it would be something very similar. Any junior SRE can benefit from this.

Lodra@programming.dev · 1 year ago

This is excellent. I may copy the rough format for tracking things internally at my company!

Btw, I agree with most of your decisions in here with just a few exceptions.

kustomize > helm
Argo > flux

My last thought is less clear-cut. There are good observability solutions besides Datadog. Grafana Cloud is great. Honeycomb has a similar offering. But all of them are pretty expensive.

If you aren’t using OpenTelemetry, you’re probably doing observability wrong!

Piatro@programming.dev · 1 year ago

I’ve only used helm and hadn’t considered kustomize as an equivalent. What about kustomize makes it better in your opinion?

Lodra@programming.dev · 1 year ago

First is complexity. A simple helm chart works great, but more elaborate charts can turn into a maintenance problem. This is especially true when you manage a large number of apps and need to establish and maintain standards across them. E.g. you want to add a new label to every helm chart you use. You now get to make 60 PRs for 60 charts. Or you can tie them all together with chart dependencies. This can be done well but almost never is. It’s just too easy to build a bad helm chart. Kustomize allows you to do this from a “top-down” perspective.
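To make the “top-down” idea concrete, here is a minimal Python sketch of the concept only. It is not how Kustomize is implemented; it just mimics what a single `commonLabels` entry in a kustomization does for you. The function name, the 60 placeholder apps, and the `team: platform` label are all made up for illustration.

```python
import copy


def apply_common_labels(manifests, labels):
    """Return copies of the manifests with `labels` merged into metadata.labels."""
    patched = []
    for manifest in manifests:
        updated = copy.deepcopy(manifest)  # leave the base manifests untouched
        metadata = updated.setdefault("metadata", {})
        metadata.setdefault("labels", {}).update(labels)
        patched.append(updated)
    return patched


# Hypothetical base manifests standing in for 60 separate apps.
bases = [
    {"kind": "Deployment", "metadata": {"name": f"app-{i}", "labels": {"app": f"app-{i}"}}}
    for i in range(60)
]

# One change at the top, applied to every manifest, instead of 60 PRs.
labelled = apply_common_labels(bases, {"team": "platform"})
print(labelled[0]["metadata"]["labels"])
```

The point is that the label lives in one place and the bases never need to know about it.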

Second is modifications. Consider as an example that you want to run filebeat as a sidecar container on some pod to capture its logs. But the helm chart you’re using doesn’t include this feature. You have two choices: modify the pod when it’s created with a mutating webhook or similar (a super complicated solution), or copy/fork the chart, add the functionality, and maintain it going forward. Kustomize just doesn’t have this problem. You can just modify a base manifest with overlays.
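A rough sketch of what an overlay does in the sidecar case, again in plain Python rather than real Kustomize. It assumes a simplified merge where containers are matched by name, which is roughly the behaviour you get from a strategic merge patch. The helper name, the pod, and the image names are hypothetical.

```python
import copy


def patch_containers(base, overlay_containers):
    """Merge overlay containers into a pod spec, matching on container name."""
    patched = copy.deepcopy(base)  # the base manifest stays as upstream shipped it
    containers = patched["spec"]["containers"]
    by_name = {c["name"]: c for c in containers}
    for overlay in overlay_containers:
        if overlay["name"] in by_name:
            by_name[overlay["name"]].update(overlay)  # override existing fields
        else:
            containers.append(copy.deepcopy(overlay))  # add as a new sidecar
    return patched


# Hypothetical base manifest and sidecar; image names are placeholders.
base_pod = {
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {"containers": [{"name": "web", "image": "example/web:1.0"}]},
}
filebeat_sidecar = {"name": "filebeat", "image": "example/filebeat:latest"}

patched_pod = patch_containers(base_pod, [filebeat_sidecar])
print([c["name"] for c in patched_pod["spec"]["containers"]])
```

No fork and no webhook: the base is untouched and the change lives in a small patch next to it.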

Last is the nature of Go templates, which helm charts are based on. Everything outside of {{ }} is just plaintext. This leads to a ton of limitations. Got a whitespace issue? You’ll probably find out at runtime. Want your IDE to identify syntax issues, provide intellisense, etc. on the final manifest? Good luck! You need to render that chart first. With Kustomize, every manifest is structured text (yaml). So you get the benefits of all standard tooling for yaml data in your IDEs and CI/CD pipelines.
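Here is the whitespace problem in miniature. This uses Python’s `string.Template` as a stand-in for Go templates, so treat it as an analogy and not as Helm’s actual behaviour. The config body and key names are invented for the example.

```python
import json
from string import Template

# A multi-line value that should end up nested under `config`.
config_body = "input: logs\noutput: stdout"

# Text templating: the engine only sees characters, not structure.
text_template = Template("data:\n  config: |\n    $body\n")
rendered = text_template.substitute(body=config_body)
print(rendered)  # the second line of the value is no longer indented

# Structured data: the value is a field, so indentation cannot go wrong.
structured = {"data": {"config": config_body}}
serialized = json.dumps(structured, indent=2)
print(serialized)
```

The template renders without any error even though the second line of the value has fallen out of the block. With structured data there is no indentation to get wrong in the first place.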

Honestly, I could keep going (helm releases ugghhhh!). But helm definitely wins on one point, and it’s a big one: Helm is the standard for distributing k8s manifests. So every meaningful project supplies helm charts. Kustomize doesn’t even come close on this one. That said, I think Kustomize manifests are just simpler to build, so having an official base manifest for every project just doesn’t matter too much.

procesd@lemmy.world · 1 year ago

The best I have used is Flux with HelmRelease objects and kustomize. It only lacks a UI that brings Flux events and logs together. I guess Argo CD can do the same with Application objects using charts as a source, but I haven’t tested it yet.

But managing both external k8s components (external-dns, external-secrets, CSI, etc.) and your own apps with kustomizable charts makes so much more sense than having thousands of manifests that I would recommend everyone try it.

RagnarokOnline@programming.dev · 1 year ago

Even for someone who’s only dabbled in infrastructure, this was a fun read.

jjjalljs@ttrpg.network · 1 year ago

I’m not devops but this seemed reasonable from what I’ve seen.

Though having infrastructure weigh in with “go should be your language of choice” seems weird.

RandomDevOpsDude@programming.dev · 1 year ago

I can’t believe I haven’t seen external secrets before. Sealed secrets are cool, but such a pain as you described. Gonna be setting up external secrets next week sounds like. Thanks for the great post

z3r0 · 1 year ago

What do you think about storing your encrypted secrets in your repos using Sops?

RandomDevOpsDude@programming.dev · 1 year ago

I prefer Sealed Secrets over sops since it has the namespace scoping element and can also be stored in repo (once encrypted). I also generally prefer having a controller deployed rather than forcing devs to learn kustomize (which we don’t widely use yet) so I guess less of a support burden for me.

z3r0 · 1 year ago

I understand your point. Anyway, if your devs are using Helm they can still use Sops with the helm-secrets plugin. Just create a separate values file (it can be named secrets.yaml) containing all sensitive values and encrypt it with Sops.

RandomDevOpsDude@programming.dev · 1 year ago

Thanks for sharing! I definitely hadn’t seen that plugin. We definitely use helm, even though I hate it lol. I will take a look when I get around to looking at external secrets since I still haven’t had a chance to (you know how it goes… priorities made up by some random PM or whatever)

z3r0 · 1 year ago

If you still want more you can use Helmfile. Take care of your PMs 😁

Sheldan@programming.dev · 1 year ago

A good read, and interesting to see what services to consider.

themeatbridge@lemmy.world · 1 year ago

This is all incredibly valuable information.

fluckx@lemmy.world · 7 months ago

Any insight on why you prefer the nginx ingress over the ALB ingress controller on AWS? You can group/combine ingresses as well, and it will automatically load the correct certificate from ACM if it exists, which means you won’t have to mess around with certbot. Your TLS terminates at the load balancer in that case though.

EKS managed addons now support custom configuration (they might not have when you started out), though maybe not all the custom features you’re looking for are there. It’s not as flexible as the helm chart obviously, but it usually supports the most basic things you’d want to use.

Interesting read otherwise!

Personally I’ve had issues selling people on gitops/kustomize as they all find helm charts a lot easier.