Update: Proton worldwide outage caused by Kubernetes migration, software change

  • ChapulinColorado@lemmy.world
    12 hours ago

    I don’t use them, but I work in tech, and oopsies happen even with properly configured k8s clusters or well-managed bare-metal infrastructure and well-trained engineers. For example, a developer might not realize that something as simple as writing logs to a file can cut capacity on k8s once pods start getting evicted for filling up local storage.
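
    To make the log-file example concrete, here is a minimal sketch of the usual mitigation, not anything Proton has described: log to stdout so the container runtime can rotate the output, or cap the file size if a file is really needed. The helper name `build_logger` and the size limits are hypothetical and only for illustration.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler


def build_logger(name, log_path=None, max_bytes=1_000_000, backups=3):
    """Hypothetical helper: log to stdout, or to a size-capped file.

    With log_path=None, records go to stdout and nothing accumulates in a
    file the application itself has to manage.
    With a log_path, disk use is bounded by roughly
    (backups + 1) * max_bytes instead of growing without limit.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False     # keep records out of the root logger

    if log_path is None:
        handler = logging.StreamHandler(sys.stdout)
    else:
        # Rolls over to app.log.1, app.log.2, ... and deletes the oldest.
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backups
        )

    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

    Something like `build_logger("api").info("started")` would then write to stdout, while `build_logger("api", log_path="/tmp/app.log")` keeps at most four bounded files. Either way the pod’s local storage use stays predictable, which is the part that is easy to miss in code review.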

    It does sound like the post is beating around the bush about what caused the outage, but if their post-mortem fully acknowledges what happened and lays out decent short- and long-term steps to mitigate it, it could still be a lesson learned. As much as customers and users would like it, you generally can’t just correct something that quickly in complex systems or in environments that have settled into a certain workflow (developers make mistakes like anyone else).

    Whether it was a newbie mistake in the code review process or something else, if they are honest and clear about it, that can still impress people willing to migrate. With MS Teams and O365 at work, it feels like there is an intermittent outage every other month.