I’m pretty familiar with automated tests that compare a received value to an expected value (basically all unit/integration tests). In a CI/CD workflow, you handle test failures by failing the whole pipeline, and that commit/PR/etc. then shows a failed pipeline next to it.

However, what if I have some kind of “performance” measure I want to track instead? Something that isn’t pass/fail, but rather a set of experimental results over time (e.g. API response times, win/draw/loss rates for a chess bot, confusion-matrix scores for a classifier)? Is there a tool that can display that kind of “automated experiment” result ordered by git commit, pull request, etc.?

I thought about sending the data to some kind of data store with a Grafana front-end, but I was hoping there might be a less “DIY” method for creating such a display.

  • truami@programming.dev · 1 year ago

    I use Datadog for this specific use case. You can log your own metrics through their API, then set up dashboards and alerting based on specific parameters and thresholds. I mainly use it to track web vitals over time to pinpoint problematic releases or assets, but it can be used for any numeric value you want to track.
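
    Roughly like this with the official `datadog` Python client; the metric name, tag, and env-var names are made up for illustration:

    ```python
    import os
    import time

    from datadog import api, initialize

    # Credentials from the CI environment (variable names assumed).
    initialize(
        api_key=os.environ["DD_API_KEY"],
        app_key=os.environ["DD_APP_KEY"],
    )

    # One data point per CI run, tagged with the commit SHA so a dashboard
    # can slice the metric per release.
    api.Metric.send(
        metric="ci.benchmark.api_response_ms",  # made-up metric name
        points=[(time.time(), 142.0)],          # (timestamp, value) pairs
        tags=[f"git_commit:{os.environ.get('CI_COMMIT_SHA', 'unknown')}"],
    )
    ```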

  • Fenzik@lemmy.ml · 1 year ago

    Feels like you could maybe (ab)use an ML experiment tracking tool for this, something like MLflow. Except instead of training an ML model, you just trigger your tests and report the statistics from those runs back to the tracking tool.
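
    Something like this minimal sketch, assuming an MLflow tracking server is reachable at the URI below; the experiment name, metric names, and env var are all illustrative:

    ```python
    import os

    import mlflow

    mlflow.set_tracking_uri("http://mlflow.local:5000")  # assumed server address
    mlflow.set_experiment("bot-benchmarks")              # made-up experiment name

    commit = os.environ.get("CI_COMMIT_SHA", "local")
    with mlflow.start_run(run_name=commit):
        mlflow.set_tag("git_commit", commit)
        # ... trigger the tests here, then report whatever was measured:
        mlflow.log_metric("win_rate", 0.62)
        mlflow.log_metric("avg_response_ms", 142.0)
    ```

    MLflow’s UI then lists the runs in order with their metrics and tags, which gets you the per-commit history view more or less for free.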

  • slowfouriertransform@lemmy.world · 1 year ago

    I’m pretty interested in this too. I’ve thought about it in the past, and I think I get stuck where you are: the post-processing and visualizing bit.

    I’d thought of having GitHub Actions do the measurement, stashing the results as artifacts, then having another workflow process the results. Obviously pretty DIY, so I’m curious if others have solutions.
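
    For the processing workflow I was picturing something like this, assuming each run uploads a JSON artifact with `commit` and `api_p95_ms` keys (names made up) and the artifacts have already been downloaded into one directory:

    ```python
    import json
    from pathlib import Path

    import matplotlib.pyplot as plt

    ARTIFACT_DIR = Path("downloaded-artifacts")  # assumed download location

    # Collect (commit, value) pairs; assumes filenames sort in run order.
    runs = []
    for path in sorted(ARTIFACT_DIR.glob("*.json")):
        data = json.loads(path.read_text())
        runs.append((data["commit"][:7], data["api_p95_ms"]))

    commits, values = zip(*runs)
    plt.plot(range(len(values)), values, marker="o")
    plt.xticks(range(len(commits)), commits, rotation=45, ha="right")
    plt.ylabel("API p95 latency (ms)")
    plt.title("Benchmark trend by commit")
    plt.tight_layout()
    plt.savefig("trend.png")  # could be uploaded as an artifact or published to Pages
    ```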

  • key@lemmy.keychat.org · 1 year ago

    Hard to recommend anything without some hint of your build system. Java via Jenkins? Node via Bitbucket Pipelines? C# via Azure DevOps?

    • GaussianInteger@lemmy.world (OP) · 1 year ago

      My particular use case is actually for a hobby/fun project: developing a bot in Rust to play a game (specifically, Screeps), and I want to track how quickly it hits certain game thresholds with each newly developed feature. I use Gitea Actions for CI/CD, but it’s all running on my local network/home lab, so I’m happy to shift as needed.
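
      Roughly, I’m imagining the run dumping something like this per commit; Gitea Actions mirrors GitHub Actions, so GITHUB_SHA should be available, but the threshold names/values here are just placeholders:

      ```python
      import json
      import os
      from pathlib import Path

      # Measurements the bot run would produce (illustrative names/values).
      results = {
          "commit": os.environ.get("GITHUB_SHA", "local"),
          "ticks_to_rcl2": 1234,
          "ticks_to_rcl3": 9876,
      }

      out_dir = Path("bench")
      out_dir.mkdir(exist_ok=True)
      out_file = out_dir / f"{results['commit']}.json"
      out_file.write_text(json.dumps(results, indent=2))
      print(f"wrote {out_file}")
      ```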

  • vampatori@feddit.uk · 1 year ago

    This is on my list to do - if you find a good solution do let us know!

    I was thinking of just doing the quick-and-dirty approach of appending the data to a file in the repo and auto-committing it. Just have some previous-commit information, the test name, and the results appended every time. That way HEAD always has the full history of data in order, so you can just push/pull that into anything and analyse/graph it without messing about (rough sketch below).

    I’d probably only do it on push/PR merge, so in the grand scheme of things it would never really become a lot of data, but you could truncate it as you go easily enough.
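
    Something like this is what I had in mind; the file path and metric name are made up:

    ```python
    import csv
    import subprocess
    from datetime import datetime, timezone

    RESULTS_FILE = "metrics/results.csv"  # assumed path inside the repo

    def current_commit() -> str:
        # Short SHA of the commit being measured.
        return subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"], text=True
        ).strip()

    def append_result(test_name: str, value: float) -> None:
        with open(RESULTS_FILE, "a", newline="") as f:
            csv.writer(f).writerow(
                [current_commit(), datetime.now(timezone.utc).isoformat(), test_name, value]
            )

    def commit_results() -> None:
        subprocess.run(["git", "add", RESULTS_FILE], check=True)
        subprocess.run(
            ["git", "commit", "-m", "chore: record benchmark results [skip ci]"],
            check=True,
        )

    if __name__ == "__main__":
        append_result("api_p95_ms", 142.0)  # example measurement
        commit_results()
    ```

    Most CI systems honour a “[skip ci]” marker in the commit message; worth checking that yours does, otherwise the auto-commit will trigger another pipeline run. You’d also still need a `git push` step afterwards.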
