I keep seeing posts about this kind of thing getting people’s hopes up, so let’s address this myth.

What’s an “AI detector”?

We’re talking about these tools that advertise the ability to accurately detect things like deep-fake videos or text generated by LLMs (like ChatGPT), etc. We are NOT talking about voluntary watermarking that companies like OpenAI might choose to add in the future.

What does “effective” mean?

I mean something with high levels of accuracy, both highly sensitive (low false negatives) and highly specific (low false positives). High would probably be at least 95%, though this is ultimately subjective.
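
To make those two terms concrete, here is a minimal sketch of how they are computed from a detector's results. The counts are made up for illustration, and `sensitivity_specificity` is a hypothetical helper name, not part of any real detector.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: fakes correctly flagged as fake
    fn: fakes missed (labeled real)    -> false negatives
    tn: real items correctly labeled real
    fp: real items wrongly flagged fake -> false positives
    """
    sensitivity = tp / (tp + fn)  # share of fakes that get caught
    specificity = tn / (tn + fp)  # share of real items left alone
    return sensitivity, specificity


# Hypothetical test set: 1000 fake items and 1000 real items.
sens, spec = sensitivity_specificity(tp=950, fn=50, tn=950, fp=50)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A detector has to clear the bar on both numbers at once; flagging everything as fake gives perfect sensitivity and zero specificity.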

Why should the accuracy bar be so high? Isn’t anything better than a coin flip good enough?

If you’re going to definitively label something as “fake” or “real”, you’d better be damn sure about it, because the consequences of being wrong with that label are even worse than having no label at all. You’re either telling people that they should trust a fake that they might otherwise have been skeptical about, or you’re slandering something real. In both cases you’re spreading misinformation, which is worse than if you had just said “I’m not sure”.
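
As a rough illustration of how costly wrong labels get, the sketch below works out how often a "fake" label is actually correct. The 1% share of fake content is purely an assumed figure, and `flagged_fake_is_really_fake` is a hypothetical helper name.

```python
def flagged_fake_is_really_fake(sensitivity, specificity, fake_share):
    """Probability that an item labeled "fake" really is fake (Bayes' rule)."""
    true_flags = sensitivity * fake_share                   # fakes that get flagged
    false_flags = (1.0 - specificity) * (1.0 - fake_share)  # real items that get flagged
    return true_flags / (true_flags + false_flags)


# Assumption for illustration only: 1 in 100 items is fake.
p = flagged_fake_is_really_fake(sensitivity=0.95, specificity=0.95, fake_share=0.01)
print(f"Chance a 'fake' label is right: {p:.1%}")
```

Under that assumed 1% share, a detector that is 95% sensitive and 95% specific is right only about 16% of the time when it says "fake", so most of its "fake" labels would be slandering something real. A different share of fakes gives a different number.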

Why can’t a good AI detector be built?

To understand this part, you need to understand a little bit about how these neural networks are created in the first place. Generative Adversarial Networks (GANs) are a strategy often employed to train models that generate content. These work by having two different neural networks: one that generates content similar to existing content, and one that detects the difference between generated content and the existing content. These networks learn in tandem: each time one network gets better, the other one also gets better.

What this means is that building a content generator and a fake content detector are effectively two sides of the same coin. Improvements to one can always be translated directly, and in an automated way, into improvements to the other one. This means that the generator will always improve until the detector is fooled about 50% of the time.
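
The dynamic described above can be sketched with a toy model. This is not a real GAN: "real" content is assumed to be numbers drawn from a normal distribution centered at 0, the generator's output is assumed to be centered at `mu`, and the detector is assumed to always be the best possible threshold for the current generator. All names and parameters here are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def best_detector_accuracy(mu):
    """Accuracy of the best threshold detector separating N(0, 1) "real"
    samples from N(mu, 1) "generated" samples, with equal amounts of each."""
    return normal_cdf(abs(mu) / 2.0)


def train_generator(mu=3.0, steps=50, lr=0.2):
    """Repeatedly nudge the generator in the direction that hurts the detector."""
    history = []
    for _ in range(steps):
        history.append(best_detector_accuracy(mu))
        # Generator step: moving mu toward 0 (the real data) is exactly
        # the change that lowers the detector's accuracy.
        mu -= lr * mu
    history.append(best_detector_accuracy(mu))
    return history


history = train_generator()
print(f"detector accuracy at start: {history[0]:.3f}")
print(f"detector accuracy at end:   {history[-1]:.3f}")
```

In this toy setup the detector starts out right about 93% of the time and ends up indistinguishable from a coin flip, even though it is re-optimized against the generator at every step.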

Note that not all of these models are trained in exactly this way, but the point is that anything CAN be trained this way, so even if a GAN wasn’t originally used, any kind of improved detection can always be directly translated into improved generation to beat that detection. This isn’t just any ordinary “arms race”, because the turnaround time here is so fast that there won’t be any chance of being ahead of the curve… the generators will always win.

Why do these “AI detectors” keep getting advertised if they don’t work?

  1. People are afraid of being saturated by fake content, and the media is taking advantage of that fear to sell snake oil.
  2. Every generator network comes with its own free detector network that doesn’t really work all that well (~50% accuracy) because it was used to create the generator originally, so these detectors are ubiquitous among AI labs. That means the people that own the detectors are the SAME PEOPLE that created the problem in the first place, and they want to make sure you come back to them for the solution as well.
  • eleitl@lemmy.ml

    If you want to tell humans from machines, it’s the only method that reliably works. If you want to prevent humans cheating with machines, use proctoring.

    • jungle@lemmy.world

      Sure, but this post is about detecting machine-generated content. How does ID verification help there?

      • eleitl@lemmy.ml

        Challenge-response. There is no validation after the fact unless it’s already been notarized, which involves ID validation.

        This assumes that the nation-states issuing the ID have no incentive to cheat. That’s often not a safe assumption.

        • KairuByte@lemmy.world

          Once someone has validated their ID, that can just be added to the deepfake. I’m not seeing how needing a few extra seconds of fakery is going to solve anything.

          Unless something like a TOTP identification is added, along with the current date and time displayed alongside it, there’s no real benefit to identification.