OpenAI just admitted it can’t identify AI-generated text. That’s bad for the internet and it could be really bad for AI models.

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

  • EuphoricPenguin@normalcity.life
    11 months ago

    Unless I’m mistaken, aren’t GANs mostly old news? Most of the current SOTA image generation models and LLMs are diffusion-based, transformer-based, or both. GANs can still generate some pretty darn impressive images, even ones from a few years ago, but they proved hard to steer and were often trained to generate only a single kind of image.

    • BackupRainDancer@lemmy.world
      11 months ago

      I haven’t been in decision analytics for a while (and people smarter than I am are working on the problem), but I meant something more along the lines of the “model collapse” issue. Just because a human gives a response a thumbs up or down doesn’t make it human-written training data that can be fed back in. Eventually, what the model outputs becomes “the most likely response to this prompt that this user will thumbs up and accept.” (Note: I’m assuming the thumbs up/down ratings have been pulled back into model feedback.)
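      A toy simulation of that feedback loop, under loudly simplified assumptions: the "model" is just a probability distribution over ten canned responses, a hypothetical rater gives response r a thumbs up with probability (r + 1)/10, and each round the model is refit only on the responses that got a thumbs up. This is not a description of how OpenAI actually uses ratings; the names `feedback_loop` and `entropy` and all parameters are made up for illustration. It only shows how filtering outputs by approval concentrates a model on whatever the rater likes most.

```python
import math
import random
from collections import Counter


def entropy(dist: dict[int, float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def feedback_loop(
    rounds: int = 20, n_samples: int = 1000, seed: int = 0
) -> tuple[dict[int, float], list[float]]:
    """Refit a toy 'model' on its own thumbs-up outputs, round after round."""
    rng = random.Random(seed)
    responses = list(range(10))
    # Hypothetical rater: response r gets a thumbs up with probability (r + 1) / 10.
    approval = {r: (r + 1) / 10 for r in responses}
    model = {r: 1 / len(responses) for r in responses}  # start uniform
    entropies = [entropy(model)]
    for _ in range(rounds):
        # The model answers n_samples prompts by sampling from its current distribution.
        outputs = rng.choices(
            responses, weights=[model[r] for r in responses], k=n_samples
        )
        # Only answers that get a thumbs up are fed back as "training data".
        kept = [r for r in outputs if rng.random() < approval[r]]
        counts = Counter(kept)
        model = {r: counts[r] / len(kept) for r in responses}
        entropies.append(entropy(model))
    return model, entropies


if __name__ == "__main__":
    model, entropies = feedback_loop()
    print(f"entropy: start {entropies[0]:.2f} nats, end {entropies[-1]:.2f} nats")
    print("most common response:", max(model, key=model.get))
```

      Diversity, measured here as Shannon entropy, falls well below its starting value of ln 10 as the model piles onto response 9, the one the simulated rater approves most often.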

      As I understand it, that’s not going to remove the core issue, which is this:

      Any sort of AI detection arms race is doomed. There is ALWAYS new ‘real’ video for training, and even if GANs are a bit outmoded, the core concept of training on synthetically generated content is a hot topic right now. Technically, whoever creates fake videos to train on would have a bigger training set than the checkers.
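      A toy sketch of that arms race, assuming 1-D Gaussians stand in for real and fake video: real data is N(0, 1), the generator's output is N(m, 1), the detector is the Bayes-optimal likelihood-ratio test, and the generator takes gradient steps to raise the detector's log-odds of "real" on its own samples. Every name here (`arms_race`, `best_detector_accuracy`, the learning rate `lr`) is hypothetical, and this is a simplified adversarial loop, not an actual GAN implementation.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def best_detector_accuracy(fake_mean: float) -> float:
    """Accuracy of the Bayes-optimal detector for real ~ N(0, 1) vs fake ~ N(fake_mean, 1)
    with equal priors: threshold at the midpoint, so accuracy = Phi(|fake_mean| / 2)."""
    return normal_cdf(abs(fake_mean) / 2.0)


def arms_race(fake_mean: float = 3.0, lr: float = 0.3, rounds: int = 15) -> list[float]:
    """Alternate detector and generator updates; return the detector's accuracy each round."""
    accuracies = []
    for _ in range(rounds):
        # Detector step: best response is the likelihood-ratio test
        #   logit_real(x) = log N(x; 0, 1) - log N(x; m, 1) = -m * x + m * m / 2
        # where m is the detector's estimate of the fake mean (exact in this toy).
        m = fake_mean
        accuracies.append(best_detector_accuracy(m))
        # Generator step: gradient ascent on E[logit_real(x)] for x ~ N(fake_mean, 1).
        # That expectation is -m * fake_mean + m * m / 2, whose derivative is -m.
        fake_mean += lr * (-m)
    return accuracies


if __name__ == "__main__":
    for round_no, acc in enumerate(arms_race()):
        print(f"round {round_no:2d}: detector accuracy {acc:.3f}")
```

      Each round the detector's accuracy drifts toward 0.5, a coin flip, because the generator only needs the detector's own scoring rule to know which way to move.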

      Since we see model collapse when we feed too much of this back into the model, we’re in a bit of an odd place.
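      A common toy illustration of model collapse, sketched under heavy simplifying assumptions: the "model" is just a Gaussian, and each generation is fit only to a handful of samples drawn from the previous generation's fit, with no fresh human data. The fitted spread tends to shrink generation after generation, so the rare, tail-end content disappears first. The function name `model_collapse` and its parameters are hypothetical.

```python
import random
import statistics


def model_collapse(n_samples: int = 10, generations: int = 200, seed: int = 0) -> list[float]:
    """Repeatedly refit a Gaussian to samples drawn from the previous generation's fit.

    Returns the fitted standard deviation after every generation (index 0 is the
    original "human" distribution)."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the real data distribution
    history = [sigma]
    for _ in range(generations):
        # The next model sees only synthetic samples from the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu = statistics.fmean(samples)
        # Maximum-likelihood (population) std: biased low, and the error compounds.
        sigma = statistics.pstdev(samples, mu)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = model_collapse()
    for gen in (0, 50, 100, 150, 200):
        print(f"generation {gen:3d}: fitted std {history[gen]:.2e}")
```

      Raising `n_samples` slows the shrinkage, but in this toy it does not remove the downward drift.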

      We haven’t even had an LLM widely available for a full year, but we’re already having trouble telling its output apart from human writing.

      I’m making waffles, so I only did a light Google search, but I don’t think ChatGPT uses GANs for its main algorithms; my point is simply that the GAN concept could easily be applied to LLM text to make the delineation even harder.
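      For reference, the GAN concept is usually written as a two-player minimax game (the standard formulation from Goodfellow et al., 2014): a generator G maps noise z to samples, and a discriminator D estimates the probability that a sample is real. In the detection framing of this thread, D plays the role of the AI-text checker.

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

      Training alternates between improving D at telling real samples from generated ones and updating G to lower D's success, so every improvement to the checker also becomes a training signal for the generator.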

      We’re probably going to need a lot more tests and interviews on critical reasoning and logic skills. That’s probably how it should have been all along, but it’ll be weird while the transition happens.

      sorry if grammar is fuckt - waffles