• grue@lemmy.world
    9 months ago

    Never mind paying for content; AI firms should be required to abide by the terms of copyleft-licensed training data. In other words, all output of an AI trained on a dataset containing even a single copyleft work should itself be required to be copyleft.

    • Rogers@lemmy.ml
      9 months ago

      Yeah, that won’t work. Any country that lets its companies use whatever data they want will flat out have better models, potentially meaning greater economic output for the whole country if AI is as big as I think it will be. Unfortunate, but I don’t see an alternative yet, unless they make it so you can use any data but the models have to be free.

      • grue@lemmy.world
        9 months ago

        The alternative is to take the good training data and then accept the output being copyleft without whining about it.

      • a2800276@lemmy.world
        9 months ago

        Doesn’t that argument apply to any instance of ignoring intellectual property? Books, records, and movies will also be cheaper in countries that let companies do what they want. Medicine would be more accessible, and ignoring patents would greatly accelerate innovation in countries where it’s permitted…

  • Auzy@beehaw.org
    9 months ago

    I agree. It’s basically just stealing work from others without paying them or agreeing to licensing. Long term, that’s not a good thing.

    For software development, AI is a useful tool, but in some circumstances it’s likely also lifting licensed code and inserting it into other people’s projects.

    If AI companies want to use training data, that’s fine, but they should pay all of the creators whose work it’s trained on.

    • NovaPrime@lemmy.ml
      9 months ago

      Respect your opinion and where you’re coming from, but I disagree, simply because this will create a silo of corporatized AI: only those companies will have the ability to pay for IP at the necessary scale (not to say this isn’t already happening to an extent under the existing model). I do think there’s a conversation worth having about the public value of data that’s readily available on the internet and how it squares with our (imo) outdated IP laws. How do we ensure that individual creators retain full control of and benefit from their art/content/knowledge, while not stifling or unduly hampering AI research? How much protection do we afford data that users willingly put on the internet, publicly available? And who pays for the data in the chain?

        • ZOSTED@sh.itjust.works
          9 months ago

          There is enough freely licensed content to make whatever you want. I have no trouble at all making websites and comic books and video games using freely licensed content.

                • ZOSTED@sh.itjust.works
                  9 months ago

                  You paid your own money for every single copyright work you’ve ever seen in your life?

                  I never claimed this distinction, and I don’t think it’s a meaningful point.

                  I’m saying that I pay for art. These companies don’t, but more to the point, they seek to undermine their source once they’ve extracted all the training data they need. I’d go so far as to say it’s in poor taste to use free art, because it should be patently obvious that most artists putting out free art did not anticipate its use by devices that let you bypass artists entirely.

                  There’s an alternate way this could have all gone down: after some internal testing, we could have simply asked artists to volunteer their work for the training project. There are enough people excited about the tech that this would have been plenty! It just wouldn’t have let companies rush for market share and hope that business utility would gloss over any ethical qualms in the aftermath.

      • NovaPrime@lemmy.ml
        9 months ago

        With the right tools and resources all content on the internet becomes freely available.

    • echo64@lemmy.world
      9 months ago

      If AIs were capable of invention and creation, I might agree. But they aren’t. They regurgitate what they were trained on.

      We don’t teach AIs; they don’t learn; there’s no university, no fundamentals. We just have models that reproject: they take the training data, mix it all up, and project it out again.

      There is use in that, but GPT isn’t a child. It cannot learn, comprehend, or understand. It’s a tool, and as a tool it depends heavily on the work created by others.

        • Sneezycat@sopuli.xyz
          9 months ago

          Okay so according to your logic, it is impossible for us to have this conversation. No human could’ve invented those things, therefore they can’t exist.

          Or are you saying humans can learn, but our capacity for that is greatly amplified by the knowledge humanity gave us?

          If it’s the latter, yeah, we’re standing on the shoulders of giants. But AI is fundamentally different, that’s the point of the comment above.

          AI could never in however many million years get to the point humanity has gotten to, because we humans learn, and AIs don’t. They would stagnate without humans even if they could train from each other.

          • Jozzo@lemmy.world
            9 months ago

            You can’t really compare them like that: learning is an input and regurgitating is an output.

            Humans learn and regurgitate much the same as an AI learns and regurgitates.

            A human can only output things based on input it has received in the past. Try imagining a new color. Any color you could possibly come up with is just some combination of colors that already exist. By painting with purple, are you not “regurgitating” the work of red and blue?

  • Pika@sh.itjust.works
    9 months ago

    Meanwhile, Japan has stated that trademark and copyright don’t apply in the AI world. Man, the two sides are spread wide apart.

  • AutoTL;DR@lemmings.world (bot)
    9 months ago

    This is the best summary I could come up with:


    The New York Times recently sued OpenAI, accusing the startup of unlawfully scraping “millions of [its] copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides and more.”

    Danielle Coffey, CEO of the News/Media Alliance trade association, noted that chatbots designed to crawl the web and act like a search engine, like Microsoft Bing or Perplexity, can summarize articles too.

    Readers could ask them to extract and condense information from news reports, meaning there would be less incentive for people to visit publishers’ sites, leading to a loss of traffic and ad revenue.

    Jeff Jarvis, who recently retired from the City University of New York’s Newmark Graduate School of Journalism, is against licensing for all uses and was afraid it could set precedents that would affect journalists and small, open source companies competing with Big Tech.

    Revealing their sources might make their AI tools look bad too, considering the amount of inappropriate text their models have ingested, including people’s personal information and toxic or NSFW content.

    “The notion that the tech industry is saying that it’s too complicated to license from such an array of content owners doesn’t stand up,” said Curtis LeGeyt, president and CEO of the National Association of Broadcasters.


    The original article contains 877 words, the summary contains 202 words. Saved 77%. I’m a bot and I’m open source!