GenAI tools ‘could not exist’ if firms are made to pay copyright

L4sBot@lemmy.world · 1 year ago

GenAI tools ‘could not exist’ if firms are made to pay copyright

1Fuji2Taka3Nasubi · 1 year ago

Reproduction of copyrighted material would be breaking the law. Studying it and using it as reference when creating original content is not.

I’m curious why we think otherwise when it is a student obtaining an unauthorized copy of a textbook to study, or researchers getting papers from sci-hub. Probably because it benefits corporations and they say so?

Marcbmann@lemmy.world · 1 year ago

While I would like to be in a world where knowledge is free, this is apples and oranges.

OpenAI can purchase a textbook and read it. If their AI uses the knowledge gained to explain maths to an individual, without reproducing the original material, then there’s no issue.

The difference is the student in your example didn’t buy their textbook. Someone else bought it and reproduced the original for others to study from.

If OpenAI was pirating textbooks, that would be a wholly separate issue.

sixCats@lemmy.dbzer0.com · 1 year ago

I was under the impression that they mentioned torrenting things at some point.

1Fuji2Taka3Nasubi · 1 year ago

Don’t know about OpenAI, but Meta used pirated books to train its AI.

https://www.techspot.com/news/101507-meta-admits-using-pirated-books-train-ai-but.html

Blackmist@feddit.uk · 1 year ago

The fact that the “AI” can spit out whole passages verbatim when given the right prompts suggests that there is a big problem here and they haven’t a clue how to fix it.

It’s not “learning” anything other than the probable order of words.

FatCrab@lemmy.one · 1 year ago

I really hate this reduction of GPT models. Is the model probabilistic? Absolutely. But it isn’t simply learning a comprehensible probability of words; it is generating a massively complex conditional probability sequence for words. Arguably, humans do much the same thing: we make a best guess at the sequence of words to use based on conditional probabilities over a myriad of conditions (including the semantics of the thing we want to say).
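
To make "conditional probability" concrete, here is a toy sketch. It is only an illustration, not how GPT models are implemented: it is a bigram counter that estimates P(next word | previous word) from a made-up corpus, and the function names are hypothetical. Real models condition on the whole preceding context through a neural network rather than on one previous word, but the idea of predicting the next word given what came before is the same.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word follows each preceding word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Return P(next | prev) as a dict; empty if prev was never seen."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return {word: n / total for word, n in following.items()}

# Made-up corpus, purely for illustration.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

Even this tiny model gives different answers depending on the conditioning word, which is the difference between "the probable order of words" and a conditional distribution over them.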

Marcbmann@lemmy.world · 1 year ago

Completely agree. And that should be the focal point of the issue.

Sam Altman is correctly stating that AI is not possible without using copyrighted materials. And I don’t think there’s anything wrong with that.

His mistake is not redirecting the conversation. He should be talking about the efforts they’re making to stop their machine from reproducing copyrighted works, not about whether they should be allowed to use those works in the first place.

Even_Adder@lemmy.dbzer0.com · 1 year ago

What about these:

https://arxiv.org/abs/2310.02207

https://notes.aimodels.fyi/researchers-discover-emergent-linear-strucutres-llm-truth/

https://notes.aimodels.fyi/self-rag-improving-the-factual-accuracy-of-large-language-models-through-self-reflection/

1Fuji2Taka3Nasubi · 1 year ago

I agree that the issues

whether AI output is a derivative work of its input, and
whether input to AI is fair use and requires no compensation

are separate, but I think they are related, in that AI companies are trying to impose whatever interpretation of copyright is convenient to them on the rest of society.

And indeed Meta pirated books to feed its AI.

https://www.techspot.com/news/101507-meta-admits-using-pirated-books-train-ai-but.html

GenAI tools ‘could not exist’ if firms are made to pay copyright | Computer Weekly