Meta’s new AI image generator was trained on 1.1 billion Instagram and Facebook photos

Lee Duna@lemmy.nz · 1 year ago


Mahlzeit@feddit.de · 1 year ago

That ought to satisfy all those who wanted “consent” for training data.

Esqplorer · 1 year ago

I wonder how they worked around users’ copyright violations… Imagine all the content uploaded to Instagram/Facebook that the poster didn’t create but simply downloaded or screenshotted and then re-uploaded.

Mahlzeit@feddit.de · 1 year ago

That shouldn’t be an issue. If you look at an unauthorized copy of an image, you’re not usually on the hook (unless you are intentionally pirating). It’s unlikely that they needed to get explicit “consent” (i.e., license the images) in the first place.

GiveMemes@jlai.lu · 1 year ago

Yeah, but is it the same thing for a human to view data and for an AI model to be trained on it? Not in my opinion: an AI doesn’t understand the concept of intellectual property and just spits out the most likely next word, whereas a person can recognize when they are copying something.

Mahlzeit@feddit.de · 1 year ago

I understand. The idea would be to hold AI makers liable for contributory infringement, reminiscent of the Betamax case.

I don’t think that would work in court. The argument is much weaker here than in the Betamax case, and even then it didn’t convince. But yes, it’s prudent to get the explicit permission, just in case of a case.

GiveMemes@jlai.lu · edit-2 1 year ago

Doesn’t really seem similar to me at all. One is a thing that’s actively making new content. The other is a machine whose purpose is time-shifting broadcast content that’s already been paid for.

It’s reminiscent as far as personal AI models on individual machines go, but completely different when it comes to corporate, monetizable usage.

Like, if somebody sold you an AI box that you had to train yourself, that would be reminiscent of the Betamax case.

Mahlzeit@feddit.de · 1 year ago

Yes, if it’s new content, it’s obviously no copy; so no copyvio (unless derivative, like fan fiction, etc.). I was thinking of memorized training data being regurgitated.

GiveMemes@jlai.lu · edit-2 1 year ago

Yeah, I just think that ingesting a bunch of novels and rearranging their contents into a new piece of work (for example) is still copyright infringement. It doesn’t need to be the Lord of the Rings or Star Wars word for word to get copyright stricken. Similar to how in the music sphere it doesn’t need to be the same exact melody.

Edit: Glad you downvoted instead of responding. Really shows the strength of your argument…

Mahlzeit@feddit.de · 1 year ago

I didn’t downvote you. (Just gave you an upvote, though.) You’re reasonable and polite, so a downvote would be very inappropriate. Sorry for that.

Music is having ongoing problems with copyright litigation, most recently with Ed Sheeran. From what I have read, this is blamed on juries without the necessary musical background. As far as I know, higher courts usually strike down these cases, as with Sheeran. Hip hop was neutered, in a blow to (African-)American culture. While it was obviously wrong not to find for fair use in that case, samples are copies.

It’s not so bad outside of music. You can write books on “how to write a bestseller” or “how to draw comics” without needing permission. Of course, you would study many novels and images to get material. The purpose of books is that we learn from them. That we go on to use what we learn to make our own thing is intended (in the US).

What you’re proposing there would be a great change to copyright law and probably disastrous. Even if one could limit the immediate effect to new technologies, it would severely limit authors in adopting these technologies.