cantankerous_cashew@lemmy.world to Technology@lemmy.world · English · 1 month ago
Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal (www.wired.com)
rumba · 1 month ago
The notorious piracy database in question is Library Genesis. Cached article: https://web.archive.org/web/20250110075821/https://www.wired.com/story/new-documents-unredacted-meta-copyright-ai-lawsuit/
CriticalMiss@lemmy.world · 1 month ago
Earlier reports suggested they trained it on books from Bibliotik. What changed?
halcyoncmdr@lemmy.world · 1 month ago
Probably just both honestly.
BetaDoggo_@lemmy.world · 1 month ago
The LLaMA-1 paper acknowledged the use of the books dataset, but LibGen isn't mentioned in any of the papers, so this is new info.
In for a penny, in for a pound.