Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal (www.wired.com)
Posted by cantankerous_cashew@lemmy.world to Technology@lemmy.world · English · 16 hours ago · 28 comments

rumba · 16 hours ago
The notorious piracy database in question is Library Genesis. Cached article: https://web.archive.org/web/20250110075821/https://www.wired.com/story/new-documents-unredacted-meta-copyright-ai-lawsuit/

CriticalMiss@lemmy.world · 15 hours ago
Earlier reports suggested they trained it on books from Bibliotik. What changed?

BetaDoggo_@lemmy.world · 8 hours ago
The llama-1 paper acknowledged the use of the books dataset. Libgen isn't mentioned in any of the papers, so this is new info.

halcyoncmdr@lemmy.world · 15 hours ago
Probably just both, honestly.