- cross-posted to:
- [email protected]
Hold up, let me ban a couple hundred tokens in the reply. Pattern fixed. Watermarking only works on the most ignorant, surface-level users.
“most ignorant, surface lvl users” so 80% of users?
You’re being generous
Yeah, but not the bad actors this is primarily targeting, and it will create further issues. There are likely three keyword tokens used in a pattern. The most adept humans should learn these and be damn sure never to use that pattern in any natural way.
That’s not how it works though.
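For context, published watermarking schemes don't work off a handful of fixed keyword tokens you could memorize and avoid; they statistically bias token choice and then count the bias at detection time. Here's a toy sketch of the "green list" idea from the research literature (all names are illustrative, and this is not Google's actual scheme):

```python
import hashlib

# Toy "green list" watermark sketch (hypothetical, not any real system).
# A hash seeded by each token's predecessor splits the vocabulary into
# "green" and "red" halves; a watermarking generator prefers green tokens,
# and the detector just counts how often tokens landed in the green half.

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary

def is_green(prev_token: str, token: str) -> bool:
    """A token is 'green' if a hash of (predecessor, token) says so."""
    digest = hashlib.sha256((prev_token + token).encode()).digest()
    return digest[0] % 2 == 0  # roughly half the vocab is green per context

def generate_watermarked(n: int) -> list[str]:
    """Toy generator that always picks a green token for its context."""
    out = ["tok0"]
    for _ in range(n):
        out.append(next(t for t in VOCAB if is_green(out[-1], t)))
    return out

def green_fraction(tokens: list[str]) -> float:
    """Detector: fraction of tokens in their context's green list."""
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

Ordinary text hovers near a 0.5 green fraction, while watermarked output sits near 1.0, so there's no fixed pattern of words to learn or ban; the signal is spread statistically across the whole passage.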
I’d make a point of using them for the fun of it.
Did you know, 23% of social media users don’t know how to sharpen a pencil?
True story, I wrote it on the internet somewhere, so it must be true by now…
did you know that at least 63% of all facts on the internet are at least 50% false?
and out of those 63%, 78% can be answered by a simple Google.
what an amazing time we live in where we can be wrong 50% of the time 100% of the time!
If I declare that 100% of everything I’ve ever typed online might be false, will AI delete my shit?
[I think you might enjoy this song](https://youtu.be/IUK6zjtUj00?si=C-GAe_wXBW-jWV_q)
Other than as a mind game, I don’t see the point.
Google provides a centralized service. They own the generator system.
You could solve the whole problem much more simply and reliably by just retaining a copy of all generated text at Google – the quantities of data will be minuscule compared to what Google regularly deals with – then indexing it and letting someone do a fuzzy search for a given passage of text to see whether it’s been generated. Hell, Google probably already retains a copy to data-mine what people are doing anyway, and they know how to do search. And then they could even tell you who generated the text and when.
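The retain-and-search idea above can be sketched in a few lines; this is a toy linear scan using stdlib fuzzy matching, with illustrative names and an arbitrary similarity threshold, not anything a real deployment would ship:

```python
import difflib

# Toy sketch of "retain all generated text, then fuzzy-search it".
# Every generated passage is logged with who generated it and when;
# lookup() fuzzily matches a query passage against the store.

class GenerationLog:
    def __init__(self) -> None:
        self._store: list[tuple[str, str, str]] = []  # (who, when, text)

    def record(self, who: str, when: str, text: str) -> None:
        self._store.append((who, when, text))

    def lookup(self, passage: str, threshold: float = 0.8) -> list[tuple[str, str, float]]:
        """Return (who, when, similarity) for stored texts similar to passage."""
        hits = []
        for who, when, text in self._store:
            ratio = difflib.SequenceMatcher(None, passage, text).ratio()
            if ratio >= threshold:
                hits.append((who, when, ratio))
        return hits
```

A real system would replace the linear scan with a proper near-duplicate index (shingling plus MinHash or similar) to handle scale, but the shape is the same: light edits to the text lower the match score rather than silently breaking detection, which is exactly the failure mode the watermark approach has.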
You/they can’t claim copyright on LLM-generated text, so it’s purely for analysis and statistics, I would presume. But it’s odd, because if you change the text too much the system will fail.