Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persistent inaccuracies in their output; these are often called “AI hallucinations”. We argue that these falsehoods, and the overall activity of large language models, is better understood as bullshit in the sense explored by Frankfurt (On Bullshit, Princeton, 2005): the models are in an important way indifferent to the truth of their outputs. We distinguish two ways in which the models can be said to be bullshitters, and argue that they clearly meet at least one of these definitions. We further argue that describing AI misrepresentations as bullshit is both a more useful and more accurate way of predicting and discussing the behaviour of these systems.
Here’s a good & readable summary paper to pin your critiques on
Humans are capable of metacognition: having levels of confidence about the accuracy of their beliefs. They are also capable of communicating this uncertainty, usually through tone & phrasing.
I suspect that arises from a sort of adversarial or autoregressive interplay between areas of the brain. I do observe early teens displaying very low metacognition around the accuracy of what they say. It’s a true stereotype that they will pick an argument almost arbitrarily and parrot talking points from online. I imagine that if LLMs behave the same way, they might just need an RLHF training flow that mirrors stuff like arguing for BS with your parents or experiencing failure as a result of misinformation. That’s why I think it’s a matter of instruction fine-tuning rather than some fundamental attribute of LLMs.
It’s probably part of developmental instincts in humans to develop better metacognition by going through an argumentative phase like that.
Yeah I just don’t see how it’s really any different from a human in that respect