“Well curated”
Say these claims are overhyped. Wouldn’t we still reach a point where it’s true, without humans having to sit down and sift through what’s allowed and what isn’t?
Not necessarily. Curation can also be done by AIs, at least in part.
As a concrete example, NVIDIA’s Nemotron-4 is a system specifically intended for generating “synthetic” training data for other LLMs. It pairs two separate LLMs: Nemotron-4 Instruct, which generates text, and Nemotron-4 Reward, which evaluates the outputs of Instruct to determine whether they’re good to train on.
Humans can still be in that loop, but they don’t have to be. And when they are, the AI can help them in that role so that it’s not necessarily a huge task.
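To make the generate-then-evaluate loop concrete, here is a minimal Python sketch. It is not NVIDIA's pipeline or API: `curate`, `toy_generate`, `toy_reward`, and both thresholds are hypothetical stand-ins, and the two toy functions only mimic the roles that a generator model and a reward model would play. The optional `human_review` callback shows how a person could be asked to look only at borderline samples.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Sample:
    prompt: str
    response: str
    score: float


def curate(
    prompts: Iterable[str],
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    accept_at: float = 0.8,      # assumed threshold: keep automatically
    reject_below: float = 0.3,   # assumed threshold: drop automatically
    human_review: Optional[Callable[[Sample], bool]] = None,
) -> List[Sample]:
    """Keep generated samples the reward function rates highly.

    Samples scoring between the two thresholds are borderline: they go
    to `human_review` if one is supplied, otherwise they are dropped.
    """
    kept: List[Sample] = []
    for prompt in prompts:
        response = generate(prompt)
        sample = Sample(prompt, response, reward(prompt, response))
        if sample.score >= accept_at:
            kept.append(sample)
        elif sample.score >= reject_below and human_review is not None:
            if human_review(sample):
                kept.append(sample)
    return kept


# Toy stand-ins so the sketch runs without any model.
def toy_generate(prompt: str) -> str:
    return f"Answer to: {prompt}"


def toy_reward(prompt: str, response: str) -> float:
    # Placeholder heuristic: longer prompts score higher, capped at 1.0.
    return min(len(prompt) / 20.0, 1.0)


PROMPTS = ["Hi", "What is DNA?", "Explain photosynthesis briefly"]

fully_automatic = curate(PROMPTS, toy_generate, toy_reward)
with_reviewer = curate(
    PROMPTS, toy_generate, toy_reward, human_review=lambda s: True
)
```

With no reviewer, only samples above the acceptance threshold survive. With one, the reviewer sees just the borderline cases rather than every generated sample, which is the sense in which the AI keeps the human's share of the work small.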