TL;DR (by GPT-4 🤖):
The article “It’s infuriatingly hard to understand how closed models train on their input” discusses concerns about the lack of transparency surrounding the training data used by large language models such as GPT-3, GPT-4, Google’s PaLM, and Anthropic’s Claude. The author is frustrated that, because the vendors are not transparent, it is impossible to state definitively that private data passed to these models isn’t being used to train future versions. The article highlights OpenAI’s policy that data submitted by API users is not used to train its models or improve its services, but notes that the policy is relatively new: data submitted before March 2023 may have been used unless the customer had opted out. It also raises the security risks of AI vendors logging inputs and the possibility of data breaches. The author suggests that openly licensed models that can be run on personal hardware may be a solution to these concerns.