
  • I understand that Perplexity employs various language models to handle queries, and that the responses it generates may not come directly from those models’ training data, since a significant portion of the output is drawn from what it scrapes from the web. However, a significant concern for some people is that their posts could be scraped and then also used to train AI models, hence my post.

    I’m not anti-AI, and I see your point that transformers often dissociate content from its creator. However, one could argue this doesn’t fully mitigate the concern: even if the model can’t link the content back to the original author, it’s still using their data without explicit consent. The fact that LLMs might hallucinate or fail to attribute quotes accurately doesn’t resolve the potential plagiarism issue; instead, it highlights another problematic aspect of these models, imo.


  • Yes, the platform in question is Perplexity AI, and it conducts web searches. When it performs a search, it gathers and analyzes a substantial amount of data, and that compiled information can be used in various ways, including building profiles of specific individuals or users. I bring this up because some people might consider that a privacy concern.

    I understand that Perplexity employs other language models to process queries and that the information it provides isn’t necessarily part of those models’ training data. However, the primary concern for some people could be that their posts are being scraped (which raises plenty of privacy questions) and could, potentially, also be used to train AI models. Hence the question.
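
    To make the profiling concern concrete, here’s a minimal sketch of what aggregating someone’s scraped posts into a crude “interest profile” could look like. The posts and stop-word list are invented stand-ins for whatever a crawler would actually collect; this illustrates the idea, it isn’t anyone’s actual pipeline.

    ```python
    # Hypothetical illustration: aggregate a user's scraped posts into a
    # crude interest profile. The posts and stop words are invented.
    from collections import Counter
    import re

    scraped_posts = [
        "I mostly run local LLMs for privacy reasons.",
        "Privacy is why I left mainstream social media.",
        "Anyone benchmarked TinyLlama on an old ThinkPad?",
    ]

    STOP_WORDS = {"i", "is", "for", "why", "an", "on", "the", "a", "anyone", "old"}

    def interest_profile(posts, top_n=5):
        """Count non-stop-word tokens across posts to approximate interests."""
        words = []
        for post in posts:
            words += [w for w in re.findall(r"[a-z']+", post.lower())
                      if w not in STOP_WORDS]
        return Counter(words).most_common(top_n)

    print(interest_profile(scraped_posts))
    # With three posts this is noise; with thousands of posts, it starts
    # to look like a profile.
    ```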






  • Yeah. I totally get what you’re saying.

    However, as you pointed out, AI can deal with more information than a human possibly could. I don’t think it’s unrealistic to assume that in the near future it will be possible to track someone across accounts based on things like their interests, the way they type, etc. At that point it becomes a major privacy concern. I can totally see three-letter agencies using this technique to identify potential people of interest.
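
    For instance, here’s a minimal sketch of the “way they type” part (stylometric matching), using character n-gram TF-IDF and cosine similarity. It assumes scikit-learn is installed, and the sample texts are invented; real deanonymization work uses far richer features, but the principle is the same.

    ```python
    # Minimal stylometry sketch: link accounts by typing style using
    # character n-grams. Sample texts are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    account_a = "tbh i reckon local models r the way fwd, privacy-wise."
    account_b = "tbh i reckon selfhosting r the way fwd for most ppl."
    account_c = "In my considered opinion, cloud services remain superior."

    # Character 3-5 grams capture idiosyncratic spelling/punctuation habits.
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
    vectors = vectorizer.fit_transform([account_a, account_b, account_c])

    sims = cosine_similarity(vectors)
    print(f"a vs b: {sims[0, 1]:.2f}")  # high: shared quirks ("tbh i reckon")
    print(f"a vs c: {sims[0, 2]:.2f}")  # low: completely different register
    ```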











  • The issue with that method, as you’ve noted, is that it shuts out people with less powerful computers from running local LLMs. A few models can run on an underpowered machine, such as TinyLlama (sketched at the end of this comment), but most users want a model that can handle a wide variety of tasks efficiently, the way ChatGPT can, I daresay. For people with such hardware limitations, I believe the only option is relying on models that can be accessed online.

    For that, I would recommend Mistral’s Mixtral models (https://chat.mistral.ai/) and the wide selection of models available on Poe (https://poe.com/). In particular, I use Poe to interact with the surprisingly diverse set of Llama models they host.
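
    And for anyone curious what the local TinyLlama route mentioned above actually involves, here’s a minimal sketch using Hugging Face transformers. It assumes `transformers` and `torch` are installed and uses the TinyLlama/TinyLlama-1.1B-Chat-v1.0 checkpoint; at roughly 1.1B parameters it runs (slowly) on CPU, which is the whole point for weak hardware.

    ```python
    # Minimal sketch: run TinyLlama locally on CPU via transformers.
    # Model ID and generation settings are illustrative choices.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    )

    prompt = "Explain in two sentences why local LLMs matter for privacy."
    result = generator(prompt, max_new_tokens=100, do_sample=True)
    print(result[0]["generated_text"])
    ```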