TBH I’m thinking this is an advertiser bullshitting their clients to seem like they can do something their competitors can’t, since the tech consensus so far has been “this is logistically infeasible, runs afoul of wiretapping laws basically everywhere, and is such a heavy processing and bandwidth load that it would 100% be obvious it was happening.” It would require either streaming the audio to a remote server (illegal, high bandwidth and so very noticeable, and you’d be paying for tons of processing power to parse mostly junk data) or running speech-to-text locally to grab keywords and report them back (which would keep the phone under at least a moderate constant load parsing a stream of junk noise, quickly draining the battery while it’s not in use and again being very noticeable).
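For a sense of scale, here’s a rough back-of-envelope calculation of the streaming option; the bitrate and listening hours are my own assumptions, just to show why the data usage alone would give it away:

```python
# Rough back-of-envelope: data cost of continuously streaming voice-grade audio.
# Both numbers below are assumptions for illustration, not measurements.
VOICE_BITRATE_KBPS = 16         # low-quality speech codec, in kilobits per second
HOURS_LISTENED_PER_DAY = 12     # assume it only bothers while the owner is awake

bits_per_day = VOICE_BITRATE_KBPS * 1000 * 3600 * HOURS_LISTENED_PER_DAY
mb_per_day = bits_per_day / 8 / 1_000_000
gb_per_month = mb_per_day * 30 / 1000

print(f"~{mb_per_day:.0f} MB/day, ~{gb_per_month:.1f} GB/month per app")
# ~86 MB/day, ~2.6 GB/month -- hard to hide in any per-app data usage screen
```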
I don’t think these “it’s not feasible” arguments make sense or are in good faith. It’s not a dichotomy between “it’s not happening at all” and “they’re recording or streaming everything we say through full-blown natural language models”.
Maybe they’re doing it only on certain triggers. Maybe they’re listening only for certain keywords. Maybe they record and analyze 10 seconds of audio every time you get a message on your phone. Maybe they’re using really low-quality recordings. Maybe they’ve slowed the processing down to run in the background, since there’s no requirement to do real-time NLP.
Like there’s a huge range of potential options (with new, energy-efficient, optimized ML algorithms) between 0 and 100; a rough sketch of the low end is below.
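To make that concrete, here’s a minimal sketch of the kind of event-triggered keyword spotting I mean. Nothing in it is a real API; record_clip() and spot_keywords() are stand-in stubs, and the keyword list and log path are made up. The point is just that the workload would be short bursts plus a tiny text log, not a continuous audio stream:

```python
# Hypothetical sketch of event-triggered, on-device keyword spotting.
# record_clip() and spot_keywords() are stand-in stubs, not real APIs; the
# keywords and log path are invented for illustration.
import random

KEYWORDS = {"vacation", "mortgage", "sneakers", "stroller"}
LOG_PATH = "keyword_hits.txt"

def record_clip(seconds: int) -> list[str]:
    """Stub standing in for a short, low-quality mic capture plus transcription."""
    return random.choice([["we", "should", "book", "a", "vacation"],
                          ["nothing", "interesting", "here"]])

def spot_keywords(words: list[str]) -> set[str]:
    """Stub standing in for a tiny on-device keyword spotter."""
    return KEYWORDS.intersection(words)

def on_trigger_event() -> None:
    """Runs only when something happens (say, a notification), not continuously."""
    hits = spot_keywords(record_clip(10))
    if hits:
        # Append a few bytes to a local log that can be synced whenever.
        with open(LOG_PATH, "a") as f:
            f.write(",".join(sorted(hits)) + "\n")

on_trigger_event()
```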
Google Pixels have long had a Now Playing feature that can identify songs playing nearby. As far as I know, it’s all on-device and offline: there’s a pre-loaded database of fingerprints (hashes) of a bunch of common songs that ambient audio gets compared against (toy example after this comment).
So there could be similar on-device functionality that recognizes spoken trigger words and records them to some file which could then be accessed by apps that serve ads.
This would be different from (and more efficient than) recording large audio files and sending them over the internet.
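To illustrate the Now Playing analogy, here’s a toy version of matching ambient audio against preloaded fingerprints, entirely offline. The “fingerprint” below is just a crude hash of downsampled, quantized samples, a stand-in for whatever fingerprinting algorithm Google actually uses:

```python
# Toy illustration of the Now Playing idea: fingerprint ambient audio and look it
# up in a small preloaded database, all on-device. The fingerprint function is a
# crude stand-in, not the real algorithm.
import hashlib

def fingerprint(samples: list[int]) -> str:
    """Downsample, coarsely quantize, and hash -- a stand-in audio fingerprint."""
    coarse = bytes((s // 32) % 256 for s in samples[::4])
    return hashlib.sha256(coarse).hexdigest()[:16]

# Pretend this little database shipped with the OS image.
PRELOADED = {fingerprint([i % 200 for i in range(800)]): "Song A"}

def identify(ambient: list[int]) -> str | None:
    """Match ambient audio against the preloaded fingerprints; no network needed."""
    return PRELOADED.get(fingerprint(ambient))

print(identify([i % 200 for i in range(800)]))   # Song A
print(identify([i % 123 for i in range(800)]))   # None
```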
on-device “ai” capabilities actually probably make this a lot more feasible. but doing so under the noses of every security professional on the planet seems unlikely
With modern tech and modern hardware it’s feasible in the sense of “maybe it can do a little bit of processing while the device is actually in use, and use so little that it’s not grinding everything else to a halt”, but it would be immediately obvious to anyone sitting down and testing whether it does anything weird. And even just compiling a list of how many times a few select keywords showed up, to report back the next time it connects to a server anyway, could still constitute an illegal wiretap.
Also, even if it only takes a persistent 3-4% processor load to run, what happens when every app that’s managed to request microphone permissions is running this? You’d get the whole processor monopolized by spyware to the point nothing else could run, and it would become immediately obvious. That means that if a company were to try it, they’d basically have only a very brief window before the legal hammers over wiretapping came down, or everyone else started doing it too and manufacturers had to start cracking down because it was making their phones unusable.
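The arithmetic on that is pretty stark. The per-app load comes from the 3-4% figure above; the number of apps with microphone permission is my own guess, purely for illustration:

```python
# Quick arithmetic on the "every app does it" scenario. Per-app load is the 3-4%
# figure from above; the app count is an assumption for illustration only.
PER_APP_LOAD = 0.035              # midpoint of the 3-4% estimate
APPS_WITH_MIC_PERMISSION = 20     # assumed number of apps granted the permission

total_load = PER_APP_LOAD * APPS_WITH_MIC_PERMISSION
print(f"~{total_load:.0%} of the CPU spent on background listening")   # ~70%
```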