Why isn't everyone talking about AI generated audiobooks?

PumpkinDrama@reddthat.com · 8 months ago

Why isn't everyone talking about AI generated audiobooks?

simple@lemm.ee · 8 months ago

A lot of people just aren’t aware of how fast AI is moving. AI voices were pretty meh earlier this year. A lot of people in the audiobook/voice acting scene have been talking about this though.

driving_crooner@lemmy.eco.br · 8 months ago

I recommend everyone check out the YouTube channel “Two Minute Papers”, which has been making videos about AI papers for the last 10 years or so, to see how fast AI has progressed. About 5 years ago, image-generating AIs produced pictures that looked like LSD-infused dreams, and now they look almost perfect.

Magrath@lemmy.ca · 8 months ago

I wish I could watch his videos but the way he talks is awful. It’s like some exaggerated evolution of YouTube talk.

Liempong_pagong@beehaw.org · 8 months ago

It’s great to be alive!

mindbleach@sh.itjust.works · 8 months ago

I’m only shocked that video isn’t better. Diffusion models work like denoising - so you’d figure all the wiggly nonsense between frames would be the first thing to filter out.

driving_crooner@lemmy.eco.br · 8 months ago

I give it a year, maybe two, before we get a fully synthetic video that can’t easily be distinguished from reality. There are already some very good AIs that complete or replace backgrounds in videos and work really well, and completely synthetic videos that look like nightmares for now.

mindbleach@sh.itjust.works · 8 months ago

I expected it to be here six months ago, but its continued absence hasn’t changed my estimate from “any day now, and suddenly.” All of this is so weirdly democratized (and pornography-motivated) that we’re seeing the cool stuff before all the scary disinformation concerns.

And the underlying mechanisms are straight-up “the missile knows where it is, because it knows where it is not.” Stable Diffusion compares the noise estimate with and without a particular term, takes the difference, and then leaps outward along that vector.
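What mindbleach describes matches what is usually called classifier-free guidance. A minimal sketch, assuming the model has already produced two noise estimates for the same step (one with the prompt, one without) and representing them as plain lists of floats; the toy values are made up, and 7.5 is only a commonly used default guidance scale, not something from this thread.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=7.5):
    """Combine unconditional and prompt-conditioned noise estimates.

    guided = uncond + scale * (cond - uncond)
    With scale > 1 the result overshoots past the conditioned estimate,
    i.e. it "leaps outward" along the difference vector.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise estimates must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy numbers standing in for two model outputs (hypothetical values).
uncond = [0.0, 1.0, -0.5]
cond = [0.2, 1.0, -0.1]
print(classifier_free_guidance(uncond, cond, guidance_scale=2.0))
```

A scale of 1.0 just returns the conditioned estimate and 0.0 returns the unconditioned one; anything above 1.0 pushes further in the prompt's direction than the model alone would.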

Turun@feddit.de · 8 months ago

I expect the data size to be a problem. Stable Diffusion defaults to 512x512px because it simply requires a lot of resources to generate an image, and even more to train the model. Now do that times 30 to generate even one second of video. I think we need something that scales better.

I fully expect this to work decently in a few years though; no matter how hard the challenge is, AI is moving really fast.

mindbleach@sh.itjust.works · edit-2 8 months ago

“Fisheye” generation seems obvious. Give the network a distorted view of an arbitrarily large image, where distant stuff scrunches inward toward a full-resolution point of focus. Predict only a small area - or even a single pixel. This would massively decrease the necessary network size, allowing faster training. (Or more likely, deeper networks). It’d also Hamburger Helper any size dataset by training on arbitrarily many spots within each image instead of swallowing the whole elephant.
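The “fisheye” idea is close to what is often called foveated sampling. Below is a hypothetical sketch, not an established method: it builds a small square view of a larger image in which the spacing between samples grows geometrically away from the focus point, so the centre is sampled one-to-one and distant pixels are squeezed inward. A network would then only predict the centre pixel. The function names, the per-axis geometric mapping and the `growth` parameter are all illustrative assumptions.

```python
def warp_offset(d, growth=2.0):
    """Map an output offset d to a source offset.

    Consecutive samples are spaced 1, growth, growth**2, ... pixels apart,
    so the distance covered is a geometric series in |d|.
    """
    if d == 0:
        return 0
    steps = abs(d)
    if growth == 1.0:
        dist = steps  # no distortion: a plain crop
    else:
        dist = (growth ** steps - 1) / (growth - 1)
    return int(round(dist)) * (1 if d > 0 else -1)


def foveated_view(image, cy, cx, size=9, growth=2.0):
    """Return a size x size view of `image` (a list of rows) focused on (cy, cx).

    The centre pixel is sampled at full resolution; farther samples skip
    more and more pixels, so a large neighbourhood fits in a small input.
    Coordinates are clamped to the image border (nearest-neighbour sampling).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so there is a centre pixel")
    h, w = len(image), len(image[0])
    half = size // 2
    view = []
    for oy in range(-half, half + 1):
        sy = min(max(cy + warp_offset(oy, growth), 0), h - 1)
        row = []
        for ox in range(-half, half + 1):
            sx = min(max(cx + warp_offset(ox, growth), 0), w - 1)
            row.append(image[sy][sx])
        view.append(row)
    return view


# Hypothetical 100x100 "image" where each pixel stores its own column index.
image = [list(range(100)) for _ in range(100)]
view = foveated_view(image, cy=50, cx=50, size=9, growth=2.0)
print(view[4])  # [35, 43, 47, 49, 50, 51, 53, 57, 65]
```

A 9x9 input here covers a 31x31 neighbourhood, and training on many focus points per image is what would stretch a dataset further.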

Even without that, video only needs a few frames at a time. You want to predict a future frame from several past frames. You want to tween a frame in the middle of past and future frames. That’s… pretty much it. Time-lapse “past frames” by sampling one per second, and you can predict the next second instead of the next frame. Then the stuff between can be tweened.
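The training setup described above can be sketched as plain data preparation. This is a hypothetical illustration: frames are represented only by their indices, `fps` and `context` are assumed parameters, and no particular model is implied. One function builds “past keyframes predict the next keyframe” examples from a one-per-second time-lapse; the other builds tweening examples by repeatedly asking for the frame halfway between two known frames.

```python
def keyframe_examples(num_frames, fps=30, context=3):
    """Build (past keyframes -> next keyframe) examples from a time-lapse.

    Sampling one frame per second turns "predict the next frame" into
    "predict the next second". Returns lists of frame indices.
    """
    keyframes = list(range(0, num_frames, fps))
    examples = []
    for i in range(context, len(keyframes)):
        examples.append((keyframes[i - context:i], keyframes[i]))
    return examples


def tween_examples(start, end):
    """Build ((past, future) -> middle) examples by recursive bisection.

    Each example asks for the frame halfway between two known frames;
    the two halves are then filled the same way until no gaps remain.
    """
    if end - start < 2:
        return []
    mid = (start + end) // 2
    return ([((start, end), mid)]
            + tween_examples(start, mid)
            + tween_examples(mid, end))


print(keyframe_examples(150, fps=30, context=3))
# [([0, 30, 60], 90), ([30, 60, 90], 120)]
print(tween_examples(0, 4))
# [((0, 4), 2), ((0, 2), 1), ((2, 4), 3)]
```

Bisecting between two keyframes one second apart fills every in-between frame exactly once, so a single “tween the middle” model is enough to go from one frame per second back to full frame rate.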

Hexarei@programming.dev · 8 months ago

Stable Diffusion can do arbitrary sizes now, as long as you have the VRAM for it, IIRC.

Turun@feddit.de · 8 months ago

Of course, but that is precisely the problem. It gets expensive really really fast.
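Rough arithmetic for why it gets expensive so fast. This is a sketch under stated assumptions: Stable Diffusion 1.x-style latents that are 8x smaller per side than the image, one self-attention layer over every latent position, and (for video) attention across all frames jointly. Real models use memory-efficient attention and often split spatial and temporal attention, so treat the numbers as orders of magnitude only.

```python
def attention_cost(width, height, downsample=8, frames=1):
    """Return (latent positions, attention-matrix entries) for one layer.

    Assumes a single attention layer over all latent positions of all
    frames jointly, so the attention matrix has positions**2 entries.
    """
    positions = (width // downsample) * (height // downsample) * frames
    return positions, positions ** 2


for w, h, f in [(512, 512, 1), (1024, 1024, 1), (512, 512, 30)]:
    pos, entries = attention_cost(w, h, frames=f)
    print(f"{w}x{h}, {f} frame(s): {pos:,} positions, {entries:,} attention entries")
```

Under these assumptions, doubling the side length quadruples the number of positions and multiplies the attention matrix by 16, while attending jointly over 30 frames at 512x512 multiplies it by 900.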

LadyLikesSpiders@lemmy.ml · 8 months ago

Ah yes, Audio AI. I can’t wait for this rapidly-approaching future where you literally won’t be able to trust the validity of anything your senses tell you anymore

mindbleach@sh.itjust.works · 8 months ago

“Text was never trustworthy.”

– Abraham Lincoln

LadyLikesSpiders@lemmy.ml · 8 months ago

Lincoln was a smart man

bingbong@lemmy.dbzer0.com · 8 months ago

Ahead of his time too:

Nobody lies on the internet

-Abraham Lincoln

LadyLikesSpiders@lemmy.ml · 8 months ago

Truly one of the wisest men to ever live

AVincentInSpace@pawb.social · 8 months ago

But up until this point, you see, there has always been one medium that is difficult/expensive enough to convincingly fake that it can reasonably be used as proof that something actually happened. If technology advances to the point where a video of something happening is no more convincing than a text description that it happened, and no other more sophisticated, harder-to-fake medium steps in to replace it…

I don’t want to live in a world where the truth is anything you can convince your friends of, you feel me?

mindbleach@sh.itjust.works · 8 months ago

“Up until this point” meaning maybe eighty years where unexpected events had any chance of being on film or televised, and several decades where amateur video was even theoretically possible.

And solid corroborating evidence still barely moved the needle whenever it was footage of cops trying to kill someone.

And what’s going to make bodycams necessary regardless is chain-of-custody demonstrating (a) the footage matching what the victims said absolutely came from the camera strapped to the chest of the accused, or (b) some motherfucker orchestrated a cover-up that demonstrates consciousness of guilt.

AdmiralShat@programming.dev · 8 months ago

Imagine the day when people post videos of the president saying literally anything, with pitch-perfect voice synthesis

Imagine going to prison for a generated clip of you confessing to a crime.

FaceDeer@kbin.social · 8 months ago

Once the tech is that good, a recording of your confession will be useless as evidence in court.

AdmiralShat@programming.dev · edit-2 8 months ago

…but it is already that good? The fact that celebrities are having to come out and say it wasn’t them in an ad is proof enough that it can fool people

You only need to fool a jury

FaceDeer@kbin.social · 8 months ago

Then we’ll have to take more care with how jury trials are conducted. It’s always been possible to fool juries, that’s often a lawyer’s entire strategy.

xkforce@lemmy.world · 8 months ago

Everything will be useless in court. Audio evidence? Worthless. Video evidence? Worthless. Physical evidence? Prove that it wasn’t planted. That kind of AI is a fucking nightmare and no one really understands the danger it poses.

FaceDeer@kbin.social · 8 months ago

AI can’t tamper with physical evidence. It can’t fake financial records or witness testimony. Many kinds of audio and visual recordings will still have sufficient authentication and chain of custody to be worthwhile.

The main kind of evidence that these AI generators make untenable is the kind where someone just shows up and says “look at this video of X confessing to Y that I happen to have,” which was never a particularly good sort of evidence to base a court case on to begin with.

xkforce@lemmy.world · edit-2 8 months ago

Witness testimony is already a very unreliable source of evidence. And again, evidence can be planted. Hell there was doubt about the chain of custody before AI could just make up audio and video. The validity of the chain of custody boils down to the cops and government in general being trusted enough to not falsify it when it suits them.

Sufficiently advanced AI can, and eventually will, be capable of creating deepfakes that can’t reliably be proven false. Every test that can be used to authenticate media can, in principle, also be used by the AI to select generated media that would pass scrutiny.

I love the optimism and I hope you’re right but I don’t think you are. I think that deepfake AI should scare people a whole lot more than it does.

FaceDeer@kbin.social · 8 months ago

The validity of the chain of custody boils down to the cops and government in general being trusted enough to not falsify it when it suits them.

There are ways to cryptographically validate chain of custody. If we’re in a world where only video with valid chain of custody can be used in court then those methods will see widespread adoption. You also didn’t address any of the other kinds of evidence that I mentioned AI being unable to tamper with. Sure, you can generate a video of someone doing something horrible. But in a world where it is known that you can generate such videos, what jury would ever convict someone based solely on a video like that? It’s frankly ridiculous.
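One common building block for tamper-evident records is a hash chain: each custody entry stores the hash of the previous entry, so editing or deleting an earlier entry breaks every later link. A minimal sketch using SHA-256 from Python’s standard library; the record fields and names are hypothetical, and a real system would also need digital signatures and trusted timestamps, which this does not provide.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(entry):
    """SHA-256 of an entry, serialized deterministically as sorted JSON."""
    data = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_entry(log, event):
    """Append an event dict, linking it to the hash of the previous entry."""
    prev = entry_hash(log[-1]) if log else GENESIS
    log.append({"prev": prev, "event": event})
    return log


def verify(log):
    """True if every entry still points at the current hash of its predecessor."""
    if log and log[0]["prev"] != GENESIS:
        return False
    return all(log[i]["prev"] == entry_hash(log[i - 1]) for i in range(1, len(log)))


custody = []
append_entry(custody, {"holder": "officer A", "item": "bodycam-017.mp4"})
append_entry(custody, {"holder": "evidence room", "item": "bodycam-017.mp4"})
append_entry(custody, {"holder": "lab", "item": "bodycam-017.mp4"})
print(verify(custody))  # True
custody[0]["event"]["holder"] = "officer B"  # rewrite history
print(verify(custody))  # False
```

One caveat of this design: the newest entry’s hash has to be recorded somewhere independent (a court, a public log), otherwise the last entry can be changed without breaking any link.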

This is very much the typical fictional dystopia scenario, where one assumes all the possible negative uses of the technology will work fine but ignores all the ways those negative uses could be countered. You can spin a scary sci-fi tale from such speculation, but it’s not really a useful way of predicting how the actual future is likely to go.

Moneo@lemmy.world · 8 months ago

That got me thinking about when we’ll hear the first case of AI generated security camera footage used to frame someone. Which leads me to wonder when it will be standard procedure for cameras to digitally sign their footage.

4z01235@lemmy.world · 8 months ago

Which leads me to wonder when it will be standard procedure for cameras to digitally sign their footage.

https://arstechnica.com/gadgets/2023/10/leicas-9125-camera-automatically-stores-authenticity-proving-metadata/

The first stills camera that digitally signs its photos is here. Leica is part of a tech consortium developing this as a standard and other major photography brands are also members, so hopefully this catches on and becomes standard, and expands to video.
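The basic flow behind in-camera signing: hash the captured bytes, sign the hash with a key that never leaves the device, and let anyone later recompute the hash and check the signature. Real schemes use public-key signatures so verifiers never need the secret; Python’s standard library has no public-key signing, so this sketch substitutes HMAC-SHA256 purely to show the flow. The key, function names and “manifest” layout are hypothetical and do not reflect how Leica or any standard formats its metadata.

```python
import hashlib
import hmac

# Stand-in for a private key stored in the camera's secure hardware (hypothetical).
DEVICE_KEY = b"hypothetical-secret-burned-into-camera"


def sign_capture(image_bytes, key=DEVICE_KEY):
    """Return a minimal 'manifest': content hash plus a MAC over that hash."""
    digest = hashlib.sha256(image_bytes).digest()
    tag = hmac.new(key, digest, hashlib.sha256).hexdigest()
    return {"sha256": digest.hex(), "tag": tag}


def verify_capture(image_bytes, manifest, key=DEVICE_KEY):
    """Recompute hash and tag; any change to the bytes or manifest fails."""
    digest = hashlib.sha256(image_bytes).digest()
    if digest.hex() != manifest["sha256"]:
        return False
    expected = hmac.new(key, digest, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["tag"])


manifest = sign_capture(b"raw sensor data")
print(verify_capture(b"raw sensor data", manifest))   # True
print(verify_capture(b"edited sensor data", manifest))  # False
```

Changing a single byte of the image changes the hash, and forging a matching tag requires the device key; with real public-key signatures the same check works for anyone holding only the camera maker’s public key.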

Moneo@lemmy.world · 8 months ago

There it is. Thanks for sharing.

Shyfer@ttrpg.network · 8 months ago

Or imagine politicians like Trump saying the most heinous stuff and then denying it saying it’s fake or AI. How will people know? You won’t even be able to trust your eyes or ears anymore.

Helix 🧬@feddit.de · 8 months ago

Guess we’ll have to resort to digital watermarking with personal certificates then.

ClaireDeLuna@lemmy.world · 8 months ago

Soon the schizophrenics will become neurotypical

GarbageShoot [he/him]@hexbear.net · 8 months ago

Alright but hear me out: AI-generated odors

LadyLikesSpiders@lemmy.ml · 8 months ago

You know some people are just gonna generate that fucking locker room smell, the reek of hormones and axe body spray, to terrorize people

FooBarrington@lemmy.world · 8 months ago

Tech like this has been available for a number of years and has most likely already been used against you. It’s now becoming available to the broader public, but that might just be a blessing in disguise, since increased awareness will hopefully also make you suspicious of the cases that are already happening.

rustyriffs@lemmy.world · 8 months ago

There we go, that’s the comment I was looking for. Lol

FaceDeer@kbin.social · 8 months ago

Have you ever watched a movie? There has always been special effects trickery.

LadyLikesSpiders@lemmy.ml · 8 months ago

Yes, but you could tell they weren’t real. They still needed real voice actors, real sound design, studios and stages and resources. Anyone with a halfway decent rig can fake shit to a very believable degree. Even with CGI you swear is fantastic, you see its fakeness once the novelty wears off

lol3droflxp@kbin.social · 8 months ago

I guess that AI generated stuff will also have some telltale signs of being fake for quite some time if you actually look for it.

LadyLikesSpiders@lemmy.ml · 8 months ago

It’s still not perfect, but it gets exponentially better every day

Touching_Grass@lemmy.world · edit-2 8 months ago

Oh no, don’t make me become skeptical and critical of my environment. Please anything but that

Bebo@lemm.ee · 8 months ago

I want TTS made better with AI so that I won’t need huge audiobooks filling up my phone. The epubs that I already have would serve as audiobooks when needed.
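Turning an epub into something a TTS engine can read starts with pulling out the chapter text in reading order. An EPUB is a ZIP archive: `META-INF/container.xml` points to the package (OPF) file, whose manifest lists the content files and whose spine gives their order. A minimal sketch using only the standard library; it assumes UTF-8 XHTML chapters and plain (not percent-encoded) hrefs, ignores DRM, and leaves the actual TTS step out. `epub_chapters` is a hypothetical helper name.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


class _TextCollector(HTMLParser):
    """Collect visible text from an XHTML chapter, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def epub_chapters(path):
    """Yield the plain text of each spine item, in reading order."""
    with zipfile.ZipFile(path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF file
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall(".//opf:manifest/opf:item", OPF_NS)
        }
        for itemref in opf.findall(".//opf:spine/opf:itemref", OPF_NS):
            href = manifest[itemref.get("idref")]
            parser = _TextCollector()
            parser.feed(zf.read(posixpath.join(base, href)).decode("utf-8"))
            yield " ".join(parser.parts)
```

Usage would look like `for text in epub_chapters("book.epub"): ...`, handing each chapter to whatever TTS engine is available; generating chapter by chapter also means only the current chapter’s audio needs to exist on the phone at a time.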

bionicjoey@lemmy.ca · 8 months ago

If your phone is rendering TTS on the fly that’s probably going to be a drain on battery.

Bebo@lemm.ee · 8 months ago

I have frequently used TTS for listening to epubs, and I haven’t noticed much battery drain… It’s not as enjoyable as listening to an audiobook read by a narrator you like, but it works to a certain extent. So I wish TTS would get better.

rustyriffs@lemmy.world · 8 months ago

What’s TTS?

lud@lemm.ee · 8 months ago

Text to speech.

milicent_bystandr@lemm.ee · 8 months ago

That sounds pretty cool, though I’d be concerned it will suffer from the classic problem of current AI (…and humans, but that’s by the by): confident incorrectness. Just as automatic translation can miss meanings and kinds of context that a human would spot, programmatically generated speech can probably mess up punctuation and flow, much as a human reader sometimes gets partway through a sentence and realises they need to start again for it to come out right.
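One place this kind of error shows up early in a TTS pipeline is when text is split into sentences before being sent to the voice model. A hypothetical sketch of the failure mode: a naive split on “. ” breaks after abbreviations such as “Mr.”, while a slightly smarter splitter with a small, assumed abbreviation list keeps them intact. Real engines do far more extensive text normalization; this only illustrates why punctuation handling matters.

```python
import re

# Illustrative, not exhaustive.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "st.", "e.g.", "i.e."}


def naive_split(text):
    """Split on '. ' only: breaks after abbreviations, misses ! and ?."""
    return [s for s in text.split(". ") if s]


def split_sentences(text):
    """Split after ., ! or ? plus whitespace, unless the word is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.start() + 1]
        if candidate.split()[-1].lower() in ABBREVIATIONS:
            continue  # e.g. "Mr." is not the end of a sentence
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = "Mr. Smith arrived. He sat down! Did he stay? Yes."
print(naive_split(text))
# ['Mr', 'Smith arrived', 'He sat down! Did he stay? Yes.']
print(split_sentences(text))
# ['Mr. Smith arrived.', 'He sat down!', 'Did he stay?', 'Yes.']
```

A voice model fed the naive chunks would pause after “Mr” and read three questions and exclamations in one breath, which is exactly the kind of flow problem a human narrator would catch.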

That said, I can’t see it being a big problem for most works, just unfortunate here and there. For once it seems an AI application short on downsides! (Except for the usual economic ones for many people previously trained in the field.)

rustyredox@lemmy.world · 8 months ago

There was a fairly big 40K lore channel on YouTube with a rather good AI impersonation of David Attenborough’s voice and narration style/scripting. However, I just went to check it, yet it must have recently gotten hit with a DMCA and taken down. A shame really. Though I never got into 40K lore before, or the 40K franchise in general, I am a big fan of David Attenborough, and so that ended up really drawing me in to a new literary universe. However, it was a big mistake by the YouTube creator to use the name and photo likeness of Attenborough in the branding, video titles, and thumbnail art on the channel. I think without pushing that line, the AI voice with a clear disclosure could have kept the channel under the legal radar.

From the pinned comments made here, this looks to be the same creator’s new channel, now using a different voice that is no longer based on any one real person:

PipedLinkBot@feddit.rocks · 8 months ago

Here is an alternative Piped link(s):

https://piped.video/@AttenboroughLore

https://piped.video/@Scholarslore

https://piped.video/watch?v=JnbGL8Z6KYg

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source; check me out at GitHub.

rustyredox@lemmy.world · 8 months ago

Related commentary on the take down:

https://youtu.be/1-PIhsD987I

PipedLinkBot@feddit.rocks · 8 months ago

Here is an alternative Piped link(s):

https://piped.video/1-PIhsD987I

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source; check me out at GitHub.

maxprime@lemmy.ml · 8 months ago

I’ve been getting into audiobooks in a big way recently. This is interesting but somehow seems off to me. Maybe I’ll try listening to one and have my mind changed. We’ll see!

Gamma@beehaw.org · 8 months ago

Because it has the potential to become actively harmful to the audiobook industry

Akrenion@programming.dev · 8 months ago

And great for accessibility for people who cannot read well.

crank@beehaw.org · 8 months ago

I think it is true because if they get the tech right the market could be saturated and voice actors will be in lower demand.

And the situation is already terrible for these workers. >90% of people buy and consume audiobooks via Audible, which is owned by Amazon. As I’m sure you can guess, there is lots of shady stuff going on, such as (but not limited to) the “Audiblegate” campaign, where workers discovered Amazon was engaging in massive, systemic wage theft. A situation which is still ongoing to the best of my knowledge.

Audiblegate - main website for “audiblegate” campaign
Audiblegate Campaign: Fair Deal for Rights Holders - The Alliance of Independent Authors
#Audiblegate: ALLi Campaign Update — Self-Publishing Advice Center from the Alliance of Independent Authors
The Truth Behind Audible Subscription Earnings (2023) - this page has detailed information including correspondence, spreadsheets etc if anyone really wants to dive in - however it appears to have been republished from another source which I can’t identify

Some further context about Audible:

Cory Doctorow is a Bestselling Author, but Audible Won’t Carry his Audiobooks - talking about the tech side, DRM and so-called intellectual property

GoldELox@lemmy.blahaj.zone · 8 months ago

Why should I care about the audiobook industry? The biggest player is Amazon. It doesn’t add value to the art form; it’s just another way to become informed, and the more people who have that ability the better.

Gamma@beehaw.org · 8 months ago

Because they are people. There are other options, you don’t have to support Amazon.

PlasterAnalyst@kbin.social · 8 months ago

A lot of audiobook voices are harmful to the industry. Plenty of times I’ve listened to a book for ten minutes and said never mind because the voice actor was terrible, making wet mouth sounds, or their voice was just annoying, or the audio quality was terrible.

i_stole_ur_taco@lemmy.ca · 8 months ago

I love the multi book series where they hire a different narrator for certain books and they end up pronouncing names and places differently than the first narrator.

chicken@lemmy.dbzer0.com · 8 months ago

Audiobooks are offputting to me and I strongly prefer to read text, but this seems like a great thing overall for making books more accessible to people. More people experiencing a wider range of books is good.

Zikeji@programming.dev · 8 months ago

Audiobooks have been a great coping mechanism for my ADHD, they’ve also made me a better driver.

For the latter, if I listen to my music I definitely feel a bit more aggressive, whereas if it’s an audiobook (and I’ve given myself sufficient room), I’m much more forgiving.

For the former, I can mix them with menial tasks and it makes them so much more doable.

PipedLinkBot@feddit.rocks · 8 months ago

Here is an alternative Piped link(s):

this AI generated audiobook

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source; check me out at GitHub.

bonn2@lemm.ee · 8 months ago

There are also a few AI-sung songs out there that are pretty good. Most of them sound pretty Autotuny, but to some extent, that can be a style. Aura, by Ghost, is a good example. If I didn’t know it was AI, I would just think it was Autotune.

Outdoor_Catgirl [she/her, they/them]@hexbear.net · 8 months ago

Isn’t Ghost and Pals just Vocaloid? If I’m thinking of the right Ghost (the one whose songs get used in fandom animation videos), that’s already been a thing for a while.

bonn2@lemm.ee · 8 months ago

That song specifically uses Solaria, which markets itself as “AI” but I’ll admit I’m not 100% sure that isn’t just marketing https://www.eclipsedsounds.com/product-page/solaria-synthesizer-v-ai

BlazingFlames6073@lemdro.id · 8 months ago

This is amazing. In the future, I’d like to try this on old books I’ve read in the past, just to check.

ddh@lemmy.sdf.org · 8 months ago

Because it’s not a new product.

𝕸𝖔𝖘𝖘@infosec.pub · 8 months ago

It sounds like a generative model to me, but it’s probably the best one I’ve ever heard. Also, thanks for the link! I added it to my listen list!

lightnsfw@reddthat.com · 8 months ago

Personally I don’t consume audiobooks so this doesn’t affect me at all.

rustyriffs@lemmy.world · 8 months ago

Ok? So what if you did consume them. Would you have any thoughts then, so that you can actually contribute a meaningful comment to this topic?

lightnsfw@reddthat.com · 8 months ago

The topic was “Why isn’t everyone talking about AI generated audiobooks?”. Which I answered. Maybe if you spent more time reading yourself you would have comprehended that.

Rhoeri@lemmy.world · 8 months ago

Because they’re shit and shouldn’t exist.

bogdugg@sh.itjust.works · 8 months ago

I’m sympathetic to the view that artists should be paid for their work. Collectively, artists have produced so much, and these tech companies are funnelling all their work into a machine and recycling it into new works, and profiting off that, without any compensation for the people partially responsible for this new reality. I’m also not interested in people who argue “but actually it’s not copying that’s not how the technology works it’s actually a really complic-” yeah I don’t care. Without the artists you would have nothing.

BUT

Don’t confuse the business practices that make this technology a reality with the technology itself. These tools are incredible, and will result in things that could have never existed previously. I just believe we need to have serious conversations about what they mean for our future.

thegreekgeek@midwest.social · 8 months ago

Exactly! I would do unspeakable things for a tool that would let me pop an epub file in and let me tune the voices and audio effects to my liking. I always have some problem or another with the voice actors.