Also Microsoft…
Microsoft warns deepfake election subversion is disturbingly easy
I know the genie’s out of the bottle, but goddamn.
Microsoft: I know this will only be used for evil, but I’ll be damned if I’m gonna pass up on the hype-boost to my market share.
Every other big corp: same!
“At long last, we have created the Torment Nexus from classic sci-fi novel Don’t Create The Torment Nexus”
Can we maybe stop making these? XD
This coming from the guy who turned himself into a fly for fun
It’s not his fault earth girls are easy.
ask dna doo doo doo doo
Like, what even is a legitimate use case for these? It just seems tailor-made for either misinformation or pointless memes, neither of which seems like a good sales pitch.
I could see a few uses, but the biggest would probably be advertising. Tailored ads that look like they’re coming from a real person.
Imagine Jake from State Farm addressing you personally about your insurance in an ad.
Not that I endorse advertising; I’d like to see it all banned.
I think it could be useful to humanise some things, though, and talking to a “person” AI in a video call might be more comfortable for some people wanting to do tasks such as, say, navigating my mobile phone carrier’s shitty AI help system.
Really any sort of AI assistant device could benefit from a human imprint.
Imagine your dead relative selling you an extended warranty for your vehicle.
Deepfakes are being used to personalize political messages in India; here’s a fun article on it, which also points out an instance as far back as 2020: https://www.aljazeera.com/news/2024/2/20/deepfake-democracy-behind-the-ai-trickery-shaping-indias-2024-elections
It also mentions using deepfakes to target constituencies that speak different languages, to defame opposing parties, and even to cast doubt on legitimate videos:
Ahead of the state election in November, the caller requested that Jadoun alter a problematic but authentic video of their candidate – whose party he did not disclose – to make a realistic deepfake. The aim: to claim that the original was a deepfake, and the deepfake the original.
In Australia they handed out fliers designed to look like they came from an official government department, targeting Chinese-speaking communities who might not notice the difference.
Dodgy politicians will use anything; the solution is to go after them for doing it rather than focusing on the method, because they’ll just find another method if they don’t get stopped.
Say you’re a movie studio director making the next big movie with some big-name celebs. Filming is in progress, and one of the actors dies in the most on-brand way possible. Everyone decides that the film must be finished to honor the actor’s legacy, but how can you film someone who is dead? This technology would enable you to create footage the VFX team can lay over the top of a stand-in actor’s face and provide a better experience for your audience.
I’m sure there are other uses, but this one pops to mind as a very legitimate use case that could’ve benefited from the technology.
We’ve already recreated dead actors and older actors from whole cloth with VFX. Plus, it still seems like a niche use case for something that can be done by VFX artists who can also do way more.
Having done something before doesn’t mean they shouldn’t find ways to make it better, though. The “deepfake”-esque techniques can provide much better-quality replicas. Not to mention, as resolution demands increase, it gets harder to leverage older assets and techniques to meet them.
Another similar area is what LLMs are doing to/for developers. We already have developers, so why do we need AI to code? Well, they can help with synthesizing simpler code, freeing up devs to focus on more complicated problems. They can also democratize the ability to develop solutions to non-developers, just like how deepfake tools could democratize content creation for people with little or no VFX skill, helping the industry create better content for everyone.
They can also democratize the ability to develop solutions to non-developers,
This is insane. If you don’t understand everything a piece of code is doing, publishing it is insanely reckless. You absolutely must know how to code to publish acceptable software.
Try telling that to businesses. Sadly, you’d more likely be laughed all the way to the door than be taken seriously. The non-technical people leading businesses would rather have something working 90% of the time today than 100% of the time next week.
This is so dystopian. Imagine spending your career honing your skill as an actor, dying, and then having a computer replace you with just a photograph as a source. How is that honoring an actor??
An actual, practical example is generating video for VR chats like Apple has somewhat tried to do with their headset. Rather than using the cameras/sensors to generate and animate a 3d model based on you, it could do something more like this, albeit 2d.
Gotta crank up that dystopia meter.
This is slowly moving toward having Content On Demand. Imagine being able to prompt your content app for a movie/series you want to watch, and it just makes it and streams it to you.
Maybe a historical biopic in the style of photos of the time. Like take pictures of Lincoln, Grant, Lee, etc., use voice actors plus modern reenactors for background characters, and build it into a whole movie.
I dunno, I’m probably reaching.
If they use the speech tech on top of it, you wouldn’t even know if you were talking to the person you think you are.
I think you’re falling for the overblown fearmongering headline, and pointless memes are a great reason to make things.
Avatars for ugly people who are good at games and want to get into streaming
Vasa? Like, the Swedish ship that sank minutes into its maiden voyage? Who named that project?
They developed an AI to name all future AIs. Ironically, it is unnamed.
There are a lot of flying vehicles named after birds that famously plummet to the ground at breakneck speeds.
No, like the crispbread.
Combine this with an LLM and speech-to-text input and we could create talking paintings like in the Harry Potter movies. Heck, hang it on a door and hook it up to a smart lock to recreate the dorm doors in Harry Potter and see if people can trick it into opening the door. Roughly, something like the sketch below.
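Just to make the shape of the idea concrete, here is a minimal sketch of that loop. Every function in it is a hypothetical placeholder (speech-to-text, an LLM persona, a VASA-style talking-head renderer, and whatever a smart lock actually exposes), not a real API:

```python
# Hypothetical sketch of the "talking painting" doorkeeper idea.
# Every function below is a stand-in stub, not a real library call.

def transcribe_visitor() -> str:
    """Speech-to-text: listen at the microphone and return what the visitor said."""
    return "er... caput draconis?"  # stub transcript

def painting_reply(transcript: str, persona: str) -> str:
    """LLM call: answer in character as the portrait's subject."""
    return f"({persona}) Hmm, that is not the password, dear."  # stub reply

def speak_through_portrait(text: str, portrait_photo: str) -> None:
    """TTS plus a VASA-style talking-head model animating the single photo."""
    print(f"[{portrait_photo} appears to say] {text}")  # stub render

def unlock_door() -> None:
    """Whatever the smart lock's real API turns out to be."""
    print("*click*")  # stub unlock

def doorkeeper_loop(portrait_photo: str, persona: str, password: str) -> None:
    """Listen, reply in character, and only open on the right phrase."""
    while True:
        heard = transcribe_visitor()
        if password.lower() in heard.lower():
            speak_through_portrait("Very well, you may enter.", portrait_photo)
            unlock_door()
            return
        speak_through_portrait(painting_reply(heard, persona), portrait_photo)

doorkeeper_loop("fat_lady.jpg", "the Fat Lady", "caput draconis")
```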
Any sufficiently advanced technology is indistinguishable from magic.
Harry Potter wasn’t a fantasy movie; it was sci-fi and we just didn’t know it.
It was midichlorians all along.
You’re a Jedi 'Arry!
Imma wot?
I like your optimism where this doesn’t result in making everything worse.
I was actually discussing this very idea with my brother, who went to the Wizarding World of Harry Potter at Universal Studios, Orrrlandooooo recently, and while he enjoyed himself, he said it felt like not much is new in theme parks nowadays. Adding in AI-driven pictures you could actually talk to might spice it up.
These vids are just off enough that I think doing a bunch of mushrooms and watching them would be a deeply haunting experience
So essentially the music video for Drugs by Ratatat.
In the first video, her bottom teeth shift around.
This is why I don’t post my picture online and I never talk to anyone ever, while hiding my head inside a nylon stocking (unrelated).
Freddie, this is your mom. Look all I want for my birthday is for you to please start using teams new. It’s so much better than teams classic. I alread… Microsoft already installed it for you. Okay honey? And could you also start using a microsoft.com account so you can get financially hooked like all the Gmail users? It’s pretty smart. Don’t you want to be smart like Jonny? Tata!
I mean, I know it’s scary, but I’ll admit it is impressive, even when I watched it with jaded “every day is another AI breakthrough” exhaustion.
The subtle face movements, the eyebrow expressions, everything seems to correctly infer how the face would articulate those specific words. When you think of how many decades something like this would have sat in the uncanny valley even with a team of trained people hand-tweaking the image and video, and this is doing it better in nearly every way, automatically, with just an image? Insane.
It’s pretty wild that this is the tech being produced by the trillion-dollar company that has already been granted a patent on creating digital resurrections of dead people from the data they left behind.
So we now have LLMs that can take what you’ve said and generate new things that sound like what you would have said, voice synthesis that can take a sample of your voice and make it sound like you actually said that text, and now models that can take a photo of you and produce a video where you legitimately look like you’re saying it, facial expressions and all.
And this could be done for anyone who has a social media profile with a few dozen text posts, a profile photo, and a 15-second sample of their voice.
I really don’t get how every single person isn’t just having a daily existential crisis questioning the nature of their present reality given what’s coming.
Do people just think the current trends aren’t going to continue, or just don’t think about the notion that what happens in the future could in fact have been their own nonlocal past?
It reminds me of a millennia-old saying by a group that claimed we were copies made in the images of original humans: “you do not know how to examine the present moment.”
Edit - bonus saying on the topic: “When you see your likeness, you are happy. But when you see your images that came into being before you and that neither die nor become visible, how much you will have to bear!”
And you can run it on a single 4090; that’s crazy.
uh, are graphics cards supposed to be 2500 bucks? (I play boardgames)
Crypto did unfortunate things to the space.
What I don’t understand is why they didn’t go back down when crypto moved to proof of stake. Fuck AMD and Nvidia for price fixing.
Since it’s trained on celebrities, can it do ugly people or would it try to make them prettier in animation?
The teeth change sizes, which is kinda weird, but probably fixable.
It’s not too hard to notice in an up-close face shot, but if it were farther away it might be hard to spot - the intonation and facial expressions are spot on. They should use this to redo all the digital faces in Star Wars.
One photo? That’s incredible.
Yeah. Incredibly horrific.
Yes, I hate what AI is becoming capable of. Last year everyone was laughing at the shitty fingers, but we’re quickly moving past that. I’m concerned that in the near future it will be hard to tell truth from fiction.
The “why would they make this” people don’t understand how important this type of research is. It’s important to show what’s possible so that we can be ready for it. There are many bad actors already pursuing similar tools if they don’t have them already. The worst case is being blindsided by something not seen before.
how important this type of research
I hope they also figure out a way to find the bad actors who might use their tools for harmful purposes. You can’t just create something like this for “research” purposes and not find a way to stop bad actors from misusing it.
Paranoia vibes starting in 3, 2, 1…
Feed it Microsoft Merlin. What will happen?
Microsoft’s research teams always make some pretty crazy stuff. The problem with Microsoft is that they absolutely suck at translating their lab work into consumer products. Their labs’ publications are an amazing archive of shit that MS couldn’t get out the door properly or on time. Example: multitouch gesture UIs.
As interesting as this is, I’ll bet MS just ends up using some tech that OpenAI launches before MS’s bureaucratic product team can get its shit together.