Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

  • dustbunnies [she/her, comrade/them]@hexbear.net
    26 points · 7 days ago

    as much as the speech-to-text gets wrong on my phone, I can only imagine what it does with doctors’ notes.

    one of my million previous jobs was in medical transcription, and it is so easy to misunderstand things even when you have a good grasp of specialty-specific terminology and basic anatomy.

    they enunciate the shit they're recording about your case about as well as they write legibly. you really have to get a feel for a doctor's speaking style and common phrases to not turn in a bunch of errors.

    But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

    internet-delenda-est

    Edit: oh yeah, ✨ innovation ✨

    While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.

    Edit 2: it gets better and better

    In an example they uncovered, a speaker said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.”

    But the transcription software added: “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.”

    A speaker in another recording described “two other girls and one lady.” Whisper invented extra commentary on race, adding “two other girls and one lady, um, which were Black.”

    In a third transcription, Whisper invented a non-existent medication called “hyperactivated antibiotics.”

    Edit 3: wonder if the Organ Procurement Organizations are going to try to blame this for the extremely fucked up shit that's been happening

  • UlyssesT [he/him]@hexbear.net
    19 points · 7 days ago

    Of course it does, because all it does is regurgitate things shoved into it without any “intelligence” to make meaningful judgements on those things.

    It's a planet-burning solution constantly being marketed for problems it doesn't help with, or even makes worse. There are some use cases for the technology, but applying it everywhere, especially coercively or for corporate surveillance state panopticon reasons, is ruinously stupid.

  • WhatDoYouMeanPodcast [comrade/them]@hexbear.net
    16 points · 7 days ago

    I discovered that really quickly with unassisted casual use. People are asleep at the wheel if they try to give important duties to an AI. You don't let a dog drive your car and hope for the best.

  • UmbraVivi [he/him, she/her]@hexbear.net
    8 points · 7 days ago

    At least crypto wasn’t this annoying. You could just point and laugh from the outside. AI is being shoved into everything and makes anything it touches significantly worse.

  • fubarx@lemmy.ml
    6 points · 7 days ago

    That explains why the proctologist kept insisting I needed breast augmentation surgery.

  • BabaIsPissed [he/him]@hexbear.net
    6 points · 7 days ago

    This is fucked. You don't use a black-box approach in anything high-risk without human supervision. Whisper could probably be used to help speed up transcriptions done by an expert, maybe as some sort of "first pass" that still needs to be validated, but even then it might not actually save time and might hurt quality (see coding with Copilot). Maybe also use the timestamp information to filter out the most egregious hallucinations, or a bespoke fine-tuning setup (assuming it was fine-tuned in the first place)? Just spitballing here; I should probably read the paper to see what the common error cases are.

    It's funny, because this is the OpenAI model I had the least cynicism towards. Did they bazinga it up when I wasn't looking?