It’s all made from our data, anyway, so it should be ours to use as we want.

  • NoForwardslashS@sopuli.xyz
    English
    arrow-up
    6
    arrow-down
    3
    ·
    17 hours ago

    But wouldn’t that mean that if it were made open source and then didn’t function properly without the data, that would prove it is using a huge amount of unlicensed data?

    Probably not proof in the “burden of proof in a court of law” sense, though.

    • Bronzebeard@lemm.ee
      English
      arrow-up
      9
      arrow-down
      1
      ·
      17 hours ago

      Making it open source doesn’t change how it works. It doesn’t need the data after it’s been trained. Most of these AIs are just figuring out patterns to look for in the new data they come across.

      • NoForwardslashS@sopuli.xyz
        English
        arrow-up
        3
        ·
        17 hours ago

        So you’re saying the data wouldn’t exist anywhere in the source code, but it would still be able to answer questions based on the data it has previously seen?

        • stephen01king
          English
          arrow-up
          15
          ·
          16 hours ago

          That is how LLMs work: they don’t store the data as data, but as weight values.
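
To make the "weights, not data" point concrete, here is a minimal sketch. It is a toy two-parameter linear model, not an LLM, and the names `train` and `predict` and the data points are made up for illustration. It only shows the general idea that training can boil examples down to a few numbers, and that predictions afterwards need those numbers rather than the original examples.

```python
# Toy illustration only: a two-parameter linear model, not an LLM.
# All names and data points here are hypothetical.

def train(examples):
    """Fit y = w * x + b by ordinary least squares; return only (w, b)."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    var = sum((x - mean_x) ** 2 for x, _ in examples)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def predict(weights, x):
    """Answer a new query using the learned weights alone."""
    w, b = weights
    return w * x + b


training_data = [(0, 1), (1, 3), (2, 5), (3, 7)]  # made-up points on y = 2x + 1
weights = train(training_data)
del training_data  # the examples are gone; only two numbers remain

print(weights)               # the learned (w, b) pair
print(predict(weights, 10))  # prediction for an input that was never in the data
```

The analogy is loose: a real LLM has billions of weights rather than two, and large models can sometimes reproduce pieces of their training text, so "the data is not stored as data" does not mean nothing from it can ever be recovered.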

          • NoForwardslashS@sopuli.xyz
            English
            arrow-up
            1
            ·
            15 hours ago

            So then why, if it were all open sourced, including the weights, would the AI be worthless? Surely having an identical but open source version would strip profitability from the original paid product.

            • Bronzebeard@lemm.ee
              English
              arrow-up
              4
              arrow-down
              1
              ·
              13 hours ago

              It wouldn’t be. It would still work. It just wouldn’t be exclusively available to the group that created it, so any competitive advantage is lost.

              But all of this ignores the real issue - you’re not really punishing the use of unauthorized data. Those who owned that data are still harmed by this.

              • stephen01king
                English
                arrow-up
                2
                ·
                12 hours ago

                It does discourage the use of unauthorised data. If stealing doesn’t give you a competitive advantage, it’s not really worth the risk and cost of stealing it in the first place.

    • bloup@lemmy.sdf.org
      English
      arrow-up
      2
      ·
      edit-2
      16 hours ago

      in civil matters, the burden of proof is actually usually just preponderance of the evidence and not beyond a reasonable doubt. in other words, to win a lawsuit, you only need to have more compelling evidence than the other party.

      • just_another_person@lemmy.world
        English
        arrow-up
        5
        ·
        16 hours ago

        But you still have to have EVIDENCE. Not derivative evidence. The output of a model could be argued to be hearsay because it’s not direct evidence of originating content, it’s derivative.

        You’d have to have somebody backtrack through generations of model data to even find snippets of something identifiable as copyrighted material, or a human actually saying “Yes, we definitely trained on unlicensed data”.

        • bloup@lemmy.sdf.org
          English
          arrow-up
          3
          ·
          16 hours ago

          so like I am not making any comment on anything but the legal system here. but it’s absolutely the case that you can win a lawsuit on purely circumstantial evidence if the defense is unable to produce a compelling alternative set of circumstances that could lead to the same outcome.