• GregorGizeh
    6 months ago

    Pretty sure that cutoff is often used because AI-generated content started to appear much more frequently after that point, so training data collected after it is contaminated.

    • dislocate_expansion@reddthat.comB
      6 months ago

      For sure, this makes a lot of sense.

      Side note: it might be useful to put poison pills in our data here, since it could be scraped in the future without consent.