A new report from plagiarism detector Copyleaks found that 60% of OpenAI’s GPT-3.5 outputs contained some form of plagiarism.

Why it matters: Content creators from authors and songwriters to The New York Times are arguing in court that generative AI trained on copyrighted material ends up spitting out exact copies.

  • dan1101@lemm.ee
    8 months ago

    Isn’t that basically what the current LLM AI fundamentally does? Just digests a bunch of text and gives a summary back in response to queries?

    • General_Effort@lemmy.world
      8 months ago

      No. It’s not really clear what LLMs do, but it certainly depends on context.

      What they fundamentally do is continue a text. That’s what they were originally trained to do. Then they were fine-tuned to continue a chat log or respond to an instruction. To be able to do that, they have learned a lot. Unfortunately, we do not know what.

      If you ask for a summary of some text, it will give you one, regardless of whether the text even exists.

      The summary could be one written by a human that it has memorized. Or it could be complete nonsense that it is making up on the fly. You never know.
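
      To make "continue a text" concrete, here is a toy sketch in Python. It is only a word-pair counter, nothing like how a real LLM is built, and the names `train_bigram` and `continue_text` are made up for this illustration. It does show the basic loop, though: look at the text so far, append a likely next word, repeat.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(follows, prompt, max_new_words=5):
    """Greedily append the most frequent next word, one word at a time.

    Assumes the prompt contains at least one word.
    """
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing ever followed this word in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Tiny made-up training text, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(continue_text(model, "a dog sat", max_new_words=3))
# prints: a dog sat on the cat
```

      Note that "sat on the cat" never appears in the training text. The toy model continues the prompt with something plausible-looking whether or not it is true, which is the same failure described above, just at a much smaller scale.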