• don@lemm.ee · 51 points · 1 year ago

    It’s the meticulous savers you should worry about. The savers smart enough to automate what they save, and fastidious enough to know every sector of what they’ve saved. Those savers may or may not save the whole galaxy.

    • Taco · 9 points · 11 months ago

      I save lots of 2000s kids’ shows, for when my future kids grow up. No telling when they’ll become lost media. I use FileBot to automatically rename the files to TVDB standards, and so far I’ve collected 8 TB. Do I have a problem?
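      Roughly what that automation boils down to, for anyone curious (a rough Python sketch of the same idea, not FileBot itself; the library path and naming pattern below are made-up examples):

      ```python
      import re
      from pathlib import Path

      # Sketch of the kind of renaming FileBot automates (not FileBot itself):
      # turn "some.show.S01E02.720p.mkv" into "Some Show - S01E02.mkv".
      EPISODE_RE = re.compile(r"(?P<name>.+?)[. _-]+S(?P<s>\d{2})E(?P<e>\d{2})", re.IGNORECASE)

      def tidy_name(path: Path):
          m = EPISODE_RE.match(path.stem)
          if not m:
              return None  # doesn't look like an episode file
          show = m.group("name").replace(".", " ").replace("_", " ").strip().title()
          return path.with_name(f"{show} - S{m.group('s')}E{m.group('e')}{path.suffix}")

      for f in Path("/media/kids-shows").glob("**/*.mkv"):  # assumed library location
          target = tidy_name(f)
          if target and target != f:
              print(f"{f.name} -> {target.name}")
              # f.rename(target)  # uncomment once the dry run looks right
      ```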

      • don@lemm.ee · 7 points · 11 months ago

        “Do I have a problem?”

        As long as you can afford to maintain your repository, no.

  • Belgdore@lemm.ee · 17 points · 11 months ago

    People like this are the reason we will have records of this period of history in a thousand years.

  • LoamImprovement@beehaw.org · 8 points · 11 months ago

    Hold up, does someone know how to save an entire site? I would really like to get the 5e wikidot archived in case Hasbro or whoever wants to shut it down for good.

    • 🇰 🌀 🇱 🇦 🇳 🇦 🇰 ℹ️@yiffit.net · 7 points · 11 months ago

      Probably a browser extension these days. I had one back in the late ’90s or early 2000s that would simply download the page you were on, plus every page, image, audio file, etc. reachable by recursively following the links on that page.

      This was back when most websites had a table-of-contents link somewhere, though. Plenty of sites now don’t link to every page on the domain; some pages are only reachable if you enter the URL manually, or are generated dynamically and only exist on request.
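      The same trick still works as a script: start from one page, save it, and recursively follow the links it contains. A rough sketch in Python, assuming the requests and beautifulsoup4 packages; the start URL, output folder, and depth limit are placeholders, and it only saves HTML pages (no images or audio):

      ```python
      import os
      from urllib.parse import urljoin, urlparse

      import requests                   # third-party: pip install requests
      from bs4 import BeautifulSoup     # third-party: pip install beautifulsoup4

      START = "https://example.com/"    # placeholder for the site to mirror
      OUT = "mirror"                    # local folder for the saved pages
      seen = set()

      def save(url, depth=2):
          """Save one page, then recurse into its same-site links."""
          if url in seen or depth < 0:
              return
          seen.add(url)
          resp = requests.get(url, timeout=10)
          if "text/html" not in resp.headers.get("Content-Type", ""):
              return
          # Write the raw HTML under a filename derived from the URL path.
          name = urlparse(url).path.strip("/").replace("/", "_") or "index"
          os.makedirs(OUT, exist_ok=True)
          with open(os.path.join(OUT, name + ".html"), "w", encoding="utf-8") as fh:
              fh.write(resp.text)
          # Follow every link on the page, but stay on the same host.
          for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
              nxt = urljoin(url, a["href"])
              if urlparse(nxt).netloc == urlparse(START).netloc:
                  save(nxt, depth - 1)

      save(START)
      ```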

    • anton@lemmy.blahaj.zone · 5 points · 11 months ago

      It won’t save everything, but if a script follows every link recursively, most of the content should be reached that way. That’s kind of what Google does, but for one site instead of the whole internet.

      If there is a search function, try very simple queries.

      The alternative of brute-forcing links would be infeasible, even if the site doesn’t rate-limit you, due to the exponential complexity.

      If you want to do something, please look into API/scraping etiquette like exponential backoff.
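      For the backoff part, the usual shape looks something like this (a rough sketch using only the Python standard library; the URL and retry count are placeholders). Every failure makes the next request wait twice as long, so a struggling server gets hit less, not more:

      ```python
      import time
      import urllib.request

      def polite_get(url, max_tries=5):
          """Fetch a URL, doubling the wait after each failure (1s, 2s, 4s, ...)."""
          delay = 1.0
          for attempt in range(max_tries):
              try:
                  with urllib.request.urlopen(url, timeout=10) as resp:
                      return resp.read()
              except OSError:          # URLError, HTTPError, and timeouts are all OSError subclasses
                  if attempt == max_tries - 1:
                      raise            # give up after the last retry
                  time.sleep(delay)
                  delay *= 2           # exponential backoff: wait longer each time

      html = polite_get("https://example.com/")  # placeholder URL
      ```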

    • jherazob@beehaw.org · 5 points · 11 months ago

      There’s software that starts at the homepage of a site and traverses every link it finds, saving everything in the process.

    • zzz@feddit.de · 3 points · 11 months ago

      Link? And where can I upload a PDF* of the site to share with you? tmpfiles.org’s short duration probably won’t cut it…

      *Although I’m certain The Saver™️ would only do full web-archive zips, for us casuals the PDF export will do (and is easier in day-to-day use).

    • rumschlumpel@feddit.de · 2 points · 11 months ago

      It’s actually pretty selective. I recently tried reading an old webcomic: lots of dead links, and the various web-archive copies were very incomplete. I’m sure SOMEONE has it saved somewhere, but it doesn’t look like they’ve made it easily available to the general public.