• farting_weedman [none/use name]@hexbear.net
    7 months ago

    No, robots.txt doesn’t solve this problem. Scrapers just ignore it. The idea behind robots.txt was to be nice to the poor Google web crawlers and direct them away from useless stuff that was a waste to index.

    The crawlers could still be fastidious and follow every link; they’d just be ignoring the “nothing to see here” signs.
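To make the "signs, not fences" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules, the `ExampleBot` agent name, the example.com URLs, and both function names are made up for illustration. The only difference between the polite crawler and the rude one is whether it bothers to ask.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """A well-behaved crawler consults robots.txt before fetching."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


def rude_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """A scraper that ignores robots.txt: nothing on the server stops it."""
    return True


if __name__ == "__main__":
    for url in ("https://example.com/blog/post",
                "https://example.com/private/data"):
        print(url, polite_may_fetch(url), rude_may_fetch(url))
```

Nothing here is enforced by the server. Compliance lives entirely in the client, which is why a scraper can skip the check.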

    You beat scrapers with recursive loops of links that start from 4pt black-on-black divs and lead to pages whose content isn’t easily told apart from useful human-created content.
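A rough sketch of what such a trap could look like, assuming a hypothetical `/maze/` URL prefix and a placeholder word list; none of these names come from a real tool. Each trap page deterministically links to more trap pages, so a crawler that follows every link never runs out. Strictly speaking this builds an endless tree rather than a literal loop, and the filler here is far too crude to pass for human writing, so treat it as the shape of the idea only.

```python
import hashlib

TRAP_PREFIX = "/maze/"  # hypothetical URL prefix reserved for trap pages
WORDS = ["garden", "river", "engine", "letter", "window", "market",
         "signal", "harbor", "pencil", "forest", "winter", "bridge"]


def _digest(seed: str) -> str:
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()


def child_links(path: str, fanout: int = 3) -> list[str]:
    """Derive the same outgoing trap links for a given path every time."""
    return [TRAP_PREFIX + _digest(f"{path}#{i}")[:16] for i in range(fanout)]


def filler_text(path: str, n_words: int = 40) -> str:
    """Placeholder body text; a real trap would need far more convincing prose."""
    raw = hashlib.sha256(path.encode("utf-8")).digest() * 2  # 64 bytes
    return " ".join(WORDS[b % len(WORDS)] for b in raw[:n_words])


def hidden_entry_link() -> str:
    """The 4pt black-on-black div a human won't see but a scraper will follow."""
    return ('<div style="font-size:4pt;color:#000;background:#000">'
            f'<a href="{TRAP_PREFIX}start">archive</a></div>')


def trap_page(path: str) -> str:
    """Render one trap page: filler text plus links deeper into the maze."""
    links = " ".join(f'<a href="{href}">more</a>' for href in child_links(path))
    return f"<html><body><p>{filler_text(path)}</p>{links}</body></html>"


if __name__ == "__main__":
    print(hidden_entry_link())
    print(trap_page(TRAP_PREFIX + "start"))
```

Because the links are derived from a hash of the path, the server stores nothing: any `/maze/...` URL can be rendered on demand, while the scraper pays for every request.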

    Traps and poison, not asking nicely.