Blocking AI bots from Microsoft, others has been “pain in the a**”: Reddit CEO | Huffman says companies must pay to scrape Reddit data even though Reddit itself relies on free, user-generated content

ForgottenFlux@lemmy.world · edit-2 6 months ago

rbits@lemm.ee · 6 months ago

I don’t think they actually block malicious bots. The change they’ve made is just to the robots.txt; they don’t have to do anything.

tb_@lemmy.world · 6 months ago

Robots.txt does literally nothing. It’s a piece of courtesy that’s easily ignored if you don’t care.
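To make the "courtesy" point concrete, here is a minimal sketch of what a well-behaved crawler does, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, the `example.com` URLs, and the helper name `polite_can_fetch` are all made up for illustration; this is not Reddit's actual file or any real crawler's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every crawler to stay away from everything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def polite_can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url.

    A crawler has to call something like this voluntarily before each request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler checks first and backs off when the answer is False.
allowed = polite_can_fetch("ExampleBot", "https://example.com/r/all")
print("allowed to fetch:", allowed)
```

The check lives entirely in the crawler's own code. A scraper that never calls it can still send the HTTP request, which is why robots.txt on its own blocks nothing.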

rbits@lemm.ee · edit-2 6 months ago

Yeah, but it stops Bing and a bunch of AI scrapers that want to act like they’re following the rules.

Echo Dot@feddit.uk · 6 months ago

How do we know it stops Bing? As far as anyone knows, they could have instructed their programmers to alter the crawler so that it ignores robots.txt when on Reddit. That should have taken them a whole 2 minutes.

Reddit blocking any search crawler via robots.txt is such a non-issue that it shouldn’t even be reported.

rbits@lemm.ee · edit-2 6 months ago

OK, but they do respect it; we know that: https://searchengineland.com/microsoft-confirms-reddit-blocked-bing-search-444385

They even have a page telling you how to use it: https://blogs.bing.com/webmaster/June-2008/Robots-Exclusion-Protocol-joining-together-to-pro/