• MrSoup
    28 upvotes · 1 month ago

    I doubt Google respects any robots.txt

    • DaGeek247@fedia.io
      27 upvotes · 1 month ago

      My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP-bans anything that visits it, and I also listed that page as a disallowed path in the robots.txt file.

      I’ve only gotten like, 20 visits in the past three months though, so, very small sample size.
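
      A minimal sketch of that kind of trap, assuming a plain Python standard-library server. The path `/trap`, the `handle` function, and the in-memory `banned_ips` set are all hypothetical names for illustration; the comment does not say how the real page or the ban is implemented.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical trap path; the real one is not given in the thread.
TRAP_PATH = "/trap"
ROBOTS_TXT = "User-agent: *\nDisallow: " + TRAP_PATH + "\n"

# In-memory ban list (assumption: a real setup would likely use a firewall rule).
banned_ips = set()


def handle(ip, path):
    """Return (status, body) for one GET request and update the ban list."""
    if ip in banned_ips:
        return 403, "banned\n"
    clean_path = path.split("?", 1)[0].rstrip("/")
    if clean_path == "/robots.txt":
        return 200, ROBOTS_TXT
    if clean_path == TRAP_PATH:
        # Only a client that ignored the Disallow rule ends up here.
        banned_ips.add(ip)
        return 403, "banned\n"
    return 200, "hello\n"


class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle(self.client_address[0], self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the demo server on localhost; call this manually to try it."""
    HTTPServer(("127.0.0.1", port), TrapHandler).serve_forever()
```

      Calling `serve()` and then requesting `/trap` from a client should get that client a 403 on every later request, while clients that only read `/robots.txt` and stay away from the trap are unaffected.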

      • mozz@mbin.grits.dev
        14 upvotes · 1 month ago

        I know this because I wrote a page that IP-bans anything that visits it, and I also listed that page as a disallowed path in the robots.txt file.

        This is fuckin GENIUS

        • Moonrise2473@feddit.it
          8 upvotes · 1 month ago

          Only if you don’t want any visits except from yourself, because this removes your site from every search engine.

          You should write “Disallow: /juicy-content” and then block anything that tries to access that path (only bad bots would follow it).
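
          For illustration, here is roughly how a well-behaved crawler would decide to skip that path, sketched with Python's standard `urllib.robotparser`. The user-agent name `ExampleBot` and the `allowed` helper are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Parse the rule from the comment directly instead of fetching it over HTTP.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /juicy-content"])


def allowed(path, agent="ExampleBot"):
    """True if a compliant bot with this user agent may fetch the path."""
    return rules.can_fetch(agent, path)
```

          A bot that performs this check never requests `/juicy-content`, so anything that does show up there either skipped the check or ignored its result.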

            • Moonrise2473@feddit.it
              3 upvotes · 1 month ago

              Oops. As a non-native English speaker I misunderstood what he meant. I wrongly thought he had set the server to ban everything that requested robots.txt.

              • Zoop@beehaw.org
                2 upvotes · 1 month ago

                Just in case it makes you feel any better: I’m a native English speaker who always aced the reading comprehension tests back in school, and I read it the exact same way. Lol! I’m glad I wasn’t the only one. :)

          • mozz@mbin.grits.dev
            5 upvotes · 1 month ago

            You need to reread what was described, more carefully. Imagine, for example, that by “a page,” the person means a page called /juicy-content or something.

      • thingsiplay@beehaw.org
        2 upvotes · edited · 1 month ago

        Interesting way of testing this. Another would be to query the search engines with site:your.domain added (Edit: typo corrected. Of course without the - in -site:, otherwise you exclude your site instead of limiting results to it.) to show results from your site only. Not an exhaustive check, but another tool to test this behavior.

    • Moonrise2473@feddit.it
      10 upvotes · 1 month ago

      For ordinary sites they do respect it, and they even warn a webmaster who submits a sitemap containing paths that are disallowed in robots.txt.