robots.txt lives at /robots.txt at the root of the host and follows a simple line-based syntax: one or more User-Agent: groups, each with Allow: / Disallow: rules, plus optional Sitemap: lines that stand outside any group. The rules are *advisory*: well-behaved crawlers honour them, but a malicious crawler can ignore the file entirely.
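As a rough illustration of that syntax, the sketch below feeds a small file to Python's standard-library urllib.robotparser and asks it a few questions. The host example.com, the paths, and the crawler name ExampleBot are placeholders, not taken from any real site, and site_maps() assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file contents; example.com and all paths are placeholders.
# Allow is listed before Disallow so that parsers using "first match wins"
# and parsers using "longest match wins" reach the same answer.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

print(rules.can_fetch("ExampleBot", "https://example.com/admin/settings"))     # False
print(rules.can_fetch("ExampleBot", "https://example.com/admin/public/help"))  # True
print(rules.can_fetch("ExampleBot", "https://example.com/blog/"))              # True
print(rules.site_maps())  # ['https://example.com/sitemap.xml']
```

This only shows what a compliant crawler would conclude from the file. Nothing here stops a client that never reads it.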

Common mistakes: blocking /api or /admin without a trailing slash and forgetting that paths like /admin-help/ also match the prefix (Disallow: /admin/ matches only paths under that directory); declaring a sitemap on a different host than the one robots.txt is served from; and using robots.txt to hide a page that is still linked from elsewhere on the web (the page can still appear in search results without a snippet).
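The prefix pitfall is easy to see with a few lines of code. The helper below, is_blocked, is a hypothetical name and a deliberately simplified model: it does plain prefix matching only and ignores the * and $ wildcards that some crawlers support.

```python
def is_blocked(disallow_path: str, url_path: str) -> bool:
    """Simplified Disallow check: plain prefix match, no wildcard support.

    An empty Disallow value blocks nothing.
    """
    return bool(disallow_path) and url_path.startswith(disallow_path)


paths = ["/admin", "/admin/users", "/admin-help/", "/administrator"]

for rule in ("/admin", "/admin/"):
    for path in paths:
        print(f"Disallow: {rule:<8} {path:<16} blocked={is_blocked(rule, path)}")
```

With the rule /admin, all four paths are blocked, including /admin-help/ and /administrator. With /admin/, only /admin/users is blocked, and the bare /admin path is no longer covered either.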

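robots.txt should never be the security boundary. To keep a page out of search results entirely, return a noindex directive on the page itself (a robots meta tag or an X-Robots-Tag response header) and leave the page crawlable so the crawler can actually see the directive, or require authentication.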
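A minimal sketch of the noindex approach, written as a bare WSGI application so it does not depend on any framework. The function name noindex_app and the page body are illustrative assumptions. The example sends both the header and the meta tag to show each form, though either one on its own is normally enough.

```python
# Hypothetical page body; the meta tag is the in-document form of noindex.
PAGE = b"""<!doctype html>
<html>
  <head>
    <meta name="robots" content="noindex">
    <title>Internal notes</title>
  </head>
  <body>Not meant for search results.</body>
</html>
"""


def noindex_app(environ, start_response):
    """WSGI app that serves PAGE with an X-Robots-Tag: noindex header."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(PAGE))),
        # Header form of noindex; also works for non-HTML responses such as PDFs.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [PAGE]


# Call the app directly with a stub start_response to inspect what it sends.
captured = {}


def fake_start_response(status, response_headers):
    captured["status"] = status
    captured["headers"] = dict(response_headers)


body = b"".join(noindex_app({}, fake_start_response))
print(captured["status"], captured["headers"]["X-Robots-Tag"])  # 200 OK noindex
```

For this to have any effect, the URL must not be disallowed in robots.txt, since a crawler that never fetches the page never sees the header or the tag.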
