Manual talk:Robots.txt

Don't index vs don't spider
"The only way to keep a URL out of Google's index is to let Google slurp the page and see a meta tag specifying robots="noindex". With our current system, this would be difficult to special case."


 * As nonexistent articles mostly bring up an edit page, can we not just set that robots="noindex" meta tag on the edit page HTML template? This way, the meta tag would be there on all edit pages, so none of them will get indexed. Ropers 18:15, 28 Aug 2004 (UTC)


 * We already do. The issue discussed above is that Google returns search results including URLs that are forbidden by robots.txt. Because they are forbidden by robots.txt, Google does not spider the pages and does not see the meta tag. --Brion VIBBER 21:19, 28 Aug 2004 (UTC)


 * Ah. I misunderstood earlier. But then, can we not just do away with any mention of edit pages in robots.txt (which is what I think was proposed above by "letting Google slurp the page")? Ropers 21:30, 28 Aug 2004 (UTC)


 * This would require giving all edit URLs a distinct prefix that can be left out of the Disallow line in robots.txt. It is possible, but some functions would need reworking. --Brion VIBBER 00:35, 29 Aug 2004 (UTC)
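
For illustration, here is a minimal Python sketch of the behaviour discussed in this thread. It is not MediaWiki code and not a description of how Google actually works. The `/w/` script path, the `/edit/` prefix, the `wiki.example` host, the `ExampleBot` agent name, and the `crawl_decision` helper are all hypothetical. It only shows that a crawler obeying robots.txt never fetches a disallowed page, so it cannot see the noindex meta tag there, whereas an edit URL under a prefix not covered by the Disallow line would be fetched and its meta tag read.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the content values of <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())


def crawl_decision(robots_txt, url, html, agent="ExampleBot"):
    """Hypothetical helper: what a well-behaved crawler would do with `url`.

    `html` stands in for the page body the crawler would receive if it
    were allowed to fetch the URL.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    # A disallowed URL is never fetched, so its meta tags are never read.
    if not rules.can_fetch(agent, url):
        return "not fetched (meta tag never seen)"

    parser = RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in d for d in parser.directives):
        return "fetched, noindex honoured"
    return "fetched, indexable"


# Assumed layout: everything under the script path /w/ is disallowed.
ROBOTS = "User-agent: *\nDisallow: /w/\n"
EDIT_HTML = (
    '<html><head><meta name="robots" content="noindex,nofollow">'
    "</head><body></body></html>"
)

# Edit URL under the disallowed script path.
print(crawl_decision(
    ROBOTS, "https://wiki.example/w/index.php?title=Foo&action=edit", EDIT_HTML))

# Edit URL under a hypothetical distinct prefix not covered by Disallow.
print(crawl_decision(ROBOTS, "https://wiki.example/edit/Foo", EDIT_HTML))
```

In the first case the crawler knows only that the URL exists (for example from inbound links), which is how such a URL can still turn up in search results. In the second case the page is fetched and the noindex tag can take effect.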