Manual talk:Robots.txt

Excluding a namespace


I guess it would be pretty easy to exclude a namespace, right? E.g., if you wanted to establish an unsearchable trash namespace:

 Disallow: /wiki/Trash:
 Disallow: /wiki/Trash_talk:
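A quick way to sanity-check rules like these is Python's standard urllib.robotparser, which matches Disallow values as plain path prefixes. The sketch below assumes a hypothetical wiki at example.org using /wiki/ article paths. It adds the User-agent line that a robots.txt group needs, and it writes the space in "Trash talk" as an underscore because that is how the namespace appears in /wiki/ URLs. Other crawlers may parse edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Rules for a hypothetical "Trash" namespace and its talk namespace.
# Note the underscore: page URLs use "Trash_talk:", not "Trash talk:".
RULES = """\
User-agent: *
Disallow: /wiki/Trash:
Disallow: /wiki/Trash_talk:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical URLs on a hypothetical wiki.
for url in (
    "https://example.org/wiki/Trash:Old_draft",
    "https://example.org/wiki/Trash_talk:Old_draft",
    "https://example.org/wiki/Main_Page",
):
    print(url, "allowed" if parser.can_fetch("*", url) else "disallowed")
```

Because matching is by prefix, the trailing colon matters: "/wiki/Trash:" blocks pages in the namespace without also blocking an ordinary article such as /wiki/Trashcan.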

Leucosticte (talk) 11:52, 19 October 2013 (UTC)

/api.php


The page should mention whether /api.php should be disallowed too. Jidanni (talk) 17:41, 6 December 2017 (UTC)
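If a wiki does decide to keep crawlers out of the API, a single prefix rule covers every query string sent to it. This minimal sketch checks that with Python's urllib.robotparser, assuming a hypothetical wiki at example.org whose api.php sits at the site root. Whether the API should be disallowed at all is the open question above; the code only shows what the rule would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a wiki with api.php at the site root.
RULES = """\
User-agent: *
Disallow: /api.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Disallow matches by path prefix, so any query string after /api.php is covered.
api_url = "https://example.org/api.php?action=query&list=allpages&format=json"
page_url = "https://example.org/wiki/Main_Page"
print(parser.can_fetch("*", api_url))   # False
print(parser.can_fetch("*", page_url))  # True
```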

robots.txt + site root level Short URL


For wikis with Short URLs at the site root level, another user suggests:

 User-agent: *
 Disallow: /index.php

I like the suggestion above. My wiki has short URLs implemented, so to robots all page links look like example.com/Some_page, while all action links look like example.com/index.php?title=Page_name&action=edit and so on. So robots.txt can use Disallow: /index.php for User-agent: *, and Google's spiders etc. will not crawl the action pages but will still crawl all the normal page links, because my Short URL settings in LocalSettings.php make those look like example.com/Some_page. Am I correct? --Rogerhc (talk) 05:41, 11 January 2019 (UTC)
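Under simple prefix matching, the reasoning above holds: every action link starts with /index.php, while short page URLs do not. This sketch checks it with Python's urllib.robotparser, using the example.com URLs from the comment. It assumes the crawler treats Disallow values as path prefixes, as Python's parser does; major crawlers such as Googlebot document the same behaviour, but each crawler's own documentation is the authority.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested above for a wiki with short URLs at the site root.
RULES = """\
User-agent: *
Disallow: /index.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# URL -> whether it should be crawlable under these rules.
checks = {
    "https://example.com/Some_page": True,  # short URL for a normal page view
    "https://example.com/index.php?title=Page_name&action=edit": False,
    "https://example.com/index.php?title=Page_name&action=history": False,
}
for url, expected in checks.items():
    # "Googlebot" has no group of its own here, so the "*" group applies.
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "disallowed")
```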

MediaWiki:Robots.txt


Wikipedia is apparently able to customize its robots.txt via w:MediaWiki:Robots.txt. On this wiki, MediaWiki:Robots.txt does not exist as a page, but judging by its default content it clearly works the same way. However, this does not seem to be a default feature of the MediaWiki software: on the handful of third-party wikis I checked, MediaWiki:Robots.txt does not have this default content. Looking around, there is no obvious documentation on how to set this up, and nothing relevant jumps out on Special:Version either, so how is this done? Can someone add some documentation to this page? 「ディノ奴千？！」 ☎ Dinoguy1000 00:11, 30 September 2021 (UTC)

Old thread, but this is a really interesting way to avoid keeping a robots.txt file in the directory structure. I will research this, @Dinoguy1000, and if I discover something I will try to update the documentation. Thank you for pointing out this useful resource! Ivanhercaz (talk) 09:52, 15 September 2023 (UTC)
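One possible way to get similar behaviour on another wiki is to serve /robots.txt from a small script. The script would append the raw text of MediaWiki:Robots.txt, fetched with index.php?title=...&action=raw (which returns a page's wikitext as plain text), to a set of static base rules. This is a sketch under assumptions, not a description of how Wikipedia actually does it. The wiki URL, BASE_RULES and the function names are hypothetical, and routing /robots.txt to the script in the web server is not shown.

```python
import urllib.parse
import urllib.request

# Hypothetical static rules that would otherwise live in a robots.txt file.
BASE_RULES = """User-agent: *
Disallow: /index.php
"""

def raw_page_url(index_php: str, title: str) -> str:
    """Build an action=raw URL for a page (hypothetical helper)."""
    query = urllib.parse.urlencode({"title": title, "action": "raw"})
    return f"{index_php}?{query}"

def fetch_page_text(index_php: str, title: str, timeout: float = 10.0) -> str:
    """Fetch a page's raw wikitext; return '' if the request fails."""
    try:
        with urllib.request.urlopen(raw_page_url(index_php, title), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except OSError:  # HTTPError (e.g. 404 for a missing page) and URLError are OSErrors
        return ""

def build_robots_txt(base: str, extra: str) -> str:
    """Append the on-wiki rules after the static ones, separated by a blank line."""
    parts = [base.rstrip("\n")]
    if extra.strip():
        parts.append(extra.strip("\n"))
    return "\n\n".join(parts) + "\n"
```

A request handler for /robots.txt could then return something like build_robots_txt(BASE_RULES, fetch_page_text("https://wiki.example.org/index.php", "MediaWiki:Robots.txt")), ideally with some caching so that every crawler hit does not trigger a wiki request.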