Manual talk:Robots.txt

Excluding a namespace
I guess it would be pretty easy to exclude a namespace, right? E.g., if you wanted to establish an unsearchable trash namespace: Leucosticte (talk) 11:52, 19 October 2013 (UTC)
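As a rough illustration of the idea, here is a minimal sketch that checks a namespace-wide rule with Python's standard `urllib.robotparser`. The `/wiki/` article path, the `Trash` namespace name, the page titles, and the `ExampleBot` user agent are all assumptions made up for this example, not taken from any real configuration. Because robots.txt rules match by path prefix, a single `Disallow` line ending in the namespace colon should cover every title in that namespace under that path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: assumes pages are served under /wiki/ and that the
# namespace to hide is called "Trash". Adjust both for a real wiki.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Trash:
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, user_agent="ExampleBot"):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Print the verdict for one page inside and one outside the namespace.
    for path in ("/wiki/Trash:Old_draft", "/wiki/Main_Page"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Note that this rule only covers the `/wiki/` form of the URL; the same pages reached through `index.php?title=Trash:...` would need a separate rule. Also, robots.txt only asks well-behaved crawlers to stay away, so it is a request rather than access control.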

/api.php
Mention if /api.php should be disallowed too. Jidanni (talk) 17:41, 6 December 2017 (UTC)

robots.txt + site root level Short URL
For wikis with site-root-level short URLs, another user suggests:

User-agent: *
Disallow: /index.php

I like the suggestion above. My wiki has short URLs implemented, so to robots all the page links look like example.com/Some_page, while all the action links look like example.com/index.php?title=Page_name&action=edit. So robots.txt can use Disallow: /index.php for User-agent: *, and Google's spiders etc. will not crawl the action pages but will still crawl all the normal page links, because my Short URL settings in LocalSettings.php make those look like example.com/Some_page. Am I correct? --Rogerhc (talk) 05:41, 11 January 2019 (UTC)
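One way to sanity-check this reasoning is a small sketch using Python's standard `urllib.robotparser`, which applies the same prefix matching that the question relies on. The paths are the ones from the comment above; the `ExampleBot` user agent and the `can_crawl` helper are hypothetical names chosen for this illustration, and real crawlers may differ in details from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set suggested above.
RULES = [
    "User-agent: *",
    "Disallow: /index.php",
]


def can_crawl(path, user_agent="ExampleBot"):
    """Return True if the rules above let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A short-URL page link and an index.php action link.
    for path in ("/Some_page", "/index.php?title=Page_name&action=edit"):
        print(path, "->", "allowed" if can_crawl(path) else "blocked")
```

Under these assumptions, the short-URL page is allowed and the `index.php` action link is blocked, which matches the expectation in the question.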

MediaWiki:Robots.txt
Wikipedia apparently is able to customize their robots.txt via w:MediaWiki:Robots.txt; on this wiki, MediaWiki:Robots.txt does not exist, but is clearly functional in the same way, if the default content there is any indication. However, this does not seem to be a default of the MediaWiki software - checking a handful of third-party wikis, their MediaWiki:Robots.txt pages do not have this default content. Looking around, there's no obvious documentation on how to set this up, and nothing jumps out on Special:Version either, so how is this done? Can someone add some documentation on this page? 「 ディノ 奴 千？！ 」☎ Dinoguy1000 00:11, 30 September 2021 (UTC)


 * Old thread, but it is really interesting as a way to avoid having an  discoverable in the directory structure. I will research this, @Dinoguy1000; if I discover something, I will try to update the documentation. Thank you for pointing out this useful resource! Ivanhercaz (talk) 09:52, 15 September 2023 (UTC)