Topic on Project:Support desk

Robots.txt to prevent crawling Special pages

Summary by Jonathan3

Disallow: /Special:*

Jonathan3 (talkcontribs)

My article pages are at www.example.com/Page_name

Special pages are at www.example.com/Special:Page_name, e.g. www.example.com/Special:Drilldown

I've seen Manual:Robots.txt#With_short_URLs but I'm not sure the example there is exactly what I need. Can anyone help?

In fact, it would probably be good to prevent crawling of all non-article pages. Some of my Category pages are quite resource-intensive, as they run DPL and/or Cargo queries.
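As far as I can tell, the example there assumes the usual short-URL layout where articles live under /wiki/ and the scripts under /w/, so a single prefix rule roughly like this covers everything that isn't an article:

User-agent: *
Disallow: /w/

On my wiki the Special pages and index.php sit at the root alongside the articles, so that doesn't carry over directly.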

Bawolff (talkcontribs)

Major search engines support * wildcards in robots.txt files, I believe, which can be used for this purpose. I think Wikipedia does so.
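Something like this ought to do it (untested sketch for the root-level URLs you describe):

User-agent: *
# * matches any run of characters, so one rule covers
# /Special:Drilldown, /Special:Search, and every other special page
Disallow: /Special:*

Strictly speaking the trailing * is redundant, since robots.txt rules already match by path prefix, but it doesn't hurt with the engines that understand wildcards.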

Jonathan3 (talkcontribs)

Thanks. Does this look OK?

User-agent: *
Disallow: /Special:*
Disallow: /index.php*
Jonathan3 (talkcontribs)