Talk:Requests for comment/Clean up URLs

Wary; some notes
I'm naturally wary of such changes. :) A few notes:


 * There is "no conflict" between the "robots.txt" file and the "Robots.txt" article only because of the historical accident that, by default, we force the first letter of a page name to be capitalized. Please do not rely on this being true in the future; we may well "fix" that one day, and we'd then possibly want to rename those files.
 * If we do some sort of massive URL rearrangement, it could break third-party users of our HTML output (including HTML output parsed by the API). For instance, I know this would break handling of article-to-article links in the current Wikipedia mobile apps: they would no longer recognize the URLs as article pages, and would probably load them in an external browser instead. At the least, this would require careful planning and coordination.
 * If we rearrange URLs, we'll probably see a fun... shift... in search engine rankings, etc. It might be disruptive.
 * Regarding index.php: the primary problem with simply changing everything over to /Article_URL?with=a&query=string is that our robots.txt would no longer be able to exclude those links from spidering. Using a separate prefix means we can very easily exclude all of our standard-generated links with query strings, which have to be generated dynamically, and make sure that spiders don't squash our servers into dust.
 * Using action paths (e.g. /edit/Article_name) would provide a nice readable URL without breaking that exclusion. However, the existing support doesn't cover cases like old revisions ('?oldid=123'), which default to $wgScript.

-- brion (talk) 18:24, 16 September 2013 (UTC)
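To make the point about prefixes concrete, here is a minimal Python sketch. The `page_url` helper, the `/edit/` and `/history/` action paths and the `/w/index.php` script path are assumptions chosen to mimic the `$wgActionPaths` / `$wgScript` behaviour described above; this is not MediaWiki's actual URL-building code. Python's standard `urllib.robotparser`, which matches rules by plain path prefix, stands in for a crawler.

```python
from urllib.parse import quote, urlencode
from urllib.robotparser import RobotFileParser

# Hypothetical layout: articles at the site root, a few action paths,
# and everything else routed through the script path (like $wgScript).
SCRIPT_PATH = "/w/index.php"
ACTION_PATHS = {"edit": "/edit/$1", "history": "/history/$1"}


def page_url(title, **query):
    """Build a URL for a page title (hypothetical helper, not MediaWiki's API)."""
    db_key = title.replace(" ", "_")
    encoded = quote(db_key)
    if not query:
        # Plain page view: clean URL at the root.
        return "/" + encoded
    if set(query) == {"action"} and query["action"] in ACTION_PATHS:
        # A bare action with a configured path gets a readable URL.
        return ACTION_PATHS[query["action"]].replace("$1", encoded)
    # Anything else (e.g. oldid=123) falls back to the script path.
    return SCRIPT_PATH + "?" + urlencode({"title": db_key, **query})


# Prefix-only rules: one line per prefix we want spiders to skip.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /w/",
    "Disallow: /edit/",
    "Disallow: /history/",
])

for url in (page_url("Main Page"),
            page_url("Main Page", action="edit"),
            page_url("Main Page", oldid=123),
            "/Main_Page?oldid=123"):
    print(url, robots.can_fetch("ExampleBot", url))
```

The last URL shows the problem: once query strings hang directly off clean article URLs, a prefix rule cannot block `/Main_Page?oldid=123` without also blocking `/Main_Page`, so the crawler is allowed to fetch it.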


 * I personally think that forcing the two articles about robots.txt and favicon.ico to be capitalized is an acceptable trade-off. This does not prevent us from using lower-case titles in general (which we already support).
 * I agree that we'd have to coordinate with third-party users. Some of the users of the PHP parser's HTML are also preparing to use Parsoid output, which uses relative URLs everywhere. One of these users is Google. Since we are in contact with the Google folks and can contact other search engines too, we can probably avoid issues with ranking changes.
 * Re robots.txt: At least Google, MSN, Slurp (Yahoo) and Yandex support globbing. I have been using this with success for many years, and sites like Quora do the same. -- Gabriel Wicke (GWicke) (talk) 19:08, 16 September 2013 (UTC)
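The wildcard syntax mentioned above (`*` for any sequence of characters, a trailing `$` to anchor the end of the URL) was later standardized in RFC 9309. The sketch below is a small hand-rolled matcher, not any crawler's actual code, and the `RULES` list is a hypothetical robots.txt for a layout with articles at the root. It follows RFC 9309's precedence: the longest matching pattern wins, and Allow wins a tie.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern using * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with ".*" where each * was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules: list of (directive, pattern), directive being 'allow' or 'disallow'.

    The longest matching pattern wins; on equal length, 'allow' wins.
    A path matched by no rule is allowed.
    """
    best = None  # (pattern length, is_allow) of the best match so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start, giving prefix semantics.
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            # Tuple comparison: longer wins; on a tie True (allow) > False.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Hypothetical rules for articles served at the site root.
RULES = [
    ("disallow", "/*?"),      # any URL carrying a query string
    ("disallow", "/*.php$"),  # bare script URLs such as /w/index.php
]

for path in ("/Main_Page", "/Main_Page?oldid=123", "/w/index.php"):
    print(path, is_allowed(RULES, path))
```

With these two rules, clean article URLs stay crawlable while anything carrying a query string is excluded, without needing a separate prefix for dynamically generated links.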