Manual:Search engine optimization


MediaWiki search engine optimization (SEO) techniques attempt to affect the visibility of a wiki or its pages in a search engine's "natural" or unpaid ("organic") search results. Short URLs can be used for this purpose, as can search engine optimization extensions.

Robots

External search engine robots, such as ht://Dig, can be configured to search and spider MediaWiki-based sites efficiently.

Whole pages

Use a robots.txt file to tell spiders which pages to index and which to ignore. Take a look at Wikipedia's own robots.txt.
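For example, a minimal robots.txt might look like the following. This is only a sketch: it assumes the common layout where script URLs live under /w/ and short article URLs under /wiki/, so adjust the paths to match your own configuration.

  User-agent: *
  # Let crawlers fetch the CSS/JS loader so pages render correctly
  Allow: /w/load.php?
  # Keep crawlers out of index.php-based edit, history and diff URLs
  Disallow: /w/
  # Short article URLs under /wiki/ are not disallowed, so they stay crawlable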

You probably want spiders not to follow dynamic index.php pages, only the basic /wiki/Article content pages. MediaWiki already outputs <meta name="robots" content="noindex,nofollow" /> in the HTML of "Edit this page" and "History" pages.
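Beyond those defaults, the robots meta tag can be tuned per namespace from LocalSettings.php. Here is a minimal sketch using the standard $wgDefaultRobotPolicy and $wgNamespaceRobotPolicies settings; the namespaces chosen are only examples.

  # Keep the default for ordinary content pages
  $wgDefaultRobotPolicy = 'index,follow';
  # Ask search engines to skip talk pages entirely
  $wgNamespaceRobotPolicies = array(
      NS_TALK      => 'noindex,nofollow',
      NS_USER_TALK => 'noindex,nofollow',
  );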

Specific sections of pages

You don't want search engines to index the boilerplate that appears on every page, such as the navigation sidebar and footer. Otherwise, searching for "privacy" or "navigation" will return every single page.

The old way to do this was to put <NOSPIDER>...</NOSPIDER> around such HTML sections. This is invalid XHTML unless you declare a namespace for it, and it is unclear whether search engines that still look for nospider will handle e.g. <i:NOSPIDER>. Google instead uses comments for the same purpose: <!--googleoff: index--> ... <!--googleon: index-->.

A custom skin could output these tags around boilerplate, as in the sketch below. Googling the English Wikipedia for 'privacy' returns 2,920,000 pages! (See also bugzilla:5707, which was declined.)
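As a rough sketch, a skin could emit the sidebar wrapped in Google's comment markers like this; the surrounding markup is illustrative, not the actual output of any bundled skin.

  <!--googleoff: index-->
  <div id="p-navigation" class="portlet">
    <!-- sidebar links, toolbox, footer notices -->
  </div>
  <!--googleon: index-->

Only crawlers that understand the googleoff/googleon convention will honour it; others will index the wrapped markup as usual.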

External links