Manual talk:GenerateSitemap.php/Archive

Google webmaster tools
Google Webmaster Tools wants the full URL of each sitemap .gz file to be listed in the index file. This doesn't happen even when I pass the --server parameter to the script, so this code has to be modified:

function indexEntry( $filename ) {
	return
		"\t<sitemap>\n" .
		"\t\t<loc>$filename</loc>\n" .
		"\t\t<lastmod>{$this->timestamp}</lastmod>\n" .
		"\t</sitemap>\n";
}

Add your site URL before $filename. After that, Google will no longer complain about an invalid URL in the sitemap index file. I hope the MediaWiki developers address this problem.

Note from BarkerJr: This is the error specified above: "We've detected that a Sitemap you've listed doesn't include the full URL." -BarkerJr 11:56, 15 August 2008 (UTC)


 * FYI, you should set the URL to the location where the sitemap is saved. I save mine in domain.com/sitemap, so my setup is:


 * -- Ipstenu 14:00, 19 September 2008 (UTC)
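
For anyone applying the fix discussed in this thread, here is a minimal standalone sketch of building an index entry with a full URL. It is not the code from generateSitemap.php: the function name buildIndexEntry, the $baseUrl parameter, and the example file name are assumptions made for illustration. $baseUrl stands for the public URL of the directory the sitemap files are saved to (for example a /sitemap subdirectory).

```php
<?php
// Hypothetical helper, NOT the MediaWiki method: builds one <sitemap> entry
// for a sitemap index file, with an absolute URL inside <loc>.
function buildIndexEntry(string $baseUrl, string $filename, string $timestamp): string
{
    // Join base URL and file name with exactly one slash between them.
    $loc = rtrim($baseUrl, '/') . '/' . ltrim($filename, '/');

    return "\t<sitemap>\n" .
        // Escape characters such as & so the XML stays well-formed.
        "\t\t<loc>" . htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</loc>\n" .
        "\t\t<lastmod>" . $timestamp . "</lastmod>\n" .
        "\t</sitemap>\n";
}

// Example values are illustrative only.
echo buildIndexEntry(
    'http://wiki.mydomain.com/sitemap',
    'sitemap-foo_bar-NS_0-0.xml.gz',
    '2008-08-15T11:56:00Z'
);
```

Because the slashes are trimmed before joining, it does not matter whether the base URL is given with or without a trailing "/".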

Example of usage
I've installed MediaWiki on a separate subdomain, and have set up a cronjob to automatically update the sitemap every hour:

Run: crontab -e

Create a line that looks something like this: 0 * * * * /usr/local/bin/php /home/httpd/public_html/wiki/maintenance/generateSitemap.php wiki.mydomain.com --fspath /home/httpd/public_html/wiki/

Go to Google Webmasters, add your site (e.g. wiki.mydomain.com) and then add the sitemap (e.g. sitemap-index-foo_bar.xml)

On a local Windows box using XAMPP, the command would look something like this:

C:\xampp\php\php.exe c:\mediawiki-1.14.0\maintenance\generateSitemap.php wikisubdomain.mydomain.com --fspath "C:\server\www_public_dir\" --server "wikisubdomain.mydomain.com"

The parts of the command are:
 * 1) invocation of the PHP executable (the interpreter)
 * 2) (the first argument for the command) the PHP script to be executed (in this case generateSitemap.php)
 * 3) the fspath argument and its value, which tell the script the filesystem path where the sitemap needs to go (on the local machine)
 * 4) the server argument and its value, which tell the script what to use in place of "localhost" if the name cannot be resolved

Options
--help
 * show this message

--fspath=<path>
 * The file system path to save to, e.g. /tmp/sitemap/

--server=<server>
 * The protocol and server name to use in URLs, e.g. http://en.wikipedia.org. This is sometimes necessary because server name detection may fail in command line scripts.
 * You know you need this option when the hostname in the sitemap.xml files shows up as "localhost". Use the domain name only, without the protocol prefix (e.g. "http") and without a trailing slash ("/").

--compress=[yes|no]
 * compress the sitemap files, default yes

--Subfader 11:53, 19 March 2008 (UTC)

Meta tags and priority
I use an extension to change meta tags (keywords, description, priority, and robots for follow and index). Is it possible to force the priority and index using those tags?--Eloy 00:39, 18 June 2008 (UTC)

Priorities
It'd be nice if there were finer control over the priorities it chooses. I'd like newer and more popular articles to have higher priorities. Right now all the priorities do is make regular articles get checked more often than talk and user pages, etc. Also, Google now complains if a sitemap has the same priority for every page. -Nais 21:10, 2 July 2008 (UTC)
 * I agree! AFAIK when all the pages in sitemaps have the same priority the site itself won't be ranked as high as it should be (as it's more difficult for googlebots to judge what's important and what's not). Is there any kind of solution to rank pages with most edits and newest pages higher than older ones? --83.145.207.200 15:35, 11 November 2008 (UTC)
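
As a rough illustration of what such fine-tuning could look like, the sketch below combines edit count and recency into a priority between 0.1 and 1.0 (the sitemap protocol allows values from 0.0 to 1.0). Nothing here exists in generateSitemap.php: the function name, the 60/40 weighting, and the one-year decay are arbitrary assumptions, and the inputs would have to come from your own queries.

```php
<?php
// Hypothetical heuristic, not part of MediaWiki: maps a page's edit count
// and the age of its last edit to a sitemap priority.
function sketchPriority(int $editCount, int $ageDays, int $maxEdits): float
{
    // Edit count relative to the most-edited page, capped at 1.0.
    $editScore = $maxEdits > 0 ? min(1.0, $editCount / $maxEdits) : 0.0;

    // Recency: 1.0 for a page edited today, falling linearly to 0 after a year.
    $recency = max(0.0, 1.0 - $ageDays / 365);

    // Assumed weights: 60% edit activity, 40% recency.
    $raw = 0.6 * $editScore + 0.4 * $recency;

    // Clamp to 0.1..1.0 and round to one decimal place.
    return round(max(0.1, min(1.0, $raw)), 1);
}

// Illustrative call: a page with 50 of at most 100 edits, last edited 30 days ago.
echo sketchPriority(50, 30, 100), "\n";
```

Since priorities only matter relative to other pages on the same site, any scheme of this kind should produce a spread of values rather than one value for every page.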

How I fixed sitemap
So here is how I fixed it - http://forum.appropedia.org/blog/finally-working-mediawiki-sitemap. This is based on the fix from OLPC - http://wiki.laptop.org/go/SEO_for_the_OLPC_wiki/sitemapgen.

Good luck, --LRG 03:49, 24 August 2008 (UTC)

Patch for enabled Server not in trunk?
I tried using the generateSitemap.php script with Google Webmaster Tools, and Google came back with the following error:

We've detected that a Sitemap you've listed doesn't include the full URL

I used the comment "Patch for enabled Server" above to patch generateSitemap.php, except that I added a '/' between the server and filename variables:

It seems like this change would be useful in the main branch, but this code only works if you put the sitemap in the root directory of the server, since there is no way to tell the script which URL corresponds to the fspath parameter. --Cnovak 21:26, 15 January 2009 (UTC)


 * I made a version with an smpath (sitemap path) parameter so that this can be changed. I'll post it here in the next few days. --DaSch 00:07, 16 January 2009 (UTC)
 * By the way, I submitted this patch to the MediaWiki Bugzilla, but they didn't care about it. That's why it's not in trunk. If you put a / after your server name when starting the script, you don't have to change this. --DaSch 00:09, 16 January 2009 (UTC)

What's the URL, then??
With the "Extension:Google Sitemap" the URL I have to give Google for my sitemap is "http://www.pop-cult.net/Wikitainment/sitemap.xml", but what if I want to use the default sitemap that my MediaWiki has?

Btw, is there a way for any of those sitemaps to list more than 500 articles?

I also hate that the "Extension:Google Sitemap" lists categories and user pages; I only want it to list the regular articles.--187.147.10.114 01:33, 31 March 2009 (UTC)

Bug: Redirect pages are listed
Moved articles (now redirects) continue to be listed in the generated sitemaps, which causes duplicate-content problems for SEO.

It's more of a problem when using the headers extension, which turns redirect pages into 301 redirects to the new location. Google shows a "warning" message when it finds this, because it wants you to list only the destination page.

It is pointless to list redirect pages. Please modify the script to skip entries that are redirects.

203.184.10.37 06:18, 7 June 2009 (UTC)
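
A minimal sketch of the requested filtering, assuming the page rows have already been fetched as arrays that include the page_is_redirect column of MediaWiki's page table. The function name skipRedirects and the row layout are hypothetical; in the script itself the same effect would more likely be achieved by adding a condition on that column to its database query.

```php
<?php
// Hypothetical helper, not part of generateSitemap.php: drops every row
// whose page_is_redirect flag is set, so only real pages reach the sitemap.
function skipRedirects(array $rows): array
{
    $kept = array_filter($rows, function ($row) {
        // empty() treats 0, '0', and a missing key as "not a redirect".
        return empty($row['page_is_redirect']);
    });

    // Reindex so the result is a plain 0-based list.
    return array_values($kept);
}

// Illustrative data only.
$rows = array(
    array('page_title' => 'Main_Page', 'page_is_redirect' => 0),
    array('page_title' => 'Old_Name', 'page_is_redirect' => 1),
);

foreach (skipRedirects($rows) as $row) {
    echo $row['page_title'], "\n";
}
```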