Manual talk:Pywikibot/Use on third-party wikis

Working LDAP config
I fought with this for a while, you need to set self.ldapDomain to equal your domain
    # -*- coding: utf-8 -*-

    import family

    # The official ABC Wiki.

    class Family(family.Family):

        def __init__(self):
            family.Family.__init__(self)
            self.name = 'WikiNational'
            self.langs = {
                'en': 'WikiNational',
            }
            self.namespaces[4] = {
                '_default': u'WikiNational',
            }
            self.namespaces[5] = {
                '_default': u'WikiNational Talk',
            }
            self.ldapDomain = 'yourdomain.local'

        def version(self, code):
            return "1.6.1"

        def path(self, code):
            return '/wiki/index.php'

--Mellerbeck 21:43, 21 May 2009 (UTC)

Problem
I'm not sure whether this is down to a setting somewhere, but I'm not able to submit the form correctly.

The response I get from response = conn.getresponse() is the entire web page. Is that correct?

whats wrong
It won't work for me, for my project at www.pflegewiki.de.

I created user-config.py including:

    mylang = 'de'
    username = 'ElektroZivi'
    family = 'pflegewiki'

and a "pflegewiki_family.py" including:

    # -*- coding: utf-8 -*-

    import family

    # The meta family

    class Family(family.Family):
        name = 'PflegeWiki'

        def __init__(self):
            self._addlang('de',
                          location = 'www.pflegewiki.de',
                          namespaces = { 4: u'PflegeWiki',
                                         5: u'Diskussion' })

If I run login.py from the command line (python login.py), it returns the message: Login failed. Wrong password?

But the password and username are correct (I checked this)!

What could help? -Produnis 15:35, 2 Mar 2005 (UTC)

answer
Modify your "pflegewiki_family.py" like this:

    # -*- coding: utf-8 -*-

    import family

    # The meta family

    class Family(family.Family):
        name = 'PflegeWiki'

        def __init__(self):
            self._addlang('de',
                          location = 'www.pflegewiki.de',
                          namespaces = { 4: u'PflegeWiki',
                                         5: u'Diskussion' })

        def path(self, code):
            return '/index.php'

It should work then.

wont work for me neither
The script claims that my password is wrong, but when I use the URL (login and password) from the config in a browser, everything works. The family file looks like this:

    # -*- coding: utf-8 -*-

    import family

    class Family(family.Family):
        name = 'fridewiki'  # Set the family name; this should be the same as in the filename.

        def __init__(self):
            family.Family.__init__(self)
            self.langs = {
                'de': 'vierzig4.dyndns.org',
            }

        def version(self, code):
            return "1.4.2"  # The MediaWiki version used. Not very important in most cases.

        def path(self, code):
            return '/mediawiki/index.php'  # The path of index.php

Did I forget something?
 * I used to have the same problem. Later I solved it. It's caused by a wrong path(self, code) setting. Make sure http://yousitename.com/mediawiki/index.php points to the real index.php file. --Farm 11:01, 14 February 2006 (UTC)
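The path advice above can be checked by hand before running login.py. The following is a minimal sketch, assuming the bot reaches the wiki at the hostname given in self.langs followed by the string returned by path(). The helper name index_url is hypothetical and not part of pywikipedia, and the sketch is written for a current Python 3 interpreter.

```python
def index_url(hostname, script_path, scheme="http"):
    """Join the hostname from self.langs with the value returned by path().

    Hypothetical helper for manual checking; not part of pywikipedia.
    """
    if "/" in hostname:
        # The hostname entry should hold only the domain, never a folder.
        raise ValueError("hostname must not contain a path: %r" % hostname)
    if not script_path.startswith("/"):
        script_path = "/" + script_path
    return "%s://%s%s" % (scheme, hostname, script_path)


# Values taken from the family file quoted above.
print(index_url("vierzig4.dyndns.org", "/mediawiki/index.php"))
```

Opening the printed address in a browser should show the wiki's main page; if it does not, the path() value is the first thing to correct.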

doesn't work
I just downloaded the latest CVS snapshot and it just doesn't work. I've got the user configuration file there and the family file in the subdirectory, and it just throws this when I try python login.py:

    Checked for running processes. 1 processes currently running, including the current process.
    Traceback (most recent call last):
      File "login.py", line 218, in ?
        main()
      File "login.py", line 213, in main
        loginMan = LoginManager(password, sysop = sysop)
      File "login.py", line 79, in __init__
        raise wikipedia.NoUsername(u'ERROR: Username for %s:%s is undefined.\nIf you have an account for that site, please add such a line to user-config.py:\n\nusernames[\'%s\'][\'%s\'] = \'myUsername\'' % (self.site.family.name, self.site.lang, self.site.family.name, self.site.lang))
    wikipedia.NoUsername: ERROR: Username for None:en is undefined.
    If you have an account for that site, please add such a line to user-config.py:

    usernames['None']['en'] = 'myUsername'

What's annoying is that my installation was working, but now it throws errors about character sets, so I downloaded the latest files to see if that fixed it.


 * Probably not everybody else is as stupid as I am, but check that all your cases are consistent - i.e. if you've used lowercase in the filename, use lowercase when you refer to the family. Also, check what case your login uses. Just a tip that would have saved me about an hour... :$
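The case-consistency tip can be turned into a quick check. This is only a sketch under the assumptions stated elsewhere on this page: the family file is named "<family>_family.py", and both its name attribute and the family setting in user-config.py use the same spelling. The function name check_family_names is hypothetical.

```python
import os


def check_family_names(family_file, name_attribute, config_family):
    """Return a list of mismatch descriptions (empty when all names agree).

    family_file    -- path such as 'families/abc_family.py'
    name_attribute -- the value assigned to name / self.name in that file
    config_family  -- the value of family in user-config.py
    """
    problems = []
    suffix = "_family.py"
    base = os.path.basename(family_file)
    if base.endswith(suffix):
        candidates = [("file name", base[:-len(suffix)])]
    else:
        problems.append("file name %r does not end in %r" % (base, suffix))
        candidates = []
    candidates.append(("name attribute", name_attribute))

    for label, value in candidates:
        if value != config_family:
            hint = ""
            if value.lower() == config_family.lower():
                hint = " (differs only in case)"
            problems.append("%s %r != family %r%s"
                            % (label, value, config_family, hint))
    return problems


# The PflegeWiki example from earlier on this page mixes cases:
for problem in check_family_names("families/pflegewiki_family.py",
                                  "PflegeWiki", "pflegewiki"):
    print(problem)
```

An empty result means the three spellings agree; it does not prove that the rest of the family file is correct.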

Or Me
I created the user-config.py with the 3 lines in it needed to work with a non-Wiki project. I created the family file and tried to log in, and I get:

    Please create a file user-config.py, and put in there:

    One line saying "mylang='language'"
    One line saying "usernames['wikipedia']['language']='yy'"

    ...filling in your username and the language code of the wiki you want to work

If I take the family line out of the user-config.py file then I get a password error.

this is using snapshot-20051221 from Sourceforge

user-config.py Not Found
I am having the same problem. Please post solution as soon as anyone finds it. I am using snapshot 20060312. ---20:01, 12 June 2006 (UTC)

Answer
I had the same trouble. Finally found the answer: Make sure you put the statement  AFTER this statement:   -- Barrylb 22:45, 17 July 2006 (UTC)

XML dumps
It seems that for some functions of pywikipedia it is recommended that I have a recent XML dump of my database. How vital is that, and, assuming it's critical, how do I obtain an XML dump of my database? Thanks. Edit: Never mind, I think Help:Export answered my question. JoeyDay 05:49, 16 September 2005 (UTC) - edited 07:13, 16 September 2005 (UTC)

Using Pywikipediabot with $wgCapitalLinks = false
Pywikipediabot is having trouble retrieving files that start with lower case letters on the Homestar Runner Wiki because we have our $wgCapitalLinks flag set to false. I realize I may be the first to encounter this issue and don't want to seem needy, but I'm curious as to whether developers are aware of this and/or if there are any plans to address it in future versions of the framework. Thanks. JoeyDay 00:23, 18 September 2005 (UTC)
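To illustrate the issue: with $wgCapitalLinks enabled (the MediaWiki default), the first letter of a page title is forced to upper case, while with the flag set to false the title keeps its original case. The sketch below is a simplified model of that rule only. It ignores namespaces and other normalization steps, and it assumes (this is not confirmed anywhere on this page) that the bot applied the upper-casing unconditionally. The function name normalize_title is hypothetical.

```python
def normalize_title(title, capital_links=True):
    """Simplified title normalization (no namespace handling).

    capital_links mirrors the wiki's $wgCapitalLinks setting.
    """
    title = title.replace("_", " ").strip()
    if capital_links and title:
        # Only the first character is affected; the rest keeps its case.
        title = title[0].upper() + title[1:]
    return title


# On a default wiki both spellings address the same page ...
print(normalize_title("homestar.png"))
# ... but with $wgCapitalLinks = false they are distinct titles.
print(normalize_title("homestar.png", capital_links=False))
```

A client that always uses the first form would ask a $wgCapitalLinks = false wiki for a page that does not exist under that name.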

Uploaded images blank or corrupt
I'm having trouble using upload.py to upload images correctly to our MediaWiki 1.4-based site. I've been able to log in to the server with the bot account and run the upload script, and the script reports a correct upload. However, when I look at the images on the site, they have the correct dimensions, but are either: a) blank, or b) blank with a few lines of corrupted pixels at the top of the image.

Here's some of the things I've tried:
 * Uploading with the standard web browser user interface using the bot account (works fine).
 * Tried using my standard administrator user account with the script (same bad result).
 * Tried using other image formats; .png, .gif, .jpg (same result).
 * Tried different image files (same result).
 * Moved image location to the same location as the script on disk (same result).

I don't think it's a permissions problem, since I'm able to upload the files manually using the same account, though I admit I don't know much about MediaWiki permissions. Any ideas?

Pagefromfile.py
I've tried to load a file into my own MediaWiki, but I get the messages 'DBG> page may be locked?' and 'Page already exists, not adding!' However, the page doesn't exist in my MediaWiki. What's wrong? This is the text file: Start title

subtitle1
text

subtitle2
End BB70 11:20, 25 january 2006 (CET)


 * This only happens with a MediaWiki installed on localhost; when I use the MediaWiki on my website, it works! --81.58.24.250 05:00, 28 April 2006 (UTC)

Transfering pages from wikipedia to my mediawiki project
I must copy a set of it.wiki pages to my custom mediawiki project.

Is there a bot that could do this job, or what bot should I work on?

Any hint is appreciated!!!

196.3.84.214 19:38, 2 April 2006 (UTC)

Compatibility between Mediawiki 1.6.1 and Pywikipedia "snapshot-20060312" ?
Hello,

I'm having problems logging in using login.py on my non-Wikimedia project, so I wonder if there are any incompatibilities between MediaWiki 1.6.1 and Pywikipedia "snapshot-20060312"?

I think so because of the different POST URLs generated by the normal login method and by login.py.

In /var/log/apache2/access.log :

If I try to log using the login.py script, it fails:

127.0.0.1 - - [03/May/2006:10:50:08 +0200] "POST /index.php?title=Special:Userlogin&action=submit HTTP/1.1" 200 13696 "-" "RobHooftWikiRobot/1.0"

If I try to log using my browser, it succeeds:

172.21.161.66 - - [03/May/2006:10:51:26 +0200] "POST /index.php?title=Special:Userlogin&action=submitlogin&type=login&returnto=Main_Page HTTP/1.1" 200 2025 "http://victoria/index.php?title=Special:Userlogin&returnto=Main_Page" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3"

If it fails: URL = Special:Userlogin&action=submit

If it succeeds: URL = Special:Userlogin&action=submitlogin&type=login
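The difference between the two requests can be made explicit by parsing the query strings from the access log entries quoted above. This is a small sketch using only the Python 3 standard library; the helper names query_params and diff_params are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(request_target):
    """Return the query parameters of a request target as a flat dict."""
    query = urlsplit(request_target).query
    return {key: values[0] for key, values in parse_qs(query).items()}


def diff_params(first, second):
    """Map each differing parameter to its (first, second) values."""
    keys = sorted(set(first) | set(second))
    return {key: (first.get(key), second.get(key))
            for key in keys if first.get(key) != second.get(key)}


# Request targets copied from the two access.log lines.
bot = query_params("/index.php?title=Special:Userlogin&action=submit")
browser = query_params("/index.php?title=Special:Userlogin"
                       "&action=submitlogin&type=login&returnto=Main_Page")

for name, (bot_value, browser_value) in diff_params(bot, browser).items():
    print("%-9s bot=%r browser=%r" % (name, bot_value, browser_value))
```

The comparison shows which parameters differ; it does not by itself prove that the missing ones are the reason the login is rejected.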

Here is the output when I try to log in :

    srvjavalive:/var/www/victoria/bin/pywikipedia# ./login.py
    Checked for running processes. 1 processes currently running, including the current process.
    Password for user releasebot on victoria:en:
    Logging in to victoria:en as releasebot
    Login failed. Wrong password?

Note that the wiki is physically located on the same server where I installed pywikipedia.

Following the doc http://meta.wikimedia.org/wiki/Pywikipedia_bot_on_non-wikimedia_projects, this is my config :

    srvjavalive:/var/www/victoria/bin/pywikipedia# cat user-config.py
    family = 'victoria'
    mylang = 'en'
    usernames['victoria']['en'] = 'releasebot'

    srvjavalive:/var/www/victoria/bin/pywikipedia# cat families/victoria_family.py
    # -*- coding: utf-8 -*-

    import family

    # The official Victoria Wiki.

    class Family(family.Family):

        def __init__(self):
            family.Family.__init__(self)
            self.name = 'victoria'
            self.langs = {
                'en': 'victoria',
            }
            self.namespaces[4] = {
                '_default': u'Victoria',
            }
            self.namespaces[5] = {
                '_default': u'Victoria talk',
            }

        def version(self, code):
            return "1.6.1"

        def path(self, code):
            return '/index.php'

Should I use the CVS version to solve the problem ? --Effco 10:41, 3 May 2006 (UTC)

Uncyclopedia sv
The Swedish version of Uncyclopedia is called Psyklopedin, not Psykelopedia. Won't you guys ever learn? – Smiddle (85.30.155.6) 16:46, 2 October 2006 (UTC)

Solution to a problem connecting to a wiki more than one directory under the URL domain
When trying to connect the pywikipedia bot to a wiki located at www.example.com/folder/folder/wiki/index.php, the failure displayed below occurred. Basically, it seemed the URL in the families file was not recognised.

In the families file I had specified:

    self.langs = { 'en': 'www.example.com/folder/folder/wiki/', }

The solution was to specify only the domain in self.langs, and the subfolders in the return value of "path":

    def path(self, code):
        return '/folder/folder/wiki/index.php'

In full:

    # -*- coding: utf-8 -*-

    import family

    # The Wikispectus Concept Test wiki

    class Family(family.Family):

        def __init__(self):
            family.Family.__init__(self)
            self.name = 'wikispectuspoc'
            self.langs = {
                'en': 'www.example.com',
            }
            self.namespaces[4] = {
                '_default': u'Smw',
            }
            self.namespaces[5] = {
                '_default': u'Smw talk',
            }

        def version(self, code):
            return "1.7.1"

        def path(self, code):
            return '/folder/folder/wiki/index.php'

The original failure:

    [rich@wuf bot2]$ python login.py
    Checked for running processes. 1 processes currently running, including the current process.
    Password for user RickBot on wikispectuspoc:en:
    Logging in to wikispectuspoc:en as RickBot
    Traceback (most recent call last):
      File "login.py", line 220, in ?
        main()
      File "login.py", line 216, in main
        loginMan.login()
      File "login.py", line 169, in login
        cookiedata = self.getCookie()
      File "login.py", line 120, in getCookie
        conn.request("POST", pagename, data, headers)
      File "/usr/local/lib/python2.4/httplib.py", line 804, in request
        self._send_request(method, url, body, headers)
      File "/usr/local/lib/python2.4/httplib.py", line 827, in _send_request
        self.endheaders()
      File "/usr/local/lib/python2.4/httplib.py", line 798, in endheaders
        self._send_output()
      File "/usr/local/lib/python2.4/httplib.py", line 679, in _send_output
        self.send(msg)
      File "/usr/local/lib/python2.4/httplib.py", line 646, in send
        self.connect()
      File "/usr/local/lib/python2.4/httplib.py", line 614, in connect
        socket.SOCK_STREAM):
    socket.gaierror: (4, 'Non-recoverable failure in name resolution')
    [rich@wuf bot2]$

Thanks to bin in #pywikipediabot for the answer! --mw:User:Rick 01:07, 8 October 2006 (UTC)
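For anyone unsure which part of their wiki address belongs where, the split described in this section can be sketched with the Python 3 standard library. The function name split_wiki_address is hypothetical; the first element of the result is what goes into self.langs, and the second is what path() should return.

```python
from urllib.parse import urlsplit


def split_wiki_address(address):
    """Split a wiki address into (hostname, script path).

    Accepts addresses with or without a leading 'http://'.
    """
    if "://" not in address:
        # urlsplit only recognises the host part when a scheme is present.
        address = "http://" + address
    parts = urlsplit(address)
    return parts.netloc, parts.path or "/"


host, script_path = split_wiki_address(
    "www.example.com/folder/folder/wiki/index.php")
print("self.langs entry:", host)
print("path() returns:  ", script_path)
```

Putting the folders into self.langs, as in the failing configuration, makes the whole string be treated as a host name, which matches the name resolution error in the traceback above.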

Not working for me 1.7.1
user-config.py

    family = 'gtrwiki'
    mylang = 'en'
    usernames['gtrwiki']['en'] = 'Toykilla-bot'

gtrwiki_family.py

    # -*- coding: utf-8 -*-

    import family

    class Family(family.Family):

        def __init__(self):
            family.Family.__init__(self)
            self.langs = {
                'en': 'www.gtr-tech.com',
            }

        name = 'gtrwiki'  # Set the family name; this should be the same as in the filename.

        def version(self, code):
            return "1.7.1"  # The MediaWiki version used. Not very important in most cases.

        def path(self, code):
            return '/w/index.php'  # The path of index.php

After running login.py I get this output:

    Checked for running processes. 1 processes currently running, including the current process.
    Traceback (most recent call last):
      File "login.py", line 220, in ?
        main()
      File "login.py", line 215, in main
        loginMan = LoginManager(password, sysop = sysop)
      File "login.py", line 79, in __init__
        raise wikipedia.NoUsername(u'ERROR: Username for %s:%s is undefined.\nIf you have an account for that site, please add such a line to user-config.py:\n\nusernames[\'%s\'][\'%s\'] = \'myUsername\'' % (self.site.family.name, self.site.lang, self.site.family.name, self.site.lang))
    wikipedia.NoUsername: ERROR: Username for None:en is undefined.
    If you have an account for that site, please add such a line to user-config.py:

    usernames['None']['en'] = 'myUsername'

Move to talk page
Note: some of the examples on this page have been shown not to work with recent pywikipediabot releases. Please see Talk:Pywikipedia_bot_on_non-Wikimedia_projects for further discussion. -- Tyagi 11:32, 11 April 2006 (UTC).
 * There is no mention of this on the talk page. Could someone in the know fix (or remove) the broken examples to avoid clogging up the mailing list....217.117.47.110 19:13, 12 June 2006 (UTC)

To the developer(s) - please have the URL query and POST information printed on a failure and/or provide more information as to help diagnose what may be mis-configured. If the bot never connects in attempt to enter the password, it would be nice to know. -- Adam Katz 23:09, 31 May 2005 (UTC)

Example
This is an example for the user-config.py and families/ file, which have been shown to work with newer releases of pywikipediabot.

user-config.py

    family = 'abc'
    mylang = 'en'
    usernames['abc']['en'] = 'MyNameBot'

families/abc_family.py

    # -*- coding: utf-8 -*-

    import family

    # The official ABC Wiki.

    class Family(family.Family):

        def __init__(self):
            family.Family.__init__(self)
            self.name = 'abc'
            self.langs = {
                'en': 'www.abc.com',
            }
            self.namespaces[4] = {
                '_default': u'ABC',
            }
            self.namespaces[5] = {
                '_default': u'ABC talk',
            }

        def version(self, code):
            return "1.6.1"

        def path(self, code):
            return '/wiki/index.php'

This example is confusing, I am not even sure what wiki this is about... Odessaukrain 22:03, 7 January 2007 (UTC)

Working with LDAP authentication?
Does anyone know if there's a way to get this bot working with LDAP Authentication? I had no problems running it before I went to LDAP, but now it doesn't seem to work, any ideas?

HotMonkeyAC 20:28, 13 June 2007 (UTC)


 * in login.py, there's a block of code that sets wpName, wpPassword, etc. To get LDAP working, add this

"wpDomain" : "MYDOMAIN",

This doesn't work
I did this, replacing MYDOMAIN as above with my LDAP domain name, and no change in behavior - it still fails, complaining of either a bad password or bad CAPTCHA data.

I got this to work with LDAP by putting self.domain = 'domain here' in wiki_family.py, where domain here is the LDAP domain. It did not work with self.name = 'domain here' as it says it should in the documentation README-family.txt -Funkmaster 801 18:45, 9 September 2008 (UTC)

This also does not work - when I added the self.domain = 'domain here' line, (substituting my actual domain, of course) it simply generates a syntax error. Can you post a family file that works so I can see what I might be doing wrong?


 * --Gene Turnbow 20:47, 10 September 2008 (UTC)

In Login.py replace line 121 which is {"wpDomain": self.site.family.ldapDomain,} with {"wpDomain": 'domain here',}
 * --AdrianArcher 14:34, 12 September 2008 (UTC)

With the embedded colon?? Not sure how that's supposed to map to a MediaWiki variable that way, but I'll try it.

Okay - this doesn't work, just tested it. I didn't expect it to, with the colon embedded in the key name like that. Removing the colon and trying again - okay, once I checked my path override in my family file, it finally worked. Now I'll test it with the "plain vanilla" login.py file from the latest version and see if that works - it may have been my family file the entire time.

My conclusion is that a lot of people are confusing the rendered path that MediaWiki generates with the raw URL you need to reach the login page itself. You have to get there without the use of the MediaWiki-massaged path, which will not succeed.

Also, the new login.py does not work. Hard-coding the LDAP domain as AdrianArcher identified, however, did work.


 * --Gene Turnbow 17:06, 12 September 2008 (UTC)

You're right, no colon, I've fixed it now. --AdrianArcher 15:34, 16 September 2008 (UTC)
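The LDAP fixes discussed in this thread all amount to sending one extra form field, wpDomain, next to wpName and wpPassword. The sketch below only shows how such a form body could be encoded. The real login.py sends further fields that are omitted here, and the function name build_login_form as well as the example values are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def build_login_form(username, password, ldap_domain=None):
    """Encode the login fields named in this thread as a POST body.

    Illustration only: the actual script posts additional fields.
    """
    fields = {
        "wpName": username,
        "wpPassword": password,
    }
    if ldap_domain:
        # The key is plain 'wpDomain', without any embedded colon.
        fields["wpDomain"] = ldap_domain
    return urlencode(fields)


body = build_login_form("MyNameBot", "not-a-real-password", "MYDOMAIN")
print(sorted(parse_qs(body)))
```

Whether the domain comes from a family attribute or is hard-coded, as suggested above, only changes where the value is read from, not what ends up in the request.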

Errors
    C:\pywikipedia>login.py
    Checked for running processes. 1 processes currently running, including the current process.
    Traceback (most recent call last):
      File "C:\pywikipedia\login.py", line 277, in <module>
        main()
      File "C:\pywikipedia\login.py", line 272, in main
        loginMan = LoginManager(password, sysop = sysop)
      File "C:\pywikipedia\login.py", line 97, in __init__
        raise wikipedia.NoUsername(u'ERROR: Username for %s:%s is undefined.\nIf you have an account for that site, please add such a line to user-config.py:\n\nusernames[\'%s\'][\'%s\'] = \'myUsername\'' % (self.site.family.name, self.site.lang, self.site.family.name, self.site.lang))
    wikipedia.NoUsername: ERROR: Username for wikipedia:en is undefined.
    If you have an account for that site, please add such a line to user-config.py:

    usernames['wikipedia']['en'] = 'myUsername'

Problem caused because I did not add the "family" line.
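Why a missing "family" line produces "Username for wikipedia:en is undefined" can be pictured with a toy model. This is a simplified sketch and not the actual pywikipedia code: it assumes that the family setting falls back to 'wikipedia' when it is absent, which is what the error message above suggests. The function name resolve_username is hypothetical.

```python
def resolve_username(config):
    """Look up the bot account for the configured family and language.

    config is a plain dict standing in for the contents of user-config.py.
    """
    family = config.get("family", "wikipedia")  # assumed default
    lang = config["mylang"]
    try:
        return config["usernames"][family][lang]
    except KeyError:
        raise LookupError("Username for %s:%s is undefined." % (family, lang))


config = {
    "mylang": "en",
    "usernames": {"odessa": {"en": "MyNameBot"}},
}

try:
    resolve_username(config)
except LookupError as error:
    print(error)  # the lookup used the fallback family

config["family"] = "odessa"
print(resolve_username(config))
```

In this model the username entry is present all along; it is simply never consulted until the family setting points at it.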

    C:\pywikipedia>login.py
    Traceback (most recent call last):
      File "C:\pywikipedia\login.py", line 49, in <module>
        import wikipedia, config
      File "C:\pywikipedia\wikipedia.py", line 4226, in <module>
        getSite()
      File "C:\pywikipedia\wikipedia.py", line 4134, in getSite
        _sites[key] = Site(code=code, fam=fam, user=user)
      File "C:\pywikipedia\wikipedia.py", line 3084, in __init__
        self.family = Family(fam, fatal = False)
      File "C:\pywikipedia\wikipedia.py", line 3062, in Family
        exec "import %s_family as myfamily" % fam
      File "<string>", line 1, in <module>
      File ".\families\odessa_family.py", line 3
        import family
        ^
    IndentationError: unexpected indent

Fixed:

Pywikipedia_bot_on_non-Wikimedia_projects

So that the two lines:

import family  (line 3) 

and

class Family(family.Family):

have no space before them.

Odessaukrain 21:13, 24 May 2008 (UTC)
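Because family files are often copied from a wiki page, stray leading spaces are easy to pick up. The following rough check lists import and class lines that start with whitespace, which is the condition behind the IndentationError above. It is a crude sketch: it would also flag an import that is deliberately placed inside a function. The function name find_indented_toplevel is hypothetical.

```python
def find_indented_toplevel(source):
    """Return (line number, line) pairs for indented import/class lines."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped != line and stripped.startswith(("import ", "class ")):
            hits.append((number, line))
    return hits


# A family file with one accidental space before "import family".
pasted = (
    "# -*- coding: utf-8 -*-\n"
    "\n"
    " import family\n"
    "\n"
    "class Family(family.Family):\n"
    "    pass\n"
)

for number, line in find_indented_toplevel(pasted):
    print("line %d starts with whitespace: %r" % (number, line))
```

To check a real file, read its contents with open(...).read() and pass the text to the function.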

Not Working
It is not working for my project on: http://mercs.wikia.com/Mercs Wiki

My user-config.py:

    mylang = 'en'
    username = 'patxBot'
    family = 'mercs'

My family file:

    # -*- coding: utf-8 -*-

    import family

    # The mercs wiki

    class Family(family.Family):
        name = 'mercswiki'

        def __init__(self):
            self._addlang('en',
                          location = 'http://mercs.wikia.com/wiki/Mercs_Wiki',
                          namespaces = { 4: u'mercsWiki',
                                         5: u'talk' })

        def path(self, code):
            return '/index.php'

Then, when I type the commands at the command prompt, this happens:

    C:\Documents and Settings\Heeman9@bellsouth.ne>cd C:\\pywikipedia

    C:\pywikipedia>welcome.py
    WARNING: Configuration variable 'username' is defined but unknown. Misspelled?
    Traceback (most recent call last):
      File "C:\pywikipedia\welcome.py", line 173, in <module>
        import wikipedia, config, string, locale
      File "C:\pywikipedia\wikipedia.py", line 5951, in <module>
        getSite()
      File "C:\pywikipedia\wikipedia.py", line 5841, in getSite
        persistent_http=persistent_http)
      File "C:\pywikipedia\wikipedia.py", line 4069, in __init__
        self.family = Family(fam, fatal = False)
      File "C:\pywikipedia\wikipedia.py", line 3893, in Family
        family = myfamily.Family()
      File "C:\pywikipedia\families\mercs_family.py", line 13, in __init__
        5: u'Mercs Wiki Talk' })
      File "C:\pywikipedia\family.py", line 2674, in _addlang
        self.langs[code] = location
    AttributeError: Family instance has no attribute 'langs'

    C:\pywikipedia>

What should I do????????

-- PATX 13:07, 9 August 2008 (UTC)


 * The name="blah" of the family file must match the family="blah" in the user-config.py file. Also, this same name has to be applied to the name of the family file, so it is "blah_family.py" (mercs, presumably, in this case).


 * Further, it looks like (hard to be sure once you've copied and pasted it into the webpage) that your indentation is kind of messed up.  should be , and it should be after  . Also, I don't know what   is, it might be a valid function, but it's not mentioned in the README file for families (which I recommend you read). Same thing with   - it's not in the readme,   is. It doesn't even look like the Mercs Wiki uses a script path...


 * I think, though I am not certain, that you would use this:

 
 * I made this using |namespaces|namespacealiases|statistics Mercs Wiki's API, and the  file in the Family folder. I'm somewhat new at this, but I think that will work.
 * DragoonWraith 22:24, 2 November 2008 (UTC)

Custom Namespaces
I added a bunch of information that I could not find anywhere when I was trying to set up my installation of PyWikipedia to work on a Wiki with custom namespaces (much thanks to the people on #pywikipediabot for helping me out with this in the first place). It's a bit wordy and some of it probably ought to go elsewhere, but I didn't know where and I figured it'd be easier for people to move what I wrote to someplace appropriate. If you have any questions/comments on the text, let me know. --DragoonWraith 22:09, 2 November 2008 (UTC)

I don't understand anything
I can manage sections 1.1 and 1.2, but after that I no longer understand anything. Are there other files that need to be modified in order to use a bot? Thanks

Re: Custom User Groups & Permissions
As a suggestion, you can try to add:

sysopnames['wikipedia']['en'] = 'adminname'

to, where ['wikipedia'] is the name of the family you are working on, and 'adminname' is the username (as seen on Pywikipediabot/delete.py).

I've tried this and it works; it solved my problems with "redirect.py broken".

I haven't a user on this wiki but if this is ok please add it to the main page.

No JSON object could be decoded
I'm having this error while trying to login.py in a 1.17alpha wiki:

Error downloading data: No JSON object could be decoded Request en:/wiki/api.php? Retrying in 1 minutes...

I've checked the user, password and script path, and everything is all right. The strange thing is that in Apache's access.log I've seen this:

"POST /wiki/api.php HTTP/1.1" 200 3930 "-" "PythonWikipediaBot/1.0"

So it is trying to connect, but it doesn't work.

Does anyone know what is happening?? What other things can I check?

Login failed. Wrong password or CAPTCHA answer?
I have not used the bot in about 18 months. I returned to it yesterday, and having got as far as the login.py prompt, everything seemed good. It asks me for the password for the relevant user. I supply it. It tells me it is logging in to the wiki as the relevant user name. It then says "Opening CAPTCHA in your web browser...". It asks me for the solution. I supply it. It returns the message: "Login failed. Wrong password or CAPTCHA answer?". I cannot remember having this problem 18 months ago, and nothing has changed in my set-up files. I went for the obvious, i.e. wrong passwords. I checked five times and ensured caps lock was off; I changed my password; I then switched to an alternate user name. Every time, the same message. Can anyone help? Is there something obvious I am missing? 88.108.210.61 09:56, 5 May 2010 (UTC)
 * Summary: Download http://svn.wikimedia.org/viewvc/pywikipedia?revision=8071&view=revision
 * My bot had the same problem since 4 April. It turns out there has been a modification since 7 April. Related links:, , . Hope it helps. Cheers. Bennylin 15:09, 31 May 2010 (UTC)


 * I have the same problem with my bot, downloaded up to date versions and the login.py , still getting Login failed. Wrong password or CAPTCHA answer. I would be very grateful for any help.--Ghaly 21:59, 2 June 2010 (UTC)


 * Thanks. it is all sorted now, I updated software using SVN update , then downloaded http://svn.wikimedia.org/viewvc/pywikipedia?revision=8071&view=revision and now it is working again. Best wishes.--Ghaly 22:01, 7 June 2010 (UTC)

Error downloading data: No JSON object could be decoded [SOLVED]
Message returned on the command line from python login.py: Error downloading data: No JSON object could be decoded Request en:/wikiscriptpath/api.php?

From the dumpfile:

Error reported: No JSON object could be decoded 127.0.0.1 /wikiscriptpath/api.php?

{"login":{"result":"NeedToken","token":"f8ff543dea8a19c853b64d714eb580e8"}}

I see this issue from time to time and it is EXTREMELY EXTREMELY EXTREMELY frustrating. It happened to me in March, then I downloaded a new pywikipedia snapshot, and it magically worked.

I am running MediaWiki 1.15.3 on Mac OS 10.6 with Python 2.6.1, and pywikipediabot works. I am using short URLs, but via httpd.conf aliases. I have not "blacklisted" api.php on that setup.

When I use an IDENTICAL setup on Ubuntu 9.04, but with Python 2.6.2, I get this ... error (sorry, I am straining to hold back the profanity).

My access logs:

127.0.0.1 - - [29/May/2010:08:09:34 -0400] "POST /correctwikiscriptpath/api.php HTTP/1.1" 200 95 "-" "PythonWikipediaBot/1.0"

The annoying bit: It was working just 3 or 4 days ago before I upgraded my Ubuntu python from 2.6.1 to 2.6.2. I have made NO other changes to my system.

Any suggestions? --AttemptedUser 12:18, 29 May 2010 (UTC)

Still having this MAJOR problem
Really? No response after a month? I can't be the only person having this problem, can I? Do any of the developers actually look at this page? Anyone? Bueller, Bueller?
 * --AttemptedUser 09:03, 22 June 2010 (UTC)

Same problem, identically. Been waiting for somebody to look at this for months now, and I'm at a complete standstill.


 * --13:20, 22 June 2010 (UTC)

Solution!! (Well, almost; more of a workaround)
Hi, this is AttemptedUser again, now not so frustrated and posting under my usual handle.

The problem was that the DynamicPageList extension had BOMs at the beginning of its interface file. Because this is a "require_once" extension, it seems that the BOM was getting inserted into the headers, and Ubuntu's version of PHP or Apache (not sure which) does not sanitize those, whereas the Mac (and seemingly everyone else's installation) DOES sanitize the BOMs before parsing. I am not sure why BeautifulSoup.py doesn't catch this, but for whatever reason it doesn't. Unless you're using UTF-16 files, you really shouldn't have a BOM anyway...
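The effect of a stray BOM on an otherwise valid API reply can be reproduced with the Python 3 standard library. This is a sketch of the general failure mode, not of the exact code path inside pywikipedia, which is an assumption here. The reply body is the one from the dumpfile quoted earlier in this section, and the helper names are hypothetical.

```python
import codecs
import json


def strip_utf8_bom(raw):
    """Remove a leading UTF-8 byte order mark from a bytes object."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def parse_reply(raw):
    """Decode the reply as plain UTF-8 and parse it as JSON."""
    return json.loads(raw.decode("utf-8"))


reply = (b'{"login":{"result":"NeedToken",'
         b'"token":"f8ff543dea8a19c853b64d714eb580e8"}}')
polluted = codecs.BOM_UTF8 + reply  # what a BOM-prefixed include produces

try:
    parse_reply(polluted)
except ValueError as error:
    print("decoding failed:", error)

print(parse_reply(strip_utf8_bom(polluted))["login"]["result"])
```

The BOM is invisible when the reply is printed, which would explain why the dumped JSON looks perfectly valid while the parser still rejects it.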

To check whether you have any stray BOMs lying around, MediaWiki actually includes a handy script in the t/maint directory called "bom.t". If you're curious, go to your main MediaWiki directory, then run "perl t/maint/bom.t", and it will tell you which files are problematic.

If you just want to blast away and fix the problem, a combination of two handy scripts took care of it for me. Put one or both in an executable path, but be sure to modify the shell script to refer to the absolute path to the Perl script:

This one I call "RecursiveBOMdefuse.sh"

    #!/bin/sh
    #
    if [ "$1" = "" ] ; then
        echo "Usage: $0 directory"
        exit
    fi
    # Get list of files in the directory
    find "$1" -type f |
    while read Name ; do
        # Based on the file name, perform the conversion
        case "$Name" in
            (*) # markup text
                NameTxt="${Name}"
                /absolute/path/to/./BOMdefuse.plx "$NameTxt";
                #alternatively, you could probably use perl /absolute/path/to/BOMdefuse.plx "$NameTxt";
                ;;
        esac
    done

The next, I call BOMdefuse.plx, which is a perl script I found at W3C's website - I'm really not sure why they haven't made this operate recursively, but the shell takes care of that. If I had the time, I'd fix the Perl script to handle everything, but I'm just so happy about getting the bot working again that I'm going back to work on editing/cleaning up content.

    #!/usr/bin/perl
    # program to remove a leading UTF-8 BOM from a file
    # works both STDIN -> STDOUT and on the spot (with filename as argument)
    # adapted from http://people.w3.org/rishida/blog/?p=102
    #
    if ($#ARGV > 0) {
        print STDERR "Too many arguments!\n";
        exit;
    }
    my @file;  # file content
    my $lineno = 0;
    my $filename = @ARGV[0];
    if ($filename) {
        open( BOMFILE, $filename ) || die "Could not open source file for reading.";
        while (<BOMFILE>) {
            if ($lineno++ == 0) {
                # the UTF-8 BOM is the byte sequence EF BB BF
                if ( index( $_, "\xEF\xBB\xBF" ) == 0 ) {
                    s/^\xEF\xBB\xBF//;
                    print "BOM found and removed from $filename.\n";
                }
                else {
                    print "No BOM found in $filename.\n";
                }
            }
            push @file, $_ ;
        }
        close (BOMFILE) || die "Can't close source file after reading.";
        open (NOBOMFILE, ">$filename") || die "Could not open source file for writing.";
        foreach $line (@file) {
            print NOBOMFILE $line;
        }
        close (NOBOMFILE) || die "Can't close source file after writing.";
    }
    else {  # STDIN -> STDOUT
        while (<>) {
            if (!$lineno++) {
                s/^\xEF\xBB\xBF//;
            }
            push @file, $_ ;
        }
        foreach $line (@file) {
            print $line;
        }
    }

Run a chmod +x on both of these.

Then go to your main Mediawiki directory and run

RecursiveBOMdefuse.sh.

It may take a minute or two, but it works!
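For those who would rather not combine a shell script with a Perl script, the same clean-up can be sketched in Python, the language pywikipedia itself is written in. This is an untested-in-production sketch written for Python 3, not a drop-in replacement: it rewrites files in place, so try it on a copy of the installation first. The function name remove_utf8_boms is hypothetical.

```python
import codecs
import os


def remove_utf8_boms(root):
    """Strip a leading UTF-8 BOM from every regular file below root.

    Returns the list of files that were rewritten.
    """
    cleaned = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            with open(path, "rb") as handle:
                data = handle.read()
            if data.startswith(codecs.BOM_UTF8):
                with open(path, "wb") as handle:
                    handle.write(data[len(codecs.BOM_UTF8):])
                cleaned.append(path)
    return cleaned
```

Calling remove_utf8_boms(".") from the main MediaWiki directory corresponds to running the shell script there; the returned list shows which files contained a BOM.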

Note: If you use symlinks anywhere in your installation, the script above does not seem to follow them, so you have to run the script from the actual directory. Although slightly annoying, this is probably a good thing, as a bad set of symlinks could send this script off to run through your entire drive (or, if you're on a system with NFS mounts, the whole network/cluster!). Incidentally, the scripts found more BOMs than the

I hope this helps others, and Ubuntu or Pywikipediabot folks, please take a look at your PHP/Apache and BeautifulSoup.py - stray BOMs should not be getting through..... (Of course, extension authors should sanitize their extensions first, but talk about herding cats). --Fungiblename 06:20, 6 August 2010 (UTC)