Manual:Pywikibot/fixes.py

Fixes.py and user-fixes.py are auxiliary files of replace.py, the text manipulating utility of Pywikipediabot framework. They are useful for advanced use of replace.py; both contain so-called "fixes", i.e. predefined text replacement tasks.

They use some elements of Python programming. However, at a basic level you may use them without programming knowledge by copying, while advanced use requires basic knowledge of Python and regular expressions. So don't be afraid of strange words below such as dictionary.

Fixes are better than simple command-line parameters for replace.py, if you want to They are also useful if you have character encoding troubles in command window, because they are fully UTF-8 based.
 * create complicated replacements that don't fit into command-line,
 * develop your replacements and fix errors without retyping them,
 * save the replacement for repeated use,
 * use replacements created by others.

Be sure to understand the role and use of replace.py before going on with reading this article.

Comparison of fixes.py and user-fixes.py
Both files have the same purpose: to store fixes. There is no difference in use of fixes. A -fix:example parameter in the command-line of replace.py will find the fix called "example", should it be defined in fixes.py or user-fixes.py. If there are more fixes with the same name, the last one will be found (fixes.py preceeding user-fixes.py in order of search). Technically, both are Python scripts that contain executable code. Replace.py will call fixes.py if you use the above mentioned parameter, while fixes.py will look for user-fixes.py and execute it if found upon calling.

Fixes.py comes with the Pywikipedia framework. You may use it as is or modify for your own purposes. However, this file is not designed for personal use as it is subject to change at any time. You may write your own fixes into it (as the author of this article does), but this is not a good practice, because either you will lose your own fixes or you have to renounce the new standard fixes. So don't follow me. :-) (I never use the original fixes.) Always have a backup of your file!

User-fixes.py is not included in Pywikipedia when you download it. You may create it by copying from this page or by running generate_user_files.py. Whatever way of updating your Pywikipedia distribution you choose, either SVN or unpacking from nightlies, this file will always be ontouched. That's why it is not included in pack – this is the way to avoid overwriting. So this is the recommended way of storing your own fixes, but be sure to have backups in this case, too, as it contains your own work and nobody will reproduce it for you. (If you think your fix is useful for others, it is a good idea to upload it to your wiki and categorize so that other bot owners can find it, and the backup is ready.)

The two files have a slightly different syntax: while fixes.py has a huge dictionary of fixes, user-fixes.py adds fixes to this dictionary one by one, so it is easier to review. If you want to include your own definitions and functions, user-fixes.py is more comfortable to do that, because you may place them directly before the involved fix, while in fixes.py all of them must take place at the very beginning of the file.

Standard fixes
Standard fixes are created and tested by others, you only have to use them. They also show the way of creating fixes and may give you ideas to invent new ones. They are enumerated at the beginning of fixes.py and include general syntax corrections, German and Arabic spelling as well as replacement of the outdated .yu TLD. As you may see from history, new standard fixes don't come too often.

A short detour to data structures
If you are not a programming person, you should first have a look at two of Python data structures to understand the construction of fixes.py. One of them is dictionary (also known as associative array) that consists of key-value pairs. A new dictionary called fixes may be created (among others) like this:

The existing dictionary may be extended this way: Values may be numbers, texts, lists or even dictionaries while keys are usually strings (pieces of text). The second syntax is used in user-fixes.py.

The other important data structure is list. A list is just an enumeration of any kind of elements among square brackets, for example pairs of data:

You may have noticed that I had put a comma after the last element both in the dictionary and the list. This is legal and usually considered a good practice, because you have less chance to forget it next time when you want to add a new element.

Strings (texts) are worth a sentence: you may often see a u in front of them. That stands for Unicode in Python 2.x and is useful to put it there always again against forgetfulness. While 'table' is equivalent to u'table' as it contains only English letters, for majority of languages the prefix u is unavoidable and thus it is better to use it all the time. Another such prefix, r is recommended when you use regular expressions (but means raw, not regex – just use it if you don't understand the role of it).

Construction of a fix
As written above, fixes form a big dictionary called fixes</tt>. This dictionary is first defined in fixes.py, then optionally extended in user-fixes.py. Each fix itself is a dictionary, beeing the name of the fix the key of fixes and the description of the fix the value belonging to that key. As a fix-dictionary has dictionary-type, list-type and logical-type values, the whole file is constructed of embedded dictionaries at three levels.

Once you understood the difference between fixes.py and user-fixes.py, you will see that the fixes themselves have the same structure in both files, only differing in indentation. (As user-fixes.py adds fixes one by one to the dictionary, they begin at the left while fixes in fixes.py have a default indentation from the left side.)

Below you see the keys of a fix. These keys more or less correspond to command-line parameters of replace.py and if they are defined, they will overwrite the corresponding command-line parameter. Order of the key-value pairs is only a tradition, not mandatory. Only one of them, replacements is obligatory in a fix, all the others are optional.

You may find a simple example in Manual:Pywikipediabot/user-fixes.py and some less simple in fixes.py itself.

Additional keys may be added for your own use if you process them yourself; replace.py won't bother them. (E.g. I modified replace.py for my own needs so that it logs replacements to a user page, and uses an extra key of fixes for that.)

Advanced use of fixes
To learn how to use your own functions in fixes.py and user-fixes.py and what this is good for, see hu:Szerkesztő:Bináris/Fixes and functions HOWTO.