Manual:Pywikibot/fixes.py

From MediaWiki.org
Wikimedia Git repository has this file: pywikibot/fixes.py

Fixes.py and user-fixes.py are auxiliary files of replace.py, the text-manipulating script of the Pywikibot framework. They are useful for advanced use of replace.py; both contain so-called "fixes", i.e. predefined text replacement tasks.

They use some elements of Python programming. At a basic level, however, you can use them without programming knowledge by copying existing fixes, while advanced use requires basic knowledge of Python and regular expressions. So don't be afraid of unfamiliar words below, such as dictionary.

Fixes are better than simple command-line parameters for replace.py if you want to

  • create complicated replacements that don't fit on the command line,
  • develop your replacements and fix errors without retyping them,
  • save the replacement for repeated use,
  • use replacements created by others,
  • share your replacements with other bot owners.

They are also useful if you have character encoding trouble in the command window, because they are fully UTF-8 based.

Make sure you understand the role and use of replace.py before reading on.

Comparison of fixes.py and user-fixes.py

Both files have the same purpose: to store fixes. There is no difference in how fixes are used. A -fix:example parameter on the replace.py command line will find the fix called "example", whether it is defined in fixes.py or in user-fixes.py. If several fixes have the same name, the last definition wins; since fixes.py is read before user-fixes.py, a fix in user-fixes.py overrides a standard fix of the same name. Technically, both are Python scripts that contain executable code: replace.py loads fixes.py when you use the -fix parameter, and fixes.py in turn looks for user-fixes.py and executes it if it exists.
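The "last definition wins" rule is ordinary Python dictionary behavior: a key can occur only once, so assigning to an existing key replaces its value. A minimal sketch, assuming a fix named 'example' is defined in both files (the contents of both definitions are made up for illustration):

```python
# Stands in for the dictionary defined in fixes.py (contents are illustrative).
fixes = {
    'example': {
        'regex': False,
        'replacements': [('colour', 'color')],
    },
}

# Stands in for user-fixes.py, which is executed afterwards and assigns
# to the same key, so its definition replaces the first one.
fixes['example'] = {
    'regex': False,
    'replacements': [('favour', 'favor')],
}

# Only the later definition survives under the name 'example'.
print(fixes['example']['replacements'])  # [('favour', 'favor')]
```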

Fixes.py comes with the Pywikibot framework. You may use it as is or modify it for your own purposes. However, this file is not designed for personal use, as it is subject to change at any time. You may write your own fixes into it (as the author of this article does), but this is not good practice: when the framework is updated, you will either lose your own fixes or have to give up the new standard fixes. So don't follow me. :-) (I never use the original fixes.) Always keep a backup of your file!

User-fixes.py is not included in Pywikibot when you download it. You may create it by copying from this page or by running generate_user_files.py. Whichever way of updating your Pywikibot distribution you choose, either SVN or unpacking from nightlies, this file will always be left untouched. That's why it is not included in the package: this is how it avoids being overwritten. So this is the recommended place to store your own fixes, but be sure to keep backups in this case, too, as it contains your own work and nobody will reproduce it for you. (If you think your fix is useful for others, it is a good idea to upload it to your wiki and categorize it so that other bot owners can find it; that way a backup is ready as well.)

The two files have slightly different syntax: fixes.py contains one huge dictionary of fixes, while user-fixes.py adds fixes to this dictionary one by one, which makes it easier to review. If you want to include your own definitions and functions, user-fixes.py is more convenient for that, because you can place them directly before the fix that uses them, while in fixes.py they must all go at the very beginning of the file.
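A minimal sketch of the user-fixes.py style, assuming a hypothetical fix named 'capitalize-pywikibot' and a hypothetical helper function capitalize_match (both made up for illustration). The helper sits right above the fix that uses it and is given as the new value of a replacement; like the repl argument of Python's re.sub, it receives the match object and returns the replacement text:

```python
import re

# Only for trying this outside Pywikibot: in a real user-fixes.py the
# `fixes` dictionary already exists, and this line must be left out,
# otherwise it would discard every fix defined in fixes.py.
fixes = {}


# A helper placed directly before the fix that uses it.
# It receives a regex match object and returns the replacement text.
def capitalize_match(match):
    word = match.group(0)
    return word[:1].upper() + word[1:]


# The fix itself, added to the dictionary on its own.
fixes['capitalize-pywikibot'] = {
    'regex': True,
    'msg': {
        'en': 'Bot: capitalizing "Pywikibot"',
    },
    'replacements': [
        (r'\bpywikibot\b', capitalize_match),
    ],
}

# Quick check of the helper with the standard re module
# (replace.py itself does more, e.g. it respects exceptions).
old, new = fixes['capitalize-pywikibot']['replacements'][0]
print(re.sub(old, new, 'I use pywikibot daily.'))  # I use Pywikibot daily.
```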

Standard fixes

Standard fixes are created and tested by others; you only have to use them. They also show how fixes are built and may give you ideas for new ones. They are listed at the beginning of fixes.py and include general syntax corrections, German and Arabic spelling corrections, and replacement of the outdated .yu TLD. As you can see from the history, new standard fixes aren't added very often.

A short detour to data structures

If you are not a programmer, you should first have a look at two of Python's data structures to understand how fixes.py is built. One of them is the dictionary (also known as an associative array), which consists of key-value pairs. A new dictionary called fixes may be created (among other ways) like this:

fixes = {
    'key1': 'value1',
    'key2': 'value2',
}

The existing dictionary may be extended this way:

fixes['key3'] = 'value3'

Values may be numbers, texts, lists or even dictionaries while keys are usually strings (pieces of text). The second syntax is used in user-fixes.py.
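To see both syntaxes side by side, here is a small runnable sketch; the keys and values are placeholders chosen only to show the different value types mentioned above:

```python
# First syntax: create the dictionary with its key-value pairs at once.
fixes = {
    'key1': 1,              # a number
    'key2': 'some text',    # a string
}

# Second syntax (the one user-fixes.py uses): add one key at a time.
fixes['key3'] = ['a', 'list']       # a list
fixes['key4'] = {'nested': True}    # even another dictionary

print(sorted(fixes))            # ['key1', 'key2', 'key3', 'key4']
print(fixes['key4']['nested'])  # True
```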

The other important data structure is the list. A list is just a sequence of elements of any kind between square brackets, for example pairs of data (written in parentheses; Python calls these tuples):

replacements = [
    ('old1', 'new1'),
    ('old2', 'new2'),
]

You may have noticed that I put a comma after the last element in both the dictionary and the list. This is legal and usually considered good practice, because it makes it less likely that you forget the comma the next time you add a new element.
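A short sketch of why the trailing comma is harmless and why a forgotten comma is risky. The tags list is a made-up example of the kind of string list used in exceptions; note that two adjacent strings without a comma between them are silently joined into one:

```python
# A list of (old, new) pairs written with a trailing comma...
with_comma = [
    ('old1', 'new1'),
    ('old2', 'new2'),
]

# ...is exactly the same list as one written without it.
without_comma = [('old1', 'new1'), ('old2', 'new2')]
print(with_comma == without_comma)  # True

# Each element is a tuple that can be unpacked into its two parts.
for old, new in with_comma:
    print(old, '->', new)

# Forgetting a comma between two strings is not a syntax error:
# Python silently joins adjacent string literals into one.
tags = [
    'comment'
    'nowiki',
]
print(tags)  # ['commentnowiki']
```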

Strings (texts) deserve a sentence: you may often see a u in front of them. In Python 2.x it stands for Unicode, and it is a good habit to always put it there so that you don't forget it. While 'table' is equivalent to u'table' because it contains only English (ASCII) letters, for the majority of languages the u prefix is unavoidable in Python 2, so it is better to use it all the time. (In Python 3, which current Pywikibot releases use, every string is Unicode and the u prefix is accepted but has no effect.) Another such prefix, r, is recommended when you use regular expressions (it means raw, not regex; if you don't understand its role, just use it anyway).
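A short sketch of both prefixes, assuming Python 3. The \b escape shows why r matters: in a normal string '\b' is a backspace character, while in a raw string it reaches the regex engine unchanged and means a word boundary:

```python
import re

# In Python 3 every string is Unicode, so the u prefix changes nothing.
print(u'table' == 'table')   # True
print(u'tábla' == 'tábla')   # True

# r (raw) keeps backslashes literally.
print(r'\bword\b' == '\\bword\\b')  # True

# Without r, '\b' becomes a backspace character and the pattern finds nothing.
print(re.search('\bword\b', 'a word here'))                # None
print(re.search(r'\bword\b', 'a word here') is not None)   # True
```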

Construction of a fix

As written above, fixes form a big dictionary called fixes. This dictionary is first defined in fixes.py, then optionally extended in user-fixes.py. Each fix is itself a dictionary: the name of the fix is a key of fixes, and the dictionary describing the fix is the value belonging to that key. Since a fix dictionary has dictionary-type, list-type and logical-type values, the whole file is built from dictionaries nested three levels deep.
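The three levels can be seen in a small sketch; the fix name 'example' and its contents are made up for illustration:

```python
fixes = {                              # level 1: all fixes, keyed by name
    'example': {                       # level 2: one fix
        'regex': True,                 # a logical value
        'msg': {                       # level 3: edit summaries by language
            'en': 'Bot: example fix',
        },
        'replacements': [              # a list of (old, new) pairs
            (r'\bteh\b', 'the'),
        ],
    },
}

# Walking down the levels one key at a time:
print(fixes['example']['msg']['en'])        # Bot: example fix
print(fixes['example']['replacements'][0])  # ('\\bteh\\b', 'the')
```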

Once you understand the difference between fixes.py and user-fixes.py, you will see that the fixes themselves have the same structure in both files, differing only in indentation. (As user-fixes.py adds fixes to the dictionary one by one, they start at the left margin, while fixes in fixes.py are indented.)

Below you see the keys of a fix. These keys more or less correspond to command-line parameters of replace.py, and if they are defined, they override the corresponding command-line parameter. The order of the key-value pairs is only a tradition, not mandatory. Only one key, replacements, is required in a fix; all the others are optional.
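A fix can therefore be as small as a single key. The sketch below also uses a hypothetical helper, effective_options, to show how the documented default (an omitted logical key counts as False) can be read with dict.get; it is an illustration of the rule, not replace.py's own code:

```python
# The smallest valid fix: only 'replacements' is required.
minimal_fix = {
    'replacements': [
        ('teh', 'the'),
    ],
}


def effective_options(fix):
    """Hypothetical helper: fill in the documented defaults for omitted keys.

    An omitted logical key counts as False.
    """
    return {
        'regex': fix.get('regex', False),
        'recursive': fix.get('recursive', False),
        'nocase': fix.get('nocase', False),
    }


print(effective_options(minimal_fix))
# {'regex': False, 'recursive': False, 'nocase': False}
```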

Key Type of value Description
'regex' logical Whether replacements and exceptions should be interpreted as regular expressions. If omitted, it is False.
'recursive' logical If True, this fix always runs in recursive mode. Used very rarely. If omitted, it is False.
'nocase' logical If True, this fix always runs in case-insensitive mode (corresponds to -nocase on the command line). If omitted, it is False.
'msg' dictionary Keys of this dictionary may be valid language codes (or '_default'), and the corresponding values determine the bot's edit summary for that language. If you write a fix for your home wiki only, it is enough to give your language code. For example, a fix used only on the Hungarian Wikipedia may contain:
'msg': {
    'hu': u'A hogy elé vessző kell.',  # "A comma is needed before 'hogy'."
},
(Some wikis require the summary to begin with "Bot:".)
'msg' string Alternatively, the value of this key may be a basestring (in Python 2, a str or unicode string). In this case the edit summaries are not stored in the fix itself but in an external file in the i18n subdirectory of your Pywikibot distribution, following the framework's newer internationalization method. For example, the standard 'isbn' fix has the value 'isbn-formatting'. This means that the edit summaries are taken from i18n/isbn.py and are the same for all users of that language. New translations may be added on Translatewiki. This is useful if you want to make your fix available to bot owners of several languages, while the first syntax is simpler and keeps the summary under your own control (recommended for fixes you use yourself).
'replacements' list This is the heart of your fix. You may give any number of replacements as (old, new) pairs, as described in the section on data structures above. They usually contain regular expressions, but may also consist of plain words. The new value may also be the name of a function instead of a replacement text or regular expression; the function receives the match object and returns the replacement text. See the link at the bottom of this page for more information about functions.
'replacements' dictionary With this alternative syntax you can create language-dependent fixes. The keys of this dictionary must be language codes, and the values must be lists as described above. The list matching the language of your wiki will be chosen.
'exceptions' dictionary Cases in which the old text is not replaced by the new one, or the page is not searched for matches at all.[1] This dictionary may have these keys:
title
A list of regular expressions. All pages with titles that are matched by one of these regular expressions are skipped.
text-contains
A list of regular expressions. All pages with text that contains a part which is matched by one of these regular expressions are skipped.
inside
A list of regular expressions. All occurrences that lie within a text region matched by one of these regular expressions are skipped.
inside-tags
A list of strings. These strings must be keys of the exceptionRegexes dictionary used by the replaceExcept() method in pywikibot/textlib.py.
Currently available tags: 'comment', 'header', 'pre', 'source', 'score', 'ref', 'startspace', 'table', 'template', 'hyperlink', 'gallery', 'link', 'interwiki', 'property', 'invoke' and any "HTML-like" double tags such as nowiki, noinclude, includeonly, timeline, math etc. Tags also include 'category' and 'file' (with all available names on the given wiki).
require-title
Opposite of title. Only pages with titles that are matched by ALL of these regular expressions will be processed. This is not really an exception and is listed here for technical reasons. Listing the same regex in both title and require-title will therefore prevent the bot from doing anything.
include
One standalone value: either the name of a dictionary defined in your file, or the name of a callable function that takes the name of the fix as its argument and returns a dictionary of exceptions. This dictionary may have any of the five keys above (but not 'include' itself!), and the lists belonging to those keys will be added to your exceptions. This way you can define one or more basic collections of exceptions shared by several fixes, and add separate exceptions to each fix.
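The following sketch puts several of these keys together in one made-up fix named 'spelling': language-dependent replacements and exceptions that use a callable 'include'. The functions shared_exceptions, resolve_exceptions and replacements_for are hypothetical; the last two only imitate the behavior described above (included lists are added to the fix's own lists; the list for the wiki's language is chosen) and are not replace.py's own code:

```python
def shared_exceptions(fix_name):
    """Hypothetical 'include' callable: takes the fix name, returns exceptions."""
    exceptions = {'inside-tags': ['comment', 'nowiki']}
    if fix_name == 'spelling':
        # Extra exceptions only for the fix called 'spelling'.
        exceptions['title'] = [r'^Talk:']
    return exceptions


fixes = {}  # stand-in: in user-fixes.py the dictionary already exists

fixes['spelling'] = {
    'regex': True,
    'msg': {
        '_default': 'Bot: fixing common misspellings',
    },
    # Language-dependent replacements: one list per language code.
    'replacements': {
        'en': [(r'\brecieve\b', 'receive')],
        'de': [(r'\bgarnicht\b', 'gar nicht')],
    },
    'exceptions': {
        'inside': [r'<code>.*?</code>'],
        'include': shared_exceptions,
    },
}


def resolve_exceptions(fix_name, fix):
    """Add the lists returned by 'include' to the fix's own exception lists."""
    own = dict(fix.get('exceptions', {}))  # shallow copy; the fix is unchanged
    include = own.pop('include', None)
    if callable(include):
        extra = include(fix_name)
    else:
        extra = include or {}
    merged = {key: list(values) for key, values in own.items()}
    for key, values in extra.items():
        merged.setdefault(key, []).extend(values)
    return merged


def replacements_for(fix, lang):
    """Pick the list for the wiki's language when replacements is a dictionary.

    This sketch returns an empty list for languages without an entry.
    """
    replacements = fix['replacements']
    if isinstance(replacements, dict):
        return replacements.get(lang, [])
    return replacements


print(resolve_exceptions('spelling', fixes['spelling']))
# {'inside': ['<code>.*?</code>'], 'inside-tags': ['comment', 'nowiki'], 'title': ['^Talk:']}
print(replacements_for(fixes['spelling'], 'de'))
# [('\\bgarnicht\\b', 'gar nicht')]
```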

You can find a simple example in Manual:Pywikibot/user-fixes.py and some less simple ones in fixes.py itself.

Additional keys may be added for your own use if you process them yourself; replace.py simply ignores them. (For example, I modified replace.py for my own needs so that it logs replacements to a user page, and it uses an extra key in the fixes for that.)
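A minimal sketch of such an extra key, assuming a hypothetical key 'log-page', a hypothetical page name and a hypothetical helper log_page_of; only your own modified script would read the key:

```python
fixes = {}  # stand-in: in user-fixes.py the dictionary already exists

fixes['typo-teh'] = {
    'regex': True,
    'msg': {'en': 'Bot: fixing typo'},
    'replacements': [(r'\bteh\b', 'the')],
    # A custom key for your own code; replace.py does not use it.
    'log-page': 'User:ExampleBot/Replacement log',  # hypothetical key and page
}


def log_page_of(fix):
    """Hypothetical helper for your own script: read the custom key.

    Returns None when a fix does not define it.
    """
    return fix.get('log-page')


print(log_page_of(fixes['typo-teh']))     # User:ExampleBot/Replacement log
print(log_page_of({'replacements': []}))  # None
```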

Advanced use of fixes

To learn how to use your own functions in fixes.py and user-fixes.py, and what they are good for, see hu:Szerkesztő:Bináris/Fixes and functions HOWTO.

References

  1. Description adapted from the code of replace.py. For the original authors, see the code history.