Manual:Pywikibot/category.py


This page was moved from MetaWiki.
It probably requires cleanup – please feel free to help out. In addition, some links on the page may be red; respective pages might be found at Meta. Remove this template once cleanup is complete.

Wikimedia Git repository has this file:
compat: category.py

Script to manage categories. It requires Python 2.4 or later (2.3 is not sufficient), as stated on Using the python wikipediabot.

Syntax

The syntax is:

   python pwb.py category action [-option]

where action can be one of these:

* add          - mass-add a category to a list of pages
* remove       - remove category tag from all pages in a category
* move         - move all pages in a category to another category
* tidy         - tidy up a category by moving its articles into subcategories
* tree         - show a tree of subcategories of a given category
* listify      - make a list of all of the articles that are in a category

and option can be one of these:

Options for "add" action:

* -person      - sort persons by their last name
* -create      - If a page doesn't exist, do not skip it, create it instead
* -redirect    - Follow redirects

If action is "add", the following options are supported:

-catfilter        Filter the page generator to only yield pages in the
                  specified category. See -cat for argument format.

-cat              Work on all pages which are in a specific category.
                  Argument can also be given as "-cat:categoryname" or
                  as "-cat:categoryname|fromtitle" (using # instead of |
                  is also allowed in this one and the following)

-catr             Like -cat, but also recursively includes pages in
                  subcategories, sub-subcategories etc. of the
                  given category.
                  Argument can also be given as "-catr:categoryname" or
                  as "-catr:categoryname|fromtitle".

-subcats          Work on all subcategories of a specific category.
                  Argument can also be given as "-subcats:categoryname" or
                  as "-subcats:categoryname|fromtitle".

-subcatsr         Like -subcats, but also includes sub-subcategories etc. of
                  the given category.
                  Argument can also be given as "-subcatsr:categoryname" or
                  as "-subcatsr:categoryname|fromtitle".

-uncat            Work on all pages which are not categorised.

-uncatcat         Work on all categories which are not categorised.

-uncatfiles       Work on all files which are not categorised.

-file             Read a list of pages to treat from the named text file.
                  Page titles in the file may be either enclosed with
                  [[brackets]], or be separated by new lines.
                  Argument can also be given as "-file:filename".

-filelinks        Work on all pages that use a certain image/media file.
                  Argument can also be given as "-filelinks:filename".

-search           Work on all pages that are found in a MediaWiki search
                  across all namespaces.

-logevents        Work on articles that were on a specified Special:Log.
                  The value may be a comma separated list of these values:

                      logevent,username,start,end

                  or for backward compatibility:

                      logevent,username,total

                  To use the default value, use an empty string.
                  The log event parameter selects the type of log and
                  can be one of the following:

                      block, protect, rights, delete, upload, move, import,
                      patrol, merge, suppress, review, stable, gblblock,
                      renameuser, globalauth, gblrights, abusefilter, newusers

                  By default, 10 pages are returned.

                  Examples:

                  -logevents:move gives pages from move log (usually redirects)
                  -logevents:delete,,20 gives 20 pages from deletion log
                  -logevents:protect,Usr gives pages from protect by user Usr
                  -logevents:patrol,Usr,20 gives 20 patrolled pages by user Usr
                  -logevents:upload,,20121231,20100101 gives upload pages
                  from 2010, 2011, and 2012
                  -logevents:review,,20121231 gives review pages from the
                  beginning until 31 Dec 2012
                  -logevents:review,Usr,20121231 gives review pages by user
                  Usr from the beginning until 31 Dec 2012

                  In some cases it must be written as -logevents:"move,Usr,20"

-namespaces       Filter the page generator to only yield pages in the
-namespace        specified namespaces. Separate multiple namespace
-ns               numbers or names with commas.
                  Examples:

                  -ns:0,2,4
                  -ns:Help,MediaWiki

                  If used with -newpages/-random/-randomredirect,
                  -namespace/ns must be provided before
                  -newpages/-random/-randomredirect.
                  If used with -recentchanges, efficiency is improved if
                  -namespace/ns is provided before -recentchanges.

                  If used with -start, -namespace/ns shall contain only one
                  value.

-interwiki        Work on the given page and all equivalent pages in other
                  languages. This can, for example, be used to fight
                  multi-site spamming.
                  Attention: this will cause the bot to modify
                  pages on several wiki sites. This is not well tested,
                  so check your edits!

-limit:n          When used with any other argument that specifies a set
                  of pages, work on no more than n pages in total.

-links            Work on all pages that are linked from a certain page.
                  Argument can also be given as "-links:linkingpagetitle".

-liverecentchanges Work on pages from the live recent changes feed. If used as
                  -liverecentchanges:x, work on x recent changes.

-imagesused       Work on all images that are contained on a certain page.
                  Argument can also be given as "-imagesused:linkingpagetitle".

-newimages        Work on the most recent new images. If given as -newimages:x,
                  will work on x newest images.

-newpages         Work on the most recent new pages. If given as -newpages:x,
                  will work on x newest pages.

-recentchanges    Work on the pages with the most recent changes. If
                  given as -recentchanges:x, will work on the x most recently
                  changed pages. If given as -recentchanges:offset,duration it
                  will work on pages changed from 'offset' minutes with
                  'duration'  minutes of timespan. rctags are supported too.
                  The rctag must be the very first parameter part.

                  Examples:
                  -recentchanges:20 gives the 20 most recently changed pages
                  -recentchanges:120,70 will give pages with 120 offset
                  minutes and 70 minutes of timespan
                  -recentchanges:visualeditor,10 gives the 10 most recently
                  changed pages marked with 'visualeditor'
                  -recentchanges:"mobile edit,60,35" will retrieve pages marked
                  with 'mobile edit' for the given offset and timespan

-unconnectedpages Work on the most recent pages that are not connected to
                  the Wikibase repository. Given as -unconnectedpages:x, will
                  work on the x most recent unconnected pages.

-ref              Work on all pages that link to a certain page.
                  Argument can also be given as "-ref:referredpagetitle".

-start            Specifies that the robot should go alphabetically through
                  all pages on the home wiki, starting at the named page.
                  Argument can also be given as "-start:pagetitle".

                  You can also include a namespace. For example,
                  "-start:Template:!" will make the bot work on all pages
                  in the template namespace.

                  The default value is start:!

-prefixindex      Work on pages commencing with a common prefix.

-subpage:n        Filters pages to only those that have depth n
                  i.e. a depth of 0 filters out all pages that are subpages,
                  and a depth of 1 filters out all pages that are subpages of
                  subpages.

-titleregex       A regular expression that needs to match the article title;
                  otherwise the page won't be returned.
                  Multiple -titleregex:regexpr can be provided and the page
                  will be returned if title is matched by any of the regexpr
                  provided.
                  Case insensitive regular expressions will be used and
                  dot matches any character.

-transcludes      Work on all pages that use a certain template.
                  Argument can also be given as "-transcludes:Title".

-unusedfiles      Work on all description pages of images/media files that are
                  not used anywhere.
                  Argument can be given as "-unusedfiles:n" where
                  n is the maximum number of articles to work on.

-lonelypages      Work on all articles that are not linked from any other
                  article.
                  Argument can be given as "-lonelypages:n" where
                  n is the maximum number of articles to work on.

-unwatched        Work on all articles that are not watched by anyone.
                  Argument can be given as "-unwatched:n" where
                  n is the maximum number of articles to work on.

-usercontribs     Work on all articles that were edited by a certain user.
                  (Example : -usercontribs:DumZiBoT)

-weblink          Work on all articles that contain an external link to
                  a given URL; may be given as "-weblink:url"

-withoutinterwiki Work on all pages that don't have interlanguage links.
                  Argument can be given as "-withoutinterwiki:n" where
                  n is the total to fetch.

-mysqlquery       Takes a MySQL query string like
                  "SELECT page_namespace, page_title FROM page
                  WHERE page_namespace = 0" and works on the resulting pages.

-wikidataquery    Takes a WikidataQuery query string like claim[31:12280]
                  and works on the resulting pages.

-sparql           Takes a SPARQL SELECT query string including ?item
                  and works on the resulting pages.

-sparqlendpoint   Specify SPARQL endpoint URL (optional).
                  (Example : -sparqlendpoint:http://myserver.com/sparql)

-searchitem       Takes a search string and works on Wikibase pages that
                  contain it.
                  Argument can be given as "-searchitem:text", where text
                  is the string to look for, or "-searchitem:lang:text", where
                  lang is the language to search items in.

-random           Work on random pages returned by [[Special:Random]].
                  Can also be given as "-random:n" where n is the number
                  of pages to be returned.

-randomredirect   Work on random redirect pages returned by
                  [[Special:RandomRedirect]]. Can also be given as
                  "-randomredirect:n" where n is the number of pages to be
                  returned.

-google           Work on all pages that are found in a Google search.
                  You need a Google Web API license key. Note that Google
                  doesn't give out license keys anymore. See google_key in
                  config.py for instructions.
                  Argument can also be given as "-google:searchstring".

-yahoo            Work on all pages that are found in a Yahoo search.
                  Depends on python module pYsearch. See yahoo_appid in
                  config.py for instructions.

-page             Work on a single page. Argument can also be given as
                  "-page:pagetitle", and supplied multiple times for
                  multiple pages.

-pageid           Work on a single pageid. Argument can also be given as
                  "-pageid:pageid1,pageid2,." or "-pageid:'pageid1|pageid2|..'"
                  and supplied multiple times for multiple pages.

-grep             A regular expression that needs to match the article text;
                  otherwise the page won't be returned.
                  Multiple -grep:regexpr can be provided and the page will
                  be returned if content is matched by any of the regexpr
                  provided.
                  Case insensitive regular expressions will be used and
                  dot matches any character, including a newline.

-ql               Filter pages based on page quality.
                  This is only applicable if contentmodel equals
                  'proofread-page'; otherwise it has no effect.
                  Valid values are in range 0-4.
                  Multiple values can be comma-separated.

-onlyif           A claim the page needs to contain, otherwise the item won't
                  be returned.
                  The format is property=value,qualifier=value. Multiple (or
                  none) qualifiers can be passed, separated by commas.
                  Examples: P1=Q2 (property P1 must contain value Q2),
                  P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and
                  qualifiers: P5 with value Q6 and P6 with value Q7).
                  Value can be page ID, coordinate in format:
                  latitude,longitude[,precision] (all values are in decimal
                  degrees), year, or plain string.
                  The argument can be provided multiple times and the item
                  page will be returned only if all of the claims are present.
                  Argument can be also given as "-onlyif:expression".

-onlyifnot        A claim the page must not contain, otherwise the item won't
                  be returned.
                  For usage and examples, see -onlyif above.

-intersect        Work on the intersection of all the provided generators.
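
The two value formats accepted by -logevents can be illustrated with a small parser. This is only a sketch written for this page, not Pywikibot's own code: the names parse_logevents and LogEventsArg are hypothetical, and it assumes that an 8-digit third field is a YYYYMMDD date while any other third field is a page count.

```python
from datetime import datetime
from typing import NamedTuple, Optional

DEFAULT_TOTAL = 10  # the help text above states a default of 10 pages


class LogEventsArg(NamedTuple):
    """Hypothetical container for a parsed -logevents value."""
    logevent: str
    username: Optional[str]
    start: Optional[datetime]
    end: Optional[datetime]
    total: Optional[int]


def _parse_date(text: str) -> datetime:
    """Parse a YYYYMMDD date such as 20121231."""
    return datetime.strptime(text, "%Y%m%d")


def parse_logevents(value: str) -> LogEventsArg:
    """Split 'logevent,username,start,end' or 'logevent,username,total'."""
    fields = value.split(",")
    if len(fields) > 4:
        raise ValueError(f"too many fields in {value!r}")
    fields += [""] * (4 - len(fields))  # pad missing trailing fields
    logevent, username, third, fourth = fields
    if not logevent:
        raise ValueError("a log event type is required")

    start = end = total = None
    if third:
        # Assumption: 8 digits mean a date, anything else is a count.
        if len(third) == 8 and third.isdigit():
            start = _parse_date(third)
        else:
            total = int(third)
    if fourth:
        end = _parse_date(fourth)
    # Assumption: the default count applies only when no date was given.
    if start is None and total is None:
        total = DEFAULT_TOTAL
    return LogEventsArg(logevent, username or None, start, end, total)


print(parse_logevents("delete,,20"))
print(parse_logevents("upload,,20121231,20100101"))
```

An empty field, as in "delete,,20", leaves the username unset, which matches the "use an empty string" rule described for -logevents.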

Options for "listify" action:

* -overwrite   - This overwrites the current page with the list even if
                 something is already there.
* -showimages  - This displays images rather than linking them in the list.
* -talkpages   - This outputs the links to talk pages of the pages to be
                 listified in addition to the pages themselves.

Options for "remove" action:

* -nodelsum    - This specifies not to use the custom edit summary as the
                 deletion reason. Instead, it uses the default deletion reason
                 for the language, which is "Category was disbanded" in
                 English.

Options for "move" action:

* -hist        - Creates a nice wikitable on the talk page of target category
                 that contains detailed page history of the source category.
* -nodelete    - Don't delete the old category after move
* -nowb        - Don't update the wikibase repository
* -allowsplit  - If that option is not set, it only moves the talk and main
                 page together.
* -mvtogether  - Only move the pages/subcategories of a category, if the
                 target page (and talk page, if -allowsplit is not set)
                 doesn't exist.
* -keepsortkey - Use sortKey of the old category also for the new category.
                 If not specified, sortKey is removed.
                 An alternative method to keep sortKey is to use -inplace
                 option.

Options for "tidy" action:

* -namespaces    Filter the articles in the specified namespaces. Separate
  -namespace     multiple namespace numbers or names with commas. Examples:
  -ns            -ns:0,2,4
                 -ns:Help,MediaWiki

Options for several actions:

* -rebuild     - reset the database
* -from:       - The category to move from (for the move action).
                 Also, the category to remove from for the remove action.
                 Also, the category to make a list of for the listify action.
* -to:         - The category to move to (for the move action).
                 Also, the name of the list to make for the listify action.
        NOTE: If the category names have spaces in them you may need to use
        a special syntax in your shell so that the names aren't treated as
        separate parameters. For instance, in BASH, use single quotes,
        e.g. -from:'Polar bears'
* -batch       - Don't prompt to delete emptied categories (do it
                 automatically).
* -summary:    - Pick a custom edit summary for the bot.
* -inplace     - Use this flag to change categories in place rather than
                 rearranging them.
* -recurse     - Recurse through all subcategories of categories.
* -pagesonly   - While removing pages from a category, keep the subpage links
                 and do not remove them
* -match       - Only work on pages whose titles match the given regex (for
                 move and remove actions).
* -depth:      - The max depth limit beyond which no subcategories will be
                 listed.
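
The note about spaces in category names also matters when the script is started from another program. The following sketch assumes the bot is launched from Python; the helper build_move_command and the target category "Arctic bears" are made up for illustration. Passing the arguments as a list needs no quoting at all, and shlex.join shows one shell-safe spelling of the same command.

```python
import shlex
from typing import List


def build_move_command(source: str, target: str) -> List[str]:
    """Return the argument list for a category move (hypothetical helper)."""
    return [
        "python", "pwb.py", "category", "move",
        f"-from:{source}",
        f"-to:{target}",
    ]


cmd = build_move_command("Polar bears", "Arctic bears")

# Each list element stays one argument, so spaces need no special care
# when the list is handed to subprocess.run(cmd).
# shlex.join quotes the elements for pasting into a POSIX shell.
shell_line = shlex.join(cmd)
print(shell_line)
# python pwb.py category move '-from:Polar bears' '-to:Arctic bears'
```

Quoting the whole argument, as shlex does here, is equivalent in BASH to quoting only the value as in -from:'Polar bears'.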

For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.
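
The trade-off between a local cache and fresh data can be pictured with the sketch below. It is not the bot's actual storage code: the pickle format, the function names and the sample data are assumptions, and only the file name category.dump and the idea of a rebuild switch come from the description above.

```python
import os
import pickle
import tempfile


def load_category_cache(path, rebuild=False):
    """Return cached data from path, or an empty cache.

    An empty cache is returned when rebuild is requested or when no
    dump file exists yet, so the caller fetches everything again.
    """
    if rebuild or not os.path.exists(path):
        return {}
    with open(path, "rb") as fh:
        return pickle.load(fh)


def save_category_cache(path, cache):
    """Write the cache to path so a later run can reuse it."""
    with open(path, "wb") as fh:
        pickle.dump(cache, fh)


# Demonstration in a temporary directory with made-up category data.
with tempfile.TemporaryDirectory() as tmp:
    dump = os.path.join(tmp, "category.dump")
    save_category_cache(dump, {"Bears": ["Polar bears"]})
    reused = load_category_cache(dump)               # possibly outdated
    fresh = load_category_cache(dump, rebuild=True)  # starts from scratch

print(reused)
print(fresh)
```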

For example, to create a new category from a list of persons, type:

 python pwb.py category add -person

and follow the on-screen instructions.

Or to do it all from the command-line, use the following syntax:

 python pwb.py category move -from:US -to:"United States"

This will move all pages in the category US to the category United States.
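
Conceptually, moving a page from one category to another means rewriting the category tag in the page text. The sketch below shows that idea on plain wikitext with a regular expression. It is an illustration only, not the bot's implementation; the function move_category_tag and the sample text are hypothetical, and the keep_sortkey switch merely mirrors what -keepsortkey is described as doing.

```python
import re


def move_category_tag(wikitext, old, new, keep_sortkey=False):
    """Replace [[Category:old]] tags with [[Category:new]] tags."""
    pattern = re.compile(
        # The namespace name is matched case-insensitively,
        # the category name itself is matched exactly.
        r"\[\[\s*(?i:Category)\s*:\s*" + re.escape(old)
        + r"\s*(\|[^\]]*)?\]\]"
    )

    def repl(match):
        # Group 1 holds the optional "|sort key" part of the tag.
        sortkey = match.group(1) or ""
        if not keep_sortkey:
            sortkey = ""
        return f"[[Category:{new}{sortkey}]]"

    return pattern.sub(repl, wikitext)


text = "Text.\n[[Category:US|Smith, John]]\n[[Category:People]]"
print(move_category_tag(text, "US", "United States"))
print(move_category_tag(text, "US", "United States", keep_sortkey=True))
```

Without keep_sortkey the sort key "Smith, John" is dropped from the new tag; with it, the key is carried over unchanged.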

Global arguments available for all bots

These options override the settings in user-config.py.

-dir:PATH         Read the bot's configuration data from the directory given
                  by PATH, instead of from the default directory.

-lang:xx          Set the language of the wiki you want to work on. xx should
                  be the language code.
                  Config name: mylang

-family:xyz       Set the family of the wiki you want to work on, e.g.
                  wikipedia, wiktionary, commons, wikitravel, ….
                  Config name: family

-user:xyz         Log in as user 'xyz' instead of the default username.

-daemonize:xyz    Immediately return control to the terminal and redirect
                  stdout and stderr to xyz (only use for bots that require
                  no input from stdin).

-help             Show a help text for the invoked script.

-log              Enable the logfile. Logs will be stored in the logs
                  subdirectory.

-log:xyz          Enable the logfile, using xyz as the filename.

-nolog            Disable the logfile (if it's enabled by default).

-maxlag           Set a new maxlag parameter to a number of seconds. Defer
                  bot edits during periods of database server lag.
                  Config name: maxlag

-debug:item       Enable the logfile and include extensive debugging data
-debug            for component "item" (for all components if the second
                  form is used).

-putthrottle:nn   Set the minimum time (in seconds) the bot will wait between
-pt:nn            saving pages.
                  Config name: put_throttle

-verbose          Make the program output more detailed messages than usual
-v                to the standard output about its current work, or progress,
                  while it is proceeding. This may be helpful when debugging
                  or dealing with unusual situations.
                  Default: not selected

-cosmeticchanges  Toggle the cosmetic_changes setting made in config.py or
-cc               user-config.py to its inverse and overrule it. All other
                  settings and restrictions are untouched.
                  Default: not selected

-simulate         Disable writing to the server. Useful for testing and
                  debugging of new code (if given, doesn't do any real
                  changes, but only shows what would have been changed).

-<config_var>:n   Any numeric config variable can be given as an option and
                  set from the command line.


See also