Manual:Pywikibot/Cookbook/Creating and reading lists

From mediawiki.org

Creating a list of pages is a frequent task. For example:

  1. You collect titles to work on because collecting is slow and can be done while you are sleeping.
  2. You want to review the list and discuss it further before you begin the task with your bot.
  3. You want to know the extent of a problem before you begin to write a bot for it.
  4. Listing is the purpose itself. It may be a maintenance list that requires attention from human users. It may be a community task list etc.
  5. Someone asked you to create a list on which he or she wants to work.

A list may be saved to a file or to a wikipage. listpages.py does something like this, but its input is restricted to built-in page generators, and its output has a lot of options. If you write your own script, you may want a simpler solution. Suppose that you have an iterable (list, tuple or generator) called pages that contains your collection.

Something like this:

text = '\n'.join('* ' + page.title(as_link=True) for page in pages)

will give a list that is suitable for both a wikipage and a file. It looks like this:

* [[Article1]]
* [[Article2]]
* [[Article3]]
* [[Article4]]

On Windows you sometimes get a UnicodeEncodeError when you try to save page names containing non-ASCII characters. In this case codecs will help by writing the file explicitly as UTF-8:

import codecs
with codecs.open('myfile.txt', 'w', 'utf-8') as file:
    file.write(text)

Of course, imports should be at the top of your script; this is just a sample. While a file does not require the linked form, it is useful to keep both in the same form so that a list can be copied from a file to a wikipage at any time.
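On Python 3 the built-in open() also accepts an encoding argument, so the same effect can be achieved without codecs. The following is a minimal sketch under stated assumptions: it uses plain title strings instead of Page objects so that it runs without a wiki connection, and the save_list helper and the sample titles are made up for illustration.

```python
import tempfile
from pathlib import Path


def save_list(titles, path):
    """Write titles as a wikitext bullet list, always encoded as UTF-8."""
    text = '\n'.join('* [[' + title + ']]' for title in titles)
    # An explicit encoding avoids depending on the Windows default code page.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    return text


# Sample titles with non-ASCII characters (made up for this example).
titles = ['Árvíztűrő tükörfúrógép', 'Łódź', 'Article3']
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'myfile.txt'
    written = save_list(titles, target)
    # Read the file back with the same encoding to check the round trip.
    read_back = target.read_text(encoding='utf-8')
```

In a real script you would build the text from page.title(as_link=True) as shown earlier and write to a permanent file rather than a temporary directory.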

To retrieve your list from the page [[Special:MyPage/Mylist]], use:

import pywikibot
from pywikibot.pagegenerators import LinkedPageGenerator
site = pywikibot.Site()  # the site configured in your user-config.py
listpage = pywikibot.Page(site, 'Special:MyPage/Mylist')
pages = list(LinkedPageGenerator(listpage))

If you want to read the pages back from the file, do:

from pywikibot.pagegenerators import TextIOPageGenerator
for page in TextIOPageGenerator('myfile.txt'):
    # Process your page here; printing the title is only a placeholder
    print(page.title())
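To see roughly what reading the linked form back involves, here is a sketch that extracts the titles from list text with a regular expression. It does not use Pywikibot and is not how TextIOPageGenerator is implemented; the extract_titles name and the pattern are assumptions chosen for illustration.

```python
import re

# Matches [[Title]] and [[Title|label]], capturing only the title part.
LINK_PATTERN = re.compile(r'\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]')


def extract_titles(text):
    """Return the link targets found in wikitext, in order of appearance."""
    return [match.group(1).strip() for match in LINK_PATTERN.finditer(text)]


sample = '* [[Article1]]\n* [[Article2|second]]\n* [[Article3]]'
titles = extract_titles(sample)
```

This yields plain title strings only; in a bot you would normally rely on the page generators, which return Page objects bound to a site.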

See also the Page generators chapter on how to collect titles.

Where to save the files?

The userscripts directory is a great way to keep your own scripts separate, but when you use pwb.py, your prompt is in the Pywikibot root directory. Once this structure is set up so nicely, you may not want to mix your data files with the Pywikibot system files. Saving them to userscripts requires giving the path every time, and it is again an unwanted mix, because that directory holds scripts rather than data.

A possible solution is to create a directory directly under the Pywikibot root, such as t. The name is short for "texts", is one letter long, and is very unlikely ever to appear as a Pywikibot system directory:

2023.03.01.  20:35    <DIR>          .
2023.03.01.  20:35    <DIR>          ..
2022.11.21.  01:09    <DIR>          .git
2022.10.12.  07:16    <DIR>          .github
2023.01.30.  13:38    <DIR>          .svn
2023.03.04.  20:07    <DIR>          __pycache__
2022.10.12.  07:35    <DIR>          _distutils_hack
2023.03.03.  14:32    <DIR>          apicache-py3
2023.01.30.  14:53    <DIR>          docs
2023.02.19.  21:28    <DIR>          logs
2022.10.12.  07:35    <DIR>          mwparserfromhell
2022.10.12.  07:35    <DIR>          pkg_resources
2023.01.30.  13:35    <DIR>          pywikibot
2023.01.30.  22:44    <DIR>          scripts
2022.10.12.  07:35    <DIR>          setuptools
2023.02.24.  11:08    <DIR>          t
2023.02.14.  12:15    <DIR>          tests

Now instead of 'myfile.txt' you may use 't/myfile.txt' (use / on both Linux and Windows!) when you save and open files. This is not a big pain, and your saved data will be in a separate directory.
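If you use the data directory often, a small helper can build the path and create the directory on first use. This is a sketch assuming the directory is named t and the script is started from the Pywikibot root; data_path is a hypothetical helper, not part of Pywikibot.

```python
import tempfile
from pathlib import Path


def data_path(filename, root='.', dirname='t'):
    """Return root/dirname/filename, creating the directory if needed."""
    directory = Path(root) / dirname
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename


# Demonstrated in a temporary directory so nothing is written to a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    path = data_path('myfile.txt', root=tmp)
    path.write_text('* [[Article1]]', encoding='utf-8')
    created = path.parent.is_dir()
    # pathlib accepts / as the separator on both Linux and Windows.
    relative = path.relative_to(tmp).as_posix()
```

In your own script, calling data_path('myfile.txt') with the default root would give the same location as 't/myfile.txt' relative to the current directory.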