Manual:Pywikibot/Cookbook/Creating pages based on a pattern

Pywikibot is your friend when you want to create a lot of pages that follow some pattern. In the first task we create more than 250 pages in a loop. Then we go on to categories: we prepare many of them, but in each run we create only those we actually want to fill with articles, in order to avoid leaving a lot of empty categories behind.
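
Both tasks boil down to the same core operation: build a Page object from a title and save text to it with an edit summary. A minimal sketch (the sandbox title is hypothetical):

import pywikibot

site = pywikibot.Site()
page = pywikibot.Page(site, 'User:Example/Sandbox')  # hypothetical target title
page.put('Some generated text', 'Creating a page from a pattern')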

Rules of orthography

The rules of Hungarian orthography comprise 300 points, several of which have a number of subpoints marked with letters. No point has a subpoint a) without a b), and the last letter used anywhere is l). We have templates that point to these rules in an external source. Templates cannot be used in an edit summary, but internal links can, so we create a lot of pages with short titles that hold these templates. Of course, the bigger part is bot work, but first we have to list the letters by hand. Each letter from b to l gets a list with the numbers of the points for which this is the last subpoint letter (the lists b to l in the code below). For example, 120 is in the list of e, so we create the pages for the points 120, 120 a) ... 120 e). The idea is to build a title generator (the gen() function). (It could also be a page generator, but titles were more convenient here.)

The result is at hu:Wikipédia:AKH. In the code, page is the actual subpage that gets the text, while mainpage (with maintext) is the main page. As we get the titles from the generator in the main loop, we build the text from the appropriate template and a standard part, create the page, and append its link to the text of the main page. At the end we save the main page.

import pywikibot as p
site = p.Site()
mainpage = p.Page(site, 'WP:AKH')

# For each letter b..l: the points whose last lettered subpoint is that letter
b = [4, 7, 25, 27, 103, 104, 108, 176, 177, 200, 216, 230, 232, 261, 277, 285, 286, 288, 289, 291]
c = [88, 101, 102, 141, 152, 160, 174, 175, 189, 202, 250, 257, 264, 267, 279, 297]
d = [2, 155, 188, 217, 241, 244, 259, 265,]
e = [14, 120, 249,]
f = [82, 195, 248,]
g = [226,]
i = [263,]
l = [240,]

def gen():
    """Yield the titles: each point number, then its lettered subpoints where applicable."""
    for j in range(1, 301):
        yield str(j)
        if j in b + c + d + e + f + g + i + l:
            yield str(j) + ' a'
            yield str(j) + ' b'
        if j in c + d + e + f + g + i + l:
            yield str(j) + ' c'
        if j in d + e + f + g + i + l:
            yield str(j) + ' d'
        if j in e + f + g + i + l:
            yield str(j) + ' e'
        if j in f + g + i + l:
            yield str(j) + ' f'
        if j in g + i + l:
            yield str(j) + ' g'
        if j in i + l:
            yield str(j) + ' h'
            yield str(j) + ' i'
        if j in l:
            yield str(j) + ' j'
            yield str(j) + ' k'
            yield str(j) + ' l'

maintext = ''
# Edit summary: "Creation of links to the points of the orthography rules, usable in edit summaries"
summary = 'A szerkesztési összefoglalókban használható hivatkozások létrehozása a helyesírási szabályzat pontjaira'

for s in gen():
    print(s)
    title = 'WP:AKH' + s.replace(' ', '')
    li = s.split(' ')
    try:
        # Lettered subpoint, e.g. '120 a' -> '120|a' for the template and '120. a)' for display
        s1 = li[0] + '|' + li[1]
        s2 = li[0] + '. ' + li[1] + ')'
    except IndexError:
        # Plain point, e.g. '120' -> '120' and '120.'
        s1 = li[0]
        s2 = li[0] + '.'
    templ = '{{akh|' + s1 + '}}\n\n'
    print(title, s1, s2, templ)
    maintext += f'[[{title}]] '
    page = p.Page(site, title)
    print(page)
    text = templ
    # Standard Hungarian explanation: this page links to the given point of the orthography
    # rules, and its title can be used in edit summaries to reach the point from page histories.
    text += f'Ez az oldal hivatkozást tartalmaz [[A magyar helyesírás szabályai]] 12. kiadásának {s2} pontjára. A szerkesztési összefoglalókban '
    text += f'<nowiki>[[{title}]]</nowiki> címmel hivatkozhatsz rá, így a laptörténetekből is el lehet jutni a szabályponthoz.\n\n'
    text += 'Az összes hivatkozás listája a [[WP:AKH]] lapon látható.\n\n[[Kategória:Hivatkozások a helyesírási szabályzat pontjaira]]\n'
    print(text)
    page.put(text, summary)

maintext += '\n\n[[Kategória:Hivatkozások a helyesírási szabályzat pontjaira| ]]'
print(maintext)
mainpage.put(maintext, summary)
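
Before saving anything for real, it is worth checking what the generator yields; a quick sketch that reproduces the 120 example from above:

# Titles yielded by gen() for point 120, which is in list e
print([t for t in gen() if t.split(' ')[0] == '120'])
# ['120', '120 a', '120 b', '120 c', '120 d', '120 e']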

Categories of notable pupils and teachers

We want to create categories for famous pupils and teachers of Budapest schools based on a pattern. Of course, this is not relevant for every school; first we want to see which articles have a "famous pupils" or "famous teachers" section, which may occur in several forms, so the best thing is to review them by eye. We also check whether the section contains enough notable people to justify a category.

In this task we don't bother creating Wikidata items; these categories are huwiki-specific, and creating items on Wikidata by bot requires approval.

Step 1 – list the section titles of the schools onto a personal sandbox page
We use the extract_sections() function from textlib.py to get the titles. It returns a NamedTuple in which .sections holds the sections as (title, content) tuples; element [0] of each tuple is the desired title, complete with its = signs.
Note that extract_sections() is not a method of a class, just a function, so it is not aware of the site and must be given it explicitly.
The result is here.
>>> import pywikibot
>>> from pywikibot.textlib import extract_sections
>>> site = pywikibot.Site()
>>> cat = pywikibot.Category(site, 'Budapest középiskolái')
>>> text = ''
>>> for page in cat.articles():
...   text += '\n;' + page.title(as_link=True) + '\n'
...   sections = [sec[0] for sec in extract_sections(page.text, site).sections]
...   for sect in sections:
...     text += ':' + sect.replace('=', '').strip() + '\n'
...
>>> pywikibot.Page(site, 'user:BinBot/try').put(text, 'Listing schools of Budapest')
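
For orientation, a minimal sketch of processing the return value of extract_sections() on a made-up page text; it should print the bare section titles:

>>> demo_text = 'Bevezető.\n== Híres diákok ==\nNevek...\n== Híres tanárok ==\nTovábbi nevek...\n'
>>> for sec in extract_sections(demo_text, site).sections:
...   print(sec[0].replace('=', '').strip())
...
Híres diákok
Híres tanárok
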
Step 2 – manual work
We go through the schools, remove the unwanted ones and the subtitles, and mark the title with :pt if we want to create categories for both pupils and teachers, and with :p if only for pupils. There could also be a :t, but there is none. This is the result.
We don't want to create a few dozen empty categories at once, because the community may not like it. Rather, we mark the schools we want to work on soon with the beginning of the desired category name and the sortkey, as shown here and in the example below, and the bot will create the categories only where the name is present and the category does not exist yet.
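With a made-up school (both the name and the sortkey are hypothetical), a fully marked line could look like this:

;[[Példa Gimnázium]]:pt:Példa Gimnázium:Példa

The parts are, in order: the article title, the p/pt flag, the beginning of the category name, and the sortkey.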
If you don't like the syntax used here, never mind, it's up to you. This is just an example; you can create and parse any syntax with any delimiters.
Step 3 – creating the categories
We read the patterns from the page with a regex, parse them, and create the name and content of each category page (including the sortkey within the parent category).
The script creates a common category, then one for the pupils, and another for the teachers only if necessary.
Next time we can add names to further schools we want to work on that day; existing categories will not be changed or recreated.

import re
import pywikibot

site = pywikibot.Site()
base = pywikibot.Page(site, 'user:BinBot/try')
# Match groups: article title, p/pt flag, beginning of the category name, sortkey
regex = re.compile(r';\[\[(.*?)\]\]:(pt?):(.*?):(.*?)\n')
# Link to the parent category, with the sortkey
main = '[[Kategória:Budapesti iskolák tanárai és diákjai iskola szerint|{k}]]\n'
# Edit summary: "Categories of pupils and teachers of Budapest schools"
comment = 'Budapesti iskolák diákjainak, tanárainak kategóriái'
# Category text: "This category contains the {pupils/teachers} of [[{school}]] and its predecessors."
cattext = \
 'Ez a kategória a{prefix} [[{school}]] és jogelődjeinek {member} tartalmazza.'

for m in regex.findall(base.text):
    # Common category for teachers and pupils together
    cat = pywikibot.Category(site, m[2] + ' tanárai és diákjai')
    if not cat.exists():
        cat.put(main.format(k=m[3]), comment, minor=False, botflag=False)
    # Hungarian definite article: 'az' instead of 'a' before school names starting with a vowel
    prefix = 'z' if m[0][0] in 'EÓÚ' else ''
    # Pupils
    catp = pywikibot.Category(site, m[2] + ' diákjai')
    if not catp.exists():
        text = cattext.format(prefix=prefix, school=m[0], member='diákjait')
        text += f'\n[[{cat.title()}|D]]\n'
        catp.put(text, comment, minor=False, botflag=False)
    if 't' not in m[1]:  # :p only, no teachers' category needed
        continue
    # Teachers
    catt = pywikibot.Category(site, m[2] + ' tanárai')
    if not catt.exists():
        text = cattext.format(prefix=prefix, school=m[0], member='tanárait')
        text += f'\n[[{cat.title()}|T]]\n'
        catt.put(text, comment, minor=False, botflag=False)
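
To see what the loop iterates over, you can run the regex on the hypothetical marked line from step 2; findall() returns one tuple of the four groups per line:

m = regex.findall(';[[Példa Gimnázium]]:pt:Példa Gimnázium:Példa\n')[0]
print(m)  # ('Példa Gimnázium', 'pt', 'Példa Gimnázium', 'Példa')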