Topic on Talk:Growth/Feature summary

Suggested Links and interaction with non-prose text

Folly Mox (talk | contribs)

Will / does the "suggested links" variant of the "links" task ever suggest adding a link to text that is not part of article prose, like reference information or infobox parameter values? I could see such an interaction being mildly helpful for linking notable contributors from cited works (and en.wp's bot populace could probably handle changing the wikilinks to "|author-link=" parameters), but it seems like it would be more dangerous than helpful (wikilink – title link conflicts come immediately to mind). I presume the short answer to my question is "no" and "it's already in the documentation", but I just wanted to check and I'm low key bad at reading. Folly Mox (talk) 17:17, 6 December 2023 (UTC)

KStoller-WMF (talk | contribs)

@Folly Mox Sorry, I investigated your question last week but I apparently never responded!

You can see in the model info that there are some Hard-coded rules for (not) linking. But I don't think that really answers your question, so here's some further info:

The "add a link" Structured task doesn't suggest linking any text that is within an infobox.

Currently, links may be suggested in the reference section, but communities can adjust this via Special:EditGrowthConfig. There is a lengthy discussion of the pros and cons of the different technical approaches to avoiding link suggestions in sections that might be problematic: Add a link: algorithm improvements: Avoid recommending links in sections that usually don't have links. Ultimately we decided to give communities more customization options by adding "a list of section names in which no links should be recommended" to Community Configuration. You can't see this configuration option on English Wikipedia because the "add a link" Structured task isn't available on enwiki yet. However, you can see the configuration option on Test wiki: https://test.wikipedia.org/wiki/Special:EditGrowthConfig

Generally wikis decide to add the References section to that list (and Notes, External Links, and any other section that might be awkward to internally link).

Does that help? Sorry again for the delayed response!

Folly Mox (talk | contribs)

Thanks, User:KStoller-WMF, that does answer my question and no worries on forgetting to reply right away (that's my own typical mode of response). I do have some follow ups though:

Does this tool look at the post-render page when determining which section it's in, or is it working off the base wikitext? I ask this because although citations will appear in a "References" (or similar) section, the information contained in them is almost always in the article body, except when using en:Help:List-defined references. The wikitext of a "References" section usually contains only a call to en:Template:Reflist or a "‹references›" html-style tag.

My second question is whether individual communities are / will be able to implement their own exclusion lists. En.wp has a fairly well enforced style guideline en:MOS:OL, which cautions against wikilinking common terms, including general fields of knowledge ("psychology"), household items ("oven"), modern nation states ("People's Republic of China"), et al. I understand the algorithmic version of the "Add a link" Structured Task to use things like revert rate to adjust itself (although my understanding may, characteristically, be a misunderstanding), and I'm concerned that maybe the time window allotted for algorithmic reinforcement to downrank link targets may either not capture later script runs that remove commonly linked terms, or encourage quick reversion that could demoralize newcomers.

Thanks again. I appreciate your work.

Folly Mox (talk | contribs)

(One wrinkle – or anti-wrinkle, I suppose, so a clothes iron or skin cream – is that sometimes citation information appears in the wikitext in a "Sources" (or similar) section, for shortened footnotes or manually formatted citations. These are sometimes not in citation templates, since they can be specified with the "refbegin" and "refend" keywords, and so lend themselves less often to e.g. title-link–wikilink conflicts, and also any "Sources" section can be excluded using the already implemented method. Just mentioning this for completeness re: followup question one.)

KStoller-WMF (talk | contribs)

@Folly Mox Thanks for thinking about Growth features in such depth! There was a lot of thought put into training the "add a link" model, but it actually predates my time working on the Growth team, so I'm asking a colleague to chime in to help answer your questions more completely.

My understanding is that we are looking at the base wikitext, but my colleague can respond to your first question in more detail.

In response to your second question: currently, individual communities can implement their own exclusion lists for templates, categories, and section names. There isn't currently a way to list specific terms or phrases that should never be linked on a per-wiki basis within Community Configuration. We can certainly continue to add to the Wikidata items that shouldn't be linked, but that's a hard-coded list that applies to all wikis.

That being said, the algorithm is trained separately for each language wiki. So, in theory, the algorithm should follow the guidelines if the articles it is trained on follow those guidelines. Still, the algorithm isn't perfect, which is why newcomers receive onboarding and we try to make it clear that the suggestions will sometimes be incorrect and should be rejected.

But inevitably new editors don't always read all of the onboarding, and mistakes will be made as new editors are learning. Hopefully we can customize the English Wikipedia configuration in a way that makes the task as good as it can be for new editors, while also accepting that there will be mistakes. In general, the revert rate for this task on other wikis is fairly low, and lower than for the unstructured link task: Growth/Personalized first day/Structured tasks/Add a link/Experiment analysis, December 2021. The task also helps more new account holders try editing, and that even flows into improved new editor retention. (Maybe this is more info than you want, but we are also releasing a Leveling Up feature to more wikis soon that encourages new editors to "Level up" and try new task types, so new editors start learning more valuable editing skills.)

I'll ask my colleague who is more familiar with the algorithm and training of the model to chime in with more details. Do you have other questions? Do you have any initial thoughts about this task? Do you think it's worth testing on English Wikipedia? When we start sharing more information in January, are there other editors we should loop in to get their opinion about this task?

Also, hopefully this was already clear, but ultimately the English Wikipedia community can decide if this task is enabled or disabled via https://en.wikipedia.org/wiki/Special:EditGrowthConfig. So if it's ultimately decided that this task doesn't work well for English Wikipedia, then it can easily be disabled by any English admin. I'm hopeful we can at least test it out for a period of time before that decision is made, but ultimately that too is a community decision. Thanks again for thinking about all of this and asking such insightful questions!

Trizek (WMF) (talk | contribs)

Hello @Folly Mox

I have a few replies for you regarding your two questions. We asked the team that built the model.

The model itself works on the wikitext and not on the post-processed text. We filter the wikitext so that links are added only to plain text in the body of the article, and we try to exclude infoboxes, references, etc. In addition, specific sections in the body can be excluded in the Growth configuration; these sections will get no suggested links.
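As a rough illustration of the wikitext-level filtering described above, here is a minimal sketch. It is not the Growth team's actual code: the function name, the regular expressions, and the example exclusion list are all assumptions made for illustration, and it leaves out infobox and template handling entirely. It drops inline `<ref>` citations, then keeps only the text of sections whose heading is not on the exclusion list.

```python
import re

# Hypothetical exclusion list; real wikis configure theirs in Special:EditGrowthConfig.
EXCLUDED_SECTIONS = {"references", "notes", "external links"}

# A wikitext heading line such as "== History ==" (levels 2 to 6).
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")
# Inline citations: self-closing <ref ... /> or paired <ref ...>...</ref>.
REF_RE = re.compile(r"<ref\b[^>]*?/>|<ref\b[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)


def linkable_text_by_section(wikitext, excluded=EXCLUDED_SECTIONS):
    """Return {section title: text} for sections where links could be suggested.

    The lead section is stored under the empty string. This is an assumed,
    simplified filter working on raw wikitext, not on rendered HTML.
    """
    wikitext = REF_RE.sub("", wikitext)  # citations live in the body wikitext
    sections = {}
    current = ""  # the lead section has no heading
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            current = match.group(2)
            continue
        if current.lower() in excluded:
            continue  # no suggestions inside excluded sections
        sections.setdefault(current, []).append(line)
    return {title: "\n".join(lines).strip() for title, lines in sections.items()}
```

Because citations are removed from the wikitext itself, this sketch would skip citation text even when the "References" section only contains a reference-list tag, which is the situation raised in the first follow-up question.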

We do take into account some aspects of the Manual of Style, such as not linking to a target again if it already appears earlier in the text. But it is not possible to encode all policies into the algorithm (and it would not be scalable, given the different manuals of style in different languages).
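The "do not link the same target twice" rule mentioned above could look something like the following. This is a sketch under stated assumptions, not the model's real implementation: the function names and the candidate format (a list of anchor text and target title pairs in text order) are hypothetical, and for simplicity it treats a target as already linked if it is linked anywhere in the article, rather than only earlier in the text.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the target page title.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize(title):
    """Normalize a page title: underscores to spaces, first letter uppercase."""
    title = title.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def filter_repeat_targets(wikitext, candidates):
    """Drop candidate links whose target is already linked.

    `candidates` is an assumed format: (anchor_text, target_title) pairs
    in the order they occur in the article.
    """
    seen = {normalize(m.group(1)) for m in WIKILINK_RE.finditer(wikitext)}
    kept = []
    for anchor, target in candidates:
        key = normalize(target)
        if key in seen:
            continue  # target already linked, or already suggested once
        seen.add(key)
        kept.append((anchor, target))
    return kept
```

For example, given an article that already contains `[[oven]]`, a candidate pointing at "Oven" would be dropped, and only the first of two candidates pointing at "Flour" would be kept.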

However, we train the model based on the already existing links in the given language version of Wikipedia. We also take into account terms that we don't want to link, through a set of hard-coded rules for what not to link (see here). This includes common terms such as dates, units, etc. We have updated this list in the past after community feedback (T279434, general improvements valid for all languages).

Unrelated, but useful to know: we are getting closer to the end of the calendar year, meaning that we might be slower with our replies. In January, we will start work on the deployment at English Wikipedia, and on how your community can enable this feature. :)
