Talk:Growth/Feature summary

Suggested Links and interaction with non-prose text

Folly Mox (talkcontribs)

Will / does the "suggested links" variant of the "links" task ever suggest adding a link to text that is not part of article prose, like reference information or infobox parameter values? I could see such an interaction being mildly helpful for linking notable contributors from cited works (and en.wp's bot populace could probably handle changing the wikilinks to "|author-link=" parameters), but it seems like it would be more dangerous than helpful (wikilink – title link conflicts come immediately to mind). I presume the short answer to my question is "no" and "it's already in the documentation", but I just wanted to check and I'm low key bad at reading. Folly Mox (talk) 17:17, 6 December 2023 (UTC)

KStoller-WMF (talkcontribs)

@Folly Mox Sorry, I investigated your question last week but I apparently never responded!

You can see in the model info that there are some Hard-coded rules for (not) linking. But I don't think that really answers your question, so here's some further info:

The "add a link" Structured task doesn't suggest linking any text that is within an infobox.

Currently links may be suggested in the reference section, but communities can adjust this via Special:EditGrowthConfig. There is a lengthy discussion of the pros and cons of different technical approaches to avoiding link suggestions in sections where links might be problematic: Add a link: algorithm improvements: Avoid recommending links in sections that usually don't have links. Ultimately we decided to give communities more customization options by adding "a list of section names in which no links should be recommended" to Community Configuration. You can't see this configuration option on English Wikipedia because the "add a link" Structured task isn't available on enwiki yet. However, you can see the configuration option on Test wiki: https://test.wikipedia.org/wiki/Special:EditGrowthConfig

Generally wikis decide to add the References section to that list (and Notes, External Links, and any other section that might be awkward to internally link).
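
(Illustrative sketch of the section exclusion list described above. It is not the Growth extension's code: the function name, the suggestion fields, and the case-insensitive matching are all assumptions made for the example.)

```python
# Hypothetical stand-in for the "section names in which no links should be
# recommended" setting; the real list is edited via Special:EditGrowthConfig.
EXCLUDED_SECTIONS = {"References", "Notes", "External links"}


def filter_suggestions(suggestions, excluded=EXCLUDED_SECTIONS):
    """Drop link suggestions whose section heading is on the exclusion list.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    blocked = {name.casefold() for name in excluded}
    return [s for s in suggestions if s["section"].strip().casefold() not in blocked]


suggestions = [
    {"anchor": "Tang dynasty", "target": "Tang dynasty", "section": "History"},
    {"anchor": "Oxford University Press", "target": "Oxford University Press",
     "section": "References"},
]
print([s["anchor"] for s in filter_suggestions(suggestions)])
```

With the sample data, only the suggestion in the "History" section survives.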

Does that help? Sorry again for the delayed response!

Folly Mox (talkcontribs)

Thanks, User:KStoller-WMF, that does answer my question and no worries on forgetting to reply right away (that's my own typical mode of response). I do have some follow ups though:

Does this tool look at the post-render page when determining which section it's in, or is it working off the base wikitext? I ask this because although citations will appear in a "References" (or similar) section, the information contained in them is almost always in the article body, except when using en:Help:List-defined references. The wikitext of a "References" section usually contains only a call to en:Template:Reflist or a "<references />" HTML-style tag.

My second question is whether individual communities are / will be able to implement their own exclusion lists. En.wp has a fairly well enforced style guideline en:MOS:OL, which cautions against wikilinking common terms, including general fields of knowledge ("psychology"), household items ("oven"), modern nation states ("People's Republic of China"), et al. I understand the algorithmic version of the "Add a link" Structured Task to use things like revert rate to adjust itself (although my understanding may, characteristically, be a misunderstanding), and I'm concerned that maybe the time window allotted for algorithmic reinforcement to downrank link targets may either not capture later script runs that remove commonly linked terms, or encourage quick reversion that could demoralize newcomers.

Thanks again. I appreciate your work.

Folly Mox (talkcontribs)

(One wrinkle – or anti-wrinkle, I suppose, so a clothes iron or skin cream – is that sometimes citation information appears in the wikitext in a "Sources" (or similar) section, for shortened footnotes or manually formatted citations. These are sometimes not in citation templates, since they can be specified with the "refbegin" and "refend" keywords, and so lend themselves less often to e.g. title-link–wikilink conflicts, and also any "Sources" section can be excluded using the already implemented method. Just mentioning this for completeness re: followup question one.)

KStoller-WMF (talkcontribs)

@Folly Mox Thanks for thinking about Growth features in such depth! There was a lot of thought put into training the "add a link" model, but it actually predates my time working on the Growth team, so I'm asking a colleague to chime in to help answer your questions more completely.

My understanding is that we are looking at the base wikitext, but my colleague can respond to your first question in more detail.

In response to your second question: currently individual communities are able to implement their own exclusion list for: templates, categories, and section names. There isn't currently a way to list specific terms or phrases to never link on a per wiki basis within Community Configuration. We can certainly continue to add to the Wikidata items that shouldn't be linked, but that's a hard-coded list that applies to all wikis.

That being said, the way the algorithm is trained is specific to each language wiki. So, in theory, the algorithm should follow the guidelines if the articles it is trained on follow those guidelines. Still, the algorithm isn't perfect, which is why newcomers receive onboarding and we try to make it clear that the suggestions will sometimes be incorrect and should be rejected.

But inevitably new editors don't always read all of the onboarding, and mistakes will be made as new editors are learning. Hopefully we can customize the English Wikipedia configuration in a way that makes the task as good as it can be for new editors, while also accepting that there will be mistakes. In general, the revert rate for this task on other wikis is fairly low, and lower than for the unstructured link task: Growth/Personalized first day/Structured tasks/Add a link/Experiment analysis, December 2021. The task also helps more new account holders try editing, and that even flows into improved new editor retention. (Maybe this is more info than you want, but we are also releasing a Leveling Up feature to more wikis soon that encourages new editors to "Level up" and try new task types, so new editors start learning more valuable editing skills.)

I'll ask my colleague who is more familiar with the algorithm and training of the model to chime in with more details. Do you have other questions? Do you have any initial thoughts about this task? Do you think it's worth testing on English Wikipedia? When we start sharing more information in January, are there other editors we should loop in to get their opinion about this task?

Also, hopefully this was already clear, but ultimately the English Wikipedia community can decide if this task is enabled or disabled via https://en.wikipedia.org/wiki/Special:EditGrowthConfig. So if it's ultimately decided that this task doesn't work well for English Wikipedia, then it can easily be disabled by any English admin. I'm hopeful we can at least test it out for a period of time before that decision is made, but ultimately that too is a community decision. Thanks again for thinking about all of this and asking such insightful questions!

Trizek (WMF) (talkcontribs)

Hello @Folly Mox

I have a few replies for you regarding your two questions. We asked the team that built the model.

The model itself works on the wikitext and not on the post-processed text. We filter the wikitext so that links are added only to plain text in the body of the article, and we try to exclude infoboxes, references, etc. In addition, specific sections in the body can be excluded in the Growth configuration; these sections will get no suggested links.
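
(Illustrative sketch of filtering at the wikitext level, which is why citation text that sits inline in the article body can be skipped regardless of which section it renders in. This is a simplified stand-in, not the model's actual preprocessing: the regexes and function names are assumptions, and real wikitext has many more cases, such as tables, comments, and parser functions.)

```python
import re

# Self-closing named references, e.g. <ref name="w" />
REF_SELF = re.compile(r"<ref\b[^>]*/\s*>", re.IGNORECASE)
# Paired references, e.g. <ref>...</ref>; applied after the self-closing pass
REF_PAIR = re.compile(r"<ref\b[^>]*>.*?</ref\s*>", re.DOTALL | re.IGNORECASE)


def strip_templates(text):
    """Remove {{...}} templates, including nested ones, by tracking brace depth."""
    out, depth, i = [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


def linkable_prose(wikitext):
    """Return only the plain prose that a link suggestion could target."""
    text = REF_SELF.sub("", wikitext)
    text = REF_PAIR.sub("", text)
    return strip_templates(text)


sample = (
    "{{Infobox person|name={{nowrap|Li Bai}}}}"
    "Li Bai was a poet.<ref>{{cite book|author=Arthur Waley}}</ref>"
    " He lived in Chang'an.<ref name=w />"
)
print(linkable_prose(sample))
```

The infobox and both references are dropped, leaving just the two prose sentences.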

We do take into account some aspects of the manual of style, such as not linking to a target again if it already appears earlier in the text. But it is not possible to encode all policies into the algorithm (and it would not scale, given the different manuals of style in different languages).

However, we train the model on the links that already exist in the given language version of Wikipedia. We also account for terms that we don't want to link through a set of hard-coded rules (see here). These cover common terms such as dates, units, etc. We have updated this list in the past after community feedback (T279434, general improvements valid for all languages).
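
(Illustrative sketch of the two rules mentioned above: skip a target that is already linked, and skip targets on a never-link list. The names and the tiny placeholder list are assumptions; the real hard-coded rules live in the model documentation. For simplicity the sketch treats any existing link in the article as "earlier".)

```python
import re

# Captures the target of [[Target]], [[Target|label]] and [[Target#Section]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

# Placeholder entries only; not the actual hard-coded list.
NEVER_LINK = {"kilometre", "january"}


def normalise(title):
    """Compare titles loosely: trim, treat underscores as spaces, ignore case."""
    return title.strip().replace("_", " ").casefold()


def allowed_suggestions(wikitext, candidates):
    """Keep candidate targets that are neither already linked nor on the never-link list."""
    seen = {normalise(m.group(1)) for m in WIKILINK.finditer(wikitext)}
    kept = []
    for target in candidates:
        key = normalise(target)
        if key in seen or key in NEVER_LINK:
            continue
        seen.add(key)  # also suppress a second suggestion for the same target
        kept.append(target)
    return kept


text = "[[Li Bai]] travelled 10 kilometres to [[Chang'an|the capital]]."
print(allowed_suggestions(text, ["Li Bai", "Chang'an", "Kilometre", "Du Fu", "Du Fu"]))
```

Here only one "Du Fu" suggestion is kept: the other candidates are either already linked or on the placeholder list.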

Unrelated, but useful to know: we are getting closer to the end of calendar year, meaning that we might be slower on our replies. In January, we will start the work on the deployment at English Wikipedia, and on how your community can enable this feature. :)

En.wp thread about local configuration

Folly Mox (talkcontribs)

I've opened what I hope will be a discussion about Growth Tools (specifically Suggested Edits and its local configuration on en.wp, but also conflating technically distinct tools such as the Newcomer Homepage help panel) at en:Wikipedia:Village pump (WMF)#Let's configure: Suggested Edits. I welcome any corrections, as well as any input and links members of Growth Team would like to provide. For clarity, I have attempted to frame this discussion as how the enwiki community can improve the interaction of the Growth Tools with our project. It is not intended to apportion blame anywhere but our own doorstep, but since I do focus on the problems, positive counterpoints might help lift the tone, and I lack data. Folly Mox (talk) 04:28, 19 November 2023 (UTC)

Trizek (WMF) (talkcontribs)

It is a fantastic initiative, thank you very much!

I already responded, and more will come (regarding data, mostly).

Thanks for the introduction video

Elitre (WMF) (talkcontribs)

I added Italian subs to File:Suggested edits module.webm. A few notes:

  • when recording on our websites, try to get rid of any sitenotice/centralnotice, as it's distracting.
  • when possible, try to keep the site language and the interface language consistent - this also helps viewers stay focused.

Question data

Czar (talkcontribs)

>3,262 mentor questions have been asked as of March 2020

Is this dataset publicly available somewhere?

MMiller (WMF) (talkcontribs)

@Czar -- I think you're asking for the list of questions so you can see what they say and get a sense for common questions. If you're willing to use Google Translate (or if you read other languages!) an easy thing to do is to go to the Recent Changes feed on one of the wikis that have the feature and filter to the "mentorship module question" edit tag. Here's a link where I've done that in French Wikipedia. You could also look in any of these Wikipedias: Czech, Korean, Vietnamese, Arabic, Ukrainian, Armenian, Serbian, Hungarian, and Basque.
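
(Illustrative sketch of the same Recent Changes filter done through the MediaWiki Action API rather than the web interface. `list=recentchanges` and its `rctag` parameter are standard API features; the helper name is hypothetical, and the tag name is assumed to match the label quoted above exactly. The code only builds the URL and makes no network request.)

```python
from urllib.parse import urlencode


def tagged_changes_url(lang, tag="mentorship module question", limit=50):
    """Build an Action API URL listing recent changes that carry the given edit tag.

    The default tag name is an assumption taken from the label quoted in this
    thread; check Special:Tags on the target wiki for the exact name.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": tag,
        "rcprop": "title|user|timestamp|comment",
        "rclimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


print(tagged_changes_url("fr"))
```

Swapping the language code (for example "cs" or "ko") points the same query at another wiki that has the feature.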

I'll also tag our Arabic ambassador, @Dyolf77 (WMF), who once read through and categorized all the Arabic questions. He may have some interesting counts for you.

Dyolf77 (WMF) (talkcontribs)

@MMiller (WMF) Thanks for bringing up this point about questions from newcomers. @Czar Hi, I analysed more than 1300 questions from Arabic contributors and made this table. I found that the autobiography/biography question was the most frequent, followed by nonsense questions and, in third position, people just saying "Hello".

Czar (talkcontribs)

@Dyolf77 (WMF) Thanks! Is there a Quarry query I could use to run this again for another language's wiki? Also didn't know about the googletranslate function in Sheets—very nice. Isn't "general questions" the biggest bucket, even before "biography" questions?

Dyolf77 (WMF) (talkcontribs)

@Czar Good point, maybe it is a bad label on my part. By "General questions" I meant other questions on various topics. Biography is still the most asked single question. In the general questions, newcomers asked about editing, sources, reviewing edits, deleted articles, etc. The table was the work of @Martin Urbanec (WMF); he can tell you about the code.

Martin Urbanec (WMF) (talkcontribs)

@Czar Hello, thanks for your question! I was the one who made the data for Dyolf77. I did that by running https://gist.github.com/urbanecm/ec8d74604f584d2272edaf92a3a3711f. It should be pretty easy to run it elsewhere, but Toolforge access is needed to do so without changing the script significantly. If you don't have it, I can generate the dataset for a different wiki for you. As Marshall stated, the questions are available as MediaWiki revisions; my script merely puts them into a simple table for easier analysis. Hope this helps,

Reply to "Question data"