Talk:Reading/Web/Projects/Related pages


About this board

This is a place to discuss feedback for the Related pages beta feature (on mobile or desktop).

Here is a summary of issues raised along with proposed responses:

Reading/Web/Projects/Related pages#Initial Community Feedback

Here is a proposal for moving forward:

Reading/Web/Projects/Related pages#Proposal for moving forward

An RFC asking for feedback on the proposed next steps is also here: m:Talk:Requests for comment/Related Pages

Johnywhy (talkcontribs)

Eg, i'd like to create a tag, MyTag, and then relate several pages with {{#related:MyTag}}. Is that possible?

I'm guessing it could be done with a common page that they all point to, where the hub page is called "MyTag". Maybe could be a sub-category page.

We want to tag our articles (topic-tags, not revision-tags). And then display "Related Articles", based on those tags.

This article describes a method using SemanticWiki, https://clkoerner.com/2012/08/28/use-semantic-mediawiki-semantic-forms-to-create-a-folksonomy-for-tagging-related-pages/ but that seems a heavy solution, since we don't need any other SemanticWiki features. Would prefer a simpler method.

Can't use Categories, as we're already using Categories as Categories, for organizing and TOC. We consider Categories and Tags to be different concepts. That Folksonomy article agrees. Eg:

  • Category is Fruit.
  • Title is Oranges.
  • Tags are citrus, segmented, juicy, vitamin C, cold prevention, breakfast

Or

  • Category: Books
  • Title: Cien Años de Soledad
  • Tags: Spanish, surrealism, Colombia, magical realism, Latin American

Tags vs. Categories:

  • Non-hierarchical: We use Category as a hierarchical organizing structure. We want articles to appear in only one Category in TOC, but articles can have multiple tags. Tags are non-hierarchical, and can apply across different Categories.
  • Permissions: We want separate permissions for allowing users to add Tags and Categories to an article.
  • Hidden: Tags should be hidden from Extension:CategoryTree. Tags should not be listed in any Special page, Transclude, or MagicWord that lists Categories.
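
A minimal sketch of the tag idea described above, assuming tags are stored separately from Categories as a plain mapping from page title to tag set. The page names other than those in the examples, and the function `related_by_tags`, are hypothetical and not part of any existing extension:

```python
# Hypothetical data: page title -> set of topic tags, kept apart from Categories.
PAGE_TAGS = {
    "Oranges": {"citrus", "juicy", "vitamin C", "breakfast"},
    "Lemons": {"citrus", "juicy", "vitamin C"},
    "Grapefruit juice": {"citrus", "breakfast"},
    "Cien Años de Soledad": {"Spanish", "Colombia", "magical realism"},
}


def related_by_tags(title, page_tags, limit=3):
    """Return up to `limit` other pages, ordered by number of shared tags."""
    tags = page_tags.get(title, set())
    scored = []
    for other, other_tags in page_tags.items():
        if other == title:
            continue
        shared = len(tags & other_tags)
        if shared:
            scored.append((shared, other))
    # Most shared tags first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:limit]]


print(related_by_tags("Oranges", PAGE_TAGS))
```

Because the tags live in their own mapping, a page can carry many tags across different Categories without touching the Category hierarchy.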
Johnywhy (talkcontribs)

I've built a topic-tagging system, for inline tags, with some nice features like anchors, descriptions, tag-list, and highlighting. Extension:TopicTags

You can view a demo here.

Reply to "Request: Relate by Topic-Tag"
TheTruthCreator (talkcontribs)

At the bottom of the page for the Muffin proxy, there is a link to the edible muffin in the related pages.

Jkatz (WMF) (talkcontribs)

Hi. Short articles like this are very challenging for an algorithm, but a proposed change to our algorithm might make a difference: Extension:RelatedArticles/CirrusSearchComparison#Hollywood Library

But no matter what we do, algorithms will occasionally get things wrong, as we so often find with search. Here is a solution that should address the most egregious examples:

Editors can change the suggested articles given by adding up to 3 manually curated examples to this part of the page navigation.

{{#related:new page title1}}
{{#related:new page title2}}
{{#related:new page title3}}

For example, on https://en.wikipedia.org/wiki/Korur_language the related pages have been over-ridden to:

{{#related:Western Oceanic languages}}
{{#related: New Guinea}}
{{#related: Mbula language}}

Let me know if you continue to have trouble.
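
For anyone auditing which pages already carry manual overrides, here is a rough sketch that pulls the `{{#related:...}}` titles out of a page's wikitext. It is not the extension's own parser; the regular expression, the function name `manual_related`, and the choice to keep only the first three titles are assumptions made for illustration:

```python
import re

# Matches {{#related:Title}}, tolerating whitespace around the title.
RELATED_RE = re.compile(r"\{\{\s*#related\s*:\s*([^{}|]+?)\s*\}\}")


def manual_related(wikitext, limit=3):
    """Collect up to `limit` manually curated titles, in order of appearance."""
    titles = []
    for match in RELATED_RE.finditer(wikitext):
        title = match.group(1)
        if title not in titles:  # ignore repeated entries
            titles.append(title)
    return titles[:limit]


sample = """{{#related:Western Oceanic languages}}
{{#related: New Guinea}}
{{#related: Mbula language}}"""
print(manual_related(sample))
```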

Seslichathaber (talkcontribs)

Believe me, I did not understand anything. I just wanted to look at the voice chat page, but the page is full of nonsense.

This post was hidden by Jkatz (WMF) (history)
Reply to "dhruba guha"
NikolaiNyegaard (talkcontribs)

In my opinion it would be more appealing to have the Related Pages above the sources section, or somewhere in between the paragraphs, so it's more available and visible, rather than hiding it at the very bottom of the page.

This post was hidden by Tacsipacsi (history)
Tacsipacsi (talkcontribs)

Putting it between two sections is much more difficult than putting it at the end of the page, and it can also cause errors (e.g. what to do if there is not exactly one such section, or if that section is somewhere in the middle of the article). Localization is also more difficult, as the software needs to know all possible section titles.

Reply to "Different placement"

Hovercards for related pages would be neat.

MammothManni (talkcontribs)

I love hovercards and would like to see the feature implemented for related pages as well.

Melamrawy (WMF) (talkcontribs)

Hi @MammothManni, can you please elaborate a bit on that? You mean you would like related pages cards to resemble those of hovercards?

MammothManni (talkcontribs)

Sorry if I was not clear. I would like to see a hovercard open up if I hover over the suggested related page card that is shown underneath the article. I.e. a short preview of the related page should open up, the same short preview that also pops up if you hover over a link within an article and have the beta-feature "hovercards" activated.

Melamrawy (WMF) (talkcontribs)

I see what you mean now. Basically making the cards more brief and hoverable. This is a design decision @Npangarkar (WMF) that requires research I guess.

MammothManni (talkcontribs)

Basically I want more information about the related page before clicking on it. You could make the card bigger and integrate more info already. Or, and that was my suggestion, just make a small window pop up with more information when you hover over the card.

Npangarkar (WMF) (talkcontribs)

@MammothManni Hey, this sounds like a good idea; we can expand the related pages card on hover. The bigger issue is that we are still figuring out the value of related pages on desktop. You mentioned hovercards, so I'm assuming you are talking about Wikipedia on desktop (Vector). There have been discussions about removing the related pages feature from Vector. @Melamrawy (WMF) can we look into the progress of the consultation around that?

MammothManni (talkcontribs)

@Npangarkar (WMF) Yes, I am using Vector and only talking about the desktop version.

Reply to "Hovercards for related pages would be neat."
Ruud Koot (talkcontribs)
Jkatz (WMF) (talkcontribs)

@Ruud Koot if it's based on calculated similarity rather than editor bias, is it still an NPOV violation? It is not favoritism...if these were authors, I don't think we would feel the same way.

Ruud Koot (talkcontribs)

Yes, because everything about the presentation of these results tries to make the picks look like an objective selection instead of a number of best-effort search results (most notably them being displayed without a request from the reader, close to other hand-picked navigational content). This would be very different if these results were just the top results on several pages of search results, displayed after a user-initiated query. Context, presentation, and user expectation matters.

When the algorithm selects, say, classic authors this may not always be harmful, but when it starts picking commercial companies or political parties it certainly is. See also Search engine manipulation effect.

Jkatz (WMF) (talkcontribs)

Okay, thanks for the additional context.

Reply to "Example of an NPOV violation"
Melamrawy (WMF) (talkcontribs)

Please check the moving to stable plan and proposal here. Thanks

Reply to "Moving to stable"
MPS1992 (talkcontribs)

The English Wikipedia article "Anita Krajnic case" is about a female animal rights activist campaigning for the rights of pigs. The related pages feature innocently recommends "Pig-faced women" as related. Probably the latter being a Featured Article contributes to this choice. But this is the kind of result that will cause someone to make the assumption that this choice is a humorous Wikipedia editorial commentary about the BLP who is the subject of the case, and that it targets her because she is female. Results like this are a timebomb waiting to explode in a spectacular fashion similar to the "Wikipedia's Sexism Toward Female Novelists" debacle. English Wikipedia has a fairly high number of Featured Articles whose "amusing" aspects are open to misinterpretation or offence in this way, "Gropecunt Lane" is just one example.

Jkatz (WMF) (talkcontribs)

Thank you for bringing this to our attention. To put it lightly, this is far from the message our software should be conveying to the readers of Wikipedia and I appreciate your sensitivity to how this looks externally, as well. Your guess that the featured article status has something to do with it is correct! Incidentally, we are actually removing "featured status" from the selection criteria, because it was leading to too many pop articles dominating the suggestions of obscure pages. This will launch when we roll out to all mobile users (though the current plan is to remove this beta feature from desktop) as detailed here.

As to specific instances like this, we have created a way for conscientious editors to override misleading or offensive suggestions by inserting new suggestions. Copying from our FAQ:

...editors can change the suggested articles given by adding up to 3 manually curated examples to this part of the page navigation.

{{#related:new page title1}}

{{#related:new page title2}}

{{#related:new page title3}}

For example:

On https://en.wikipedia.org/wiki/Korur_language the related pages have been over-ridden to:

{{#related:Western Oceanic languages}}

{{#related: New Guinea}}

{{#related: Mbula language}}

I wouldn't presume to override the result for you, but let me know if I can help in another way!

Reply to "BLP / feminism / something issue"

Unrelated (funny) related page on DEWP

Reaper35 (talkcontribs)

I just want to mention that the German article De:Cracker (Gebäck) (same as En:Cracker (food) ) has De:Crack (Droge) as a related page. I wouldn't say that these two objects are really related, besides having similar names. Well, it made me laugh, thank you anyway. ;)

(I'm interested in how you are going to fix things like this, as I'm a hobby programmer.)

FriedhelmW (talkcontribs)

I bet this will not be fixed. You have to override it manually using the {{#related: xyz}} parser function.

Gestumblindi (talkcontribs)

Did your comment somehow get broken? Or isn't it displayed correctly for me? It seems that the names of (or links to) the (Wikipedia?) articles you intended to mention are missing from your post? Edit: I started a topic about this issue here.

Seb35 (talkcontribs)

The syntax was broken, I just fixed it. Wikitext was [[De:Cracker (Gabäck)]] which is an interwiki link, I changed to [[:De:Cracker (Gabäck)]].

Reaper35 (talkcontribs)

Thank you. Well, I tried to type [[:de:... but the visual editor automatically changed/added the link in a different way than I expected. It was my first post using this new topic feature. After I posted it I saw my post the way I expected it to be, but I hadn't reloaded the page, so I only saw the JavaScript's result.

Reply to "Unrelated (funny) related page on DEWP"

Another more serious NPOV issue

Pudeo (talkcontribs)

I just read an article about a Soviet-funded Cold War era front organization and it listed "Israel" among the related pages despite no apparent connection, other than that Jews are sometimes said to be overrepresented in those movements :-) Pretty contentious.

Another example: let's say the UK Independence Party article gave "Fascism" in the Related pages. That would be an NPOV violation as well; it's only something the sharpest critics of the party would say.

Everyone who's been in the English Wikipedia for longer knows there has been edit-warring over "See also" sections. This does not take those POV issues into account.

Reply to "Another more serious NPOV issue"

J.K. Rowling recommended in (almost) every Novels article (or how the algorithm places too much emphasis on high profile articles)

Sadads (talkcontribs)

So I am noticing a pattern. I am assuming the algorithm that suggests the articles is based on the degree of "relatedness" in some form of network analysis, based on link closeness, categories and similar language? I work on novels articles almost exclusively in my volunteer time (I am on the Wikipedia Library team the rest of the time), and <s>almost every single</s> very many of the novels articles I have read have author pages at the bottom AND one of the authors is J.K. Rowling.

My suggested solution: Might I suggest the algorithm put more weight on categories rather than "closeness" in links or quality or pageviews? I would imagine most of our readers would favor like types of items (for example, novels -> novels instead of novels -> authors who didn't write the book) -- and we have a real opportunity here to profile some of our less viewed articles (not the stub with 50 views a month, but say, the start/c class with 300-400 views a month), instead of cycling everyone to the highest profile articles in the network. I can see how the kinds of recommendations that have been coming to me will lead to a self-reinforcing cycle: the articles with high views and/or high quality get recommended more often than other articles, which means they get more eyes on them, which means more edits to those articles (or fewer new editors, because they aren't seeing the sloppy bits of Wikipedia -- most of us started on copy edits and detail fixes when the material was less than ideal).

In short: the algorithm is optimized poorly for the kind of results we want people to get if there is a slight chance that they will become editors or participate in the less visited parts of Wikipedia: we want their eyes on the less visible and good, but not amazing, items, because that is how we improve the encyclopedia. I would love to talk through this more @Jdlrobson .

Jdlrobson (talkcontribs)

Thank you so much @Sadads for highlighting this example where the algorithm consistently fails.

I'd suggest that @Jkatz (WMF) would be a better person to talk through this with, since he is the lead for this project. That said, he's on paternity leave right now - User:Melamrawy_(WMF) is this something you and User:ABaso_(WMF) could work on capturing in Jon's absence?

Sadads (talkcontribs)

@Jkatz (WMF), @Melamrawy (WMF), @ABaso (WMF) I would be happy to chat or reflect on this.

This morning Rowling wasn't in the articles that I saw her associated with last night: so something changed (or you have a bit of randomness going on). I just get this unnerving feeling that I have seen her image and name on way too many pages --- especially fantasy and/or children's lit pages. She is not the only author we cover, and as someone who would like to see our content grow, it's important to show just how diverse our content can be, rather than focusing overwhelmingly on the "high demand" stuff. I feel like, especially on any topic with a popular culture connection, we will end up reinforcing hegemonic topics -- topics with a certain level of cultural ubiquity/dominance -- rather than exciting curiosity in the unknown.

Melamrawy (WMF) (talkcontribs)

Hi @Sadads, I actually didn't encounter a similar J.K Rowling experience while browsing novel articles, can you please walk us through your browsing scenario for better clarity? Thanks.

Sadads (talkcontribs)

Hi @Melamrawy (WMF), so I was working on en:Jack (Homes novel), and it had Rowling there yesterday; now the space where Rowling was is occupied by Harvey Milk, and the place where A. M. Homes was is now occupied by David Bowie! (Think through a logical connection there :P) These are crazy odd "related articles": but I could see how an algorithm could call them close if it favored community assessment and pageviews and/or closeness via internal links. I do see Rowling (alongside Kafka and Marilyn Monroe !?!?) on the author page for en:A. M. Homes.

I also work on articles en masse to improve linking and categories for Novels articles, so I can't place exactly which articles Rowling has been on before. However, I do know most of these articles are relatively young novels or authors articles (I use tools like edwardbetts.com/find_link/debut_novel to add links). For example, I just linked debut novel on https://en.wikipedia.org/wiki/Sarah_Mason_(novelist) and yet again Rowling (alongside Mary Shelley and Kylie Minogue !?!?). Perhaps the algorithm is seeing the "Debut novel" article link combined with connections to common pages like en:Novelist and seeing J.K. Rowling as the closest connected of these articles; I would imagine any conventional reader would see these as connected in the "I am reading about this topic, I might want to learn about that other topic" sense. The common thread here: all of the results I have mentioned are high visibility/high quality articles, with only internal linking connecting them (rather than a common category, or sharing non-top level/generic topic connections). I suspect that the biggest nodes in the Wikipedia link network (if the algorithm is using network analysis) are dominating the internal link "closeness" score.

Anyway, it's a really neat feature, but it provides really odd/blah results, esp. if we want people to see the less visible parts of Wikipedia (which we do, because that's how they become editors). As someone who curates the content: I want search to find the biggest, yet most useful articles, while this feature should be for surfacing the less needed, but more curiously interesting bits.

Sadads (talkcontribs)

I will keep listing articles that I see Rowling on in this thread: https://en.wikipedia.org/wiki/A_Summer_Bird-Cage https://en.wikipedia.org/wiki/Isabel_Fonseca https://en.wikipedia.org/wiki/Andrew_Michael_Hurley https://en.wikipedia.org/wiki/The_Queen_of_the_Tearling https://en.wikipedia.org/wiki/Did_You_Ever_Have_a_Family https://en.wikipedia.org/wiki/Tell_The_Wolves_I%27m_Home https://en.wikipedia.org/wiki/Cathy_Marie_Buchanan https://en.wikipedia.org/wiki/John_Michael_Cummings

I am also going to track high frequency articles and what topic area: Mary Shelley, Jane Austen (Romance novels), Kylie Minogue (not seeing a pattern), William Gibson (science fiction), Marilyn Monroe (not seeing a pattern)

Sadads (talkcontribs)
Jkatz (WMF) (talkcontribs)

Hi Sadads sorry for the delay. I have been on paternity leave and just catching up. I just took a browse through and I am seeing the JK Rowling phenom with authors, particularly niche authors. I expect this has something to do with pageview volume. That being said, it is not something I have noticed with niche actors. I think this is something to look into as a refinement, but I am curious: do you see it as a blocker for the feature or an improvable? Does the fact that the selections are editable assuage your concerns?

Sadads (talkcontribs)

No worries on the delay: totally understand paternity leave.

Are the selections editable? I am not seeing a clear way in the interface for me to tweak suggestions.

I am just thinking that there is no value added beyond the fully hand curated "see also" and Navboxes: esp. when the links you are adding are either a) already linked in the article (this is already the case with something like 1/4-1/2 of the results I am seeing -- sometimes it's even the article itself (I think this problem might be associated with redirects)) or b) so central to the network that they ought to be common knowledge -- or can be found really easily through link chaining (vis-a-vis the behavior promoted in http://thewikigame.com/ ). The problem, I think, is that you are using a tool designed for getting a "closest to the search term" result (Cirrus search) to do something that ought to be focused on the "nearest, as in same neighbourhood of knowledge".

The real value of linking in Wikipedia for our readers is the hand curated incongruities between what our readers thought they came to Wikipedia for, and the long chain of other things that are connected to that topic, which excite their curiosity. If you plan to enable this tool: it really ought to provide "unexpected but rationally connected" results that excite the imagination, rather than a) known quantities or b) stuff that doesn't need more attention by potential new editors -- even if it's new or different. This tool seems to be at the opposite extreme of the Random Article tool: it provides almost too obvious/central topics that aren't exciting. It would be great to have a list of articles, editable by (admins?), which could be excluded from the results, to force the algorithm to work around these unusually central/important articles as assessed by the algorithm.

I like the idea of pushing more exploration of our Wikis (this is a really valuable engineering effort), and for smaller Wikis which don't have the level of micromanagement of connections (links, categories, navboxes) that happens on English or some of the higher volume edit Wikis, this tool as it stands might make sense. However, on the bigger wikis, editors much less flexible than me will be very angry about the tool circumventing their long, hard, hand-curated work AND producing unusually not useful links, when this tool could be doing something that creates innovative new "ah ha!" moments (that serendipity moment that makes library research so fun: I would highly recommend reading: http://dp.la/info/2014/02/07/planning-for-serendipity/ ).

The algorithm is just not sophisticated enough and really needs a way to be managed locally so that you aren't having to anticipate the community's tweaking needs centrally. If you could define: a) the variables that rate pages, b) provide an interface where admins could tweak those variables to meet something closer to consensus needs, and exclude pages, c) do more testing with people that see hundreds of pages a day (editors), d) machine learning that prioritizes the kinds of connections that people click through on, I think you would get something that would be really fun for the communities to play with and use. But as it stands now, it's not useful in the grand scheme of things: it neither promotes exposure of interesting content to readers, nor exposes them to "new/different/esoteric" content that many of our editors pride themselves working on.

P.s. Some more examples of not useful linking ("useful" ones are the exceptions). Based on this set, you are talking 3 out of 18 links that encourage someone to explore the depths of Wikipedia around similar items, rather than the surface or topics that are already common knowledge:

https://en.wikipedia.org/wiki/Yazoo_and_Mississippi_Valley_Railroad Listed articles: Memphis, Tennessee (central in link network), W.C. Hardy (bizarre, possibly central to network?) and Alabama (central in link network)

https://en.wikipedia.org/wiki/Aim%C3%A9_Ngoy_Mukena Military of the Democratic Republic of Congo (linked on page), Democratic Republic of Congo (linked on page), Lubumbashi (bizarre, possibly central to network?)

https://en.wikipedia.org/wiki/Francis_Patrick_Donovan Gough Whitlam (useful), Stanley Bruce (useful), Australia (central to network, linked on page)

https://en.wikipedia.org/wiki/Thomas_Meehan_(writer) Musical theatre (central, and linked on page), Maury Yeston (Unexpected, interesting connection: useful), Hairspray (2007 film) (linked on page)

https://en.wikipedia.org/wiki/Michael_E._Smith Aztec (central), OCLC (central and bizarre), Nahuatl (central)

https://en.wikipedia.org/wiki/James_Morrill Minnesota (central), W.E.B. Du Bois (central), Michigan State University (central and/or tangential).
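
The filtering suggested earlier in this post (skip the article itself, skip pages already linked in the body, and skip a locally maintained exclusion list) could be sketched roughly as follows. Everything here is an assumption for illustration: the function `filter_suggestions`, its inputs, and the "Tom Meehan" redirect are hypothetical, and nothing in this thread says the extension works this way:

```python
def filter_suggestions(article, candidates, linked_titles, redirects, excluded, limit=3):
    """Drop suggestions that add nothing new for the reader.

    All arguments are hypothetical inputs: `redirects` maps a redirect title
    to its target, and `excluded` is a locally maintained exclusion list.
    """
    seen = set()
    kept = []
    for title in candidates:
        target = redirects.get(title, title)  # resolve redirects first
        if target == article:                 # the article itself
            continue
        if target in linked_titles:           # already linked in the body
            continue
        if target in excluded:                # locally excluded central pages
            continue
        if target in seen:                    # duplicate after redirect resolution
            continue
        seen.add(target)
        kept.append(target)
    return kept[:limit]


result = filter_suggestions(
    article="Thomas Meehan (writer)",
    candidates=["Musical theatre", "Maury Yeston", "Hairspray (2007 film)", "Tom Meehan"],
    linked_titles={"Musical theatre", "Hairspray (2007 film)"},
    redirects={"Tom Meehan": "Thomas Meehan (writer)"},  # hypothetical redirect
    excluded=set(),
)
print(result)
```

In the Thomas Meehan example above, only the suggestion marked useful would survive this filter.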

Sadads (talkcontribs)
Jkatz (WMF) (talkcontribs)

@Sadads Thank you for your thoughtful analysis! I agree with most of your concerns here and proposals, specifically:

  1. The algorithm could be better, and in an ideal world, we would improve it. The Android team, which is already using this algorithm, is planning on tweaking it soon.
  2. The algorithm variables/rules should be highly visible (we are planning on publishing them in simple English sometime soon)
  3. Ideally, editors could tweak the algorithm at the local level or at least be able to blacklist, or edit the items

To clarify some points. The read more options are editable. See here for an explanation: Topic:Suqj6do13qpmlerd it also explains some of the benefits over 'see also'.

Regarding the notion of 'is this useful', I would start with: is it harmful? Since it is at the bottom of the article, below the references, it is hard to argue that it is getting in the way of other content. For the vast majority of pageviews, pages are very long and anyone who has scrolled past the references is clearly looking for something more than they are finding and there is a need that was not satisfied with the various links above (such as see also) that make Wikipedia so great.

Now, is it actually helpful? The data suggests not only that users are clicking on it but that they are continuing to click on it at a high rate. As has been pointed out several times in this thread, you can game click-through rates with flashing lights, pictures etc. However, what happens is that the click-through rates on unhelpful links drop over time as users come to trust them less and less. This is not what we were seeing with 'related pages' at all before I left. I'll have to check if this has changed since returning from pat leave, but my SQL access is funky right now.

So, if readers continue to click on related pages links at high rates, more frequently than on any other link on the page (even more than links in the first section), despite their being at the bottom, I am inclined to agree it is a useful navigation tool for our readers (if not for power-editors).

Could it be better? Definitely. The question is, now that we are helping people spend more time on Wikipedia, by 5% on desktop, 10% on mobile (as of last check), how much more do we want to invest in this feature to improve it more vs. other features? If editors won't be happy with it as is, then maybe more investment is warranted on that alone. Otherwise, it is a question of what else we might do with those resources.

What do you think? (and I owe you updated #s)

Sadads (talkcontribs)

I wrote a really good response, but accidentally clicked on a link and lost it all (ACH!). How about we schedule 20-30 minutes sometime? (I am based on the east coast, and my WMF calendar is usually up to date.)

Jkatz (WMF) (talkcontribs)

Oooh, I feel your pain. Re: chat--this week is wrecked, but I put something in for next week.

Jkatz (WMF) (talkcontribs)

BTW for public record: I reran the numbers and the click through rate for mobile is very high at 19% of anyone who sees it. The desktop (vector) numbers are much lower at 4%. In both cases, the sample size is very small (<200 'seen' events per day). The likelihood of someone reaching the bottom of the desktop page is much higher than on mobile, so that might have something to do with it. This tool may simply be better suited for mobile, where the users who see it are a little more dogged.
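
To put the small sample size in perspective, here is a back-of-the-envelope sketch using a Wilson score interval. The counts are assumptions, not measured data: 200 'seen' events is taken as the stated upper bound for one day, and the click counts are simply 19% and 4% of that:

```python
import math


def wilson_interval(clicks, seen, z=1.96):
    """Approximate 95% Wilson score interval for a click-through rate."""
    if seen <= 0:
        raise ValueError("seen must be positive")
    p = clicks / seen
    denom = 1 + z * z / seen
    centre = (p + z * z / (2 * seen)) / denom
    half = z * math.sqrt(p * (1 - p) / seen + z * z / (4 * seen * seen)) / denom
    return centre - half, centre + half


# Assumed counts: 200 'seen' events each (not the real daily figures).
mobile = wilson_interval(38, 200)   # 19% observed
desktop = wilson_interval(8, 200)   # 4% observed
print(mobile, desktop)
```

With these assumed counts the two intervals do not overlap, though each is several percentage points wide, so the exact rates should be read loosely.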

Jkatz (WMF) (talkcontribs)
Sadads (talkcontribs)

Super exciting: @Jkatz (WMF) A lot less emphasis on J.K. Rowling, and Katy Perry, and any number of other centralities!

Nice. I wonder if we can do anything to bring a little bit more relevance to how the community hand curates material. For example, @Harej has built a database of articles by WikiProject that might be able to help with pointing at the intersection of different curated groupings. I would imagine putting a bit more weight on categories might be useful as well -- of course I realize that might not be a priority at the moment. It also might be worth putting less scaling effect from the popularity (multiplying the scores by each other will always return the really big and popular, whereas reducing the emphasis on that criterion by scaling it, to say .10 of the original value, would still give it weight, just not so much that it overwhelms other articles).
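
A toy sketch of the scaling point, with the caveat that the real scoring formula is not stated in this thread. One wrinkle: multiplying popularity by a constant inside a pure product leaves the ranking unchanged, so this sketch assumes the damping is applied as an exponent on pageviews instead. The candidate names and numbers are invented for illustration:

```python
def rank(candidates, popularity_exponent=1.0):
    """Rank (title, relevance, pageviews) tuples by relevance * pageviews**exponent."""
    scored = [
        (relevance * pageviews ** popularity_exponent, title)
        for title, relevance, pageviews in candidates
    ]
    return [title for _, title in sorted(scored, reverse=True)]


# Hypothetical candidates: (title, relevance score, monthly pageviews).
candidates = [
    ("High-profile author", 0.2, 1_000_000),
    ("Niche author", 0.9, 2_000),
]

print(rank(candidates))        # full weight on popularity
print(rank(candidates, 0.10))  # popularity damped to exponent 0.10
```

With full weight the high-profile page comes first; with the damped exponent the more relevant niche page does, while popularity still contributes to the score.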

Jkatz (WMF) (talkcontribs)

@Sadads Thanks, good thoughts! I can't say that adding categories to the considerations is something we can prioritize right now, but that is an interesting idea, as is the notion of simply scaling back the coefficient(!) I think for now, the goal is to see how far this tweak gets us.

Reply to "J.K. Rowling recommended in (almost) every Novels article (or how the algorithm places too much emphasis on high profile articles)"