Reading/Web/Projects/Related pages

Read more is a beta feature that aims to increase page views by directing readers to related content. The rationale is that if readers are offered suggestions similar to the topic they are reading about, they will stay engaged for longer, learn more about the topic they are looking for, and, if they are just randomly browsing topics, get a richer reading experience. The concept already exists on the apps, where you can check the performance report.

The Problem
If a reader has reached the bottom of the article, they might be looking to read more about the topic, and surfacing similar articles might be exactly what they are looking for.

This has been released on the apps, where it saw a 16% click-through rate among users who saw it. Additionally, 25% of the users who saw Read more results clicked through at least once in a 1-day period.

The How
Using the Extension:RelatedArticles extension, we will show suggested articles at the bottom of pages, encouraging the user to read another page. A user who gets to the bottom of an article on either the mobile or the desktop web is shown a list of other articles that are related to the one they just finished.

FAQ
 * Where do article suggestions come from?

Suggestions come from a "more like" query, which programmatically chooses related pages. Here is an overview of the technology the service is based on: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

The more-like query service ranks articles in order to show the top 3. This is subject to change, but as of Feb 24, 2016, it works as follows (heavily quoting from https://lists.wikimedia.org/pipermail/mobile-l/2016-February/010122.html):

Two types of score are combined:
 * A score that computes the similarity between documents; this can be fine-tuned[1]
 * A score (we call it the "rescore") that uses article metadata: 'boostlinks' and 'boosttemplates'.
   * Boostlinks is a measure of how many articles link in, which is a mark of notability.
   * Boosttemplates are templates that tend to signal quality. Here is a link to the enwiki boost templates (they vary from wiki to wiki): https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates

These three scores (similarity, boostlinks and boosttemplates) are multiplied together to achieve the final score. To use an example of a bad match:

 * Article: A_Summer_Bird-Cage
 * The score for "I Know Why the Caged Bird Sings" with boost links is:
   * similarity: 0.3457441 (terms chosen: "from", "cage", "bird")
   * boostlinks: 2.807535
   * boost-templates: 2
   * total: 0.3457441 * 2.807535 * 2 => 1.9413773
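The arithmetic in the example above can be checked with a minimal sketch. It assumes only what the FAQ states: the final score is the product of the three factors. The `use_boosts` flag is a hypothetical illustration of the comparison in which inbound links and templates are removed as factors. It is not a CirrusSearch setting.

```python
def final_score(similarity: float, boostlinks: float, boosttemplates: float,
                use_boosts: bool = True) -> float:
    """Multiply the similarity score by the two rescore factors.

    use_boosts is a hypothetical switch: when False, only the similarity
    score is kept, mimicking the "remove popularity" comparison.
    """
    if not use_boosts:
        return similarity
    return similarity * boostlinks * boosttemplates


# "I Know Why the Caged Bird Sings" as a suggestion for A_Summer_Bird-Cage
total = final_score(0.3457441, 2.807535, 2)
print(round(total, 7))  # prints 1.9413773
```

Because the factors multiply, the two boosts in this example scale the similarity score by roughly 5.6x (2.807535 * 2). This is how a well-linked, template-rich article can outrank a more similar one.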

The similarity score can be configured differently using these variables. The defaults are here. However, tuning the similarity score seems to be less of an issue than the dominance of popular articles in the results.
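For readers unfamiliar with the underlying technology, the sketch below builds an Elasticsearch request body for a `more_like_this` query, the query type linked in the FAQ, using the current article as the "like" document. The parameter names come from the Elasticsearch documentation. The index name, field names, document id and values are illustrative assumptions only. They are not CirrusSearch's actual configuration or defaults, and `build_mlt_query` is a hypothetical helper.

```python
import json


def build_mlt_query(index: str, doc_id: str, size: int = 3) -> dict:
    """Return an Elasticsearch search body asking for documents similar to doc_id.

    Field names and tuning values are hypothetical, chosen for illustration.
    """
    return {
        "size": size,  # only the top 3 suggestions are shown
        "query": {
            "more_like_this": {
                "fields": ["title", "text"],                  # assumed text fields
                "like": [{"_index": index, "_id": doc_id}],   # the current article
                "min_term_freq": 2,       # ignore terms occurring fewer times in the source article
                "min_doc_freq": 2,        # ignore terms found in fewer documents in the index
                "max_query_terms": 25,    # cap on how many terms are selected
                "minimum_should_match": "30%",  # share of selected terms a match must contain
            }
        },
    }


body = build_mlt_query("enwiki_content", "12345")
print(json.dumps(body, indent=2))
```

Knobs such as `min_term_freq` and `max_query_terms` decide which terms (for example "from", "cage", "bird") drive the similarity score. The boosts discussed above are applied on top of that score.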

Some potential ways to improve the results:
 * Right now, the boost templates seem to favor quality/notability 2:1 over similarity. Given the concerns expressed here and elsewhere about popular results making little room for relevance or serendipity, this is an area for improvement.
 * A test was done to analyze the difference when we remove inbound links and templates as factors (both are measures of popularity and, to some extent, quality). Please review this list to see which result sets are better: Extension:RelatedArticles/CirrusSearchComparison#Hollywood Library (I, Jkatz (WMF), think removing popularity makes a very big positive difference to relevancy, but it also leads readers to more stubs. I am for making the change.)
 * Other ideas?

 * Do editors have any control or are they given any preview of the suggested articles?

They can be overridden by editors using the.

Though this feature is not considered part of the article, editors can change the suggested articles by adding up to 3 manually curated examples to this part of the page navigation.

For example, on https://en.wikipedia.org/wiki/Korur_language the related pages have been overridden manually.

User:Jkatz (WMF) thinks that making them editable is a problem because:
 * the results are not automatically updated
 * improvements we make to the algorithm/selection will be lost on pages where an editor has overridden the automated selection
 * right now the manual selection option only applies to the web (not the apps), which is misleading
 * some editors have requested that we do automated refreshes every time you load the page; this would not be possible if the above keywords are used.

 * How is related articles different from the See also section, navigation boxes and the category system?

Related pages have pretty pictures and are limited to 3 for greater simplicity. Unlike See also, they are only intended for users who have reached the bottom of the article without finding another link more appealing to visit. It is thought that the images and simplicity increase the click-through rate, which is the metric of success for this feature.
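The manual override described in this FAQ can be illustrated with a small sketch. It assumes, without taking this from the extension's code, that a manually curated list replaces the automatic results entirely and is capped at three items. `choose_related` and `MAX_SUGGESTIONS` are hypothetical names.

```python
from typing import List, Optional

MAX_SUGGESTIONS = 3  # the feature shows at most three related pages


def choose_related(manual: Optional[List[str]], automatic: List[str]) -> List[str]:
    """Pick the titles shown at the bottom of an article.

    Hypothetical logic: a non-empty manual list wins outright, and the
    automatic (more-like) results are used only when no override exists.
    """
    if manual:
        return manual[:MAX_SUGGESTIONS]
    return automatic[:MAX_SUGGESTIONS]


print(choose_related(None, ["A", "B", "C", "D"]))  # ['A', 'B', 'C']
print(choose_related(["X"], ["A", "B", "C"]))      # ['X']
```

Under this assumption, the automatic list is never consulted once an override exists. That illustrates the concern that algorithm improvements are lost on overridden pages.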

Initial Community Feedback
The following is an attempt to summarize the feedback collected on the talk page for this feature (as of Feb 25, 2016), along with responses to each one.

'Disambiguation pages are pretty bad. See the holocaust disambiguation page as an example: https://en.wikipedia.org/wiki/Holocaust_%28disambiguation%29 '

Response:
 * Luckily, the feature has no real utility on those pages. There is now a ticket to remove it from them prior to any further roll-out: https://phabricator.wikimedia.org/T127068

On English Wikipedia, the use of non-free (fair use) article images for navigation goes against English Wikipedia policy.

Response:
 * This is being remedied by this task: https://phabricator.wikimedia.org/T124225 and is considered a blocker.

The click-through-rate metric is not airtight and reader value is not certain.

Response:
 * It is true that click-through rates can be gamed. I (JKatz) submit that sustained click-through rates indicate that readers are finding value, and that the placement at the bottom of the page, as well as some analysis of raw pagelogs above, suggests that cannibalization of blue links is not occurring. However, enough others disagree that some additional work is necessary, at least prior to desktop roll-out. Here are some ideas:
 * Roll-out on some small % of anon-traffic. It is not entirely fair to test only with logged-in users a feature that is intended primarily for users who are not logged in.
 * Then, a/b test and look at overall pageviews (for those of you who find session depth to be a bad metric: please suggest a better one)
 * Then, use the quick-survey feature to ask users who click on 'related pages' whether they are satisfied with the results, or to get other feedback

This is the same as see-also and does not add additional value

Response:
 * The sustained high click-through rate on mobile suggests that users find it valuable enough to warrant promotion. On desktop this is not yet the case, though the click rate still exceeds that of any given link (see above about obtaining better measures of reader value).
 * The best thing to do would be to combine it with See also, but this is thorny and a lot of technical work. I think we should be assessing whether this is a net benefit or a net negative for Wikipedia, not whether or not it is a perfect solution.

Sometimes pages return results ranging from suboptimal to damaging

Response:
 * for 'damaging' results, which are rare, editors can over-ride results. I was against mixing algorithm and editing, but the value here is now clear.
 * for 'suboptimal results', for now I think we should live with it because the readers seem to be deriving value (see below on underlined portion)

Algorithms are non-NPOV and go against our ethos OR algorithms should be adapted to community norms

Response:
 * In the FAQ, we have put a link to the code and described the variables/logic. Ultimately, a solution might be to allow each community to tweak the weighting to their needs, but I think that is not warranted by the value of this feature at this point. How do you feel about transparency with regard to the algorithm logic as a solution?
 * We might want to note somewhere that the suggestions are machine-driven, to clarify. Is this a blocker?

In German and maybe Russian, the tool may violate the 'Themenring' policy

Response:
 * I see no reason why we should push this on these wikis if they don't want it, but I have asked community members to poll their wikis and find out whether the concerns are representative.

The community was not consulted first and we would have helped you avoid major pitfalls.

Response:
 * Unfortunately, yes. We thought that since this worked on the apps it was a no-brainer, but we clearly misjudged. Given that we made a mistake here, and are attempting to improve going forward, both with this feature and others, we hope that you can help us move forward together.

Success criteria

 * %CTR (click through rate, clicks/views) is higher than 10%.
 * Ideally, we will be able to ensure that these clicks are non-cannibalistic (increasing overall clicks as opposed to taking clicks from blue links)

Prototyping
This will be tested in beta first.

MVP
A user reaching the bottom of an article is shown the title and lead image for 3 articles that relate to the article they just finished. We are able to measure the engagement clicks/impressions with this feature.

User Stories
 * A user reaching the bottom of an article is shown the title and lead image for 3 articles that relate to the article they just finished so that they can continue reading about that topic.
 * Someone editing a Wikipedia article can manually change the article suggestions using wikitext so that they can correct any erroneous or sub-optimal suggestions.
 * A project stakeholder (such as a PM or data analyst) is able to measure the engagement clicks/impressions with this feature so they can determine if it is adding user value.

Metrics Implementation
We will want to track:
 * Impressions of read more suggestions (the lump--not each item suggested)
 * Clicks to read a suggested article
 * The position of article (1,2,3). Ideally: whether or not article was edited manually
 * Ideally: a) overall referrals from a page with read more, b) overall referrals from the same page without read more
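The tracking list above, together with the 10% success criterion, can be sketched as follows. The event names "seen" and "clicked" mirror those in the Event logging section. The `RelatedPagesEvent` record, its fields and the helper functions are hypothetical, not the actual event schema. An impression is one "seen" event for the whole panel, not one per item.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

SUCCESS_CTR = 0.10  # success criterion: CTR (clicks/views) higher than 10%


@dataclass
class RelatedPagesEvent:
    # Hypothetical event record; field names are assumptions.
    name: str                       # "seen" (panel impression) or "clicked"
    position: Optional[int] = None  # 1, 2 or 3 for "clicked" events
    manually_curated: bool = False  # carried for segmenting edited vs automatic lists


def ctr_by_position(events: List[RelatedPagesEvent]) -> Dict[int, float]:
    """Clicks at each position divided by panel impressions."""
    impressions = sum(1 for e in events if e.name == "seen")
    clicks = Counter(e.position for e in events if e.name == "clicked")
    if impressions == 0:
        return {}
    return {pos: clicks[pos] / impressions for pos in (1, 2, 3)}


def meets_success_criterion(events: List[RelatedPagesEvent]) -> bool:
    """True when total clicks / impressions exceeds the 10% threshold."""
    impressions = sum(1 for e in events if e.name == "seen")
    clicks = sum(1 for e in events if e.name == "clicked")
    return impressions > 0 and clicks / impressions > SUCCESS_CTR


events = [RelatedPagesEvent("seen") for _ in range(10)]
events += [RelatedPagesEvent("clicked", position=1), RelatedPagesEvent("clicked", position=3)]
print(ctr_by_position(events))          # {1: 0.1, 2: 0.0, 3: 0.1}
print(meets_success_criterion(events))  # True
```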

Timeline Estimate
(this has been updated on 2/24/16 to reflect the actual order to-date)
 * 1) build read more according to mobile web specs ✅
 * 2) launch on mobile web beta ✅
 * 3) launch on desktop web beta ✅
 * 4) measure impact using event logs CTR (referral data won't be helpful at these numbers) ✅
 * 5) discuss with community
 * 6) launch on mobile
 * 7) launch on desktop (open discussion about whether we launch in beta first and/or do progressive rollout)
 * 8) Delivery Estimate: end of Q3 on mobile across wikis...desktop is too much of a question mark as of 2-24-16.

Join the conversation
This task is being tracked on Phabricator and we'd love to hear your thoughts.

Event logging
As measured for the first two weeks of 2016 on a sampled basis, related articles items were frequently tapped on mobile web beta Wikipedias (the skin "minerva-beta" in the table below): 19.7% of the time when seen. The relatively high tap rate on the mobile web beta is believed to be attributable to the interesting content, and in part to the automatically collapsed sections in its phone form factor layout, which give the related articles at the end of the article more visual impact.

Interestingly, while users on the desktop web Wikipedias (see "vector" in the data below) were more likely to see the related articles panel (35% seen-to-ready event ratio versus the mobile web's 27.3%), desktop users were evidently far less likely to click on a related article: they clicked only 3.4% of the time when seeing the related articles panel.

event_skin	event_eventName	count(*)
cologneblue	ready	3
minerva-beta	clicked	217
minerva-beta	ready	4025
minerva-beta	seen	1099
modern	ready	114
modern	seen	62
monobook	clicked	1
monobook	ready	270
monobook	seen	82
vector	clicked	48
vector	ready	4053
vector	seen	1429
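The percentages quoted in this section are ratios of these counts: seen/ready for how often the panel is seen, and clicked/seen for the tap rate. A short sketch recomputing them for the two main skins. `COUNTS` and `ratios` are illustrative names.

```python
# Counts from the two-week sample above: skin -> event name -> count
COUNTS = {
    "minerva-beta": {"ready": 4025, "seen": 1099, "clicked": 217},
    "vector": {"ready": 4053, "seen": 1429, "clicked": 48},
}


def ratios(counts: dict) -> tuple:
    """Return (seen/ready, clicked/seen) for one skin."""
    return counts["seen"] / counts["ready"], counts["clicked"] / counts["seen"]


for skin, c in COUNTS.items():
    seen_rate, click_rate = ratios(c)
    print(f"{skin}: seen/ready {seen_rate:.1%}, clicked/seen {click_rate:.1%}")
# minerva-beta: seen/ready 27.3%, clicked/seen 19.7%
# vector: seen/ready 35.3%, clicked/seen 3.4%
```

The same clicked/seen ratio gives the tablet figures discussed further down, for example 11/66 for the two-week tablet sample.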

The relatively lower click rate on the desktop web is believed to be attributable to the richly laid out expanded sections, which give the related articles relatively less visual impact. It may also be related to the mechanics of beta opt-in: on the mobile web, beta opt-in enables all beta features, whereas on the desktop web users can opt in to specific beta features. It should be noted that some desktop web users also enable the option to auto-enroll in all beta features.

Tablet devices are presented with the mobile website by default, which commonly expands sections but still has a relatively lightweight layout. However, users can switch to desktop mode, where sections are again expanded but the layout is relatively heavier.

Note that for a confined set of tablet devices, although the analysis is imperfect and a variety of factors could muddle it, the data suggest that we can say with some level of confidence that:


 * mobile web beta treatment dramatically outperformed the desktop treatment
 * although users with a confined set of tablets on mobile web beta had not insignificant engagement with related articles, their rate of engagement was somewhat lower than the fuller mobile web beta population (confined tablet population's engagement at 16.7% versus full mobile web beta population's engagement of 19.7% for a two week period and 13.2% versus 18.4% for the full period since it was introduced on mobile web beta) - but this was still quite promising on mobile web beta

Two-week period, confined set of tablets:

event_skin	event_eventName	count(*)
minerva-beta	clicked	11
minerva-beta	ready	262
minerva-beta	seen	66
monobook	ready	1
monobook	seen	1
vector	ready	37
vector	seen	9

All events and skins since related articles became available in beta, for a confined set of tablets:

event_skin	event_eventName	count(*)
minerva-beta	clicked	39
minerva-beta	ready	1485
minerva-beta	seen	295
monobook	ready	2
monobook	seen	1
vector	ready	70
vector	seen	21

All events and skins since related articles became available in beta, for all user agents:

event_skin	event_eventName	count(*)
cologneblue	ready	9
cologneblue	seen	2
minerva-beta	clicked	768
minerva-beta	ready	19009
minerva-beta	seen	4170
modern	ready	185
modern	seen	105
monobook	clicked	4
monobook	ready	535
monobook	seen	171
vector	clicked	110
vector	ready	8005
vector	seen	2848

Note that a CTA experiment to increase general enrollment in the mobile web beta was running in the latter part of 2015 but was disabled, hence the relatively lower event counts for the two-week period observed at the start of January 2016. Engagement with related articles was relatively high while the CTA was in force and has continued to be relatively high (albeit slightly lower) since the CTA was removed. See the data above. Note also that a small experiment with collapsed sections has been running, but it should have a negligible impact on the analysis in this page.

HTTP Referer header analysis
One additional signal suggesting the efficacy of the mobile web beta related articles feature is the share of internally referred pageviews as a percentage of total pageviews. Again, caveats apply, but it appears that before the related articles feature, internally referred pageviews on the mobile web beta were in the low 50s percent, whereas with the feature they are now in the high 60s percent (the stable channel, where the feature was not implemented, stayed relatively consistent at around 25%). However, this is only a signal; a partial rollout in the stable channel of the mobile web would be more telling.

What follows are relative internally referred pageview rates on the mobile web for a set of Thursdays: two Thursdays in the first two weeks of January, and two Thursdays before the related articles feature was released in beta on the mobile web and desktop web. Dates coinciding with holidays or known high-rate fundraising campaign banners are not included, as they would complicate the analysis.

Thursday, January 14, 2016
Note on this query: feature running in mobile web beta Wikipedias.

beta: 48450/(48450+14958+8465) = 67.4%
stable: 58400028/(58400028+56362895+113239899) = 25.6%

referer class	channel (b = beta, NULL = stable)	pageviews
external	NULL	113239899
external	b	8465
unknown	NULL	56362895
unknown	b	14958
internal	NULL	58400028
internal	b	48450

Thursday, January 7, 2016
Note on this query: feature running in mobile web beta Wikipedias.

beta: 51732/(51732+14540+8137) = 69.5%
stable: 59881357/(59881357+55286651+109964317) = 26.6%

referer class	channel (b = beta, NULL = stable)	pageviews
external	NULL	109964317
external	b	8137
unknown	NULL	55286651
unknown	b	14540
internal	NULL	59881357
internal	b	51732

Thursday, December 3, 2015
Note on this query: feature was not yet running in mobile web beta Wikipedias.

beta: 184067/(184067+55814+119778) = 51.2%
stable: 50273872/(50273872+51372508+102363512) = 24.6%

referer class	channel (b = beta, NULL = stable)	pageviews
external	NULL	102363512
external	b	119778
unknown	NULL	51372508
unknown	b	55814
internal	NULL	50273872
internal	b	184067

Thursday, November 19, 2015
Note on this query: feature was not yet running in mobile web beta Wikipedias.

beta: 189359/(189359+58549+121871) = 51.2%
stable: 51119529/(51119529+54209700+105205731) = 24.3%

referer class	channel (b = beta, NULL = stable)	pageviews
external	NULL	105205731
external	b	121871
unknown	NULL	54209700
unknown	b	58549
internal	NULL	51119529
internal	b	189359
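Each percentage in these Thursday snapshots is the internal count divided by the sum of internal, unknown and external pageviews for that channel. The sketch below recomputes them from the tables. `internal_share` and the dictionary layout are illustrative choices.

```python
def internal_share(internal: int, unknown: int, external: int) -> float:
    """Internally referred pageviews as a percentage of all pageviews."""
    return 100 * internal / (internal + unknown + external)


# (internal, unknown, external) pageview counts per channel, from the tables above
SNAPSHOTS = {
    "2015-11-19": {"beta": (189359, 58549, 121871), "stable": (51119529, 54209700, 105205731)},
    "2015-12-03": {"beta": (184067, 55814, 119778), "stable": (50273872, 51372508, 102363512)},
    "2016-01-07": {"beta": (51732, 14540, 8137), "stable": (59881357, 55286651, 109964317)},
    "2016-01-14": {"beta": (48450, 14958, 8465), "stable": (58400028, 56362895, 113239899)},
}

for day, channels in SNAPSHOTS.items():
    beta = internal_share(*channels["beta"])
    stable = internal_share(*channels["stable"])
    print(f"{day}: beta {beta:.1f}%, stable {stable:.1f}%")
```

Comparing the two beta rows before the feature with the two after it gives the jump from the low 50s to the high 60s. The stable channel stays near 25% throughout.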

Proposal for moving forward
In light of the feedback noted above, I would like to propose the following next steps for this feature:
 * 1) Publish variables and code that impact article selection (see FAQ) ✅
 * 2) Fix non-free image issue via: https://phabricator.wikimedia.org/T124225 ✅
 * 3) Work on https://phabricator.wikimedia.org/T127068 to remove from disambiguation pages
 * 4) Rollout on mobile web on select wikis (it seems like stakeholders are less concerned with the experience here, and the 'performance' is much higher)
 * 5) Rollout small traffic a/b test on desktop on a few wikis and either measure session depth change or use quick survey (the latter is more difficult)
 * 6) Use results of 5 to inform possible desktop rollout (Hear back from De and Ru Wikipedias as to whether or not a rollout is desired, given 'Themenring' policy)

Any thoughts, suggestions? I will post on the talk page as well here: Topic:Sz30fah2s3enrql4

On June 15th 2016, 21:00 local time, the German-speaking Wikipedia will most likely reject this feature by a large majority. Sorry for all the money spent. --Eingangskontrolle (talk) 16:12, 4 June 2016 (UTC)

Improvements/Feature requests

 * enable a close button on individual items to alert editors (or tweak the algorithm) when a particular recommendation is not helpful
 * merge with see-also
 * Recast as a "Find Similar Articles" button with a long list of search-style results

[open call to add more here (please sign)]