Reading/Web/Accessibility for reading/Repository/Readability research report

The Web team is considering improvements to the readability of the Wikimedia projects. This is in response to the team's 2023 goal to "Ensure a quality reading experience for all users by adapting the default experience for 15% of pageviews, based on the individual needs and constraints of the user."

Tl;dr
A review of the Human Computer Interaction (HCI) and design discourse on web-based readability suggests four changes. These will improve the readability of the Wikimedia content:
 * 1) Make the default font size bigger (but not too big) to improve readability
 * 2) Increase the information density of articles to improve scanning
 * 3) Increase the space between paragraphs and article sections to improve scanning
 * 4) Allow readers to customize the density of their reading experience

Reading Wikimedia wikis
In a year, people collectively spend hundreds of millennia reading the Wikimedia wikis. We estimate that humanity spent about 672,349 years reading all Wikipedia language versions from November 2017 through October 2018. Wikipedia alone got 25 billion page hits in July 2023. Its readership is in the neighbourhood of one billion people per day. To have a full picture, we need to take into account the content of Wikipedia's sister projects.

Casual readers
So what are those billion-ish people doing? They don't come to us looking to explore, necessarily. They come with a specific task in mind.

Most readers read Wikipedia for less than a minute. The median reading time is 25 seconds and the 75th percentile is 75.1 seconds. This is consistent with the Reading Wikipedia finding that people are often scanning for specific pieces of information. A survey of 2500 people in Nigeria and Kenya found that readers tend to skim or scan articles for particular information about current events, specific people, or school assignments. More generally, readers skim or scan Wikipedia to find specific information or to get the general gist of something. For these casual readers, who make up the vast majority of readers on Wikipedia, the wikis are places to begin a search for information quickly, not the end destination for in-depth, line-by-line reading.

There are exceptions to this reading behaviour where people read things more in-depth for intrinsic learning. In the context of about a billion readers every day, the people reading in this way still represent a significant group. One study found that readers in low-Human Development Index countries are more likely to read for longer times on a desktop computer compared to Global North readers. They are more likely to be reading in an educational context and for intrinsic learning rather than fact-checking.

'''Readers who quickly scan for particular information for a specific need are the primary audience for the reading accessibility improvements we will suggest. However, we need to consider long-form readers, too.'''

Current readability
In a study of new and casual English readers, we received conflicting feedback on the current readability of Wikipedia. Many participants said that they find the current fonts comfortable to read, "minimal", and fine. In other portions of the study, participants complained about too much information density and overwhelming "congestion." This suggests that, while readers like the typeface of Wikipedia, changes to article typography might improve readability. To develop an approach to improving readability for casual readers, the Web team's designer reviewed the current literature on readability for web interfaces. These articles tended to fall into four categories:
 * 1) General research about reading on the web
 * 2) Design and typography discourse
 * 3) Human-computer interaction research
 * 4) Considerations for non-Latin character sets

General research about reading on the web
Typography on the web is really about ergonomics for our eyes and brains. It's about the ways human anatomy and neurology process information. In the context of readability for Wikipedia, we are concerned more for the performance or ergonomics of content rather than aesthetics (although beautiful type and readable type are not mutually exclusive). With that in mind, we reviewed literature about the ways our eyes and brains process text on the web to understand the basic mechanics behind readability.

Our eyes do not follow lines of text in a linear way. Instead, they make tiny jumps called saccades as they move across a line of text. These jumps happen multiple times per second. The eyes sometimes pause on a particular set of characters for a fraction of a second before moving on. This is called a "fixation." At the end of a line of text, the eyes will perform a "return sweep" back to the beginning of the next line. While our eyes move over the text, they read the text's texture by processing whitespace around letterforms and blocks of text, as well as the shapes of characters, words, and groups of words. Our brain processes all of that visual information into words that we understand. The goal of typography is to make the text as effective as possible by helping our eyes and brains carry out the complex choreography of reading.



Much like the Wikipedia readers we studied, eye tracking studies have found that general readers on the web tend to scan text rather than reading it in a linear way. Even if they want to go through an article in detail, they scan past certain elements and come back after reading other things. When researchers track a reader's eye movements down a large page of text while the reader is trying to skim an article quickly, they observe what is called a "spotted pattern". When a reader scans in this way, their eyes sweep through large sections of text looking for bolded words, dates, links, and other things that stand out in an otherwise undifferentiated body of text. The cognitive strategy behind this reading pattern is an assumption that the signal-to-noise ratio in the highlighted text will be higher than in the text as a whole. In our New Readers research, readers expressed frustration when articles had too many links. We might hypothesize that too many links disrupt the spotted scanning behaviour by reducing the signal-to-noise ratio of links versus body text.



This reading behaviour benefits from denser typography. Denser texts show more information at one time, which, assuming it is well structured, will help spotted scanning by reducing the distance between highlighted points of interest and moving important actions closer together. In response to this insight, the latest iteration of Google's Material Design Guidelines includes a component density variable and instructions to increase interface density when it would improve a person's experience.

Eye tracking studies also found that when readers want to get the general gist of an article rather than a specific fact, they use a "layer cake" scanning pattern. Unlike the F-pattern, which is the standard in-depth reading pattern, the layer cake is a pattern where readers scan the section headers quickly down a page in order to glean the salient details of an article before diving deeper into one particular section to find what they're looking for. In the layer cake pattern, a reader's eyes jump between headings and sections, which makes this pattern a more efficient way to get through a large amount of content than reading every line carefully.

Purposeful scanning behaviours like the layer cake pattern benefit from clearer contrast between sections, paragraphs, and other blocks of text. If each paragraph and section are undifferentiated to the reader's eyes, their eyes have to work harder to distinguish between different sections as they scan. On the other hand, if each block of text is distinct from other blocks, the reader's eyes have clearer signposts to aim for as they sweep through the text.

Typography discourse
The designer-oriented discourse on web typography available from sites like Smashing Magazine and Medium is not very relevant to the task of improving readability of Wikipedia. Most of the articles assume that designers want to optimize an in-depth editorial reading experience of long-form text articles, rather than the quick, task-oriented, contextualized skimming behaviours Wikipedia's readers are most likely to employ. These articles tend to root themselves in typographic traditions based on physical books. Some articles cited best practices employed by book typography from over 100 years ago. Heuristics developed for the engaged reading of physical books without explicit connection to screen-based reading research may provide some broad guidelines for successful typography in general, but they are not ideal sources for optimizing the readability of Wikipedia.



Despite their shortcomings, my reading of general design texts did contain one insight that we can use to extrapolate relevant information for Wikipedia. One article quotes the type designer Zuzana Licko as saying, "Readers read best what they read most." The quote suggests that the more one reads a particular type of text in a particular context, the better one becomes at reading it because our eyes and brains become accustomed to that context and learn shortcuts for processing that kind of information. This has an important implication for the communities of active Wikipedia readers. Prolific readers and editors of Wikipedia articles will have unknowingly trained their eyes and brains to process Wikipedia's current typographic style and are more effective readers of the current Wikipedia than casual readers. However, this learned efficiency comes with a drawback in the form of negative transfer, a psychological effect whereby it becomes harder to learn something that is similar to something you already know how to do well. For example, a professional tennis player would have a harder time at first learning to play ping pong than someone who has never picked up a racquet before. This is also why it is harder to read other people's handwriting. Negative transfer happens because our brains and bodies need to un-train themselves from a familiar pattern in order to learn a new pattern that is similar to the old one. If this insight holds true, any change to Wikipedia's typography might have a temporary negative impact on readability for active readers, even if it improves readability for the billions of others whom this work is intended to benefit. The good news is that negative transfer effects do not last for long. Effects like this suggest that allowing Wikipedia readers to customize their reading experience could improve readability for more people.

Human Computer Interaction (HCI) Literature
In addition to the scanning and reading patterns above, HCI studies try to find quantitative measures for readability. Often, these measures try to gauge reading efficiency by using eye tracking equipment to measure the number and length of fixations that participants' eyes make as they read a text, or by measuring the length of time a participant needs to read a text. These studies tend to take that speed or fixation measurement and compare it to a comprehension score based on a set of questions about the text that participants answer after reading it. The researchers use these data to create quantitative readability scores.

Several studies found that font size had some impact on readability. They appear to agree that, for readability, bigger fonts are better. For example, in one study, all readability measures, including number of fixations, subjective comprehension, and measured comprehension, improved as the fonts got bigger. Most studies found that around 18pt (24px) is the point of diminishing returns where increasing font size stops improving readability. 24px is significantly bigger than the current default font size on Wikipedia (14px on desktop browsers and 16px on mobile browsers). Several studies used Wikipedia articles as their testing data and all of them suggested a font size increase in our core article reading experiences. One study went so far as to name its resulting publication "Make It Big! The Effect of Font Size and Line Spacing on Online Readability". Another was called "Size Matters (Spacing not): 18 Points for a Dyslexic-friendly Wikipedia." The consensus in the HCI literature suggests that Wikipedia's font size should increase significantly.
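
As a quick check on the numbers above, the point-to-pixel conversion can be sketched in a few lines of Python. This is a minimal sketch, assuming the CSS convention that one inch equals 96px and 72pt. The function and constant names are illustrative and do not come from any of the cited studies.

```python
# Assumption: CSS reference units, where 1in = 96px = 72pt.
CSS_PX_PER_INCH = 96
PT_PER_INCH = 72


def pt_to_px(points: float) -> float:
    """Convert a size in points to CSS pixels."""
    return points * CSS_PX_PER_INCH / PT_PER_INCH


# Figures taken from the paragraph above.
PLATEAU_PT = 18          # point of diminishing returns reported by most studies
DESKTOP_DEFAULT_PX = 14  # current default on desktop browsers
MOBILE_DEFAULT_PX = 16   # current default on mobile browsers

plateau_px = pt_to_px(PLATEAU_PT)
desktop_ratio = plateau_px / DESKTOP_DEFAULT_PX
mobile_ratio = plateau_px / MOBILE_DEFAULT_PX

print(f"{PLATEAU_PT}pt = {plateau_px:.0f}px")
print(f"vs desktop default: {desktop_ratio:.2f}x")
print(f"vs mobile default: {mobile_ratio:.2f}x")
```

Under these assumptions, the 18pt plateau works out to 24px, which is roughly 1.7 times the desktop default and 1.5 times the mobile default.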

The impact of line height on readability was less consistent across studies. One study found that more vertical space between lines of text helps the eye trace out smaller letterforms when fonts are relatively small. Another found that bigger text needs less relative line height (up until a certain point). Our eyeballs need less breathing room when the letterforms are bigger and less fiddly. In a different study, line height did not have a significant impact on fixation duration, but it had a major effect on comprehension: a line height of 0.8 produced much better comprehension than a line height of 1.8. In this study, denser line heights led to better comprehension. These findings suggest that an increase in font size may allow us to increase the information density of our articles by reducing the relative line height of the text without negatively impacting readability. Such an intervention would make articles denser and could improve reading comprehension.

There were also several findings in studies explicitly geared towards readers with accessibility needs, especially people with dyslexia. In these studies, researchers also recommend relatively large font sizes. One study made a minimum font size recommendation of 12-14pt, which is 16-19px. This study also found that line height should be 1.5 times the font size, and that the space between paragraphs should be at least 1.5 times the height between lines of text. The 12 readability guidelines put forth by Miniukovich et al. in this study outperformed WCAG's AA accessibility criteria in readability assessments. Another interesting finding from this study was that the 12 guidelines follow a universal design pattern. While all but 2 of the 12 guidelines improved either dyslexia readability experiences or non-dyslexia readability, none of them harmed the reading experiences of the folks that they didn't explicitly help. This suggests that improving readability for readers with accessibility needs will not detract from the reading experience of readers without accessibility needs.
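
To make the spacing guidance above concrete, here is a minimal sketch that derives line height and paragraph spacing from a font size. It assumes the line height figure is read as a multiplier of 1.5 times the font size, and that "the height between lines" means the line height. Both readings are my interpretation, and the names `TextSpacing` and `spacing_for` are hypothetical.

```python
from dataclasses import dataclass

# Assumed multipliers, based on one reading of the guidance above.
LINE_HEIGHT_FACTOR = 1.5     # line height relative to font size
PARAGRAPH_GAP_FACTOR = 1.5   # minimum paragraph gap relative to line height


@dataclass(frozen=True)
class TextSpacing:
    font_size_px: float
    line_height_px: float
    paragraph_gap_px: float


def spacing_for(font_size_px: float) -> TextSpacing:
    """Derive the minimum spacing values for a given font size in pixels."""
    line_height = font_size_px * LINE_HEIGHT_FACTOR
    paragraph_gap = line_height * PARAGRAPH_GAP_FACTOR
    return TextSpacing(font_size_px, line_height, paragraph_gap)


# 16px is the low end of the recommended minimum (12pt).
minimum = spacing_for(16)
print(minimum)
```

For a 16px font, this reading gives a 24px line height and a paragraph gap of at least 36px.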

While these quantitative, lab-based studies all recommend relatively large font sizes for text-heavy interfaces like Wikipedia, I do not think we should adopt their recommendations wholesale. Eye tracking studies use sample sizes that are orders of magnitude smaller than the reader numbers we would expect to deal with. Very few of these eye tracking studies deal with sample sizes over 100. Almost none of them deal with sample sizes over 500. "How People Read Online: New and Old Findings" has 750 participants, but it took them over 2 decades to conduct the study. As I mentioned before, Wikipedia sees around a billion visitors per day. The scale that we deal with is just fundamentally different from the sample sizes in these studies. Further, the types of reading tasks measured by most of these studies are in-depth, engaged reading tasks, which are very different from the scanning reading behaviour I discuss above. The diversity of people, devices, languages, and contexts that Wikipedia serves is more complex, and findings from these studies may not map onto our uniquely global scale in a perfect way. That being said, we can take these studies and the guidelines they suggest as directional data in our efforts to improve article readability.

There are a number of other tensions in the research that are worth naming in an explicit way. The point at which increasing font size stops improving readability performance is very big, almost double the pixel size of our current baseline. Increasing font sizes to that extent could erode trust in Wikipedia by changing its timeless look and feel too radically. It would also make the aesthetics of our interface subjectively less pleasing. More importantly, increasing font sizes to such a degree would make scanning more difficult. The bigger the text, the less information on a page, and scanning reading behaviours benefit from greater information density. People scanning articles would also have to adopt much more extreme scrolling patterns to compensate for the loss of information density. This suggests a tradeoff between readability and scannability, and we will need to find a middle ground.
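
The tradeoff between font size and density can be illustrated with some rough arithmetic. The sketch below counts how many lines of body text fit in one screen at different font sizes. The 900px viewport height and the fixed 1.5 line height ratio are assumptions chosen for illustration, not measurements of Wikipedia's interface, and the count ignores headings, images, and changes in line wrapping.

```python
import math


def visible_lines(viewport_height_px: float, font_size_px: float,
                  line_height_ratio: float) -> int:
    """Return how many full lines of text fit in the viewport."""
    line_height_px = font_size_px * line_height_ratio
    return math.floor(viewport_height_px / line_height_px)


# Hypothetical values for illustration only.
VIEWPORT_HEIGHT_PX = 900
LINE_HEIGHT_RATIO = 1.5

for font_size_px in (14, 16, 24):
    lines = visible_lines(VIEWPORT_HEIGHT_PX, font_size_px, LINE_HEIGHT_RATIO)
    print(f"{font_size_px}px: {lines} lines per screen")
```

Under these assumptions, moving from 14px to 24px cuts the lines visible per screen from 42 to 25. Larger text also wraps sooner, so the real loss of density would be greater than this line count alone suggests.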

This tension also makes a good case for customization features that allow readers to choose how dense they would like their reading experience to be. This mirrors the findings from several studies which suggest that readability improves most when it is customized on an individual basis. However, customization features would only benefit engaged readers. The quick scanning readers who spend less than a minute on an article will not benefit from customization because their individual reading sessions tend to be too short and task-oriented for them to find and change reading settings.

Design implications
The HCI and design literature on web-based readability suggest four goals for our approach to improving the article reading experience on Wikipedia:
 * 1) Make default text sizes bigger
 * 2) Maintain enough density for efficient scanning
 * 3) Increase the space between blocks of text to create better signposts for scanning
 * 4) Allow readers to customize the font size and density of their reading experience

Next steps and future research
The vast majority of readability research has been conducted on Latin character sets. The few studies on non-Latin character sets that I found suggest that reading patterns like the F-pattern and the layer cake pattern are universal across character sets. For example, in right-to-left languages like Arabic, the F-pattern was mirrored. That being said, expectations about information density and font size vary wildly across languages and character sets. To address the diversity of our readership, we plan to work with wiki communities to optimize typography defaults for their unique readerships. We would also like to measure the impact of our interventions on the readability of different wikis in a way that better reflects the realities of reading Wikipedia.