Quality assessment tools for Wikipedia readers
This is a summary of the discussion from the session "Quality of Article Content and the Reader Perspective (for Wikipedia)" (WikiSym 2010).
Participants: Cormac, Kasia, Andrew, Giota, Ariel, Guillaume
We started with these questions:
- What is the reader experience when visiting a Wikipedia article and trying to evaluate the article quality?
- Are readers good at deciding whether content on the Internet is trustworthy, useful, etc.?
- What means do they use to make these evaluations?
- How can we provide tools that help readers make these evaluations?
- How can we help readers to learn how to think critically about content on the Internet (and elsewhere)?
Short answers to the first three of these questions:
- The only indicators available to the reader on the page itself are: notices in case the article was featured on the front page (a large notice is displayed on the top of the page), or has been marked as "good" (a small green circle with a plus sign is displayed in the upper right of the page), and notices that the article may have neutrality issues, not enough citations, etc. Some of these may be quite opaque in meaning to the layperson.
- In some contexts at least, readers are careful and critical thinkers. Two examples are TripAdvisor and Amazon, whose review systems prominently display negative comments and let readers rate other raters. When readers have clear criteria in mind, they can evaluate quite effectively.
- For article content evaluation, at least two common things that influence a reader are the authority of the institution that houses the content (for medical articles, is the content hosted by a known medical institution?), and the style and cohesiveness of the text itself (does it have a coherent narrative, is it well-structured with paragraphs, introductory material, proper spelling and punctuation?)
We quickly came to the conclusion that there is no one-tool-fits-all and that we should provide several different tools that readers may elect to use or not. We also decided that we wanted a pluggable framework for easy addition of new tools.
Some of the ideas we discussed:
- Annotation of article text by readers, with the possibility of the annotation being transformed into a comment that is added to the talk page
- Article rating by readers, according to custom criteria which might include factual accuracy, completeness, usefulness...
- A box for displaying various metrics of the reader's choosing: number of citations in the article compared to the average, number of edits in the last week, number of editors who have contributed (without reversions), etc.
- A "quality toolbox" which could include various tools for measuring article quality, as well as other information about the article: a tag cloud, a visual representation of the links to other information sources, etc.
- The article and the talk page content could be displayed side by side and to the extent that discussions on the talk page referenced specific text in the article, those connections could be made explicit in the display.
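A minimal sketch of how the pluggable metrics box might be structured, assuming a simple registry of named metric functions. All names here (ArticleStats, METRICS, render_box) are hypothetical and not part of MediaWiki or any existing extension; the statistics are assumed to be supplied by some other component.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ArticleStats:
    """Hypothetical snapshot of the data a metric plug-in might need."""
    citation_count: int
    average_citation_count: float  # average over comparable articles (assumed given)
    edits_last_week: int
    editors: List[str]             # editors whose contributions were not reverted


# Registry mapping a metric name to a function that renders one line of text.
METRICS: Dict[str, Callable[[ArticleStats], str]] = {}


def metric(name: str):
    """Decorator that registers a metric plug-in under the given name."""
    def register(func: Callable[[ArticleStats], str]):
        METRICS[name] = func
        return func
    return register


@metric("citations")
def citations(stats: ArticleStats) -> str:
    if stats.average_citation_count > 0:
        ratio = stats.citation_count / stats.average_citation_count
        return f"Citations: {stats.citation_count} ({ratio:.1f}x the average)"
    return f"Citations: {stats.citation_count}"


@metric("recent_edits")
def recent_edits(stats: ArticleStats) -> str:
    return f"Edits in the last week: {stats.edits_last_week}"


@metric("editors")
def editors(stats: ArticleStats) -> str:
    return f"Contributing editors: {len(set(stats.editors))}"


def render_box(stats: ArticleStats, chosen: List[str]) -> List[str]:
    """Render only the metrics the reader switched on, in the reader's order."""
    return [METRICS[name](stats) for name in chosen if name in METRICS]
```

A new tool would be added by writing one more decorated function; the reader's choice is just the list of names passed to render_box.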
Social tools such as rating, voting, annotation are part of the Web 2.0 paradigm and are increasingly familiar to most Internet users. Information contained in semantic network diagrams and tag clouds is also easily absorbed by the average user, which makes these sorts of displays even more attractive to us for this use.
With the introduction of these tools, we have the opportunity to help folks to learn to evaluate general sorts of Internet content critically, and to take that learning with them into other domains.
Rough sketches illustrating some of the above ideas
This is a very loose transcript of the discussion for the session "Quality of Article Content and the Reader Perspective (for Wikipedia)". Nothing is an exact quote but the spirit of the main points of what was said should have been captured reasonably well. (C=Cormac, K=Kasia, AL=Andrew, P=Giota, A=Ariel, G=Guillaume)
A: What is the average reader perception of quality when they visit a Wikipedia article? They probably come from a Google search. There are of course various templates (Good, Class A), but even if the reader knew what these meant, they are on the discussion page and not visible. The reader doesn't even know the discussion page exists.
G: We should make the discussion page more visible to the reader.
P: The reader won't look at all that text, they want something quick.
G: Do the readers care about the quality of the article? Anyways we would never tag it with something like a star or anything like that.
(Note that some Wikipedia articles are right now tagged as "Good", which displays a green circle with a plus sign in it in the upper right hand corner. This was done by a bot run in May of this year, from the list of articles marked as "good" using the same template on the talk page.)
A: How do you in this room make the determination that content on the Internet is trustworthy? What standards do you use?
K: It depends what institution is behind the article. For example, if you are looking at health care you would want to look at NHS.
P: Maybe Wikipedia articles are more helpful to the average reader than NHS pages? If you are looking for a general overview of a topic...
C: Wikipedia articles often contain too much information and detail for the layperson.
P: The level of the article is just too difficult to decipher; high school students get told to go to WikiHow or WikiAnswers for a short summary.
G: Should we try to separate readers into classes (i.e. high school, general layperson, etc) for purposes of talking about quality?
C: Chemistry articles for example are just not accessible to the layperson. They have so much specialized information in them.
G: The student doesn't realize that Wikipedia is unreliable; if they did they would be more interested in reading the discussion page then.
C: If we want to focus on literacy in a sense, on a deeper understanding, we would want people to look at the article history, we want to empower the reader to understand how the article was created. Is there controversy on the talk page? If not, and it's well-written, then we can assume it's probably a good article.
A: What do we mean by "well-written"?
C: Well I think we can recognize it when we see it. Some articles, they have one sentence here, another one there, there's no coherent narrative, maybe pieces are repeated in different parts of the article, no one has gone through the whole thing and unified it. A well-written article has cohesiveness, good structure with defined paragraphs, a flow.
P: Some articles are written in an essay style, while still neutral point of view, others are more traditionally encyclopedic. Some are too hyperlinked; maybe lay readers lose the plot?
A: This reminds me of the classic xkcd comic: http://xkcd.com/214/
C: Yeah, that's my experience with Wikipedia a lot of the time too.
P: We need a way to visually represent the links between articles and between areas of knowledge. A visual structure. Britannica did this, at least on DVD. For example, how this medical article connects to these other base disciplines... It's good to have "Further reading" and references, but how can the reader find other resources without some sort of catalog?
We want to look at these things from a navigation and epistemological point of view.
There are wikis that target specific audiences though, for example Wikikids (for children).
C: I think France has a Wikipedia for children.
G: vikidia.org, which has French and Spanish editions. It's a different project, not based on Wikipedia. I think the content is entirely separate.
P: There's the Encyclopedia of Life, which is crowdsourced and repurposes Wikipedia content. They request help from specialized experts. It's a biodiversity archive, and they cooperate with Flickr and natural history museums and such. It's funded by the MacArthur Foundation. They put observations of species online. They filter and correct Wikipedia content as they re-use it.
So we could think about quality of content on a per-subject basis, working with specific communities, collecting expert input.
C: The more different groups with different expertise you have working on the same article the more accessible it will become. If there are too many experts...
G: I think there are two things we have to do, 1) make it possible for readers to realize that they are the ones that must evaluate the article content, and 2) provide them with assessment tools. We could have a quality toolbox. For example they could turn on or off WikiTrust, they could see how many revisions there were in the last week, and so on. No one tool would be enough.
P: There is Diigo, which lets people see what other people said about a page. There is also Cohere, which is an Open University web annotation tool. You could let people add a check mark or a star; there could be some sort of widget for that. The reviewing could be networked.
K: Suppose you are looking at a history article, about some king, born in a certain year. How do you as a reader ensure that this is correct? Even with citations people still make mistakes.
G: If you turn on WikiTrust, let's say you see that the article has been very stable over the last 3 years, it's probably good. If you see it's been stable over the last 6 months but the date of birth was updated yesterday that might put that information in doubt. Maybe we could have a collection of tools, a reader could choose their favorite one?
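The heuristic G describes could be sketched roughly as follows. This is not WikiTrust's actual algorithm, only an illustration of the idea that a fresh change inside a long-stable article deserves a second look. The function name, the thresholds, and the assumption that we know when each passage last changed are all hypothetical.

```python
from datetime import date
from typing import Dict, List


def doubtful_passages(
    last_changed: Dict[str, date],
    today: date,
    stable_days: int = 180,   # assumed threshold for "stable" (about 6 months)
    recent_days: int = 7,     # assumed threshold for "recently changed"
) -> List[str]:
    """Return passages that changed recently in an otherwise stable article.

    last_changed maps a passage label to the date it was last edited.
    """
    ages = {label: (today - changed).days for label, changed in last_changed.items()}
    stable = [label for label, age in ages.items() if age >= stable_days]
    recent = [label for label, age in ages.items() if age <= recent_days]
    # Only flag recent changes when at least half of the passages are long-stable;
    # in an article that is changing everywhere, a recent edit is not unusual.
    if len(stable) * 2 < len(ages):
        return []
    return sorted(recent)
```

For example, with an introduction last changed in 2009 and a date of birth changed yesterday, only the date of birth would be flagged.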
C: I like the suite of tools idea. How about a "rate this article" tool?
G: That's going in during the next year.
C: What does a mixed rating mean though? Is there missing information, was the article useless for some purpose?
G: Over the next year the tech department at WMF is going to be involved in more research, looking at what features to develop. The "rate this article" feature is important to the ED and to the Board. There is an existing extension but it may not be so viable. I think the usability folks Trevor, Nimish, Paru, maybe Howie? were looking at it. There is user research and testing going on now.
C: Rating doesn't give as rich a picture as the discussion page.
G: The toolbox could have, say, the two last discussion topics from the talk page.
P: We don't want to confuse the reader by putting too much stuff there, we want a simple clean interface.
G: Two main principles of usability are to hide anything irrelevant and to make things discoverable. If we put it in the toolbox, it stays out of the way, but it's in there when someone looks at the contents; they can scroll and find it, so that meets both principles.
P: Maybe we could look at something flagged semantically in the article as controversial or check for key words on the talk page (e.g. "neutrality", "policy") and these could go in a little comment box.
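A minimal sketch of the keyword check on the talk page, assuming the talk page has already been split into sections (title and text). The function name and the input shape are hypothetical; the two keywords are the ones mentioned in the discussion.

```python
import re
from typing import Dict, List, Tuple

# Keywords from the discussion; a real list would presumably be longer.
KEYWORDS = ("neutrality", "policy")


def flag_talk_sections(
    sections: Dict[str, str],
    keywords: Tuple[str, ...] = KEYWORDS,
) -> List[Tuple[str, List[str]]]:
    """Return (section title, matched keywords) for sections mentioning a keyword.

    sections maps a talk page section title to its text.
    """
    flagged = []
    for title, text in sections.items():
        haystack = f"{title}\n{text}".lower()
        # Match whole words only, so "policy" does not match "policyholder".
        hits = [
            word for word in keywords
            if re.search(rf"\b{re.escape(word.lower())}\b", haystack)
        ]
        if hits:
            flagged.append((title, hits))
    return flagged
```

The flagged titles are what would go in the little comment box; whole-word matching is a design choice to keep false positives down.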
C: Maybe we could enrich WikiTrust so the text would have a marker "this was useful to me in this specific context" or "this part was well written", the article could be annotated.
A: Tagging/annotating is a familiar paradigm for users now.
G: Comments added to the talk page could link to specific article text.
C: We could rethink the talk page so that it becomes an annotation layer of the article.
G: Sometimes you want to see all the comments together though.
For readability you want the width of the text to be narrow. But we also want to take advantage of bigger screens these days. We could have a second sidebar with the quality toolbox, or "what are people saying about this article" or "rate this article" or even talk page sections.
P: Here's Diigo working. It's a Firefox plugin, you can see that some people have written notes about the main page on Wikipedia. Here's Cohere, it gives a visual map of the links, you add them manually.
A: If lots of users click a link in an article we could generate an image in some automated way.
C: Talk pages have very rich and detailed discussions; that could overwhelm any sort of widget.
G: We could have flexible sections with discussion summaries, they could be collapsible, you could click something to remove this feature from your personalized version if you didn't want to see them.
G: The "rate this article" tool could let the user rate whether the article was informative, had bad spelling, is complete, etc. It's in use in its current form on English Wikinews and the strategy wiki.
AL: So let's say the article has been rated and then there are a pile of edits so the article is in a different state; what happens to that rating?
A: Maybe Pending Changes (formerly Flagged Revisions) could be used to link ratings to specific article revisions.
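One way to picture linking ratings to revisions is to store each rating against the revision ID the reader actually saw, so that ratings of older revisions can be shown separately once the article has changed. The sketch below is an assumption for illustration only; RatingStore is a hypothetical class and does not use Pending Changes or any MediaWiki API.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


class RatingStore:
    """Hypothetical store that keeps reader ratings per article revision."""

    def __init__(self) -> None:
        self._by_revision: Dict[int, List[int]] = defaultdict(list)

    def rate(self, revision_id: int, score: int) -> None:
        """Record a 1-5 rating against the revision the reader was viewing."""
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self._by_revision[revision_id].append(score)

    def summary(self, current_revision_id: int) -> Dict[str, Optional[float]]:
        """Separate ratings of the current revision from ratings of older ones."""
        current = self._by_revision.get(current_revision_id, [])
        older = [
            score
            for revision, scores in self._by_revision.items()
            if revision != current_revision_id
            for score in scores
        ]
        return {
            "current": mean(current) if current else None,
            "current_count": len(current),
            "older": mean(older) if older else None,
            "older_count": len(older),
        }
```

After a pile of edits, the current revision simply starts with no ratings of its own, and the older average can be displayed with a note that it refers to earlier versions.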
G: We should have multiple plugins.
G: We would have to build those stats into MediaWiki (the software underlying Wikipedia) somehow.
AL: We could get 90% of the stuff from the cache. I want a way to plug in different sorts of statistics that would show in the same area on the page (say the upper right corner), some sort of API.
C: There ought to be a way to flag and find articles rated as good by other people, an alternative means of navigation.
AL: We could use an API to manipulate those bits of data.
A: What are some good rating systems in use right now?
C: eBay or maybe Amazon?
AL: Yeah, Amazon has that "was this useful or not" question at the bottom.
C: Hmm, I guess eBay lets you rate people as a good buyer or seller, that's somewhat different.
AL: Well there is TripAdvisor, which allows you to rate the rater.
G: Amazon's "Was this comment useful" is definitely good. A couple of weeks ago in the (WMF) office they were talking about LiquidThreads (MediaWiki extension that displays discussions on talk pages with automated indentation and allows answers to specific comments, moving "threads", i.e. pieces of the discussion, to different locations on the page, etc, much as forum discussions are displayed). They were thinking that in order to reduce the noise of discussions there could be a "+1" or a "this was useful" button.
AL: No one has done meta-moderation like Slashdot does (rating moderators).
A: We have to think about whether gaming these systems is possible.
AL: So one thing that turns out to be useful is displaying not only the most useful positive comments, but also the most useful negative comments, because otherwise there is a tendency for only the positive comments to bubble up to the top.
P: TripAdvisor does this too. They have really gained a solid reputation very rapidly, both among readers/travelers and among hotel operators.
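A minimal sketch of the "most useful positive and most useful negative" display, assuming each review carries a star rating and a count of "was this useful" votes. The Review class, the thresholds for what counts as positive or negative, and the function name are hypothetical and not taken from any of the sites mentioned.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Review:
    text: str
    stars: int          # 1-5 rating given by the reviewer
    helpful_votes: int  # "was this useful" votes from other readers


def featured_reviews(
    reviews: List[Review],
    positive_threshold: int = 4,  # assumed: 4-5 stars counts as positive
    negative_threshold: int = 2,  # assumed: 1-2 stars counts as negative
) -> Tuple[Optional[Review], Optional[Review]]:
    """Pick the most helpful positive review and the most helpful negative one."""
    positives = [r for r in reviews if r.stars >= positive_threshold]
    negatives = [r for r in reviews if r.stars <= negative_threshold]
    best_positive = max(positives, key=lambda r: r.helpful_votes, default=None)
    best_negative = max(negatives, key=lambda r: r.helpful_votes, default=None)
    return best_positive, best_negative
```

Selecting the two groups separately means a negative review can surface even when every positive review has more votes, which is the point AL makes about positive comments otherwise bubbling to the top.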
A: These rating communities are a hopeful sign that readers can and will learn to be critical evaluators; in those communities the readers are savvy and effective raters.
AL: In those two circles, good work leads to more good work, a virtuous circle.
P: We need to get this into the schools. Think about this in terms of critical literacy projects. We would do controlled tests, experiments and get students to test the prototype, both high school students and university students. I think we could work with Hewitt on this. They are interested in "deep learning" and in helping people to learn to evaluate content critically.
C,G: We need to publish these notes and help the Wikimedia Foundation set priorities on this.
G: I think it will get adopted, especially if we can show that it would have a large impact.
AL: The PARC folks have a WikiDashboard that goes across the top of the page. Maybe they would be willing to release the code for that framework; you can plug in various things that get displayed, tag clouds etc.
A: Why don't we have tag clouds anyways? Right next to the article!
C: Another way annotation gets used is in rethinking the textbook. Sophie (software) allows markup and annotation of textbooks, and these annotations can be personal and/or shareable.
AL: Why can't we have right next to the section edit link on the right of every section in the article, a "talk" link which would create a new section on the talk page with a built-in reference to the article text? You know one of the things that really gets me is when people put a tag on the article, it says "This part of the article is disputed, see the talk page for more information" and then when you look at the talk page there's nothing there!
G: We could have "report an issue" in the sidebar, and the issues could include copyright, factual errors, etc.
A: The factual error link could give people an option to directly edit the article.
AL: The dashboard could display how many factual errors have been reported, how many citations the article has, and other statistics.
G: When a user reported an issue this could get added to an article "task list" that anyone with the article on their watchlist could see, so for example the factual errors that need to be corrected...
C: So what are we going to call this thing? It has to have a name.
AL: Augmenting the Wikipedia Interface for Quality.
P: Augmenting the Reader's Voice.
C: Augmented Wikipedia Experience = AWE
(This was the winner :-D )
P,A: The broader use for this tool could be that we empower people from an early age to critically evaluate content, and wouldn't that be amazing.