User:KBach-WMF/Collections/Conclusions

This document describes my thoughts and conclusions after working with collections as described in and looking through previous documentation initiatives as per. It has been compiled based on my notes, and follows their format to a certain extent. As a result, some crucial thoughts are interwoven in the broader narrative. I tried to highlight these and summarize them in the final section.

= Too long, didn't read =
Documentation collections help us scope our work and focus on areas that need documentation improvements the most. They can also be helpful in presenting an overview of the knowledge landscape to readers.

There are many criteria for evaluating pages and collections. For an experienced technical writer, most of that evaluation happens automatically and concurrently, though it still can take significant time. I do not believe it is possible to create collection usage guidelines that distill technical writing expertise in a way that could be practical for non-writers.

= What is a collection? =
Over the course of this research cycle, I came across three distinct understandings of what a collection is:


 * A collection is a group of pages that are similarly maintained.
 * A collection is a set of documentation resources that cover the same subject. It should be possible to arrange these pages into a structure that eases navigation and improves discoverability of information in the collection.
 * A collection is a landing page together with its sub-pages.

None of these is sufficiently comprehensive, so I wrote my own:

''A collection is a group of pages that can exist on its own, as a coherent unit. This may be due to how it’s written and maintained (by a specific group of people), how it describes a specific subject, or how it’s organized.''

This definition is too generic to be practically useful. The same is true of all other research results, guidelines, and recommendations from this research exercise: any sufficiently broad description of terminology and processes related to documentation is too complicated to be useful for people who are not already familiar with technical writing practices.

= What is the purpose of a collection? =
Depending on the material it covers and the process that produces it, a documentation collection can serve multiple purposes:


 * Limiting the scope
 * Providing an overview of content
 * Providing an overview of content metadata

Limiting the scope
Many discussions around documentation in the community include sweeping statements about documentation as a whole. However, in the same way the Movement really consists of multiple communities, there is likely no such thing as “documentation”, only “documentations”.

There are projects that have or don’t have documentation. Some resources are stored on wikis, some in the form of static sites. Some projects look for more contributors, which informs the content of their docs, some (seemingly) do not. Some documentation is translated, some isn’t. In other words, it’s deceptive to talk about documentation as a whole, because the ecosystem contains many separate projects with their own agendas and objectives.

Working with specific collections, as opposed to documentation in general, will help focus our efforts and target initiatives. Limiting scope, for example by saying “let’s improve onboarding documentation for project X”, instead of “onboarding documentation is crap, let’s improve it”, makes documentation tasks more manageable.

Focusing on collections also helps retain agency. Instead of trying to delegate responsibility for improving readers’ understanding to some remote end of the documentation (i.e., it’s not my responsibility to describe this, let’s link to documentation maintained by someone else), we can focus on improving our collection by making the necessary changes here and now.

This has another effect - that of perceived ownership. It is hard for me to think of the entire documentation as my responsibility. But if I limit the scope to just a sub-group of pages, suddenly I feel like I can make a difference.

Providing an overview of content
MediaWiki does not provide tools for structuring information. This is its biggest downside as a documentation tool. Keeping an index, even if it’s separate from the content, should make it easier for newcomers to understand what content they might find helpful. A collection can become an unofficial, focused table of contents.

This is significantly better than relying only on inline links to other resources. Such links are distracting and provide no information as to whether the linked resource is more general, more important, or barely relevant for the subject a given page is describing. Being able to stick to a predetermined list of pages while learning new concepts will simplify the learning process.

This should also help maintainers understand what impact any changes to their project might have. A well designed structure, represented by a collection tree, should make it obvious and clear where to make documentation changes.

Providing an overview of content metadata
In the same way as we collect information on the documentation, we can also easily investigate metadata about the content. With a clear scope, we should be able to quickly evaluate collections and their pages, and make reasonable decisions about any potential changes. This should make documentation easier to maintain.

= What is not the purpose of a collection? =
Collections are not a call to action. Collections do not, on their own, encourage people to step in and maintain documentation. If anything, they might have the opposite effect - the breadth of documentation to maintain might make people less likely to help.

Collections do not make it significantly easier to evaluate documentation. They mostly define what needs to be evaluated.

Collections do not make it easier to maintain documentation. Their only benefit here is in clear scoping.

= How to construct a collection? =
There are multiple ways of constructing a collection. The method below describes constructing a collection based on structure and content information. Some collections will be team/group-oriented instead (i.e. documentation for a tool or set of tools maintained by a specific group of people). You can construct such a collection by interviewing group members.

Step 1: Landing page and sub-pages
Find the documentation landing page for a given subject and check whether it has sub-pages. That landing page and sub-pages might constitute the entire collection, or they might prove a good starting point for identifying the broader set of pages that cover the same subject.

Step 2: Category
Check whether pages discovered in the previous step (landing page and its sub-pages) belong to a specific category or set of categories. You can then investigate if these categories contain any additional pages, not identified in the previous step, that should be considered part of the collection.

Step 3: Linked pages and information hubs
Check if linked pages should also belong to the collection.

Pages that are already considered part of the collection most likely contain links to resources outside the collection. If you identify individual pages that should belong to the collection, you will have to consider whether it makes sense for them to remain where they are, or whether it's possible to store them together with other pages in the collection.

It is possible that you will find hubs of information related to your collection in different places (outside the existing structure of the collection). In such cases, you might want to consider all pages within that information hub (e.g. a separate landing page and its sub-pages) as part of the collection. This results in a slightly more distributed structure of the collection.

Note: I don't think we need to think of collections as occupying a single location on wikis or any other service or portal. It is probably easier if that's the case, but might turn out to be impossible in the social environment we work in. Code (and to an extent documentation) architecture tends to reflect the structure of the environment that produced it. With a global, distributed community, we will probably never have a single source of truth on the most critical subjects.
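The three discovery steps above can be sketched as a small merge routine. This is only an illustration, not an established tool: the function names, the `Candidate` type, and the source labels are all hypothetical, and the title lists are assumed to have been gathered beforehand (by hand or through whatever API is available). Each page records how it was found, so pages discovered only through links can be singled out for a human decision.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A page that might belong to the collection (hypothetical type)."""
    title: str
    sources: set = field(default_factory=set)


def build_candidates(landing, all_titles, category_members, linked_titles):
    """Merge the three discovery steps into one list of candidate pages."""
    candidates = {}

    def add(title, source):
        candidates.setdefault(title, Candidate(title)).sources.add(source)

    # Step 1: the landing page and everything stored beneath it.
    add(landing, "landing")
    prefix = landing + "/"
    for title in all_titles:
        if title.startswith(prefix):
            add(title, "sub-page")

    # Step 2: pages that share a category with the pages from step 1.
    for title in category_members:
        add(title, "category")

    # Step 3: linked pages and hubs; these still need a human decision.
    for title in linked_titles:
        add(title, "linked")

    return sorted(candidates.values(), key=lambda c: c.title)


def needs_review(candidate):
    """Pages found only through links are the least certain members."""
    return candidate.sources == {"linked"}
```

Keeping the provenance of each page makes the distributed shape of a collection visible: a page that is both categorized and linked is a stronger candidate than one that is merely linked.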

= How to evaluate a collection? =
Note: This is not a tutorial. This section describes my current process - a process that I am still figuring out. It might be completely wrong. You have been warned :)

To evaluate a collection, we need different pieces of information about its contents. Specifically, we need information about the collection as a unit, as well as information about pages or resources within that collection.

Evaluating collections
To evaluate a collection, I need:


 * A complete list of collection pages.
 * An understanding of page structure - which content sits at a higher, lower, or the same level in the information structure (i.e. aspects of information architecture).
 * An understanding of what each page is about and whether pages overlap. If they do, evaluate whether the overlap is helpful or detrimental: is it worth eliminating for the sake of clarity and ease of maintenance?
 * An evaluation of every page.
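As a rough aid for the first two items, a list of page titles can be turned into an indented outline. The sketch below assumes that sub-page slashes in titles approximate the information structure, which is often only partly true. The function names and example titles are hypothetical.

```python
def build_tree(titles):
    """Nest page titles by their sub-page separators ("/")."""
    tree = {}
    for title in titles:
        node = tree
        for part in title.split("/"):
            node = node.setdefault(part, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, one page level per indent step."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


if __name__ == "__main__":
    pages = ["Help:Foo", "Help:Foo/Install", "Help:Foo/Install/Linux", "Help:Foo/Usage"]
    print("\n".join(render(build_tree(pages))))
```

An outline like this only shows where pages are stored. Deciding which content is conceptually higher or lower still requires reading the pages.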

Evaluating pages
This section describes the information I take into consideration when evaluating individual documentation pages.

A list of page edits

 * When was the last edit made? If it was long ago, is it because the content is stable, or because it has been abandoned?
 * How often do edits typically happen? Did the regular editors stop maintaining content for some reason?

This needs to be evaluated in context. When the project is stable, there may be little need to update its documentation. If a given project is actively developed, documentation might become outdated far more quickly.
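The two questions above can be turned into numbers with a few lines of Python. This is a sketch under assumptions: the input is a list of revision timestamps in the ISO 8601 form `YYYY-MM-DDTHH:MM:SSZ` (the form the MediaWiki Action API uses for `prop=revisions` with `rvprop=timestamp`), and the function names and the `factor` threshold are made up for illustration. The result is a prompt for investigation, not a verdict, because a long silence can equally mean stable content.

```python
from datetime import datetime, timezone
from statistics import median


def parse_ts(ts):
    """Parse a timestamp such as 2024-01-31T12:00:00Z as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def edit_activity(timestamps, now):
    """Return days since the last edit and the median gap between edits."""
    edits = sorted(parse_ts(ts) for ts in timestamps)
    if not edits:
        return None
    days_since_last = (now - edits[-1]).days
    gaps = [(later - earlier).days for earlier, later in zip(edits, edits[1:])]
    typical_gap = median(gaps) if gaps else None
    return {"days_since_last": days_since_last, "typical_gap": typical_gap}


def looks_interrupted(activity, factor=3):
    """Heuristic: the current silence is much longer than the usual gap.

    `factor` is an arbitrary illustration value, not a recommended threshold.
    """
    gap = activity["typical_gap"]
    if gap is None:
        return False
    return activity["days_since_last"] > factor * max(gap, 1)
```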

Number of page views over the last year (or other period)
There is no way to pick a general benchmark here. Some pages might receive traffic only as a result of bigger changes in the documented code, or at specific seasonal peaks.

Pages with close-to-zero numbers should be considered for archiving, especially if their content is volatile or outdated. I think the Movement is generally quite careful with archiving and deleting content, so this should not be very risky.

Pages with high numbers:


 * Might benefit from being broken up, especially if they cover multiple subjects and receive high traffic as a result.
 * Or is it better to hold all that content (and high traffic) in one place?
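A small triage helper can make the page view review less manual. The sketch below assumes view counts arrive in a response shaped like `{"items": [{"views": 12}, ...]}`, which is how the Wikimedia pageviews REST API reports per-article data. The function names are hypothetical, and the `low` cut-off is an arbitrary placeholder because, as noted above, no general benchmark exists.

```python
def total_views(response):
    """Sum the `views` field over all items in a pageviews-style response."""
    return sum(item["views"] for item in response.get("items", []))


def triage_by_views(views_by_title, low=10):
    """Split pages into archive candidates and the single busiest page.

    `low` is an arbitrary cut-off for illustration, not a benchmark.
    """
    archive_candidates = sorted(
        title for title, views in views_by_title.items() if views <= low
    )
    busiest = max(views_by_title, key=views_by_title.get) if views_by_title else None
    return {"archive_candidates": archive_candidates, "busiest": busiest}
```

Pages flagged as archive candidates still need the contextual check described above, since seasonal or release-driven traffic can hide inside a low yearly total.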

List of Phabricator tasks related to a page

 * Last 20 tasks that mention the page and are active?
 ** Might indicate high priority content to improve or work on.
 ** Might indicate that content is difficult to change, manage, or fix and could require support from a technical writer.
 * Last 10 tasks that are closed? How were they closed? Tasks closed without a fix might indicate some problems in the project.

List of pages that link to and are linked from the page
This helps us to understand whether the page exists in isolation and is potentially self-contained, or perhaps requires additional resources to provide readers with the full picture.

Also consider if all linked pages still exist. This provides a simple estimation of how outdated the content might be. If a number of pages were removed or moved but this page was not updated, it's either not perceived as part of the collection, or just badly outdated for another reason.
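The broken-link check can be approximated from a single API response. The sketch below assumes a MediaWiki Action API `action=query` response requested with `formatversion=2`, where `query.pages` is a list and nonexistent pages carry `"missing": true`. The function name is hypothetical, and the ratio is only a rough staleness signal.

```python
def missing_link_ratio(response):
    """Return the titles of missing linked pages and their share of all links."""
    pages = response.get("query", {}).get("pages", [])
    missing = sorted(page["title"] for page in pages if page.get("missing"))
    ratio = len(missing) / len(pages) if pages else 0.0
    return missing, ratio
```

A high ratio suggests that the page was not updated when its neighbours were moved or removed. It does not say whether the page is outside the collection or simply outdated.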

From the content side
This section contains a non-exhaustive list of factors to consider when evaluating page content:


 * Descriptive title
 * Immediately clear what is and what is not in scope of the page (to a reasonable extent)
 * Immediately clear what format you can expect from the page
 ** Concept description
 ** Tutorial/How-to guide
 ** FAQ
 ** Reference material
 ** Landing page
 * Introduction
 ** A brief paragraph introducing the subject of the page and perhaps providing some basic navigation? Consider this an extension of the title. Is the introductory text visible in search results?
 * Meaningful sections
 ** It is easy to know the contents of a given section
 ** It is possible to infer content of the next and previous section based on the content of the current section
 * Reasonable progression of useful content
 * Examples or concept descriptions

Grading criteria
This section provides simple grading criteria for collections. I was curious to see if we can assign a single number to a collection to represent its overall reliability as a source of information.

Note that this is very much not an exact science. It's based on my personal preferences - specifically how strict I am in my grading, and how comfortable I am in allowing these grades to, in turn, affect the other grades I assign. It's a feedback loop mechanism.

The grading system allows you to assign grades of 0-10 to each of five categories: completeness, correctness, readability, structure, and activity. The maximum total a collection can receive is 50 points, which, divided by 10, neatly gives the collection a grade between 0.0 and 5.0.

In a completely random system we could potentially expect averages around 5 for individual categories, and 2.5 for the total grade. However, in this case we can assume that content is written and designed for the purposes of fulfilling the grading criteria listed below. Nobody sets out to write incorrect, incomplete documentation. As a result, I expect a well-written collection to score around 7 in every category, with 3.5 being the expectation for the total score.

Any category score of 6 or below should be investigated, as should a total score below 3.5.
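The arithmetic can be written down as a short function. This is a minimal sketch: the category names come from the headings in this section, while the function name and the shape of the returned dictionary are assumptions made for illustration.

```python
CATEGORIES = ("completeness", "correctness", "readability", "structure", "activity")


def grade_collection(scores):
    """Combine five 0-10 category scores into a 0.0-5.0 collection grade."""
    for category in CATEGORIES:
        if category not in scores:
            raise ValueError(f"missing score for {category}")
        if not 0 <= scores[category] <= 10:
            raise ValueError(f"{category} must be between 0 and 10")

    # Maximum is 5 categories x 10 points = 50; dividing by 10 gives 0.0-5.0.
    total = sum(scores[category] for category in CATEGORIES) / 10

    # Flag single categories at 6 or below, and any total below 3.5.
    flagged = [category for category in CATEGORIES if scores[category] <= 6]
    investigate = bool(flagged) or total < 3.5
    return {"total": total, "flagged": flagged, "investigate": investigate}
```

For example, scores of 8, 8, 5, 7, and 7 give a total of 3.5, which meets the expectation overall, yet the readability score of 5 is still flagged. The total alone can hide a weak category, which is why both checks are kept.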

Completeness
Is there documentation that's missing?

Zero means no documentation is available.

Ten means all features and functionality are documented.

Correctness
Is the existing documentation correct and up to date? How many bugs are reported for the documentation? How many things are considered outdated?

Zero indicates everything is outdated or incorrect. Ten means documentation is updated with new project releases.

Readability
Is the documentation easy to consume?

Zero indicates that documentation is written in a way that makes it very difficult to understand.

Ten indicates that documentation style facilitates understanding. Additionally, documentation is accessible to users of assistive technologies, older hardware, and less stable internet connections. Support for translations could also be a prerequisite for the maximum score.

At least part of this evaluation could be automated using some form of a grammar linter or similar tool.

Structure
Is it easy to find what you are looking for?

Zero indicates that documentation is disorganized: it's present in multiple different places, in multiple different formats, and there is seemingly no pattern, process, or reasonable structure behind it. Finding content is extremely difficult if not impossible.

Ten indicates that documentation is stored in a single structure that can be easily understood. If you were to add documentation, you would immediately know where to put it.

Activity
Zero indicates that the documentation has not changed in X years, seems abandoned, and is not actively developed, especially if the project it’s describing has changed over the same period.

Ten means that the project’s updates translate to documentation changes.

= Conclusions =
Collections are an attempt to structure documentation in the Wikimedia ecosystem. They are a useful tool for documentarians - by clearly scoping our work and our evaluations of content we can better target our documentation efforts.

Collections will not directly or immediately grow our documentation maintainer community; that requires more active outreach. That said, in the long run, we can expect collections to make onboarding easier, which might benefit the documentarian community as well.

Constructing and evaluating collections is a relatively complex and largely subjective process. It requires time and conscious effort, and has many implications - from understandability and accessibility of the collected material, to its maintainability. I don’t think it’s possible for us to create materials that will make it easy for everyone to construct and evaluate collections. Any such materials would be too general or broad to be practically useful.

Documentation in general has many dependencies (e.g. development processes, release cycle, and the like) and produces many dependencies in turn (onboarding, software usage, translation processes, need for support). It is then no surprise that the only reasonable answer to any sufficiently general question about documentation will be “it depends”.