Documentation/Tools/Metrics generator

From mediawiki.org

This page explains how to use the Technical Documentation Metrics Generator tool to generate specialized metrics for lists of pages on mediawiki.org.

Basic functionality

This tool is intended for use by anyone who wants to understand and improve the quality of technical documentation pages on mediawiki.org. The tool combines raw data from multiple sources to calculate scores for two high-level metrics that are indicators of technical documentation quality: succinctness and developer relevance.

In addition to those metrics, the tool outputs some of the raw data that contributed to the scores, to help you identify potential documentation improvements.

For more technical and implementation details, see the app source code and metrics prototype documentation.

What the tool does not do

The tool does not:

  • Work with other wikis: This tool only generates metrics for documentation published on mediawiki.org (see phab:T398049).
  • Accurately measure landing page quality: The metrics are most relevant and useful for content pages. They generally don't accurately measure the quality of landing pages, or pages focused on navigation. The metrics may have mixed utility for analyzing reference pages.
  • Accurately measure non-English content: The tool is not meant for use with pages in languages other than English. While it can generate scores for pages in any language, they will likely be inaccurate, because the metrics calculations use benchmarks from pages written in English.
  • Assess non-technical documentation: The metrics are designed for technical content, and based on quality criteria specific to technical documentation.
  • Store results: The tool doesn't store metrics results, nor provide any way to cache or save the output.

Generate metrics

Pick your tool format

The metrics generator is available in two formats:

  • Web app - input pages and view results in your web browser. No prerequisites for use, no account required.
  • PAWS notebook - fork the notebook, then run it in your web browser. Requires a Wikimedia account to log in.

Choose pages to analyze

Warning: The tool may be extremely slow (or fail) if you input more than 25 pages at a time, because it combines page-level data from multiple APIs.
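
If your list is longer than 25 pages, one option is to split it into smaller batches and run the tool once per batch. The following is a minimal sketch and is not part of the tool: the `batch_titles` helper and the placeholder titles are hypothetical, and the batch size of 25 comes from the warning above.

```python
from typing import Iterator, List

MAX_PAGES_PER_RUN = 25  # limit suggested by the warning above


def batch_titles(titles: List[str], size: int = MAX_PAGES_PER_RUN) -> Iterator[List[str]]:
    """Yield consecutive slices of `titles`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


# Hypothetical input: 60 placeholder titles, to be analyzed in separate runs.
example_titles = [f"Example_page_{n}" for n in range(60)]
batches = list(batch_titles(example_titles))
print([len(batch) for batch in batches])
```

Because the tool doesn't store results, you would need to save the output of each run yourself before starting the next batch.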

Input the list of mediawiki.org page titles you want to analyze. Only input the page title, not the full URL.

How to format page input
Example page | Correct input format | Incorrect input format | Why
API:Main_page | API:Main_page | https://www.mediawiki.org/wiki/API:Main_page | Enter only the page title, not the full URL.
New_Developers | New_Developers | New Developers | Must include the underscore: use the URL-formatted title, not the display title.
New_Developers | New_Developers | New_developers | Redirect pages are not supported (the lower-case "d" in "developers" points to a redirect page).
New_Developers | New_Developers | Desarrolladores nuevos or New_Developers/es | Translated pages are not supported.
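
The first two rows of the table can be handled mechanically before you paste titles into the tool. The following is a rough sketch under stated assumptions: `normalize_title` is a hypothetical helper, not something the tool provides, and it assumes page URLs use the `/wiki/` path prefix shown in the table. It can't detect redirects or translated pages; you still need to check those by hand.

```python
from urllib.parse import unquote, urlparse

WIKI_PATH_PREFIX = "/wiki/"  # assumed URL path prefix, as in the table's example URL


def normalize_title(raw: str) -> str:
    """Convert a full URL or a display title into the underscore title form."""
    text = raw.strip()
    if text.startswith(("http://", "https://")):
        path = urlparse(text).path
        if path.startswith(WIKI_PATH_PREFIX):
            path = path[len(WIKI_PATH_PREFIX):]
        text = unquote(path)  # decode percent-encoded characters in the URL
    return text.replace(" ", "_")


print(normalize_title("https://www.mediawiki.org/wiki/API:Main_page"))
print(normalize_title("New Developers"))
```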

After you list your pages, click the 'Get metrics' button in the web app. Your metrics results display on a new page. If you're using PAWS, run all the cells in the notebook. The metrics results display as a table at the bottom of the notebook, below all the code cells.

Interpret and use the metrics output

This section explains the metrics outputs and how you can use them to improve technical documentation.

The tool displays metrics results as a table with highlighted scores for the two high-level metrics: succinctness and developer relevance. After each of those score columns, the results table displays additional data to help you understand what contributed to the scores:

Succinct score uses:
  • Length in bytes
  • Section to length ratio

Developer relevance score uses:
  • Technical content score:
    • Links to code
    • Code samples
  • Popularity score:
    • Incoming links
    • Visiting watcher percent
    • More than 1 edit in past 6 months
To get started improving the docs, try looking for pages that scored well for one metric, but poorly for another. Cells shaded darker in the results table indicate a worse score, so you should focus on those first.
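
If you copy the results into your own script, a small filter can surface the pages that scored well for one metric but poorly for the other. This is only a sketch: it assumes each score is reported as one of the four ratings this page describes (Needs improvement, May need work, Good, Great), and the row layout, key names, and page titles are hypothetical rather than the tool's actual output format.

```python
RATING_ORDER = ["Needs improvement", "May need work", "Good", "Great"]


def rating_gap(row: dict) -> int:
    """Return how many rating steps separate the two high-level scores."""
    succinct = RATING_ORDER.index(row["succinctness"])
    relevance = RATING_ORDER.index(row["developer_relevance"])
    return abs(succinct - relevance)


def mixed_score_pages(rows: list, min_gap: int = 2) -> list:
    """List page titles whose two ratings differ by at least `min_gap` steps."""
    return [row["page"] for row in rows if rating_gap(row) >= min_gap]


# Hypothetical results, typed in by hand for illustration.
sample_rows = [
    {"page": "Example_page_A", "succinctness": "Great", "developer_relevance": "Needs improvement"},
    {"page": "Example_page_B", "succinctness": "Good", "developer_relevance": "Great"},
    {"page": "Example_page_C", "succinctness": "May need work", "developer_relevance": "Needs improvement"},
]
print(mixed_score_pages(sample_rows))
```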

Understand succinctness score

The succinctness metric is a strong indicator of documentation usability. It reflects how easy pages are to skim, and whether they avoid walls of text. Pages that are succinct have an appropriate amount of content for their document type and audience, and they structure content to minimize the reader's cognitive burden.

Rating | What it means
Needs improvement | The page is probably too long, or it may be too short to be its own page. This depends on the type of page. If it's a long content page, it may need more structure to improve readability.
May need work | The page may be too long, or it may be too short. It may need more structure to improve readability.
Good | The page could still be improved, but is probably a reasonable length and/or has a good amount of structure.
Great | The page length is just right, and/or the page is well-structured.

Use succinctness data

For pages that have the lowest scores for the succinctness metric, look at the values for "length in bytes" and "section to length ratio". Use the following chart to identify actions you might take based on the values:

Data column | Value | How to interpret and use the data
Length in bytes | Less than 2500 | If the page is a content page (and not a landing page), it may be too short to be its own page. Try to verify if the page is still current. If so, could the information be combined with other pages? Is there missing information that should be on this page?
Length in bytes | Between 2500 and 7500 | If the page is a content page, it is probably a reasonable length. Check the section to length ratio to assess if it has enough structure.
Length in bytes | Over 7500 | If the page is a content page (and not a reference doc), it is probably too long. Look for ways to shorten it.
Section to length ratio | Less than 200 | If the page is a content page (and not a landing page), it may be too short, or have too much structure for the amount of content on the page. Check the page length value.
Section to length ratio | 200-2000 | If the page is a content page, it probably has a reasonable amount of structure.
Section to length ratio | Over 2000 | The page may not have enough structure for the amount of content being presented. Look for ways to improve page structure.
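
The thresholds in the table above can be expressed as two small lookups, which may help when you review many pages at once. This is a sketch, not the tool's scoring code: the function names are hypothetical, and treating the boundary values (2500, 7500, 200, 2000) as part of the middle band is an assumption, since the table doesn't say which band they belong to.

```python
def interpret_length(length_bytes: int) -> str:
    """Map a 'length in bytes' value to the matching row of the table."""
    if length_bytes < 2500:
        return "may be too short to be its own page"
    if length_bytes <= 7500:  # assumption: boundary values count as reasonable
        return "probably a reasonable length"
    return "probably too long"


def interpret_section_ratio(ratio: float) -> str:
    """Map a 'section to length ratio' value to the matching row of the table."""
    if ratio < 200:
        return "may be too short or have too much structure"
    if ratio <= 2000:  # assumption: boundary values count as reasonable
        return "probably a reasonable amount of structure"
    return "may not have enough structure"


print(interpret_length(9100))
print(interpret_section_ratio(150))
```

Remember that these readings apply to content pages; landing pages and reference docs often fall outside the bands for legitimate reasons.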
Shorten a page

To help you make tech docs shorter, try to find and remove these types of content:

  • Information that is "nice to know" but not absolutely necessary: Consider the type of document and its intended audience: what is the minimum amount of information the audience needs to get from this page? Use documentation templates to get ideas for what content to include in a given type of document.
  • Overwhelming lists of links: Check the number of links in the "See Also" or "Additional information" sections, if they are present. Are all the links really useful, or are they linking to duplicate, outdated, or unnecessary information? Try to limit links in these sections to 3-8 of the most relevant and related resources. Remove links to archived or obsolete pages, or put them in a separate section that is clearly labeled "Historical" or "Background information".
  • Duplicate information in prose and code samples: If the page contains code samples, is any of the information in the code samples duplicated in the text? If so, could one or the other be removed? Check the length of code samples: Do they include non-essential information that makes them longer than necessary? Could you make them more concise or break them into smaller sections?
  • Duplicate information on related pages: Use Special:WhatLinksHere to find pages that link to your page, and also follow the links your page provides to other pages. On those incoming and outgoing linked pages: look for sections that cover the same content. Then, attempt to determine which page location makes the most sense for the information. Consolidate information about the same topic or process in one place, and replace duplicated information with cross-references or transcluded content instead.
Improve page structure

To align your page structure with best practices, check the following:

  • Quantity and depth of headings:
    • Could any sections benefit from more (or fewer) subheadings? Generally, you should aim for at least one heading per 15 sentences.
    • Is the page structure too deep? Generally, headings should not go below ==== Heading 4 ==== (or "Subheading 2" in VisualEditor).
  • Avoid large, undifferentiated sections of text:
    • Look for opportunities to use formatting or templates to present different types of information in a way that also provides visual differentiation and breaks up the content on the page.
    • If the page is a landing page or navigation-focused page: use layout templates to create meaningful groupings of related links.
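
To check heading depth without reading the whole page, you can scan the page's wikitext for heading lines. The following is a rough sketch: the helper names and the sample wikitext are hypothetical, and the regular expression only recognizes headings written with balanced equals signs on their own line (it ignores headings produced by templates or HTML tags).

```python
import re

# Matches lines such as "=== Title ===" and captures the equals signs and the title.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
MAX_HEADING_LEVEL = 4  # "==== Heading 4 ====" is the deepest recommended level


def list_headings(wikitext: str) -> list:
    """Return (level, title) pairs for each heading line in the wikitext."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(wikitext)]


def headings_too_deep(wikitext: str) -> list:
    """Return the titles of headings nested below the recommended level."""
    return [title for level, title in list_headings(wikitext) if level > MAX_HEADING_LEVEL]


sample_wikitext = """== Setup ==
Intro text.
=== Install ===
==== Options ====
===== Edge cases =====
"""
print(list_headings(sample_wikitext))
print(headings_too_deep(sample_wikitext))
```

This sketch doesn't count sentences, so you still need to judge by eye whether a section has roughly one heading per 15 sentences.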


Understand developer relevance score

The developer relevance metric captures whether technical documents include information relevant for developer audiences. The score is a strong indicator of relevance, and a mild indicator of findability.

This metric combines popularity signals with technical content signals focusing on links to code repositories, and the presence of code samples. By combining these two types of signals, this metric seeks to provide a more balanced measure of relevance than the individual inputs provide separately.

For pages with a low overall developer relevance score:

  • Look at the technical content score and popularity score for the page: is the low overall score coming entirely from one of those scores?
  • Consult the table and sections below for more specific guidance for each of the raw data values included in the technical content and popularity scores.

Rating | What it means
Needs improvement | The page scored very low on either or both of the technical content or popularity scores. If this is a content page, check if it's still valid. Check if it needs more technical content, or if you can improve its findability.
May need work | Check the technical content and popularity scores to see which scored lowest, then look for ways to improve either the content, findability, or both.
Good | The page may have scored really well in technical content or popularity, or it may have mid-range scores for both of those areas. For content pages: check which scored lowest and look for improvements to make in that area.
Great | The page scored well for both technical content and popularity. The page is probably findable, relatively up-to-date, and contains code samples and/or links to code.

Interpret technical content scores

The technical content score attempts to capture how relevant a page is likely to be for technical audiences, based solely on its content. The input data comes from a simple parsing of the wikitext to look for links to code repositories and/or code samples. This is a very noisy and inaccurate way to check for technical content, but it was the best we could do with the resources and data available. To learn about other data signals we considered, see the v0 metrics Assessment.
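
To get a feel for what this kind of simple wikitext parsing involves, here is a minimal sketch. It is not the tool's implementation: the patterns, function names, and sample text are assumptions chosen for illustration. It counts external URLs that mention GitLab, Gerrit, or GitHub, and treats a few common tags as evidence of a code sample.

```python
import re

# Assumption: a "link to code" is any external URL containing one of these host names.
CODE_HOST_RE = re.compile(
    r"https?://[^\s\]|]*(?:gitlab|gerrit|github)[^\s\]|]*", re.IGNORECASE
)
# Assumption: these tags are how code samples are usually marked up in wikitext.
CODE_TAG_RE = re.compile(r"<(?:syntaxhighlight|source|pre|code)\b", re.IGNORECASE)


def count_links_to_code(wikitext: str) -> int:
    """Count external URLs that appear to point at a code hosting service."""
    return len(CODE_HOST_RE.findall(wikitext))


def has_code_sample(wikitext: str) -> bool:
    """Report whether the wikitext contains a tag commonly used for code."""
    return CODE_TAG_RE.search(wikitext) is not None


sample_wikitext = (
    "See [https://gerrit.wikimedia.org/r/q/project:mediawiki/core Gerrit] and "
    "https://github.com/wikimedia/mediawiki for the source.\n"
    "<syntaxhighlight lang=\"php\">echo 'hi';</syntaxhighlight>"
)
print(count_links_to_code(sample_wikitext), has_code_sample(sample_wikitext))
```

A check like this shows why the signal is noisy: it misses interwiki-style links and code added through templates, and it over-counts URLs that merely contain one of the host names.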

Data column Value How to interpret and use the data
Links to code 1 or more
  • The value is a total of the links to GitLab, Gerrit, or GitHub on the page.
  • Pages with very high values are likely to be reference documents, so it's probably more useful to focus on pages in the mid- and lower-range values.
  • Assess the links to code.
Links to code 0
  • No links to GitLab, Gerrit, or GitHub were found on the page. This isn't necessarily bad: many types of technical documentation have no reason to link directly to a repository or code file.
  • If the page is about a specific technical product, component, or software, it's more likely to need links to a repository. In that case, assess the links to code.
Code samples True
Code samples False
  • No code samples were found on the page. This isn't necessarily bad: many types of technical documentation have no reason to include code samples, or they may link to code (or samples) stored off-wiki instead.
  • Assess whether the page should have code samples.

Assess code samples

If the page does not contain any code samples: consider the type of document, its purpose, and audience.

  • Navigation pages or overviews should generally not contain code samples. A low technical content score for those types of pages is okay!
  • Technical user guides that include development tasks like using a command-line tool, querying a database, or interacting with an API should probably contain code samples.
  • Tutorials should probably contain code samples, though this depends on the topic.

If the page does contain code samples, check the following:

  • Are the code samples correct, or have they diverged from the code in production?
  • Do they use or function with the most recent stable version of the language (for example: Python 3 compatible)?
  • Do they follow relevant coding conventions?
  • Do they use descriptive class, method, and variable names?
  • Do they use proper formatting to distinguish code from other content?

Assess links to code

If the page does not contain links to code:

If the page does contain links to code:

  • Check if the links point to active repositories and valid code files.
  • If the link points to a specific line in a file: does the link reference a specific commit, to ensure the link remains valid even if the relevant line number changes as the code in that file evolves over time?
  • Follow tips for creating permanent links to files and repositories.
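
One way to spot links that aren't pinned to a commit is to look for a commit hash in the URL path. The sketch below is a heuristic only: the function name is hypothetical, the example URLs are illustrative, and the hash is a placeholder rather than a real commit. It assumes a commit appears as a path segment of 7 to 40 hexadecimal characters, which can also match an ordinary directory name made of those characters.

```python
import re
from urllib.parse import urlsplit

# Assumption: a path segment of 7 to 40 hex characters is a commit hash.
COMMIT_SEGMENT_RE = re.compile(r"/[0-9a-f]{7,40}(?=/|$)", re.IGNORECASE)


def references_commit(url: str) -> bool:
    """Report whether the URL path appears to contain a commit hash segment."""
    path = urlsplit(url).path  # drops the "#L10" fragment and any query string
    return COMMIT_SEGMENT_RE.search(path) is not None


branch_link = "https://github.com/wikimedia/mediawiki/blob/master/includes/Setup.php#L10"
pinned_link = (
    "https://github.com/wikimedia/mediawiki/blob/"
    "0123456789abcdef0123456789abcdef01234567/includes/Setup.php#L10"  # placeholder hash
)
print(references_commit(branch_link), references_commit(pinned_link))
```

Links that fail this check and point to a specific line are the ones most likely to drift as the file changes.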

Interpret popularity scores

The popularity score attempts to capture how relevant a page appears to be based on traffic, incoming links, and revisions. Because the findability of a page can impact its pageviews, raw pageviews aren't always the most reliable measure of how relevant or useful the page would be to people who can find it. So, this metric uses percentage of page watchers that visited the page instead of raw pageviews. We think we can get a clearer signal of content relevance by measuring how often people who are clearly aware of a page's existence actually visit it. The assumption here is: if a significant percentage of page watchers are viewing the page, and/or if it has been edited in the past 6 months, and many pages link to it, the page is more likely to contain relevant content than other pages that don't have those signals.
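
As a worked example of the watcher-based signal, the sketch below computes a visiting watcher percentage from two counts. The tool's exact formula isn't stated here, so this assumes the percentage is visiting watchers divided by total watchers. It also assumes the counts resemble what the MediaWiki Action API reports through `prop=info` with `inprop=watchers|visitingwatchers`, which omits them for pages with too few watchers. The function name is hypothetical.

```python
from typing import Optional


def visiting_watcher_percent(
    watchers: Optional[int], visiting_watchers: Optional[int]
) -> Optional[float]:
    """Return the share of watchers who visited the page, or None if unavailable."""
    if not watchers or visiting_watchers is None:
        # Missing or zero counts: no percentage can be computed.
        return None
    return round(100.0 * visiting_watchers / watchers, 1)


print(visiting_watcher_percent(40, 10))
print(visiting_watcher_percent(None, None))
```

A result of `None` corresponds to the "None (NaN, null)" row in the table below.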

Doc data element Value How to interpret and use the data
Incoming links Less than 10
  • The page has a low number of incoming links. If you can easily determine that the page is no longer relevant, mark it as historical or archived. If the page seems to still be relevant, consider ways to improve page findability. Or, consider whether its information could instead be combined with other, more popular pages.
Incoming links 10-20
  • The page has a below average[1] number of incoming links, but that could be okay: there are many valid reasons for this. If you can easily determine that the page is no longer relevant, mark it as historical or archived, and try to add a redirect to a more relevant page. If the page seems to still be relevant, consider ways to improve page findability. Or, consider whether its information could instead be combined with other, more popular pages.
Incoming links 20-50
  • The page has an average or above average number of incoming links. Consider ways to improve page findability. Or, consider whether its information could instead be combined with other, even more popular pages.
  • If it doesn't make sense to consolidate page content, then investigate ways to improve content relevance.
Incoming links 50-100
  • This page has an above average number of incoming links. Focus on making its content as excellent as possible.
  • Look for pages that link to the page, but contain duplicate information, or content that you could move into the more popular page. Try to consolidate information without making any single page too long or overwhelming (see succinctness guidelines).
  • After consolidating information, deprecate or archive less popular pages, and add a redirect to your page. This reduces information sharding and also makes the already-popular page even more findable and useful.
Incoming links More than 100
Visiting watcher percent None (NaN, null)
  • The page has fewer than 30 watchers, so data is not available.
  • Check if this page is still valid and useful, or whether it may need to be marked as historical or archived.
  • If it is still valid and useful, look for ways to improve its findability.
Visiting watcher percent 0-25%
  • The page has a small or below average[2] amount of visiting watchers. This is good because it's better than none!
  • Consider trying to improve its findability.
Visiting watcher percent 25-75%
  • The page has an average or above average amount of visiting watchers.
  • Consider potential ways to improve its findability, but also focus on the content: if it were more relevant, maybe the page would become even more popular?
Visiting watcher percent Above 75%
  • The page is frequently visited (and hopefully also maintained) by the subset of people who know about it.
  • Look at other doc data elements to see if findability could be increased, and also focus on making the page content as useful and high-quality as possible: use Documentation templates and linting tools to guide you.
More than 1 edit in past 6mo Yes
  • The page had at least 1 edit in the past 6 months. However, that may not mean it is actively maintained, and you can't assume that even a high number of revisions means the page is fully accurate and complete.
  • This data element is mostly useful for differentiating which pages to focus on if they scored low in other categories.
More than 1 edit in past 6mo No
  • The page has not been edited in the past 6 months. However, that may not mean it's incorrect or no longer relevant.
  • Check whether it should be marked historical or archived, but then look at other data signals to decide whether to focus more on improving the content or improving the findability.

Improve page findability

If pages have a good technical content score but a low popularity score, try to improve their findability. Note that landing pages, portal pages, or other navigation-focused pages may have a low technical content score but a high popularity score; this is okay. These metrics are intended more for content pages like user guides, overviews, and tutorials.

If a page serves a special purpose or covers a niche topic, it may be okay if it has very few incoming links. However, if the page contains information relevant for a wide audience, look for the following to try to improve the content's findability:

  • Clear and concise page title: The title should be descriptive and specific. This helps searchers decide whether to click links to the page. Also, on-wiki search sends users directly to a page if it has the same title as their search query, so the more the page title aligns with users' language, the more likely they are to find it. Example: "Accessing Instances on Cloud VPS" is a more descriptive and specific page title than "Instances".
  • Short introduction: Include a short introduction as the first text on the page following the title, before the first heading. This should briefly introduce the purpose of the page, its audience, and topic. Intro section content often appears as a snippet in search results, so it helps searchers assess the relevance of a page for their need.
  • Shallow page nesting (subpaging): Avoid creating subpages deeper than 3 levels. This can make it harder to find pages by browsing. Also, on-wiki search results display the entire page path, so more shallow subpaging makes it easier to read page titles in search results.
  • Templates: Check for templates like {{Historical}} or {{Archive}}. Depending on the wiki's search configuration, these templates can cause a page to be ranked lower in search results. If the page should be historical or archived, this is a good thing.
  • Navigation menus: Is the page part of a collection of pages that use a navigation menu? If so, is the page included in that menu? Should it be?
  • Cross-references to and from code: if this is a page for a specific product or technology: are there links in both directions between code and on-wiki docs? Specifically check for the following:
    • Does the wiki landing page for the product/technology include links to the relevant code repositories and issue trackers?
    • Is there a README or documentation stored in the root directory that includes a link to the on-wiki documentation?
    • If the page relates to a tool or component that has a web UI: are there links in the tool/web UI to the on-wiki documentation?
  • Categories: Is this page part of a thematic area in which other pages use Categories? If so, check the definition of the Category and consider adding it to your page.
  • Links from related, popular pages: Use Special:WhatLinksHere to identify incoming links. If there are none: are there pages that should link to this page? On-wiki search sorts suggestions by the number of incoming links, so pages that have no incoming links may appear lower in results lists.
    • Consider which other pages users might encounter when trying to find the information on your page. Try searching on the wiki for keywords related to your page, and explore the pages at the top of the search results.
    • When you find related pages, consider how the reader of that page may benefit from your page. Under what conditions would they need the information on your page?
    • If you find a more prominent and well-known page that could be a better home for your content, consider consolidating your page's content into the more popular page, and adding a redirect or cross-reference. This process helps consolidate useful and relevant information into already heavily-used pages, and reduce the number of pages users must consult to get valuable information.
Look for meaningful places to add links to your page from related pages. Avoid creating long lists of links in See Also sections; this makes it more difficult for readers to understand the relevance of each link in the list.
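
The subpage nesting guideline in the list above is easy to check for a whole list of titles. This sketch assumes that nesting depth means the number of "/" separators in the title, which is one reasonable reading of "deeper than 3 levels". The function names and the second example title are hypothetical.

```python
MAX_SUBPAGE_DEPTH = 3  # from the "Shallow page nesting" guideline above


def subpage_depth(title: str) -> int:
    """Count the "/" separators in a page title, ignoring leading or trailing ones."""
    return title.strip("/").count("/")


def is_nested_too_deep(title: str) -> bool:
    """Report whether a title is nested more deeply than the guideline suggests."""
    return subpage_depth(title) > MAX_SUBPAGE_DEPTH


for title in ["Documentation/Tools/Metrics_generator", "Example/A/B/C/D"]:
    print(title, subpage_depth(title), is_nested_too_deep(title))
```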


Improve content relevance

If pages have a good popularity score, but low technical content score:

  • Look for missing information:
    • Before you try to expand the content on a page, check if a different, more popular page already covers the information. Would it be better to consolidate information than to add more content to this page?
    • Read comments on the Talk page to identify missing information. But proceed with caution, since comments may be outdated. If you're not a subject matter expert in the domain this page covers, you may have to get input from a collaborator who has the expertise to identify missing or inaccurate information.

Troubleshooting

KeyError: 'parse'

Verify that page inputs match the requirements. Check that each page title in your list is enclosed in quotes, and followed by a comma.
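
The sketch below shows the list syntax described above, plus a defensive lookup that illustrates one plausible cause of this error: an API response that has no `parse` key. The MediaWiki Action API returns an `error` object instead when it can't parse a title, for example when the page doesn't exist. The variable name, the function, and the sample responses are hypothetical and are not taken from the notebook.

```python
# Each title is enclosed in quotes and followed by a comma.
pages = [
    "API:Main_page",
    "New_Developers",
]


def get_parse_section(response: dict, title: str) -> dict:
    """Return response["parse"], or raise a clearer error than a bare KeyError."""
    if "parse" not in response:
        info = response.get("error", {}).get("info", "no further details")
        raise ValueError(f"Could not parse {title!r}: {info}")
    return response["parse"]


# Hand-written samples shaped like Action API responses, for illustration only.
ok_response = {"parse": {"title": "API:Main page", "wikitext": "..."}}
failed_response = {
    "error": {"code": "missingtitle", "info": "The page you specified doesn't exist."}
}

print(get_parse_section(ok_response, pages[0])["title"])
try:
    get_parse_section(failed_response, "Not_a_real_page")
except ValueError as err:
    print(err)
```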

Button to get metrics in web app does nothing

This error appears intermittently in Firefox, but not for all users, and is hard to reproduce. Usually, if you click the button several more times, it will eventually work. If that fails, you could try a different browser, or use the PAWS notebook instead. Please also add a comment to T396384, since more information may help us determine the cause.

References

  1. Benchmarks from test dataset - Incoming links: Average: 31.85; Median: 17.5; Min: 0; Max: 271.
  2. Benchmarks from test dataset - Visiting watcher percent: Average: 36%; Median: 25%.