Wikimedia Technical Documentation Team/Doc metrics/Prototype

This page explains the data sources and calculations implemented in the documentation metrics generator tool.

Quick reference: doc data used in metrics

The list below summarizes the information explained in more detail in the following sections:

  • Succinctness: Section count (Content signal; XTools API; benchmarks + buckets).
  • Succinctness: Page size in bytes (Content signal; Action API; benchmarks + buckets).
  • Developer relevance: Percentage of page watchers who visited in the last 30 days (Traffic/edits (popularity) signal; Action API; benchmarks + buckets). Pages with fewer than 30 watchers won't show data.
  • Developer relevance: More than one edit in the last 6 months? (Traffic/edits (popularity) signal; Analytics API; buckets only, no benchmarks needed).
  • Developer relevance: Links to code repos from wiki pages (Content relevance signal; Action API; buckets only, no benchmarks needed). Messy data: will include links to repos for dependencies or upstreams. Checks for links to Gerrit, GitHub, and GitLab in wikitext.
  • Developer relevance: Presence of code samples on page (Content relevance signal; Action API; buckets only, no benchmarks needed). Ignores the nuance of whether the page could/should have code samples. Checks for classes used in code snippets in wikitext: "wt-codesample-wrapper" or "mw-highlight-lang".
  • Developer relevance: Incoming links from the same wiki (Content relevance signal; XTools API; benchmarks + buckets). Prototype implementation excludes transclusions and redirects, but includes translation pages.

How the doc metrics generator works

Gather and combine raw doc data from APIs

The script uses the Action API Info module to get three pieces of data:

  • "length": Page length in bytes, used in Succinctness metric.
  • "watchers" and "visitingwatchers": Used to calculate the percentage of watchers who visited the page in the past 30 days ("visitingwatcherpercent"), which is then used in the Developer relevance metric. If a page has fewer than 30 watchers, the API returns no watcher data for it.
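The calls above can be sketched as follows. This is a minimal illustration, not the script itself: the query parameters follow the Action API info module, while the helper name and the 0-100 percentage scale are assumptions based on the description above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.mediawiki.org/w/api.php"

def fetch_page_info(title):
    """Fetch page length and watcher counts via the Action API info module."""
    params = urlencode({
        "action": "query",
        "prop": "info",
        "inprop": "watchers|visitingwatchers",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    })
    with urlopen(f"{API_URL}?{params}", timeout=30) as resp:
        return json.load(resp)["query"]["pages"][0]

def visiting_watcher_percent(watchers, visitingwatchers):
    """Return None when the API withheld the data (fewer than 30 watchers)."""
    if not watchers:
        return None
    return round(100 * visitingwatchers / watchers, 2)
```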

Then, the script uses the XTools API Page Links endpoint to get three pieces of data:

  • "number_of_sections": Always at least one section. Used in the Succinctness metric along with page length, to calculate a ratio reflecting how structured the page is.
  • "incominglinks" and "redirects": Used to calculate "incominglinksnoredirects", the number of incoming links to the page from other mediawiki.org pages. Excludes transclusions and redirects, but includes translation pages. Used in the Developer relevance metric.
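The derived link count can be sketched like this. The endpoint URL is an assumption modeled on the XTools Page API layout; only the subtraction of redirects mirrors the description above.

```python
import json
from urllib.request import urlopen

# Assumed endpoint shape; see the XTools API docs for the exact path.
XTOOLS_LINKS = "https://xtools.wmcloud.org/api/page/links/www.mediawiki.org/{title}"

def fetch_link_data(title):
    """Fetch section and link counts for one page from XTools."""
    with urlopen(XTOOLS_LINKS.format(title=title), timeout=30) as resp:
        return json.load(resp)

def incoming_links_no_redirects(incominglinks, redirects):
    """Incoming links from the same wiki, with redirect links subtracted."""
    return max(incominglinks - redirects, 0)
```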

Next, the script uses the Analytics API Edits endpoint to get a single datapoint indicating whether the page had any edits in the past 6 months:

  • "sixmonth_edits": Calculated based on the time range starting six months before the current date (the date when the script is run). The API returns a 404 status if no edits occurred during the requested time range. Used in the Developer relevance metric.
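The six-month check can be sketched as below. The URL follows the Wikimedia Analytics REST API edits per-page route; the response parsing and the 182-day approximation of six months are assumptions.

```python
import json
from datetime import date, timedelta
from urllib.error import HTTPError
from urllib.request import urlopen

EDITS_URL = ("https://wikimedia.org/api/rest_v1/metrics/edits/per-page/"
             "mediawiki.org/{title}/all-editor-types/monthly/{start}/{end}")

def six_month_window(today):
    """Start/end stamps (YYYYMMDD) for the six months before the run date."""
    start = today - timedelta(days=182)  # ~6 months; exact rule is assumed
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")

def sixmonth_edits(title, today=None):
    """True if the page had any edits in the window; a 404 means no edits."""
    start, end = six_month_window(today or date.today())
    url = EDITS_URL.format(title=title.replace(" ", "_"), start=start, end=end)
    try:
        with urlopen(url, timeout=30) as resp:
            data = json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return False
        raise
    # Parsing below is an assumption about the API's JSON response shape.
    return sum(m["edits"] for m in data["items"][0]["results"]) > 0
```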

Finally, the script uses the Action API to parse wikitext, and check for the presence of two types of content relevant for technical audiences:

  • "code_samples_in_content": Checks for classes used in code snippets. Used in Developer relevance metric.
  • "links_to_gerrit", "links_to_github", and "links_to_gitlab": Checks for links to the primary code repositories used by Wikimedia projects. Used in Developer relevance metric.

These last signals are noisy: they may contain links to upstream code, or snippets that no longer work. These constraints are further discussed in the v0 metrics test assessment.
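These checks can be sketched as simple pattern matches over the parsed content. The class names come from the list above; the repository URL patterns are assumptions about how the links are detected.

```python
import re

CODE_CLASSES = ("wt-codesample-wrapper", "mw-highlight-lang")

# Hypothetical host patterns; the script's actual matching rules may differ.
REPO_PATTERNS = {
    "links_to_gerrit": re.compile(r"gerrit\.wikimedia\.org", re.I),
    "links_to_github": re.compile(r"github\.com", re.I),
    "links_to_gitlab": re.compile(r"gitlab\.wikimedia\.org", re.I),
}

def content_signals(parsed_content):
    """Count repo links and flag code samples in a page's parsed content."""
    signals = {
        "code_samples_in_content": any(c in parsed_content for c in CODE_CLASSES)
    }
    for name, pattern in REPO_PATTERNS.items():
        signals[name] = len(pattern.findall(parsed_content))
    return signals
```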

Calculate metrics scores

Succinctness metric outputs

  • Succinct score: the overall score for this metric. Lower score means a page is less succinct (this is bad). The overall score is calculated by adding together the scores for "Length in bytes" and "Section to length ratio". Those sub-scores are not returned as output; instead, the output includes the actual values for those data elements, since that is more useful information to guide documentation work.
    • Length in bytes: The bucket min/max values are based on benchmarks from the v0 metrics testing. Warning: page length in bytes is not always an accurate reflection of actual, rendered page length: templates can include content that makes pages very long to a reader, but shorter in byte size.
    • Section to length ratio: Calculated from the number of sections and the page length. The ratio itself (not the score it generates) is returned in the results output, because this is more useful data to guide documentation work. The ratio is scored against benchmark data, with ranges of values placed into buckets that receive a score based on whether they reflect a dense, unstructured page vs. a more succinct, more structured one.

The list below summarizes how the scores are calculated. To see the math behind this, look at the calculate_succinct function in the script.

How Succinctness score is calculated:

  • Page length in bytes (min/max possible score: 10-50). If length is:
    • 0 - 500: score +10
    • 500 - 2500: score +30
    • 2500 - 5000: score +50
    • 5000 - 7500: score +40
    • 7500 - 10000: score +30
    • Over 15000: score +10
  • Section to length ratio (min/max possible score: 10-50). Every page has at least 1 section. If page length in bytes divided by number of sections is:
    • 1 - 200: score +20
    • 200 - 1400: score +50
    • 1400 - 2000: score +30
    • 2000 - 5000: score +10
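The scoring can be sketched as a single function. This is an illustration, not the script's calculate_succinct itself: boundary handling at bucket edges, the unscored 10000-15000 byte range (folded into the final bucket here), and computing the ratio as bytes per section are all assumptions.

```python
def calculate_succinct(length_bytes, num_sections):
    """Return (succinct score, section-to-length ratio) per the buckets above."""
    if length_bytes <= 500:
        length_score = 10
    elif length_bytes <= 2500:
        length_score = 30
    elif length_bytes <= 5000:
        length_score = 50
    elif length_bytes <= 7500:
        length_score = 40
    elif length_bytes <= 10000:
        length_score = 30
    else:
        length_score = 10  # assumption: 10000-15000 scored like "over 15000"
    ratio = length_bytes / num_sections  # every page has at least 1 section
    if ratio <= 200:
        ratio_score = 20
    elif ratio <= 1400:
        ratio_score = 50
    elif ratio <= 2000:
        ratio_score = 30
    else:
        ratio_score = 10
    return length_score + ratio_score, round(ratio, 2)
```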

Developer relevance metric outputs

Developer relevance score: the overall score for this metric. Lower score is worse. The overall score is calculated by adding together two sub-scores:

  • Technical content score: the sub-score for technical content signals. Lower score is worse. The raw values for these input signals are included in the metrics output to help guide documentation work:
    • Links to code: The number of links to code repositories found by parsing the content.
    • Code samples: TRUE/FALSE, whether code samples were detected on the page.
  • Popularity score: the sub-score for popularity (revisions/traffic) signals. Lower score is worse. The raw values for these input signals are included in the metrics output to help guide documentation work:
    • Incoming links: raw number of incoming links
    • Visiting watcher percent: rounded to two decimal places; calculated based on watchers and visiting watchers.
    • More than 1 edit in past 6mo: boolean (1 or 0); 0 indicates the page had no edits in the past 6 months. The time range is calculated from the date the script is run.

The list below summarizes how the scores are calculated. To see the specific implementation, look at the calculate_devrelevance function in the script.

How Developer Relevance score is calculated:

  • Links to code repos? (possible score: 0 or 50)
    • Any links to GitLab, Gerrit, or GitHub: Technical content score +50
    • No links to GitLab, Gerrit, or GitHub: Technical content score +0
  • Code samples on page? (possible score: 0 or 50)
    • Any code samples on page: Technical content score +50
    • No code samples on page: Technical content score +0
  • Total Technical content score: 0, 50, or 100
  • Incoming links (possible score: 0-40)
    • Fewer than 10 links: no score increase
    • More than 10 but fewer than 20: Popularity score +10
    • More than 20 but fewer than 50: Popularity score +20
    • More than 50 but fewer than 100: Popularity score +30
    • More than 100: Popularity score +40
  • Visiting watcher percent (possible score: 0-30)
    • Null value (fewer than 30 watchers, so no data): no score increase
    • 0-25% of page watchers visited in last 30 days: Popularity score +10
    • 25-75% of page watchers visited in last 30 days: Popularity score +20
    • More than 75% of page watchers visited in last 30 days: Popularity score +30
  • More than 1 edit in past 6 months? (possible score: 0 or 30)
    • If true: Popularity score +30
  • Total Popularity score: 0-100
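A sketch consistent with the scoring above; the treatment of exact bucket boundaries is an assumption, since the ranges overlap at their edges.

```python
def calculate_devrelevance(repo_links, has_code_samples, incoming_links,
                           visiting_watcher_percent, edited_past_6mo):
    """Sum the Technical content and Popularity sub-scores described above."""
    technical = (50 if repo_links else 0) + (50 if has_code_samples else 0)
    popularity = 0
    if incoming_links > 100:
        popularity += 40
    elif incoming_links > 50:
        popularity += 30
    elif incoming_links > 20:
        popularity += 20
    elif incoming_links > 10:
        popularity += 10
    if visiting_watcher_percent is not None:  # None: fewer than 30 watchers
        if visiting_watcher_percent > 75:
            popularity += 30
        elif visiting_watcher_percent > 25:
            popularity += 20
        else:
            popularity += 10
    if edited_past_6mo:
        popularity += 30
    return technical + popularity
```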

See also

Benchmarks from test data

Our test dataset included 140 pages from mediawiki.org. For the data elements listed below, the value distributions from the test dataset helped define the buckets that the metrics prototype uses for scoring. For full details of the metrics testing, see Doc metrics/v0#Outcomes of metrics testing.

  • Incoming links: Average: 31.85; Median: 17.5; Min: 0; Max: 271
  • Visiting watcher percent: Average 36%; Median 25%.