Page Content Service/References

The references endpoint provides a structured output of reference lists found on a particular page (phab:T170690).

Structure
In the examples below, HTML and JSON are formatted for easier reading. Some attributes (certain ids and data-mw) are removed for the same reason.

The output contains two main objects:
 * reference_lists: a list of reference lists, useful for building a reference list UI. In the future it could also carry information about sections that were omitted because they became empty once the reference list was stripped.
 * references_by_id: a map from reference id to reference details

The reference_lists object
The reference_lists object is included so that clients can build a native view of reference lists. It currently contains only reference lists. In the future other things could be added as well, such as information about section headings and other text (HTML) objects. The latter might be needed if a section towards the end of the article becomes empty after a reference list is stripped from it. See T170690#3467608.

The references_by_id object
The references_by_id object contains a hash of reference details. Each references_by_id entry has an array of back_links and the HTML content, and some entries have an optional array of citation decorations.
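
To illustrate how the two objects fit together, here is a minimal Python sketch of a client that walks reference_lists and looks up each entry in references_by_id. Only the names reference_lists, references_by_id and back_links come from this page. Every other key (id, order, href, text, content, html, type), the sample ids, and the helper render_reference_lists are assumptions made for illustration and may not match the actual output.

```python
# Hypothetical response, shaped after the description above. Key names other
# than reference_lists, references_by_id and back_links are assumptions.
response = {
    "reference_lists": [
        {"id": "list-1", "order": ["cite_note-1"]},
    ],
    "references_by_id": {
        "cite_note-1": {
            "back_links": [{"href": "./Foo#cite_ref-1", "text": "1"}],
            "content": {"html": "Bar.", "type": "generic"},
        },
    },
}


def render_reference_lists(response):
    """Return, per reference list, rows of (id, backlink count, html, type)."""
    by_id = response["references_by_id"]
    rendered = []
    for ref_list in response["reference_lists"]:
        rows = []
        for ref_id in ref_list["order"]:
            ref = by_id[ref_id]  # details are stored once, keyed by id
            content = ref["content"]
            rows.append(
                (ref_id, len(ref["back_links"]), content["html"], content["type"])
            )
        rendered.append(rows)
    return rendered
```

Keeping the details in a map means a reference list only has to carry ids, so a reference with several backlinks is stored once.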

Example 1: one ref list, one ref
{{markup
 * Bar.

Example 2: two simple ref lists, one ref with two backlinks
{{markup
 * Foo . Bar.

Example 3: Various citation types
Now let's look at more complex reference content. Reference content can have cite elements, which lead to various entries in the citation array in the output. Citation decorations usually have one or more of the following values: book, journal, news, web. The citation decorations are derived from any cite elements in the HTML content.

{{markup
 * There are nearly 20,000 known species of bees in seven recognized biological families.

TODOs

 * [x] backlinks: make them objects with the backlink href and the link content (T182647)
 * [x] citations becomes type, with just a single (enum) value: one of "web", "news", "journal", "book", "generic". (T182652)

Decisions

 * Citations:
 * Cite elements usually have some kind of type indicator in the class list, like "citation web" or "citation book".
 * We show only a single value in the type field. If the reference content contains exactly one cite tag, or several cite tags that all have the same value, we show that value. If there are no cite tags, or several cite tags with different values, we show "generic".
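
The decision above can be sketched in Python as follows. This is not the service's implementation, only an illustration of the rule using the standard library's html.parser. The names CiteCollector, cite_type and reference_type are hypothetical. One extra assumption: a cite element with no recognized type class counts as "generic" on its own, so mixing it with a typed cite also yields "generic".

```python
from html.parser import HTMLParser

KNOWN_TYPES = ("web", "news", "journal", "book")


class CiteCollector(HTMLParser):
    """Collects the class list of every <cite> element in a fragment."""

    def __init__(self):
        super().__init__()
        self.cite_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            classes = dict(attrs).get("class") or ""
            self.cite_classes.append(classes.split())


def cite_type(classes):
    """Type of a single cite element (assumption: no known class -> generic)."""
    for name in classes:
        if name in KNOWN_TYPES:
            return name
    return "generic"


def reference_type(html):
    """Collapse all cite types in the reference content to one value."""
    collector = CiteCollector()
    collector.feed(html)
    collector.close()
    types = {cite_type(classes) for classes in collector.cite_classes}
    # Exactly one distinct value: show it. None or several: "generic".
    return types.pop() if len(types) == 1 else "generic"
```

For example, reference content with two cite elements that both carry class="citation book" resolves to "book", while a mix of "citation book" and "citation web" resolves to "generic".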

Open questions
 * Can be added later: Should we keep the bibliographic metadata? See also the COinS syntax, a machine-readable format for bibliographic metadata. COinS elements appear right after cite elements and are not displayed (see style="display:none;"). Historically, MCS has stripped them from mobile-sections (by removing elements matching 'span.Z3988'). Currently the new references endpoint strips them as well.
 * Examples: look above for class="Z3988".
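
As an illustration of what stripping 'span.Z3988' elements amounts to, here is a small Python sketch built on the standard library's html.parser. It is not how MCS does it. The names CoinsStripper and strip_coins are hypothetical, and the sample markup in the usage note is invented. The sketch drops HTML comments and assumes well-formed, properly nested span elements.

```python
from html.parser import HTMLParser


class CoinsStripper(HTMLParser):
    """Re-emits an HTML fragment without span.Z3988 elements."""

    def __init__(self):
        # Keep entities such as &nbsp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a span.Z3988 element

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "span":
                self.skip_depth += 1  # nested span inside the COinS span
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "Z3988" in classes:
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (<br/>) are emitted once, with no end tag.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "span":
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_coins(html):
    """Return html with every span.Z3988 element (and its children) removed."""
    stripper = CoinsStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)
```

Given a fragment such as a cite element followed by a span with class="Z3988" that wraps a hidden inner span, strip_coins returns the fragment with only the cite element and any surrounding text left in place.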

Links

 * General references
 * w:en:Help:Referencing_for_beginners
 * w:en:Help:Shortened_footnotes
 * Help:System_message
 * modified Cite system messages on enwiki
 * Cite tags
 * VisualEditor/Citation_tool
 * Extension:Cite