Phoenix books

References are a rich opportunity to create data objects from an article's page content

Last updated: 2022-12-16 by APaskulin (WMF)
Status: v1 published May 2021

Overview

Work we did before

As an initial prototype, we built a structured content store (aka knowledge store) as a “mid tier” between a content source (in this case, Simple English Wikipedia) and consumers. The video linked below shows the demo site we created.

The goal was to build

A tiny, experimental modern platform
that can serve collections of knowledge
created from multiple trusted sources (although we only used one source, more could be added using the same patterns)
to many product experiences and other platforms.

The easiest way to familiarize yourself with it is to read the summary, watch the video and peruse the repository.

Structured content summary
- Model
- Video
- Repo

Why references?

References (aka citations) are a rich opportunity to create data objects from an article's page content. We could, for example, interrelate books with the pages and subjects they reference. At the moment, references are a wikitext list at the bottom of pages, and they are formatted differently across the ecosystem. For an example, see w:Albert_Einstein#References

Distributed as data, references could be rendered by consumers as rich content rather than as a plain text list.
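To make the idea of "references as data" concrete, here is a minimal sketch of pulling book citations out of wikitext. It assumes the citation is written with a non-nested {{cite book}} template; as noted above, references are formatted inconsistently, so a regular expression like this would miss many real cases. The names BookCitation and parse_book_citations are hypothetical and are not part of the prototype.

```python
import re
from dataclasses import dataclass, field

# Assumption: a citation uses a non-nested {{cite book |key=value ...}} template.
# Nested templates and free-form references are not handled by this sketch.
CITE_BOOK = re.compile(r"\{\{\s*cite book\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


@dataclass
class BookCitation:
    """Hypothetical data object for one book reference."""
    title: str = ""
    params: dict = field(default_factory=dict)


def parse_book_citations(wikitext: str) -> list:
    """Return one BookCitation per {{cite book}} template found in the wikitext."""
    books = []
    for match in CITE_BOOK.finditer(wikitext):
        params = {}
        # Template parameters are separated by "|" and written as key=value.
        for part in match.group(1).split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip()
        books.append(BookCitation(title=params.get("title", ""), params=params))
    return books
```

Each resulting object keeps every template parameter, so a consumer could decide later which fields (author, year, ISBN) it needs to display.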

References future ideas
Event storming overview (we asked people to model how the current editing process works)

What we’ll do

We will make a first attempt at untangling references … beginning with books that are referenced on Simple English Wikipedia. There are questions we’ll need to answer together as we go. We can use as much of the previous work as we’d like.

At a high level, we will:

Get an article from Simple English Wikipedia
Break it down into parts (sections and citations)
Structure it according to the canonical data model (we’ll add a structure for books as citations)
Save it to the knowledge store (S3) as data objects interrelated by hypermedia links (aka a graph)
Import the topics associated with those objects from the previous knowledge store
Associate each book with its page and topic(s) in Elasticsearch (lots to talk about here)
Repeat for all articles
Add a query language (GraphQL) on top and configure it to return
- Books associated with a page or section (TBD which level we associate)
- Books associated with a topic
- A single book (TBD: how would you find it?)
Consider how this might also help the editing process (see models above)
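The steps above can be sketched as a small in-memory graph. This is only an illustration under stated assumptions: the Node and KnowledgeStore classes, the identifier format, and the relation names "cites" and "pages" are all hypothetical, and a dictionary stands in for the S3-backed store and the search index. It associates books at the page level, which is one of the two options still to be decided.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical data object; links map a relation name to target ids."""
    id: str      # assumed key format, e.g. "book/example-book"
    type: str    # "page", "section", "book", or "topic"
    name: str
    links: dict = field(default_factory=dict)


class KnowledgeStore:
    """In-memory stand-in for the knowledge store (not the real S3 layout)."""

    def __init__(self):
        self.nodes = {}

    def put(self, node):
        self.nodes[node.id] = node

    def link(self, source_id, relation, target_id):
        """Add a directed link, ignoring duplicates."""
        targets = self.nodes[source_id].links.setdefault(relation, [])
        if target_id not in targets:
            targets.append(target_id)

    def books_for_page(self, page_id):
        """Books cited by a page, in the order they were linked."""
        return [self.nodes[i] for i in self.nodes[page_id].links.get("cites", [])]

    def books_for_topic(self, topic_id):
        """Books cited by any page associated with a topic, without repeats."""
        seen, result = set(), []
        for page_id in self.nodes[topic_id].links.get("pages", []):
            for book in self.books_for_page(page_id):
                if book.id not in seen:
                    seen.add(book.id)
                    result.append(book)
        return result
```

The two query methods correspond to the first two GraphQL queries listed above; a resolver for each could delegate to a lookup of this shape once the real storage and naming decisions are made.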

We will also deliver an artifact that enables others to understand our thinking.

Job Stories

(most are epics we’ll break down)

Story	“Done” Reviewer	Notes
When I display a book to users, I want to see a design for look and feel so I know how to style it.	Moriel	Shall we take the ones done by the mobile team? Let’s use this to be specific about what we’ll display so we can ensure the system can provide it.
When I begin developing, I want access to all essential tools so that I can deliver changes.	S&F & WMF as needed	Repo (forked) Current artifacts Backlog AWS group (if we need the previous prototype, can it be run locally?) Event stream
When I store a book, I want to see the data model so I can correctly structure it	Alex	Expand current model
When I get an article from the source, I want to break it down into sections and citations so I can create interrelated data objects	Diana	We can make use of the previous prototype and add/expand the process. We were responding to events in the source but we don’t need to respond to changes in this iteration. How to batch?
When I store knowledge, I want it to have a predictable structure so it can be semantically understood by all consumers.	Diana	Note: Ideally, we’d use the updated canonical data model.
When an article is structured, I want to save it in the knowledge store so I can serve it to consumers.	Diana	Revisit naming the S3 buckets by the page name, which must be unique (mustn’t it?)
When I need to retrieve knowledge, I want an interface that enables me to get only what I need	Moriel	GraphQL (with ability to get books)
When I want to retrieve a book, I need to know how to ask for it	Architeam	How will we name these objects?
When I query for a book, I want to get the topics associated with the content it references so I understand it in context	Diana	How will we leverage search to improve the relationships between the data objects? There’s a lot to unpack here
When I demonstrate this work, I want the foundation to understand its highest value so next steps can be taken	Kate	Prioritizing what we’ll use this data to demonstrate and how far down the editing rabbit hole we want to go
When I explore this prototype, I want to know how so I can play with it	Alex
I want a front end demonstration of this data so I can see the value of this work	Moriel
As a stakeholder, I want to see work in progress so I stay informed	Diana	Recorded demos and updates in repo
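Several stories above ask how book objects will be named and found. One possible approach, shown here purely as a sketch and not as a decision, is to derive a deterministic key from the ISBN when a citation has one and fall back to a hash of the normalized title otherwise. The function name book_key and the "books/isbn/" and "books/title/" prefixes are assumptions made for illustration.

```python
import hashlib
import re


def book_key(title, isbn=None):
    """Return a deterministic, hypothetical storage key for a book.

    Prefers the ISBN (digits and a trailing X only); otherwise hashes the
    title after collapsing whitespace and case-folding it.
    """
    if isbn:
        digits = re.sub(r"[^0-9Xx]", "", isbn).upper()
        if digits:
            return f"books/isbn/{digits}"
    normalized = re.sub(r"\s+", " ", title).strip().casefold()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"books/title/{digest}"
```

A title-based key is fragile, since two editions or two different books can share a title, so the fallback is one of the tradeoffs that would need to be written down for the summary artifact.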

Risks & notes

The source data is inconsistent wikitext
Do we associate citations with the page or sections?
There are tools that structure citations for wikitext editing. Can we leverage them?
There are lots of ways we can interrelate this data, but we need to decide how much effort each is worth
We need to write down any issues we choose to ignore or tradeoffs we make for the summary artifact