Architecture Repository/Artifacts/Phoenix books


Phoenix books

References are a rich opportunity to create data objects from an article's page content

Last updated: 2022-12-16 by APaskulin (WMF)
Status: v1 published May 2021

Overview

Work we did before

As an initial prototype, we built a structured content store (aka knowledge store) as a “mid tier” between a content source (in this case, Simple English Wikipedia) and consumers. The video linked below shows the demo site we created.

The goal was to build:

  • A tiny, experimental modern platform
  • that can serve collections of knowledge
  • created from multiple trusted sources (although we only used one source, more could be added using the same patterns)
  • to many product experiences and other platforms.

The easiest way to familiarize yourself with it is to read the summary, watch the video and peruse the repository.

Why references?

References (aka citations) are a rich opportunity to create data objects from an article's page content. We could, for example, interrelate books with the pages and subjects they reference. At the moment, references are a wikitext list at the bottom of each page, and they are formatted differently across the ecosystem. For example, see w:Albert_Einstein#References.

Distributed as data, references could be rendered by consumers as rich content rather than as a plain-text list.
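
To make the contrast between wikitext and data concrete, here is a minimal sketch of pulling book citations out of wikitext. It assumes citations use a `{{cite book ...}}` template with `|key=value` parameters and no nested templates; real wikitext is far less consistent. The function name `extract_books` and the sample citation are invented for illustration.

```python
import re

# Matches a {{cite book ...}} template that contains no nested templates.
# Assumption: book citations use this template; many pages will not.
CITE_BOOK = re.compile(r"\{\{\s*cite book\s*\|([^{}]*)\}\}", re.IGNORECASE)


def extract_books(wikitext):
    """Return one dict of template parameters per {{cite book}} found."""
    books = []
    for match in CITE_BOOK.finditer(wikitext):
        params = {}
        # Naive split: a value containing "|" (e.g. a piped wiki link) would break this.
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep and value.strip():
                params[key.strip().lower()] = value.strip()
        if params:
            books.append(params)
    return books


# Invented sample; not a real book or ISBN.
sample = (
    "Some sentence.<ref>{{cite book |last=Author |first=Ann "
    "|title=An Example Book |year=2000 |isbn=0-00-000000-0}}</ref>"
)
books = extract_books(sample)
print(books)
```

A dedicated wikitext parser would likely be a better fit than a regular expression for production use; the sketch only shows the shape of the data we would hope to get out.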

What we’ll do

We will make a first attempt at untangling references … beginning with books that are referenced on Simple English Wikipedia. There are questions we’ll need to answer together as we go. We can use as much of the previous work as we’d like.

At a high level, we will:

  • Get an article from Simple English Wikipedia
  • Break it down into parts (sections and citations)
  • Structure it according to the canonical data model (we’ll add a structure for books as citations)
  • Save it to the knowledge store (S3) as data objects interrelated by hypermedia links (aka a graph)
  • Import the topics associated with those objects from the previous knowledge store
  • Associate each book with its page and topic(s) in Elasticsearch (lots to talk about here)
  • Repeat for all articles
  • Add a query language (GraphQL) on top and configure it to return:
    • Books associated with a page or section (TBD which level we associate)
    • Books associated with a topic
    • A single book (TBD: how would you find it?)
  • Consider how this might also help the editing process (see models above)
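
The storage and query steps above can be sketched in memory, without S3, Elasticsearch, or GraphQL. This is only an illustration of the relationships we want to serve: the class names (`Book`, `BookIndex`), the identifier format, and the sample data are all hypothetical and are not part of the existing prototype.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Book:
    """A book data object; `links` holds hypermedia-style references."""
    id: str  # hypothetical identifier format, e.g. "books/example-a"
    title: str
    links: dict = field(default_factory=lambda: {"pages": [], "topics": []})


class BookIndex:
    """In-memory stand-in for the knowledge store plus search index."""

    def __init__(self):
        self.books = {}
        self.by_page = defaultdict(set)
        self.by_topic = defaultdict(set)

    def add(self, book, page, topics):
        # Reuse the stored object if this book was already seen on another page.
        stored = self.books.setdefault(book.id, book)
        if page not in stored.links["pages"]:
            stored.links["pages"].append(page)
        self.by_page[page].add(stored.id)
        for topic in topics:
            if topic not in stored.links["topics"]:
                stored.links["topics"].append(topic)
            self.by_topic[topic].add(stored.id)

    def get(self, book_id):
        """A single book, or None if the id is unknown."""
        return self.books.get(book_id)

    def for_page(self, page):
        """Books associated with a page."""
        return [self.books[i] for i in sorted(self.by_page.get(page, ()))]

    def for_topic(self, topic):
        """Books associated with a topic."""
        return [self.books[i] for i in sorted(self.by_topic.get(topic, ()))]


# Invented sample data.
index = BookIndex()
index.add(Book("books/example-a", "Example Book A"),
          "pages/Example_page", ["topics/physics"])
index.add(Book("books/example-a", "Example Book A"),
          "pages/Other_page", ["topics/physics", "topics/history"])
index.add(Book("books/example-b", "Example Book B"),
          "pages/Example_page", ["topics/history"])

print([b.title for b in index.for_page("pages/Example_page")])
print([b.title for b in index.for_topic("topics/history")])
print(index.get("books/example-a").links)
```

The three lookup methods correspond to the three results the query layer should return. In the real system they would presumably be GraphQL resolvers backed by the search index rather than Python dictionaries.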

We will also deliver an artifact that enables others to understand our thinking.

Job Stories

(most are epics we’ll break down)

Story | “Done” Reviewer | Notes
When I display a book to users, I want to see a design for look and feel so I know how to style it. | Moriel | Shall we take the ones done by the mobile team? Let’s use this to be specific about what we’ll display so we can ensure the system can provide it.
When I begin developing, I want access to all essential tools so that I can deliver changes. | S&F & WMF as needed | Repo (forked); current artifacts; backlog; AWS group (if we need the previous prototype, can it be run locally?); event stream
When I store a book, I want to see the data model so I can correctly structure it. | Alex | Expand current model
When I get an article from the source, I want to break it down into sections and citations so I can create interrelated data objects. | Diana | We can make use of the previous prototype and add/expand the process. We were responding to events in the source but we don’t need to respond to changes in this iteration. How to batch?
When I store knowledge, I want it to have a predictable structure so it can be semantically understood by all consumers. | Diana | Note: Ideally, we’d use the updated canonical data model.
When an article is structured, I want to save it in the knowledge store so I can serve it to consumers. | Diana | Revisit naming the S3 buckets by the page name, which must be unique (mustn’t it?)
When I need to retrieve knowledge, I want an interface that enables me to get only what I need. | Moriel | GraphQL (with ability to get books)
When I want to retrieve a book, I need to know how to ask for it. | Architeam | How will we name these objects?
When I query for a book, I want to get the topics associated with the content it references so I understand it in context. | Diana | How will we leverage search to improve the relationships between the data objects? There’s a lot to unpack here.
When I demonstrate this work, I want the foundation to understand its highest value so next steps can be taken. | Kate | Prioritizing what we’ll use this data to demonstrate and how far down the editing rabbit hole we want to go
When I explore this prototype, I want to know how so I can play with it. | Alex |
I want a front-end demonstration of this data so I can see the value of this work. | Moriel |
As a stakeholder, I want to see work in progress so I stay informed. | Diana | Recorded demos and updates in repo
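
The naming questions in the notes ("How will we name these objects?" and whether names are unique) could be approached with deterministic keys. The sketch below is one possible scheme, not a decision: the `book_key` function, the `books/isbn/` and `books/title/` prefixes, and the fallback to a title hash are all assumptions made for illustration.

```python
import hashlib
import re


def book_key(title, isbn=None):
    """Build a deterministic object key for a book (hypothetical scheme)."""
    if isbn:
        # Keep only ISBN characters so "0-00-000000-0" and "0000000000" agree.
        digits = re.sub(r"[^0-9Xx]", "", isbn).upper()
        return f"books/isbn/{digits}"
    # No ISBN: fall back to a hash of the normalized title.
    normalized = " ".join(title.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"books/title/{digest}"


# Invented sample values; not a real ISBN.
print(book_key("An Example Book", "0-00-000000-0"))
print(book_key("An  Example Book"))
```

Keying on the ISBN means the same book cited from two pages resolves to one object. The title fallback is weaker, since different editions or different books can share a title, which is a tradeoff worth recording.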

Risks & notes

  • The source data is inconsistent wikitext
  • Do we associate citations with the page or sections?
  • There are tools that structure citations for wikitext editing; can we leverage them?
  • There are lots of ways we can interrelate this data, but we need to decide how much effort each is worth
  • We need to write down any issues we choose to ignore or tradeoffs we make for the summary artifact
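
On the page-versus-section question above, one observation is that section-level associations can always be rolled up to the page, while the reverse is not possible. A minimal sketch, using invented page, section, and book identifiers:

```python
from collections import defaultdict

# (page, section, book id) triples; all values are invented for illustration.
citations = [
    ("Example page", "Early life", "books/a"),
    ("Example page", "Career", "books/b"),
    ("Example page", "Career", "books/a"),
]

# Store the association at the finer (section) level.
by_section = defaultdict(set)
for page, section, book in citations:
    by_section[(page, section)].add(book)


def books_for_page(page):
    """Derive the page-level view by merging every section of that page."""
    found = set()
    for (stored_page, _section), books in by_section.items():
        if stored_page == page:
            found |= books
    return sorted(found)


print(books_for_page("Example page"))
print(sorted(by_section[("Example page", "Career")]))
```

This does not settle the question, since section-level extraction costs more effort on inconsistent wikitext, but it shows that choosing sections keeps both options open.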