Wikimedia Developer Summit/2017/Annotations


Session Overview

Day & Time: Tuesday 2:30 - 3:40
Room: Chapel Hill
Phabricator Task Link:
Facilitator: C. Scott
Note-Taker(s): Matt, Nick
Remote Moderator: <no remote feed>


Video of this session is on commons at: Annotations.

Session Summary

Discuss problems and solutions for annotations of wiki content
There are many use cases for annotations. We should use Open Annotations as a standard, and start with multimedia as a use case.
  1. Discuss use cases
  2. Overview and MCR contributions
  3. Discuss open questions (API, granularity)
  4. Priorities & next steps


There are a number of cases where we have additional information to associate with the main document.

In some cases, we already have it and want to pull it out. In others we want new stuff without "polluting" the document with information not relevant to all users/editors.

Distinguishing between "markup" and "annotations" in our projects -- standard wikitext markup is intended to be authored by the same person editing the main text. Annotations typically involve a workflow distinct from document editing. In the "translation" case, for example, the document authors/editors may not even be fluent in the translated languages. In other cases, the markup is maintained by other specialized teams. There is some grey area, of course; e.g. are regions annotated in an image part of the main document, or a separate workflow?

Use cases

We brainstormed a bunch of different possible use cases for an annotation facility. Each is tagged with some member of the community active in that project who might serve as contact for future work.

  1. Translate extension (translation regions) - @Lokal_Profil, @cscott
    • Explicit markup currently in wikitext
  2. ContentTranslation - @cscott et al.
    • Currently for creating new articles in language X if there is one in Y. But correspondence is lost after article creation.
    • Annotations would be used to maintain correspondence between each paragraph in old and new.
    • Long term plan for CX is to allow you to keep translations in sync. Need an annotation to say what's associated with what.
  3. Wikispeech - Annotate specific parts of an article with pronunciations. Word pronunciations. - @Lokal_Profil, @Sebastian_Berlin_(WMSE)
  4. VisualEditor/Flow - Inline discussion. Region should persist. @ESanders @Mattflaschen (not on Collaboration team roadmap)
  5. References - region -> citation data in Wikidata/wikibase. Also for "citation needed". @Dario? (wikicite) @Tarrow
  6. Non-Wikimedia. It's useful for text analysis. Think wikisource, with a document and then commentary on it.
  7. Data-curators. E.g. has diagrams of scientific discoveries. We have technical experts, who come and approve a certain version. Some people only want to see content that was in an approved version. @Anders
    • Something like fine-grained "flagged revisions".
    • Could be done as an annotation containing the "approved" content. A fuzzy match on that annotation later could trigger UI letting you know the content hasn't (yet) been approved, along with UX letting you switch between current and approved version (pulled from annotation data).
  8. Wikisource
    • for example, you can still add annotations as a first pass [?]
  9. Figure references - @Marktraceur
    • Associate an article figure with a *range* of text which refers to it, not just a point
    • Gives the renderer more freedom to decide where best to place the image
  10. Edit conflicts / proposed edits - Certainly store the entire change I want to do. @James_F? @DChan
  11. Language converter - @DChan, @cscott
    • Store individual exceptions to the language converter glossary in use (similar to wikispeech use for pronunciation exceptions)
  12. File Annotations - images, audio, video. @Marktraceur, @Prtksxna, @Brion
    • framework already considers annotation of multiple content types, including media types
    • You can already box parts of an image with annotations on WMF projects, done through a gadget. Would be nice to do this canonically.
    • Same for video (e.g. period of time)
    • Render in a cool way
  13. Maintenance tags/templates - Issue [page/section/inline] tags. @????
  14. Wikisource -- proofreading, and marking non-linear transcribed regions in the presence of inline advertisements, column jumps, etc
  15. Presentational annotations, for instance pull quotes. (Figure references as well.)
  16. Fine-grained flagged revisions

Proof of concept and Multi-Content Revisions

C. Scott did a proof of concept implementation of a replacement for the Translate extension using annotations at the editing off-site a few months ago, using the code as much as possible:

Basic structure of the code:

  1. Front end code (client side JS, UX for creating/editing annotations)
  2. OpenAnnotation standard (and library implementing the data structures in that standard)
  3. Backend code (indexed storage of annotations)

C. Scott really likes the second part: the annotation standard and the libraries supporting it. He strongly encourages folks to use this rather than make up their own annotation formats.

The front-end code worked, but we'd need other front ends for our different use cases.

The backend really isn't appropriate for WMF production use, and C. Scott didn't really want to write & maintain his own storage service.

  • But this morning Daniel K said we could use his multi-content revisions work to store annotations.
  • So backend can be considered solved by work on the WMF roadmap (and we recently got a commons grant to fund the MCR work)
    • ... unless someone has a better idea and wants to build a special-purpose storage backend.

In general the code was really nice, although it was not as well factored as C. Scott would like.

  • IIRC one of the issues was that applying annotations to pristine documents was in the openannotation library, but the fuzzy match stuff was buried in the front-end code in a way that wasn't easily separable. Any realistic use of annotations will need to have non-exact matching.
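As a rough illustration of what "non-exact matching" means here, the following is a minimal sketch using Python's standard `difflib`. It is not the matcher from the front-end code discussed above; the function name `fuzzy_find`, the sliding-window approach, and the 0.8 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def fuzzy_find(quote, text, threshold=0.8):
    """Find the window of `text` most similar to `quote`.

    Returns (start, end, score) for the best window whose similarity is at
    least `threshold`, or None when nothing is similar enough. This brute-force
    scan is fine for a sketch but too slow for long articles.
    """
    n = len(quote)
    if n == 0 or len(text) < n:
        return None
    best = None
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        score = SequenceMatcher(None, quote, window, autojunk=False).ratio()
        if best is None or score > best[2]:
            best = (start, start + n, score)
    return best if best[2] >= threshold else None


# The annotated phrase was edited slightly after the annotation was made.
match = fuzzy_find("quick brown fox", "The quick brwn fox jumps")
```

A score below 1.0 is the signal a UI could surface as "fuzzy, please review", which is the behaviour the notes attribute to Translate.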

Can you summarize the concerns with the backend?

  • Not really to the standards of WMF ops -- massively scalable, monitored, etc.
  • Takes a lot of effort to deploy a new service to WMF production, commitment to maintain it, etc.
  • Then we'd have to consider dumps, archiving, backups, etc, etc.

C. Scott talked to Sj at Wikimania in Esino Lario this year about annotations; he was the one who pointed to this code base:

  1. Frontend - Go to any arbitrary website, write annotations. Fetches annotations from backend and does actual matching. DOM tree and UX muddled together in the code base.
  2. OpenAnnotation spec. Can do strict matches. Really liked this (Code base). (I think it's missing fuzzy matching though.)
  3. Backend - Part which looked competent, but which I didn't want to maintain/try to get deployed into production.

Dario: Main reason to ask is that I'm friends with the founder. Their approach is agnostic to what part of the stack you want to cherry-pick. They're building a coalition mostly based on the second layer, with commitment to do something for the other layers. They are very interested in seeing if Wikimedia can become a partner in adopting some version of the bottom tiers.

C. Scott: Would be excellent to be interoperable with the standard. It would also be nice to interoperate to some degree on an API level, publishing our annotations in a way that might allow the front-end tools to eventually be able to do something useful with them.

Trevor: Arbitrary DOM range does seem good for some of these use cases, but not all.

C. Scott: Main one currently used is XPath plus (markup-stripped) context text (more details). There are other ways to anchor the annotations in the standard.
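To make "XPath plus context text" concrete, here is a minimal Python sketch of such an anchor. The class `TextAnchor`, its field names, and the helpers `make_anchor` and `resolve` are hypothetical and are not taken from the standard or from any library; the XPath is stored only as a hint and the resolution below relies on the context text alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextAnchor:
    """Hypothetical anchor: an XPath hint plus markup-stripped context text."""
    xpath: str   # where the text was found when annotated (hint only)
    exact: str   # the annotated text itself
    prefix: str  # text immediately before the annotated text
    suffix: str  # text immediately after the annotated text


def make_anchor(text, start, end, xpath, context=20):
    """Record text[start:end] together with up to `context` chars on each side."""
    return TextAnchor(
        xpath=xpath,
        exact=text[start:end],
        prefix=text[max(0, start - context):start],
        suffix=text[end:end + context],
    )


def resolve(anchor, text):
    """Return (start, end) if prefix+exact+suffix occurs exactly once, else None."""
    needle = anchor.prefix + anchor.exact + anchor.suffix
    first = text.find(needle)
    if first == -1 or text.find(needle, first + 1) != -1:
        return None  # missing or ambiguous: report failure rather than guess
    start = first + len(anchor.prefix)
    return (start, start + len(anchor.exact))


text = "The cat sat. The cat ran away."
anchor = make_anchor(text, 17, 20, xpath="/p[1]", context=5)  # the second "cat"
span = resolve(anchor, "Once. " + text)
```

The context is what distinguishes the second "cat" from the first; without it the quote alone would be ambiguous.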

(A question about non-exact matches, and how the editors/UX handles this.)

C. Scott: Translate has been doing this for a while (surfacing that it's a fuzzy match and you need to look at this). What sort of common features might we pull out to handle these? A fuzzy match is one piece of commonality. How many different sorts of annotation regions are there?

C. Scott: This is where my certainty ends. Pretty sure general annotation support would be useful for a bunch of different folks, pretty sure that OpenAnnotation is a good extensible/interoperable way to represent those annotations, pretty sure we can use MCR to implement backend storage of annotations on our content. But there are some questions about API, implementation, what sorts of common functionality to build, etc.


"Prompt" -vs- "Delayed" API:
  • Prompt - The annotation anchor is resolved immediately when the page is edited, and the updated annotation is stored if different.
    • The client gets a precise anchor and an indication of 'fuzziness'.
    • You learn immediately after edit when the fuzziness is introduced.
    • Each update is fast and fine-grained (only one revision-change is considered)
    • Each additional annotation creates increased storage/performance impact, even if they are not being directly edited.
    • It exposes "update annotation" necessity to core.
  • Delayed - Anchors are only resolved when they are used; updated anchors are only stored when they are edited (although the updated anchor may be cached)
    • No performance/storage impact to add annotations, unless they are being used/edited.
    • Client has to compute a precise anchor, potentially by rebasing through a long-ish history of intermediate edits.
    • No immediate indication of "fuzziness" to put on project members' worklists.
    • Decouples annotation mechanism from core.
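The difference between the two APIs can be sketched in Python. This is only an illustration under stated assumptions: revisions are plain strings, anchors are character spans, and `rebase` and `resolve_delayed` are hypothetical names. The diff comes from the standard library's `difflib`, not from any MediaWiki component.

```python
from difflib import SequenceMatcher


def rebase(span, old, new):
    """Carry a (start, end) character span from `old` text to `new` text.

    Requires 0 <= start < end <= len(old). Returns ((start, end), fuzzy),
    where fuzzy is True when the text under the span changed.
    """
    start, end = span
    new_start = new_end = 0
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if i1 <= start < i2:
            # Unchanged text shifts with its block; changed text snaps to the block start.
            new_start = j1 + (start - i1) if tag == "equal" else j1
        if i1 <= end - 1 < i2:
            new_end = j1 + (end - i1) if tag == "equal" else j2
    fuzzy = old[start:end] != new[new_start:new_end]
    return (new_start, new_end), fuzzy


def resolve_delayed(span, base_rev, revisions):
    """Delayed: rebase through every revision since the one the span was stored against."""
    fuzzy = False
    for older, newer in zip(revisions[base_rev:], revisions[base_rev + 1:]):
        if span[0] == span[1]:  # anchored text was deleted; nothing left to track
            break
        span, step_fuzzy = rebase(span, older, newer)
        fuzzy = fuzzy or step_fuzzy
    return span, fuzzy


revisions = ["A cat sat.", "Today a cat sat.", "Today a cat sat down."]

# Prompt: one rebase per edit, and the result is stored each time.
stored = (2, 5)
for older, newer in zip(revisions, revisions[1:]):
    stored, _ = rebase(stored, older, newer)

# Delayed: nothing is stored on edit; the client rebases when it needs the anchor.
resolved, fuzzy = resolve_delayed((2, 5), 0, revisions)
```

Both paths run the same transformation. What differs is when it runs and whether each intermediate result is written to storage.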

[Not mentioned in discussion: - strategy for maintaining anchors in the face of unpredictable edits. This tries to be robust in the face of more adversarial conditions than we would experience, e.g. the old revision may no longer be accessible at all.]

This fuzzy anchoring strategy was just the default anchoring mechanism I described. cscott (talk) 17:51, 11 January 2017 (UTC)

Trevor: In VE there is a concept of translating a range. There is a range before and after a transaction. You could store enough meta information with a revision to answer that question. Given an arbitrary range, after this, what is the range now? In VE, you don't have to worry about fuzz, since you know exactly what happened.
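Trevor's point can be sketched as follows. This is a simplified, hypothetical transaction model in Python (a list of retain/insert/remove operations over a flat string), not VisualEditor's actual transaction API. Because the operations say exactly what happened, the range is translated without any fuzzy matching.

```python
def apply_ops(text, ops):
    """Apply ("retain", n) / ("insert", s) / ("remove", n) operations to text."""
    out, pos = [], 0
    for kind, arg in ops:
        if kind == "retain":
            out.append(text[pos:pos + arg])
            pos += arg
        elif kind == "insert":
            out.append(arg)
        else:  # "remove"
            pos += arg
    out.append(text[pos:])
    return "".join(out)


def translate_offset(offset, ops, bias_left):
    """Map an offset in the old text to the new text.

    bias_left=True keeps the offset before text inserted exactly at it;
    bias_left=False moves it after such an insertion.
    """
    old_pos = new_pos = 0
    for kind, arg in ops:
        if kind == "insert":
            if bias_left and offset == old_pos:
                return new_pos
            new_pos += len(arg)
            continue
        reached = offset < old_pos + arg or (bias_left and offset == old_pos + arg)
        if kind == "retain":
            if reached:
                return new_pos + (offset - old_pos)
            new_pos += arg
        elif reached:  # offset falls inside removed text: collapse to the removal point
            return new_pos
        old_pos += arg
    return new_pos


def translate_range(span, ops):
    """Translate (start, end); text inserted at either edge stays outside the range."""
    start, end = span
    return (translate_offset(start, ops, bias_left=False),
            translate_offset(end, ops, bias_left=True))


old = "The cat sat."
ops = [("retain", 4), ("insert", "big "), ("retain", 8)]
new = apply_ops(old, ops)
start, end = translate_range((4, 7), ops)  # the range covering "cat"
```

A range whose text is removed entirely collapses to an empty range at the removal point, which is one way to detect the kind of edge case raised next.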

C. Scott: There are always edge cases, e.g. deleting an entire paragraph, character by character.

Visual Editor maintains an even-more-fine-grained model of changes that occurred to a revision. "Prompt" anchor resolution might be able to better take advantage of this to move anchors.

Interoperability

C. Scott: maps URLs to annotations. There are actually several URLs associated with our articles, for example:

When I did my proof-of-concept, I had to play some games to associate annotations with a specific revision, not just the article.
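One way to picture the URL problem is a small normalizer that reduces common MediaWiki URL forms to a single key. This is a sketch in Python, not the approach used in the proof of concept. The function name `article_key` is hypothetical, the URL shapes are the usual Wikimedia ones (`/wiki/Title`, `index.php?title=...&oldid=...`, mobile `.m.` hosts) rather than the examples given in the session, and the revision id is made up.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def article_key(url):
    """Reduce an article URL to (host, title, revision id or None)."""
    parts = urlsplit(url)
    # Treat the mobile host (e.g. en.m.wikipedia.org) as the same site.
    host = parts.netloc.lower().replace(".m.", ".", 1)
    query = parse_qs(parts.query)
    if parts.path.startswith("/wiki/"):
        title = unquote(parts.path[len("/wiki/"):])
    else:
        title = query.get("title", [None])[0]
    if title is not None:
        title = title.replace(" ", "_")
    oldid = query.get("oldid", [None])[0]
    return host, title, int(oldid) if oldid is not None else None


urls = [
    "https://en.wikipedia.org/wiki/Abraham_Lincoln",
    "https://en.m.wikipedia.org/wiki/Abraham_Lincoln",
    "https://en.wikipedia.org/w/index.php?title=Abraham_Lincoln",
    "https://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&oldid=12345",
]
keys = [article_key(u) for u in urls]
```

The first three URLs reduce to the same key, while the last carries a revision id, which is the distinction needed to tie an annotation to a specific revision rather than to the article as a whole.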

Stability of representation

Trevor: If annotations are implemented in the client, it would fail if you have radically different renderings. It imposes a certain rigidity in the rendering of the content. If there's a way to have it annotate on Parsoid DOM, that would solve it.

Matt: Yes, there are a lot of transformations needed to Parsoid DOM before it is used for view. Flow has the same thing.

Trevor: The view must be stable for this to work.

Matt: Preferably, you can see annotations in both VE and view mode.

C. Scott: (going through use cases) Translate and ContentTranslation work at the stored level ("platonic" DOM). For VE, if you want to render in preview without VE, it's a problem.

Tom: You could use fuzzy text matching to map between different views (e.g. Parsoid, PDF) with context text, ignoring XPath.

C. Scott: (continuing through use cases) References are probably hard. If the source rendering is torn to shreds (as is done by our mobile frontend), references might be moved. OTOH "Proposed edits" apply to the platonic version. Server-side LanguageConverter applies to platonic markup, although there's a proposal from the LanguageConverter community that you should be able to do switches on the fly (i.e. client-side conversion).

C. Scott: Annotations on files should be fine, right? Maybe not, MobileFrontend changes images too.

Fuzzy matching / dealing with change

Trevor - Indexing every character with ID is one extreme. Fuzzy matching is the other extreme. What comes to mind is real-time editing. After interviewing a lot of people that worked on RTE and evaluating their performance, everyone obsesses about conflict resolution, but it comes up in 0.1%.

C. Scott - For online things like Google Docs when you have low latency, conflict resolution is infrequent. But for higher latency or offline edits, necessity of conflict resolution increases.

Trevor - Even then, it happens less than you think. The clever thing is that fuzzy solves most cases pretty easily.

C. Scott - It's related to how often you sync up. An annotation used for translation might not be re-translated for 100 edits, so it might be problematic for a naive fuzzy match.

C. Scott - OTOH WikiSpeech pronunciation annotations should be for uncommon words, so they are probably easy to re-find in the edited document.

André - Tried to figure out minimum. We needed both range and its context.

Straw poll! (prompt -vs- delayed)

C. Scott - Prompt or Delayed? Sense of the room about which API is better?

  • Straw poll: Prompt:5, Delayed: 3

Trevor - May have to do both

David Chan - Wouldn't assume this has to be slow.

Trevor - I was drawing a connection between this and real-time. One of the reasons you can be a little lax is that your cursor will show the wrong thing. Visibility here might be less.

C. Scott - My PhD advisor drew a lot of flak by claiming most failures don't matter, e.g. air traffic control: if your program loses track of the plane, eventually the plane is on the ground. In some cases it's more important to flag the error ("missing plane") and keep going, rather than risk crashing and losing track of all the planes.

Matt - If you capture a lot of context, you will still have failures, but you will realize it's a failure.

C. Scott - We try to make our failures big and obvious so they get fixed. That's Wikimedia ethos.

Trevor - This is assuming you can detect when it's failed.

C. Scott - May differ per application. Translate I'm not as worried about, because all readers of the translated article would notice the mistranslation. Losing a pronunciation annotation in WikiSpeech might be more problematic, because there's a certain minimum time required to listen to the whole article, which would make it harder to stumble across the error.

David Chan - Not necessarily so obvious; if you just pull a random sentence out of an article, it's not that obvious. Worst translation errors are where you lose the meaning.

Matt - Important thing is that you don't associate it with the wrong place. It's okay to fail or associate it with the overall section.
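Matt's rule (never attach to the wrong place; failing or falling back to the section is acceptable) could look something like the following Python sketch. The function `place`, the `sections` mapping of heading to text, and the three result shapes are assumptions made for illustration.

```python
def place(quote, home_section, sections):
    """Decide where an annotation may safely be shown.

    `sections` maps heading -> plain text. The annotation gets a precise range
    only when its quote occurs exactly once in the whole page; otherwise it
    falls back to the section it was created in, or is reported as unanchored.
    """
    total = sum(body.count(quote) for body in sections.values())
    if total == 1:
        for heading, body in sections.items():
            start = body.find(quote)
            if start != -1:
                return ("range", heading, start, start + len(quote))
    if home_section in sections:
        return ("section", home_section)
    return ("unanchored",)


sections = {
    "Early life": "Lincoln was born in Kentucky.",
    "Career": "He practised law in Springfield.",
}
precise = place("born in Kentucky", "Early life", sections)
fallback = place("born in Indiana", "Early life", sections)
lost = place("born in Indiana", "Childhood", sections)
```

The design choice is that an ambiguous or missing quote never produces a range: the result degrades to something coarser but still correct.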

Trevor - It still has its own level of fuzziness that can be fooled by moving things around.

Mark - I think we're right that we need both. Look at Translate, you're not going to update the other language.

Matt - This is mainly a DB storage issue, since we're doing the same transformations either way.

C. Scott - It's also a question of what we expose to core (which implementation details). There might be an intermediate point, where the database storage is delayed, but we have some sort of job queue that runs over the changed files to pull annotations forward and/or discover new fuzziness.

Trevor - One of the great advantages of having file annotations is that then when you're doing media search, you can match the search term against the paragraph the image is associated with. You'll need this data anyway.

C. Scott - I think this works with both prompt and delayed storage. Imagine that you search on Abraham Lincoln. You find a match either way. For delayed storage you just have to port the annotation forward and maybe revalidate it before showing it to the user.

(What about storage implications. Is it true that prompt storage would bloat the db?)

Matt - It's pretty big when you consider how many anchors there are (citations, citation needed, discussion anchors), plus context, even if you exclude the actual content.

Trevor - concerns about performance impact of pulling from multiple locations (in the delayed scheme)

C. Scott - That's not quite how Multi-Content Revisions works.

(discussion of MCR storage representation)

Anders - Bottleneck is usually not that too much data needs to be stored. It's more commonly an issue of performance/scaling for requesting and displaying the data

C. Scott - There's a political angle as well. In my experience, core features never get adopted for their own sake; instead we find some feature/product to attach them to, and implement them for the sake of that. (I also see the prompt/delayed distinction as eliminating a potential objection: it's easier to say "no impact on storage for unused annotations" than it is to say "a small impact" and then you need to argue over how small is sufficiently small, etc.)

Trevor - We've already seen that when we wanted image annotations, we stored a blob in the page. I agree that you relate core to a product-related initiative, but sometimes we take shortcuts instead. I think we need to be committed to an architectural move, then we can justify prioritization, but insist on doing it properly.

Straw poll! (excitement level)

C. Scott - Straw poll: Excitement level, who would use this Tomorrow or 10 years later?

    • Tomorrow - Mark (Month and a half)
    • 1 year-ish - Daniel Kinzler (but he doesn't know it yet)
    • 10 years later - Trevor, Matt
Discussion of commons grant for MCR

TheDJ - We kept these doors open to make sure we can do what we need to do. MCR is not a goal of the grant, but if it enables us to achieve the goal then it can be funded under that umbrella.

C. Scott - The grant was just for Commons, so escape hatch is to do MCR only for Commons.

Matt - But different schema on Commons is scary.

Action items:

  • Multimedia annotations as first use case to be implemented
  • Anyone starting annotations should at least use the annotator.js JSON format, then we can port storage to a common backend later
  • Next year we'll come back and port all the remaining use cases to the framework we've built for Multimedia.
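
For anyone following the second action item, here is a rough Python sketch of a record in the annotator.js JSON shape. The field names (`uri`, `quote`, `text`, `ranges` with `start`, `startOffset`, `end`, `endOffset`) follow the Annotator annotation format as best understood here and should be checked against that project's documentation. The helper `make_annotation` and the sample values are hypothetical.

```python
import json


def make_annotation(uri, quote, text, start_xpath, start_offset, end_xpath, end_offset):
    """Build a dict in (what is assumed to be) the annotator.js annotation shape."""
    return {
        "uri": uri,      # page the annotation belongs to
        "quote": quote,  # the annotated text
        "text": text,    # the annotation body (comment, pronunciation, ...)
        "ranges": [{
            "start": start_xpath,   # XPath of the element containing the range start
            "startOffset": start_offset,
            "end": end_xpath,       # XPath of the element containing the range end
            "endOffset": end_offset,
        }],
    }


annotation = make_annotation(
    uri="https://en.wikipedia.org/wiki/Abraham_Lincoln",
    quote="born in Kentucky",
    text="citation needed",
    start_xpath="/div[1]/p[2]",
    start_offset=12,
    end_xpath="/div[1]/p[2]",
    end_offset=28,
)
serialized = json.dumps(annotation, sort_keys=True)
```

Keeping records in one shared shape from the start is what would make the later port to a common backend a storage change rather than a data migration.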