Wikimedia Technical Conference/2018/Session notes/Choosing the technologies to build our APIs

From mediawiki.org

Theme: Increasing our technical capabilities to achieve our strategy

Type: Technical Challenges

Facilitation Exercise(s): Small group discussion, see if different groups come up with the same answer.

Leader(s): Gergő Tisza

Facilitator: TheDJ

Scribe: Irene

Description: This session will delve into the action API, REST API, and any alternative solutions to answer open questions so that the API architecture can be resolved. - phab:T206074

Questions to answer during this session

Question Significance:

Why is this question important? What is blocked by it remaining unanswered?

What are the use cases and requirements (client, infrastructure and platform) for Wikimedia and MediaWiki APIs? Blocks answering the main question - the choice of technology must be guided by stakeholder needs. The answers should mostly come from other sessions (Applying the MediaWiki Platform Architecture Principles, Determining use cases and requirements for the APIs that we build, Choosing installation methods and environments for 3rd party users) and are just restated at the beginning of the session.
Take stock of the API frameworks (existing & promising alternatives, if any). What are their strengths and weaknesses? Which use cases and requirements can they effectively fulfill, and which not? What are their infrastructure needs and costs? Blocks answering the main question - to make an informed choice we must have an informed understanding of each option.
What should the future of our APIs look like? Should we standardize on a single framework? (Which one?) Or should we aim for a microservice infrastructure with full freedom of implementation? What of the legacy API modules? What of the “custom” APIs like ORES or WDQS? The main question; the answer determines what investment into API development in the next several years should look like.
Given the above, what is the impact on system architecture, caching infrastructure and storage infrastructure? The outcome of the session should be suitable to inform infrastructure planning, and the architecture discussions on day 4.

Detailed notes

Place detailed ongoing notes here. The secondary note-taker should focus on filling any [?] gaps the primary scribe misses, and writing the highlights into the structured sections above. This allows the topic-leader/facilitator to check on missing items/answers, and thus steer the discussion.

  • Intro from Gergo, PHP MW API vs. RESTBase
  • two different API frameworks, issues deciding which one to use, and we are hoping to make a decision here on which, if any, is the future (one, both, neither)
  • analyzing the weaknesses of each api framework depending on each use case
  • what new tech we should look for to achieve these things?
  • outcome from previous session - if the foundation commits to support [?] we should drop the shared hosting requirement for mediawiki - we should move forward based on that assumption
  • Aaron: What is the point of this session? Decide on PHP or JS?
  • Platform evo project included recommendation for convergent evolution in APIs: REST API in MediaWiki and make that the future
  • One outcome would be to reinforce, another would be to revisit that decision and recommend something else
  • Or you could propose two API strategies with benefits to them, and try to connect them
  • Are there product managers in the room?  Yes
  • Aaron: what do you mean by “framework”?
    • RESTBase and MW Action API
  • Q from michael - is there any restriction we want to put on the type of use case scope to pinpoint our convo
  • Some things are out of scope, we should be focusing on content and [?] mediawiki things
  • Obviously having one framework has advantages over two
  • Group Discussions:
  • Group 1 (michael)
    • Question One: What are the use cases and requirements for wikimedia and mediawiki APIs?
    • Question Two: Take stock of the API frameworks
    • Question Three: What should the future of our APIs look like
    • Question Four: Given the above, what is the impact on system architecture
  • Group 2 (Irene)
    • Feedback - the variance in the use-cases make this hard to work with since there’s not a unifying thing
    • Question One: What are the use cases and requirements for wikimedia and mediawiki APIs?
      • Low latency and high bandwidth for many of these (Wikiwand, map mashups, UC Mini) because it’s client facing
        • Disagreement from previous session about the latency requirements for map mashups
      • Special pages are the inverse
      • Siri and Alexa are both problematic as they mainly depend on their own back-ends, so we are concerned about how those systems get their content vs. what Siri and Alexa [?]
      • Decided to focus on Wikiwand and UC Mini for this discussion
    • Question Two: Take stock of [current] API frameworks
      • Strengths + weaknesses of each (esp needs and costs)
      • Storage is obviously the massive operation barrier
      • “In a vacuum” framework comparison
        • Action api has the benefit of being able to talk to the primary source of data
        • On the other hand, it has very high start-up costs, so even if it could compute the answer in 2 milliseconds, it needs 50 to even take a request because it needs to boot MediaWiki to get a response
        • RESTBase doesn’t have the start-up cost, but in order to respond quickly, pre-computed data is needed
        • Frankly the mashup (REST API on MediaWiki) seems to have the drawbacks of both
        • Could we combine them in a different way? Ideally not having overhead and having better data access
        • Is this naive? Maybe
        • You could build a REST framework inside MediaWiki, colocated in the code base but not requiring MW core
        • If we are learning from past mistakes, we have spent time learning it
        • If this is two separate “services” that are needed to start up, we only exacerbate the complexity without gaining if we combine them
        • The potential gains from having a REST API layer in PHP or MediaWiki only apply if it is actually part of MediaWiki; if it is merely colocated, it’s just another PHP service
        • Question of direct access to the database, not just routing. How do you skip using core and maintain direct access to the data?
        • Sam looks at the proposal to the rest based as an opp to make
        • Don’t think MW is ready for a RESTful framework: it needs clear definition, and MW core isn’t ready for that abstraction; but having this built next to it is a strong driver to clean that up. Having RESTBase outside of it and talking to the API separates that factor
      • If we look at our use-cases, which is better?
        • For our uses, without pre-computed data, it doesn’t matter which you choose, both apis “lose”
        • This seems like a blocker: the operational concerns when it comes to storage seem to be more important in these use-cases than which api is used. The question of apis is exclusive from just a developer point of view.
        • Moving forward, for the thought experiment, let’s assume we have the storage, Amazon has just given us a bunch of servers (wait but what if they would actually do this)
    • Question Three: What should the future of our APIs look like
      • Feedback - group doesn’t like the framing of the question; trying to approach it from a developer happiness perspective?
        • Asked gergo
      • It’s not an either/or question to Marko, as the API frameworks can serve both but one is better. Looking to the future it’s a question of performance; the REST API is about that. Marko is of the opinion that the question is irrelevant
      • From a dev perspective it’s better to have one; devs that use the Action API are “driven mad” in le’s experience
      • Q: what is the reason that both exist? Was REST created only for performance needs?
      • Can’t cache the Action API, so it cannot be used for things that are high bandwidth, which is an argument for REST-based
      • The question of scale is more important, and that has the issues of start-up times yet again. Is it feasible to co-locate frameworks to have [?]
      • The Action API is traditionally used for everything, but because it isn’t cacheable, Marko would like to keep the Action API for expensive operations - updates, saving pages via API, for example.
      • Use right tools for the right job?
      • You want a user-centric consolidated API, but on the other hand if you separate write and read paths mentally for clients, it’s easier on the infrastructure.
      • Should the action api be turned into a rest api? That doesn’t address the thing the client sees.
      • For the foundation, Node.js or not is a low-priority question
      • We need user-centric APIs but instead we are concerned from our point of view
      • Antoine: If we switch to REST it means they have to install Node.js
      • Question of how this fits in the
      • we don’t have the data to understand what “low-budget” means (could be part of pingback), which means we can’t identify if Node.js is a good/bad idea
    • Question Four: Given the above, what is the impact on system architecture
      • Didn’t have much time to discuss this q, also
  • Group 3 (Halfak - postits)
    • Question One: What are the use cases and requirements for wikimedia and mediawiki APIs?
      • External web + apps (browser-like)
        • Content with presentation
        • Structured metadata
        • Authentication
        • All editing and admin actions
        • Granular changes
        • Versioned
        • Consistent naming and semantics
        • Batch reading (and maybe editing) of data and metadata
        • Access transformed media (resized images)
        • Streaming blob APIs (audio & video)
        • Streaming APIs (RC, live filter watchlist)
        • Push events (notifications)
      • Crawlers
        • Content and presentation
        • Change events (streaming)
        • Usage SLA
        • Protection - ratelimiting?
      • Gadgets and edit tools?
        • Same as external web + apps
        • Access to wikitext (read and write)
        • Ability to parse wikitext
    • Question Two: Take stock of the API frameworks
      • REST / Node.js:
        • +: JS popular and easy for newcomers
        • +: Easy to write and deploy / scale
        • +: No constant startup cost
        • -: Harder to integrate with MediaWiki and extensions
        • -: Lack of batching
      • REST / PHP / MediaWiki:
        • +: Easy to integrate with extensions and core
        • +: Supports high-volume reads
        • -: Controlled PHP env -> slow dev
        • -: Diversity of new devs using other tech
        • -: Batching needs figuring out
        • -: Startup cost
      • Action API:
        • +: Batching
        • +: Optimized for edits
        • -: Inconsistent design
        • -: Lack of versioning
        • -: No caching
      • GraphQL (in PHP or Node):
        • +: Batch mutation and read
        • +: Field / response narrowing
        • +: Type enforcement at boundary (for both request and response)
        • +: Can be built on top of REST or wrap other APIs or DBs
        • -: Unclear practices for versioning and caching
        • -: Pagination needs figuring out
        • -: Limiting complexity of queries needs to be done
    • Question Three: What should the future of our APIs look like
    • Question Four: Given the above, what is the impact on system architecture
  • Group 4 (?)
    • Question One: What are the use cases and requirements for wikimedia and mediawiki APIs?
    • Question Two: Take stock of the API frameworks
    • Question Three: What should the future of our APIs look like
    • Question Four: Given the above, what is the impact on system architecture
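
Group 2's start-up cost argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the 50 ms boot and 2 ms compute figures are the ones quoted in the discussion, while the 5 ms lookup for pre-computed data and the function names are assumptions invented for the example.

```python
# Rough per-request latency model for the trade-off discussed by Group 2.
# The 50 ms / 2 ms figures come from the notes; everything else is assumed.

def request_latency_ms(startup_ms: float, work_ms: float) -> float:
    """Total time to answer one request: fixed startup cost plus the actual work."""
    return startup_ms + work_ms


def startup_share(startup_ms: float, work_ms: float) -> float:
    """Fraction of the request time spent on startup rather than on the answer."""
    return startup_ms / request_latency_ms(startup_ms, work_ms)


# Action API style: boot MediaWiki (about 50 ms), then compute (about 2 ms).
action_api = request_latency_ms(startup_ms=50, work_ms=2)

# Pre-computed style: no boot cost, but only fast if the data already exists.
# The 5 ms storage lookup is a hypothetical number for illustration.
precomputed = request_latency_ms(startup_ms=0, work_ms=5)

print(f"Action API style:   {action_api:.0f} ms "
      f"({startup_share(50, 2):.0%} of it startup)")
print(f"Pre-computed style: {precomputed:.0f} ms")
```

With these numbers almost all of the request time goes to start-up, which is why the group kept returning to storage for pre-computed data rather than to the choice of framework.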

Small group notes (Michael):

Distilling earlier topics

  1. Mobile product use cases
    1. Ability to reorg content for responsive design
  2. Syndication to popular web properties / social media
    1. FB and YT fact exposing, social media embeds
    2. Metadata and open graph for link sharing
      1. Q: what is open graph?  Olga: Protocol that enables e.g., sharing link with page image
    3. Ability to share specific sections, sentences
    4. Retain context when sharing (attribution, link back)
    5. Q: Do we retain the ability to ask what and how, e.g., I want this in HTML form vs JSON form
    6. Q: If I select a sentence or article I’d like to share, e.g., on Facebook, then, we need to expose the very sentence which I’ve highlighted…
      1. Olga: need the sentence, attribution, and probably the styling
    7. Raz: API is a product by defn, we are trying to tackle this discussion in the abstract without reference to product use cases and it’s artificial
    8. Conclusion: The API is itself a product, we should agree on this and publicize it
    9. Then it’s a question of user-facing products defining requirements
    10. Question of granularity has come up often.
      1. We all agree that we should have a low-level API that exposes everything first, and then if we want to build APIs on top of that we can do that, but it’s not our reason for being

RESTBase reduces costs for devs, e.g., the mobile app devs don’t have to know a lot about internal MediaWiki concepts.

Let’s deemphasize the PHP and Node.js death match and assume we can support anything in either.

Raz: Action API feels a bit like GraphQL to me in that I know the internals and can get anything that I want, whereas REST API is inviting to third parties; use and caching are simple by design. I don’t think we will ever have any world where we will have one API.

Olga: For both use cases we need article HTML, sentences, images, templates; other things would be nice, like references, “metadata,” open graph, abstracts/summaries.

There’s nothing we’re doing that can’t be done in both; it’s about the interface we want to present to the consumer (n.b., this includes internal consumers).

But for some of these (open graph, abstracts), we need additional infrastructure -- e.g., a summarizer for an article.  

How do we define cost?

  • Knowledge
  • Performance
  • Hardware and maintenance cost + scalability

REST raises the problem of continually needing to add whole new endpoints, as opposed to simply modifying a query; this is a significant cost.
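
The endpoint-versus-query cost can be sketched with a toy example. Everything here (the record, the field names, the handler functions) is hypothetical and only meant to show the shape of the trade-off, not how either existing API is implemented.

```python
# Toy in-memory "page" record; field names are invented for this sketch.
PAGE = {
    "title": "Example article",
    "html": "<p>Example body</p>",
    "summary": "A short example summary.",
    "image": "https://example.org/images/example.jpg",
}


# Endpoint style: each new client need becomes a new route and handler.
def get_summary(page: dict) -> dict:
    return {"title": page["title"], "summary": page["summary"]}


def get_summary_with_image(page: dict) -> dict:  # added later for a new client
    return {"title": page["title"], "summary": page["summary"], "image": page["image"]}


# Query style: one handler, the client names the fields it wants.
def query(page: dict, fields: list[str]) -> dict:
    unknown = [f for f in fields if f not in page]
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    return {f: page[f] for f in fields}


# The same need met without adding a handler.
print(query(PAGE, ["title", "summary", "image"]) == get_summary_with_image(PAGE))
```

The flip side, also raised in the session, is that fixed endpoints are simple to cache, while arbitrary field combinations are not.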

Future: Should we standardize on a single framework?  Depends on what you mean by “framework.” And it really depends on what consumers need.  

What are we sure about?

  • That we probably won’t have one standard API  

Postit notes

(Keep in mind that these are from four groups that worked separately with no consolidation, so they might contain duplications or disagree with each other.)

Goals

  • Consistent semantics and versioning
  • All APIs, regardless of type or implementation, are versioned
  • All APIs should be internally consistent in formats + fields that are accepted and returned
  • To unify the underlying tech stack in a way that allows us to serve both types of clients (REST-preferring and power user)
  • Support access control and authentication
  • Engage new engineers who learned to program in JS/Python/Java

Questions

  • The operational concerns when it comes to storage seem to be more important and/or impactful than asking which API framework is better
  • REST is not great at batching. Strategy for that?
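
The batching question can be illustrated by building the request URLs for three pages in each style. This is a sketch that makes no network calls; the URL shapes assume the public Action API (`action=query` with `|`-separated `titles`) and the REST v1 page summary route on en.wikipedia.org, and the helper names are made up for the example.

```python
from urllib.parse import quote, urlencode

TITLES = ["Earth", "Moon", "Mars"]


def action_api_batch_url(titles: list[str],
                         base: str = "https://en.wikipedia.org/w/api.php") -> str:
    """One Action API request covering every title (titles joined with '|')."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),
        "format": "json",
    }
    return f"{base}?{urlencode(params)}"


def rest_urls(titles: list[str],
              base: str = "https://en.wikipedia.org/api/rest_v1") -> list[str]:
    """One REST request per title; each URL can be cached on its own."""
    return [f"{base}/page/summary/{quote(t, safe='')}" for t in titles]


print(action_api_batch_url(TITLES))
for url in rest_urls(TITLES):
    print(url)
```

One request versus three is the batching gap; three stable, individually cacheable URLs versus one combined query string is the caching gap going the other way.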

Actions

  • Do research into who is running our systems so we can better define “low-budget” and their needs, and project what “low-budget” will look like in five years.

Decisions

  • Product needs and cost effectiveness should drive our choices. No holy wars about tech stacks.
  • Standardize on a representation and not an implementation.
  • RESTful is good (independent of stack)
  • It is hard to imagine a future in which we won’t have a need to produce both a REST API and something like an Action API (or a graph API or similar).
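
One way to read "standardize on a representation and not an implementation" is sketched below: clients depend on a single agreed data shape, and any backend that can produce it is acceptable. The `PageSummary` fields and both backend classes are hypothetical stand-ins, not a proposal from the session.

```python
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass(frozen=True)
class PageSummary:
    """The agreed representation; field names are hypothetical."""
    title: str
    extract: str
    revision: int


class SummaryBackend(Protocol):
    def summary(self, title: str) -> PageSummary: ...


class ComputeOnRequestBackend:
    """Stands in for a service that computes the answer from primary data."""

    def summary(self, title: str) -> PageSummary:
        return PageSummary(title=title, extract=f"Computed extract for {title}", revision=1)


class PrecomputedBackend:
    """Stands in for a service that reads pre-computed data from storage."""

    def __init__(self, store: dict[str, PageSummary]) -> None:
        self._store = store

    def summary(self, title: str) -> PageSummary:
        return self._store[title]


def render(backend: SummaryBackend, title: str) -> dict:
    """Clients only ever see the shared representation."""
    return asdict(backend.summary(title))


store = {"Example": PageSummary("Example", "Stored extract for Example", 7)}
print(render(ComputeOnRequestBackend(), "Example"))
print(render(PrecomputedBackend(store), "Example"))
```

Both calls return the same keys, so the stack behind either backend can change without clients noticing.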