RESTBase/Alternative architectural options considered

During the RFC discussion in early November 2014, several participants expressed a desire to split RESTBase into a front-end and a back-end part. This page is meant to collaboratively explore some options and their respective advantages and disadvantages. We discussed this in a hangout meeting on Thursday, November 20, 2014.

The outcome of this discussion was to keep RESTBase's architecture as it is for now, and to consider optimizing the path to non-storage API entry points later. In the first public deployment, RESTBase will only serve obviously public content (public wiki, not deleted) and will proxy to api.php/Parsoid if the content is private.

Current request flow schema: https://commons.wikimedia.org/wiki/File:Restbase.svg

== 1) Split at the table storage layer, and allow direct access to table storage. ==

Advantages:
 * Low implementation complexity. This is essentially the interface between RESTBase and backends like restbase-cassandra. The interface is already implemented as HTTP requests, so doing this is just a matter of sending those requests over the network.

Disadvantages:
 * For common requests like 'html for revision X', clients will have to manually perform multiple requests against different tables. Since clients need to know the schema of underlying tables, it becomes harder to evolve the implementation of high-level functionality like 'html for revision X'.
 * Update consistency across tables needs to be maintained by each client separately. There is no way to enforce this globally, so we risk inconsistent data.
 * Security: Enforcement of obligatory processing like sanitization or spam filtering is difficult at the lowest level, especially when combined with consistency requirements. If enforced with signatures as discussed in the SOA auth RFC, each client would need to manually implement calls to each of those other services before sending raw requests to the storage service (complexity, code duplication).
 * Consistency: Difficult to reliably trigger / schedule dependent updates without a central registration point.
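
The first disadvantage can be illustrated with a minimal sketch. It assumes two hypothetical tables (`REVISIONS` and `HTML`) held in memory, and a `table_get` helper standing in for one HTTP request against a single storage table. None of these names or columns reflect the actual RESTBase schema; they only show the kind of client-side join each consumer would have to carry.

```python
# Hypothetical in-memory stand-ins for two storage tables. The table
# names and columns are assumptions, not the actual RESTBase schema.
REVISIONS = {
    # revision id -> metadata row
    1001: {"title": "Foo", "tid": "a1"},
}
HTML = {
    # (title, tid) -> rendered content
    ("Foo", "a1"): "<p>Hello</p>",
}


def table_get(table, key):
    """Stand-in for one HTTP request against a single storage table."""
    return table.get(key)


def html_for_revision(rev_id):
    """Client-side join that every consumer would have to reimplement."""
    rev = table_get(REVISIONS, rev_id)  # request 1: revision metadata
    if rev is None:
        return None
    # request 2: content lookup, which requires knowing the HTML table's key layout
    return table_get(HTML, (rev["title"], rev["tid"]))
```

If the key layout of either table changed, every client containing a function like `html_for_revision` would need to change with it.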

== 2) Move all per-bucket / table request orchestration involving other services like Parsoid, spam filtering, HTML sanitization etc from the storage layer to a separate API service. Allow direct access to the storage layer. ==

Advantages:
 * Security (please elaborate!)
 * Slightly reduced load on the storage service by moving orchestration to a separate service

Disadvantages:
 * Security: Harder to evolve obligatory processing on writes when this is implemented independently by different clients. Would need to change the processing in all clients before requiring it at the storage layer.
 * Security: Basically impossible to generally enforce obligatory processing on reads, as this would be fully in the hands of clients.
 * More complex request flows:
    * Need to implement fall-back handling, on-save validation etc in other services; clients will need to know which service to connect to for which task.
    * Harder to understand, systematically monitor and debug recursive communication patterns ('where did it go wrong?'). No central place for consistent retry handling.
 * Consistency: Difficult to reliably trigger / schedule dependent updates without a central registration point.
 * Difficult to synchronize bucket / table state from backend with API for routing and documentation generation. Extra implementation complexity.
 * Minor, performance: Extra network hop in common read path to storage.

== 3) Move all bucket-independent request routing to an API layer, but keep storage-related pre/post processing coordination in the storage layer. Allow clients to bypass the API layer & interact directly with the storage layer. ==

Advantages:

Disadvantages:
 * Inconsistency between public & internal interfaces
 * Loss of some general request routing flexibility in backend as API can be bypassed
 * Difficult (but not impossible) to generate consistent API documentation for the effective public API

== 4) Move all bucket-independent (non-storage) request routing to an API layer. All client requests use the public API; direct access to the storage layer is disallowed. ==

Advantages:
 * Provides a consistent and high-level API for internal & external clients.
 * Relatively straightforward iterative request flow with consistent monitoring and logging; declarative request flow configuration
 * Reduces code duplication and client complexity; reduces need for service discovery
 * Can evolve backend implementation behind high-level interface
 * Consistency: Fairly straightforward to reliably trigger / schedule dependent updates with a central registration point.
 * Security: Fairly straightforward to consistently enforce and evolve pre-save and on-read processing (spam filtering, sanitization etc).
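
The last two points can be sketched as a single write path. The `ApiLayer` class, the `strip_script` filter and the `purge-cache` hook below are all hypothetical and are not RESTBase's actual API; the sketch only assumes that every write passes through one place where obligatory filters and dependent updates are registered.

```python
from collections import defaultdict


class ApiLayer:
    """Hypothetical single entry point for writes (not RESTBase's actual API)."""

    def __init__(self):
        self.storage = {}                    # stand-in for the storage layer
        self.pre_save = []                   # obligatory filters, run in order
        self.dependents = defaultdict(list)  # bucket -> dependent update hooks
        self.scheduled = []                  # record of triggered updates

    def save(self, bucket, key, body):
        # Every write passes through the same filters; a filter may
        # transform the body or raise to reject the write.
        for check in self.pre_save:
            body = check(body)
        self.storage[(bucket, key)] = body
        # Dependent updates are triggered from the central registry.
        for hook in self.dependents[bucket]:
            self.scheduled.append(hook(key))
        return body


def strip_script(body):
    """Toy sanitizer: an assumption standing in for real HTML sanitization."""
    return body.replace("<script>", "").replace("</script>", "")


api = ApiLayer()
api.pre_save.append(strip_script)
api.dependents["html"].append(lambda key: f"purge-cache:{key}")
api.save("html", "Foo", "<p>Hi</p><script></script>")
```

Because clients cannot bypass `save`, adding or changing a filter takes effect for all writers at once, which is the property the two bullets above rely on.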

Disadvantages:
 * Potential for divergence between internal & external API
 * Extra implementation complexity through need of tracking bucket / table metadata in multiple services
 * Higher latency through extra network hop and need for serialization in common storage read path
 * Difficult (but not impossible) to generate consistent API documentation for the effective public API

== 5) Handle routing and service request orchestration in RESTBase. Optimize bucket-independent (non-storage) API request routing for common requests by also implementing it at the Varnish layer. ==

Advantages:
 * Low implementation complexity
 * Performance: Cut out RESTBase for non-storage requests (see https://github.com/wikimedia/restbase/blob/master/doc/UseCases.md#use-cases-for-pure-services-without-storage-in-restbase)
 * Provides a consistent and high-level API for internal & external clients.
 * Relatively straightforward iterative request flow with consistent monitoring and logging; declarative request flow configuration
 * Reduces code duplication and client complexity; reduces need for service discovery
 * Can evolve backend implementation behind high-level interface
 * Easier to generate consistent API documentation by maintaining the same entry points at the RESTBase layer
 * Consistency: Fairly straightforward to reliably trigger / schedule dependent updates with a central registration point.

Disadvantages:
 * Potential for divergence between Varnish routing & RESTBase if not maintained well
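
One way to keep this risk visible is an automated comparison of the two routing tables. The sketch below is only an illustration: the route paths, the backend names and the `routing_drift` helper are hypothetical, and it assumes both configurations can be exported as simple path-to-backend mappings.

```python
# Hypothetical route tables; the paths and backend names are illustrative only.
RESTBASE_ROUTES = {
    "/pages/{title}/html": "storage",
    "/transform/wikitext-to-html": "parsoid",
}
VARNISH_ROUTES = {
    "/transform/wikitext-to-html": "parsoid",
    "/transform/html-to-wikitext": "parsoid",
}


def routing_drift(restbase, varnish):
    """Return Varnish routes that RESTBase does not know or maps differently."""
    return sorted(
        path
        for path, backend in varnish.items()
        if restbase.get(path) != backend
    )
```

With the sample tables above, `routing_drift(RESTBASE_ROUTES, VARNISH_ROUTES)` reports the one Varnish route that has no RESTBase counterpart. A check like this could run whenever either configuration changes.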