Requests for comment/Business Layer Architecture on budget

From mediawiki.org
Request for comment (RFC)
Business Layer Architecture on budget
Component General
Creation date
Author(s) Yurik
Document status declined
See Phabricator.

Overview

I would like to propose a simple and inexpensive way to migrate to a Business Layer (BL) architecture that would greatly simplify our codebase and allow very rapid UI improvements. It involves a relatively easy refactoring of the current API subsystem, followed by a gradual, as-needed migration of the rest of the code.

Architecture goals

This proposal aims for the following general architecture goals:

  • Reduce the amount of code that issues direct SQL queries and confine such code to a few well-defined components. The aim is to localize deep knowledge of the DB structure, centralize permission/security checks and data sanitization, and allow DB operations to be batched. This will also make it easier to improve test coverage and reduce coupling between components.
  • Equalize the capabilities of PHP code and JS code with regard to data and business logic access, so that the same capabilities are available to both. This will allow more agile UI development on the JS side and will also keep the API layer "honest" about which capabilities are needed to implement user-facing functions.
  • Promote code reuse, and thus increase test coverage and stability, by preferring existing, tested code over new and potentially buggy or under-covered code.

Problem

Our API already implements most of the business logic functionality. It handles optimized SQL, permission validation, batch processing, query continuation, and localized error messages. From a functionality perspective, API classes do everything that the business layer needs to do. The usefulness of the API even outweighs the difficulty of using it internally, and we already have plenty of ApiMain calls in PHP.

However, the API is tightly coupled to output formatting, and it mostly operates on strings rather than objects. While 90% of the API module code is BL functionality, the remaining 10% is formatting.

Proposal

My idea is to decouple formatting from the API and let API modules work with objects such as Title, TitleValue, User, etc., instead of strings, for both input and output. A thin wrapper on top of the API would serialize and deserialize objects as needed and expose the API to JavaScript and bots. Internally, code should also access the database via API modules, except that it would pass objects and use a better interface than FauxRequest.

Data Flow

Internal Call
Request: POPO objects (e.g. TitleValue) -> Validate params -> Execute needed modules
Result: DB -> POPO objects (e.g. TitleValue) -> caller
External Call
Request: URL Query -> Validate params and convert to POPO objects -> Execute needed modules
Result: DB -> POPO objects (e.g. TitleValue) -> serialized as JSON -> caller
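
The two flows above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TitleRef` is a stand-in for a POPO such as TitleValue (it is not MediaWiki's class), and `validateParams`, `executeModules`, `internalCall` and `externalCall` are hypothetical names that do not exist in the codebase. The point is that both entry points share the same validation and module execution, and serialization happens only at the outer edge of the external call.

```php
<?php
// Stand-in for a POPO such as TitleValue (hypothetical; not MediaWiki's class).
final class TitleRef {
    private $text;
    public function __construct( string $text ) {
        $this->text = $text;
    }
    public function getText(): string {
        return $this->text;
    }
}

// Shared validation step: both entry points pass objects through here.
function validateParams( array $params ): array {
    if ( empty( $params['titles'] ) ) {
        throw new InvalidArgumentException( 'At least one title is required' );
    }
    foreach ( $params['titles'] as $title ) {
        if ( !( $title instanceof TitleRef ) || $title->getText() === '' ) {
            throw new InvalidArgumentException( 'Each title must be a non-empty TitleRef' );
        }
    }
    return $params;
}

// Placeholder for "execute needed modules": a real module would read from the DB.
// In this sketch every requested page simply "links" to one fixed page.
function executeModules( array $params ): array {
    $result = [];
    foreach ( $params['titles'] as $title ) {
        $result[$title->getText()] = [ new TitleRef( 'Help' ) ];
    }
    return $result;
}

// Internal call: objects in, objects out, no serialization.
function internalCall( array $params ): array {
    return executeModules( validateParams( $params ) );
}

// External call: URL query in, JSON out.
function externalCall( string $queryString ): string {
    parse_str( $queryString, $raw );
    $titles = [];
    foreach ( explode( '|', $raw['titles'] ?? '' ) as $text ) {
        $titles[] = new TitleRef( $text );
    }
    $result = executeModules( validateParams( [ 'titles' => $titles ] ) );
    // Serialization happens only at this outer edge.
    $out = [];
    foreach ( $result as $page => $links ) {
        $out[$page] = array_map( function ( TitleRef $t ) {
            return $t->getText();
        }, $links );
    }
    return json_encode( $out );
}

$objects = internalCall( [ 'titles' => [ new TitleRef( 'Main Page' ) ] ] );
$json = externalCall( 'titles=Main%20Page|Sandbox' );
```

Here `$objects` holds `TitleRef` instances that PHP callers can use directly, while `$json` is the string a JavaScript client or bot would receive.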

Concerns

Performance
While performance is obviously a concern, calling the API does not add significant cost. Passing objects as parameters would avoid any marshalling costs. The additional parameter validation should have minimal impact and would ensure a consistent security model.
Auto-complete / Static Analysis
Having a generic `$result = wfApi($params)` is clearly not as developer-friendly as `$pages=getListOfPages(prefix="abc", limit=10)`. On the other hand, PHP does not (at the time of writing) support unordered named parameters the way Python and C# do. So unless we create a full-blown parameter object, `ListOfPagesParam(<required parameters>)`, which would expose all optional parameters as named, typed functions, we won't be able to provide all the API functionality in an autocomplete-friendly way.
We could partially remedy this for input parameters by providing special per-module parameter wrapper functions, e.g. Api::AllPages(30) and Api::AllPagesByPrefix('blah', 30), but this might lead us to an overly complex system.
Results are both harder and easier. On one hand, we will return a generic stdClass as the overall result, so no autocomplete there. On the other hand, inside that object we will return POPO data objects like TitleValue, which is clearly better than a bunch of strings.
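
To make the parameter-object idea more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: treating `prefix` as the only required parameter, the optional `limit` and `namespace` parameters with their defaults, and the `toArray()` method are all hypothetical. Required parameters go into the constructor, and each optional parameter becomes a named, typed method that an IDE can autocomplete.

```php
<?php
// Hypothetical parameter object for a "list of pages" call.
final class ListOfPagesParam {
    private $prefix;
    private $limit = 10;     // assumed default
    private $namespace = 0;  // assumed default (main namespace)

    // Required parameters are constructor arguments.
    public function __construct( string $prefix ) {
        $this->prefix = $prefix;
    }

    // Optional parameters are typed, chainable methods.
    public function limit( int $limit ): self {
        if ( $limit < 1 ) {
            throw new InvalidArgumentException( 'limit must be at least 1' );
        }
        $this->limit = $limit;
        return $this;
    }

    public function inNamespace( int $namespace ): self {
        $this->namespace = $namespace;
        return $this;
    }

    // Flatten to the generic array form that a wfApi()-style call could take.
    public function toArray(): array {
        return [
            'prefix' => $this->prefix,
            'limit' => $this->limit,
            'namespace' => $this->namespace,
        ];
    }
}

$params = ( new ListOfPagesParam( 'abc' ) )->limit( 30 )->toArray();
```

The cost is visible here too: one such class per module, which is the complexity concern raised above.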

Special Permissions (TBD)

For internal calls, we might consider allowing a bit more leniency:

  • Do not enforce any limits. The caller should supply limits when needed, since they are domain-specific.
  • Do not enforce a maximum response size. The API call is subject to the same memory constraints as the caller, many strings will be interned, and object memory usage is not trivial to calculate.
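
One possible shape for the relaxed limit handling is sketched below. The function name `effectiveLimit` and both constants are hypothetical, and the numbers are placeholders rather than the API's actual limits. Internal callers get exactly what they ask for (including no limit at all), while external callers are still capped.

```php
<?php
// Assumed values for illustration only.
const EXTERNAL_DEFAULT_LIMIT = 10;
const EXTERNAL_MAX_LIMIT = 500;

/**
 * Decide which row limit a module should apply.
 * Returns null when no limit should be applied at all.
 */
function effectiveLimit( ?int $requested, bool $isInternal ): ?int {
    if ( $isInternal ) {
        // Internal callers choose their own domain-specific limits.
        return $requested;
    }
    // External callers fall back to a default and are capped at the maximum.
    if ( $requested === null ) {
        return EXTERNAL_DEFAULT_LIMIT;
    }
    return min( $requested, EXTERNAL_MAX_LIMIT );
}
```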

Coding Ideas (Brainstorming)

Simple approach - works for any API with very little work

// wfApi( $action, $params ) - returns stdClass
$res = wfApi( 'query', array(
    'titles' => 'Main Page', // could also be Title, Title[], TitleValue, TitleValue[], string[]
    'props' => array( 'links' ),
) );
// Result structure:
// $res->pages             - array of stdClass
// $res->pages[0]->pageid  - int
// $res->pages[0]->title   - TitleValue
// $res->pages[0]->links   - array of TitleValue
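
Accepting 'titles' as a string, an object, or an array of either implies a normalization step somewhere inside `wfApi`. A minimal sketch of that step, assuming a stand-in `TitleStub` class in place of Title/TitleValue and a hypothetical helper named `normalizeTitles`:

```php
<?php
// Stand-in for Title/TitleValue (hypothetical; not MediaWiki's class).
final class TitleStub {
    private $text;
    public function __construct( string $text ) {
        $this->text = $text;
    }
    public function getText(): string {
        return $this->text;
    }
}

/**
 * Accept a string, a TitleStub, or an array mixing both,
 * and always return a list of TitleStub objects.
 *
 * @param string|TitleStub|array $titles
 * @return TitleStub[]
 */
function normalizeTitles( $titles ): array {
    $items = is_array( $titles ) ? $titles : [ $titles ];
    $out = [];
    foreach ( $items as $item ) {
        if ( $item instanceof TitleStub ) {
            $out[] = $item;
        } elseif ( is_string( $item ) && $item !== '' ) {
            $out[] = new TitleStub( $item );
        } else {
            throw new InvalidArgumentException( 'Unsupported value in titles' );
        }
    }
    return $out;
}
```

With this in place, the module code behind `wfApi` only ever sees objects, regardless of what the caller passed.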

Special "query" API

wfQuery( array(
    'titles' => 'Main Page', // could also be Title, Title[], TitleValue, TitleValue[], string[]
    'links' => array( 'limit' => 30 ), // Query api is smart enough to figure that links is a prop
                                       // 'limit' does not need the module prefix
                                       // for generator we could use 'generator-links' as a magic key
) );
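
How `wfQuery` might "figure out" that links is a prop can be sketched as a lookup table plus prefix expansion. The table below, the `expandQuery` name, and the choice to emit classic request parameters are assumptions for illustration. The prefixes shown (`pl`, `cl`, `ap`) follow the existing query modules as far as I know, but treat the table as illustrative rather than authoritative.

```php
<?php
// Illustrative module table: module name => type and parameter prefix.
const QUERY_MODULES = [
    'links' => [ 'type' => 'prop', 'prefix' => 'pl' ],
    'categories' => [ 'type' => 'prop', 'prefix' => 'cl' ],
    'allpages' => [ 'type' => 'list', 'prefix' => 'ap' ],
];

// Expand the shorthand form into classic, prefixed request parameters.
function expandQuery( array $query ): array {
    $out = [ 'action' => 'query' ];
    $byType = [];
    foreach ( $query as $key => $value ) {
        // 'generator-links' is the magic key for using a module as a generator.
        $isGenerator = strpos( $key, 'generator-' ) === 0;
        $name = $isGenerator ? substr( $key, strlen( 'generator-' ) ) : $key;
        if ( !array_key_exists( $name, QUERY_MODULES ) ) {
            // Not a module name (e.g. 'titles'): pass it through.
            $out[$key] = is_array( $value ) ? implode( '|', $value ) : $value;
            continue;
        }
        $prefix = QUERY_MODULES[$name]['prefix'];
        if ( $isGenerator ) {
            $out['generator'] = $name;
            $prefix = 'g' . $prefix;
        } else {
            $byType[QUERY_MODULES[$name]['type']][] = $name;
        }
        // Module parameters get the module prefix added automatically.
        foreach ( $value as $param => $paramValue ) {
            $out[$prefix . $param] = $paramValue;
        }
    }
    foreach ( $byType as $type => $names ) {
        $out[$type] = implode( '|', $names );
    }
    return $out;
}

$expanded = expandQuery( [
    'titles' => 'Main Page',
    'links' => [ 'limit' => 30 ],
] );
```

For the input shown, `$expanded` contains `prop` set to `links` and `pllimit` set to 30, alongside the untouched `titles` value.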

Per-module helpers - requires special objects or functions for each module

Query::run(
    // pageset for the query to work on
    // for generator, use another function,
    // e.g. Query::Generator( Query::AllPages() )
    Query::Pages( 'Main Page' ),
    // Each function returns a ['module' => [params]] array, so the results can be combined with +
    Query::Links() + Query::Categories()
);
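
The `+` in the call above relies on PHP's array union operator. A minimal sketch of how such helpers could be built, using a hypothetical `QuerySketch` class (the method names and the returned array shape are assumptions, and `run()` only merges its inputs instead of dispatching to real modules):

```php
<?php
final class QuerySketch {
    // Page set for the query to work on.
    public static function pages( string ...$titles ): array {
        return [ 'titles' => $titles ];
    }

    // Each module helper returns [ 'module' => [ params ] ].
    public static function links( array $params = [] ): array {
        return [ 'links' => $params ];
    }

    public static function categories( array $params = [] ): array {
        return [ 'categories' => $params ];
    }

    // A real implementation would execute the modules; this sketch
    // only shows the combined request it would receive.
    public static function run( array $pageSet, array $modules ): array {
        return $pageSet + $modules;
    }
}

$request = QuerySketch::run(
    QuerySketch::pages( 'Main Page' ),
    QuerySketch::links( [ 'limit' => 30 ] ) + QuerySketch::categories()
);
```

One design caveat: array union keeps the left-hand value when both sides share a key, so listing the same module twice would silently drop the second set of parameters.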

See also