Wikimedia Product Infrastructure team/Push Notifications Infrastructure/Design Decisions

High-level components
This project will involve two software components:
 * a new Node.js service for processing notification requests, and
 * an update to the Echo extension to manage push subscriptions and support forwarding Echo notifications to the push service.

MW extension

 * Maintain a database of device push subscriptions based on provider tokens
 * Provide an API for users to register, renew, and unregister push subscriptions
 * Maintain a database table mapping global user accounts to push subscriptions
 * Provide an API for wiki users to associate their global user accounts with device push subscriptions
 * Provide an Echo notifier type to allow handling notifications with push
 * Provide a handler implementation for formatting notification requests and forwarding them to the push notification service

Push notification service

 * Provide an API for accepting notification requests
 * Forward notification requests to providers (Apple Push Notification Service, Firebase Cloud Messaging, etc.)

Draft DB schema
User:MHolloway (WMF)/Drafts/Push DB Schema

Client to call MW API for notification message details

 * Status: accepted (with caveats)
 * Decision: The push service sends a notification type indicator (rather than message content), which is the cue for the app client to invoke a known MW API query.
 * Pros:
 * No personal data transmitted
 * No need for I18N in the push infrastructure, or for tracking which locale a given notification should be sent in. A client can use existing mechanisms to specify the locale in the MW API request (Accept-Language header, etc.).
 * Some clients cannot subscribe to push notifications (e.g. app on an Android device without Google Play Services). They may need to know about these queries, too, so they can poll instead.
 * Cons:
 * One more request for clients to make
 * Client needs to know about the notification type to handle it (client discards unknown types)
 * Question: Should the client log notifications received with unknown types using event logging?
 * What should and shouldn't be logged will have to be addressed in the course of the privacy analysis. MHolloway (WMF) (talk) 18:29, 13 May 2020 (UTC)
 * Reasons for decision:
 * 1) Supports user privacy
 * 2) Ease of implementation: it removes message encryption concerns from the service, and will require only modest changes to what the app clients already do
 * Caveats
 * 1) The exact design of the notification type indicators is subject to change based on ongoing privacy and security analysis.
 * 2) This design involves an extra network request for clients, which is a disadvantage for mobile users on weaker connections. If this architecture turns out to exclude a significant number of users from receiving push notifications, we may have to revisit the possibility of sending full message content with E2E encryption. I am explicitly noting this tradeoff as required by the Wikimedia Engineering Architecture Principles (EQUITY/DEVICE).
 * Date: 2020-05-13
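To make the accepted design concrete, here is a minimal client-side sketch, assuming the push payload carries nothing but a type string. The type name `checkEchoV1`, the `requestForPush` helper, and the exact query parameters are illustrative assumptions (the parameters are modeled on Echo's `meta=notifications` API module); the real type indicators were still subject to the privacy and security review noted in the caveats.

```typescript
// Hypothetical push payload: only a notification type, no message content.
interface PushPayload {
  type: string;
}

// The MW API request the client makes after receiving a push.
interface MwApiRequest {
  url: URL;
  headers: Record<string, string>;
}

// Known notification types mapped to the MW API query they trigger.
// "checkEchoV1" and the query parameters are illustrative assumptions.
const KNOWN_TYPES = new Map<string, (apiBase: string) => URL>([
  [
    "checkEchoV1",
    (apiBase: string): URL => {
      const url = new URL(apiBase);
      url.searchParams.set("action", "query");
      url.searchParams.set("meta", "notifications");
      url.searchParams.set("notwikis", "*"); // cross-wiki, since Echo handles SUL
      url.searchParams.set("format", "json");
      return url;
    },
  ],
]);

// Returns the request to make, or null for an unknown type (the client discards it).
function requestForPush(payload: PushPayload, apiBase: string, locale: string): MwApiRequest | null {
  const build = KNOWN_TYPES.get(payload.type);
  if (build === undefined) {
    return null;
  }
  // The locale travels with the MW API request, so the push service needs no I18N.
  return { url: build(apiBase), headers: { "Accept-Language": locale } };
}
```

A `Map` is used instead of a plain object so that payload types such as `toString` cannot accidentally match inherited properties. Clients that cannot subscribe to push could issue the same query on a polling schedule.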

Write push-specific MW code directly in the Echo extension

 * Status: accepted
 * Reason: 1. There's little reason to write a new extension that's so tightly coupled with Echo. 2. It takes too long to get a new extension approved.
 * Date: 2020-05-05

Manage only user-push subscription ID mappings

 * Status: rejected
 * Decision: Manage push subscriptions primarily in the external service. Manage mappings between wiki user accounts and push subscriptions in MediaWiki.
 * Reason for proposal: Push credentials are specific to a web browser or native app installation and not to a wiki user. Handling them separately allows for cleanly supporting anons and multi-user devices.
 * Reason for rejection: To simplify the architecture and client interaction patterns given the lack of present need to support anon subscriptions.
 * Date: 2020-05-13

Manage all subscription data

 * Status: accepted
 * Decision: Have Echo manage full push subscription data in MySQL.
 * Reason: To simplify the architecture, given the lack of a product need to support anon push subscriptions.
 * Date: 2020-05-13

Manage subscription data in Echo extension

 * Decision: Subscription tokens will be associated with MediaWiki sessions and, for that reason, will be managed by the Echo extension. Echo will not be responsible for the broker communication logic, which will be handled by the service.
 * Reasons: Anons and multi-user devices are not in scope for v1 and might bring unnecessary complexity to the system before we decide to support them; authentication and session management are more reliable in MediaWiki, so there is no need to add this logic layer to the service; validating the subscription sessions in the Echo extension before firing the notification event avoids the JobQueue overhead of no-op events when users are not subscribed.
 * Date: 2020-05-08
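The last reason, skipping no-op jobs, can be sketched as a simple guard. Echo itself is PHP; this is written in TypeScript only to keep the examples on this page in one language. `SubscriptionStore`, `InMemorySubscriptionStore`, `PushJob`, and `maybeEnqueuePush` are hypothetical names, and a plain array stands in for the JobQueue.

```typescript
// Hypothetical view of Echo's subscription table: provider tokens per user.
interface SubscriptionStore {
  tokensFor(centralUserId: number): string[];
}

// Hypothetical job payload forwarded to the push service.
interface PushJob {
  centralUserId: number;
  tokens: string[];
}

// In-memory stand-in for the MySQL-backed store.
class InMemorySubscriptionStore implements SubscriptionStore {
  private readonly byUser = new Map<number, string[]>();

  add(centralUserId: number, token: string): void {
    const tokens = this.byUser.get(centralUserId) ?? [];
    tokens.push(token);
    this.byUser.set(centralUserId, tokens);
  }

  tokensFor(centralUserId: number): string[] {
    return this.byUser.get(centralUserId) ?? [];
  }
}

// Enqueue a push job only when the user has at least one subscription,
// so unsubscribed users never produce no-op jobs.
function maybeEnqueuePush(centralUserId: number, store: SubscriptionStore, queue: PushJob[]): boolean {
  const tokens = store.tokensFor(centralUserId);
  if (tokens.length === 0) {
    return false;
  }
  queue.push({ centralUserId, tokens });
  return true;
}
```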

Clients query Echo for notifications content

 * Status: accepted
 * Reason: Because Echo handles SUL (https://phabricator.wikimedia.org/T251436), clients don't need to perform requests for each wiki project; this reduces the complexity of phoning home to get the appropriate data.
 * Date: 2020-05-12

Use Airship rather than a service built in-house

 * Status: investigating
 * Reason: It might be better (and cheaper) to pay someone to do this rather than building, operating, and maintaining custom software for it in-house.
 * Date: 2020-05-13

Write a new service from scratch

 * Status: accepted
 * Decision: Write from scratch but incorporate non-intuitive bits (if any) from existing projects.
 * Reason: The FOSS projects we found were stale and unmaintained. The last commit to the front-runner project was 5 years ago.

Write the push service in Node.js

 * Status: accepted
 * Alternatives: Go, Python, Java, ...
 * Reason: The Foundation has more experience running services in Node.js than in the alternatives.
 * Date: 2020-04-20

Try using TypeScript

 * Status: accepted
 * Reason: Added type safety
 * Date: 2020-04-20

Use service-runner

 * Status: accepted
 * Reason: service-runner provides the health metrics reporting required by SRE, plus logging
 * Date: 2020-04-20

Base project on service-template-node

 * Status: accepted
 * Reason: Provides a code structure that is consistent with other Node.js services running in WMF production
 * Date: 2020-04-20

Use Gerrit for code review

 * Status: accepted
 * Reason: The deployment pipeline is not ready for GitHub
 * Date: 2020-04-21

Store primary subscription data but not user IDs

 * Status: rejected 2020-05-08
 * Decision: Manage push subscriptions primarily in the service. Manage mappings between wiki user accounts and push subscriptions in MediaWiki.
 * Reason for proposal: Push credentials are specific to a web browser or native app installation and not to a wiki user. Handling them separately allows for cleanly supporting anons and multi-user devices.
 * Reason for rejection: It makes the architecture more complicated than necessary given that anon support is not (yet) required.
 * Date: 2020-04-28

Basic endpoints logic separation

 * Status: accepted 2020-05-27
 * Decision: Endpoints will be separate per provider, e.g.,  will trigger a message specific to an Android client through FCM.
 * Reason: Keeping the logic separate on the service side will give us more flexibility and simplicity while developing the codebase.
 * Date: 2020-05-27
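A minimal sketch of per-provider separation, assuming each endpoint resolves to a provider name. The concrete endpoint paths are not given above, so this routes on a bare provider string instead. The message shapes are simplified, hypothetical stand-ins (a data-only message for FCM, a background `content-available` push for APNS), not the exact wire formats either provider requires.

```typescript
type Provider = "fcm" | "apns";

// Incoming notification request: device tokens plus a notification type only.
interface NotificationRequest {
  deviceTokens: string[];
  type: string;
}

// Simplified, hypothetical provider message shape.
type ProviderMessage = Record<string, unknown>;

// One builder per provider keeps FCM- and APNS-specific logic apart.
const builders: Record<Provider, (req: NotificationRequest) => ProviderMessage[]> = {
  // FCM: one data-only message per token.
  fcm: (req) => req.deviceTokens.map((token) => ({ token, data: { type: req.type } })),
  // APNS: one background ("content-available") push per token.
  apns: (req) => req.deviceTokens.map((token) => ({ token, aps: { "content-available": 1 }, type: req.type })),
};

function isProvider(name: string): name is Provider {
  return name === "fcm" || name === "apns";
}

// Dispatches to the provider-specific builder; unknown providers get a 404-style result.
function handle(provider: string, req: NotificationRequest): { status: number; messages: ProviderMessage[] } {
  if (!isProvider(provider)) {
    return { status: 404, messages: [] };
  }
  return { status: 200, messages: builders[provider](req) };
}
```

Adding a provider then means adding one builder and one name to `Provider`, without touching the other providers' code.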

APNS client module

 * Status: accepted 2020-06-02
 * Decision: Use parse-community/node-apn fork for APNS push notifications
 * Reason: Most of the Node modules for APNS don't look very active. Of all the options, the parse-community/node-apn fork looked the most up to date. For the immediate future, let's use that one until we have more data points on using a different library or building one in-house.
 * Date: 2020-06-02

Keep throttling/batching logic in the service

 * Status: accepted 2020-06-03
 * Decision: Keep throttling and batching logic in the service and not in MediaWiki.
 * Reason: Keep as much of our feature-specific business logic as possible in the service and outside of MW. (Note: Separately, we can throttle the overall rate of requests being sent to the service from MediaWiki using the Job Queue.)
 * Date: 2020-06-03
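The batching half of this decision could look roughly like the following sketch, assuming a batch is sent when it reaches a size limit or when a wait timer expires, whichever comes first. `Batcher`, `maxBatchSize`, `maxWaitMs`, and `send` are hypothetical names, and the actual provider call is left to the `send` callback.

```typescript
// Collects notification requests and hands them to `send` in batches.
class Batcher<T> {
  private pending: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly maxBatchSize: number;
  private readonly maxWaitMs: number;
  private readonly send: (batch: T[]) => void;

  constructor(maxBatchSize: number, maxWaitMs: number, send: (batch: T[]) => void) {
    this.maxBatchSize = maxBatchSize;
    this.maxWaitMs = maxWaitMs;
    this.send = send;
  }

  add(item: T): void {
    this.pending.push(item);
    if (this.pending.length >= this.maxBatchSize) {
      // Full batch: send immediately.
      this.flush();
    } else if (this.timer === undefined) {
      // First item of a new batch: start the wait timer.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}
```

Throttling could be layered on the same point by limiting how often `send` is allowed to run; that part is not shown here.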

No delivery guarantees for v1

 * Status: accepted 2020-06-03
 * Decision: Provide no specific message delivery SLA for v1
 * Reason: To minimize blockers for initial launch.
 * Date: 2020-06-03

Housekeeping
Some ideas on what to include in decision records. Pretty much everything here is optional for now.

Template

 * Status: [proposed | rejected | accepted | deprecated]
 * Decision:
 * Reason:
 * Date: [YYYY-MM-DD when the decision was last updated]