Wikimedia Product/Wikimedia Product Infrastructure team/Action API request analytics

Action API request analytics will consist of reports and/or dashboards that track usage of the MediaWiki Action API on Wikimedia production websites. This tracking is intended to resemble the Pageviews tracking that the Analytics team currently does for articles in the main namespace.

Data acquisition
Raw Action API requests will be tracked using MediaWiki structured logging, Kafka and Hive.


 1. MediaWiki will emit a log event for each Action API request, using a structured logging context that contains the data needed to populate the Hive tables.
 2. Monolog will be configured to route these log events to a Kafka topic.
 3. (Camus?) will process events from the Kafka topic and load them into a raw data table in Hive.
 4. (?Something?) will summarize the raw data table, via ETL processing, into aggregate tables designed for specific reporting needs.
 5. (?Something?) will discard the raw request data after processing, to reduce the risk of leaking sensitive data through a network breach or a malicious actor.
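
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the planned implementation: the field names (`ts`, `action`, `params`, `user_agent`, `client_ip`), the function names, and the in-memory list standing in for the Kafka topic and the raw Hive table are all hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def build_event(params, user_agent, client_ip, ts):
    """Step 1: structured logging context for one Action API request.

    All field names here are assumptions made for illustration.
    """
    return {
        "ts": ts.isoformat(),
        "action": params.get("action", "unknown"),
        # Record only parameter names, not values, to limit sensitive data.
        "params": sorted(k for k in params if k != "action"),
        "user_agent": user_agent,
        "client_ip": client_ip,
    }


def summarize(raw_events):
    """Step 4: aggregate raw JSON events into (hour, action) request counts.

    The aggregate deliberately omits user_agent and client_ip.
    """
    counts = Counter()
    for line in raw_events:
        event = json.loads(line)
        hour = event["ts"][:13]  # "YYYY-MM-DDTHH"
        counts[(hour, event["action"])] += 1
    return counts


# Steps 2-3 stand-in: a plain list plays the role of the Kafka topic
# and the raw data table.
raw_table = []
when = datetime(2016, 1, 5, 14, 30, tzinfo=timezone.utc)
for request in (
    {"action": "query", "titles": "Main Page"},
    {"action": "query", "list": "recentchanges"},
    {"action": "parse", "page": "Sandbox"},
):
    raw_table.append(json.dumps(build_event(request, "ExampleBot/1.0", "192.0.2.1", when)))

hourly_counts = summarize(raw_table)

# Step 5: discard the raw request data once it has been summarized.
raw_table.clear()
```

In this sketch only the aggregate `hourly_counts` survives processing; the per-request records, including the client IP and user agent, are gone after step 5.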

Example reports
Number of user agents coming from Labs or third-party services, monthly and all-time (DevRel, to check whether adoption of our APIs is increasing)

Volume of API requests coming from Labs or third-party services, monthly (DevRel, to check the usage trend of our APIs)

Ranking of the most active user agents coming from Labs or third-party services, monthly and all-time (DevRel, to help identify the services making intensive use of our APIs)

Ranking of the most requested actions/parameters, monthly and all-time (DevRel, to help identify how our APIs are used, check usage against our documentation, find APIs we should promote...)
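
As a rough illustration of how such reports could be derived from an aggregate table, the sketch below computes monthly rankings and distinct user agent counts in Python. The row layout `(month, user_agent, action, request_count)`, the sample values, and the function names are assumptions made for this example; the real aggregate tables have not been designed yet.

```python
from collections import Counter, defaultdict

# Hypothetical aggregate rows: (month, user_agent, action, request_count)
ROWS = [
    ("2016-01", "ExampleBot/1.0", "query", 120),
    ("2016-01", "ExampleTool/2.3", "parse", 45),
    ("2016-01", "ExampleBot/1.0", "edit", 30),
    ("2016-02", "ExampleTool/2.3", "query", 80),
]


def monthly_ranking(rows, key_index, top=10):
    """Rank values of one column (1 = user agent, 2 = action) by requests per month."""
    per_month = defaultdict(Counter)
    for row in rows:
        per_month[row[0]][row[key_index]] += row[3]
    return {month: counter.most_common(top) for month, counter in per_month.items()}


def distinct_agents_per_month(rows):
    """Count the distinct user agents seen in each month."""
    agents = defaultdict(set)
    for month, user_agent, _action, _count in rows:
        agents[month].add(user_agent)
    return {month: len(seen) for month, seen in agents.items()}


agent_ranking = monthly_ranking(ROWS, key_index=1)
action_ranking = monthly_ranking(ROWS, key_index=2)
agent_counts = distinct_agents_per_month(ROWS)
```

An all-time version of each report would apply the same aggregation without grouping by month.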