Topic on Talk:Machine Learning

Model Inventory and Reporting

3 comments • 21:13, 9 March 2021 3 years ago

ACraze (WMF)

We are currently in the early stages of discussing AI/ML model governance on the Machine Learning team. An important concept in this space is model reporting: providing a single point of reference for discovering all ML models in production. Previously there was the ores-support-checklist, but it required manual maintenance and was often out of date. We need an automated solution that provides both a single view of all models in production (a model registry) and a view containing detailed, up-to-date information about each model.

Our first step in addressing this issue was to take an inventory of all current models and gather data about what each one does, its target language, and when it was last trained, tested, and deployed. We produced a public CSV file with this data, which is available in this ticket: https://phabricator.wikimedia.org/T275709
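To illustrate how such an inventory could feed an automated report, here is a minimal Python sketch. The column names (`name`, `purpose`, `language`, `last_trained`, `last_deployed`), the sample rows, and the one-year staleness threshold are all hypothetical and are not taken from the actual CSV in the ticket.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ModelRecord:
    """One row of a (hypothetical) model inventory."""
    name: str
    purpose: str
    language: str
    last_trained: date
    last_deployed: date


def load_inventory(csv_text: str) -> List[ModelRecord]:
    """Parse inventory CSV text into records; dates are assumed to be ISO 8601."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        ModelRecord(
            name=row["name"],
            purpose=row["purpose"],
            language=row["language"],
            last_trained=date.fromisoformat(row["last_trained"]),
            last_deployed=date.fromisoformat(row["last_deployed"]),
        )
        for row in reader
    ]


def stale_models(records: List[ModelRecord], as_of: date, max_age_days: int = 365) -> List[str]:
    """Return names of models whose last training is older than max_age_days."""
    return [r.name for r in records if (as_of - r.last_trained).days > max_age_days]


# Made-up sample data, for illustration only.
SAMPLE = """name,purpose,language,last_trained,last_deployed
example-damaging,edit quality,en,2019-05-01,2019-06-01
example-topic,article topic,fr,2021-01-15,2021-02-01
"""

records = load_inventory(SAMPLE)
print(stale_models(records, as_of=date(2021, 3, 9)))  # ['example-damaging']
```

A check like this could run on a schedule, so that the registry view flags models that have not been retrained recently instead of relying on someone to update a checklist by hand.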

Going forward, we plan to experiment with using wiki pages as living documentation for each of our models. There is some prior art in this area; one interesting approach we are looking at is called model cards (Mitchell et al., 2018). We have a ticket open for exploring these ideas further: https://phabricator.wikimedia.org/T276398

22:15, 3 March 2021

73.158.253.145

How are you motivating your teams to keep these model cards or any related metadata up to date?!!

18:36, 9 March 2021

CAlbon (WMF)

We think it is going to be a mix of manual and automated. Imagine a Wikipedia (technically MediaWiki.org) page for each model in production. Some of the page is populated with manually written text; other parts are updated live from the latest training run of the model (AUC curves, etc.).
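As a rough sketch of that mix, the Python below assembles wikitext for one model page from a hand-written description plus metrics taken from a training run. The `ModelPage` structure, the field names, the metric names, and the page layout are all hypothetical; how the text would actually be pushed to a wiki page is left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModelPage:
    """Content for one (hypothetical) model documentation page."""
    name: str
    manual_description: str    # written and maintained by people
    metrics: Dict[str, float]  # filled in automatically after each training run
    trained_on: str            # date of the latest training run


def render_wikitext(page: ModelPage) -> str:
    """Combine the manual text and the automated metrics into one wikitext page."""
    lines = [
        f"== {page.name} ==",
        "",
        page.manual_description,
        "",
        "=== Latest training run ===",
        f"Trained on: {page.trained_on}",
        "",
        '{| class="wikitable"',
        "! Metric !! Value",
    ]
    # One table row per metric, sorted by name so the output is stable.
    for metric, value in sorted(page.metrics.items()):
        lines.append("|-")
        lines.append(f"| {metric} || {value:.3f}")
    lines.append("|}")
    return "\n".join(lines)


# Made-up values, for illustration only.
page = ModelPage(
    name="example-damaging",
    manual_description="Predicts whether an edit is likely to be damaging.",
    metrics={"roc_auc": 0.912, "pr_auc": 0.455},
    trained_on="2021-03-01",
)
print(render_wikitext(page))
```

Only the metrics section would be regenerated after each training run; the description stays under human control, which is the split between manual and automated content described above.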

21:13, 9 March 2021
