
Talk:Machine Learning/2021

From mediawiki.org

A liveblog and forum about machine learning at Wikimedia. Create new topics to ask questions or post updates.

Weekly Team Update 2021-02-23

  • Steady progress on the Lift Wing MVP. Currently the servers are installed and the basic server configuration is complete. Now the team has moved on to setting up the servers as a Kubernetes cluster. Once this is complete the final step is setting up KFServing Standalone on the cluster.
  • Work continues on preparing an initial set of ORES models for migration to Lift Wing.
  • The Machine Learning team is revamping our public comms. This means a new public team page, new forum/update blog (in the form of a Mediawiki.org talk page), new weekly office hours, and a few more things. The goal is to reduce the barriers to the public understanding what we are working on and how they can effectively plug in. CAlbon (WMF) (talk) 20:58, 1 March 2021 (UTC)Reply

Weekly Team Update 2021-03-01

  • The team continues to push to get the Lift Wing MVP up. Here is the primary phab ticket. We've made progress on the Kubernetes configuration; however, we are making sure to coordinate with the SRE team so we don't go too far off on our own.
  • We are still trying to source 4 GPUs for the eqiad Lift Wing servers. Our first attempt had sizing issues in the server chassis. We think we have found one that will work (Radeon RX 5600XT), but we need to receive it and test it first.
  • Work on revamping our comms strategy is moving forward. The bottleneck is Chris' schedule.
  • The team has started work on guidelines for AI model governance. The goal is to have a set of ethical and technical standards for the deployment of a machine learning model by WMF. Over the next few weeks we will be speaking with members of the community, Trust & Safety, Security, Research, and other stakeholders for input. Based on those conversations we will create a draft set of guidelines for further discussion.
  • CAlbon (WMF) (talk) 21:26, 1 March 2021 (UTC)Reply

Feedback On This Talk Page Wanted!


The Machine Learning team is trying something new and using this talk page as a forum / blog for interaction with the community. We'd love to hear people's feedback on this format, especially in comparison to, say, a mailing list or even IRC. CAlbon (WMF) (talk) 19:55, 2 March 2021 (UTC)Reply

seems fine 73.216.203.65 (talk) 19:45, 9 March 2021 (UTC)Reply
Nice 98.128.229.81 (talk) 19:50, 9 March 2021 (UTC)Reply
Awesome! Thanks for the help! CAlbon (WMF) (talk) 21:12, 9 March 2021 (UTC)Reply

Model Inventory and Reporting


We are currently in the early stages of discussing AI/ML model governance on the Machine Learning team. An important concept in this space is the idea of model reporting, or providing a single point of reference for discovery of all ML models in production. Previously, there was the ores-support-checklist; however, this required manual maintenance and was often out of date. We need an automated solution that provides both a single view of all models in production (model registry) and a view containing detailed (and up-to-date) information about each model.


Our first step in addressing this issue was to take a current inventory of all models and gather data about what they do, the target language, and when they were last trained/tested/deployed. We produced a public CSV file with this data, which is available in this ticket: https://phabricator.wikimedia.org/T275709


Going forward, we plan to experiment with the idea of using wiki pages as living documentation for each of our models. There is some prior art in this area; one interesting approach we are looking at is called model cards (Mitchell et al. 2018). We have a ticket open related to exploring these ideas a bit further: https://phabricator.wikimedia.org/T276398 ACraze (WMF) (talk) 22:15, 3 March 2021 (UTC)Reply

How are you motivating your teams to keep these model cards or any related metadata up to date?!! 73.158.253.145 (talk) 18:36, 9 March 2021 (UTC)Reply
We think it is going to be a mix of manual and automated. Imagine a Wikipedia (technically Mediawiki.org) page for each model in production. Some of the page is populated with manually written text; other parts are live updated from the latest training of the model (AUC curves etc.) CAlbon (WMF) (talk) 21:13, 9 March 2021 (UTC)Reply
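
To make the "mix of manual and automated" idea concrete, here is a minimal sketch of how one page per model could combine hand-written text with values written by a training job. Everything in it is an assumption for illustration: the `ModelCard` fields, the `render_wikitext` helper, the section titles, and the metric value are hypothetical and not taken from the team's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Hypothetical container for one model's documentation page."""
    name: str
    description: str    # manually written text
    intended_use: str   # manually written text
    metrics: dict = field(default_factory=dict)  # filled in by the training job
    last_trained: str = "unknown"


def render_wikitext(card: ModelCard) -> str:
    """Render the card as wikitext, mixing manual and generated sections."""
    lines = [
        f"== {card.name} ==",
        card.description,
        "",
        "=== Intended use ===",
        card.intended_use,
        "",
        "=== Latest training run (auto-generated) ===",
        f"* Last trained: {card.last_trained}",
    ]
    # Sort metric names so the generated section is stable between runs.
    for metric, value in sorted(card.metrics.items()):
        lines.append(f"* {metric}: {value:.3f}")
    return "\n".join(lines)


card = ModelCard(
    name="example-model",
    description="Placeholder description written by a maintainer.",
    intended_use="Placeholder guidance written by a maintainer.",
    metrics={"roc_auc": 0.9},  # made-up value for illustration only
    last_trained="2021-03-01",
)
page = render_wikitext(card)
```

In this sketch only the last section would be regenerated after each training run, so the manually written sections stay under the control of whoever maintains the page.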

Experimenting With Livestreamed Public Office Hours


One of the things we've been working on is lowering the barriers to interacting with the Machine Learning team. As part of that we've been experimenting with weekly live office hours. During an impromptu 30 minute test stream, around 30 folks showed up and discussed our infrastructure. In the future we will post times and have a regular public cadence.


There is a question around whether Twitch or YouTube is the better platform; let us know what you think below. CAlbon (WMF) (talk) 22:40, 4 March 2021 (UTC)Reply

I would prefer to see this content be streamed and also available on YouTube because it's a site that our firewall allows us access to versus Twitch. 98.201.93.236 (talk) 18:09, 9 March 2021 (UTC)Reply
Awesome thanks. Yeah I think YouTube is probably the best place, especially because Twitch deletes the videos after 14 days. CAlbon (WMF) (talk) 21:14, 9 March 2021 (UTC)Reply
I think Youtube works well for now. If we find other (or multiple) places that the community prefers, we can multi-stream using something like OBS Studio. ACraze (WMF) (talk) 23:25, 9 March 2021 (UTC)Reply

Weekly Team Update 2021-03-09

  • The Kubernetes control layer is complete; now the team is setting up the worker nodes. The goal is to have the worker nodes on eqiad up as soon as possible. The deployment and configuration of the stack up to and including the worker nodes is now automated. The final layer of the stack, KFServing Standalone, will soon be ready to be installed. This final layer will most likely be installed initially as a proof of concept, torn down, and then reinstalled through an automated process via Puppet, meaning the entire stack can be deployed from bare metal automatically.
  • The team is working on initial discussions around AI model governance. The goal is to eventually put in place a process for evaluating candidate models for deployment from inside the team, from another Foundation team, and the community. CAlbon (WMF) (talk) 16:12, 9 March 2021 (UTC)Reply

Machine Learning Team 18 Month Roadmap


Here is the slide deck for the Machine Learning team's 18 month roadmap. I've made a few changes to allow for greater accessibility. We would love to hear your thoughts and questions.


https://drive.google.com/file/d/191_H0OEO2s3_HQuJMfwz40OjnAr37_Bh/view?usp=sharing CAlbon (WMF) (talk) 21:09, 9 March 2021 (UTC)Reply

Followup on suggestion for Recent Changes log


Hi everyone. A little over a year ago I shared an idea for giving Recent Changes patrollers the option to see changes that had probably been submitted at times when very few reverts took place: https://en.wikipedia.org/wiki/Wikipedia_talk:STiki/Archive_27#Feature_request_by_Clayoquot Are there any updates on this? I'm wondering if it would help if I try to move it forward through community feature request channels, such as by starting a discussion on the English Wikipedia Village Pump. Clayoquot (talk) 17:41, 12 March 2021 (UTC)Reply

Hey Clayoquot! Unfortunately I don't have any updates for you! At least not that I know about. I would recommend Village Pump and I'll keep track of it from my end! Sorry I couldn't help more! CAlbon (WMF) (talk) 20:49, 12 March 2021 (UTC)Reply
Hi. No worries, thanks for getting back to me so quickly. I've put a proposal here: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical) . Looking forward to the discussion. Clayoquot (talk) 22:34, 12 March 2021 (UTC)Reply
Awesome thanks! CAlbon (WMF) (talk) 15:17, 16 March 2021 (UTC)Reply

Weekly Team Update 2021-03-16

  • Lift Wing work continues. Currently the worker nodes are set up and we are working on a "Hello World" using the Lift Wing cluster. This continues our steady progress moving forward.
  • We're starting conversations on AI model governance both within the foundation and the community.
  • We are exploring how best to migrate the functionality of the models on ORES into Lift Wing, whether through a direct migration or a retraining using the original training data. CAlbon (WMF) (talk) 15:17, 16 March 2021 (UTC)Reply

Weekly Team Update 2021-03-22

  • Kubernetes cluster work continues. The team is working on establishing the istio and knative services as the next levels in the stack. These are critical components for KFServing. This process is slower for the rest of March due to staffing events.
  • Google Cloud Compute credits! As part of our migration to Kubeflow, the Machine Learning team has been approved for some Google Cloud Compute credits for running a development Kubeflow instance for learning and experimentation by ourselves and the community. This will unblock some folks while Kubeflow is being deployed. There is no plan to use Google Cloud Compute in production.
  • Annual planning is now in full swing. The team is discussing priorities and narrowing down on a set we think are ambitious but accomplishable in the next financial year. More on that soon. CAlbon (WMF) (talk) 19:05, 22 March 2021 (UTC)Reply

Updated Team Homepage


The updated team homepage is now live. This is just the start of a much larger expansion of the public information about our infrastructure, projects, and models. The goal of the effort is to lower the barriers to collaboration and increase the transparency of the team and its work. CAlbon (WMF) (talk) 21:13, 22 March 2021 (UTC)Reply

Weekly Team Update 2021-04-06

  • We've achieved "Hello World" from the Lift Wing cluster. This means we are in the final push to get that cluster up and running. Kudos to the ML team's SREs and advice from volunteers.
  • We've started to formalize our AI model governance work, putting together a plan for the coming year.
  • We are applying for Google Summer of Code for a fellow to help retrain some models currently on ORES for Lift Wing. CAlbon (WMF) (talk) 18:50, 7 April 2021 (UTC)Reply

Weekly Team Update 2021-04-13

  • The Lift Wing server cluster continues to be configured. Hopefully more good news on that soon.
  • The team is starting to formalize a process for moving forward on the AI model governance process.
  • Our Google Summer of Code candidates are beginning to submit applications for review. The GSoC fellows will work on retraining ORES models on Lift Wing. CAlbon (WMF) (talk) 13:55, 13 April 2021 (UTC)Reply

Weekly Team Update 2021-04-19

  • Lift Wing cluster configuration continues. Currently working on the storage system (Swift) and Istio/Knative.
  • Exploring replacing the NLP library NLTK with an alternative as part of the retraining of the ORES model system.
  • Conversations continue around AI model governance as the start of a process for building a strategy.
  • Team's second live ML office hours will occur this week on Twitch. CAlbon (WMF) (talk) 19:53, 19 April 2021 (UTC)Reply
Did you already announce the exact time/date for the office hour anywhere? MisterSynergy (talk) 19:56, 19 April 2021 (UTC)Reply
MisterSynergy I had to delay at the last minute due to technical difficulties. But new time is Wednesday 18:00 GMT on https://www.twitch.tv/WikimediaML CAlbon (WMF) (talk) 15:47, 23 April 2021 (UTC)Reply

Weekly Team Update 2021-04-26


Weekly Livestreamed Public Office Hours!


We're hosting a weekly livestreamed office hours about machine learning at the Wikimedia Foundation on Twitch. Wikimedia is the only global scale transparent stack in the world. Come hang out and ask me anything.


https://twitch.tv/wikimediaml

Wednesday April 28th

11:00 PST

18:00 GMT CAlbon (WMF) (talk) 16:55, 26 April 2021 (UTC)Reply

I was able to listen to the first ~30 minutes before real life had something else for me to do. You mentioned that you might be uploading this session to Youtube or something similar. Is this the case? If so, on which channel?
Apart from that, nice format! If you plan further sessions, I think Twitch is the best place for streaming due to its UI that was made for exactly this purpose. MisterSynergy (talk) 20:05, 28 April 2021 (UTC)Reply
Thanks for coming! I needed to change a setting to record the video, so unfortunately we don't have one for that office hours, but I've fixed it now. I am planning on uploading it to YouTube but haven't made the channel yet. CAlbon (WMF) (talk) 15:23, 4 May 2021 (UTC)Reply

Weekly Team Update 2021-05-04

  • We are working to get Data Label Studio running on Cloud VPS; this is part of our experiment to see if we can build a generic data labeling platform.
  • Annual planning continues. More on that soon when it is finalized.
  • New ORES deploy is being scheduled.
  • New livestreamed public office hours this Wednesday at 11am PST on Twitch: https://www.twitch.tv/WikimediaML. The office hours will be uploaded to YouTube. We'll be focused on model cards. CAlbon (WMF) (talk) 15:17, 4 May 2021 (UTC)Reply

ML Office Hours #3 Video


During this week's machine learning office hours we discussed model cards. The video is posted on YouTube.


https://www.youtube.com/watch?v=fNK7yu7LLGw CAlbon (WMF) (talk) 14:59, 6 May 2021 (UTC)Reply

Do you like this talk page?


We have been experimenting with using this page as a live blog / forum for the team for two months. Hopefully folks are finding it useful. If you are (or aren't) we'd love to hear about it. CAlbon (WMF) (talk) 15:47, 10 May 2021 (UTC)Reply

+1 I'm watching the page and skimming the weekly updates KHarlan (WMF) (talk) 15:57, 10 May 2021 (UTC)Reply
+1 Yes it's useful. There should be a possibility to get in touch with you/your team on a Wikimedia wiki anyways. Everything else is already too much of a barrier for many. MisterSynergy (talk) 18:55, 10 May 2021 (UTC)Reply
Awesome thanks for the feedback! CAlbon (WMF) (talk) 20:35, 10 May 2021 (UTC)Reply

No Office Hours This Week


Sorry all, no live-streamed machine learning office hours today. Unexpected thing I need to handle. However, we’ve had over 100 people attend each week so we will definitely be continuing next Wednesday CAlbon (WMF) (talk) 16:33, 12 May 2021 (UTC)Reply

GitHub Approvals For ORES Commits


Going forward, commits around ORES will require approval by a member of the Machine Learning team before being merged. There is more discussion in the Phabricator ticket, but the big goal is to make sure the team is aware of any changes in the ORES ecosystem.


We love and encourage contributions from the community and other teams, and hope this won't add a major barrier for folks. CAlbon (WMF) (talk) 19:15, 14 May 2021 (UTC)Reply

Archiving JADE


JADE was a MediaWiki extension for data labeling created by the Wikimedia Foundation. The goal was to create a tool for communities and Wikimedians to gather training data and allow communities to provide feedback on machine learning models hosted by the foundation. JADE was never put into production but was hosted on our beta servers. Sadly, last week we started the process of archiving JADE.


We believe more than ever in the purpose of JADE. Gathering training and evaluation data from communities is one of the best tools we have for creating models that reflect those communities. After conducting a 360 technical review of JADE, the team came to the decision that the extension as designed was not going to be able to accomplish what it set out to do. We need to go back to the drawing board, bringing in a broader set of folks to redesign our approach. We are fully committed to building a new version of JADE in the near future.


This was a hard decision for us as a team, because we know how much time volunteers have spent on the project. We hope to reuse as much of that work as possible in the next version. CAlbon (WMF) (talk) 16:04, 17 May 2021 (UTC)Reply

Is there documentation of the 360 technical review? EpochFail (talk) 22:41, 17 May 2021 (UTC)Reply
@EpochFail Unfortunately not. It was a series of meetings. CAlbon (WMF) (talk) 15:13, 18 May 2021 (UTC)Reply
Bummer. EpochFail (talk) 15:42, 18 May 2021 (UTC)Reply
Yeah, I'd do it differently now. I was still a WMF newb. CAlbon (WMF) (talk) 14:47, 19 May 2021 (UTC)Reply

Weekly Team Update 2021-05-17

  • Continuing to work on the configuration of Kubernetes and Kubeflow for the Lift Wing machine learning model serving cluster. Progress is steady, but there is a significant learning curve to both technologies.
  • Another ORES model for Turkish Wikipedia has been deployed.
  • A new livestreamed, public weekly machine learning office hours this Wednesday at 11am PST on Twitch at https://twitch.tv/wikimediaML
  • More and more we believe we will be able to migrate ORES models onto Lift Wing in their entirety rather than having to retrain them. This is great news, but there are still many unknown unknowns. CAlbon (WMF) (talk) 15:10, 18 May 2021 (UTC)Reply

Weekly Livestreamed Public Office Hours!


We're hosting a weekly livestreamed office hours about machine learning at the Wikimedia Foundation on Twitch. Wikimedia is the only global scale transparent stack in the world.

This week we are going to talk about data labeling!

https://twitch.tv/wikimediaml

Wednesday May 19th

11:00 PST

18:00 GMT CAlbon (WMF) (talk) 15:23, 18 May 2021 (UTC)Reply

Moving IRC To Libera Chat


Wikimedia is migrating off Freenode and to Libera Chat. You can view the discussion around the decision here


Instructions for migrating to Libera Chat CAlbon (WMF) (talk) 17:31, 21 May 2021 (UTC)Reply

Weekly Team Update 2021-05-25

  • Lift Wing is steadily moving towards MVP!
  • Over 100 people have attended our weekly livestreamed office hours for the third time in a row.
  • The Machine Learning team is moving IRC platforms to Libera.Chat CAlbon (WMF) (talk) 14:39, 25 May 2021 (UTC)Reply

Livestreamed Office Hours Switching To Bi-Weekly


In the last month we've started a weekly office hours livestreamed on Twitch and recorded on Youtube. This format has been great. I have learned a lot from the community and hopefully the community has more transparency into the team's work. Every single office hours has had over 100 attendees.


That said, starting this week we'll be moving to a bi-weekly format. One of the lessons learned by me over this month is that I want to keep discussions fresh and interesting. I am a little worried a weekly schedule will mean we are putting on livestreams just to keep the schedule, rather than having an actually interesting issue we want to puzzle out publicly.


I'd love to hear people's thoughts. CAlbon (WMF) (talk) 16:24, 26 May 2021 (UTC)Reply

Weekly Team Update 2021-06-08

  • A Google Summer Of Code fellow has started! We are excited to welcome them. They are taking the summer to explore the models on the ORES infrastructure and taking a look at them with fresh eyes.
  • Lift Wing continues to move forward. I don't have major updates, but if you want to learn about the details come chat with us on Libera.Chat #wikimedia-ml
  • More work is being done looking at how to migrate models from the ORES infrastructure to the new Lift Wing infrastructure. CAlbon (WMF) (talk) 15:11, 8 June 2021 (UTC)Reply

Weekly Team Update 2021-06-21

  • The team's Site Reliability Engineers are working on getting the Istio layer of Lift Wing set up. This is the penultimate layer in the technical stack.
  • Work continues on the migration of the existing models from ORES to Lift Wing. CAlbon (WMF) (talk) 19:14, 21 June 2021 (UTC)Reply

Livestreamed Public Office Hours!


We're hosting our livestreamed office hours about machine learning at the Wikimedia Foundation on Twitch.

This week we are going to be reading some journal articles!

https://twitch.tv/wikimediaml

Wednesday June 23rd

11:00am PST CAlbon (WMF) (talk) 15:19, 23 June 2021 (UTC)Reply

Weekly Team Update 2021-15-21


My apologies for posting this update later in the week (last week was a holiday).


* We are getting so so close to an MVP for Lift Wing. The team's SREs, together with the Service Ops team, are working through the final layers of the stack.

* Work continues on working out a migration plan for moving Revscoring models over to Lift Wing.

* Work continues on creating model cards for each model hosted by the team. CAlbon (WMF) (talk) 18:33, 15 July 2021 (UTC)Reply

Weekly Team Update 2021-07-20


Weekly Team Update 2021-07-26

  • We are getting pretty close to being able to migrate ORES models into Lift Wing, which is going to be so important moving forward because it will mean all the existing models will work seamlessly on the new infrastructure.
  • We are getting very very close to a "hello world" for Lift Wing after a long effort getting it up and running.
  • We have a new topic model and a few other brand new models to deploy to help out some other teams.
  • Our Google Summer of Code fellow is working on figuring out whether it would be possible to use language agnostic models where good language specific models are unavailable.
  • We are starting to prototype model cards for every model we host detailing everything from proper and improper uses, training datasets, model quality metrics etc. as a first step towards enabling communities to govern the models that affect them. CAlbon (WMF) (talk) 18:34, 26 July 2021 (UTC)Reply

No Office Hours Livestream This Week


Unfortunately an issue means I cannot host a livestream this week. My apologies to everyone, but we will be back next time, hopefully with a full model card example to demo. CAlbon (WMF) (talk) 17:55, 3 August 2021 (UTC)Reply

Weekly Team Update 2021-08-03

  • Lift Wing progress continues. Still not launched yet, but we are hopeful.
  • We continue to prototype ideas around model cards. We will have something to show folks soon and get everyone's feedback. CAlbon (WMF) (talk) 17:57, 3 August 2021 (UTC)Reply

Weekly Team Update 2021-08-26

  • We have Dockerized most of the ORES models for migration to Lift Wing.
  • We are working on connecting Lift Wing to the API Gateway, specifically deciding if we should do our own rate limiting on our system and how we should do routing (i.e. if you swap out versions of a model, which side of the infrastructure, Lift Wing or the API Gateway, is changed).
  • We are pushing forward on the model cards. The next step is to make a single model card programmatically and put it in a repo on GitLab. https://gitlab.wikimedia.org/htriedman/algo-accountability
  • Need to talk to procurement about GPUs, most likely we will need to install the new servers, then at a later date pull them out and put the GPUs in. CAlbon (WMF) (talk) 18:13, 26 August 2021 (UTC)Reply

Weekly Team Update 2021-09-08


- We continue to work on the Lift Wing infrastructure. It is close, but still more work to do.

- We continue to work on how to connect Lift Wing to the API Gateway so that folks can access the models when Lift Wing is live.

- We continue to push forward on model cards. CAlbon (WMF) (talk) 15:35, 8 September 2021 (UTC)Reply

No Office Hours Livestream This Week


Sorry everyone, the livestream is canceled this week. But we will be back next time! CAlbon (WMF) (talk) 14:50, 15 September 2021 (UTC)Reply

Weekly Team Update 2021-09-15


- Work continues on Lift Wing, models, and the API gateway.

- Work continues on model cards. Still a long way to go but we are confident of its value.

- We will be posting a new MLE job soon! CAlbon (WMF) (talk) 14:53, 15 September 2021 (UTC)Reply

Weekly Team Update 2021-09-30


- Lift Wing work continues, although more slowly this week due to PTO.

- The servers for Train Wing have been ordered and are currently being shipped to the two data centers.

- We are working on how to connect Lift Wing to the API Gateway

- We are working on how to programmatically generate model cards.

- We deployed "add a link" models for 9 more languages.

- Livestreamed office hours will restart next week instead of this week. My apologies for the delay. CAlbon (WMF) (talk) 16:25, 30 September 2021 (UTC)Reply

Weekly Team Update 2021-10-06


- Data Science And Engineering Hackathon this week, so no additional work other than some fun side projects. 135.180.39.39 (talk) 16:34, 6 October 2021 (UTC)Reply

Weekly Team Update 2021-10-13


This week Lift Wing started serving model predictions, a big moment for us as a team. This proves out a lot of the infrastructure decisions we've made. There is still more work to do, but it is good to see the service alive and kicking.

$ time curl "https://inference.svc.eqiad.wmnet:30443/v1/models/enwiki-goodfaith:predict" -X POST -d @input.json -i -H "Host: enwiki-goodfaith.revscoring-editquality.wikimedia.org" --http1.1
HTTP/1.1 200 OK
content-length: 112
content-type: application/json; charset=UTF-8
date: Wed, 13 Oct 2021 15:00:58 GMT
server: istio-envoy
x-envoy-upstream-service-time: 302

{"predictions": {"prediction": true, "probability": {"false":  0.06715093098078351, "true": 0.9328490690192165}}}

real	0m0.332s
user	0m0.017s
sys	0m0.008s

CAlbon (WMF) (talk) 16:48, 13 October 2021 (UTC)Reply
Congratulations! 🙌 Michael Große (WMDE) (talk) 15:42, 14 October 2021 (UTC)Reply
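
For anyone who wants to work with this response from a script, the following is a rough Python sketch of the same call. The URL, Host header, and response body are copied from the output above; the helper names and the request payload are hypothetical, since the contents of input.json are not shown in the post. The request is only built, not sent, because the endpoint appears to be an internal address.

```python
import json
import urllib.request

URL = "https://inference.svc.eqiad.wmnet:30443/v1/models/enwiki-goodfaith:predict"
HOST = "enwiki-goodfaith.revscoring-editquality.wikimedia.org"


def build_predict_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request like the curl command above."""
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Host": HOST},
        method="POST",
    )


def summarize_prediction(body: str) -> tuple:
    """Return (predicted label, probability of that label) from a response body."""
    result = json.loads(body)["predictions"]
    label = result["prediction"]
    # Probability keys are the strings "true"/"false", not JSON booleans.
    prob = result["probability"][str(label).lower()]
    return label, prob


# Response body copied from the post above.
sample = (
    '{"predictions": {"prediction": true, "probability": '
    '{"false": 0.06715093098078351, "true": 0.9328490690192165}}}'
)
label, prob = summarize_prediction(sample)

# Placeholder payload: the real input.json is not shown in the post.
req = build_predict_request({"placeholder": "see input.json"})
```

Here `label` and `prob` hold the predicted class and the model's probability for that class, which is usually what a caller wants from the nested response.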

Office Hours on October 14th


This week we will have our recurring livestreamed office hours on Twitch: https://www.twitch.tv/WikimediaML

This week we will discuss what the team is working on! Come hang out for an hour! CAlbon (WMF) (talk) 16:53, 13 October 2021 (UTC)Reply

Weekly Team Update 2021-10-20

  • This week we continue test deployments on Lift Wing. As part of this, we are turning off two AWS KServe instances we were running so that our engineers could work on models in parallel with the infrastructure being built. In the coming weeks we are going to start doing "test flights" with Lift Wing, such as connecting it to the event streaming platform, hooking up analytics, etc. The goal is to reach feature parity with the ORES infrastructure so the community doesn't lose any features during the migration.
  • Work on model cards and data cards continues. We are trying to figure out the scope we want for both (i.e. what set of features is enough to make them useful without being so large that we never launch anything). CAlbon (WMF) (talk) 19:31, 20 October 2021 (UTC)Reply

Fiscal Year Quarterly Update 1


As part of our key result 3, the two big goals for the fiscal year are the launch of Lift Wing’s MVP, our new machine learning model hosting and management platform, and a new model governance strategy. There are three motivations for this new platform:

First, as the number of models hosted by the foundation continues to increase, the difficulty in managing each model with its individual eccentricities increases.

Second, the serverless model hosting on Lift Wing means that instead of hosting ~300 models maximum at WMF, we can theoretically host tens of thousands of models. This means that we can host and deploy models regardless of how popular they are. If a model is used 1000 times a second or once a year, it doesn’t matter to the infrastructure.

Third, the new infrastructure brings down the time it takes to deploy a new model from weeks to ~30 minutes. This means we can iterate on models fast, submitting improvements and fixes.

Right now we are making steady progress towards the Lift Wing MVP.

The other major goal of the fiscal year is the development of an ethical machine learning model management strategy, as part of our multi-year plan of making the Wikimedia Foundation a public, best-practice example of applied ethical machine learning. As a first step towards this goal, we will create a strategy in collaboration with the community on how models hosted by the foundation are governed. For example: How does a community learn about the models that are impacting them? How does a community consent to them? How does that community provide feedback or voice concerns? How does a community revoke that consent?

We are just at the start of this effort. CAlbon (WMF) (talk) 19:37, 20 October 2021 (UTC)Reply

We Are Hiring!


We are hiring a Machine Learning Engineer! 2+ years of experience, no education requirement, remote. CAlbon (WMF) (talk) 19:40, 20 October 2021 (UTC)Reply

Hi Chris, thanks for sharing! I cannot find anything on for how many hours the position is? Ciell (talk) 21:07, 20 October 2021 (UTC)Reply
oh! It would be a full time position, so 40 hours a week. CAlbon (WMF) (talk) 16:42, 26 October 2021 (UTC)Reply
Also sorry I didn't see this for 6 days. Not sure why. CAlbon (WMF) (talk) 16:42, 26 October 2021 (UTC)Reply
Thanks! Ciell (talk) 17:36, 26 October 2021 (UTC)Reply

Wikimedia ML Livestreamed Office Hours


Today I'll be hosting another Wikimedia machine learning office hours on Twitch. We'll be discussing model cards and my new roadmap. CAlbon (WMF) (talk) 18:15, 28 October 2021 (UTC)Reply

Thanks to everyone who came! For those that couldn't make it or want to watch me at 3x (I don't blame you) I'll also post the video to youtube in a few days and provide a link here. CAlbon (WMF) (talk) 19:38, 28 October 2021 (UTC)Reply

Weekly Team Update 2021-10-28


Lift Wing is currently up and hosts a handful of models for testing. We are (for lack of a better phrase) doing a series of "test flights" with the system to help us see where the areas of concern are. Furthermore, we have what we think is a good list of things that need to get done before we can claim to have reached MVP:

  • Implement minimum observability for ml-serve (Prometheus metrics, Grafana dashboards, Logstash dashboard, etc.)
  • Add firewall rules to ml-serve.
  • Add an istio proxy as sidecar container to our pods, mostly to implement a sane circuit breaking policy for the MW api.
  • Bootstrap the ml-serve-codfw cluster and complete the work for inference.discovery.wmnet
  • Add a workflow to upload models to Swift (likely from stat100x boxes). Our current way is to use ml-serve1001 but it is not a long term solution :D
  • Add more models to ml-serve and test how traffic is handled at various levels (basically a complete load test).
  • Upgrade our docker images to kserve 0.7
  • Test features of the API-Gateway with our current ml-serve-eqiad cluster. For example, trying to restrict access to a model via authentication.
  • Understand how feast can be integrated to ml-serve for the online feature store part.
  • Bonus point - deprecate the testing minikube on AWS

We are still a bit away, but it feels great to have the launch of the MVP in sight. CAlbon (WMF) (talk) 19:43, 28 October 2021 (UTC)Reply
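
For readers unfamiliar with the "circuit breaking" item in the list above, here is a minimal sketch of the general pattern in plain Python. It is only an illustration of the idea: it is not how the Istio sidecar implements it, and the class name, thresholds, and injected clock are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (calls allowed)

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Too many consecutive failures: open the circuit.
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count.
        self.failures = 0
        return result
```

The point of the pattern is that when a dependency such as an upstream API is struggling, callers fail fast instead of piling more requests onto it, and a single trial call after the cool-down decides whether normal traffic resumes.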

Sorry for the delay in updates. It is my fault.


I've reorganized my to-do list so this is surfaced. CAlbon (WMF) (talk) 14:55, 16 November 2021 (UTC)Reply

Weekly Team Update 2021-11-24


It is Thanksgiving for the two American members of the team! But work still continues!

  • No livestreamed office hours this week because I am out tomorrow!
  • User interviews about model cards are continuing. We are hoping to get ~30 interviews with volunteers, researchers, and experts, and to start putting together an early draft product plan for model cards which we will share with everyone. The goal is not to build then show, but rather to show and build at the same time, and having some kind of public document describing our current thinking which folks can jump into and change or discuss is a big step.
  • I know I say this every week but Lift Wing is close. Models are being hosted, we can serve predictions. There are two big areas we are working on. First, the API Gateway (api.wikimedia.org) needs one more feature before we are ready to connect, specifically service level API limits (instead of the current global API limits). This is because we believe we'll need to fine tune the API limits based on how much an individual query costs. This means a global rate limit applied to all Wikimedia APIs might not work well in our use case. Second, we have started some initial testing on the models hosted on Lift Wing, for example how fast they return a query and other things. Initial results say the performance is good, not great, but good. This is an awesome starting point since over time we will make the models more efficient and increase the prediction speed.
  • We have started to talk about what an online feature store would look like, specifically Feast. The next step is for us to install it somewhere and start to explore what it needs to work well.
  • Finally, those who hang out in the IRC chatroom will see a pleasant change soon. For the past two months Luca on the team has been writing more in IRC, talking about what he is working on in that moment and his ups and downs. Going forward we are going to do that with the whole team and see if it leads to some good discussions. So overall you should see more activity in IRC. CAlbon (WMF) (talk) 19:30, 24 November 2021 (UTC)Reply
Thank you for writing these weekly updates, they're always interesting and informative. Legoktm (talk) 19:40, 24 November 2021 (UTC)Reply
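
The point above about service level limits tuned to query cost can be illustrated with a small token bucket sketch. This is only one possible way to express the idea, not the API Gateway's actual mechanism; the class, the service names, and all numbers are hypothetical.

```python
import time


class CostWeightedBucket:
    """Token bucket where each request spends tokens proportional to its cost."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False to signal throttling."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        if cost > self.tokens:
            return False
        self.tokens -= cost
        return True


# One bucket per service instead of a single global bucket (hypothetical names).
limits = {
    "cheap-service": CostWeightedBucket(capacity=100, refill_per_second=10),
    "ml-predictions": CostWeightedBucket(capacity=20, refill_per_second=2),
}
```

With separate buckets, an expensive model prediction can be charged more tokens than a cheap lookup without changing the limits that apply to every other API.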

Out Sick!


Sorry all, no update or office hours this week. I'm out sick with a cold. CAlbon (WMF) (talk) 20:44, 10 December 2021 (UTC)Reply

Weekly Team Update 2021-12-15


This is the last week before most of the team goes out on holiday for a few weeks.

  • The last few weeks things have been going really well. We still haven't officially launched the MVP yet, but we are making a lot of improvements to how we are using Kubeflow. For example, originally we were running all preprocessing steps inside the model's Docker container, however the better/more canonical method is to have separate preprocessing transformers.
  • We hired another machine learning engineer! They are going to start mid-February! Their role will be to help get models into production from both the community and within the Foundation as Lift Wing grows.
  • We need a name! Lift Wing is the name of our model deployment cluster that is about to launch. Train Wing is the name of our planned model training cluster. In addition to that, we have plans for a feature store and a few other systems. We have been calling the entire thing "the new infrastructure" which is pretty bland. There has been a request for a name that covers everything. Any ideas?

See you all in 2022! CAlbon (WMF) (talk) 15:21, 15 December 2021 (UTC)Reply
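
The change described above, moving preprocessing out of the model's container into a separate transformer, can be sketched in plain Python as follows. This does not use the Kubeflow or KServe APIs; the class names, the feature extraction, and the stand-in prediction rule are all assumptions chosen to show the separation of responsibilities.

```python
class Transformer:
    """Preprocessing step that runs separately from the model container."""

    def preprocess(self, request: dict) -> dict:
        # Hypothetical feature extraction: turn raw text into numeric features.
        text = request["text"]
        return {"features": [len(text), text.count(" ") + 1]}


class Predictor:
    """Model step that only ever sees already-prepared features."""

    def predict(self, inputs: dict) -> dict:
        n_chars, n_words = inputs["features"]
        # Stand-in rule instead of a real trained model.
        return {"prediction": n_chars / n_words > 4}


def handle(request: dict, transformer: Transformer, predictor: Predictor) -> dict:
    """Chain the two steps the way a serving layer would route a request."""
    return predictor.predict(transformer.preprocess(request))


result = handle({"text": "separate preprocessing transformers"},
                Transformer(), Predictor())
```

Keeping the two steps apart means the preprocessing code can be changed, scaled, or reused across models without rebuilding the model image itself.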