Sunsetting Working Group/meeting notes/20170830
Notes from the Tech-Mgt F2F in Montreal: https://docs.google.com/document/d/1kdbNOXc4j7L0dPZS3kwoHMfpcCYkx1nWh2Ulyx9W7wA/edit (private to tech-mgt)
Who: Adam, Brion, Bryan, Erika, Faidon, Gabriel, Greg, Katie, Olga, Victoria, Kevin
- Purpose and goals for the group
- tl;dr: Something written down about how to A) determine if a product/offering should be sunset (or not) and B) how to go about that sunsetting
- Introductions / what projects/products you are thinking about in this context
- sub-groups working on each issue/question? Maybe not but maybe?
- Next steps
During the TechMgt f2f, we raised several topics that spun out working groups, and this was one of those: "Sunsetting things" (software things, products, services).
Questions centered around:
- How? (e.g. what process)
This group should come back with answers to those questions to guide people and teams
Introductions around the room
- AB Readers software manager; Readers has Maps frontend
- BD Used to work on core; work with community developers, Logstash
- BV On MW Platform team; interested in technical needs of sunsetting (e.g. namespace)
- EB Eng mgr of search platform; interested in variety, from symptoms of needing sunsetting to getting it done; worked with Maps team
- FL Tech ops; partly responsible for raising this issue; OCG? led to discussions w/Jon Katz about difficulty of sunsetting; Tune-up included recommendations about sunsetting https://office.wikimedia.org/wiki/Product_and_Tech_consultation#4._Sunset
- GG Release mgr; many new things go through me; RelEng and Ops work directly with things in production, so we care about having people responsible for them
- KH Fundraising tech mgr; we don't have problems turning things off, but am interested in seeing how this tension works out: Try lots of things, but if they don't work, you can't turn them off. This affects all work.
- KS ProgMgr in Tech. Here in a supportive/facilitative role
- OV ProdMgr in Reading Web; our team is responsible for a variety of extensions and features so would like this process to be less chaotic
- VC CTO; getting much more systematic about accepting things to support and moving them to retirement is fundamental; our teams are so thinly stretched, so we need to focus on what matters. We should be thoughtful and systematic on both ends. I just wanted to be here to see the team start; it's one of the most important things to work on. Need to reduce side work to focus on important strategic work.
- GW Mgr of services platform; actively involved with OCG; interested in ownership, and defining it; we can easily make decisions on the tech side, but what are the other implications
Three things came up during f2f: OCG, Maps, Logstash, Graphoid. Logstash doesn't fit the mold, as it's not user-facing. But everything is fair game for the sunsetting lens. We should think through those.
This is a large group (9 + facilitator). Should we split up and work on the 2 questions separately? This is a new thought, so not sure about that.
Question: Eventually teams will form to look at individual instances. Will this group create a general framework for what we deploy?
- That (early lifecycle) question wasn't put in as part of this agenda
- Perhaps cover the early stage in another forum/group
Let's focus on one of the questions in terms of one of the products, and think through it in a straightforward sense. No product is well-known by everyone.
- Proposal: OCG, since there is a harder timeline. RelEng is trying to get rid of trebuchet, and ops is trying to get rid of salt (which trebuchet uses); goals to get rid of them this quarter, but OCG uses them
- OCG: Offline Content Generation (aka: making PDFs of articles/groups of articles). Replaced a system that was abandoned. Created by CScott and maybe Matt Walker, as part of an ad-hoc "sprint" team. Fell victim to the Eng reorg of 2015 (if it ever actually had an owner). CScott ended up in Editing, and this clearly was not an Editing feature; Readers didn't want to adopt this finished product; nobody wanted to turn it off.
- There were other reasons why OCG is on the sunsetting block:
- OCG was very poorly architected and not reliable; requires manual interventions
- Doesn't really have an owner team, so firefighting and security issues; nobody to test
- We all agreed that OCG is not the way forward
- Not very important since users can generate PDF in their browser
- WMDE was involved, and there was a community wishlist item to improve rendering, but they didn't own it, so it ended up in Readers
- Agreement (by Readers) to shut it down in Aug which was revised to this quarter
- Olga is the PM for that area
- WMDE's Community wishlist survey included something that fits within OCG
- Original plan was not as a replacement, but in addition; then we found out there were problems with OCG.
- Readers picked it up and decided to shift everything to electron. But there was no specific team to pick it up.
- Few months later, Readers Web picked it up, and electron was not able to meet the requirements, so went into research stage
- ToC, page numbers
- We have 2 portions for books (concatenation OR post-processing (redo page numbers)), and we're ready for single articles
- We need concatenation for the basic story of creating a book; we'll deploy this and retire OCG as the renderer, as long as we can test and electron can handle
- After OCG is retired, we can look at post-processing to bring it to parity with OCG
- Compare OCG with electron?
- We looked into browser-based rendering; electron is a way to run chrome server-side
- We in services offered to run it in order to allow shutting down OCG; will require less maintenance; we worked with WMDE
- Now it is a production service; they rolled out a rendering service using it
- There are some issues with it, but at least it works; has improved a lot; Chrome now has a native headless render mode: https://phabricator.wikimedia.org/T172815
- Has been collab between Reading Infrastructure and Reading Web.
- Current assumption is that Reading Infrastructure will take on electron, but discussions still happening
So in this case, we are replacing OCG with something that performs roughly the same features from a user standpoint?
- Yes, but at times we were discussing just discontinuing it without a replacement.
- Are we doing this [building a replacement] because it's easier to replace something than to announce to users that we are removing a feature?
- There is a toolforge tool that generates ~20k epubs/wk for wikisource visitors
- OCG/Electron usage graphs: https://grafana.wikimedia.org/dashboard/db/mediawiki-electronpdfservice?orgId=1
- Looked at percentage of single-page vs. multi-page; book usage was low
- With offline functionality it became important again
- Books are getting low percentage right now; we'll be working on a reading lists feature which we expect to make book creation more popular; avoid having to rebuild to support reading lists
- Keeping single-page was most important as it was most used; that support was enough to kill off OCG, then we can support books later
- During the process, Readers/ProdMgt evaluated, changed priorities, made decisions
This was a huge saga, with lots of teams involved. 10 minutes left. How to use them?
There are other items (besides OCG) that are in varying stages of needing sunsetting. Graphs and maps are the big ones. Decisions haven't been made. Is there anything on the chopping block that is not waiting for big decisions from Audiences?
- Logstash...but backend search might resist that
Is there a simpler case before getting into more complicated instances?
- There are cases where we are owners; the challenge is when someone else will get the blame because incentives are not right
- In some cases, has the decision rolled downhill, where it should be made by product but ends up in tech? Probably
Another example: Zotero (+Citoid); IRC stream, heavily used but unowned and fragile; internal technologies beyond products like parsers (but let's not go there)
- Zotero is more like tech debt; when to invest the time to replace it. When to write code for translations, waiting for upstream. Not unmaintained but question of when to invest. Slightly easier than unowned where nobody can make decisions
- It's interesting because there are different incentives. It's not hurting Editing, but it's hurting ops. Maybe worth discussing because of that
- Topics like graphoid are more critical because they affect stored data / content.
- We need to think about maintenance at the point of deployment.
At some point we should bring non-staff into these discussions. When, how, and who?
Maybe we should discuss "ownership", which would clarify who could make these decisions. That goes up a level in meta-ness, so may be more complicated; table and talk more later?
Advice from VC: Don't shy away from high-level general topics, but this is an awesome start
Real-life examples are good for learning what worked/didn't work. This one isn't complete yet, but we can do a post-mortem up until now...next time
Next steps:
- Schedule next meeting, to continue discussing OCG (probably next week; maybe recurring)
- Post-mortem of it so far/what we can pull out as learning (what worked well/what could have been better)
- Open Questions
- Invest time to kick the can down the line or invest more time to sunset it correctly now
- How do we decide ^
- Consider whether to get into ownership issues / defining what ownership entails
- KS Schedule meeting
- GG Create wiki page for this group
- GG Post these notes on wiki