Talk:Wikimedia Developer Summit/2017/Topic ideas

WikiDev17 topic: Multilingualism

Summary by RobLa-WMF

Positions so far:

RobLa-WMF (talkcontribs)
This is the text of the "Multilingualism" section of WikiDev17/Topic ideas as of this writing

How can we make our websites better support languages other than English (and character sets other than Latin)?

Doubling down on Machine Translation

  • Annotation service to record fine-grained translation correspondences between wikis over time (not just at the time of first translation)
  • Suggestion service to suggest new edits to wiki A when translated text wiki B is modified (or vice-versa)
  • Refactoring existing language converter pairs as (sometimes trivial) translation engines, eg cyrillic-to-latin
  • Building a translation engine in house, training it with translated wiki pages, improving it over time, etc
  • Tightly integrating the translation UX for everyone. More: one community wearing babel fishes / Less: scattered villagers after the Tower of Babel fell.
  • Improving harassment/vandalism/civility/inclusiveness/diversity mechanisms to handle these larger cross-cultural communities.
  • i18n of global pages, global templates, etc. May need mechanisms to allow translation of comments, for example.

Fora: translators-l, translatewiki.net

Strainu (talkcontribs)

I would like to remind you of my suggestion to think broadly here: don't limit the pool of potential participants only to people actively working on i18n in the software, but search for people reporting bugs and writing scripts as well.

Qgil-WMF (talkcontribs)

Yes, I totally agree. If you think someone should be at the Summit discussing these topics, please ask them to request an invitation / travel sponsorship. We are also trying to improve remote participation, which is a factor for any topics proposed but evidently useful for this one.

Qgil-WMF (talkcontribs)

@Runab_WMF there is clear interest for good coverage of i18n topics in the Summit. In addition to all the discussions around this topic, I can see how the translation / multilingualism aspect may plug well into two Wikimedia processes that surround the Summit: the Community Wishlist Survey (which we want to see with an increased participation of editors of non-English projects -- see Phab:T144074) and the Wikimedia Strategy movement discussion, which will start in January as well and will have an obvious focus on our diversity of readers, editors, and communities.

Now, in order to fulfill these expectations I think we need several things:

  • confirmation that the Language team will be well represented at the Summit
  • participation of volunteers and other stakeholders involved in translation and other aspects of i18n
  • ideally, a main topic pointing to some direction in order to focus the attention of participants and proposals (for instance, is machine translation the thing to focus on? Better than "i18n".)
Runab WMF (talkcontribs)

@Qgil-WMF - to respond to the 3 points you make here:

  • The Language team has plans to attend the DevSummit (like all the past years), subject to budgetary (and associated) logistics being completed
  • A major disadvantage is that the conference location may not have many local MediaWiki i18n participants. However, a combination of the Language team, other WMF developers and other participants involved with i18n and l10n projects with overlapping interest may be able to move forward on some focused topics.
  • I would vote against a mammoth topic like Machine translation and particularly like 'building an in-house translation engine'. It would be a stretch on time and resources. It might actually be better to reach out to people we already collaborate with (like the Apertium project) and explore options that can be of shared interest, but I don't think the DevSummit would be a good place for that.

The Internationalisation wishlist is somewhat dated and possibly needs another review. Language support has so many different aspects and for so many user groups, even within MediaWiki that I am somewhat unsure whether we can zero in on a main topic. Another option could be to have perhaps 2 running themes - a generic theme, and another that addresses a specific area of i18n (e.g. RTL), and then find individual main topics in these 2 groups. Any thoughts on this? Thanks.

Qgil-WMF (talkcontribs)

@Runab WMF deciding main topics is always somewhat unfair, somewhat inaccurate. Still, they can be useful. The idea is that the multilingualism main topic you define will help us reach out to the related contributors and ensure that they can participate in the Summit.

See the list of main topics proposed so far. I hope it helps defining the main topic for this area.

In addition to the main topics, anyone will be able to propose more specific topics, at the very least as an Unconference session. By defining a main topic you are not ruling out possibilities to have other conversations at the Summit.

Runab WMF (talkcontribs)

I am inclined to base the language topic alongside the main topic of 'editorial collaboration', primarily because of the options to connect into this. To clarify, I am not trying to shoehorn an open topic into what the Language team is currently working on. However, given that for the past 2 years the primary focus has been on an editing tool, there have been several conversations (direct or incidental) during this entire period, on things that could have been in a different state for the larger good. Templates is one such example. I am pinging @Amire80 about this. He had some thoughts on this topic from the last DevSummit.

Qgil-WMF (talkcontribs)

@Runab WMF, if I understand your reply correctly, there would be no main topic about multilingualism. You would submit multilingualism-related proposals under main topics where corresponds, and of course you are free to propose specific topics in the Unconference context. Correct?

Qgil-WMF (talkcontribs)

Yes, correct. No main topic for multilingualism this time. Multilingualism related proposals are welcome either in relation to main topics or in the Unconference.

Runab WMF (talkcontribs)

Is there a way we can mention this clearly on the main page for the topics so that people are aware that they can submit language related topics both for the summit and the unconference? Otherwise people may be looking for a main topic for language support and won't know for sure where to put them. Thanks.

Qgil-WMF (talkcontribs)

I think we should not go beyond main topics in the main page, otherwise we might have similar feedback about other areas in a similar situation. The best the Language team and anybody interested in multilingualism can do is to promote the Summit and encourage the submission of proposals around this topic directly. You work on a regular basis with developers, translators and multilingual users. Reach out to them directly and tell them about the Summit, otherwise they will not even be aware of this event and its main page.

Cscott (talkcontribs)

I will note that historically the summit has not had good attendance from those with experience or expertise in i18n topics. I think we'd need to make a big push w/ explicit invitees, scholarships, etc in order to overcome this.

Qgil-WMF (talkcontribs)

We want to change that history. In addition to putting more emphasis in flying to San Francisco more people working in this area, I wonder if we could reach out to people already in SF Bay Area working in i18n (but not Wikimedia) to join us. So many problems are so common, and the Bay Area has a huge pool of projects and experts (and some of them know Wikimedia very well).

Runab WMF (talkcontribs)

@Qgil-WMF This is actually a great idea but my worry is that it depends on whether this particular group of people will be available on those dates and later. Secondly, some from this group may not be active participants in Open Source projects or may not be in a position to share information about their work. The Language team used to host something called a Language Summit earlier, primarily for Indic languages, with a similar group. While it was supremely useful to get things rolling or even completed (given discussions started early), following up from where we left it off at the end of the summit was a challenge as people moved on to their own projects/work. Thanks.

Qgil-WMF (talkcontribs)

Also, @Nemo bis has pointed to Internationalisation wishlist 2014, but that is a long flat list, and I don't know how up to date it is now. As someone interested in multilingual wikis, I have tried to go through that list to find a main topic/theme, but I could not. Ping @Nikerabbit (I have already reached out to @Runab WMF).

Amire80 (talkcontribs)

My opinion is my own, but it's close to what @Runab WMF said: Internationalization is great, and machine translation in particular is great. Moar Free-Software machine translation is even greater, and Wikimedia should become an important player in this field some time in the foreseeable future.

And yet, it's a tad early to focus on this in 2017. Given the current resources, Wikimedia simply cannot go beyond the following two things:

  1. Supporting existing Free Software machine translation projects. This is already being done—see Meta:Grants:IEG/Pan-Scandinavian Machine-assisted Content Translation for a successful example. I hope to see more of this, but it's not really relevant for the dev summit.
  2. Talking about far-fetched dreams of how it should be some day. This was already done in the last dev summit. It was pretty, but not much has changed since then in terms of machine translation. It's quite possible (really!) that stuff will change next year and it will become a much more relevant topic for 2018.

Till then, however, there are a bunch of much more urgent things to clear out. Here are some examples:

  1. Support for cross-wiki templates. @Legoktm made an excellent presentation about his proposal for the technical implementation of global templates in dev summit 2016. There was nothing in his presentation about the user side, however: how will the templates be actually internationalized and localized. This is an immediately important topic given the current developments in ContentTranslation, in which the template support is being completely overhauled.
  2. Cleaning up the hairy mess around "standard" language codes and site name lookup. @Duesentrieb's SiteIdMapper project is one tip of this iceberg, and there are many more. Getting this rolling will unblock or outright fix a lot of other bugs in Wikidata, ContentTranslation, searching, etc.
  3. What next for interlanguage links? Their design has significantly changed on desktop and on mobile in 2016, and this is already making a positive impact, but more could be done to improve their design and make them even more accessible, and to improve their technical handling (e.g., a proper JS API for handling them on desktop instead of hacky jQuery selectors).

These internationalization topics immediately come to my mind for January 2017. They may be less attractive than machine translation, but they are much more immediately relevant. Resolving them will bring future machine translation projects within our reach. Two prerequisites for successful machine translation projects are engaging people who know different languages and getting a lot of parallel texts actually written. Content Translation and Wikidata are projects that make this much easier, and the three points above will make them run more smoothly.
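
To make point 2 of the list above (language codes and site name lookup) a little more concrete: several Wikimedia subdomains have historically not matched the language code of the content they host. The Python sketch below shows the kind of lookup that ends up duplicated across tools. The table is a small illustrative subset, and the helper name `site_to_language_code` is hypothetical; it is not the SiteIdMapper interface.

```python
# Illustrative subset of Wikimedia subdomains whose name has historically
# differed from the language code of their content. Not exhaustive.
SUBDOMAIN_TO_LANGUAGE = {
    "no": "nb",               # Norwegian Bokmål
    "simple": "en",           # Simple English
    "als": "gsw",             # Alemannic
    "zh-yue": "yue",          # Cantonese
    "zh-min-nan": "nan",      # Min Nan
    "zh-classical": "lzh",    # Classical Chinese
    "be-x-old": "be-tarask",  # Belarusian (Taraškievica)
    "bat-smg": "sgs",         # Samogitian
    "fiu-vro": "vro",         # Võro
    "roa-rup": "rup",         # Aromanian
}


def site_to_language_code(subdomain: str) -> str:
    """Return the content language code for a wiki subdomain.

    Hypothetical helper: subdomains missing from the table are assumed to
    already be valid language codes and are returned normalized.
    """
    key = subdomain.strip().lower()
    return SUBDOMAIN_TO_LANGUAGE.get(key, key)


if __name__ == "__main__":
    for site in ("no", "de", "zh-yue"):
        print(site, "->", site_to_language_code(site))
```

Keeping one shared table like this, rather than a private copy in each tool, is roughly what the cleanup is about.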

Cscott (talkcontribs)

Amir and I talked about this at the editing offsite. I think we identified several near-term projects that would lay the ground work for future machine translation:

  • Exporting interlanguage links in "apertium dictionary" format
  • Starting to collect part-of-speech information in wikidata for articles (thus, interlanguage links)
  • Exporting CX translation pairs in "moses training data" format.

Amir -- was there anything else I've forgotten? I think those three projects fit in well with the "far-fetched dreams" of what WMF *might* do with Machine Translation, and lay important ground work.
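
For readers unfamiliar with the third item: as far as I know, the parallel-corpus input that Moses training works from is a pair of plain-text files aligned line by line, so that line N of one file is the translation of line N of the other. A minimal Python sketch of writing such files follows. It assumes the translation pairs are already aligned segment by segment, and the function names and file stem are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Tuple


def one_line(text: str) -> str:
    """Collapse all whitespace (including newlines) so a segment stays on one line."""
    return " ".join(text.split())


def write_parallel_corpus(pairs: Iterable[Tuple[str, str]], stem: Path,
                          source_lang: str, target_lang: str) -> int:
    """Write line-aligned files <stem>.<source_lang> and <stem>.<target_lang>.

    Returns the number of pairs written. Pairs with an empty side are
    skipped, because they would break the line-by-line alignment.
    """
    src_path = Path(f"{stem}.{source_lang}")
    tgt_path = Path(f"{stem}.{target_lang}")
    written = 0
    with src_path.open("w", encoding="utf-8") as src, \
            tgt_path.open("w", encoding="utf-8") as tgt:
        for source, target in pairs:
            source, target = one_line(source), one_line(target)
            if not source or not target:
                continue
            src.write(source + "\n")
            tgt.write(target + "\n")
            written += 1
    return written
```

The writing step is the easy part; producing well-aligned pairs in the first place is where the real effort would go.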

Nikerabbit (talkcontribs)

I don't think these are the projects that will help us to get to the glorious future.

To build a good machine translation system you need good data. I think interlanguage links are quite low quality in that regard. For collecting parts of speech information, there is work going on towards Wikidata for Wiktionary that would make this task moot.

For exporting things in "moses training data format", I think that anyone familiar with Moses can more easily convert our dumps to the correct format than we. In any case our data is not well aligned, so heavy post-processing is needed. Opus is one possible platform that could do that processing and distribution.

Improving our translation memory engine, which is not necessarily more complicated than the proposed projects, so that the engine could be integrated to ContentTranslation, would provide benefits to languages where machine translation does not yet exist, in addition to improving all our existing translation processes where translation memory is already in use.
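
As a toy illustration of what a translation memory does (suggesting stored translations for segments that resemble the one being translated), here is a minimal Python sketch using fuzzy string matching from the standard library. It is not how the Translate extension's translation memory is implemented; the function name, threshold and sample data are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def suggest(memory: Dict[str, str], segment: str,
            threshold: float = 0.75, limit: int = 3) -> List[Tuple[float, str, str]]:
    """Return up to `limit` (score, stored_source, stored_translation) tuples,
    best match first, for stored sources similar to `segment`."""
    scored = []
    for source, translation in memory.items():
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, translation))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    # Hypothetical English-to-German memory.
    memory = {
        "Save the page": "Seite speichern",
        "Delete the page": "Seite löschen",
    }
    for score, source, translation in suggest(memory, "Save this page"):
        print(f"{score:.2f}  {source!r} -> {translation!r}")
```

A linear scan like this does not scale beyond a small memory, which is the kind of limitation a production engine has to address with proper indexing.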

Legoktm (talkcontribs)

I personally would like to see anonymous users being able to set an interface language (primarily for multilingual wikis), which we can probably now do with varnish+xkey.

And I think making Translate+VE or even Translate+wikitext work nicely would be awesome, and fix a lot of technical debt as well. It would also maybe be a decent summit topic because it would require support from different teams (Language, VE, Parsing, etc.). And every person who edits mw.o would love it. (But that would require Translate work getting resources, and I don't think that's been the case lately...).

And last idea for tonight would be librarizing MW's awesome i18n code. The mission of the Wikimedia movement requires all knowledge to be accessible to all people, which requires i18n. And creating a solid set of PHP and/or JS libraries that are battle tested and work well for such a large project would be pretty valuable in service of that goal. This would require quite a bit of planning, untangling, and refactoring to become a reality.

SSastry (WMF) (talkcontribs)

I am going to +1 @Amire80's observations about immediacy + @Legoktm's proposal about Translate.

Specifically, Niklas and I discussed Translate + VE sometime in 2014/2015, but at that time, as far as I remember, it was meant to be VE-only support, and we weren't sure how palatable that would be. If that is not going to be an issue, that is one way to go. However, it is possible that we can come up with good solutions for wikitext, but yes, that requires looking at existing Translate use cases (which I am somewhat unfamiliar with) and working through the details. We might be able to do some of this work at the Editing offsite and come up with proposals for further discussion at the Dev Summit.

Nikerabbit (talkcontribs)

I can be brief since many of my thoughts have already been said above.

My wishlist from 2014 is of course a little outdated, but I would consider it a good starting point when we think about language support from a wide angle.

From my perspective there are three things which I would consider high priority, but it's easy to classify them as technical debt which makes it harder to argue for them:

  1. Interface language selection for anonymous users. This has been blocked for a long long time.
  2. Alternative wiki page translation mechanism for Translate that works with VE and Parsoid and does not make the wikitext hard to edit.
  3. Improvements to translation memory. Currently it requires a special plug-in and does not scale in any direction. It handles paragraphs poorly, it doesn't always find results and it doesn't even support multi-dc configuration for increased availability.

I would be happy to propose (1) for discussion to reach consensus and find the ways to achieve it. For (2) I gather there is already consensus and some ideas how to start, but scheduling and resourcing such multi-team work is hard.

I am against taking a deep dive in building machine translation software at the foundation, when we cannot even take care of the simpler task of a translation memory. At most we could discuss in what other ways and places we could use the machine translation services currently used in CX and Translate.

Finally, librarization of i18n code is still a good thing to do, but I think we missed the train where we could attract other users of those libraries. PHP and jQuery already have multiple i18n libraries: jquery.i18n, which pretty much is a librarized version of MediaWiki code (but we never got MediaWiki ported to use jquery.i18n), has seen, in my opinion, quite limited use.

Oops, that wasn't very brief after all :)

Reply to "WikiDev17 topic: Multilingualism"
RobLa-WMF (talkcontribs)
This is the text of the "Wikitext" section of WikiDev17/Topic ideas as of this writing

How do we make manipulating the information on our websites easier and more useful? (both for humans and computers). Improving Modular Wikitext Maintenance

  • Infoboxes from wikidata, categories from wikidata, wikidata in commons, oh my!
  • Visual editing of templates, alternative template mechanisms, etc
  • Wikitext 2.0 -- how to shave off the rough edges but still provide a text-based power-user editing interface
  • Global pages, Global templates, etc
  • Improving composition of text and media content on the page
  • Moving to a Glossary model for LanguageConverter rules
  • Splitting metadata (categories, page flags, etc) from content in the DB
  • Multi-Content Revisions

Fora: wikitech-l, wikitext-l, and Wikidata readers and participants

Qgil-WMF (talkcontribs)

Isn't "Wikitext" a too wide umbrella for a Summit main topic?

Cscott (talkcontribs)

It may be a wide umbrella, but the Venn diagram combining wikitext development with typical summit participants is small! Most WMF engineers don't actually work directly with wikitext or the wikitext workflow (templates, scripts, gadgets), odd though that may sound. This suggests that actual breadth of topic will be limited in practice by the expertise of the folks we can recruit to participate.

(And besides, "wikitext" was Rob's title. My topic description was "maintenance of wikitext", more or less.)

RobLa-WMF (talkcontribs)

A couple months ago, @John Vandenberg and I had a great conversation about the future of Wikitext (in the "CommonMark" topic on the Markdown RFC). He wrote:

[A world where Markdown becomes the accepted standard for wikitext is a world] in which the MediaWiki (and wiki) technical community has failed (that includes me), as the world overtook it with a less useful format, and the generic terms 'wiki' and 'markup' have lost their meaning, and the Wikimedia content is stored in a format that is no longer a format that MediaWiki believes and invests in.

He suggests that MediaWiki wikitext is still in the running, and that we should do a few things if we believe in the format. His suggestions:

Invest heavily [into the Wikitext spec], and into alternative/reference implementations, including providing a saner version of the syntax that cater for the needs of other organisations that consider the security and processing overhead of wikitext to be problems compared to markdown. It should be the default for new MediaWiki installs, and older installs should have a nice tool that converts the crazy unspecified wikitext into 'wikitext-simplified' where possible.
Work with other wiki vendors that might want to make 'markdown' syntax an implicit choice, not clearly identified as markdown syntax, stressing that causes confusion for users.
And work with other wiki vendors to increase compatibility between their syntax and ours, so that 'wikitext' and 'wikitext-simplified' are viable long term formats.

Other wiki vendors have given us up for dead (see Google Trends comparison for MediaWiki and Markdown over time). Let's not roll over and play dead. WikiDev17 seems like a great opportunity to gather the wiki markup community and inclusively shape our long term markup strategy.

Cscott (talkcontribs)

Hm, I'm not sure I agree that we should double down on wiki markup. That's partly the sunk-cost fallacy I think. Markdown has succeeded in large part because it's generally *nicer to write*, with fewer corner cases in the syntax and fewer special cases that apply only to mediawiki ([[..]] syntax, magic RFC links, {{..}} behavior, etc). I certainly prefer to write markdown where possible, although that's in large part due to the `...` construct, which is extremely useful in the type of writing I do most often.

We have written a markdown-to-wikitext converter using Parsoid. The main thing that "the less useful format" (markdown) is missing is an extension mechanism (which other competitors such as reStructuredText include). If a standard extension mechanism could be defined for markdown, then you could build a template mechanism and special "link to a WMF project" syntax on top of that.

As it turns out, the template mechanism we have for wikitext is pretty fragile as well, in addition to being completely non-portable to non-mediawiki usage. And it's pretty fundamental! So thinking hard about a more elegant and general template mechanism -- or at least cleaning up some of the grungy corners of our existing mechanism, like token concatenation and start-of-line fudging -- would be vital if we actually wanted to push "wikitext" further into the world.

But in any case, I do think that work on "wikitext simplified" (syntax and templates) would be interesting. Lots of our power users would like something that is very close to the wikitext they are familiar with, but without some of the hidden pitfalls. Lots of our engineers would like to have a parser/template implementation which is drastically simpler than what we have now.

But the other side of this ought to be (in my opinion) decoupling the core of mediawiki from the particular choice of markup language. If you want to have a pure-VE wiki with HTML-native storage, you should be able to. If you want to use markdown (+a template mechanism) for your wiki, you should be able to do that. Similarly with "legacy wikitext" and "wikitext 2.0" and who-knows-what in the future. Step 0 was gwicke's introduction of "lossless round trip translation" as embodied with Parsoid, which decoupled the user experience from the actual markup stored in our database. Step 1 is completing that process so the entire mediawiki core is markup agnostic. *Then* we can have a thousand flowers bloom. If one of the hardy flowers is "simplified wikitext", hurrah. If users really prefer markdown syntax, good for them. We can make both work, and (to a large degree) even let "edit in markdown syntax" co-exist with "edit in wikitext syntax", with the content represented in the underlying database using some other format altogether.

Cscott (talkcontribs)

At the parsing team offsite, my thoughts above got boiled down into a "zero parsers in core" proposal (parsers should be fully pluggable and optional), and a concrete "simplified wikitext" proposal that I hope to write up properly soon. (In the meantime see https://gerrit.wikimedia.org/r/316237 ).

It also became apparent (to me at least) that "wikitext syntax" and "template semantics" are largely orthogonal. Some of what we think of as "wikitext 2.0" is actually related to the template engine, and can be worked in with proposals like {{#balance}} irrespective of wikitext syntax reform discussions.

SSastry (WMF) (talkcontribs)

Exactly! :-) I think syntactical changes should be decoupled from changes to the processing model. I think we should go all the way with adopting DOM semantics for wikitext and {{#balance}} is just the first step along the way.
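
One way to picture the "DOM semantics" idea, as I understand the proposals: a template's output is treated as a self-contained fragment whose elements all open and close within it, rather than as a string spliced into the surrounding markup. The Python sketch below is only a toy check of that property on an HTML fragment. It is not how {{#balance}} or Parsoid work, and `BalanceChecker` and `is_balanced` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Track open elements and flag any end tag that does not match the
    innermost open one. Deliberately strict: HTML's optional end tags
    (for example on <p> or <li>) are reported as unbalanced."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: List[str] = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.balanced = False


def is_balanced(fragment: str) -> bool:
    """True if every element opened in the fragment is also closed in it."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.balanced and not checker.open_tags


if __name__ == "__main__":
    print(is_balanced("<div><b>text</b></div>"))  # a self-contained fragment
    print(is_balanced("<table><tr><td>"))         # a "table start" style template
```

Templates that emit only the opening half of a structure, like the second case, are the ones that make string-based composition fragile.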

SSastry (WMF) (talkcontribs)

I think Parsing/Notes/Wikitext 2.0, Parsing/Notes/Wikitext 2.0/Strawman Spec, and Parsing/Notes/Two Systems Problem are all relevant to this topic. We are going to be talking in depth about some of these at the parsing team offsite in October and possibly the editing team offsite as well, but yes, there are a lot of group discussions to be had in this area to cohere around strategies. I think "evolving wikitext" or "technical debt in wikitext" or "wikitext maintenance" are all interesting subtopics in this area.

SSastry (WMF) (talkcontribs)

And, to clarify my comment about "group discussions ... to cohere around strategies", the discussions will probably benefit from the participation of template editors, bot writers, and tooling developers.

Qgil-WMF (talkcontribs)

Reading back the list of points in @RobLa-WMF's first post, I think one title that would capture the intention could be "Handling wiki content beyond plaintext".

This main topic would be a good umbrella for those points and the topics discussed here.

Antigng (talkcontribs)

Any attempt in the name of evolution / modification / technical debt payment that changes the wikitext syntax greatly will never be supported by the community. Storing content in forms other than the old wikitext (no matter how efficiently they are parsed) is also a bad idea, since this will make database dumping (in wikitext form) impossible, and all bots that rely on dumped data will stop working.

Reply to "WikiDev17 topic: Wikitext"
Tgr (WMF) (talkcontribs)

Figuring out how our current resource allocation might be shortsighted, and what trends we are missing out on, might be a good topic for the summit. My bet for that would be artificial intelligence (in the wide sense, including natural language processing, machine learning etc). User interfaces are getting more and more clever: they understand what you want without requiring you to become an expert in their syntax; they can guess what you will want next; they can relieve you from repetitive tasks by learning how to do them and let you focus on tasks with more added value. Your phone understands what you ask from it; the search engine you use tailors its answers to your past interests; soon your car will drive itself.

When the shift to mobile happened, Wikimedia was late to recognize its importance; our mobile interface was significantly delayed compared to other top sites, and had to be monkey-patched on top of a system that was not built with such a use case in mind. There is a danger that we repeat that history with AI. There are various initiatives that already involve or might benefit from AI tools (NLP in Discovery; article feeds based on past visits in the apps; suggestions in the editor; ORES; and chapter and third party initiatives like Wikispeech), but they are disconnected and not really part of any organization-level strategy, and in some cases suffer from inequality of investment and benefit (where everyone is happy to use an API but no one helps with its development - it seems ridiculous, for example, that the Foundation does not employ a single developer to work on ORES, which is really the only plan we have for addressing editor decline).

I might be misjudging the trends (AI is well outside my area of expertise) but my impression is that this area is way too important to happen in an ad hoc, skunkworks way; there should be a unified strategy, more support from the users to the providers of the machine learning infrastructure, and exchange of knowledge between Wikimedia developers working in a similar area, and between Wikimedia developers and external experts. WikiDev17 might be a good place to work towards that.

EpochFail (talkcontribs)

+1

I think that LZia and Ewulczyn will have a lot to contribute too.

SSastry (WMF) (talkcontribs)

Looks reasonable to me. So, +1.

Basvb (talkcontribs)

Supporting this. Partial automatisation of image categorisation (which is basically classification) using deep learning/convolutional neural networks seems like one of the next logical steps in this area to me.

EpochFail (talkcontribs)

In the m:ORES project, we're currently predicting edit quality (vandalism? good-faith?) and article quality (Stub, GA, FA? and the equiv for russian and french). We're working on building models for predicting the type of change made in an edit (e.g. reformatting the current information or adding new information?) and detecting personal attacks in talk posts (based on work done in m:R:Detox).

A lot of AI relies on human-labeled data. So, one of our major projects is m:Wiki labels. Getting the resources to turn that into a proper mediawiki extension has been difficult. It seems that, if we can come together and discuss the value of AI and where we'd like to see support, finding resources for these types of technologies that all AIs use will be easier.  :)

Qgil-WMF (talkcontribs)

Alright, so it seems that this proposal is still making it as main topic based on de facto consensus. Works for me.

We still need to fine tune the title of this main topic, just as we have done with the other ones. "AI" is too generic and doesn't provide any direction. What about AI?

See the other main topics at Wikimedia Developer Summit/2017/Program. This one needs to fit there.

EpochFail (talkcontribs)

I propose: "Leveraging AI to help build and navigate content"

Qgil-WMF (talkcontribs)

Spelling out "AI", what about "Artificial Intelligence to build and navigate content"? Although I am not sure what "navigate content" means here.

"Artificial Intelligence to build and curate content"?

EpochFail (talkcontribs)

Can you be more specific about what you think is ambiguous about "navigate content"?

Qgil-WMF (talkcontribs)

I didn't say "navigate content" was ambiguous. I said that I wasn't sure what it meant. After thinking more, I believe it might refer to AI helping readers to find interesting information based on their actions? I got to this conclusion because I am aware of experiments like creating related pages automatically or recommending articles for translation. Maybe it is only me getting confused about this "navigate content", but if I can be considered a benchmark, then many more people are going to be confused by this expression.

EpochFail (talkcontribs)

It seems you are interpreting "navigate content" in the way I hoped. :)

You've given a specific example of an AI that might recommend content based on past activity. We have AIs that generate new links based on people's browsing behavior. We have AIs that recommend articles based on general trends. You might imagine an AI that helps you find the right commons images for your new article. Both building and navigating content suffer from en:information overload issues. AI can help.

Qgil-WMF (talkcontribs)
Reply to "WikiDev17 topic: AI"
RobLa-WMF (talkcontribs)
This is the text of the "Collaboration" section of WikiDev17/Topic ideas as of this writing
  • Improving technical collaboration. There are many angles to look at, all of them needing improvements: between developers (professional-professional, professional-volunteer developers, Wikimedia - MediaWiki 3rd parties), and between developers and users (developers-Wikimedia communities, developers-readers). The code review discussion, the Technical Collaboration Guideline, and the efforts to learn from existing and new readers would fit here.
  • Functioning code review process, because it affects our ability to ship better software faster, and our ability to recruit volunteer contributors to these efforts.
  • A plan for discussion pages in MediaWiki, because it is a key aspect of user collaboration and we are far from meeting all user expectations.
  • Real-time collaborative editing. Often discussed, rarely planned at a long-term level
  • Consolidate work on MediaWiki core
  • (A unified vision for) Collaboration
    • Real-time collaboration (not just editing, but chatting, curation, patrolling)
    • WikiProject enhancements: User groups, finding people to work with, making these first class DB concepts
    • Civility/diversity/inclusiveness, mechanisms to handle/prevent harassment, vandalism, trolling while working together
    • Real-time reading -- watching edits occur in real time
    • Integration with WikiEdu
    • Broadening notion of "an edit" in DB -- multiple contributors, possibly multiple levels of granularity
    • Tip-toeing toward "draft"/"merge" models of editing
    • Better diff tools: refreshed non-wikitext UX, timelines, authorship maps, etc.
Trizek (WMF) (talkcontribs)

I see 3 main topics on that:

  • Technical collaboration improvement (first point), which is on TC scope, with TCG and the to be hired developer advocate
  • MediaWiki core
  • Real-time things & collaboration

That last point has a common blocking point: discussions. I would recommend to focus on discussions as a brick for future products.

Qgil-WMF (talkcontribs)

I agree with @Trizek (WMF) and CScott that there are two very different topics under this "Collaboration" umbrella. We should at least separate

  • editorial collaboration to improve content
  • technical collaboration to improve software

If we want to keep both as main topics, we should find separate statements for each. For instance:

  • A unified vision for editorial collaboration
  • How to engage new technical contributors

The MediaWiki core part looks like a topic in itself, more related with "MediaWiki core" than "collaboration".

RobLa-WMF (talkcontribs)

@Qgil-WMF and I had a conversation about this topic earlier this morning. The piece that I'm most interested in is the technical collaboration piece, but I don't object to the editorial collaboration piece. Here's links to the two areas:

I'm glad that the new name for "technical collaboration" ditched the qualifier "new", since I think that biases us to ignore the challenges of collaboration among experienced developers. I like the emphasis on growth as a goal. We should strive to create an environment that people want to be part of, which is going to involve some mix of celebrating who we are now (including the quirky bits), while simultaneously figuring out how to improve our collaboration (e.g. how to make it more welcoming for those that aren't here yet to join the fun).

Reply to "WikiDev17 topic: Collaboration"
RobLa-WMF (talkcontribs)
This is the text of the "Quality" section of WikiDev17/Topic ideas as of this writing

How can we improve the quality of our software, and pay down the technical debt we've accumulated over the years?

In developer discussions this is a recurrent topic as well, from the perspective of how difficult it is to change anything from a technical point of view (from a social point of view too, but our platform + desktop + mobile web + mobile apps is a challenge in itself).

Fora: qa and wikitech-l readers and participants

Qgil-WMF (talkcontribs)

I think quality and technical debt should be drivers of the Summit agenda. However, as a listener of many Wikimedia discussions, I have the impression that we frequently compile problems but we lack some sense of North, of priorities, of where to start. Rob, @Greg (WMF), @BDavis (WMF), what "main topic" could be defined that would help focus Summit participants and discussions on quality and technical debt more productively?

BDavis (WMF) (talkcontribs)

I'm not sure that anyone needs to be convinced that cleaning up the code base is a good idea. Where we seem to get stuck is in finding people who actually want to spend time fixing the broken windows and/or reviewing the fixes that others have proposed. There is a lot of good work that is done quietly by @Aaron Schulz, @Anomie, @Tgr, and @Legoktm to name just a few. Maybe one or more of them has an idea about a particular area that could use more discussion or just more advertisement of things that could use help?

Cscott (talkcontribs)

My impression talking to non-engineering staff at the WMF is that we have a rather large communication problem between engineering and the rest of the organization that we need to tackle -- the perception is that engineering is being given lots of money and "not having anything to show for it". This impression was strengthened during rough times recently when a number of expensive projects were started and then abandoned due to poor management -- since remedied, one hopes, but the damage has been done.

From top-down (board, C-level, public) what is needed is engineering to actually *accomplish things*. Spending a year working on "quality" in this environment -- without a corresponding push in communication/etc -- is likely to do harm (like extend our current hiring freeze, continued flatlined budgets, ongoing community disapproval).

This isn't to say I'm opposed to this topic! But we should recognize at the outset that a large part of this work isn't *finding things to do* or *finding engineering willing to work on quality*, but rather in *convincing management that resources should be spent on things that aren't products or features* and *quantifying and communicating the results of quality work*. We can't just go into our engineering workroom and come out in a year pronouncing things "much cleaner now".

BDavis (WMF) (talkcontribs)
This isn't to say I'm opposed to this topic! But we should recognize at the outset that a large part of this work isn't *finding things to do* or *finding engineering willing to work on quality*, but rather in *convincing management that resources should be spent on things that aren't products or features* and *quantifying and communicating the results of quality work*. We can't just go into our engineering workroom and come out in a year pronouncing things "much cleaner now".

This is a good point I think. I don't feel it is at all unique to the Wikimedia Foundation as an organization either. This is my 5th day job as a software developer and I've experienced the same issue at all of them. There is one thing that I have found to be different here however. That difference is a split of "product" vs "platform" that somehow sees these as completely separate concerns. The "product" side seems to have been in the ascendancy for the last couple of years which is perhaps making the prior split even more exaggerated.

A large portion of my time at $DAYJOB-1 where I held the title "Software Architect" was negotiating a balance with engineering and organization management over the number of new things that could be built in a given time period and the technical debt cleanup and forward looking R&D work that also needed to happen for the software products to remain healthy and viable. This is a role that I have not seen well represented in the WMF org charts or the small number of teams that I have participated in. I have no idea how that translates into topics for a development summit.

Strainu (talkcontribs)

Considering the above, sounds like this would not be a very good topic for an inclusive summit. It seems that in this area everybody is aware of what needs to be done, so it is only a resource issue. These are better resolved in smaller meetings than in conferences in my experience.

Greg (WMF) (talkcontribs)

This is a pretty Developer-focused (both WMF and community, so let's just say "Wikimedia") topic, not the *most* inclusive if we also consider other contributors (not the worst though, way better than "what team should hire what type of staff").

I would like to move forward on some topics in this area, however. One big thing is "unit" test coverage. We're horrible (MW Core unit test coverage is at ~10%). We also have a fairly inverted testing pyramid (we (WMF product teams, mainly) spend a lot of effort on browser/api level tests, but not unit/integration tests). That's not the makings of a solid base to build on top of.

Once the topic of test coverage comes up usually one of the immediate responses is "but who's responsible for Core?" Good question. I have some vague ideas on how to move forward here, but I also like getting feedback/brainstorming from groups of people before moving forward (just my style ;)).

Would that (something like "how do we improve unit (with real unit tests) and integration test coverage for the software we write and maintain?") be a narrower but useful focus?

Cscott (talkcontribs)

Greg -- I think that would be a useful concrete subtopic, but I'd worry that we'd get lost in the details of unit testing frameworks and forget to deal with the broader question posed above: "but who pays for it?".

Greg (WMF) (talkcontribs)

That's actually the point/topic I was suggesting and it's relatively easy to steer the conversation away from tooling to who pays for it/how to get there :) (we did so at the Product&Tech onsite last week).

RobLa-WMF (talkcontribs)

It seems a death spiral is possible here:

  1. We look at our seemingly overwhelming mountain of tech debt, and declare ourselves without sufficient resources
  2. Tech debt accumulates, with fewer and fewer voluntary collaborators being involved, because life is too short to deal with the mountain of someone else's tech debt
  3. We get lower funding, because we don't seem to be accomplishing as much as we used to
  4. Repeat

It may be that the name ("Quality") is poorly chosen, as it leads to this death spiral thinking. It may be that we should reframe this as "Developer Experience" in a nod to @Sherah (WMF)'s earlier comment. In what ways can we make improving our code safer and more fun? How can we improve the motivation for people not paid by WMF to improve the code we deploy? Are WMF employees the only people that can possibly understand how many of our deployed systems work?

In our recent IRC conversation on this topic (Phab:E269), we talked about how it's important to use the Dev Summit as a place to discuss tech debt issues. @Legoktm pointed out "I think it is important that we don't lose the architecture part of the summit. Most of our day-to-day work/priorities are dictated by the products (in the user-facing sense) that we work on, and the summit is the one of the few places we get to work on the actual architecture/tech-debt" Let's spend some of our time figuring out how to pay down our tech debt.

As far as the "who pays for it?" topic, let's get each team working on new features to account for how they are going to clean up messes in their environment. Let's get out of the habit of hand-wringing about our current plight, imagining ourselves lying in filth (possibly self-created), whining about how someone else is not offering to clean up the environment we operate in.

Thankfully, MediaWiki isn't "filth", it's actually very good software that has served the movement well. We have a really good problem to solve: improve the software running on a website that is already in the global top 10. Not everything in it is written as we would write it from scratch today, but many of the people who built the original versions are still working with us and can teach us how they did it, and help us avoid making the mistakes they made. We can use this as an opportunity to celebrate our collective achievement; that we have something good enough to improve. Let's make sure that the product of future improvement will be something worthy of more improvement.

Qgil-WMF (talkcontribs)

"How to pay down our technical debt" would be a good main topic.

RobLa-WMF (talkcontribs)

I think "How do we manage our technical debt?" is a better way of putting it. To build on the "debt" metaphor we're using, many economists believe that deficit spending can be a healthy thing. With technical debt, incurring additional debt is sometimes a necessary step to get to a minimum viable product; there just needs to be a strategy to manage the load of technical debt.

Qgil-WMF (talkcontribs)
Legoktm (talkcontribs)

Well, I think there's a resourcing issue, but enough people have already talked about that. But there's also a systematic issue in that we are okay (as in, we don't block it) with people adding more technical debt, without a plan to clean it up.

  • New extensions being deployed to the cluster with no clear end goal or removal plan ("this is just a small test/experiment" and then spend 3+ months getting it undeployed)
  • People writing new fancy interfaces/classes that replace old stuff without finishing replacing it, even in core (my fear about MCR)
  • New code being written without test coverage is always technical debt.
  • Missing documentation :(
  • No policy of removal on old code (see recent CH deprecation sprint + a draft deprecation policy RfC)

There are probably more things, but it's a bit late. I think those general ideas would make for a good big group discussion, where people can discuss their needs and pros/cons in a large group setting. And then we can break into smaller "working groups" (I suppose that's the term) to analyze specific problems and solutions. So for my missing test coverage example, a working group could investigate making code coverage run before the +2, and informing developers whether they're reducing it. And then identify classes which are extremely hard to test and need refactoring, and pass that onto the rest of the developers. Or finding CI infra instabilities and providing a recommendation to the releng/labs teams for resourcing. etc. I think that would be a good use of time if we want to discuss the technical debt aspect of quality.
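
To make the "run code coverage before the +2 and tell developers whether they're reducing it" idea concrete, here is a minimal sketch of such a check. It is only an illustration: the function name, the file paths, and the percentages are all made up, and it assumes the CI job can already produce a per-file line coverage figure before and after a patch. It does not describe any existing Wikimedia CI job.

```python
from typing import Dict, List


def coverage_regressions(
    before: Dict[str, float],
    after: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return human-readable warnings for a patch.

    `before` and `after` map file paths to line coverage percentages
    (hypothetical input; a real job would parse a coverage report).
    """
    warnings = []
    for path in sorted(after):
        new = after[path]
        old = before.get(path)
        if old is None:
            # File added by the patch: flag it only if it has no tests at all.
            if new == 0.0:
                warnings.append(f"{path}: new file with no test coverage")
            continue
        if old - new > tolerance:
            # Existing file whose coverage dropped by more than the tolerance.
            warnings.append(
                f"{path}: coverage fell from {old:.1f}% to {new:.1f}%"
            )
    return warnings


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    before = {"includes/Foo.php": 42.0, "includes/Bar.php": 18.5}
    after = {
        "includes/Foo.php": 40.0,
        "includes/Bar.php": 18.5,
        "includes/NewThing.php": 0.0,
    }
    for line in coverage_regressions(before, after):
        print(line)
```

A check like this could post its warnings as a non-blocking review comment, so the reviewer still decides whether the drop is acceptable before giving the +2.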

SSastry (WMF) (talkcontribs)

@Legoktm ... this is pretty close to what was discussed at the September onsite in SF and there seemed to be broad agreement about it, right @Greg (WMF) ? Maybe coming up with specific recommendations and coalescing around those might be one idea here?

Greg (WMF) (talkcontribs)
Tgr (WMF) (talkcontribs)

I think "quality" and "tech debt" are not that useful for communicating the importance of having a well-maintained codebase to non-developers (and possibly not that helpful for developers either when they need to decide what cleanup activities to prioritize). Two topics that are not proper subsets of quality but have a large overlap, and which are framed around the targeted benefits:

  • Reliability: how do we make sure that our code is secure and robust by design, it is easy to identify and learn the correct usage patterns, mistakes are hard to make and easy to spot, errors are caught before deployment and in the rare cases when they aren't they are easy to debug? This would span topics like test coverage, exploratory testing, staging, logging infrastructure, documentation, API contracts, code review friendliness of the architecture (e.g. is it easy to tell which pieces of text need to be escaped?).
  • DX: how do we ensure that our codebase is easy to understand and efficient to use? This includes stuff like design patterns, identifying confusing APIs, consistency of the architecture, documentation, CI, logging, debugging tools, developer environment (vagrant, docker etc), IDE integration. Also getting feedback from developers on what the DX pain points are, for which the summit is a great opportunity.
Reply to "WikiDev17 topic: Software quality"

WikiDev17 topic: Distribution and Analysis

9
RobLa-WMF (talkcontribs)
This is the text of the "Distribution and Analysis" section of WikiDev17/Topic ideas as of this writing

How can we better distribute the information on our websites? What data should we make available? How should we offer it? What APIs should we offer to manipulate our content? Topics to discuss: Kiwix, ORES, Bots, RESTbase.

Fora: xmldumps, mediawiki-api, analytics-l, and research-l readers and participants

Qgil-WMF (talkcontribs)

"Distribution and Analysis" seems a too wide umbrella for a main topic? One common denominator there are the Wikimedia APIs, which has been a relevant topic during all this year and will continue to be a hot topic. What do you think about defining a main topic related to the Wikimedia APIs? Ping @BDavis (WMF) since this also affects the Labs users.

BDavis (WMF) (talkcontribs)

I'm not sure that "APIs" is much more narrow. I guess it could mean focusing more on how to improve moving data into and out of the wikis (distribution) rather than the business cases for doing so (analysis).

One topic that could be discussed both broadly and in depth would be bulk data dumps. This is an area that @ArielGlenn works in heavily and was discussed in the 2015 conference. I know that discussion led to creating a phabricator project. Maybe there are topics in that area that are ready for additional discussion or evangelization?

Another general topic that crosses over a bit with the topics of code quality and future user interface enhancements is ensuring that all new business logic is available via the Action API or other externally automatable means. If we could make a collective decision that a SpecialPage must just be a user interface over an API that can be manipulated programmatically both inside and outside of MediaWiki+extensions then we would be turning a corner towards the possibility of radical new user interfaces (single page app, service workers, rich desktop/mobile apps).
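
The "a SpecialPage must just be a user interface over an API" rule can be sketched in a few lines. This is not MediaWiki code: it is a Python illustration with hypothetical names (`WatchlistEntry`, `get_watchlist`, `api_watchlist`, `special_page_watchlist`) and an in-memory dictionary standing in for the database. The point is only the layering: the HTML front end calls the API layer and never touches the storage itself.

```python
import html
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WatchlistEntry:
    title: str
    last_editor: str


# Hypothetical in-memory stand-in for the database.
Store = Dict[str, List[WatchlistEntry]]


def get_watchlist(user: str, store: Store) -> List[WatchlistEntry]:
    """Business logic: the only place that knows how to fetch a watchlist."""
    return list(store.get(user, []))


def api_watchlist(user: str, store: Store) -> dict:
    """Machine-readable front end: returns JSON-serialisable data."""
    return {
        "watchlist": [
            {"title": e.title, "last_editor": e.last_editor}
            for e in get_watchlist(user, store)
        ]
    }


def special_page_watchlist(user: str, store: Store) -> str:
    """Human-readable front end, built only from the API response."""
    items = api_watchlist(user, store)["watchlist"]
    rows = "".join(
        f"<li>{html.escape(i['title'])} ({html.escape(i['last_editor'])})</li>"
        for i in items
    )
    return f"<ul>{rows}</ul>"


if __name__ == "__main__":
    store = {"Alice": [WatchlistEntry("Main Page", "Bob")]}
    print(api_watchlist("Alice", store))
    print(special_page_watchlist("Alice", store))
```

With this layering, a single page app or a mobile app can consume `api_watchlist` directly and get exactly the data the built-in page shows.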

Cscott (talkcontribs)

@BDavis (WMF) I think one of the structural issues w/ bulk data dumps has been who owns the service, and who decides what formats we should dump (or archive). I think @GWicke has taken the lead with the services teams on aspects of this, but I'm not sure that services wants to own all our dumping.

Qgil-WMF (talkcontribs)

I am still proposing "a main topic related to the Wikimedia APIs". Not "APIs" but a main idea related to our APIs that could serve as driver for many related discussions. I'll throw an idea:

Useful, consistent, and well documented APIs

Tgr (WMF) (talkcontribs)

That seems to me like a sub-topic of DX. The requirements for useful, consistent and well documented web APIs are largely the same as for useful, consistent and well documented PHP service interfaces or useful, consistent and well documented puppet roles for Labs (although the way these requirements will be implemented will of course be completely separate).

Qgil-WMF (talkcontribs)

All the main topics proposed could be a sub-topic of something wider.  :) If you are interested in this topic, you are encouraged to think of a proposal and/or promote it in your context so we get the right people and the right topics to discuss.

Tgr (WMF) (talkcontribs)

I'd be interested in gathering feedback about the usability of the action API. For third-party users, the summit is probably not the best place for that (the hackathon or Wikimania are more interesting events for third-party data reusers). For Wikimedia/MediaWiki developers, I think we should gather feedback more generally about what pain points are there. See my comment about DX in the Quality thread.

Tgr (WMF) (talkcontribs)

Offline distribution would be an interesting topic (although maybe too narrow?)

Reply to "WikiDev17 topic: Distribution and Analysis"

WikiDev17 topic: Wishlist

8
Summary last edited by Mvolz 13:03, 27 September 2016

Gerrit-style ratings for this topic:

RobLa-WMF (talkcontribs)
This is the text of the "Wishlist" section of WikiDev17/Topic ideas as of this writing

Broad technical plan agreed for the hardest wishes among the Community Wishlist 2016 results.

Qgil-WMF (talkcontribs)

From all the main topics proposed, this is the one that has been supported by everyone at different times. We have reflected it in the related Technical Collaboration quarterly goal (Phab:T141938). Danny Horn from the Community Tech team is on board as well.

We don't know what requests will make it to the top of the wishlist and we don't know which technical challenges those will bring, but there should be a good coverage of technical areas and skills among the Summit participants, and it is healthy to commit to prioritize those discussions whatever happens.

Let's consider this main topic selected.

Cscott (talkcontribs)

I spent a lot of time reviewing the community wishlist. I don't think it's suitable as a topic without more structure. As constituted, it is an accurate reflection of the community, but that means it's a mix of "overlooked feature we could do easily", "impossible thing we don't have resources for", "not as simple as it's made out to be", "thing WMF is already actively working on", "thing that the community needs to fix itself instead of waiting for WMF", "thing we wish we could do but the board won't give us resources", etc. Just dropping the wishlist intact as a summit topic would (IMO) be opening a can of worms, since there are so many totally different challenges involved with each item, only some of which are actually engineering-related.

(As one example, I know that one funded group has been pursuing a "what's the least possible hack we could do to claim to have addressed wishlist item X" strategy, which is a top-level approach that is worth serious meta-discussion on its own.)

Some ways the "wishlist" topic could work:

  1. As a format for invited talks. Solicit worked proposals and/or in-depth review of wishlist topics. If the wishlist is "something WMF has on its roadmap" this is an opportunity for the relevant team to present its roadmap. If the wishlist item is "not as simple as it's made out to be", then someone who understands the issues can present them and lead a discussion. If the wishlist item is "overlooked feature", then a presenter can make a case for prioritizing it, provide examples of its use in the community, etc.
  2. As an unconference. Similar to the above, but again we'd be discussing specific wishlist items and the unconference process can do some of the work of narrowing scope.
Qgil-WMF (talkcontribs)

"Just dropping the wishlist intact as a summit topic" is definitely not what this proposal is about. By the beginning of the Summit the Community Tech team will have decided the top tasks that they will focus on. Some of those tasks might be straightforward ("just work") while others might need deeper and wider discussion on existing platform limitations, different possible implementations, etc. Those are the discussions we want to have at the Summit.

There might be also discussions more meta about the Community Wishlist itself, i.e. how to make progress in the rest of tasks out of the scope of the Community Tech team (hackathons, outreach programs, Foundation team's goals, 20% projects...), how to improve survey/voting system itself, etc.

Qgil-WMF (talkcontribs)

Ping @DHorn (WMF) because he is the ultimate owner of this main topic.

Qgil-WMF (talkcontribs)

The title of this main topic could be "A plan for the Community Wishlist 2016 top results".

Mvolz (talkcontribs)
Qgil-WMF (talkcontribs)

Not exactly. We are encouraging developers to *work* on Community Wishlist projects in every hackathon we organize. At the Summit, we want to identify technical obstacles to solve the top Wishlist requests and have a plan for them (so the Community Tech team and others can start working on them).

Reply to "WikiDev17 topic: Wishlist"
Qgil-WMF (talkcontribs)

In order to make the Summit web pages more accessible for more audiences, I think it would be better to have own subpages for each main topic selected. There, the main topic could be described, pointing to big challenges and linking to related resources. Summit activities related to these main topics would be listed there as well.

What do you think?

RobLa-WMF (talkcontribs)
"I think it would be better to have own subpages for each main topic selected."

Agreed. Key word is "selected", though...let's not prematurely create subpages for topics likely to be relegated to the dustbin. Sprawl can be an enemy of quick collaboration, as proposals get lost in everyone's respective watchlists. Let's try this process:

  • Each proposed WikiDev17 topic has a Flow topic in Talk:WikiDev17/Topic_ideas
  • We build the summary for each topic, which says whether or not the topic is planned to remain a WikiDev17 topic. We also collaborate on improving the prose on the WikiDev17/Topic ideas page
  • Once we have consensus about a topic being a planned topic, then we make a subpage for it.

That work for you?

RobLa-WMF (talkcontribs)

A further thought: it could make sense to use Gerrit-style +2/+1/0/-1/-2 ratings for this, where each level is:

  • +2 - Yes, please make room for this topic (potentially eliminating other topics)
  • +1 - Seems like a good idea
  • 0 - Undecided or "no comment"
  • -1 - Has problems that need to be addressed
  • -2 - Please don't pursue this topic

I think I'm going to try to structure this page around this scale, and see if I can help formulate a good discussion structure.
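
As a rough illustration of how ratings on this scale could be collected for one topic, here is a small sketch. The function name, the user names, and the decision to report a count per level instead of a sum are assumptions made for the example; nothing here is an agreed process.

```python
from collections import Counter
from typing import Dict

# The five levels proposed above, keyed by score.
SCALE = {
    2: "Yes, please make room for this topic",
    1: "Seems like a good idea",
    0: 'Undecided or "no comment"',
    -1: "Has problems that need to be addressed",
    -2: "Please don't pursue this topic",
}


def tally_ratings(votes: Dict[str, int]) -> Dict[int, int]:
    """Count how many people chose each level for one topic.

    `votes` maps a (hypothetical) user name to that user's rating.
    Scores are counted per level and deliberately not added together.
    """
    for voter, score in votes.items():
        if score not in SCALE:
            raise ValueError(f"{voter}: {score} is not on the +2..-2 scale")
    counts = Counter(votes.values())
    return {level: counts.get(level, 0) for level in sorted(SCALE, reverse=True)}


if __name__ == "__main__":
    votes = {"UserA": 2, "UserB": 1, "UserC": 1, "UserD": -1}
    for level, count in tally_ratings(votes).items():
        print(f"{level:+d} ({SCALE[level]}): {count}")
```

Keeping a count per level makes a single -1 visible in the summary, where a plain sum would hide it behind the positive votes.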

Cscott (talkcontribs)

Last year's approach also seemed to work, more or less, which was to collect individual items for discussion, collect votes unconference-style on attendance/interest (as you propose above), and *then* trying to collect the highly-voted ideas into "topics". We could attempt a +2/+1/0/-1/-2 vote using last year's fine-grained proposals as a starting point, for instance, which could indicate whether devs think "we addressed this adequately last year", "we've talked enough about this" or "there is still work to do in this area".

This year we seem to be trying a more top-down approach which makes me nervous because it seems like it's basically only Rob and I who have proposed topics. I think we're both smart fellows :) but I don't want this summit to be dragged exclusively in the directions of our own obsessions, with highest ranking for those obsessions we share...

Qgil-WMF (talkcontribs)

Rob, +1 to your suggestions. ;)

The model of last year works for those who know what the Summit is and know that they want to attend. Less so for all the rest of potential participants that we want to open the Summit for. The idea is to provide a bit of initial framework and then open the call for participation pretty much in the same open terms as last year.

There are more people who have been proposing topics... In any case, this "top down" stage will only last a few days more. The main topics selected will inform our active outreach and our criteria to promote travel sponsorship requests.

Qgil-WMF (talkcontribs)

The main topics are defined now. I have pointed links to subpages for each topic.

Reply to "One subpage for each main topic"
RobLa-WMF (talkcontribs)
This is the text of the "Usability" section of WikiDev17/Topic ideas as of this writing

How can we make our websites better learning environments?

The MediaWiki UX for readers hasn't changed much in a decade and it is showing its age. Meanwhile, in the internet... What needs to be done in our platform to enable a UX update?

Easier login to Wikimedia wikis allowing users to control their on-wiki identity (e.g. login using e-mail address, case-insensitive login, "display name").

Merge the different feeds for the tracking changes pages (history, usercontribs, recentchanges, watchlist, logs, etc) to allow for easier maintenance and improvements. This would make it easier to add simple "re-designs" (layout tweaks, that can be toggled on/off) to all pages at once. This would make it easier for newcomer editors/readers to understand the contents of the various pages.

Fora: usability mailing list readers and participants

Qgil-WMF (talkcontribs)

Usability has a precise meaning, and reading the description I can see the intent, but it will probably confuse both those interested in usability and those interested in the UX revamp and the features mentioned.

I still think the UX refresh and what is stopping it is a topic that outweighs all the rest by far, and I would focus on that. However, that is easy for me to say since I am not involved in any of the teams responsible for pushing the user experience of Wikimedia users further. @Sherah (WMF), your opinion is welcome here. Do you think that the time is good to pull a main Summit topic? This would imply wide participation of UX designers, frontend engineers, the mobile web & apps developers, probably also Research / Design research and even people who can represent editors and readers.

Strainu (talkcontribs)

Why only the refresh and not UX improvements (by updates, experiments and uniformisations)? UX refresh sounds like a very narrow, specialized topic.

Cscott (talkcontribs)

There seem to be a few different interpretations of "Usability" which are being mixed. Here's my attempt at disentangling them:

  1. "Assuming the functionality of the site is completely unchanged, how can we refresh the UX to make it more usable"? This includes "layout tweaks" and "UX refresh" mentioned above.
  2. "What are the sorts of things we'd *like* to do with the site which our current UX is preventing us from doing?" This includes Rob's "Easier login to Wikimedia wikis".
  3. (Orthogonally) "What sorts of non-UX refactoring could we do to facilitate UX experimentation?" I think this would include Rob's "merge the different feeds for tracking changes" and gwicke's client-side frontend experiments, as well as perhaps ideas about creating forked wikis with alternate UX for different use cases, in the way that we currently do for mobile.
Sherah (WMF) (talkcontribs)

Thanks, Cscott. These are true.

1) See my comments below (I posted before I saw this) about the term "refresh." I would challenge us to think more about sustainable systems that allow us to constantly improve our experience on an ongoing basis.

2) This is, in a nutshell, what I do for a living. :) To put it very simply (because this response could get way too long), over the last three quarters I have spent the majority of my time interviewing users of our Web platform (mobile and desktop), as well as Android and iOS apps, to assess workflows therein for pain points and opportunities for improvement. I then take those findings to PMs and engineers to re-implement if necessary, with the help of designers on how to re-implement. I think the Summit is a good place to show our processes since we do not have these practices on other teams and everyone isn't up to speed on our approach/where we are right now with such processes. We could also find opportunities to improve said processes with community/engineering feedback.

3) Sweet! I'd love for UX and Engineering to work together on problems like these.

Sherah (WMF) (talkcontribs)

I'd also like to add that I think that Developer Experience (devs are users too!) is an important area we could use a ton more energy around improving. I would be more than thrilled to work on that issue; I think the Summit could be a great place to gather structured feedback to start working on problems around this area.

Sherah (WMF) (talkcontribs)

Hi RobLa and Quim,

Throughout the Foundation, there are many valuable ideas for modernizing the user experience, an