Live Chat System

This page documents the ideas and effort behind a proposal for a synchronous chat system to support MediaWiki editors with a focus on English Wikipedia.

Rationale
Wikipedia is seeing a slow, steady decline in editorship. This is especially true on enwiki, but is seen on wikis in other languages as well. One of the primary goals of the WMF in 2011-2012 is to arrest this decline.

Some of the primary reasons have been identified as:
 * Poor editing usability
 * Difficulty understanding policy for new editors
 * Existing editor distrust of newbies
 * Lack of new editor integration into the community

I believe it will be possible to at least partially address each of these issues with the careful deployment of a usable, real-time chat system.

Concurrent editing
We are currently working on Concurrent editing. To be functional, a concurrent editor needs a chat subsystem (see EtherPad, Google Docs, etc.). Otherwise people just chat in the document they're editing, which is inconvenient and leads to errors. Chat requires nontrivial investment in infrastructure and maintenance, so we should try to integrate all our tools into the same system.

Live help
A live help system is intended to give users in distress one-click access to an experienced (and hopefully reassuring) wiki-denizen who can help them.
 * See Live Chat System/Help

Topic-based chat
Currently, it's difficult to be social while editing Wikipedia. We have talk pages, but they're not a place for casual conversation. While it's true that people come together to work side-by-side around centers of common interest, those communities are sustained by the person-to-person bonds that form within that framework. Furthermore, actual human connection is a potent antidote to pedantry, trollism, etc.

Many of WMF's community engagement projects center around this idea. We promote conferences, meetups, hack-a-thons... We could do a lot more to enable this sort of connection online.

There are a number of challenges here, which must all be balanced:
 * Social chat must be topical--we can't expect people who want to talk about their love of arachnids to hang out in the same chatroom as those who want to discuss WWII-vintage aircraft. It must be possible to create a chatroom for any topic, but that means a lot of chatrooms.
 * When choosing to chat, people must be presented with a manageable list of rooms. That is, no more than 10 (with the option to discover more).  Which rooms to present will most likely be based on topics related to the current article.
 * The UI must prioritize joining an existing chat over creating a new one. People click "chat" because they want to communicate.  We must find them people to communicate with.
 * People who want to talk about a thing don't necessarily want to edit its article. It must be possible for a subset of users in a room to switch from discussing a topic to editing a page.  This doesn't have to happen in the chat UI, but it's important that a page's chat and its edit-chat be separate entities.
 * People are more candid and open when their words aren't archived. However, when decisions are made about the content of an article, it's important to keep a record of them.

Category chat
A possible implementation: Each article has a "Chat" option in the page header, next to "Article" and "Discussion". Clicking drops down a chat drawer, which loads room information relevant to this page. Options could include the four most popular related chats, the option to create a new room here, a link to find more rooms, and a couple global rooms (see below).
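The drawer contents described above could be assembled roughly as follows. This is only a sketch: the `Room` type, the `build_drawer` function, and the two limits are hypothetical names chosen for illustration, and "popularity" is assumed here to mean the current number of participants.

```python
from dataclasses import dataclass


@dataclass
class Room:
    name: str
    users: int  # current number of participants (assumed popularity measure)


MAX_RELATED = 4  # "the four most popular related chats"
MAX_GLOBAL = 2   # "a couple global rooms"


def build_drawer(related_rooms, global_rooms):
    """Return the ordered entries for the chat drawer as (action, room) pairs."""
    def by_users(room):
        return room.users

    # Existing rooms come first, busiest first, so joining is favoured
    # over creating a new room.
    top_related = sorted(related_rooms, key=by_users, reverse=True)[:MAX_RELATED]
    top_global = sorted(global_rooms, key=by_users, reverse=True)[:MAX_GLOBAL]

    entries = [("join", room.name) for room in top_related + top_global]
    entries.append(("create", None))  # create a new room for this page
    entries.append(("more", None))    # link to find more rooms
    return entries
```

Listing the "create" entry after the existing rooms reflects the requirement that the UI prioritize joining a chat over starting a new one.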

Determining article "relatedness" is a tough problem. Some inroads have been made, but we'd have to settle on a strategy here. I can imagine basing it on the existing category tree where that is developed enough. What might work best is automatic discovery combined with hinting from editors. A couple possibilities for hinting:


 * a template that points at a different article where relevant chatrooms often form
 * a template that indicates a point in a category tree where it makes sense to host a chatroom
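One way the two kinds of hints might combine with the category tree is sketched below. Everything here is an assumption made for illustration: the function name, the plain dictionaries standing in for template data and category membership, and the depth limit. No existing MediaWiki API is implied.

```python
from collections import deque


def find_chat_host(article, redirect_hints, parents, host_categories, max_depth=5):
    """Pick the page or category whose chatroom should serve `article`.

    redirect_hints:  article -> article where relevant chatrooms form (hint 1)
    parents:         page or category -> list of parent categories
    host_categories: categories marked as good places to host a room (hint 2)
    """
    # Hint 1: follow "chat happens over there" pointers, guarding against loops.
    seen = set()
    while article in redirect_hints and article not in seen:
        seen.add(article)
        article = redirect_hints[article]

    # Hint 2: breadth-first walk up the category tree to the nearest host.
    queue = deque((cat, 1) for cat in parents.get(article, []))
    visited = set()
    while queue:
        cat, depth = queue.popleft()
        if cat in visited:
            continue
        visited.add(cat)
        if cat in host_categories:
            return cat
        if depth < max_depth:
            queue.extend((p, depth + 1) for p in parents.get(cat, []))

    # No hint applies: fall back to a room for the article itself.
    return article
```

With a tree such as Tarantula → Category:Spiders → Category:Arachnids and Category:Arachnids marked as a host, a reader of the Tarantula article would be offered the arachnid room.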

This is perhaps the hardest problem here: how do we create effective tools to map chat-space onto article-space? It has to be done, though, since the current article is all we know of a given reader's present interest.

Social chat
Much like category-based chat, but these rooms exist outside the article namespace. This would be the place for a meta-room (the way we use IRC now), or a strictly social room without any specific topic. The most popular or relevant of these should be displayed in the chat drawer as well.

Plan
It's in our best interest to deploy the simplest usable system. That probably means implementing live-help as it's the most clearly beneficial feature, then adding topic-based chat and concurrent editor support in a phased deployment.

Wikia node.js-based chat
Wikia wrote a chat system from scratch, which is available under the GPL. It's currently deployed in Wikia Labs, and supports about 15000 concurrent users on a single server.

The client is implemented using jQuery and backbone.js. The server is also built using backbone.js running in Node.js, and is backed by a redis database.

Tomasz at Wikia (not WMF's Tomasz) got a hacked version of their code running on 1.17, and I got it to run against trunk as of early August 2011. It takes advantage of a number of Wikia-specific extensions, so it's not exactly plug-and-play, but could be adapted fairly quickly.

I've met with folks at Wikia, and we've discussed merging our requirements and creating a single chat system, rather than a WMF fork. They're amenable to the idea, and we might move it into the WMF SVN repo.

Pros

 * We know JavaScript, so it'll be easy to change all this software whenever we want
 * Mostly integrated with MediaWiki.
 * MediaWiki permissions support built-in
 * Developing this will form a closer relationship with Wikia, which would be nice
 * Uses socket.io for browser transport
 * The Node.js community is growing, so expertise here will be easy to find
 * Efficiently uses server resources
 * redis is probably an inevitable part of our infrastructure anyway

Cons

 * Not entirely integrated. Wikia chat depends on:
     * Some 1.16 features
     * Wikia avatar service
     * Wikia JS-based localization library (but we have one of these now)
 * We don't know redis (yet)
 * No external client support
 * Scaling is hard--this system runs on one server, and clustering hasn't been implemented. We would have to do this ourselves, since Wikia has little incentive: their multi-wiki topology means they can shard their chat infrastructure. Redis-cluster will probably make this easier, but it's not done yet.
 * There's no way to connect to IRC. We'd have to write that if we want it.
 * Monitoring this service will require us to write the components from scratch. Nothing exists because it's all custom.
 * Doesn't currently support cross-wiki chat, or the concept of including the wiki in a user's identity.

Implementation notes
The best working code (last used with pre-1.18, needs work on localization) is available in the extensions-realtime branch.

Jabber
A second option is an XMPP-based solution. During my research, I set up ejabberd in my dev environment, wrote an authentication plugin in PHP, and integrated that with Neilk's experimental InternalAuth extension. I wrote a simple extension that places the Candy chat client at a special page. It works pretty seamlessly.
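For readers unfamiliar with how such an authentication plugin talks to ejabberd, here is a rough sketch of the external-auth exchange, written in Python rather than the PHP used for the actual plugin. ejabberd sends each request on the script's standard input as a 2-byte big-endian length followed by a string such as `auth:user:server:password`, and expects a 4-byte reply (length 2, then 1 for success or 0 for failure). The `check_password` stub is hypothetical; the real plugin checked credentials against MediaWiki, and this sketch does not reproduce that logic.

```python
import struct


def check_password(user, server, password):
    """Hypothetical stub: a real plugin would verify against MediaWiki."""
    return False


def handle(request, verify=check_password):
    """Decide one request of the form 'op:user:server[:password]'."""
    # Split at most three times so passwords containing ':' stay intact.
    parts = request.split(":", 3)
    if parts[0] == "auth" and len(parts) == 4:
        return bool(verify(parts[1], parts[2], parts[3]))
    # Other operations (for example 'isuser') are simply refused in this sketch.
    return False


def encode_reply(ok):
    """Reply frame: 2-byte length (always 2) plus 2-byte result."""
    return struct.pack(">HH", 2, 1 if ok else 0)


def serve(stdin, stdout, verify=check_password):
    """Answer requests until the input stream closes."""
    while True:
        header = stdin.read(2)
        if len(header) < 2:
            break
        (size,) = struct.unpack(">H", header)
        request = stdin.read(size).decode("utf-8")
        stdout.write(encode_reply(handle(request, verify)))
        stdout.flush()
```

A deployed script would call `serve(sys.stdin.buffer, sys.stdout.buffer)` and be named in the ejabberd configuration as the external authentication program.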

Pros

 * Already scalable. ejabberd has been demonstrated with 300k concurrent users.
 * Very tested.
 * Standard protocol (XMPP)
 * External client support is pretty good.
 * While we don't know Erlang, people in the Bay Area do, so we could find consultants if we need them.
 * Authentication is easy to integrate (has been done, though could use cleanup)
 * Hooks up to IRC easily.
 * High-reliability
 * Monitoring should be straightforward, since client libraries are common.
 * "Hot" updates with Erlang (that is, push server code with no downtime)
 * Cross-wiki chat would work fine
 * Web client is more mature
 * We get software improvements for free, since the components are maintained elsewhere.

Cons

 * Nobody here knows Erlang, so extending the server will require learning a new language.
 * XMPP is verbose and top-heavy.
 * Inefficient use of network:
     * Uses BOSH (long-polling), which sends more data than socket.io
     * WebSocket support exists, but is quite experimental
 * MediaWiki permissions are not integrated. We'd need to write server code for that, but the plugin interface seems pretty decent.

Server discussion
A quick brain dump on server research (citation needed, etc):

ejabberd: scalable, stable, and widely deployed. People seem to say this one is best.

Openfire: written in Java. My previous experiences working with Jive suggest that they can't be relied upon, even when dealing with their commercial clients. Also, indications point to ejabberd being better, and we don't speak Java here either.

jabberd2 & jabberd14: I remember deciding not to go with these, but I'd have to repeat the research to remind myself of why. If someone has a compelling case for them, we should look into them again.

Implementation notes

 * A MediaWiki extension that implements the Candy chat client can be found in the extensions-realtime branch.
 * An example ejabberd configuration that works with the extension is also provided.
 * This extension requires User:Neilk's experimental IdentityApi extension.
 * See the README for more details

Why not IRC?
Put simply, IRC suffers from poor usability. The major issues are:
 * Nickname collisions (or hijacking). Most users will not understand or care that the chat system is in any way independent of the wiki, and it will be unclear why their username doesn't work.  We need a 1:1 identity correspondence between chat and wiki.  While it's possible to mitigate this somewhat by registering with Nickserv, the process is anything but simple.
 * Departure from current user-model. In 2011, people expect a chat system to function like Facebook chat, iChat, or AIM.  IRC handles many common tasks differently from those systems.  Nobody's joining the chat room so they can learn a new piece of software--they're doing it so they can communicate with other people.

I'm not proposing that we replace IRC. It serves us well and will continue to do so. Rather, we supplement it with a system that mirrors the present-day user model of how a chat system functions.

IRC integration is possible with Jabber, so if this is a thing that's in demand, it's a point in favor of building a Jabber-based solution.

Why not IRC, if not permanently, then at least until something better comes along?
IRC may suffer from poor usability for the average user, but the average user also doesn't need many of its features. Freenode IRC is already used in much of the Wikipedia project.

https://meta.wikimedia.org/wiki/Special:MyLanguage/IRC/Channels#MediaWiki_and_technical

As for the wiki, it would suffice to add a tab next to the talk tab that pops out a Freenode webchat on the channel:


 * Wikipedia-article-article_name

Editors are at least familiar enough with IM to enter text in a chat box. They don't need to change channels, or anything fancy. As for editor name collisions, simply add a random string onto the end of the username, and that should work well enough to avoid collisions.
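The naming scheme suggested here might look something like the sketch below. The function names, the 4-character suffix, and the 16-character nickname limit are assumptions for illustration (actual limits depend on the IRC network), and the leading `#` is simply the usual IRC channel prefix.

```python
import re
import secrets
import string

SUFFIX_ALPHABET = string.ascii_lowercase + string.digits


def irc_nick(username, suffix_len=4, max_len=16):
    """Derive an IRC nickname from a wiki username plus a random suffix."""
    # Keep to a conservative character set; everything else becomes '_'.
    base = re.sub(r"[^A-Za-z0-9_]", "_", username)
    # Nicknames may not begin with a digit, so prefix a letter if needed.
    if not base or not base[0].isalpha():
        base = "u" + base
    suffix = "".join(secrets.choice(SUFFIX_ALPHABET) for _ in range(suffix_len))
    # Truncate the base so that base + '_' + suffix fits within max_len.
    return base[: max_len - suffix_len - 1] + "_" + suffix


def irc_channel(article_title):
    """Build the per-article channel name, replacing spaces and commas."""
    return "#Wikipedia-article-" + re.sub(r"[ ,]", "_", article_title)
```

Note that the random suffix means the nickname shown in chat no longer matches the wiki username exactly, which is the 1:1 identity concern raised earlier on this page.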