Live Chat System

This page documents the ideas and effort behind a proposal for a synchronous chat system to support MediaWiki editors with a focus on English Wikipedia.

Rationale
Wikipedia is seeing a slow, steady decline in the number of active editors. The decline is most pronounced on enwiki, but it appears on wikis in other languages as well. One of WMF's primary goals for 2011-2012 is to arrest this decline.

Some of the primary reasons have been identified as:
 * Poor editing usability
 * Difficulty understanding policy for new editors
 * Existing editor distrust of newbies
 * Lack of new editor integration into the community

I believe it will be possible to at least partially address each of these issues with the careful deployment of a usable, real-time chat system.

Concurrent editing
We are currently working on concurrent editing. To be functional, a concurrent editor needs a chat subsystem (see EtherPad, Google Docs, etc.). Otherwise people just chat in the document they're editing, which is inconvenient and leads to errors. Chat requires a nontrivial investment in infrastructure and maintenance, so we should try to integrate all our tools into the same system.

Live help
A "Live Help" button would be added in several strategic places across the wiki for users with a registered account. When it is clicked, the system tracks down an experienced editor to answer questions one-on-one.

Merit
Previous work shows only a very small proportion of newcomers to Wikipedia who ask for help will do so effectively. By improving the visibility and usability of help systems in Wikipedia we should be able to dramatically improve the new user experience.

In addition to helping new editors understand how to use the MediaWiki software and work within Wikipedia policies, this feature will help experienced editors understand newbies, integrate new editors into the community, and give article maintainers a way to help guarantee quality other than policing edits.

Identifying helpers
An ideal implementation might look like this: An experienced editor sets a preference indicating their willingness to lend a hand. Presence information is maintained by the chat system, which is queried when help is requested. We offer the request to 2-3 editors using a notification, and cancel the notification when someone answers. We start by querying a small number of users, and fan out to larger numbers as time goes on (perhaps even posting a link to an IRC channel). This balances finding help in a timely fashion against bothering too many editors with too many help requests. Who to ask first could be chosen randomly, or by some more intelligent heuristic.

If collaborative editing is available, the helper who answers the call gets pulled into the editing session. If it's not available, they get a one-on-one chat session.
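
The fan-out described above can be sketched roughly as follows. This is only an illustration, not a proposed implementation: the function name `helper_waves`, the starting wave size of 3, and the doubling growth factor are all assumptions, and the random ordering stands in for whatever heuristic is eventually chosen.

```python
import random
from typing import Iterator, List, Optional, Sequence


def helper_waves(
    available: Sequence[str],
    first_wave: int = 3,   # assumption: "2-3 editors" in the first round
    growth: int = 2,       # assumption: each later round is twice as large
    rng: Optional[random.Random] = None,
) -> Iterator[List[str]]:
    """Yield successively larger groups of helpers to notify.

    `available` is the list of editors who opted in and are currently
    present according to the chat system. Each editor is asked at most once.
    """
    pool = list(available)
    (rng or random).shuffle(pool)  # random order; a smarter heuristic could go here
    start, size = 0, first_wave
    while start < len(pool):
        yield pool[start:start + size]
        start += size
        size *= growth
```

The caller would notify each wave, wait a short while, and stop iterating (cancelling outstanding notifications) as soon as someone answers. If the waves run out, it could fall back to posting a link to an IRC channel.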

Topic-based chat
Currently, it's difficult to be social while editing Wikipedia. We have talk pages, but they're not a place for casual conversation. While it's true that people come together to work side-by-side around centers of common interest, those communities are sustained by the person-to-person bonds that form within that framework. Furthermore, actual human connection is a potent antidote to pedantry, trolling, etc.

Many of WMF's community engagement projects center on this idea. We promote conferences, meetups, and hackathons. We could do a lot more to enable this sort of connection online.

There are a number of challenges here, which must all be balanced:
 * Social chat must be topical--we can't expect people who want to talk about their love of arachnids to hang out in the same chatroom as those who want to discuss WWII-vintage aircraft. It must be possible to create a chatroom for any topic, but that means a lot of chatrooms.
 * When choosing to chat, people must be presented with a manageable list of rooms. That is, no more than 10 (with the option to discover more).  Which rooms to present will most likely be based on topics related to the current article.
 * The UI must prioritize joining an existing chat over creating a new one. People click "chat" because they want to communicate.  We must find them people to communicate with.
 * People who want to talk about a thing don't necessarily want to edit its article. It must be possible for a subset of users in a room to switch from discussing a topic to editing a page.  This doesn't have to happen in the chat UI, but it's important that a page's chat and its edit-chat be separate entities.
 * People are more candid and open when their words aren't archived. However, when decisions are made about the content of an article, it's important to keep a record of it.

Category chat
A possible implementation: Each article has a "Chat" option in the page header, next to "Article" and "Discussion". Clicking it drops down a chat drawer, which loads room information relevant to the page. Options could include the four most popular related chats, the option to create a new room here, a link to find more rooms, and a couple of global rooms (see below).
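
As a rough sketch of how the drawer's room list might be assembled (the `Room` type, the `drawer_rooms` function, and the use of occupant count as the measure of "popular" are all assumptions made for illustration):

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Room:
    name: str
    occupants: int  # assumption: popularity is measured by current occupants


def drawer_rooms(
    related: Sequence[Room],
    global_rooms: Sequence[Room],
    max_related: int = 4,  # "the four most popular related chats"
    max_global: int = 2,   # "a couple of global rooms"
) -> List[Room]:
    """Pick the rooms to list in the chat drawer for one article."""
    busiest = sorted(related, key=lambda r: r.occupants, reverse=True)
    return busiest[:max_related] + list(global_rooms[:max_global])
```

Listing existing rooms ahead of the "create a new room" option keeps the list short and steers people toward rooms that already have someone in them.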

Determining article "relatedness" is a tough problem. Some inroads have been made, but we'd have to settle on a strategy here. I can imagine basing it on the existing category tree where that is developed enough. What might work best is automatic discovery combined with hinting from editors. A couple of possibilities for hinting:

 * a template that points at a different article where relevant chatrooms often form
 * a template that indicates a point in a category tree where it makes sense to host a chatroom

This is perhaps the hardest problem here: how do we create effective tools to map chat-space onto article-space? It has to be done, though, since the current article is all we know of a given reader's present interest.
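
One way the category-tree hint could work is to walk upward from an article's categories until reaching a category that editors have marked as a chat host. The sketch below assumes the category graph is available as a simple child-to-parents mapping and that marked categories are known as a set; both structures, and the name `nearest_chat_host`, are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Sequence, Set


def nearest_chat_host(
    article_categories: Sequence[str],
    parents: Dict[str, Iterable[str]],  # hypothetical: category -> parent categories
    hosts: Set[str],                    # hypothetical: categories marked by the hint template
) -> Optional[str]:
    """Breadth-first search up the category graph for the closest chat host."""
    seen = set(article_categories)
    queue = deque(article_categories)
    while queue:
        category = queue.popleft()
        if category in hosts:
            return category
        for parent in parents.get(category, ()):
            if parent not in seen:  # the category graph can contain cycles
                seen.add(parent)
                queue.append(parent)
    return None  # no hint found; fall back to automatic discovery
```

For example, an article in "Tarantulas" would land in a room hosted at "Arachnids" if that is the nearest marked ancestor.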

Social chat
Much like category-based chat, but these rooms exist outside the article namespace. This would be the place for a meta-room (the way we use IRC now), or a strictly social room without any specific topic. The most popular or relevant of these should be displayed in the chat drawer as well.

Plan
It's in our best interest to deploy the simplest usable system. That probably means implementing live-help as it's the most clearly beneficial feature, then adding topic-based chat and concurrent editor support in a phased deployment.

Wikia node.js-based chat
Wikia wrote a chat system from scratch, which is available under the GPL. It's currently deployed in Wikia Labs, and supports about 15,000 concurrent users on a single server.

The client is implemented using jQuery and Backbone.js. The server is also built using Backbone.js, running in Node.js, and is backed by a Redis database.

Tomasz at Wikia (not WMF's Tomasz) got a hacked version of their code running on 1.17, and I got it to run against trunk as of early August 2011. It takes advantage of a number of Wikia-specific extensions, so it's not exactly plug-and-play, but could be adapted fairly quickly.

I've met with folks at Wikia, and we've discussed merging our requirements and creating a single chat system, rather than a WMF fork. They're amenable to the idea, and we might move it into the WMF SVN repo.

Pros

 * We know JavaScript, so it'll be easy to change all this software whenever we want
 * Mostly integrated with MediaWiki.
 * MediaWiki permissions support built-in
 * Developing this will form a closer relationship with Wikia, which would be nice
 * Uses socket.io for browser transport
 * The Node.js community is growing, so expertise here will be easy to find
 * Efficiently uses server resources
 * Redis is probably an inevitable part of our infrastructure anyway

Cons

 * Not entirely integrated. Wikia chat depends on:
    * Some 1.16 features
    * Wikia avatar service
    * Wikia JS-based localization library (but we have one of these now)
 * We don't know Redis (yet)
 * No external client support
 * Scaling is hard--this system runs on one server, and clustering hasn't been implemented. We would have to implement it ourselves: Wikia has little incentive to, since their multi-wiki topology means they can shard their chat infrastructure. Redis Cluster will probably make this easier, but it's not done yet.
 * There's no way to connect to IRC. We'd have to write that if we want it.
 * Monitoring this service will require us to write the components from scratch. Nothing exists because it's all custom.
 * Doesn't currently support cross-wiki chat, or the concept of including the wiki in a user's identity.

Jabber
A second option is an XMPP-based solution. During my research, I set up ejabberd in my dev environment, wrote an authentication plugin in PHP, and integrated that with Neilk's experimental InternalAuth extension. I wrote a simple extension that places the Candy chat client at a special page. It works pretty seamlessly.
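
For readers unfamiliar with how such a plugin talks to the server, here is a minimal sketch of the message framing used by ejabberd's external authentication script interface, as I understand it: each request is a 2-byte big-endian length followed by a colon-separated string such as `auth:user:server:password`, and each reply is the 2-byte length `2` followed by a 2-byte result (`1` for success, `0` for failure). This is not the PHP plugin described above; the function names and the `check_password` callback are hypothetical, and a real plugin would validate credentials against the wiki.

```python
import struct
from typing import Callable, List, Tuple


def parse_request(packet: bytes) -> Tuple[str, List[str]]:
    """Split one framed request into its operation and arguments."""
    (length,) = struct.unpack(">H", packet[:2])
    body = packet[2:2 + length].decode("utf-8")
    # Limit the split so that colons inside a password are preserved.
    operation, *args = body.split(":", 3)
    return operation, args


def encode_response(ok: bool) -> bytes:
    """Frame a boolean result: length 2, then 1 (success) or 0 (failure)."""
    return struct.pack(">HH", 2, 1 if ok else 0)


def handle(packet: bytes, check_password: Callable[[str, str], bool]) -> bytes:
    """Answer 'auth' requests; refuse everything else in this sketch."""
    operation, args = parse_request(packet)
    if operation == "auth" and len(args) == 3:
        user, _server, password = args
        return encode_response(check_password(user, password))
    return encode_response(False)
```

A real script would loop, reading framed requests from standard input and writing replies to standard output, and would also answer the other operations the server may send, such as checking whether a user exists.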

Pros

 * Already scalable. ejabberd has been demonstrated with 300k concurrent users.
 * Well tested.
 * Standard protocol (XMPP)
 * External client support is pretty good.
 * While we don't know Erlang, people in the Bay Area do, so we could find consultants if we need them.
 * Authentication is easy to integrate (has been done, though could use cleanup)
 * Hooks up to IRC easily.
 * High-reliability
 * Monitoring should be straightforward, since client libraries are common.
 * "Hot" updates with Erlang (that is, push server code with no downtime)
 * Cross-wiki chat would work fine
 * Web client is more mature
 * We get software improvements for free, since the components are maintained elsewhere.

Cons

 * Nobody here knows Erlang, so extending the server will require learning a language
 * XMPP is verbose and top-heavy.
 * Inefficient use of network
    * uses BOSH (long-polling), which sends more data than socket.io
    * websocket support exists, but is quite experimental
 * MediaWiki permissions are not integrated. We'd need to write server code for that, but the plugin interface seems pretty decent.

Server discussion
A quick brain dump on server research (citation needed, etc):

ejabberd: scalable, stable, and widely deployed. People seem to say this one is best.

Openfire: written in Java. My previous experiences working with Jive suggest that they can't be relied upon, even when dealing with their commercial clients. Also, indications point to ejabberd being better, and we don't speak Java here either.

jabberd2 & jabberd14: I remember deciding not to go with these, but I'd have to repeat the research to remind myself of why. If someone has a compelling case for them, we should look into them again.

Why not IRC?
Put simply, IRC suffers from poor usability. The major issues are:
 * Nickname collisions (or hijacking). Most users will not understand or care that the chat system is in any way independent of the wiki, and it will be unclear why their username doesn't work. We need a 1:1 identity correspondence between chat and wiki. While it's possible to mitigate this somewhat by registering with NickServ, the process is anything but simple.
 * Departure from current user-model. In 2011, people expect a chat system to function like Facebook chat, iChat, or AIM.  IRC handles many common tasks differently from those systems.  Nobody's joining the chat room so they can learn a new piece of software--they're doing it so they can communicate with other people.

I'm not proposing that we replace IRC. It serves us well and will continue to do so. Rather, I'm proposing that we supplement it with a system that mirrors the present-day user model of how a chat system functions.

IRC integration is possible with Jabber, so if this is a thing that's in demand, it's a point in favor of building a Jabber-based solution.