Requests for comment/Support for user-specific page lists in core

Request for comment
Support for user-specific page lists in core
Component General
Creation date 2013-06-20
Author(s) Ori Livneh, Steven Walling
Document status complete

This request for comment proposes to take MediaWiki's implementation of watchlists and abstract it into a more generic facility for working with user-specific lists of pages. This could be done with only a modest refactoring of existing code, and without changing the watchlist feature itself. The RFC also illustrates how this generic facility could drive a range of powerful features, including some that have long been requested by users.

Schema changes

We propose the following changes:

  1. The addition of two columns to the watchlist table: `wl_timestamp`, a varbinary(14) column representing the time the entry was created, and `wl_tag`, a varbinary(100) column representing the type of the entry. Both columns would likely require an index.
  2. The tag "watchlist" would be applied to all existing rows in the table. Existing queries for watchlist items in core would need to be amended to add an additional constraint, WHERE `wl_tag` = 'watchlist'.
  3. The `wl_user` key would be altered to enforce uniqueness on (`wl_user`, `wl_tag`, `wl_namespace`, `wl_title`).
Doesn't the proposed schema (with an additional wl_tag column) limit each watchlist entry to a single tag? A separate, normalized table may make more sense here. --MZ
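The three steps above can be sketched as SQL statements run from PHP. This is only an illustration under stated assumptions: an in-memory SQLite database (via PDO, which needs the pdo_sqlite extension) stands in for the wiki's database, the starting table is a simplified stand-in for the real watchlist table, and the names of the two new indexes and the "read-later" tag are made up. One thing the sketch suggests is that, under the proposed unique key, a page could carry several tags for one user as separate rows.

```php
<?php
// Sketch only: in-memory SQLite stands in for the wiki database, and the
// starting table is a simplified version of the real watchlist table.
$db = new PDO( 'sqlite::memory:' );
$db->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );

$db->exec( 'CREATE TABLE watchlist (
	wl_user INTEGER NOT NULL,
	wl_namespace INTEGER NOT NULL,
	wl_title VARBINARY(255) NOT NULL
)' );
$db->exec( 'CREATE UNIQUE INDEX wl_user ON watchlist ( wl_user, wl_namespace, wl_title )' );
$db->exec( "INSERT INTO watchlist ( wl_user, wl_namespace, wl_title )
	VALUES ( 1, 0, 'Main_page' )" );

// 1. Add the two columns and index them (index names are hypothetical).
$db->exec( 'ALTER TABLE watchlist ADD COLUMN wl_timestamp VARBINARY(14)' );
$db->exec( 'ALTER TABLE watchlist ADD COLUMN wl_tag VARBINARY(100)' );
$db->exec( 'CREATE INDEX wl_timestamp_idx ON watchlist ( wl_timestamp )' );
$db->exec( 'CREATE INDEX wl_tag_idx ON watchlist ( wl_tag )' );

// 2. Tag every existing row as a watchlist entry.
$db->exec( "UPDATE watchlist SET wl_tag = 'watchlist'" );

// 3. Rebuild the wl_user key so that uniqueness includes the tag.
$db->exec( 'DROP INDEX wl_user' );
$db->exec( 'CREATE UNIQUE INDEX wl_user
	ON watchlist ( wl_user, wl_tag, wl_namespace, wl_title )' );

// The same page can now appear once per tag for a given user.
$db->exec( "INSERT INTO watchlist ( wl_user, wl_namespace, wl_title, wl_timestamp, wl_tag )
	VALUES ( 1, 0, 'Main_page', '20130620000000', 'read-later' )" );

// Existing watchlist queries gain the extra constraint on wl_tag.
$stmt = $db->prepare( 'SELECT COUNT(*) FROM watchlist WHERE wl_user = ? AND wl_tag = ?' );
$stmt->execute( array( 1, 'watchlist' ) );
$watchlistCount = (int)$stmt->fetchColumn();
echo "Watchlist entries for user 1: $watchlistCount\n";
```

Inserting a second row with the same user, tag, namespace and title would violate the rebuilt key and raise a PDOException in this setup.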

API

User-specific page lists would be managed by a new UserPageList class:

<?php
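/**
 * Represents a user-specific list of pages. Each list is identified by a tag,
 * which is a short, free-form string. Lists are unique per (user, tag)
 * combination.
 *
 * Method bodies and the Iterator methods are omitted; this listing only
 * outlines the proposed public interface.
 */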
class UserPageList implements Iterator {
 
	function __construct( User $user, string $tag );
 
	/**
	 * Return the string identifier of this list. Tags are unique per user.
	 * @return string
	 */
	function getTag();
 
	/**
	 * Check whether the supplied title is included in the list.
	 * @return bool
	 */
	function hasPage( Title $page );
 
	/**
	 * Add a page to the list.
	 */
	function addPage( Title $page );
 
	/**
	 * Remove a page from the list.
	 */
	function removePage( Title $page );
 
	/**
	 * Get the number of pages in the list.
	 * @return int
	 */
	function getPageCount();
 
	/**
	 * Return the user to whom this list belongs.
	 * @return User
	 */
	function getUser();
}
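To make the outline above concrete, here is a minimal in-memory sketch of a class with the same public methods. It is not the proposed implementation: the `InMemoryUserPageList` name is hypothetical, the `User` and `Title` classes are reduced stand-ins so the example runs outside MediaWiki, and pages are kept in a PHP array where a real implementation would presumably query the watchlist table. The return types on the Iterator methods assume PHP 7.1 or later.

```php
<?php
// Hypothetical stand-ins for MediaWiki's User and Title classes, reduced to
// what this sketch needs so that it runs outside MediaWiki.
class User {
	public $name;
	public function __construct( $name ) {
		$this->name = $name;
	}
}

class Title {
	private $text;
	public function __construct( $text ) {
		$this->text = $text;
	}
	public function getText() {
		return $this->text;
	}
}

// In-memory sketch of the outlined API; storage is a plain PHP array.
class InMemoryUserPageList implements Iterator {
	private $user;
	private $tag;
	private $pages = array(); // title text => Title
	private $keys = array(); // snapshot of the array keys, taken on rewind()
	private $position = 0;

	public function __construct( User $user, string $tag ) {
		$this->user = $user;
		$this->tag = $tag;
	}
	public function getTag() {
		return $this->tag;
	}
	public function getUser() {
		return $this->user;
	}
	public function hasPage( Title $page ) {
		return isset( $this->pages[$page->getText()] );
	}
	public function addPage( Title $page ) {
		$this->pages[$page->getText()] = $page;
	}
	public function removePage( Title $page ) {
		unset( $this->pages[$page->getText()] );
	}
	public function getPageCount() {
		return count( $this->pages );
	}

	// Iterator methods, so that foreach ( $list as $title ) works.
	public function rewind(): void {
		$this->keys = array_keys( $this->pages );
		$this->position = 0;
	}
	public function valid(): bool {
		return isset( $this->keys[$this->position] );
	}
	public function current(): Title {
		return $this->pages[$this->keys[$this->position]];
	}
	public function key(): string {
		return (string)$this->keys[$this->position];
	}
	public function next(): void {
		$this->position++;
	}
}

$list = new InMemoryUserPageList( new User( 'ExampleUser' ), 'watchlist' );
$list->addPage( new Title( 'Main page' ) );
$list->addPage( new Title( 'Barack Obama' ) );
$list->removePage( new Title( 'Barack Obama' ) );
foreach ( $list as $title ) {
	echo $title->getText() . "\n";
}
```

Keying the array by title text keeps `hasPage`, `addPage` and `removePage` simple, at the cost of ignoring namespaces, which a real implementation would need to handle.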

Code sample

Interactive shell
 
php > $list = $user->getPageList( 'watchlist' );
php > // Equivalent to:
php > $list = new UserPageList( $user, 'watchlist' );
php > var_dump( $list->getTag() );
string(9) "watchlist"
php > $title = Title::newFromText( 'Main page' );
php > var_dump( $list->hasPage( $title ) );
bool(true)
php > $list->removePage( $title );
php > $list->addPage( $title );
php > foreach ( $list as $title ) {
php {     echo $title->getText() . "\n";
php { }
Barack Obama
Main page
Republic Records
php > echo 'You have ' . $user->getPageList( 'watchlist' )->getPageCount() . ' items in your watchlist!';
You have 3 items in your watchlist!

Use cases

Like revision tags, wl_tag values are metadata that could be used in a variety of ways. Tags could be added, edited, or removed, either manually by users or programmatically by extensions, gadgets, and so on.

Revision tags have a number of issues with regard to normalization: spaces v. dashes v. underscores, non-ASCII characters, length limitations, case sensitivity, etc. We should be cautious of following this approach. --MZ
Good point. We should try and solve some of these issues before they come and bite us in the butt. --SW
Let's avoid creating a schema that can't be effectively sorted, though. I'm thinking here about the externallinks table, where there's no good way to page through a long query since the partial column indexes don't support non-filesorting sorts. Anomie (talk) 13:48, 21 June 2013 (UTC)
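One way to picture the normalization concerns raised above is a single function that every caller passes tags through before they reach the database. The function name and every rule in it are assumptions made for illustration, not part of the proposal: runs of spaces, underscores and dashes collapse to one dash, ASCII letters are lowercased, non-ASCII bytes are left untouched, and anything empty or longer than the 100 bytes of the proposed varbinary(100) column is rejected.

```php
<?php
/**
 * Hypothetical tag normalizer; the rules here are illustrative assumptions.
 *
 * @param string $tag Raw tag as supplied by a user or an extension
 * @return string|false Normalized tag, or false if the tag is unusable
 */
function normalizeListTag( $tag ) {
	// Treat runs of whitespace, underscores and dashes as one separator.
	$tag = preg_replace( '/[\s_\-]+/', '-', $tag );
	// Drop separators left over at either end.
	$tag = trim( $tag, '-' );
	// Lowercase ASCII letters only; non-ASCII bytes are left as they are.
	$tag = strtolower( $tag );
	// strlen() counts bytes, which matches a varbinary(100) column.
	if ( $tag === '' || strlen( $tag ) > 100 ) {
		return false;
	}
	return $tag;
}

var_dump( normalizeListTag( 'Read  Later' ) );
var_dump( normalizeListTag( 'read_later' ) );
```

Both calls yield the same tag, so "Read  Later" and "read_later" would end up in one list instead of two.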
  • Tags could be added directly to the current Special:Watchlist interface, for which there is a lengthy wishlist of features. In particular, this schema change would be a step toward the following ideas:
    • Tags could be used to create watchlist items that expire within a given time period (bug 6964).
    • Grouping of watchlist items (bug 5875, bug 20444).
    • Watching bundles of pages (bug 2308).
    • Creating multiple watchlists or sub-watchlists, some of which might be public (bug 7467).
  • Tags could be used to create a to-do list for editors (using structured data, as opposed to a simple wiki page).
  • Tags could be used to create a "read later" or favorites list.
  • Tags could be used to create reminders about pages (bug 582).
  • Tagging a page to be saved for offline reading could create a cached list for users on desktop or mobile devices.
  • Users could create custom tags for things they care about. Ideally, users would be able to bulk edit, add, or remove tags, just as they can with watchlist items (bug 33888).
  • Extensions -- such as GettingStarted or PageTriage -- could keep track of tasks you've started or completed on a page.
  • Tags, such as for pages you've created or files you've uploaded, could be shared publicly in a list on your userpage or in other places.
  • Widely-used community tools like bots or gadgets could create tagged lists (or add to lists) for users who wanted them. For example, a bot could deliver a list of new pages associated with a WikiProject to interested users. Bots that suggest tasks to do, like SuggestBot and DPL Bot, could store and deliver suggestions in better ways. Gadgets like Twinkle could more easily help users keep track of pages on which they have taken certain actions, such as nominating for deletion.
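As an illustration of the expiring-items idea in the list above, the sketch below combines the two proposed columns: the tag selects a list, and `wl_timestamp` (in MediaWiki's 14-character YYYYMMDDHHMMSS format) determines whether an entry is still active. The proposal does not say how expiry would work, so the "watch-7d" tag, the `filterUnexpired` function and the age-based rule are all assumptions, and the rows are made-up sample data.

```php
<?php
// Made-up sample rows, shaped like results from the amended watchlist table.
$rows = array(
	array( 'wl_title' => 'Main_page', 'wl_tag' => 'watchlist', 'wl_timestamp' => '20130601000000' ),
	array( 'wl_title' => 'Barack_Obama', 'wl_tag' => 'watch-7d', 'wl_timestamp' => '20130601000000' ),
	array( 'wl_title' => 'Republic_Records', 'wl_tag' => 'watch-7d', 'wl_timestamp' => '20130618000000' ),
);

/**
 * Hypothetical helper: return the titles in one tagged list whose entries
 * were created no more than $maxAgeDays before $now.
 */
function filterUnexpired( array $rows, $tag, $maxAgeDays, DateTime $now ) {
	$utc = new DateTimeZone( 'UTC' );
	$kept = array();
	foreach ( $rows as $row ) {
		if ( $row['wl_tag'] !== $tag ) {
			continue;
		}
		$created = DateTime::createFromFormat( 'YmdHis', $row['wl_timestamp'], $utc );
		$ageSeconds = $now->getTimestamp() - $created->getTimestamp();
		if ( $ageSeconds <= $maxAgeDays * 86400 ) {
			$kept[] = $row['wl_title'];
		}
	}
	return $kept;
}

$now = new DateTime( '2013-06-20 00:00:00', new DateTimeZone( 'UTC' ) );
$active = filterUnexpired( $rows, 'watch-7d', 7, $now );
print_r( $active );
```

With the sample data, only Republic_Records is still within its seven-day window on 20 June 2013. In practice the same filter would more likely be a condition in the database query than a loop in PHP.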

Comments

  • I like the proposal. It's generally clear and to the point. Please specify that wl_tag is intended to be a human-readable string, and that the only reason it's binary is because that's how core handles strings. I'd like to go further and suggest a naming convention for extension-started or extension-managed tags (e.g. "Extension:GettingStarted/tasks"), rather than every extension coming up with its own naming convention (as was done with i18n message keys). Superm401 - Talk 00:09, 21 June 2013 (UTC)
  • I think the timestamp is essential. The tag would seemingly enable users to have individual todo lists. That seems useful. What would it take to enable a user to share a distinctly tagged portion of such a watchlist with others? If we do not do that, we may pass up a chance to enhance collaboration. DCDuring (talk) 00:43, 21 June 2013 (UTC)
  • In its current state, the watchlist serves only one use case really well: I care about a page enough to monitor all changes to it for the indefinite future. It's not very intelligent, and doesn't help me distinguish between different types of pages I care about. There are numerous reasonable outstanding requests for feature enhancements around the watchlist, but the current schema for the watchlist table is inflexible enough to block many of these requests, because it was designed solely to denote the binary watch/unwatch state and a single gargantuan list to present to users. With fairly minimal changes, I think it is possible to make the underlying structure meet the needs of a variety of features MediaWiki users need for organizing user-specific lists of pages. With the proposal in its current state, we tried to outline a way to gain more power and flexibility without needing to change the way watchlists behave as far as editors are concerned. Steven Walling (WMF) • talk 00:40, 21 June 2013 (UTC)
  • Could the tag be used for what users perceive as multiple tags? Could one tag indicate the sections (header or some CSS) that a user was interested in? Could such a tag be read by the watchlist software to offer the user only items changed in the designated sections? This matters for Wiktionary a great deal because we can have many language sections on a page with a given user normally only interested in one or a few of them. It has come up repeatedly at the top of desiderata for the watchlist. DCDuring (talk) 21 June 2013 (UTC)
    • Yes. There'd be nothing stopping us from literally exposing tags à la Flickr-style tags. We're not tied to doing it that way, though. Steven Walling (WMF) • talk 01:03, 21 June 2013 (UTC)
  • Andre K. is a pretty big opponent of free-form tagging. He makes a reasonable case against its use (it came up in the context of Bugzilla's keywords field, which requires an administrator to add or remove a defined keyword). You should ask him to weigh in here. --MZMcBride (talk) 02:32, 21 June 2013 (UTC)
    • Andre's input would be great, but for the sake of accuracy, I think I should point out there is nothing in these schema changes that requires us to support free-form tagging by users. We could in fact set tags only using a predetermined set allowed by core and/or extensions. Unlike categories, which are set via free-form wikitext syntax, you could potentially disallow willy-nilly creation of new tags. Personally I think the success of projects like Delicious and Flickr strongly suggests tagging by users works though. Steven Walling (WMF) • talk 05:05, 21 June 2013 (UTC)
      • OTOH, "willy-nilly creation of new tags" is what is needed to allow some of the suggested use cases. Anomie (talk) 13:43, 21 June 2013 (UTC)
  • This sounds great. I'd also love it if this could be extended with a notion of private and public lists. It would be great to be able to group various articles and share them with a wider audience who can subscribe to them. This would open up so many features... Jdlrobson (talk) 02:44, 21 June 2013 (UTC)
  • Instead of implementing a new simple tagging system, why not just finish https://gerrit.wikimedia.org/r/#/c/16419/ ? Eran (talk) 05:27, 21 June 2013 (UTC)
    • That patch is pretty big, and enables just one feature. This schema change would enable many features. Steven Walling (WMF) • talk 06:28, 21 June 2013 (UTC)
      • I suspect you're using the verb "to enable" with different meanings here: one is "make active and available" (the paginated watchlist/multiple watchlist features, IIRC); the other "make it possible for future changes". --Nemo 07:10, 21 June 2013 (UTC)
        • The change is actually very similar to the one proposed here, but instead of "tag" it is called "group" and there is also a different entity for it (watchlist_group table). This could enable both private/public groups for watchlists. How will the tag enable "public" watchlists? Eran (talk) 15:12, 21 June 2013 (UTC)
          • Eran, thanks for flagging the patch. We discussed this within our team and decided to take some time to go over the patch and assess its suitability. We'll report back. --Ori.livneh (talk) 18:28, 21 June 2013 (UTC)
  • I like the concept, but the specific proposal seems a bit unfocused. First, "tag" seems like a poor name; "list" or "group" seems more accurate IMO. And then there's the comment that watchlists would use the "watchlist" tag, and later talk about watchlists using an unspecified (and possibly user-defined!) list of tags. I'm also not sure about the comment about todo lists using "structured data", as there isn't much structure possible here (you get date and page, but no comment on what it was you were intending to do). Anomie (talk) 13:43, 21 June 2013 (UTC)
    • If the comment about structured todo lists is confusing the matter, feel free to ignore it. The point of the schema change isn't to serve the needs of that one particular feature, but to provide a structure that could serve multiple use cases for storing user-specific lists of pages. Steven Walling (WMF) • talk 18:33, 21 June 2013 (UTC)
  • There are different wishlist items that have different needs.
    • At Wiktionary, the need for a language-specific watchlist would merit something specific. Each user is interested in watching language-specific content on the pages on her watchlist and in filtering out content in any of the other languages. Each language on a page sits under a specific L2 header drawn from a well-defined list of names, which correspond to language codes specific to Wiktionary (close to ISO). Each page should be, and usually is, in one of a set of categories that indicate that the page contains content in that language. That information about language interest is user-specific, not specific to each entry. The proposed tag field seems rather unsuited to this specific task, which is central to all the Wiktionaries. I would be interested in whether I am missing something about how this field could be used for this specific purpose.
    • Other needs at Wiktionary seem closer to what WP users need.
      • I wonder whether an indexed date field intended to be something like "watch expiration date" would be worthwhile. It could be a means of radically reducing the size of watchlists, which often accrete items not of great interest to the user, especially after the first month/week/day after an edit. Such a date could be set by default to something the user determines, with users having the option to manually select any date or "no-expiration" (not available as default).
      • I had thought that privately shared tags were desirable, but I am not so sure that it doesn't create the potential for cabals engaged in projects that have insufficient visibility to the community as a whole. Publicly shared tags seem better. The tag systems that I have seen usually show how an item is already tagged by others. DCDuring (talk) 15:59, 21 June 2013 (UTC)
  • How would the api.php API for this look, and how would existing API actions and queries for watchlist be affected? Matma Rex (talk) 21:42, 22 June 2013 (UTC)
  • The PHP watchlist code is currently quite awful and non-extensible (it's pretty much impossible to inherit from it to do something slightly different). How are you going to handle this? (Related: bug 48641.) Matma Rex (talk) 21:42, 22 June 2013 (UTC)
    • Throw it in the bin and start again, I hope. Any code which still uses export() is, IMO, unsalvageable... :D Happymelon 10:23, 24 June 2013 (UTC)