Requests for comment/Support for user-specific page lists in core

From mediawiki.org
Request for comment (RFC)
Support for user-specific page lists in core
Component General
Creation date
Author(s) Ori Livneh, Steven Walling
Document status stalled
See Phabricator.

This request for comment proposes to take MediaWiki's implementation of watchlists and abstract it into a more generic facility for working with user-specific lists of pages. This could be done with only a modest refactoring of existing code, and without changing the watchlist feature itself. Finally, it illustrates how this generic facility could drive a range of powerful features, including some that have long been requested by users.

Schema changes

  • A new lists table will be introduced that stores metadata about each list.
    • list_owner
    • list_created (timestamp)
    • list_id
      • Nice-to-have fields:
        • list_private (boolean) - I would like us to build privacy into this metadata table, e.g. with an isPrivate boolean field, to allow a better infrastructure for public/private watchlists (see T9467)
    • Owner information will be moved into this table from the existing watchlist table in a migration script. Existing watchlist entries would get new lists created for each user.
  • We propose the following changes to the existing watchlist table:
    • The addition of a column wl_list_id which references an entry in the new lists table
    • Removal of wl_user
  • Tim Starling and Jon Robson to make a patch for proposed changes.
  • Impacts on Special:Watchlist
    • Special:Watchlist would default to the oldest list the user has available (for all users this will be the existing watchlist)
    • As a result of this change, there will initially be no interface for viewing or creating other lists, but that work will be enabled by all of the above.
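
The migration described above could be sketched roughly as follows. This is a minimal illustration rather than the actual patch: it uses an in-memory SQLite database through PDO instead of MediaWiki's database layer, the column types are simplified, the sample rows are invented, and defaulting list_private to true is an assumption. Dropping wl_user is left out of the sketch.

```php
<?php
// Sketch only: simplified columns, SQLite via PDO rather than MediaWiki's database layer.
$db = new PDO( 'sqlite::memory:' );
$db->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );

// A stand-in for the existing watchlist table, with three invented sample rows.
$db->exec( 'CREATE TABLE watchlist ( wl_user INTEGER, wl_namespace INTEGER, wl_title TEXT )' );
$db->exec( "INSERT INTO watchlist VALUES ( 1, 0, 'Main_page' ), ( 1, 0, 'Barack_Obama' ), ( 2, 0, 'Main_page' )" );

// Step 1: the new metadata table. list_private defaulting to 1 is an assumption.
$db->exec( 'CREATE TABLE lists (
	list_id INTEGER PRIMARY KEY AUTOINCREMENT,
	list_owner INTEGER NOT NULL,
	list_created TEXT NOT NULL,
	list_private INTEGER NOT NULL DEFAULT 1
)' );

// Step 2: create one list for every user who already has watchlist entries.
$stmt = $db->prepare(
	'INSERT INTO lists ( list_owner, list_created ) SELECT DISTINCT wl_user, :now FROM watchlist'
);
$stmt->execute( [ ':now' => gmdate( 'YmdHis' ) ] );

// Step 3: point each watchlist row at its owner's list.
$db->exec( 'ALTER TABLE watchlist ADD COLUMN wl_list_id INTEGER' );
$db->exec(
	'UPDATE watchlist SET wl_list_id =
		( SELECT list_id FROM lists WHERE list_owner = watchlist.wl_user )'
);

// Step 4 (not shown): drop wl_user once nothing reads it any more.
$listCount = (int)$db->query( 'SELECT COUNT(*) FROM lists' )->fetchColumn();
$unlinked = (int)$db->query( 'SELECT COUNT(*) FROM watchlist WHERE wl_list_id IS NULL' )->fetchColumn();
echo "lists: $listCount, unlinked rows: $unlinked\n";
```

Because every existing row ends up pointing at its owner's single list, Special:Watchlist can keep showing the same pages by reading the user's oldest list.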

Future possible schema changes:

  • Add a timestamp field to all watchlist entries which stores the timestamp when the article was added to the list. This enables:
    • Users to expire older items on their watchlist after a certain threshold
    • Users to locate newer articles they have added to their watchlist
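
As a rough illustration of what a per-entry timestamp would allow, the sketch below splits a list into kept and expired entries and sorts the kept ones newest first. The function name, the array keys ('title', 'added') and the 30-day threshold are all hypothetical choices made for this example.

```php
<?php
/**
 * Split list entries into kept and expired, given a maximum age.
 * Entries are arrays with 'title' and 'added' (Unix timestamp); both keys are
 * hypothetical names for this sketch.
 */
function partitionByAge( array $entries, int $now, int $maxAgeSeconds ): array {
	$kept = [];
	$expired = [];
	foreach ( $entries as $entry ) {
		if ( $now - $entry['added'] > $maxAgeSeconds ) {
			$expired[] = $entry;
		} else {
			$kept[] = $entry;
		}
	}
	// Newest first, so recently added pages are easy to locate.
	usort( $kept, function ( array $a, array $b ): int {
		return $b['added'] <=> $a['added'];
	} );
	return [ 'kept' => $kept, 'expired' => $expired ];
}

$now = 1700000000;
$day = 86400;
$result = partitionByAge( [
	[ 'title' => 'Main page', 'added' => $now - 40 * $day ],
	[ 'title' => 'Republic Records', 'added' => $now - 10 * $day ],
	[ 'title' => 'Barack Obama', 'added' => $now - 2 * $day ],
], $now, 30 * $day );
```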

Earlier proposals:

  1. The addition of two columns to the watchlist table: a varbinary(14) `wl_timestamp` column, representing the time the entry was created, and wl_tag, a varbinary(100) column representing the type of the entry. Both columns would likely require an index.
  2. The tag "watchlist" would be applied to all existing rows in the table. Existing queries for watchlist items in core would need to be amended to add an additional constraint, WHERE `wl_tag` = 'watchlist'.
  3. The `wl_user` key would be altered to enforce uniqueness on (`wl_user`, `wl_tag`, `wl_namespace`, `wl_title`)
Doesn't the proposed schema (with an additional wl_tag column) limit each watchlist entry to a single tag? A separate, normalized table may make more sense here. --MZ
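
To make the earlier proposal concrete, here is a small sketch of the tagged table with the composite uniqueness key. It runs against in-memory SQLite through PDO with simplified column types (TEXT rather than varbinary), and the index name and the 'read-later' tag are invented for the example. Under this key, the same page can apparently appear once per tag for a given user, as separate rows that repeat the user, namespace and title, which is the redundancy a normalized table would avoid.

```php
<?php
// Sketch only: SQLite via PDO, simplified column types, hypothetical index name.
$db = new PDO( 'sqlite::memory:' );
$db->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );
$db->exec( 'CREATE TABLE watchlist (
	wl_user INTEGER NOT NULL,
	wl_namespace INTEGER NOT NULL,
	wl_title TEXT NOT NULL,
	wl_timestamp TEXT NOT NULL,
	wl_tag TEXT NOT NULL
)' );
// Uniqueness on (wl_user, wl_tag, wl_namespace, wl_title), as in item 3.
$db->exec( 'CREATE UNIQUE INDEX wl_user_tag_page
	ON watchlist ( wl_user, wl_tag, wl_namespace, wl_title )' );

$insert = $db->prepare(
	'INSERT INTO watchlist ( wl_user, wl_namespace, wl_title, wl_timestamp, wl_tag )
	VALUES ( ?, ?, ?, ?, ? )'
);
// The same page under two different tags: two rows, both accepted.
$insert->execute( [ 1, 0, 'Main_page', '20130621000000', 'watchlist' ] );
$insert->execute( [ 1, 0, 'Main_page', '20130621000000', 'read-later' ] );

// The same page under the same tag again: rejected by the unique index.
$duplicateRejected = false;
try {
	$insert->execute( [ 1, 0, 'Main_page', '20130622000000', 'watchlist' ] );
} catch ( PDOException $e ) {
	$duplicateRejected = true;
}

// Existing watchlist queries gain the extra tag constraint from item 2.
$select = $db->prepare( 'SELECT wl_title FROM watchlist WHERE wl_user = ? AND wl_tag = ?' );
$select->execute( [ 1, 'watchlist' ] );
$watched = $select->fetchAll( PDO::FETCH_COLUMN );
```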

API

User-specific page lists would be managed by a new UserPageList class:

<?php
/**
 * Represents a user-specific list of pages. Each list is identified by a tag, 
 * which is a short, free-form string. Lists are unique per (user, tag) 
 * combination.
 */
class UserPageList implements Iterator {

	function __construct( User $user, string $tag );

	/**
	 * Return the string identifier of this list. Tags are unique per user.
	 * @return string
	 */
	function getTag();

	/**
	 * Check whether the supplied title is included in the list.
	 * @return bool
	 */
	function hasPage( Title $page );

	/**
	 * Add a page to the list.
	 */
	function addPage( Title $page );

	/**
	 * Remove a page from the list.
	 */
	function removePage( Title $page );

	/**
	 * Get the number of pages in the list.
	 * @return int
	 */
	function getPageCount();

	/**
	 * Return the user to whom this list belongs.
	 * @return User
	 */
	function getUser();
}
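
The outline above declares methods without bodies. For experimenting with the interface, a minimal in-memory stand-in might look like the sketch below. It is not the proposed implementation: the class name InMemoryPageList is invented, users are plain integer IDs and pages are plain title strings instead of User and Title objects, nothing is persisted, and IteratorAggregate is used in place of Iterator to keep the example short.

```php
<?php
/**
 * In-memory stand-in for the proposed UserPageList. Sketch only: users are
 * plain integer IDs, pages are plain title strings, nothing is persisted.
 */
class InMemoryPageList implements IteratorAggregate {
	private $userId;
	private $tag;
	/** @var array Page titles stored as array keys, so membership checks are cheap. */
	private $pages = [];

	public function __construct( int $userId, string $tag ) {
		$this->userId = $userId;
		$this->tag = $tag;
	}

	public function getTag(): string {
		return $this->tag;
	}

	public function getUser(): int {
		return $this->userId;
	}

	public function hasPage( string $title ): bool {
		return isset( $this->pages[$title] );
	}

	public function addPage( string $title ): void {
		// Adding a page that is already present is a no-op.
		$this->pages[$title] = true;
	}

	public function removePage( string $title ): void {
		unset( $this->pages[$title] );
	}

	public function getPageCount(): int {
		return count( $this->pages );
	}

	public function getIterator(): ArrayIterator {
		// PHP turns numeric-looking keys into integers, so cast back to strings.
		$titles = array_map( 'strval', array_keys( $this->pages ) );
		sort( $titles, SORT_STRING );
		return new ArrayIterator( $titles );
	}
}

$list = new InMemoryPageList( 42, 'watchlist' );
$list->addPage( 'Main page' );
$list->addPage( 'Barack Obama' );
$list->addPage( 'Main page' );
```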

Code sample

Interactive shell

php > $list = $user->getPageList( 'watchlist' );
php > // Equivalent to:
php > $list = new UserPageList( $user, 'watchlist' );
php > var_dump( $list->getTag() );
string(9) "watchlist"
php > $title = Title::newFromText( 'Main page' );
php > var_dump( $list->hasPage( $title ) );
bool(true)
php > $list->removePage( $title );
php > $list->addPage( $title );
php > foreach ( $list as $title ) {
php {     echo $title->getText() . "\n";
php { }
Barack Obama
Main page
Republic Records
php > echo 'You have ' . $wgUser->getPageList( 'watchlist' )->getPageCount() . ' items in your watchlist!';
You have 3 items in your watchlist!

Use cases

  • Tags could augment the Special:Watchlist interface, for which there is a lengthy wishlist of features. In particular, this schema change would be a step toward the following ideas:
    • Tags could be used to create watchlist items that expire within a given time period (bug 6964) or produce reminders (bug 582).
      • One temporary watchlist could contain pages saved for offline reading.
    • Grouping of watchlist items (bug 5875, bug 20444)
    • Watching bundles of pages (bug 2308)
    • Creating multiple or sub-watchlists, some of which might be public (bug 7467)
  • Tags could be used to create a "read later" or favorites list.
  • Users could create custom tags for things they care about. (Ideally, users would be able to bulk edit, add, or remove tags, just like they can with watchlist items. bug 33888)

Aspects and limitations

  • All the use cases above seem to assume that it's possible to attach any number of tags to a page.
  • Like revision tags, wl_tags are metadata that could be used in a variety of ways. Tags could be added, edited, or removed, either manually by users or programmatically by extensions, gadgets, and so on.
    • Revision tags have a number of issues with regard to normalization: spaces v. dashes v. underscores, non-ASCII characters, length limitations, case sensitivity, etc. We should be cautious of following this approach. --MZ
      • Good point. We should try and solve some of these issues before they come and bite us in the butt. --SW
        • Let's avoid creating a schema that can't be effectively sorted, though. I'm thinking here about the externallinks table, where there's no good way to page through a long query since the partial column indexes don't support non-filesorting sorts. Anomie (talk) 13:48, 21 June 2013 (UTC)
  • If lists could exist on their own for everyone, outside a user watchlist
    • Widely-used community tools like bots or gadgets could create tagged lists (or add to lists) for users who wanted them. For example, a bot could deliver a list of new pages associated with a WikiProject to interested users. Bots that suggest tasks to do, like SuggestBot and DPL Bot, could store and deliver suggestions in better ways. Gadgets like Twinkle could more easily help users keep track of pages on which they have taken certain actions, such as nominating for deletion.
    • Tags could be used to create a to-do list for editors outside wiki pages.
  • If tags could be attached for you to things outside your watchlist
    • Extensions -- such as GettingStarted or PageTriage -- could keep track of tasks you've started or completed on a page.
    • Tags, such as for pages you've created or files you've uploaded, could be shared publicly in a list on your userpage or in other places.
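
The normalization concerns raised above (spaces v. dashes v. underscores, case sensitivity, length limits) suggest that tags would need a single canonical form before being stored. The sketch below shows one possible approach. The function name and the specific rules (lowercasing, dashes as the separator, a 100-byte cap taken from the proposed varbinary(100) column) are assumptions for illustration, not an agreed policy, and non-ASCII case folding is deliberately left unhandled.

```php
<?php
/**
 * Normalize a list tag to one canonical spelling, or return null if it is
 * unusable. The rules here (lowercase, dashes as separators, 100-byte cap)
 * are illustrative assumptions, not an agreed policy.
 */
function normalizeListTag( string $raw ): ?string {
	// Treat runs of whitespace, underscores and dashes as one separator.
	$tag = preg_replace( '/[\s_\-]+/u', '-', trim( $raw ) );
	if ( $tag === null ) {
		// preg_replace() returns null on error, e.g. invalid UTF-8 input.
		return null;
	}
	$tag = trim( $tag, '-' );
	// ASCII case folding only; non-ASCII letters are not handled by this sketch.
	$tag = strtolower( $tag );
	// wl_tag was proposed as varbinary(100), so the limit is counted in bytes.
	if ( $tag === '' || strlen( $tag ) > 100 ) {
		return null;
	}
	return $tag;
}

$a = normalizeListTag( 'Read Later' );
$b = normalizeListTag( 'read_later' );
$c = normalizeListTag( '   ' );
$d = normalizeListTag( str_repeat( 'a', 101 ) );
```

With rules like these, 'Read Later' and 'read_later' collapse to the same tag, while empty or over-long input is rejected instead of being silently truncated.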

Comments

  • I like the proposal. It's generally clear and to the point. Please specify that wl_tags is intended to be a human-readable string, and that the only reason it's binary is because that's how core handles strings. I'd like to go further and suggest a naming convention for extension-started or extension-managed tags (e.g. "Extension:GettingStarted/tasks"), rather than every extension coming up with its own naming convention (like was done with i18n message keys). Superm401 - Talk 00:09, 21 June 2013 (UTC)
  • I think the timestamp is essential. The tag would seemingly enable users to have individual todo lists. That seems useful. What would it take to enable a user to share a distinctly tagged portion of such a watchlist with others? If we do not do that we may pass up a chance to enhance collaboration. DCDuring (talk) 00:43, 21 June 2013 (UTC)
  • In its current state, the watchlist serves only one use case really well: I care about a page enough to monitor all changes to it for the indefinite future. It's not very intelligent, and doesn't help me distinguish between different types of pages I care about. There are numerous reasonable outstanding requests for feature enhancements around the watchlist, but the current schema for the watchlist table is inflexible enough to block many of these requests, because it was designed solely to denote the binary watch/unwatch state and a single gargantuan list to present users with. With fairly minimal changes, I think it is possible to make the underlying structure meet the needs of a variety of features MediaWiki users need for organizing user-specific lists of pages. With the proposal in its current state, we tried to outline a way to gain more power and flexibility without needing to change the way watchlists behave as far as editors are concerned. Steven Walling (WMF) • talk 00:40, 21 June 2013 (UTC)
  • Could the tag be used for what users perceive as multiple tags? Could one tag indicate the sections (header or some CSS) that a user was interested in? Could such a tag be read by the watchlist software to offer the user only items changed in the designated sections? This matters for Wiktionary a great deal because we can have many language sections on a page with a given user normally only interested in one or a few of them. It has come up repeatedly at the top of desiderata for the watchlist. DCDuring (talk) 21 June 2013 (UTC)
  • Andre K. is a pretty big opponent of free-form tagging. He makes a reasonable case against its use (it came up in the context of Bugzilla's keywords field, which requires an administrator to add or remove a defined keyword). You should ask him to weigh in here. --MZMcBride (talk) 02:32, 21 June 2013 (UTC)
    • Andre's input would be great, but for the sake of accuracy, I think I should point out there is nothing in these schema changes that requires us to support free-form tagging by users. We could in fact set tags only using a predetermined set allowed by core and/or extensions. Unlike categories, which are set via free-form wikitext syntax, you could potentially disallow willy-nilly creation of new tags. Personally I think the success of projects like Delicious and Flickr strongly suggests tagging by users works though. Steven Walling (WMF) • talk 05:05, 21 June 2013 (UTC)
  • This sounds great. I'd also love it if this could be extended with a notion of private and public lists. It would be great to be able to group various articles and share them with a wider audience who can subscribe to them. This would open up so many features... Jdlrobson (talk) 02:44, 21 June 2013 (UTC)
  • Instead of implementing a new simple tagging system, why not just finish https://gerrit.wikimedia.org/r/#/c/16419/ ? Eran (talk) 05:27, 21 June 2013 (UTC)
    • That patch is pretty big, and enables just one feature. This schema change would enable many features. Steven Walling (WMF) • talk 06:28, 21 June 2013 (UTC)
      • I suspect you're using the verb "to enable" with different meanings here: one is "make active and available" (the paginated watchlist/multiple watchlist features, IIRC); the other "make it possible for future changes". --Nemo 07:10, 21 June 2013 (UTC)
        • The change is actually very similar to the proposed, but instead of "tag" it is called "group" and there is also a different entity for it (watchlist_group table). This could enable both private/public groups for watchlists. How will the tag enable "public" watchlists? Eran (talk) 15:12, 21 June 2013 (UTC)
          • Eran, thanks for flagging the patch. We discussed this within our team and decided to take some time to go over the patch and assess its suitability. We'll report back. --Ori.livneh (talk) 18:28, 21 June 2013 (UTC)
  • I like the concept, but the specific proposal seems a bit unfocused. First, "tag" seems like a poor name; "list" or "group" seems more accurate IMO. And then there's the comment that watchlists would use the "watchlist" tag, and later talk about watchlists using an unspecified (and possibly user-defined!) list of tags. I'm also not sure about the comment about todo lists using "structured data", as there isn't much structure possible here (you get date and page, but no comment on what it was you were intending to do). Anomie (talk) 13:43, 21 June 2013 (UTC)
    • If the comment about structured todo lists is confusing the matter feel free to ignore it. The point of the schema change isn't to serve the needs of that one particular feature, but to provide a structure that could serve multiple use cases for storing user-specific lists of pages. Steven Walling (WMF) • talk 18:33, 21 June 2013 (UTC)
  • There are different wishlist items that have different needs.
    • At Wiktionary the language-specific watchlist need would merit something specific: each user being interested in watching language-specific content from pages on her watchlist to filter out content in any of the other languages, each language on a page being under a specific L2 header from a well defined list of names which correspond to language codes specific to Wiktionary (close to ISO). Each page should be and usually is in one of a set of categories that indicate that the page contains content in that language. That information about language interest is user-specific, not specific to each entry. The proposed tag field seems rather unsuited to this specific task, which is central to all the Wiktionaries. I would be interested in whether I am missing something about how this field could be used for this specific purpose.
    • Other needs at Wiktionary seem closer to what WP users need.
      • I wonder whether an indexed date field intended to be something like "watch expiration date" would be worthwhile. It could be a means of radically reducing the size of watchlists which often accrete items not of great interest to the user, especially after the first month/week/day after an edit. Such a date could be set by default to something the user determines, with users having the option to manually select any date or "no-expiration" (not available as default).
      • I had thought that privately shared tags were desirable, but I am not so sure that it doesn't create the potential for cabals engaged in projects that have insufficient visibility to the community as a whole. Publicly shared tags seem better. The tag systems that I have seen usually show how an item is already tagged by others. DCDuring (talk) 15:59, 21 June 2013 (UTC)
  • The PHP watchlist code is currently quite awful and non-extensible (it's pretty much impossible to inherit from it to do something slightly different). How are you going to handle this? (Related: bug 48641.) Matma Rex (talk) 21:42, 22 June 2013 (UTC)
  • So, just trying to parse this -- where in the UI on the watchlists page would one be selecting which of their lists to display? On a second note, I wonder if it doesn't make more sense (for simplicity's sake and so that newer users don't make snafus with their private lists) to implement the multiple lists as a core feature and then, once this is stable, add the ability to make some of those lists public as an option in preferences which one has to explicitly opt into. Seems to me new users (or even experienced editors tinkering with these new features) could otherwise end up publicly publishing their watchlists. I don't know that all that many editors would be too concerned about this, but as multiple lists have obvious utility separate from the subset of public "to do" lists, it makes sense to me to implement (and allow the user to enable) these features separately. Of course, this isn't my strongest area, so maybe I'm missing something obvious as to why this would be undesirable. Snow Rise (talk) 01:44, 26 January 2015 (UTC)
    • Hi Snow Rise, as a first pass I would imagine the easiest thing to do would be a dropdown list / searchable list of your own lists. It would default to the first list you created (if we store a modified date along with the list we can work this out). We could then add UI elements to make a certain list your default view. I have a bunch of ideas about how this would look but right now I cannot prototype these without the necessary core change. The mobile web team is also exploring a UI for adding items to the list of their choice and we are interested in moving more towards public sharable lists. I would imagine for the existing watchlist concept when setting up a list there would be a clear way to help the user understand they are building a public or private list and change their mind later. Jdlrobson (talk) 21:28, 26 January 2015 (UTC)
      • Thanks for the response, Jdlrobson. On another note, it would be nice if there was a simple but deep interface for importing entries from one list to another, so that existing mega-lists of veteran editors can be broken down into more manageable chunks. I'm also trying to imagine what the UI would look like when you add a new entry and need to select which lists it will be added to. Checklist? A field with sequential entries that you can highlight? And that brings to mind that there also ought to be a binary setting as to whether, when you click on the watchlist button, you want (as the default) to add files to your main watchlist or to choose multiple entries. I say this in that many editors utilize automatic-watchlisting of articles they edit, and I imagine there would be general vexation if every edit required this extra step. But I'm sure that's all occurred to you already and I'm sure you guys will figure it all out. Best of luck to you -- I can see a significant boon to workflow, reference and coordination coming out of all of this and I'll be looking forward to seeing what you guys have developed down the line! Snow Rise (talk) 08:32, 29 January 2015 (UTC)

See also