LiquidThreads 3.0/Back-end

From mediawiki.org
This is part of the 2011 LiquidThreads redesign project

This is a list of issues with the existing LiquidThreads architecture. It is proposed that substantial refactoring and code review work is done on LiquidThreads to alleviate these problems.

Schema Issues

The proposed schema

Talk Page → Topic association

Currently, to associate a topic with a specific talk page, each row in the thread table stores the namespace and title of the page that it is associated with. This is inflexible, because it does not allow comments to be contained anywhere other than on a specific discussion page. It is also fragile: a complicated system has been developed to make sure that the association between threads and discussion pages survives page moves and deletions. That system works by keeping at least one of the page ID and page title in sync with the appropriate wiki page.

The new proposed architecture would obviate these problems by creating an intermediary object known as a "Channel". A Channel could (but would not necessarily) be associated with a specific wiki page, which would hold that Channel. Page moves and deletions would be simple, as they would require only a single database update. This is also a more flexible approach, and would allow Channels to be used in other contexts, such as for discussion of objects other than wiki pages.
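As a rough illustration of why a Channel makes page moves cheap, the following sketch uses an in-memory SQLite database. All table and column names are assumptions made for this example and are not taken from the proposed schema. Storing the page namespace and title on the Channel row is likewise an assumption; the point is only that topics refer to the Channel, so re-pointing the Channel touches one row.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
SCHEMA = """
CREATE TABLE channel (
    channel_id     INTEGER PRIMARY KEY,
    page_namespace INTEGER,   -- NULL when the Channel is not held by a wiki page
    page_title     TEXT
);
CREATE TABLE topic (
    topic_id   INTEGER PRIMARY KEY,
    channel_id INTEGER NOT NULL REFERENCES channel(channel_id),
    subject    TEXT NOT NULL
);
"""


def make_db():
    """Create an empty in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def move_channel(db, channel_id, new_namespace, new_title):
    """Re-point a Channel at a new page; no topic rows are touched."""
    cur = db.execute(
        "UPDATE channel SET page_namespace = ?, page_title = ? "
        "WHERE channel_id = ?",
        (new_namespace, new_title, channel_id),
    )
    return cur.rowcount  # number of rows updated


def topics_on_page(db, namespace, title):
    """List topic subjects for whichever Channel the page currently holds."""
    rows = db.execute(
        "SELECT t.subject FROM topic t "
        "JOIN channel c ON c.channel_id = t.channel_id "
        "WHERE c.page_namespace = ? AND c.page_title = ? "
        "ORDER BY t.topic_id",
        (namespace, title),
    )
    return [subject for (subject,) in rows]
```

Under these assumptions, a page move updates exactly one channel row, and every topic follows automatically because it refers to the Channel rather than to the page.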

Use of wiki pages to store comment text

When LiquidThreads was first designed, it was decided that wiki pages would be used as the back-end storage for comment text. This was done because, at the time, it was intended that only comment text would be subject to version control, and wiki pages provided a low-development alternative to developing an independent system of version control. However, all comment metadata has since come to be versioned in a separate version control system. Accordingly, the original impetus for storing comment text in wiki pages no longer applies.

The use of wiki pages as the back-end for comment text is also the root cause of a number of bugs. It is common for a user to delete, move, or otherwise interfere with a Thread namespace page. This and other problems can break the association between the row in the thread table and the wiki page that stores its content, a database corruption issue that has plagued LiquidThreads.

It is proposed that comment text is migrated from wiki pages to dedicated database storage, to alleviate the database corruption and other issues associated with storing comment text in wiki pages. Consultation with Tim Starling has produced the recommendation that comment text is stored in the text table, as a reference to External Storage.

It will also be necessary to re-implement the following behaviour that came "for free" with wiki-page-based storage:

Caching
A multi-level caching strategy is proposed, including memcached caches of the HTML representations of each comment and topic, with much of the display variation being implemented through JavaScript. The new cache invalidation behaviour upon template changes will need to be considered.
Deletion and revision hiding
A deleted flag will be attached to each comment revision — much the same as the way hiding is implemented on the revision table.
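A minimal sketch of how a per-revision deleted flag might be consulted when rendering a comment is given below. The flag names, their bit values and the visible_body helper are all hypothetical; they are only loosely modelled on the idea of a bitfield attached to each revision, and do not reproduce any existing constants.

```python
from enum import IntFlag


class Hidden(IntFlag):
    """Hypothetical bitfield stored with each comment revision."""
    TEXT = 1        # the comment body is hidden
    AUTHOR = 2      # the author name is hidden
    RESTRICTED = 4  # hidden even from ordinary administrators


def visible_body(body, deleted, viewer_can_see_hidden):
    """Return the body, or None if it is hidden from this viewer."""
    if deleted & Hidden.TEXT and not viewer_can_see_hidden:
        return None
    return body
```

Keeping the flag on the revision rather than on the comment means a single bad revision can be hidden without removing the rest of the comment's history.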

Topics versus Comments

Currently, LiquidThreads uses a hierarchical model for threaded discussion. In other words, a top-level topic has "subthreads" or replies, all represented by "Thread" objects. In the original vision, a "Thread" contained "Comments", but the developer originally assigned to LiquidThreads used a single "Thread" object to represent all types of comments.

This hierarchical data structure has a number of shortcomings, specifically that it does not support a linear comment structure, promoting unnecessary levels of indentation. It also requires that the association with a specific Channel (and other topic-specific information such as subject) is duplicated across all rows, which presents problems when that information needs to be changed or queried.

It is proposed that the concepts of a "Topic" and a "Comment" are separated into different entities (considering that, at least internally, the term "Thread" is now confusing). A comment will usually be a part of a topic, and will optionally have a parent comment. Accessing a talk page will require a simple left join on Topic and Comment.
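The separation can be sketched as two tables and the left join mentioned above. The table names, column names and the load_channel helper are assumptions for illustration only. A left join is used so that a topic with no comments yet still appears in the result.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the proposed ones.
SCHEMA = """
CREATE TABLE topic (
    topic_id   INTEGER PRIMARY KEY,
    channel_id INTEGER NOT NULL,
    subject    TEXT NOT NULL          -- stored once, not on every comment
);
CREATE TABLE comment (
    comment_id INTEGER PRIMARY KEY,
    topic_id   INTEGER REFERENCES topic(topic_id),
    parent_id  INTEGER REFERENCES comment(comment_id),  -- NULL if no parent
    body       TEXT NOT NULL
);
"""


def make_db():
    """Create an empty in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def load_channel(db, channel_id):
    """Fetch every topic in a channel, with its comments, in one query."""
    rows = db.execute(
        "SELECT t.topic_id, t.subject, c.comment_id, c.parent_id, c.body "
        "FROM topic t LEFT JOIN comment c ON c.topic_id = t.topic_id "
        "WHERE t.channel_id = ? "
        "ORDER BY t.topic_id, c.comment_id",
        (channel_id,),
    )
    topics = {}
    for topic_id, subject, comment_id, parent_id, body in rows:
        topic = topics.setdefault(topic_id, {"subject": subject, "comments": []})
        if comment_id is not None:  # NULL marks a topic without comments
            topic["comments"].append((comment_id, parent_id, body))
    return topics
```

Because parent_id is optional, the same structure can represent both a flat, linear discussion (all parents NULL) and a nested one.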

History tracking and revision control

Currently, revision control is done in a haphazard way, somewhat reminiscent of the pre-1.5 cur/old split. Comment text is versioned using the standard wiki-page versioning system (using the page, revision and text tables, in conjunction with external storage). Thread metadata is versioned by saving serialised versions of a thread object along with a timestamp, every time an edit is made. This was easy to implement, but has substantial shortcomings.

Having separate systems for tracking the history of metadata and comment text is fragile and confusing — the two can easily become desynchronised, and users have to look in more than one place. Neither system is ideal for its purpose — the metadata storage system duplicates data and needs to be normalised, and the wikitext storage system is designed for large pages by multiple authors, instead of for pages associated with a single author.

It makes sense, therefore, to abstract the attributes associated with a comment into a single, versioned data store — all metadata fields and the comment text itself should all be transferred to a single post revision table, which would be used for ordinary access to thread data, including access to the current version.
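One way to picture a single versioned store is an append-only revision table in which the newest row for a comment is its current version. The sketch below assumes hypothetical table and column names, and assumes that the highest revision ID is the current one; the real design may track the current revision differently.

```python
import sqlite3

# Hypothetical schema: metadata and text live in the same revision row.
SCHEMA = """
CREATE TABLE comment_revision (
    rev_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    comment_id INTEGER NOT NULL,
    author     TEXT NOT NULL,
    topic_id   INTEGER NOT NULL,
    body       TEXT NOT NULL
);
"""


def make_db():
    """Create an empty in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def save_revision(db, comment_id, author, topic_id, body):
    """Append a new revision; earlier rows are never modified."""
    cur = db.execute(
        "INSERT INTO comment_revision (comment_id, author, topic_id, body) "
        "VALUES (?, ?, ?, ?)",
        (comment_id, author, topic_id, body),
    )
    return cur.lastrowid


def current_revision(db, comment_id):
    """Return the newest revision row, or None if the comment is unknown."""
    return db.execute(
        "SELECT rev_id, author, topic_id, body FROM comment_revision "
        "WHERE comment_id = ? ORDER BY rev_id DESC LIMIT 1",
        (comment_id,),
    ).fetchone()


def history(db, comment_id):
    """Return all revisions of a comment, oldest first."""
    return db.execute(
        "SELECT rev_id, author, topic_id, body FROM comment_revision "
        "WHERE comment_id = ? ORDER BY rev_id",
        (comment_id,),
    ).fetchall()
```

Since text and metadata are written in the same row, they cannot drift out of step with each other, and history is read from one place.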

General code issues

Separation of model, view, controller and interface

At present, while there is some level of separation between the Model (data objects), the View (interaction with the user), the Controller (business logic) and the Interface (interaction with MediaWiki Core and other extensions), there is a substantial amount of code that is in the wrong part of the codebase. The codebase should be split into these sections, and "utility functions" should be, if possible, moved into one of these sections.

The priority of this task is not necessarily as high as that of many of the other tasks; however, there are substantial advantages to this separation in terms of code reusability and understandability.

Documentation

LiquidThreads documentation is incomplete and out of date.

The following tasks are necessary to bring the documentation up to date, once all refactoring work is complete:

  • Every class and function needs updated documentation comments.
  • High-level documentation needs to be written about the way that the classes themselves fit together.

User documentation needs to be written once the redesign has been completed.

Class representation of historical thread revisions

At present, when historical versions of threads need to be displayed, a special Thread object is created that denies any write operations to the database. This is a terrible and fragile hack, and results in fatal errors every now and then.

In conjunction with the redesign of historical revision storage, a new CommentRevision class should be created. Since this class would be used to access all data about a comment, displaying old versions of comments would be natively supported by the new system.
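A read-only revision object can be enforced by the type itself rather than by blocking writes at run time. The sketch below is one possible shape, using a frozen dataclass; the field names and the edited helper are assumptions, and only the class name CommentRevision comes from the proposal.

```python
from dataclasses import dataclass, FrozenInstanceError, replace


@dataclass(frozen=True)
class CommentRevision:
    """One immutable revision of a comment (field names are hypothetical)."""
    rev_id: int
    comment_id: int
    author: str
    body: str


def edited(revision, new_rev_id, new_body):
    """Produce a new revision; the old one is left untouched."""
    return replace(revision, rev_id=new_rev_id, body=new_body)
```

Attempting to assign to a field of a CommentRevision raises FrozenInstanceError, so an old version and the current version are handled by exactly the same code path, with no special "historical" object needed.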

Reply Counting

The reply count for a comment is currently calculated by adding the number of its direct replies to the reply counts of each of those replies. Partly because of the Thread object's odd save routines, this is usually wrong. The system needs to be checked for consistency and the bugs that cause it to come unstuck need to be fixed.
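A consistency check could recompute the counts from the parent links and compare them with the stored values. The sketch below assumes the comments form a tree (no cycles) and that parent links are available as a simple mapping; both function names are hypothetical.

```python
def reply_counts(parents):
    """Count all descendants of each comment.

    parents maps comment ID -> parent comment ID (None for top level).
    Assumes the links form a tree; a cycle would recurse forever.
    """
    children = {}
    for comment_id, parent_id in parents.items():
        if parent_id is not None:
            children.setdefault(parent_id, []).append(comment_id)

    counts = {}

    def count(comment_id):
        if comment_id not in counts:
            # Each direct reply counts once, plus all of its own replies.
            counts[comment_id] = sum(
                1 + count(child) for child in children.get(comment_id, ())
            )
        return counts[comment_id]

    for comment_id in parents:
        count(comment_id)
    return counts


def inconsistent_counts(stored, parents):
    """Return the IDs whose stored count differs from the recomputed one."""
    actual = reply_counts(parents)
    return sorted(cid for cid in actual if stored.get(cid) != actual[cid])
```

For very deep threads the recursion could be replaced by an iterative walk; the recursive form is used here only because it mirrors the definition directly.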

Deep Links

The current permalink functions are buggy and fragile, and need conversion to a single, simple form. It is proposed that a new special page is created, so that Special:Comment/<comment ID> will redirect a user to a particular comment, in context.
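The single permalink form could be resolved roughly as follows. Only the Special:Comment/<comment ID> pattern comes from the proposal; the helper names, the lookup mapping and the "#comment-<ID>" anchor format are assumptions made for this sketch.

```python
import re

# Matches the proposed permalink form: Special:Comment/<comment ID>
_PERMALINK = re.compile(r"^Special:Comment/([0-9]+)$")


def parse_comment_permalink(title):
    """Return the comment ID named by a permalink title, or None."""
    match = _PERMALINK.match(title)
    return int(match.group(1)) if match else None


def redirect_target(comment_id, page_of_comment):
    """Build the redirect target for a comment, or None if it is unknown.

    page_of_comment maps comment ID -> title of the page that shows it.
    The anchor format is hypothetical.
    """
    page = page_of_comment.get(comment_id)
    if page is None:
        return None
    return f"{page}#comment-{comment_id}"
```

Because the link carries only the comment ID, it stays valid when the comment's page is moved; the page is looked up at the time the link is followed.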

Editing forms

Editing forms currently use EditPage, and need designs of their own. EditPage is messy, and, considering that comment text storage will be migrated away from wiki pages, it no longer makes sense to use it. However, the potential for edit conflicts needs to be considered, since the EditPage conflict handling will no longer be available.
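One common way to detect edit conflicts without EditPage is for the form to remember which revision the user started from, and for the save to be refused if a newer revision exists. The sketch below illustrates that idea with an in-memory store; it is not the approach the project has chosen, and the names EditConflict and save_edit are hypothetical.

```python
class EditConflict(Exception):
    """Raised when a comment changed after the editor loaded it."""


def save_edit(store, comment_id, base_rev_id, new_body):
    """Save an edit only if it was based on the current revision.

    store maps comment ID -> list of (rev_id, body), oldest first.
    base_rev_id is the revision the edit form was loaded with.
    """
    revisions = store[comment_id]
    current_rev_id = revisions[-1][0]
    if base_rev_id != current_rev_id:
        raise EditConflict(
            f"comment {comment_id} is at revision {current_rev_id}, "
            f"but the edit was based on revision {base_rev_id}"
        )
    new_rev_id = current_rev_id + 1
    revisions.append((new_rev_id, new_body))
    return new_rev_id
```

In a real database the check and the insert would need to happen atomically, for example inside a transaction; how a detected conflict is presented to the user is a separate design question.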

Error handling

In the current iteration of LiquidThreads, there is insufficient on-the-spot error handling. As a consequence, NULL objects have a nasty habit of appearing in random places. In the new iteration, it will be necessary to make the code more robust, and to error out in the appropriate circumstances, so that these issues can be more easily tracked and resolved.
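The difference between returning an empty value and erroring out on the spot can be sketched as below. The exception and function names are hypothetical; the point is only that the failure is reported where the lookup happens, with enough detail to trace it.

```python
class CommentNotFound(LookupError):
    """Raised when a comment ID does not refer to a stored comment."""


def get_comment(comments, comment_id):
    """Return the comment for an ID, or fail immediately with a clear error.

    comments maps comment ID -> comment data.
    """
    try:
        return comments[comment_id]
    except KeyError:
        # Fail here, rather than handing back None for a caller to trip over.
        raise CommentNotFound(f"no comment with ID {comment_id}") from None
```

A caller that can genuinely cope with a missing comment catches CommentNotFound explicitly; everything else stops at the point of failure instead of passing an empty object further along.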