Topic on Talk:Structured Discussions

Please write a migration script for LQT and classic talk pages...

Gryllida (talkcontribs)

I can see all 3 of them — LQT, classic talk pages, Flow — in use on this wiki. Please write a migration script so that everything becomes readable in Flow, and migrate to it.

It'd of course be a pity to lose past talk page history (or not have it converted) for any Wikimedia project, and this task should be a priority before any kind of deployment (even beta). (Doing otherwise forces past discussions to be archived and not easy to contribute to, effectively saying a big 'we do not care' to all past collaboration, and this is unacceptable.)

WhatamIdoing (talkcontribs)

Gryllida: The last thing I heard was that 'classic talk pages' could only be migrated as entire section=single Flow post. There's no way to figure out who is replying to whom, or even to reliably determine where each comment starts and stops (e.g., if someone forgot to sign, quoted someone else, or used a different number of colons accidentally). I think that this would be good enough, though: a human can make sense of it even if it's all in one post (and ===subsections=== could be set as "replies" to the main section, I guess).
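
A rough sketch of the "entire section = single Flow post" idea, assuming plain wikitext input and no real Flow API: each level-2 section becomes one post and deeper subsections are attached as "replies". The function name `sections_to_posts` and the dictionary layout are hypothetical, chosen only for illustration.

```python
import re

# A heading line such as "== Title ==" or "=== Subtitle ===".
HEADING = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)

def sections_to_posts(wikitext):
    """Turn each level-2 section into one post; attach deeper
    subsections to the preceding post as 'replies'.
    Text before the first heading (section 0) is ignored here."""
    topics = []
    matches = list(HEADING.finditer(wikitext))
    for i, m in enumerate(matches):
        # A section body runs until the next heading or the end of the page.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        entry = {
            "title": m.group(2),
            "body": wikitext[m.end():end].strip(),
            "replies": [],
        }
        if len(m.group(1)) == 2 or not topics:
            topics.append(entry)
        else:
            topics[-1]["replies"].append(entry)
    return topics
```

Nothing inside a section is parsed, so unsigned comments, quotes and stray colons stay exactly as they were in the wikitext.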

Jorm (WMF) (talkcontribs)

Gryllida: Having personally written a Perl script that was designed to parse a talk page into a semblance of a Flow Board, I can tell you that This Way Lies Madness.

The regular expressions required to parse signatures alone are insanely complex. That doesn't even account for trying to understand colon-indented replies, which are rarely, if ever, trustworthy and are hopelessly broken with large threads that use out-dent templates.

It is theoretically possible, but then, all things are "theoretically possible". It would almost be easier to construct a Flow Board by using the history and revision tables and grind one forward. But then, we still have the "who is replying to who" problem.
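
To illustrate why this is fragile, here is a deliberately naive sketch (not the script mentioned above). It assumes default-style signatures, that is, a `[[User:Name|...]]` link followed by a UTC timestamp, and it reads reply depth from leading colons. The pattern and the name `guess_comment` are assumptions made for this example.

```python
import re

# Naive pattern for a default-style signature: a [[User:Name|...]] link
# followed later on the same line by a UTC timestamp.
# Customised signatures will not match this.
SIGNATURE = re.compile(
    r"\[\[User:(?P<user>[^|\]#/]+)[^\]]*\]\].*?"
    r"(?P<time>\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\))"
)

def guess_comment(line):
    """Guess reply depth (leading colons) and signer for one line."""
    stripped = line.lstrip(":")
    depth = len(line) - len(stripped)
    m = SIGNATURE.search(stripped)
    return {
        "depth": depth,
        "user": m.group("user") if m else None,
        "time": m.group("time") if m else None,
    }
```

An unsigned line comes back with `user` set to `None`, a quoted signature is attributed to the wrong person, and out-dent templates or bullet-style replies make `depth` meaningless, which is the "who is replying to whom" problem in miniature.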

Gryllida (talkcontribs)

Just have the server parse the entire message. Don't parse it in Perl. Then pick up the last link which is to a user page. ... This is complicated.

If you find it hard, please, consider opening a call for funding volunteer work for this through a PEG grant or a grant from a chapter.
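
A minimal sketch of the "parse on the server, then take the last user-page link" idea, assuming the rendered HTML of a single comment is already available and that user links look like `/wiki/User:Name` (both are assumptions; real link formats vary by wiki configuration). `UserLinkFinder` and `last_user_link` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class UserLinkFinder(HTMLParser):
    """Collect user names from links of the assumed form /wiki/User:Name."""

    def __init__(self):
        super().__init__()
        self.users = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        prefix = "/wiki/User:"
        if href.startswith(prefix):
            # Drop subpages and turn underscores back into spaces.
            name = unquote(href[len(prefix):]).split("/")[0].replace("_", " ")
            self.users.append(name)

def last_user_link(html):
    """Return the user named by the last user-page link, or None."""
    finder = UserLinkFinder()
    finder.feed(html)
    return finder.users[-1] if finder.users else None
```

This still guesses wrong when a comment ends by mentioning or quoting another user, so it narrows the problem rather than solving it.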

Jorm (WMF) (talkcontribs)

Gryllida: Feel free to take this work up yourself, if you like.

I've got 20 years' experience as a software engineer and I think it's a nigh-impossible problem. I can't imagine paying anyone to do it, let alone anyone volunteering to do so.

Jdforrester (WMF) (talkcontribs)

This work has been in progress in this script since March. I agree, it'd be really helpful to get this done.

DannyH (WMF) (talkcontribs)

LQT is going to be converted to Flow; we just need to talk some more about how and when. Thanks for nudging about it. :)

Converting a wiki talk page into Flow is really just asking for trouble. As WhatamIdoing says, there's no reasonable way to parse wikitext edits as discrete messages. We talked about a few different schemes for making it work, including the one that WhatamIdoing mentioned, and they all basically amount to creating sad Frankenstein monsters that are neither talk page conversations nor Flow discussions.

That being said -- talk page conversations are an important part of the history of a wiki, and old conversations should never be thrown away, or put out of reach. In fact, one of my biggest problems with the sad-Frankenstein conversion ideas was that they would break the links in Contributions, which would make it harder to see what an editor has done and said on the wiki.

My current thinking about old wiki talk page conversations is that it's more respectful to move the existing talk page conversations to an archive page, and include a clearly visible link to the archive on the Flow board. This is common practice for old talk page conversations anyway -- going into an archive doesn't mean that they're forgotten. (We're starting to work on search for Flow boards -- I don't know if it makes sense to search a Flow board and a talk page archive using the same tool, but we do have the responsibility to figure out how to keep talk page archives at least as easily searchable as they currently are.)

However -- on the day that a Flow board takes the place of an active talk page, there will be a moment when a current talk page discussion will need to pause and restart as a Flow conversation. That moment will be irritating for everyone. It's not the end of the world, obviously, and worse things happen at sea, but it will be an annoyance that we have to take seriously as we plan that process.

Gryllida (talkcontribs)

I guess that migrating old discussions has to happen before enabling creation of new threads and topics. See my message above, too - I should perhaps ask around and find who'd like to do this.

Gryllida (talkcontribs)

"The last thing I heard was that 'classic talk pages' could only be migrated as entire section=single Flow post." -- Can someone please add this to the migration plans? I gather it's searchable and it has means to add new posts, which is fine. In theory, we could have some means to trust contributors with manually splitting such a thread into separate messages, but avoiding vandalism here is a complex process.

Gryllida (talkcontribs)

"and still needs work to add history" from the linked gerrit - AUGH. I was planning to throw that away. I would really appreciate you doing this in single-post mode and writing a means of semi-automating the split process. :-p

I.e. have people select a message, select the timestamp, and type in a name and its indent level.

In theory, this has no more vandalism potential than classic talk pages. Granted, Flow allows messages to be moved around anyway, as it should and as LQT does.

Gryllida (talkcontribs)
Important goals for refactoring

This means that

  1. people should be able to edit others' posts
  2. people should be able to change others' posts authorship
  3. people should be able to change others' posts indent level and move them round
  4. we should assume that this is normal and is allowed, since both lqt and classic talk pages did this
  5. also useful for cases where I leave a message as an IP and want to sign it again
  6. possibly restrict to autoconfirmed, but I would personally allow it for all (since currently this is so, and we should firmly assume good faith and the willingness of IP folks to help out with refactoring conversations without breaking them)
  7. possibly put the interface for this into an obscure place. i.e. "enter refactoring mode" for a given thread
Gryllida (talkcontribs)

Jorm, you mentioned that you tried to write a markup-to-flow migration script in the past. How hard would it be to migrate entire sections without parsing them, while preserving history, and allow means for people (end users who would like to revive a discussion by adding their message somewhere in-between old messages) to manually split a Flow message into a few messages (with correct old timestamps) by hand? More details on this are given above.

Where automating a task is hard, it might make sense to semi-automate it.

Gryllida (talkcontribs)

I guess this is also doable by migrating entire talk pages as a single post signed by 'Flow-dummy'.


Important goals for refactoring

Flow has to be able to allow people to refactor discussions, including letting people do things like these:

  • split a thread into two
  • split a message into two individual messages
  • sign a message for someone else ('posted by Flow-dummy, re-signed as Bob by Gryllida' or at least ability to display these in history without showing this info in signature)
  • change indentation level of a message
  • ...
  • whatever you find missing after you deploy this and try to manually parse a wiki talk page on your playground wiki
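
A minimal sketch of what such refactoring operations could look like on a toy in-memory model (not Flow's actual data model or API). The `Post` class and the functions `split_post`, `resign` and `reindent` are hypothetical; every change is logged in the post's history rather than shown in the signature.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    author: str
    text: str
    indent: int = 0
    history: list = field(default_factory=list)

def split_post(post, offset, editor):
    """Split one post at a character offset into two posts.
    Both halves keep a copy of the original history plus a split note."""
    note = f"split by {editor}"
    first = Post(post.author, post.text[:offset].rstrip(),
                 post.indent, post.history + [note])
    second = Post(post.author, post.text[offset:].lstrip(),
                  post.indent, post.history + [note])
    return first, second

def resign(post, new_author, editor):
    """Re-attribute a post, recording who changed the authorship."""
    post.history.append(
        f"posted by {post.author}, re-signed as {new_author} by {editor}")
    post.author = new_author

def reindent(post, indent, editor):
    """Change the indentation level of a post, with a log entry."""
    post.history.append(f"indent {post.indent} -> {indent} by {editor}")
    post.indent = indent
```

For example, a page imported as one post by 'Flow-dummy' could be cut at the start of the second comment with `split_post`, after which each half is re-signed and re-indented by hand.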
Diego Moya (talkcontribs)

I suggest everybody should take a look at how "refactoring of text stored in separate snippets" is done in Microsoft's OneNote. That's the model of editing content we should strive for IMHO.

Gryllida (talkcontribs)

Quiddity, please see the two messages above. They contain information crucial for planning how Flow should be able to refactor conversations, and one big suggestion for how to handle the issue of migrating old talk pages.

He7d3r (talkcontribs)

Please link to specific messages because the "two messages above" can change when other people comment on this.

Gryllida (talkcontribs)

Admittedly the history would all be left over with the first message: suppose we have a talk page containing sections '1', '2', and '3'. Its life is like this:

  1. someone writes sections 1, 2, and 3, and participates in discussions in wiki markup
  2. flow-dummy posts all that as one big post on a flow board, preserving history
  3. users split it into 3 topics
  4. in history of topics 2 and 3, it'll say 'split from topic 1' as a first entry — people interested in history of topics 2 and 3 will manually browse history of topic 1 (which will be the entire history of that whole talk page)
Gryllida (talkcontribs)

Are we still planning on moving existing classic talk pages?

To my understanding we just need to be able to migrate them as one post, with appropriate split functionality (if I split a discussion in two, the history of one of them will say where I split it from; all history will remain in the oldest post and be linked).

I.e. I create topics "foo" and "bar" on a classic talk page which has archives. Then:

  1. We move it to a single Flow board post titled "Migrated".
  2. We move archives to single Flow board posts titled "Archive 1", "Archive 2", etc.
  3. Users split the discussions by hand.
  4. For each thing they split out of the big post, the history says:
    Gryllida split Topic:43546554DSF from Topic:FDGFD34534
    Gryllida split Topic:FGH345 from Topic:FDGFD34534
  5. Anyone who wants to see history will consult a post history which will, in its first edit, link to the original Topic:FDGFD34534 post (the oldest one on the corresponding page) which holds the whole talk page history from the classic talk page.
  6. The end result is that all discussions are searchable, and they have more or less correct headlines and a proper TOC. Where people want to reply, they need to do some more splitting of a single thread into individual messages.
  7. All this functionality is not redundant, because a lot of this will be useful for refactoring Flow-enabled discussions in big RFCs.
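
The steps above can be sketched on a toy model, assuming topics are plain objects with a text and a history list. The `Topic` class, the counter-based IDs and `split_topic` are hypothetical; only the wording of the history entry follows the example in step 4.

```python
import itertools

_ids = itertools.count(1)

class Topic:
    def __init__(self, title, text):
        self.id = f"Topic:{next(_ids)}"  # hypothetical ID scheme
        self.title = title
        self.text = text
        self.history = []

def split_topic(source, title, start, end, user):
    """Move source.text[start:end] into a new topic and record the
    split in the history of both topics."""
    new = Topic(title, source.text[start:end].strip())
    entry = f"{user} split {new.id} from {source.id}"
    new.history.append(entry)     # first entry links back to the source
    source.history.append(entry)  # the source keeps the full page history
    source.text = (source.text[:start] + source.text[end:]).strip()
    return new
```

Because the first history entry of every split-off topic names the original post, a reader can always walk back to the post that holds the classic talk page history.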

(Cross posted at Topic:S4z130y93gv9cbgu)

SPage (WMF) (talkcontribs)
"Are we still planning on moving existing classic talk pages?"

Yes, but as Danny says "move the existing talk page conversations to an archive page". The maintenance script will copy the talk page header to the Flow board header so all the templates and annotation are there, and there will be a template linking to the archive, something like {{Flow header from wikitext | date=2014-11-17 | archive = [[Page name/Archive 23]] }}

This is pretty much what Quiddity has done when we enable Flow on an existing talk page; when and if we do it for an entire namespace it'll be automated. As Flow/Converting talk pages says, the first place we'll do this is officewiki, WMF's internal wiki.

"To my understanding we just need to be able to migrate them as one post, with appropriate split functionality"

Someone could certainly copy and paste an old talk page section into a new topic, but it's not what the conversion will do.

As an aside, expert wiki users love split functionality, but to me it seems odd, and crazy hard to undo when it involves editing one topic and creating another. A "Start a new topic quoting the selected post" action is more e-mail-like, and I suspect it would make more sense to casual commenters.

Hhhippo (talkcontribs)

I agree that developing an undo function for a post split is not trivial. But it can be done, and I think it's worthwhile to do it, because a split function would not only support weird power user habits. It would also enable some interesting applications which could help close the gap between Flow's abilities and current talk page usage (see my latest wall of text).

Hhhippo (talkcontribs)

That said, I agree that we shouldn't try to convert all existing archives to Flow. That would need enormous amounts of manual work, since our current wikitext talk pages don't have a consistent structure. As long as we have a convenient way to search the archives from the flow board, it should be fine to leave them as they are.

Quiddity (WMF) (talkcontribs)

The current plan is still to (very simplified):

  • move all existing classic talk pages, to /Archive n (with localized name, and incremented archive number if any older archives are detected)
  • Copy the classic talkpages' section=0 (banner templates and messages) to the Flow Board header (with link to history for attribution)
  • Add a link for the newest archive
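
A minimal sketch of this plan, assuming page titles and wikitext are plain strings and leaving out the actual page move and the attribution link. The function names, the default archive word and the header sentence are assumptions for illustration, not the maintenance script's real behaviour.

```python
import re

def next_archive_title(talk_title, existing_titles, archive_word="Archive"):
    """Pick '<talk>/<Archive> n' with n one higher than any existing archive.
    archive_word stands in for the localized name."""
    pattern = re.compile(
        re.escape(f"{talk_title}/{archive_word} ") + r"(\d+)$")
    numbers = [int(m.group(1))
               for t in existing_titles if (m := pattern.match(t))]
    return f"{talk_title}/{archive_word} {max(numbers, default=0) + 1}"

def section_zero(wikitext):
    """Return everything before the first heading (banners, notices)."""
    m = re.search(r"^==.*==[ \t]*$", wikitext, re.MULTILINE)
    return wikitext[:m.start()] if m else wikitext

def board_header(wikitext, archive_title):
    """Build a board header: section 0 plus a link to the newest archive.
    The sentence used for the link is a placeholder."""
    return (section_zero(wikitext).rstrip()
            + f"\n\nPrevious discussion was archived at [[{archive_title}]].")
```

With existing archives 1 and 2, `next_archive_title("Talk:Foo", ...)` yields "Talk:Foo/Archive 3", and the old page's banner templates end up at the top of the new board header.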

For search, the plan (or at least desired goal) is to have the in-board search also search the classic archived subpages.

I personally really quite like the idea of importing just the currently active wikitalk page into Flow, and splitting it up with one Topic/Post per old Thread, much as you describe. I'm not sure where it falls on the spectrum of practicality, but I'm definitely urging that possibility forward, and it has been "batted around" a bit by the developers (staff and volunteer).

I'm hesitant about the idea of attempting to import all old archives, and even more hesitant about asking editors to split up or annotate any/all of the old threads into individual posts. It's a good "ideal" situation, but that would be a staggeringly large amount of effort, that could perhaps be better directed (partially for the work itself, and partially for all the accuracy-checking that other editors would inevitably feel obliged to do...). We (all the projects) are already backlogged in so many areas, that this seems like it might not be the best thing to add to that pile. You do make a good point that "refactoring" tools will be needed, in some form or another, but I don't think "splitting a single post" was one that had previously been considered.

Thanks again for describing this in detail. There's some damn good ideas in there, even if the flow team can't implement them all. And sorry again for the delayed reply. (I have many threads from Deltahedron, that still need answering, too. :( Eventualism is my only/ongoing hope.)

He7d3r (talkcontribs)
  • What if a wiki doesn't follow that convention for archiving talk pages?
  • Has anyone checked if there are any wikis in this situation?
WhatamIdoing (talkcontribs)

The English Wikivoyage might use a different (manual) system. It seems like "village pump"-type messages end up being swept to more topic-specific pages, rather than being archived in the place where the discussion happened. (Flow would be very handy for that system, because you could just re-assign the topic to another page.)

Gryllida (talkcontribs)

Quiddity, SPage, Hhhippo:

I strongly disagree with "move the existing talk page conversations to an archive page". In my understanding this means that

  1. Old conversations are not Flow'ized. Participating in them is hard.
  2. Old conversations are not searchable. (We can hack in a subroutine which does both a Flow search and a classic search, but this is ugly.)

While we get these disadvantages, we really don't gain anything, because:

  1. All the features I mentioned, which look hard to develop or implement, are required and desired in order for Flow to pass the "flexibility" requirement. I maintain that it is better to plan them early.
  2. Converting archives does not take much effort. They'll only be converted as people need them, gradually, not the whole thing at once.
Hhhippo (talkcontribs)

I'm not saying we shouldn't do any conversion, just "that we shouldn't try to convert all existing archives to Flow."

  1. Yes, re-opening an archived discussion by un-archiving rather than linking is one of the use cases for a conversion procedure. The most common case is likely the archives created from the active talk page the moment Flow is enabled. (Like Quiddity said, currently active discussions should be imported to Flow.)
  2. The main search function will be enabled to search an entire wiki including the Flow part anyway. Going from there to searching within a single Flow board and its wikitext sub/archive pages sounds like a very minor step.
  3. I agree we should have at least most of those features. I'm a bit undecided about the ability to change a post's author, but that might also be needed to fix accidentally logged-out posts. We might want some kind of log for such author changes (or other safeguard mechanisms), since there's quite some potential for abuse.
  4. Your second "2." doesn't play well with your first "2.": if the archives weren't searchable, and searchability is a need, then we would have to convert them all at once. But they will be searchable, so that's not a problem.

Btw: regarding a migration script for LQT, I understand that exists already.

Gryllida (talkcontribs)

Quiddity, SPage, Hhhippo:

Thanks much! I would like to see this done before we deploy Flow widely, because it's important for backwards compatibility, for two reasons.

  1. Such flexibility of refactoring is provided by existing classic talk mess.
  2. People will want to unarchive stuff and keep talking in it, and not supporting that locks them out.

Could someone please confirm that the features this thread is talking about are given adequate priority?

WhatamIdoing (talkcontribs)

If people want to un-archive old discussions, then they can do that under Flow exactly like they are doing that in wikitext: manually copy the old discussion from the archive and manually paste it into a new post, with themselves listed as the 'author' in the page history.

Sänger (talkcontribs)

And all posts will have the same indentations and name-tags as in the original discussion? Or will they be limited to the restrictions and inflexibilities of Flow as well?

WhatamIdoing (talkcontribs)

The wikitext formatting will not be changed; it will only be placed inside a Flow box. These boxes support association lists, and they even support the common (and painful to people using some w:screen reader software) abuse of association list formatting to produce indentations, like this:

  One colon
    Two colons
      Three colons
        Four colons

If you paste over an old discussion, complete with indentations and signatures, you will get the old discussion, complete with indentations and signatures, inside the Flow message.

Sänger (talkcontribs)

So the whole discussion will be treated as a single post? Or did I get this wrong?

If a discussion thread is resurrected, it should look like a discussion thread, not like a single post. I know that usual talk page discussions can be a bit hard to convert, as some people answer, with good indentation, in the middle of longer posts. This stuff here doesn't even get indenting and linking right in any manner, so see-saw answers must be a nightmare for those programmers, even though they are totally acceptable and fine in a real discussion.

Something like this:

Long elaboration part 1

  Answer to long elaboration part 1

Long elaboration part 2

  Answer to long elaboration part 2
    Answer to this answer again
      And another answer more indented

Long elaboration part 3

  Answer to long elaboration part 3
    Answer to this answer again
      And another answer more indented
        And yet another answer more indented
    Another answer to this answer again

Long elaboration part 4

  Answer to long elaboration part 4