User talk:SSastry (WMF)

Jump to navigation Jump to search

About this board

Previous discussion was archived at User talk:SSastry (WMF)/Archive 1 on 2015-09-09.

ToBeFree (talkcontribs)
SSastry (WMF) (talkcontribs)
:-) Thanks!
Izno (talkcontribs)
SSastry (WMF) (talkcontribs)

Thank you very much for getting into the weeds and tidying it all up. :-) I'll take a look and we'll fix things as appropriate.

Reply to "Tidying the Tidy project"

Lint errors (tr.wiki)

2
Summary by SSastry (WMF)
Curious (talkcontribs)

Hi SSastry,

I'm trying to fix the lint errors at tr.wiki. Normally, after fixing an error on a page: 1) that page would disappear from the list (e.g. from "Special:LintErrors/multi-colon-escape"), 2) clicking "page information" would no longer show that lint error, and 3) lintHint would no longer show that lint error. But today, after fixing a lint error, 1) & 2) don't happen. Examples:

Do you know what's going on here? -- Curious (talk) 19:39, 3 March 2018 (UTC)

SSastry (WMF) (talkcontribs)

Thanks for flagging this. Others have raised this as well Topic:U8p5sxysk9eblcob and Topic:U8q07xrh4r67iql4. Something might be temporarily broken with the update mechanism -- normally, it takes a little while for updates to propagate, but 24 hours is too long. We'll investigate this tomorrow.

Summary by Sunpriat

answered

Sunpriat (talkcontribs)
SSastry (WMF) (talkcontribs)
Sunpriat (talkcontribs)

A little bit of something else. Is this normal when wrapping spreads even outside the parent block? The tag wrapping starts inside the ul, but does not end when the ul tag is closed. Continues to wrap everything up to the end of the page. Previously with Remex, it was not like this.

In the example, it affects the text "after"

before

*a
*<small>b
*c
*d
*e

after

Also, the unclosed tag S in the middle of the discussion page in the old parser maybe not line-through all other topics and text to the end of the page. Probably it should at least ever stop line-through or it is normal now?

SSastry (WMF) (talkcontribs)

Yes, this behavior is part of the HTML5 parsing spec. But, note that even Tidy did the same thing -- see below.


<ul>
<li>a</li>
<li><small>b</small></li>
<li><small>c</small></li>
</ul>
<p><small>after</small></p>

If you replace the <small> tag with <s> tag, the same thing happens. What is different is when there are multiple such unclosed tags. In Tidy, the effects are limited or it assumes the second unclosed tag was a closing tag. With Remex (like any HTML5 parser), the effects accumulate.

Preserve original transclusion's parameter order

9
Summary by SSastry (WMF)

Patch merged and deployed.

Alsee (talkcontribs)

Thanks for addressing the template parameter order dirty diffs! (From Phabricator T179259.)

I was looking at your Gerrit patch for it. If we have parameters A through Z, it looks like inserting B into XYZAC aborts as soon as it sees X, yielding BXYZAC instead of the more natural XYZABC. I think we can do a better job of keeping related parameters grouped together with a modest increase in effort.

Based on the variables i and j in your code, walk through all values of j and remember the j with the smallest absolute_value(i-j). If no such j exists, append the new parameter. If j>i then insert before j, otherwise we have j<i and we insert after j.

Inserting B into XYZAC would lock onto A or C, and put B between them.

Alsee (talkcontribs)

I just read the Gerrit replies and saw you mentioned this there. Cool :)

I also saw the comment about it being O(N^2). I'm pretty sure C. Scott Ananian's idea of using standard quicksort with a custom sort function will return garbage. The sort function generally assume the compare function is transitive. That's not true for the compare function he provided. Existing parameters could get scrambled when they get sorted around the new parameters. The lack of transitivity could result in a complete nonsense result.

If it's worth the effort to go for an O(N LogN) algorithm, I think I see how to do it. However I'm not sure which argList functions are O(1) or O(N), which is important. I'm currently trying to Google that.

Cscott (talkcontribs)

You're right, the algorithm as I sketched it does not provide a transitive comparison function.

But here's a version which is O(N ln N), using an initial linear pass over the templatedata arguments to do the equivalent of the "search and insert":

https://jsfiddle.net/ovwpwkr3/

It doesn't require the sort to be stable.

EEng (talkcontribs)

Not to be the grumpy old man, but as a software engineer with almost 40 years' experience, let me recommend that you start with an obvious, easy-to-verify n^2 approach, and think about clever n log n approaches only later IF performance turns out to matter, which it very well might not, and anyway you can use the n^2 implementation to test the clever implementation.

Alsee (talkcontribs)

I agree that fussing over n^2 efficiency for small n is probably pointless. And the most important thing is fixing the reordering of existing parameters.

However the current question is whether to fuss over the results when adding new parameters to existing parameters. The WMF is using a nice approach for simplifying the code by working at a higher level of abstraction. They basically just sort a list rather than fiddling with parameters individually. It's short, understandable, and it cleanly sweeps messy details under the rug. (Duplicated parameters can be an annoying corner case.) However it disregards the relationship between parameters added at the same time. If the existing parameters are ZA and you add MN, you get the ugly order NZAM. N has a weak pairing with Z, M has a weak pairing with A, and the strong MN pair was ignored and split.

It's possible to do better, but I hesitate to advocate an ugly solution. The most straightforward approach appears worse than N^2, and the efficient approach would be a hard-to-decipher algorithm. The current code solves the main problem, and generally gives good results when adding new parameters.

Cscott (talkcontribs)
Alsee (talkcontribs)

@Cscott I did a bit of testing. Assuming the existing parameters are ZA and assuming a full alphabet in template data:

var orig = "ZA";
var tmpd = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

The first version of the code produces ZABY (should be YZAB), and the second version produces MZAN (should be any of MNZA, ZMNA, or ZAMN).

The more apparent approach is to place the most closely-adjacent parameters first, and progressively add parameters. However it may be more big-O efficient to take an opposite approach. Consider parameters in a chain, in templatedata order. Cut the chain between existing parameters, leaving each new parameter attached to exactly one existing parameter. o = new parameter, * = existing parameter.

o-*-*-o-o-*-o

Walking down the chain from left to right:

  • There is nothing to do until we reach the *-*. Simply split at that point:
  • o-* *-o-o-*-o
  • As we walk from the second * to the third * we remember the step with the largest templatedata distance between parameters, then cut at the biggest gap. For example:
  • o-* * o-o-*-o
  • There are now three chains, each containing exactly one existing parameter. Put the chains in proper order for the exiting-parameters.
SSastry (WMF) (talkcontribs)

Thanks @Alsee and @EEng for your suggestions and @Cscott for your updated patch. Please see my latest comment on the gerrit patch. I personally prefer iterating towards the complexity that is needed by starting with something simple that is acceptable.

Cscott (talkcontribs)

@Alsee take a look at https://gerrit.wikimedia.org/r/389852. I'm pretty happy with MZAN as the output. You're always going to have some weird corner cases, but "sort with closest original parameter" is simple and understandable.

Alsee (talkcontribs)

Hi, I see you're listed as Parasoid lead.

Recently I've submitted quite a few bug reports, and some WMF staff have expressed puzzlement at why I have such a miserable Flow experience and why I run into a steady stream of problems, up to and including Unable to parse content due to a Parsoid failure. I think I've figured out a big issue. I was looking at Parsoid/Round-trip_testing. It looks like the WMF attempted to collect a large body of representative test data. parsoid-tests currently claims:

  • 100% parsed without errors,
  • 99.84% round-tripped without semantic differences, and
  • 74.56% round-tripped with no character differences at all.

But if your body of test data isn't actually representative of real world use, you're going to miss a lot of issues that rarely or never appear in the data you did test against.

The majority of edits are not saves.... the majority of edits are actually done within an edit-preview-cycle, possibly ending with a single save. I can tell you, you're going to find a *lot* of stuff that shows up within the edit-preview-cycle that rarely shows up in a final save. It seems you also didn't test against edits outside of article space, so there's another big chunk of stuff you're missing. (In Flow, merely clicking between the two editing modes generates a round-trip which routinely mangles what I had... and some of those no-semantic-difference changes are gross).

Is there any chance you can collect a large body of preview-diffs to do a round-trip test against? Preferably one test-set from article space, and one test-set sampled from all non-article space?

SSastry (WMF) (talkcontribs)

@Alsee Thanks for reaching out directly about this and your suggestions about possible gaps in our test coverage.

As of July 2015, the round trip testing tests against 160K pages coming from a collection of 30 wikis. enwiki has the largest representation @30K pages. In addition, 30% of the 160K pages were picked from the RC stream, so they are pages being edited. So yes, you are right that we did attempt to collect a large body of test data that also attempted to be representative.

I think you are making the observation that for Flow, this is not sufficient. For starters, I want to look at the bug reports you've filed so I know what is going on. It would be helpful if you point me to that. It is not necessary, but steps to reproduce them would be even better. Once I have a chance to look at that, it would help us figure out how to improve our test coverage (including but not limited to looking at non-article space wikitext usage).

Alsee (talkcontribs)

My other reply got mangled. I'll leave that post as-is, in case you want to investigate the mess. Flow is not reliable communication medium for discussing Flow bugs because it doesn't have a stable TrueText like wikitext does, and because trying to show troublesome Flow content generally triggers problems. I'll do my best build a non-damaged version of my original post here. Oh jeeze, I copy-pasted it all here and immediately got those little arrows, and a round-trip immediately resulted in new semantic changes. For example the text used to contain the word *was* and a roundtrip turned it into bulletpoint was*

Original post below:

----------------------------------------------------------

I'll probably split separate subjects into separate replies.

I think you are making the observation that for Flow, this is not sufficient.

Hmm. My first thought was that this issue would spill over to article editing too.... but on second thought maybe not much. I don't really use Visual Editor and I'm still trying to think through the actual-use implications. I think the fact that VisualEditor can't round-trip edit modes during an edit will mostly avoid the nasty trigger situations. But there are definitely latent issues you're not seeing, and I think real-use article-editing errors rates will still be a bit higher than your testing suggests.

The great thing about real wikitext is that newbies can type all sorts of malformed crap into it.... experienced editors can type a malformed half-edit... anyone can accidentally or experimentally type all sorts of malformed crap into it.... and it cheerfully gives you a reasonable best-effort render. No errors, no complaints, the worst that happens is you don't get what you hoped for. Then you play around with your wikitext until you do get what you want.... and you save a single relatively-clean generally-well-formed edit.

In Flow it's common to repeatedly flip edit modes while working up a single edit. That can hit Parasoid with all sorts of random malformed wikitext. And the worst part isn't a rare Parsoid Parse error. The worst part isn't even a fluke semantic change. The worst part is that Parasoid constantly throws your wikitext out the window. Your wikitext is gone, and Parasoid gives you a new mutated blob of wikitext.

Just a single <nowiki>

Becomes <span>&lt;nowiki&gt;</span>

which becomes  <span>&#x3C;nowiki&#x3E;</span>

I lost the edit that triggered the Parasoid Failure. I had reflexively tried to make italics while in visual mode and I got something like this:

<nowiki>'blah' blah blah ''</nowiki>blah blah

The weird placement of the first nowiki tag caught my eye, and I made some random minor edit to see what would happen. I might have unbalanced the nowikis. I toggled the edit mode through 4 or 5 round-trips, it mutated each time. On the final roundtrip Parasoid itself mutated it into something that triggered a Parasoid failure.

(Then I started talking about bugs in that post.)

Ok, I avoided any roundtripping on this post. What I see on my screen right now looks correct. If I *don't* add a new reply, it's because this save looks correct.

SSastry (WMF) (talkcontribs)

You can now remove that other mangled post. I know what is going on. I filed https://phabricator.wikimedia.org/T115236 to figure this out -- you uncovered a bug in Flow <-> Parsoid interaction. Once we figure this out, the weird normalizations you are seeing should go away.

Alsee (talkcontribs)

The post above came out correctly.

I can report a reproducible semantic-change test case:

  1. Starting with Flow in default Visual mode.
  2. Go to the damaged post below(permalink). Click edit.
  3. CTRL-A, CTRL-C to copy everything.
  4. Click on a REPLY link somewhere, and CTRL-V paste. (You should still be in Visual mode)
  5. Notice that there are no bullet points anywhere.
  6. Click for Wikitext mode. Click back to Visual mode.
  7. Notice that there is now a bulletpoint.
SSastry (WMF) (talkcontribs)

The arrow lines that you see (and reported) is a Firefox specific issue with copy-paste in Visual Editor (Visual Editor is embedded inside Flow). I cannot right away find the bug report for you. But, in any case, the scenario of the bullet being introduced is a bug in Parsoid (that should never happen -- we should have added a nowiki around the bullet because of the line-breaks introduced by the copy-paste. We'll investigate how to reproduce this and fix this.

Alsee (talkcontribs)

I just found this on Jimbo's talk page:

There are still lots of improvements yet to make, not least integrating wikitext and visual editing together properly and removing the hack of having a second edit tab that jumbles up the interface... Best, -Elitre (WMF) (talk)

It sounds like the plan is for general editing to work like Flow editing, where you get rid of the real wikitext parser and run everything through Parasoid. If so, then I need to revise my comment above, where I indicated that this issue isn't going to massively spill over to article editing. If you try integrating wikitext and visual editing together properly by running everything through Parasoid, three things will happen:

  1. It's going to start massively mangling articles the same way it mangles Flow messages. Trying to deal with typo-ridden fragmentary incomplete edits is going to hammer all of Parsoid's failure cases.
  2. Experienced editors are going to scream bloody murder. Even when there's no-semantic-changes, it's WTF why does it constantly destroy what I wrote.
  3. Newbies will be severely impacted and probably driven off, when Parsoid makes it impossible to figure out how anything works. Parsoid throws away the wikitext they entered, and gives them back new and different wikitext. They will have no clue why their wikitext changed. The re-written wikitext may use stuff they haven't figured out yet. For newbies trying to learn and experiment, a simple preview test turns into a WTF-just-happened-I-can't-understand-this-I-quit.
SSastry (WMF) (talkcontribs)

Note that this bug you noticed here is specific to Flow. https://phabricator.wikimedia.org/T115236 does not affect Visual Editor. In addition, there are other techniques that Parsoid uses for Visual Editor that prevents dirty diffs. Between VE and Parsoid, we will nevertheless be doing a lot of testing to make sure something like this does not happen, especially since you have specifically alerted us to this possibility.

Alsee (talkcontribs)

I reported copy-paste completely mangling everything. I assume it will get fixed.

I reported reverting-an-edit completely destroying the original message. I assume it will get fixed.

I reported that the page you see on first save doesn't match what everyone sees on future fresh loads. I assume it will get fixed.

I reported that the editing interface crapped out, making editing virtually impossible. I assume it will get fixed.

I reported on summarize-edits causing history links to generate fatal exceptions. I assume it will get fixed.

I reported on all of the menu items becoming inaccessible if the content in the post is "too big". I assume it will get fixed.

I reported that oversight was never implemented. (Because my private info got sucked into a copy-paste test post.) I assume it will get fixed.

I reported test-results that mixing top-posting and bottom-posting turns large discussions into incomprehensible spaghetti. I *hope* there's a plan to fix it.

I can't even begin to remember everything I've reported - every time I touch flow everything explodes faster than I can report it all.

When you say "making sure something like this does not happen", are you merely saying you plan to get the mangling rate down to the figures claimed for Visual Editor? Or are you saying it's actually going to work as well as our current wikitext editor? (i.e. evil gremlins never-ever-ever creep in and rewrite what we wrote?) Because as I understand it, there's not much prospect that will ever get fixed with Parasoid.

SSastry (WMF) (talkcontribs)

I can only tell you what we'll do on the Parsoid end, not about Flow.

As for "making sure something like this does not happen", I cannot promise "never-ever-ever". Bidirectional conversion between wikitext and HTML is hard, and we have come a long way there. The problems you saw in Flow posts that frustrated you is because of a bug (which I linked to above) where some information is not getting to Parsoid. But, I appreciate your confidence in our abilities :-), but let us wait till we get there perhaps before making final judgements. But, more seriously, I do appreciate your reports as they pertain to Parsoid.

This post was hidden by Alsee (history)
SSastry (WMF) (talkcontribs)

This post can now be deleted if you wish.

Alsee (talkcontribs)

I just realized, we both got so distracted by bug-flood that neither of us noticed that you didn't address the semantic-change bug that completely destroyed the post. The one that led me to make a second post. The one where Flow randomly changed it's mind about how to parse the nowikis and changed it's mind about having a line in italics. The nowikis were stable through roundtrips... until it wasn't.

SSastry (WMF) (talkcontribs)
Alsee (talkcontribs)

Ok. The Phab only mentioned the no-semantic-change rewriting of the wikitext view. I now take it that the Phab is intended to cover the changing rendered-view as well.

Alsee (talkcontribs)

I hid it, but my understanding is that the WMF rejected my report that Flow fails to support deleting.

SimonBrackley (talkcontribs)

Hello,

I am having an issue relating to the limitation I believe you were involved in correcting that is currently marked as solved: Parsoid/limitations

On my wiki I wish to set my paragraph margins to 0 and add lines manually (a bit like MS word's "no spacing" style). However, I come across the issue that whitespace is ignored in creating new lines. I believe this is due to the update you mention in the above article? Is there an older version of Parsoid that would better suit my needs or is it better to edit something on my parsoid?

Unfortunately, I am new to node.js and the parsoid service in general, so any advice would be much appreciated.

Thanks,

Simon

SSastry (WMF) (talkcontribs)

I am not following your question here.

Since we are all in mainland US at this time, can you show up on IRC sometime in the 8 am PST - 4 pm PST time range on the #mediawiki-parsoid channel? I saw you showed up around 5 am PST and asked your question.

Please provide feedback on suggested improvements to the Code of Conduct

1
MediaWiki message delivery (talkcontribs)

Thanks to everyone who’s helped work on the Code of Conduct so far.

People have brought up issues they feel were missed when working on "Unacceptable behavior" and "Report a problem". Consultants have also suggested changes in these same sections.

These are important sections, so please take a look at the proposed changes. I apologize that this feedback arrived later than planned, but I think this will create a better document.

If you prefer to give your opinion privately, feedback via e-mail is welcome at conduct-discussionwikimedia.org.

Thanks. Mattflaschen-WMF via

Gnom (talkcontribs)

Hey, I just saw that you were involved in the VisualEditor feature that automatically converts URL links into interwikilinks and I wanted to say thank you. Keep up the great work!

SSastry (WMF) (talkcontribs)

Glad to hear. :-)

There are no older topics