Topic on User talk:SSastry (WMF)

Preserve original transclusion's parameter order

9
Summary by SSastry (WMF)

Patch merged and deployed.

Alsee (talkcontribs)

Thanks for addressing the template parameter order dirty diffs! (From Phabricator T179259.)

I was looking at your Gerrit patch for it. If we have parameters A through Z, it looks like inserting B into XYZAC aborts as soon as it sees X, yielding BXYZAC instead of the more natural XYZABC. I think we can do a better job of keeping related parameters grouped together with a modest increase in effort.

Based on the variables i and j in your code, walk through all values of j and remember the j with the smallest absolute_value(i-j). If no such j exists, append the new parameter. If j>i then insert before j, otherwise we have j<i and we insert after j.

Inserting B into XYZAC would lock onto A or C, and put B between them.

Alsee (talkcontribs)

I just read the Gerrit replies and saw you mentioned this there. Cool :)

I also saw the comment about it being O(N^2). I'm pretty sure C. Scott Ananian's idea of using standard quicksort with a custom sort function will return garbage. The sort function generally assume the compare function is transitive. That's not true for the compare function he provided. Existing parameters could get scrambled when they get sorted around the new parameters. The lack of transitivity could result in a complete nonsense result.

If it's worth the effort to go for an O(N LogN) algorithm, I think I see how to do it. However I'm not sure which argList functions are O(1) or O(N), which is important. I'm currently trying to Google that.

Cscott (talkcontribs)

You're right, the algorithm as I sketched it does not provide a transitive comparison function.

But here's a version which is O(N ln N), using an initial linear pass over the templatedata arguments to do the equivalent of the "search and insert":

https://jsfiddle.net/ovwpwkr3/

It doesn't require the sort to be stable.

EEng (talkcontribs)

Not to be the grumpy old man, but as a software engineer with almost 40 years' experience, let me recommend that you start with an obvious, easy-to-verify n^2 approach, and think about clever n log n approaches only later IF performance turns out to matter, which it very well might not, and anyway you can use the n^2 implementation to test the clever implementation.

Alsee (talkcontribs)

I agree that fussing over n^2 efficiency for small n is probably pointless. And the most important thing is fixing the reordering of existing parameters.

However the current question is whether to fuss over the results when adding new parameters to existing parameters. The WMF is using a nice approach for simplifying the code by working at a higher level of abstraction. They basically just sort a list rather than fiddling with parameters individually. It's short, understandable, and it cleanly sweeps messy details under the rug. (Duplicated parameters can be an annoying corner case.) However it disregards the relationship between parameters added at the same time. If the existing parameters are ZA and you add MN, you get the ugly order NZAM. N has a weak pairing with Z, M has a weak pairing with A, and the strong MN pair was ignored and split.

It's possible to do better, but I hesitate to advocate an ugly solution. The most straightforward approach appears worse than N^2, and the efficient approach would be a hard-to-decipher algorithm. The current code solves the main problem, and generally gives good results when adding new parameters.

Cscott (talkcontribs)
Alsee (talkcontribs)

@Cscott I did a bit of testing. Assuming the existing parameters are ZA and assuming a full alphabet in template data:

var orig = "ZA";
var tmpd = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

The first version of the code produces ZABY (should be YZAB), and the second version produces MZAN (should be any of MNZA, ZMNA, or ZAMN).

The more apparent approach is to place the most closely-adjacent parameters first, and progressively add parameters. However it may be more big-O efficient to take an opposite approach. Consider parameters in a chain, in templatedata order. Cut the chain between existing parameters, leaving each new parameter attached to exactly one existing parameter. o = new parameter, * = existing parameter.

o-*-*-o-o-*-o

Walking down the chain from left to right:

  • There is nothing to do until we reach the *-*. Simply split at that point:
  • o-* *-o-o-*-o
  • As we walk from the second * to the third * we remember the step with the largest templatedata distance between parameters, then cut at the biggest gap. For example:
  • o-* * o-o-*-o
  • There are now three chains, each containing exactly one existing parameter. Put the chains in proper order for the exiting-parameters.
SSastry (WMF) (talkcontribs)

Thanks @Alsee and @EEng for your suggestions and @Cscott for your updated patch. Please see my latest comment on the gerrit patch. I personally prefer iterating towards the complexity that is needed by starting with something simple that is acceptable.

Cscott (talkcontribs)

@Alsee take a look at https://gerrit.wikimedia.org/r/389852. I'm pretty happy with MZAN as the output. You're always going to have some weird corner cases, but "sort with closest original parameter" is simple and understandable.