Extension:MirrorTools/Design decisions

Permanently stationary revisions vs. moving and being recalled revisions
Under the "permanently stationary" system, any revisions made on LocalWiki stay at their page ID, page title, and page namespace until they are moved, undeleted, etc. on Wikipedia. Under the "moving and being recalled" system, once a page is deleted on RemoteWiki, its revisions fall under LocalWiki control and can be moved about, but if the page is undeleted, the revisions are recalled to the page ID, page title, and page namespace they have on Wikipedia.
 * Argument
 * Decision: MirrorTools will use the "permanently stationary" system because of a principle that is analogous to "Cool URIs don't change".

Deleting revisions vs. not deleting revisions when pages are mirrormoved onto them
We shouldn't delete anything! On the other hand, a redirect to the source page contains no information we don't already have when the source page is being moved onto it.
 * Argument
 * Decision: If the only remotely live revision is a redirect back to the page being moved onto it, delete that redirect, but merge everything else.
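The redirect rule above can be sketched in a few lines of Python. This is a minimal illustration, not the extension's code. The Revision record, its fields, and plan_mirrormove are hypothetical. "Remotely live" is assumed to mean the revision is still visible on RemoteWiki.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Revision:
    # Hypothetical view of a revision on the page being moved onto.
    rev_id: int
    remotely_live: bool             # assumed: still visible on RemoteWiki
    redirect_target: Optional[str]  # title this revision redirects to, or None


def plan_mirrormove(target_revs: List[Revision],
                    source_title: str) -> Tuple[List[int], List[int]]:
    """Split the target page's revisions into (to_delete, to_merge).

    Only one case is deleted: the sole remotely live revision is a
    redirect back to the page being moved onto the target.
    Everything else is merged.
    """
    live = [r for r in target_revs if r.remotely_live]
    to_delete: List[int] = []
    if len(live) == 1 and live[0].redirect_target == source_title:
        to_delete.append(live[0].rev_id)
    to_merge = [r.rev_id for r in target_revs if r.rev_id not in to_delete]
    return to_delete, to_merge
```

For example, when moving "Foo" onto a page whose only live revision redirects to "Foo", that revision ID comes back in to_delete and every other revision lands in to_merge.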

Keep rev_parent_id value vs. change rev_parent_id value
Should the rev_parent_id values of mirrored revisions be set to their local or remote parent_ids? Probably the remote ones, and then the local revisions should be left as they are. The reason is that we don't want to change the old and new lens (page lengths). On the other hand, what happens when we add a new local edit? Its page-length difference would then be based on the imported revision's length. Let's let the users sort this out.
 * Argument

Pros to "keep":
 * We don't have a bunch of wacky len changes
 * Performance, e.g. in mirrormoves we don't have to change all the parent_ids

Pros to "change":
 * Things could theoretically get ugly with the page lengths, e.g. if we have revisions getting moved. But not really, because the length change will still be based on the parent revision, even if it's on a different page.

Keep the old parent_id
 * Decision
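This simplified model shows why keeping the old parent_id is harmless. The size change shown for a revision is taken here as its rev_len minus its parent's rev_len. The parent is looked up by rev_id alone, so moving the parent to another page doesn't change the result. The Rev record and size_change helper are hypothetical, and MediaWiki's own history code is more involved.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Rev:
    rev_id: int
    rev_page: int                 # page the revision currently lives on
    rev_len: int                  # length in bytes
    rev_parent_id: Optional[int]  # None (or 0) for a page's first revision


def size_change(rev: Rev, revs_by_id: Dict[int, Rev]) -> int:
    """This revision's length minus its parent's length.

    The parent is found by rev_id only; its rev_page is never consulted,
    so the result is the same even after the parent has been moved.
    """
    parent = revs_by_id.get(rev.rev_parent_id) if rev.rev_parent_id else None
    parent_len = parent.rev_len if parent else 0
    return rev.rev_len - parent_len
```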

Add a null revision for deletion events
Deletion and restoration are at least as relevant as Wikipedia protection and unprotection events. Before going that route, though, maybe we should try to find out why some log events have null revisions and others don't. See Manual:Null revision.
 * Argument

For now, just imitate what Wikipedia does.
 * Decision

Have vs. don't have a page_mt_former
Having a page_mt_former promotes page_id stability, so that the page_id doesn't change every time someone mirrormoves, mirrorcreates, mirrorpagerestores, etc. and then mirrormoves or mirrorpagedeletes. Thing is, we have page_id instability anyway because of all these merges, so who really cares about changing it back to what it was before the merge? It also makes the developer's life more complicated and is hard to keep track of mentally. It does, however, have a certain coolness factor. It could be a way to keep the page IDs of mirrored pages below one quadrillion. Also, if a page ID changes, it could be a way to find the page whose ID changed (although it would only help with the merge, not the unmerge). It could also be a way to keep page_id equal to rev_ar_page_id. On the other hand, there could be more than one rev_ar_page_id for a given page, so which one would we use? It also makes the page table less lightweight. Then again, who knows what that data might be useful for? We might need to know, at some point, what the former page_id was. Also, the existing legacy code (e.g. ApiMirrorMove.php) is challenging to rewrite.
 * Argument

Dump it.
 * Decision

Add vs. don't add null revisions and redirects for page moves when we don't have the IDs
We should always have the real revision IDs before adding anything. Otherwise it will conflict when the page is pagerestored later and we do the MirrorEdit; or rather, the MirrorEdit will add a new revision and we'll end up with two null revisions. On the other hand, without them, we won't have those null revisions and redirect revisions unless the page is mirrorundeleted.
 * Argument

Leave those out.
 * Decision

Set page_latest of mirrored pages to latest revision vs. latest remote revision
In theory, page_latest should always be the latest revision. But we can't go around displaying the latest local revision if the page is remotely controlled.
 * Argument

Set it to the latest remote revision.
 * Decision
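Here is a sketch of the page_latest rule. It assumes each revision carries a hypothetical is_remote flag marking it as mirrored from RemoteWiki. MediaWiki's 14-digit TS_MW timestamps sort chronologically as plain strings, and rev_id is used as a tiebreaker. The names are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PageRev:
    rev_id: int
    rev_timestamp: str  # TS_MW format, e.g. "20130501123456"
    is_remote: bool     # hypothetical flag: revision came from RemoteWiki


def choose_page_latest(revs: Iterable[PageRev]) -> Optional[int]:
    """Return the rev_id of the latest remote revision, ignoring local ones."""
    remote = [r for r in revs if r.is_remote]
    if not remote:
        return None  # not a mirrored page; the caller decides what to do
    # TS_MW strings compare chronologically; rev_id breaks ties.
    return max(remote, key=lambda r: (r.rev_timestamp, r.rev_id)).rev_id
```

A newer local revision is skipped, so readers keep seeing the remotely controlled text.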

Have vs. don't have mbq_rc_id2
Something just feels messy about having rc_id set to the log event's rc_id value rather than the rc_id of the revision row. It doesn't matter, though, because we can select using multiple columns in the WHERE clause when we need to differentiate. There's also a performance hit for having yet another indexed column, and we'll rarely use it. Alternatively, we could give the action that adds these mirroredits a new name, such as mirrornorcedit, and then it will map accordingly.
 * Argument

Dump it; use mirrornorcedit-needsrev and mirrornorcedit-readytopush.
 * Decision

Have vs. don't have mbq_extra_params
We have all these fields, like mbq_comment2 and mbq_rev_id2, that we don't need. It's just a matter of time before we have to stuff them all into a parameters field, unless we actually want to run queries on those fields, and it's hard to imagine doing that for something like mbq_comment2. On the other hand, a blob takes up a lot of space; what would the effect on performance be?
 * Argument

Keep procrastinating making this change. Maybe do it later, when we have like five of these fields.
 * Decision

Assume vs. don't assume that MirrorPuxxBot has access to the backend
It would be more efficient if we could operate under the assumption that MirrorPuxxBot has access to the backend. But that's not where the main concern about expense lies; rather, it's with pulling from RemoteWiki. Also, who knows where we'll need to operate this bot from. On the other hand, pulling from LocalWiki via the API might sometimes impose a delay; then again, we'll probably be pulling 500 records at a time.
 * Argument

Don't assume. Operate as if it doesn't have access. Use the API.
 * Decision

What order should mb_queue rows be handled in by the pullbot and pushbot? mbq_id or mbq_rc_id?
mbq_id is already indexed, and we know that every row will have it (would there be any rows without an mbq_rc_id?). mbq_rc_id, on the other hand, is in the correct order on the remote wiki. What if we add rows to the table through some other tool? (Why would we do that, unless they were old log entries, which we're going to put in the tables before anything else anyway? And why would anything get missed if we're using fail-fast?)
 * Argument

Use mbq_id.
 * Decision
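This sketch handles mb_queue in mbq_id order, using an in-memory SQLite table. Only mbq_id and mbq_rc_id come from this page. The mbq_action column, the minimal schema, and the sample action values are hypothetical. The rows are inserted so that mbq_rc_id order differs from mbq_id order, which shows which column drives the processing.

```python
import sqlite3
from typing import List, Tuple


def pending_rows(conn: sqlite3.Connection) -> List[Tuple[int, int, str]]:
    """Return queue rows in the order the bots handle them: by mbq_id."""
    return conn.execute(
        "SELECT mbq_id, mbq_rc_id, mbq_action FROM mb_queue ORDER BY mbq_id"
    ).fetchall()


def demo() -> List[Tuple[int, int, str]]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema; the real mb_queue has many more columns.
    conn.execute(
        "CREATE TABLE mb_queue ("
        " mbq_id INTEGER PRIMARY KEY,"
        " mbq_rc_id INTEGER,"
        " mbq_action TEXT)"
    )
    # Inserted out of order, with mbq_rc_id order differing from mbq_id order.
    conn.executemany(
        "INSERT INTO mb_queue (mbq_id, mbq_rc_id, mbq_action) VALUES (?, ?, ?)",
        [(3, 1003, "mirrormove"),
         (1, 1002, "mirroredit"),
         (2, 1001, "mirrorpagedelete")],
    )
    rows = pending_rows(conn)
    conn.close()
    return rows
```

Because mbq_id is the primary key, the ORDER BY can use the existing index. The bots never need to sort on mbq_rc_id.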

Should we ever store any timestamps in the bot database that have undesirables in them?
We might have to add those undesirables right back for queries.
 * Argument

Only store them with the undesirables when, e.g., it's mbq_params and we're making it look exactly like log_params on the remote wiki. Otherwise, e.g. when using mbq_params2, store them with the undesirables removed.
 * Decision
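This is a sketch of stripping and restoring the undesirables. It assumes they are the punctuation that separates the ISO 8601 timestamps returned by the MediaWiki API (2013-05-01T12:34:56Z) from the 14-digit TS_MW form stored in the database (20130501123456). Both function names are hypothetical.

```python
import re


def strip_undesirables(ts: str) -> str:
    """'2013-05-01T12:34:56Z' -> '20130501123456' (14-digit TS_MW form)."""
    digits = re.sub(r"[^0-9]", "", ts)
    if len(digits) != 14:
        raise ValueError(f"unexpected timestamp: {ts!r}")
    return digits


def add_undesirables(ts: str) -> str:
    """'20130501123456' -> '2013-05-01T12:34:56Z' (ISO 8601, as the API returns it)."""
    if not re.fullmatch(r"\d{14}", ts):
        raise ValueError(f"unexpected timestamp: {ts!r}")
    return f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}T{ts[8:10]}:{ts[10:12]}:{ts[12:14]}Z"
```

Under this assumption, a field such as mbq_params2 would hold the strip_undesirables form. add_undesirables is only needed when a query or comparison requires the remote format again.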

Start class names with uppercase vs. lowercase
Legacy; botclasses
 * Argument

Switch to uppercase in mirrorPullBot and mirrorPushBot, but leave botclasses alone.
 * Decision

How should the rotation of functions work in mirrorPullBot?
It's important that the pushbot not get stuck waiting. On the other hand, we don't want to always be grabbing stuff only as it's immediately needed.
 * Argument

Check which item is next by mbq_id and go to the corresponding function. But if it's "readytopush", continue with the rotation.
 * Decision
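The rotation decision can be sketched as a small dispatcher. The action names mirrornorcedit-needsrev and mirrornorcedit-readytopush come from the mbq_rc_id2 decision above. The task names in ROTATION, the HANDLERS mapping, and next_task are hypothetical stand-ins for whatever functions mirrorPullBot actually has.

```python
from itertools import cycle
from typing import Iterator, Optional

# Hypothetical pullbot task names; the real mirrorPullBot functions may differ.
ROTATION = ("pull_recent_changes", "pull_log_events", "pull_needed_revisions")

# Hypothetical mapping from the next mb_queue row's action to the pullbot
# function that gets that row ready for the pushbot.
HANDLERS = {
    "mirrornorcedit-needsrev": "pull_needed_revisions",
}


def make_rotation() -> Iterator[str]:
    """An endless round-robin over the pullbot's regular tasks."""
    return cycle(ROTATION)


def next_task(next_action: Optional[str], rotation: Iterator[str]) -> str:
    """Pick the pullbot's next function.

    If the next row (by mbq_id) still needs something pulled, jump straight
    to its handler so the pushbot isn't left waiting. If it's already
    "readytopush" (or the queue is empty, or the action is unknown),
    advance the regular rotation instead.
    """
    if next_action is not None and not next_action.endswith("readytopush"):
        handler = HANDLERS.get(next_action)
        if handler is not None:
            return handler
    return next(rotation)
```

Jumping ahead only for the row the pushbot is blocked on keeps the pushbot moving. Falling back to the rotation means the pullbot still grabs data ahead of time instead of only on demand.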

What do we want to do about mirrored protect events?
Maybe we should show "view source" for remotely protected pages rather than linking to the Wikipedia edit page? On the other hand, full protection is pretty rare. We'd have to figure out a way to inform users of the protection: an ApiMirrorProtect.php or something, making changes to page_restrictions, with a pr_mt_push_timestamp field. What a pain in the neck, and it would probably be so seldom used that it would be glitchy. It's definitely not part of the minimum viable product.
 * Argument

Let's let the users decide what they want; leave it out for now. Get the paid devs to implement that sometime in the future if we really want it that badly.
 * Decision

What do we want to do about mirrored patrol events?
Patrolling sucks. But apparently, people want to do it. Not sure if I want to switch on that feature or not. Those red exclamation points are annoying.
 * Argument

Let's let the users decide what they want; keep it switched off by default.
 * Decision

Should we send the file as a string or as an upload?
It sucks to have to create a temporary file; that's messier. But what other option is there? We haven't figured out how to send it as binary or whatever. Also, if we want to send a big chunk as a string, isn't that a lot of memory? Maybe not; maybe it's no worse than saving a big revision.
 * Argument

Try to do the string option.
 * Decision
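Either way, the file doesn't have to touch disk. This sketch builds a multipart/form-data body from bytes already held in memory. That is the encoding MediaWiki's action=upload expects for a direct file upload, with the contents in a part named "file". The function name is hypothetical, and a real request would also need the usual parameters such as filename and a CSRF token.

```python
import uuid
from typing import Dict, Tuple


def build_upload_body(params: Dict[str, str], filename: str,
                      data: bytes) -> Tuple[bytes, str]:
    """Build a multipart/form-data body in memory, with no temporary file.

    Returns (body, content_type). Ordinary parameters become text parts;
    the file bytes go in a part named "file".
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in params.items():
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="file"; '
                   f'filename="{filename}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The result could then be posted with urllib.request.Request(api_url, data=body, headers={"Content-Type": content_type}), where api_url is the target wiki's api.php. This approach still holds the whole file in memory at once, which is the memory concern raised above.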

Should we use fail-fast or make it fault tolerant?
Fail-fast is needed because future rows are dependent on correct data in past rows. However, the mirroring could go down for long periods during troubleshooting.
 * Argument

Use fail-fast.
 * Decision
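Here is a minimal sketch of fail-fast queue processing. Rows are handled in mbq_id order, and the first failure halts everything after it instead of being skipped. The row shape, QueueHalted, and process_fail_fast are hypothetical.

```python
from typing import Callable, Iterable, Tuple

Row = Tuple[int, str]  # (mbq_id, action); hypothetical shape


class QueueHalted(Exception):
    """Raised when a row fails; mirroring stops at that mbq_id."""

    def __init__(self, mbq_id: int, cause: Exception):
        super().__init__(f"mirroring halted at mbq_id {mbq_id}: {cause}")
        self.mbq_id = mbq_id
        self.cause = cause


def process_fail_fast(rows: Iterable[Row], handle: Callable[[Row], None]) -> int:
    """Handle rows in mbq_id order; stop at the first failure.

    Later rows depend on earlier rows being correct, so nothing after a
    failed row is attempted. Returns the number of rows handled.
    """
    done = 0
    for row in sorted(rows):  # tuples sort by mbq_id first
        try:
            handle(row)
        except Exception as exc:
            raise QueueHalted(row[0], exc) from exc
        done += 1
    return done
```

The exception carries the failing mbq_id, which gives whoever is troubleshooting the outage a definite place to resume from.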