Talk:Requests for comment/Page deletion

To clarify, do these proposals call for the archive table to be removed and for deleted revisions to be kept in the revision table? Leucosticte (talk) 18:02, 23 October 2012 (UTC)

Conflicts
"CON: unique key conflicts with live pages and deleted pages with the same name" Is that a bug or a feature? It seems to me that we want to keep the live and deleted pages with the same name under the same unique key, unless there's a reason to split them up. I propose that there will be a field, e.g. a modified log_params or the proposed rev_logid, that will make it possible to easily separate them again, e.g. for deletions/undeletions of a group of edits. What specific issues/difficulties are people concerned about this "con" causing? Leucosticte (talk) 06:41, 19 October 2013 (UTC)

New table more secure by default
PRO: page JOIN already done in core. Also, most places want to join to get the page for other reasons anyway, so this has some "secure by default" nature to it. Aaron 20:23, 27 October 2011 (UTC)
 * How big of an issue is it? Don't these same issues arise with, say, the methods that allow raw revisions to be retrieved instead of seeing whether the user is supposed to have access to them? It doesn't seem like too much to ask that the extension writers select only the data that they're supposed to select given security concerns, and that system administrators not install extensions that have security flaws. Leucosticte (talk) 07:58, 19 October 2013 (UTC)

Page data
CON: What happens to the old page entry? Moved to another table? Aaron 20:23, 27 October 2011 (UTC)
 * What data are you trying to save from the page table? Presently, the only page table fields stored in the archive table are ar_namespace, ar_title, and ar_page_id (see Manual:Archive_table). With the new field scheme, ar_page_id would be obsolete (rev_page would suffice because the page ID wouldn't change across deletions and restorations). Nor do the namespace or title change. So it seems to me that there is nothing more to store.


 * Thus, the page row can just hold the new page's data or, if everything is still deleted, the old page's data. But if some data does need to be saved, perhaps it could be included in the serialized data in the logging table (see Manual:Logging table)? Or, if that's unsuitable, we could add another logging table field.


 * The proposed rev_logid would link to the log_id of the logging table row containing that metadata. I'm not sure what need there is for a whole new table; what would you use it for, that a field would be unsuitable for? Are there a lot of queries we anticipate running that would be more efficient if there were a new table instead of just a serialized field? Admittedly, it might be cheaper to query a relatively small table of deleted pages (what would we call it? deleted_page?) than the logging table, although I don't think it would be much cheaper. Leucosticte (talk) 07:58, 19 October 2013 (UTC)

Delete revisions on page deletion?
With either approach, we could mark all revisions as deleted, or rely on the page change to prevent them from being accessed. The latter favours fast deletion and undeletion. The former aims for consistency. However, deleting one revision, then deleting the full page and undeleting it should keep that revision gone...
 * This would be a bad idea: if some revisions of the page were deleted (e.g. for legal reasons), these would inadvertently get restored when deleting and restoring the page. If we leave the revisions as they were, deleting and restoring the page will keep the deleted revisions intact. Duesentrieb ⇌ 08:36, 8 August 2013 (UTC)
 * I disagree; we could add another field, rev_logid, for the log_id of the deletion event. Then upon restoring the page, only the revisions pertaining to the pertinent deletion event would be restored. So, suppose you revision delete (log_id 1). Then you delete the whole page (log_id 2). Then you undelete the page. You only restore the revisions that have rev_logid 2, and leave the rev_logid 1 revisions deleted. Leucosticte (talk) 05:23, 7 October 2013 (UTC)
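
The log_id 1 / log_id 2 scenario above can be tried out in a small sketch. This is only an illustration under assumptions: it uses Python's sqlite3 module with a heavily simplified revision table, the log-ID field (called rev_logid elsewhere on this page) is a proposal rather than an existing column, and the helper names delete_revisions, undelete and visible are made up for the example.

```python
import sqlite3

# Hypothetical, simplified schema: the real revision table has many more
# columns, and rev_logid exists only as a proposal in this discussion.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    " rev_id INTEGER PRIMARY KEY,"
    " rev_page INTEGER NOT NULL,"
    " rev_logid INTEGER)"  # NULL means the revision is currently visible
)
conn.executemany(
    "INSERT INTO revision (rev_id, rev_page) VALUES (?, 1)",
    [(1,), (2,), (3,)],
)


def delete_revisions(log_id, rev_ids=None):
    """Tag currently visible revisions of page 1 with the deleting log event."""
    sql = "UPDATE revision SET rev_logid = ? WHERE rev_page = 1 AND rev_logid IS NULL"
    params = [log_id]
    if rev_ids is not None:
        # Restrict to the listed revisions (a revision deletion event).
        sql += " AND rev_id IN (%s)" % ",".join("?" * len(rev_ids))
        params += rev_ids
    conn.execute(sql, params)


def undelete(log_id):
    """Restore only the revisions that were hidden by the given log event."""
    conn.execute("UPDATE revision SET rev_logid = NULL WHERE rev_logid = ?", (log_id,))


def visible():
    """Return the IDs of revisions that are not tagged with any deletion event."""
    rows = conn.execute(
        "SELECT rev_id FROM revision WHERE rev_logid IS NULL ORDER BY rev_id"
    )
    return [row[0] for row in rows]


delete_revisions(1, [2])  # revision deletion of rev 2 (log_id 1)
delete_revisions(2)       # page deletion of everything still visible (log_id 2)
undelete(2)               # undelete the page
print(visible())          # rev 2 is still tagged with log_id 1, so it stays hidden
```

Because the page deletion only tags rows that are still visible, the earlier revision deletion keeps its own log ID and is untouched by the later undelete.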

Semi-deletion, aka pure wiki deletion
It seems to me that the field option would work pretty well for semi-deletion, aka pure wiki deletion. Semi-deletion assumes that the average user will still want to routinely look at revisions of articles that have been removed from the AllPages listing for being not yet ready for prime time, e.g. because the information hasn't been verified or whatever. Collaboration can continue on these pages, e.g. if the user improves the article to the point that it is now ready to be undeleted. But he has to be able to see the old revisions in order to build upon the work that was done earlier rather than starting from scratch. Leucosticte (talk) 09:07, 19 October 2013 (UTC)

Moved from bugzilla:55398
Moved from 55398: Aaron: Would you consider it a bug or a feature that:
 * 1) When a page is deleted and then restored, it gets a new page ID;
 * 2) When a page is deleted and then recreated (i.e. a new page with the same page title is created), the new page has a new page ID (rather than the same page ID as the deleted page); and
 * 3) When a page "foo" (page_id 1) is deleted, and then a new page with the same page title "foo" (page_id 2) is created and then deleted, and then a new page with the same page title "foo" (page_id 3) is created, these three revision histories have different page IDs (in rev_page and ar_page_id)?

The "new field" proposal would change all three of the above, for good or bad. Any page recreations or restorations would put the revisions under the same page ID as the deleted page with the same page title. Thus, once a revision has a certain page ID, it will have that page ID forever. In this way, revisions deleted from a page that remains active (i.e. a revision deletion event) will be treated the same way as revisions deleted along with all the other revisions in the page (i.e. a page deletion event).

Relevant questions would be, what inconveniences are posed by having (a) page and (b) revision page IDs for a page title change with recreations and restorations; and what inconveniences are posed by having those page IDs *not* change? For example, are references to those page IDs stored in other database tables (of the core or extensions), so that those fields would need to be updated too when creation, restoration and deletion events occur? Are there some bots or other third-party tools that store page IDs and make API queries using them, whose work would be easier if the page IDs stayed the same? It might sometimes be desirable to query by page ID rather than page title, since page titles can change when pages are moved.

Despite all these revisions having the same page ID, it would always be possible to undo a deletion or undeletion event easily, because the revision IDs of the group of revisions deleted/restored in a log event would be stored in a logging table field (e.g. log_params). If it were desired to split off some revisions from the page and move them to another page's revision history, that could be done too, using that same data; and it could be undone just as easily.

So, for instance, suppose a vandal moves "foo" to "bar" and then the page is deleted; then "bar" is recreated, so that the two revision histories share a page ID. The "foo" revisions could be moved back to "foo" using the data in log_params. Also, because the logid of the deletion event would be stored in the revision table (in the indexed field rev_logid), one could easily select just those rows.
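
A rough sketch of how the "foo"/"bar" split could work is given below. It is not how MediaWiki stores anything today: the dictionaries stand in for the page, revision and logging tables, JSON is used for the serialized log_params purely for convenience, and the function names and the "rev_ids" key are hypothetical.

```python
import json

# Hypothetical in-memory stand-ins for three tables.
page = {1: "bar"}                        # page_id -> page title
revision = {10: 1, 11: 1, 12: 1, 13: 1}  # rev_id -> rev_page
logging = {}                             # log_id -> serialized log_params


def log_deletion(log_id, rev_ids):
    """Record which revisions a deletion event covered."""
    logging[log_id] = json.dumps({"rev_ids": sorted(rev_ids)})


def split_revisions(log_id, new_page_id, title):
    """Move exactly the revisions recorded for one log event to another page."""
    page[new_page_id] = title
    for rev_id in json.loads(logging[log_id])["rev_ids"]:
        revision[rev_id] = new_page_id


# Revisions 10 and 11 were written as "foo", moved to "bar" by the vandal,
# and then deleted in log event 7.
log_deletion(7, [10, 11])
# "bar" was recreated with revisions 12 and 13, sharing page ID 1.
# The old "foo" revisions are now split off again using the logged IDs.
split_revisions(7, 2, "foo")
print(revision)
```

The same logged list could be replayed in the other direction to undo the split, which is the reversibility argued for above.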

What are some examples of scenarios that would involve "title uniqueness annoyances" if the new field proposal were implemented? User:Leucosticte 2013-10-22 05:29:10 UTC
 * It's already the case that “When a page is deleted and then restored, it gets a new page ID”. Requiring the new page to have a different page_id from the deleted one is fine and causes no problem. That the two threads of deleted pages have distinct page_ids actually looks like a feature (but note that they would need to be merged if restored). User:Platonides 2013-10-26 15:30:33 UTC
 * It seems to me that as a general principle, it's better not to have primary keys change when they don't need to. In the case of pages, the page title can change when the page is moved and the page ID can change when the page is deleted and then undeleted, so there is no stable identifier of a particular page. This is a problem, e.g., when a template needs to refer to a particular page, because the template will have to be revised either when the page is moved (if reference is made to page title) or when the page is deleted and undeleted (if reference is made to page ID).


 * I haven't seen a whole lot of use cases when this issue arises, but one example would be the exclude parameter at Extension:BedellPenDragon, which uses the page ID because it's matching it up against . Also, uses  and  rather than a watchlist.wl_pageid, probably because of the issue of page deletions/undeletions changing the page ID. This causes some quirky issues occasionally, e.g. what's described at https://encyclopediadramatica.es/Vandal/How-to#Wikia_Pagemove_Vandalism . Maybe the solution would be to implement the field proposal and then allow a watchlist row to be (1) the page ID or (2) either a page ID or a namespace and page title. I haven't fully researched yet what implications this might have, though. The second idea might just over-complicate things; the first is probably superior. Leucosticte (talk) 16:32, 20 November 2013 (UTC)

Undeletions
I was looking at the queries for undeletions. Currently, we have to insert the restored rows to the revision table and delete them from the archive table. Under the "new table" system, if someone deletes a page and then undeletes it, no changes will need to be made to the revision table. The new page will have the same that is already in.

But suppose there's a scenario in which a page is deleted, then recreated, then deleted, etc. and then all the revisions are undeleted. It would be necessary to change rev_page on the older revisions to the new page_id, so we'd still be doing some updates to the revision table. The same applies to situations in which a page is deleted, then recreated, and then the revisions from the deleted page are restored into the revision history of the existing page; rev_page needs to be updated then too, to the new page_id.
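
To make the cost of that case concrete, here is a minimal sketch of the rev_page update, assuming a stripped-down revision table in sqlite3. The page IDs 5 and 9 and the helper restore_into are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER NOT NULL)"
)
# Revisions 1 and 2 belong to the deleted page (old page_id 5);
# revision 3 belongs to the recreated page (new page_id 9).
conn.executemany("INSERT INTO revision VALUES (?, ?)", [(1, 5), (2, 5), (3, 9)])


def restore_into(old_page_id, new_page_id):
    """Repoint the deleted page's revisions at the live page; return rows changed."""
    cur = conn.execute(
        "UPDATE revision SET rev_page = ? WHERE rev_page = ?",
        (new_page_id, old_page_id),
    )
    return cur.rowcount


changed = restore_into(5, 9)
print(changed)  # number of revision rows that had to be rewritten
```

Every restored revision costs one row update here, whereas a scheme in which the page ID never changes would need no rev_page update at all.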

As long as we're already making updates, maybe along the way we would want to also update rev_deleted; there could be a const DELETED_PAGE = 16; if that would be useful. Then it would be possible when doing selects on that table to find out whether the revision is from a deleted page or not, without having to also query the page or deleted_page table. This could be helpful for the queries in SpecialContributions, if we want to merge it with SpecialDeletedContributions.
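
For what it's worth, the extra bit would fit next to the existing rev_deleted flags as sketched below. The first four values are the ones MediaWiki's Revision class defines; DELETED_PAGE = 16 is only the suggestion made here, and the three helper functions are hypothetical.

```python
# Existing rev_deleted bits, as defined in MediaWiki's Revision class.
DELETED_TEXT = 1
DELETED_COMMENT = 2
DELETED_USER = 4
DELETED_RESTRICTED = 8
# Proposed in this thread only; not an existing MediaWiki constant.
DELETED_PAGE = 16


def mark_page_deleted(rev_deleted):
    """Set the page-deleted bit without disturbing the other flags."""
    return rev_deleted | DELETED_PAGE


def unmark_page_deleted(rev_deleted):
    """Clear the page-deleted bit, keeping any revision-deletion flags."""
    return rev_deleted & ~DELETED_PAGE


def is_from_deleted_page(rev_deleted):
    """Check the bit without needing to query the page table."""
    return bool(rev_deleted & DELETED_PAGE)


flags = DELETED_TEXT               # text was already hidden by revision deletion
flags = mark_page_deleted(flags)   # page gets deleted
print(is_from_deleted_page(flags))
flags = unmark_page_deleted(flags) # page gets undeleted; text stays hidden
print(flags == DELETED_TEXT)
```

Since the bit is independent of the others, a revision that was hidden before the page deletion keeps its flags after the page is restored.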

My guess, though, is that more commonly it will be a scenario of undeleting the most recently deleted page without there being a recreated page to deal with. So we could do a separate query to just select and update those rows with rev_page that need to be updated. Anyway, the "new field" option would probably be more efficient because the recreated pages would all have the same page_id, so it wouldn't be necessary to update the revision table when the revisions are restored. Leucosticte (talk) 01:07, 20 December 2013 (UTC)