Extension talk:PureWikiDeletion

To do list

 * Cause unblanked pages to show up in Special:NewPages
 * Have options in Special:Preferences for automatic watchlisting of pages that the user blanks or unblanks
 * Any API stuff?

Empty vs. #DELETED
Have you considered empty rather than a special string like #DELETED? Ashley Y 08:45, 10 March 2010 (UTC)
 * It's a possibility. Some editors have pointed out that #DELETED makes it clear that the intent is to delete rather than vandalize, but a vandal could just as easily replace all content with #DELETED. It saves keystrokes to use empty too. On the other hand, if we use #DELETED, then blanking is unnecessary; we can just add a #DELETED tag to the beginning of a page and leave the rest of the content, so that people don't have to dig through the history to find what was there. Then again, that possibility also exists with #REDIRECT, but in practice almost everyone replaces all the existing page content with #REDIRECT. Tisane 23:19, 10 March 2010 (UTC)

Blanking should be logged?
The current spec says: The blanking of a page already appears on watchlists and recent changes, so I think this is redundant and unnecessary. However, it would probably be useful to be able to get a list of all blanked pages and filter and sort it by user and time of blanking, as we do with the deletion logs. So, I will create a special page with that functionality, and I figure out a way to filter the blanked pages out of AllPages. (Right now it shows up in AllPages as redlinks.) Tisane 09:24, 11 March 2010 (UTC)
 * The replacing of all content on a page with "#DELETED," or the removal of "#DELETED" from a page should be logged, and listed on watchlists and recent changes like the other logs (the move log, the protection log, etc.)

Are all these features necessary?

 * Blank pages will not show up in searches using Special:Search, random pages using Special:Randompage, or the list of all pages using Special:Allpages.
 * The URL prefix to a blank page should be listed in robots.txt, and the page will emit "noarchive" and "noindex" and "nofollow" tags, to prevent caching in search engines.

I can see why people wouldn't want it to show up in RandomPage or AllPages (although I haven't figured out yet how to implement that). But given that these pages are blanked, won't the external search engines stop directing people there anyway? Also, it still shows up in the internal search as "page title matches," but as a redlink. Maybe it's not so bad for it to continue showing up there. If it's a search term people are looking for a lot, maybe the article should remain there, or at least be a redirect? Maybe it should show up in search accompanied by the edit summary explaining why it was blanked. Tisane 12:42, 13 March 2010 (UTC)

htmlentities
OK, is that good enough to clear up the XSS issues? Tisane 17:56, 13 March 2010 (UTC)

Code review
Since there is some interest in getting this used on Wikipedia, I'll provide some initial code review, though it will still need review from a staff developer before it can be enabled.

setup code
'version' => '0.0.0', If you aren't going to give a useful number, you might as well just leave it blank.

$wgGroupPermissions['*']['PWD']   = false; It isn't necessary to explicitly set it to false, as that's the default. Also, this right doesn't seem to do anything.

function PWDSaveHook
global $wgOut; This isn't used anywhere

require( "extensions/PWD/PWDConfig.php" ); All configuration should be done in LocalSettings

$con = mysql_connect($yourHost,$yourUsername,$yourPassword); You should use the wiki database rather than creating your own, see Manual:Hooks/LoadExtensionSchemaUpdates for how to set up the database using update.php. MediaWiki also has its own classes for dealing with the database. See Manual:Database access

$sql=sprintf("INSERT INTO blanked_pages (blanked_page, blanking_user, blanking_timestamp, After you switch to using MediaWiki's database classes, you should use the wrapper functions for queries, as these automatically handle quoting and escaping

mysql_real_escape_string($article->getTitle)); Article::getTitle returns a Title object, not a string. As written, it should use $article->getTitle->getPrefixedDBkey, but it should really be storing the title and namespace index separately or just the pageid.

function PWDLink
global $wgRRAnonOnly, $wgUser,$wgOut,$wgDBserver,$wgDBuser,$wgDBpassword,$wgTitle, $wgServer, $wgScriptPath;

AFICT, $wgRRAnonOnly doesn't actually exist and none of these are used.

$PWDActivated=true; if ($PWDActivated==false){ $PWDActivated will never be false because you set it to true in the line directly above. Also, you should use either !$PWDActivated or $PWDActivated===false, depending on whether you need type checking as well

if( $target->isKnown ) { return true; } else { return true; } Just "return true" is shorter.

if ( in_array( 'known', $options ) ) { The existence check may not have been done at this point. You should probably use the LinkEnd hook.

function PWDEditHook
$date = date('M d, Y H:i:s', mktime($hour, $minute, $second, $month, $day, $year)); You should pass the timestamp directly to wfTimestamp

$user_url=$wgServer.$wgScriptPath.$URLStructForUserPage.$userPage; Use $wgUser->getSkin->link to make a link. Same for all the other links

$output=" A former version of this page was blanked by ".$user_HTML This should set a message in the i18n file and call it with something like wfMsg

parser function functions
These should call and check $parser->incrementExpensiveFunctionCount since they need to do a database query for each usage.

return htmlentities($param2); You shouldn't escape the output from the parser functions, as it still needs to go through the parser

PWD_body.php
p_ID int NOT NULL AUTO_INCREMENT, This should use the same prefix "blanked_" as the other fields

$sql="CREATE INDEX userind on blanked_pages (blanking_user)"; As far as I can see, "blanked_page" is the only thing used in a WHERE condition, so none of the other fields need an index.

PWDConfig.php
As noted, all configuration should be done in LocalSettings, and it shouldn't use its own database.

date_default_timezone_set ( 'America/Denver' ); This shouldn't be set in an extension unless there's a really good reason

Code review discussion
You may want to look over pages like Manual:Coding conventions. Mr.Z-man 05:02, 14 March 2010 (UTC)
 * Thanks for the code review; I will get to work on these items immediately. Tisane 07:19, 15 March 2010 (UTC)

Known bug
There is a known bug involving the improper use of $wgUser that causes the user and talk pages on the edit screen to be incorrectly linked, but it will be fixed in version 1.0.0. Tisane 02:53, 19 March 2010 (UTC)

Draft code posted
I posted some draft code. Have at it, Kim. If you start doing a major overhaul, you may wish to use the Template:Inuse. And feel free to add yourself to the introductory comments as a co-author. Tisane 03:50, 19 March 2010 (UTC)
 * Hmm, you don't have svn access? --Kim Bruning 21:47, 20 March 2010 (UTC)
 * I applied for it about a week into my programming endeavors, but didn't get a response so I figured I'd wait a little while before making further inquiries. Tisane 07:25, 21 March 2010 (UTC)

More Code Review
Just a point about your SQL schema. Fields within a table should use a consistent prefix; currently some of the fields have "blanked" and some use "blanking". The prefix should also be short; I'd suggest something like "blank". Also, timestamps should be varbinary(14), and ids should be UNSIGNED INT:

Happy ‑ melon 13:14, 19 March 2010 (UTC)

I had to moved the UNSIGNED around to get it to work with phpMyAdmin:

Tisane 12:56, 20 March 2010 (UTC)


 * Groovy. Also, I just noticed the CamelCase in parentID; we prefer either blank_parent_id or just blank_parent (the fact that it's an id is obvious from the type).  Happy ‑ melon 18:45, 20 March 2010 (UTC)
 * Do you notice any security issues, or can MaxSem's XSS notice be taken down? By the way, I'm having a bit of trouble with the wfTimestamp and $wguser->getSkin->link changes suggested above. Do you have some code or documentation you can point me in the direction of? Thanks, Tisane 19:25, 20 March 2010 (UTC)
 * Yes. A user Foo" href="http://evil.site blanking pages pirates the user page links AFAICT, as would blanking a page of the same name.  Far and away the easiest way to cure that is to solve the other major problem, which is hardcoded messages with no possibility for localisation.  As you can see, the parser has no problem safely displaying the hostile link, and will safely display any internal page link.  So if you put your notice text in a sytem message in /PWD.i18n.php as:

A former version of this page was blanked by User:$1 (talk) (contribs) on $2 The reason given for blanking was: $3. You may [ view the article's history], [ edit the last version], or type new page into the white space below.
 * And bung that through the parser, it'll be sure to come out safely (or if it doesn't, there are much bigger problems with MW). And the message can be translated into all three hundred languages MW supports.  And to get that HTML back again you need to call


 * Note that the reason will be escaped by the parser; no need for htmlentities. So you don't really need the link* functions.
 * Other notes:
 * You can parse MW timestamps to Unix with wfTimestamp( TS_UNIX, $timestamp ), no need to use string splitting
 * Use $wgTitle->exists instead of the db call in lines 119-122, 251, 275, and all the doesPageExist functions.
 * What is the function of $whatElse on line 96?
 * Implement styling with CSS, not hardcoded styles. So instead of adding $blankLinkStyle etc, add a class and style it in a /pwd.css (lowercase filenames for outward-facing files).  Note that 'blue' is not the default colour for wikilinks; but you shouldn't need to style ordinary known/broken links, only blank ones.
 * Keep up the good work! Happy ‑ melon 20:57, 20 March 2010 (UTC)

Next round
A few more comments: -- Mr.Z-man 22:11, 20 March 2010 (UTC)
 * In PWDLink, you can special-case (i.e. ignore) things that aren't in editable namespaces so that you don't spend time looking up special pages, Media: links and system messages (these are treated as bluelinks even if they don't exist locally)
 * In PWDEditHook, you should use the user's preferred date format. . This will also adjust it if the user has a different timezeone set.
 * In efPWDParserFunction* the correct way to increment the expensive function count is to call $parser->incrementExpensiveFunctionCount - this will return true or false to say whether to proceed. The parser functions should also probably return something, even if the expensive function count is exceeded. I think #ifexist returns false.
 * This can also special-case special pages and the like as noted above.
 * You should also check that $title isn't null, which would indicate an invalid title
 * You could also eliminate the (direct) query on the page table by getting the pageid with $title->getArticleID. It will return 0 if the article doesn't exist. And if the pageid has already been loaded for some reason before, it may not have to do a DB query.
 * In the DB schema, you don't need to add _ind after all the index names. Unless its an index on multiple columns, you can just use the column name.
 * It also isn't necessary to have an index on columns that will never be used in a WHERE or JOIN clause - these take up unnecessary space and can slow down writes.
 * You may want to split the hook functions out to a _body.php or .hooks.php file and make them static functions of a class. Then you can load the class with $wgAutoloadClasses and call the static functions in the hooks. This is sort of the de facto preferred style for extensions (at least for longer functions).
 * You should also hook on ArticleDeleteComplete to remove deleted pages from the blanked_pages table.

Combined response to Happy-melon and Mr. Z-man
$html = wfMsgExt( 'pwd-blanked', 'parse', array( $blank_user_name, $date, $blank_summary ) ); $editPage->editFormPageTop .= $html; What I get is this string: "<pwd-blanked>" rather than the actual message. Any insight into what I'm doing wrong? Tisane 13:31, 21 March 2010 (UTC)
 * When I put:
 * When I put:
 * When I put:
 * When I put:
 * When I put:
 * When I put:
 * When I put:
 * When I put:
 * Worksforme. What version of MW are you using?  If pre-1.16, you need to call wfLoadExtensionMessages('PWD')</tt>. Happy ‑ melon 17:57, 21 March 2010 (UTC)
 * whatElse is used to make sure that the function doesn't attempt to turn links to history, contributions, etc. into edit links. Is there a better way?
 * I'm sure there is; currently the method breaks on links like [ changing languages] or [ adding editintros]. Happy ‑ melon 17:57, 21 March 2010 (UTC)
 * CSS: I wasn't able to figure out how to implement this in a situation in which the style is changed through $customAttribs['style']. Any more specific suggestions?
 * You shouldn't need to have a config setting for styles; you just apply default styles with CSS and wiki admins can customise them in MediaWiki:Common.css as desired. Happy ‑ melon 17:57, 21 March 2010 (UTC)
 * Split hook functions out to hooks.php file and make them static functions of a class: I wasn't able to figure this out either. If you show me, or point me in the direction of a simple extension that implements this, I can document it at Manual:Parser functions for the next person.
 * See, for instance, AbuseFilter.hooks.php and the hook declarations in AbuseFilter.php. Happy ‑ melon 17:57, 21 March 2010 (UTC)
 * Filtering by summary for anything except possibly exact matches and a prefix search would likely be too inefficient for use on Wikimedia. This is why there's no system for searching edit summaries and log summaries in core MediaWiki. For the messages, you'll need to call  in the function that calls the message, though I believe this is no longer required in 1.16. For the CSS, you wouldn't change the style attribute, you would change the class attribute and load a CSS file. For an examples of splitting the hook functions, AbuseFilter is set up like that. Mr.Z-man 17:20, 21 March 2010 (UTC)

Response
By the way, do you think it is necessary to have a blank_id primary key or can we just use blank_page_id? No pages should be showing up twice in the table, but I don't know if there are other uses of blank_id that we might anticipate. Thanks, Tisane 23:40, 21 March 2010 (UTC)
 * Still working on CSS.
 * Still working on CSS.
 * Still working on CSS.
 * Still working on CSS.
 * Still working on CSS.


 * Blank_id would be useful if it were a history table: one that preserves rows and is only added to, like revision</tt> or logging</tt>. So if you were using it as a log of who blanked pages when, it would be useful to have an id to refer to.  You're only currently using it as an attribute table; rows are deleted when the page is unblanked and so things like the summary are lost on unblanking (which might be a concern in itself).  For transparency, the log of blankings should probably be permanent; probably the easiest way to do that is to log the action, user, timestamp, and summary to the logging</tt> table and store the log_id in the blanked_pages table.
 * Essentially, if the table is a permanent record, it should have an id field; if it's transient (which it currently is) it doesn't need one. Happy ‑ melon 00:04, 22 March 2010 (UTC)
 * I've thought of a couple ways permanency could be implemented. There could be a boolean field to indicate whether the page is currently blanked, and the rest of the table could be kept the same. Then each page would still have a maximum of one row in the table, but we would also still lose the historical data of past blankings/unblankings. I'm not sure whether that change accomplishes anything, beyond making sure that a particular blank_id permanently refers to the same page. In which case, blank_id can still be dispensed with and blank_page_id used instead as primary key.


 * Or (preferably) it could be a log, as you suggest, and the PWDLink function would either check for something like that above-mentioned boolean field to see if the page is still blank, or it could look up the most recent revision text to see if it's empty, or it would see which log entry (blanking or unblanking) is the most recent. I agree transparency is good because some blanked/unblanked pages may end up getting deleted eventually, and the only record non-sysops will be able to access is the log. I'll probably take a closer look at Extension:GlobalBlocking to see how new types of log entries were added to Special:Log.


 * By the way, should we be concerned about page blankings triggering Extension:AbuseFilter with lots of "references removed," etc. tags? What about oversighting - do we need to provide for the purging of blanking logs? Tisane 02:43, 22 March 2010 (UTC)


 * AbuseFilter filters can and will be rewritten with PWD in mind, and there's no reason why they shouldn't trigger on blankings if they want to. Oversight will definitely be needed, especially if the log is permanent.  If you use the logging</tt> table, oversight is already built in; otherwise you'll need to implement it yourself, which would be a pain.  If you did it that way, you can just use your page_blanked</tt> table as an attribute list; the logs for the page would contain a permanent record of who blanked what, when, and why.  Happy ‑ melon 12:12, 22 March 2010 (UTC)


 * I may have been a little confusing when I talked about returning something if the expensive function count is exceeded. I meant that ifexist returns the "false" condition (i.e. acts as if the page doesn't exist) if the count is exceeded, not that it actually returns false. Mr.Z-man 04:04, 22 March 2010 (UTC)
 * OIC. Either way, the user of the page is kinda hosed if the count is exceeded. Tisane 05:11, 22 March 2010 (UTC)

Logging
OK, I added logging but there are a couple issues. It shows up in Special:RecentChanges but as two separate entries - the edit and the blank log entry. Should it only show up once? Also, I haven't figured out yet how to get the log entries to actually display in Special:Logs, although the blank log does show up in the drop-down menu. I did verify that the log entries are being added to the  table. Tisane 03:17, 23 March 2010 (UTC)
 * You need to add a handler to $wgLogActionsHandlers</tt>; in much the same way as hooks. You could find a hook which would let you set the RC_SUPPRESS flag on blank edits so they didn't show up in RC duplicated (or on the log entries, which might be easier).  Happy ‑ melon 11:38, 23 March 2010 (UTC)

Now just the blank logs show up, and the blank log works properly. Do we really need a blanked_pages table, by the way, or would it be just as well to check page blankness with: Is it less expensive to use a blanked_pages table? Also, I seem to have solved half of the caching problem by using  whenever an article is blanked. (Hopefully that won't have any unintended consequences.) But I haven't figured out how to solve the other half, which is making links to articles turn blue again after the article is unblanked. If article 'foo' links to blank article 'bar', and 'bar' is then unblanked, 'foo' will still have a red link to 'bar' until 'foo' is edited. Tisane 12:36, 24 March 2010 (UTC)
 * You'll need a blanked_pages table if you want to add a list of blanked pages. You should have a look at what the undelete interface does to avoid caching issues. Happy ‑ melon 20:01, 24 March 2010 (UTC)

This code did the trick: Tisane 20:27, 24 March 2010 (UTC)

I notice that it only logs logged-in users' blanks/unblanks; blanks/unblanks done by IP addresses go unlogged. I guess we could limit blanking/unblanking to logged-in users. I don't like that for policy reasons, but it's not unheard-of to limit the rights of non-logged-in users; they can't upload or move pages, for instance. In fact, I don't think they can do any of the actions that are logged events. Tisane 01:46, 25 March 2010 (UTC)

I used ArticleSave to tell non-logged-in users "You must be a registered user and logged in to blank or unblank a page" and prevent the edit from being consummated. But it also gives them an edit conflict notice, which is inappropriate. Also, I was hoping that a way could be found to not have to test the same data twice; if there were a way to pass data from the ArticleSave function to the ArticleSaveComplete function, it would only be necessary to test that data once. However, it appears that flags are done through a bitfield that doesn't allow other flags (e.g. BLANKING and UNBLANKING) to be made up. Also, the messages break when the ArticleSave function is moved from PWD.php to PWD.hooks.php. I was hoping to put all those hooks in PWD.hooks.php. Tisane 03:27, 25 March 2010 (UTC)
 * What's the $messages</tt> global in PWDSaveHook? Happy ‑ melon 11:17, 26 March 2010 (UTC)
 * Another forgotten remnant of fruitless debugging attempts that has no business remaining in the code. Good catch. Tisane 14:29, 26 March 2010 (UTC)

NewPages
It has been suggested at w:Wikipedia:VPPR that pages, as they are unblanked, should show up in Special:NewPages. Is this necessary/desirable, given that such pages already show up in the blank log? Undeleted pages don't show up in NewPages, although I can see why people would view that as not being a big deal since only sysops can undelete pages. Page moves from userspace to mainspace don't show up in NewPages either, though; isn't that also a loophole by which people can evade NewPages? I guess people want the ability to see which unblankings have been patrolled and which haven't. Tisane 04:22, 25 March 2010 (UTC)
 * Since undeletions don't show up in Special:NewPages, I don't think there's a need for it. There's no guarantee an unblanked page will be the same as the previous though, so it may be a good idea to provide a configuration setting in case an individual wiki does want unblanked pages to show in Special:NewPages. Reach Out to the Truth 15:17, 27 March 2010 (UTC)

AllPages
It looks like it's going to be a pain in the neck to implement Special:AllPagesExcludeBlank by modifying code from the SpecialAllpages class, which implements Special:AllPages. I expect that the easiest way would be to add some $join_conds to the database queries to exclude those pages that are in the blank_table. But SpecialAllpages uses a bunch of functions like estimateRowCount and selectField that don't have $join_conds as a parameter. selectField shouldn't be a big problem; I can probably work around that using selectRow (which has join_conds). But estimateRowCount?

This would be a lot easier if the page table had a boolean field indicating blankness; then I could just add that to $where. I'm not sure it would be worth the expense to make an extra write to the database every time a page is blanked/unblanked, though. Hmm, I guess blankings/unblankings will probably be sufficiently rare that it won't matter that much, eh? Are there any other disadvantages of adding a field to the page table? Tisane 08:23, 26 March 2010 (UTC)


 * The crippling cost of trying to alter a table that already has thirteen million rows in it will mean the extension will probably never be installed on enwiki or other large WMF wikis. Is that a disadvantage?  :D</tt> Happy ‑ melon 11:15, 26 March 2010 (UTC)
 * Didn't they add a whole field, page_random, just for the sake of Special:Random? Maybe it'll be an easier sell after PWD is merged into the core... Tisane 14:31, 26 March 2010 (UTC)
 * Special:Random is a core feature, and was added in 2003 when enwiki was about 1% of its current size. Extensions should not generally make changes to core tables. Happy ‑ melon 23:24, 26 March 2010 (UTC)