User talk:Pinkgothic/DPL

"If your wiki encourages non-Template page transclusion, you'll want to run refreshLinks.php as a cron job (crontab -e, enter 0 * * * * /path/to/your/mediawiki/maintenance/refreshLinks.php for an hourly run). This is still a better solution than using the job queue for this purpose. Seems like better advice would be to just run the job queue via runJobs.php and disable running jobs from web interface. Also, wouldn't simply using the dplcache parameter accomplish everything you do here, but without the side effects of making the job queue not work for non-templates? Bawolff 21:12, 28 January 2011 (UTC)


 * Hi, Bawolff, sorry for the late reply, it didn't notify me of the change to the discussion page ! (Found out why; apparently I set my preferences not to ages ago. Epic fail! *harakiri*)
 * We use the cache parameter on all our DPLs, it doesn't make a difference to this scenario. mediawiki seems to bypass that setting entirely when it rebuilds the links.
 * Regarding your  comment, that may actually be necessary.* The if-statement comes with the downside that the main link DPL generates is also lost (the not-(transclusion) one). :[ I'm still trying to figure out exactly why that is or how to fix it. I think that might be a case of the cached-results setting actually being a bad idea. v_v I'm going to see what I can find out there.
 * Point being, refreshLinks is unfortunately pointless. I'm not sure why that is, but it doesn't seem to do anything at all. :|
 * * The reason I didn't suggest, by the way, is because the jobs put into the Job Queue are largely null edits to begin with. My aim was to get rid of the unnecessary overhead.   on our wiki would run for about seven minutes and then be done. A   cron would... actually probably never terminate, since at least during mid-day we'd get in jobs faster than the jobs were finishing (back when I was trying to figure this problem out, I was manually triggering   to see if taking jobs off their request-trigger was speeding things up considerably (it was!), that ran absolutely forever - it did terminate, but usually well after running for an hour). Additionally, running other jobs, like "proper" transclusion (read as: templates) I still wanted to have web-triggered.
 * -pinkgothic 10:33, 10 February 2011 (UTC)


 * Turns out the problem isn't my fix (directly). Commenting the if-Statement back out still has the same issue with newly created pages - but a DPL-query page -Job evidently eventually fixes it. So each time an established page was edited, we got the new pages in. Now we just get the effect when we null-edit the DPL-query page(s). &action=purge does nothing, interestingly enough. *headdesk*
 * Guess it's back to the drawing board. -pinkgothic 11:03, 10 February 2011 (UTC)
 * I'm not really sure how DPL does caching, but I do know it invents its own way instead of using the mediawiki way, and thus its possible its caching might not be implemented that well (it looks as if it wouldn't work well on multi-server setups, it looks kind of scary when considered with apache content negotiation). refreshLinks.php doesn't render extension hooks (including DPL), which is why it might be faster then the jobQueue. The jobQueue does some other stuff to (like email notification) which might slow it down compared to refresh links, but really should not. Bawolff 13:41, 10 February 2011 (UTC)

* nods* "refreshLinks.php doesn't render extension hooks (including DPL)" - That'll be why it doesn't heal the DPL query pages, and indeed why it's faster. (After doing some math,  on our wiki, if it loaded extension hooks, would take at least ~15 minutes, which is more than the seven-ish I've observed.) And presuming purge does nothing to the link list?

Anyway, I went back to the drawing board and came up with this on short notice (it desperately needs cleaning (read as proper mediawiki DB class usage, more OO, et cetera) so it sucks less, but I needed something to patch things up rather quickly *grimace*):

&lt;?php include(dirname(__FILE__) . '/../includes/WebStart.php'); // why are we WebStarting if we want to CLI? FIXME if ($wgDBtype !== 'mysql') { throw new InvalidArgumentException('fixDPL.php only supports mysql'); } //$wgDBport = "5432"; $dbConnection = mysql_connect($wgDBserver, $wgDBuser, $wgDBpassword); mysql_select_db($wgDBname, $dbConnection) or die('Could not connect to database.'); $dplQueryCategory = 'DPL_query_pages'; $result = mysql_query(    'SELECT '         . 'p.page_title AS title, p.page_namespace AS namespace'     . ' FROM '         . $wgDBprefix . 'page AS p LEFT OUTER JOIN bpdp_categorylinks AS cl ON cl.cl_from=p.page_id'     . ' WHERE '         . 'cl.cl_to=\  . mysql_real_escape_string($dplQueryCategory) . '\       . ' ORDER BY '         . 'p.page_title ASC',     $dbConnection ); while ($row = mysql_fetch_assoc($result)) { $map = array(        'job_namespace' => $row['namespace'],         'job_title'     => $row['title'],         'job_params'    => 's:0:""',         'job_cmd'       => 'refreshLinks'     ); foreach ($map as $c => $v) { if ( !is_numeric($v) || ( ((string)(int)$v) !== $v) ) { $map[$c] = "'". mysql_real_escape_string($v). "'";        }         $map[$c] = $c. '=' . $map[$c]; }    $sql = 'INSERT INTO '. $wgDBprefix. 'job SET '. implode(', ', $map); echo $sql. PHP_EOL; mysql_query($sql, $dbConnection); } echo 'Executing runJobs.php ['. date('Y-m-d H:i:s'). ']...' . PHP_EOL; ob_flush; passthru('php "' . dirname(__FILE__) . '/runJobs.php"'); echo 'Done with runJobs.php!'. PHP_EOL;

It's a hackish thing that requires DPL query pages to be in a category 'DPL_query_pages', because that was quicker to do than change DPL around to include the category automatically or create another table to track DPL-usage (or rather, quicker than trying to understand mediawiki extensions quick enough to do either - I literally have zero experience in writing one of my own).

This solution is very similar to your suggestion to cronjob  - except that it collects and executes the DPL-specific jobs separately, so other jobs can still run through the web, and you don't end up with a ridiculously long job queue in the meanwhile. It has the caveat that some of its jobs will be caught via the web if the cronjob doesn't run, say, at night (providing your wiki is in local use only).

I'm going to have to look into writing a little patch for DPL, though, obviously I can't release our, that doesn't work 'out of the box'. :)

By the way, I found 76 DPL query pages on our wiki, clustered about eight templates (with includeonly-categories). The associated categories contain the following amount of pages: 38, 81, 199, 238, 268, 280, 322, 941. So that's the exact numbers at time of writing of what was killing our wiki. *cringechuckle* We really can't live without the if- fix, sadly.

-pinkgothic 15:01, 10 February 2011 (UTC)


 * Hmm, the caching code in DPL is really weird. When it caches the dpl, it disables mediawiki's normal cache for the rest of the page. (The caching code might also allow an attacker to execute code on the server for some configurations of apache). Bawolff 19:48, 10 February 2011 (UTC)


 * * grimaces* Ouch, that doesn't sound promising. Thanks for the heads-up! Means I should take a look at it, I was supposed to analyse some of DPL's security vulnerabilities anyway. Oh, to have time for things! v.v &lt;/truckload of woe&gt;
 * I'll probably get back to this tomorrow (the 'provide a proper patch for new/deleted pages' part), providing I can sneak in some time for it then.
 * -pinkgothic 20:04, 10 February 2011 (UTC)
 * The issue probably would be easy to fix. The issue is that it asks the user for the cache filename (really it should do something like md5sum of the dpl so its not a user issue. Really it should use mediawiki's internal cache support.). It doesn't validate this filename properly, allowing user to put it in any directory they want. Furthermore, default is web accessible. It appends .txt to the end of the user's filename, however apache takes all extensions into account depending on how things are setup foo.pl.txt might be interpreted as a perl script (or other scripting language). The XSS issues would be significantly harder to fix. Bawolff 22:00, 10 February 2011 (UTC)