Extension talk:InterwikiIntegration

From mediawiki.org

To do list

TODO:

  • Figure out what stuff, if any, will be moved to the core.
  • Create new wiki table containing: wiki_id (primary key), wiki_db, wiki_db_prefix, wiki_prefix (maybe), wiki_local, wiki_trans.
  • Maybe create a new global indicating the wiki_id (e.g. $wgLocalWiki or $wgLocalID) or deduce the ID from $wgDBname and $wgDBprefix.
  • Add wiki_id-related field to interwiki table.
  • Get rid of integration_prefix table; rely instead on interwiki table and wiki table. PWD settings will be contained in a configuration setting (an array with a key for each wiki) or in a PWD configuration table.
  • Get rid of integration_iwl_from_url from the integration_iwlinks table; rely instead on wiki table.
  • Replace integration_dbname in integration_namespace table with wiki_id.
  • Interwiki linkages in watchlist: user, user talk, contribs, block for both edits and log entries. For log entries, move log and where the page is moved to.
  • Get it to work properly with PWD (right now, duplicate entries are appearing, and some entries are not appearing at all.)
  • Add wiki_id-related fields to interwiki table, recentchanges table, watchlist table, page table if that's the route we're taking; otherwise
  • Add wiki_id-related variables and interwiki functions to RecentChange.php, ChangesList.php, etc.
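
A rough sketch of the proposed wiki table, and of deducing the local wiki_id from the database name and prefix rather than from a new global. It uses Python with an in-memory SQLite database purely for illustration; it is not MediaWiki code. The column names come from the list above, but the column types, the uniqueness constraint, the sample rows, and the helper name local_wiki_id are all assumptions.

```python
import sqlite3


def create_wiki_table(conn):
    # Column names follow the to-do list; types and constraints are assumed.
    conn.execute(
        """
        CREATE TABLE wiki (
            wiki_id        INTEGER PRIMARY KEY AUTOINCREMENT,
            wiki_db        TEXT NOT NULL,
            wiki_db_prefix TEXT NOT NULL DEFAULT '',
            wiki_prefix    TEXT,                       -- the "maybe" column
            wiki_local     INTEGER NOT NULL DEFAULT 0,
            wiki_trans     INTEGER NOT NULL DEFAULT 0,
            UNIQUE (wiki_db, wiki_db_prefix)
        )
        """
    )


def local_wiki_id(conn, db_name, db_prefix):
    """Deduce the wiki_id from the values of $wgDBname and $wgDBprefix."""
    row = conn.execute(
        "SELECT wiki_id FROM wiki WHERE wiki_db = ? AND wiki_db_prefix = ?",
        (db_name, db_prefix),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_wiki_table(conn)
# Two hypothetical wikis sharing one database, distinguished only by prefix.
conn.executemany(
    "INSERT INTO wiki (wiki_db, wiki_db_prefix) VALUES (?, ?)",
    [("farmdb", "foo_"), ("farmdb", "bar_")],
)
print(local_wiki_id(conn, "farmdb", "bar_"))
```

If the (wiki_db, wiki_db_prefix) pair is unique, as assumed here, a separate $wgLocalWiki-style global would be a convenience rather than a necessity.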


PWD

This should not implement PWD; that's a totally separate and unconnected functionality. By all means develop the PWD extension to work effectively with this, but slipping PWD into an unrelated extension is only going to lead to more work and difficulty at code review. Happymelon 22:03, 31 May 2010 (UTC)

I'll look at the code and see if I can figure out a clean way to keep the PWD content in the PureWikiDeletion extension and the cross-wiki content in this extension. Tisane 23:05, 31 May 2010 (UTC)
Remember that you can put hooks in extensions, too, if you need to link functionality together. Happymelon 10:31, 1 June 2010 (UTC)

Caching

When a page is created/deleted/undeleted/etc., the caches of the pages on other wikis that link to it interwiki will need to be cleared, so that their interwiki links will turn red or blue, as appropriate. I'm pondering two possible approaches to this:

  • Have the extension read the iwlinks table on each of those wikis to find interwiki links to that page and then clear the caches for those pages (presumably using SquidUpdate->purge on the urls, which will be generated using url data from the interwiki table), or
  • Create a shared integration_iwlinks table that will combine all the data from the iwlinks tables and that will be updated every time that the local iwlinks tables are updated; and whenever a page is created/deleted/undeleted/etc., have the extension read that shared table and then clear the caches for those urls.

The former might require reading from hundreds of local iwlinks tables on large wiki farms whenever a page is created/deleted/etc., but the latter would require more writes, since it would be duplicating all those local iwlinks tables. Which would be more efficient from a performance standpoint? I assume that all other things being equal, it's better to read than to write, because there can be a lot of slave databases. Tisane 01:25, 5 June 2010 (UTC)

Are you following the discussions on this issue on wikitech-l? Happymelon 10:10, 5 June 2010 (UTC)
Are you talking about the "Reasonably efficient interwiki transclusion" discussion? I've been keeping an eye on it, since whatever ends up being implemented could end up making easier, or interfering with, or rendering obsolete, the work I'm doing on this extension. Anyway, I guess I'll just implement the first option since it's probably easier to troubleshoot. This extension does a lot of modification of other wikis' tables as it is (e.g. inserting the local wiki's prefix and url into other wikis' interwiki tables when Special:PopulateIntegrationTable is run). Tisane 14:03, 5 June 2010 (UTC)
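
The first option (reading each wiki's own iwlinks table) could look roughly like the sketch below. It is written in Python against in-memory SQLite databases as stand-ins for the other wikis, and it is only an illustration, not the extension's code. The helper names make_wiki and urls_to_purge, the example.org URLs, and the sample rows are assumptions; namespaces are ignored, and the actual purge call is left out, so the function only returns the URLs that would be handed to it.

```python
import sqlite3
from urllib.parse import quote


def make_wiki(pages, links):
    """Build an in-memory stand-in for one wiki's page and iwlinks tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
    conn.execute(
        "CREATE TABLE iwlinks (iwl_from INTEGER, iwl_prefix TEXT, iwl_title TEXT)"
    )
    conn.executemany("INSERT INTO page VALUES (?, ?)", pages)
    conn.executemany("INSERT INTO iwlinks VALUES (?, ?, ?)", links)
    return conn


def urls_to_purge(wikis, changed_title):
    """Collect URLs of pages on other wikis that link to changed_title.

    wikis is a list of (connection, url_pattern, prefix) tuples, where
    url_pattern uses $1 as the title placeholder (as in the interwiki table)
    and prefix is the interwiki prefix under which that wiki knows the wiki
    where the page was created or deleted.
    """
    urls = []
    for conn, url_pattern, prefix in wikis:
        rows = conn.execute(
            "SELECT page_title FROM iwlinks JOIN page ON page_id = iwl_from "
            "WHERE iwl_prefix = ? AND iwl_title = ? ORDER BY page_title",
            (prefix, changed_title),
        )
        urls.extend(url_pattern.replace("$1", quote(title)) for (title,) in rows)
    return urls


# Two hypothetical wikis that know the changed wiki under different prefixes.
wiki_a = make_wiki([(1, "Main_Page"), (2, "Sandbox")], [(1, "b", "Foo"), (2, "b", "Bar")])
wiki_c = make_wiki([(1, "Links")], [(1, "wikib", "Foo")])
wikis = [
    (wiki_a, "http://a.example.org/wiki/$1", "b"),
    (wiki_c, "http://c.example.org/wiki/$1", "wikib"),
]
print(urls_to_purge(wikis, "Foo"))
```

The cost concern raised above shows up in the loop: one read per wiki in the farm for every page creation or deletion.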

Database vs. globals

Mindful that globals are evil, I've tried to minimize their use through this procedure: (1) Set global configuration settings in InitialiseSettings.php, (2) have the extension's initialization functions (triggered by accessing a special page) read them and store the data in the database, and (3) have all other functions refer to the database. However, I realized that this isn't all that great of a solution, because (a) I'm still relying on globals for initialization/modification of settings, (b) the database needs to be updated via the special page whenever those configuration settings change, requiring more work than just changing InitialiseSettings.php, and (c) I'm introducing overhead by putting those extra database reads in there. So really I get the worst of all worlds. My thought was that the use of the globals might cease if some other means of populating those tables is developed, e.g., through API by a bot that hits all the wikis whenever something needs to change globally. (Not a great solution for wikis that don't want to fool with bots, though...)

I suppose, though, that the ultimate solution will be some sort of object, containing configuration settings, that will be passed to extensions via the hooks, right? I saw the discussion on the listserv about it, and though it would indeed break a lot of extensions, it seems like the cleanest solution and probably worth biting the bullet now to implement rather than later. So I suppose it's best to just design this extension to continue relying on global configuration variables whenever it will save database reads, in the expectation that later the code can be redesigned to get the data from that object? Tisane 23:04, 5 June 2010 (UTC)

There's nothing wrong with storing configuration settings in code, rather than in the database; it is as you say significantly more efficient. One of this summer's GSoC projects is to create an on-wiki configuration interface, which will store config in the database, but which will retrieve it on first initialisation and cache it; it won't change the way the configuration is retrieved by other functions. For interwiki integration the problem is having only one global $wgArticle entity, one $wgParser, etc; that will probably be fixed in a major code overhaul, yes. So yes, I would design the code around the current globals structure, but make it easy to convert with the rest of the codebase (ie, don't write globals anywhere except the extension's own config file). Happymelon 08:53, 6 June 2010 (UTC)
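
One way to picture the "retrieve on first initialisation and cache it" pattern, with the settings source confined to a single place, is the small sketch below. It is a language-neutral illustration in Python, not MediaWiki code; the class IntegrationConfig, the loader function, and the setting names local_wiki_id and shared_db are all hypothetical.

```python
class IntegrationConfig:
    """Hypothetical settings holder: loads once, then serves from memory."""

    def __init__(self, loader):
        self._loader = loader  # callable returning a dict of settings
        self._cache = None
        self.loads = 0  # counts how often the backing store was read

    def get(self, key, default=None):
        if self._cache is None:
            # First access: read the settings once and keep them.
            self._cache = dict(self._loader())
            self.loads += 1
        return self._cache.get(key, default)


def load_from_settings_file():
    # Stand-in for values set in the extension's own config file.
    return {"local_wiki_id": 1, "shared_db": "farmdb"}


def describe_local_wiki(config):
    # Callers receive the config object instead of reading globals directly.
    return "wiki %d in %s" % (config.get("local_wiki_id"), config.get("shared_db"))


config = IntegrationConfig(load_from_settings_file)
print(describe_local_wiki(config))
print(config.loads)
```

Because only the loader knows where the settings live, swapping it for a database-backed or object-passing source later would not touch the callers.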

Implementing Special:InterwikiRecentChanges and Special:InterwikiWatchlist

I can see that it's going to be kind of a bear to implement Special:InterwikiRecentChanges and Special:InterwikiWatchlist, mostly because SpecialRecentChanges and SpecialWatchlist.php, both of which I shamelessly ripped off (much as I ripped off the recentchanges table and watchlist table, adding only integration_rc_global_id, integration_rc_db and integration_wl_db), are pretty complicated and make reference to a lot of different functions and tables. E.g., I had to modify modifyDisplayQuery and add the revised version to my SpecialIntegrationRecentChanges class since it was hard-coded to use the recentchanges table rather than integration_recentchanges. No doubt, many such changes will need to be made. Understandably, the massive duplication of code troubles me, and I hope that at some point, we will be able to merge this extension, or at least parts of it, into the core, or at least make the core flexible enough to be able to work with the extension. My thought was that it would be better to produce a proof of concept to get people interested, however inefficient and ugly that proof of concept might be, and then they might support some core revisions when they see how badly they are needed. The other thing is that until I really delve into it and get something working, I might not even know what core revisions would be optimal.

My first step will be figuring out how the heck the existing RecentChanges and Watchlist classes and functions work to produce the existing functionality, and then I'll try to figure out the best way to implement the integrated versions of them. And whatever our GSoC friend ends up implementing for his reasonably efficient transclusion project, I'll have to work around/with, but I'll figure that out when I get there, I guess. I was originally tackling Special:RecentChanges, but I think I'll switch to working on Special:Watchlist since it's the shorter and perhaps easier project. I've refrained from committing the code for these classes until it actually works. Tisane 03:16, 16 June 2010 (UTC)

Ugh! RecentChange and ChangesList will need to be revamped too. Tisane 02:03, 17 June 2010 (UTC)

Wiki identifier

It occurred to me that using a database name as a wiki identifier is probably not that great of an idea, since some wikis on the same wiki farm may use $wgDBprefix and share the same database. On the other hand, one wouldn't want to use the site name as the identifier, since some sites may change names (e.g. Disinfopedia became SourceWatch). Interwiki prefixes (iw_prefix) aren't that great of an identifier either, since the same wiki may have more than one interwiki prefix leading to it (e.g., en:, w:, Wikipedia, etc.). So I think it's best to just go with an int NOT NULL PRIMARY KEY AUTO_INCREMENT as primary key, and link other tables to that. Tisane 13:12, 16 June 2010 (UTC)
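
A small sketch of why the surrogate key holds up where the other candidates do not: several interwiki prefixes can point at the same wiki_id, and a site rename leaves every link intact. This is Python with in-memory SQLite for illustration only. The columns wiki_sitename and iw_wiki_id, the prefixes "disinfopedia" and "sw", and the helper wiki_id_for_prefix are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wiki (wiki_id INTEGER PRIMARY KEY AUTOINCREMENT, wiki_sitename TEXT)"
)
conn.execute(
    "CREATE TABLE interwiki ("
    "iw_prefix TEXT PRIMARY KEY, "
    "iw_wiki_id INTEGER REFERENCES wiki(wiki_id))"
)


def wiki_id_for_prefix(conn, prefix):
    """Resolve an interwiki prefix to the wiki it points at, or None."""
    row = conn.execute(
        "SELECT iw_wiki_id FROM interwiki WHERE iw_prefix = ?", (prefix,)
    ).fetchone()
    return row[0] if row else None


# One wiki, reachable through two hypothetical prefixes.
site_id = conn.execute(
    "INSERT INTO wiki (wiki_sitename) VALUES (?)", ("Disinfopedia",)
).lastrowid
conn.executemany(
    "INSERT INTO interwiki VALUES (?, ?)",
    [("disinfopedia", site_id), ("sw", site_id)],
)

# Renaming the site changes a label only; the identifier stays the same.
conn.execute(
    "UPDATE wiki SET wiki_sitename = ? WHERE wiki_id = ?", ("SourceWatch", site_id)
)
print(wiki_id_for_prefix(conn, "disinfopedia") == wiki_id_for_prefix(conn, "sw"))
```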