Talk:Revtagging

Separation of OLTP/OLAP
"No, in order to answer the above questions we will need to combine revtagging data and data from the MediaWiki database"

When combining revtagging data with online data in the database, can this be done through the API interface or something of this nature? I'm thinking that there is no need to record at-event-time the data that is online as long as the revtag is recorded with the event since the information can be extracted later.

Are there any other requirements here that require intermixing of transactional data and analytical logging? Tychay (talk) 00:22, 20 June 2012 (UTC)

Tagging using Bitmaps
It'll be fast and space-efficient to use bitmaps to implement tagging. enwiki has ~17m users.
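
To make the space claim concrete: with one bit per user per tag, ~17m users need about 17,000,000 / 8 = 2,125,000 bytes (roughly 2 MiB) per tag. Below is a minimal sketch of that idea, assuming user IDs are small dense integers; the class and function names are hypothetical and not part of MediaWiki.

```python
class TagBitmap:
    """One bit per user ID for a single tag (hypothetical helper)."""

    def __init__(self, num_users):
        self.num_users = num_users
        # Round up to a whole number of bytes.
        self.bits = bytearray((num_users + 7) // 8)

    def add(self, user_id):
        # Set the bit for this user.
        self.bits[user_id >> 3] |= 1 << (user_id & 7)

    def has(self, user_id):
        # Test the bit for this user.
        return bool(self.bits[user_id >> 3] & (1 << (user_id & 7)))

    def count(self):
        # Number of tagged users (population count over all bytes).
        return sum(bin(b).count("1") for b in self.bits)


def both(a, b):
    """Users carrying both tags: a bytewise AND of two equal-sized bitmaps."""
    if a.num_users != b.num_users:
        raise ValueError("bitmaps must cover the same number of users")
    out = TagBitmap(a.num_users)
    out.bits = bytearray(x & y for x, y in zip(a.bits, b.bits))
    return out


def bytes_needed(num_users):
    """Storage for one tag's bitmap, in bytes."""
    return (num_users + 7) // 8
```

Combining tags (for example "in campaign A and made a mobile edit") becomes a bitwise AND over the two bitmaps rather than a join. A real deployment would probably want a compressed bitmap format for sparse tags; that is outside this sketch.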

MediaWiki core already has a revision tagging feature
I'm a little confused by this page. MediaWiki core already has a revision tagging feature. It's described at Manual:Tags. It involves the following database tables:


 * valid_tag;
 * change_tag; and
 * tag_summary.

Is this tagging system insufficient? If so, why? --MZMcBride (talk) 19:22, 13 November 2012 (UTC)


 * Good point! However, there are a couple of concerns:
 * some analytical data (A/B test campaigns) doesn't need to be stored as RecentChanges tags. (You don't want Special:Tags to get all cluttered ;-).)
 * extracting data during analytical processing is expensive, especially when it is buried inside the RecentChanges db structure (not only that, it needs to be extracted from the ts_tags blob in the data field)
 * data is not efficiently stored in tags (the tag text is stored each time); at least for analytics, it is more efficiently stored as a bitfield or something similar. (The existing format is efficient for RecentChanges, FlaggedRevs, AbuseFilter, etc., since it is only one blob covering all those cases and the only time it needs to be accessed is when the revision is accessed.)
 * I think at times when this information is coincident, we can (and should) put it in the transactional database using ChangeTags::AddTags. I think what RevTagging is asking is that the Analytical system also be "pinged" with the same information via a call to the pixel service, so that the database doesn't need to be joined with a LIKE %tagname% query --Tychay (talk) 04:01, 16 November 2012 (UTC)
 * Note that if there was a use for it we could easily switch to a database format that allows individual tags to actually be queried. Daniel Friesen (Dantman) (talk) 04:30, 16 November 2012 (UTC)


 * I'm not sure what you mean by "buried inside the RecentChanges db structure". --MZMcBride (talk) 04:37, 16 November 2012 (UTC)
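
On the LIKE %tagname% point raised above, here is a small sketch of the difference between matching inside a combined tags string and querying one row per tag. It uses an in-memory SQLite database with a simplified schema that is only loosely modeled on tag_summary and change_tag; the tag names and helper functions are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins, not the real MediaWiki table definitions.
CREATE TABLE tag_summary (ts_rev_id INTEGER PRIMARY KEY, ts_tags TEXT);
CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag TEXT);
CREATE INDEX change_tag_tag ON change_tag (ct_tag, ct_rev_id);
""")

# Hypothetical revisions and tags.
revisions = {
    1: ["ab-test-1"],
    2: ["ab-test-10", "mobile edit"],
    3: ["mobile edit"],
}
for rev_id, tags in revisions.items():
    # Combined form: all tags of a revision in one comma-separated string.
    conn.execute("INSERT INTO tag_summary VALUES (?, ?)", (rev_id, ",".join(tags)))
    # Row-per-tag form: one row for each (revision, tag) pair.
    conn.executemany(
        "INSERT INTO change_tag VALUES (?, ?)", [(rev_id, t) for t in tags]
    )


def revs_by_like(tag):
    """Substring match on the combined string; cannot use an ordinary index."""
    rows = conn.execute(
        "SELECT ts_rev_id FROM tag_summary WHERE ts_tags LIKE ? ORDER BY ts_rev_id",
        (f"%{tag}%",),
    )
    return [r[0] for r in rows]


def revs_by_row(tag):
    """Exact match on a per-tag row; can use the (ct_tag, ct_rev_id) index."""
    rows = conn.execute(
        "SELECT ct_rev_id FROM change_tag WHERE ct_tag = ? ORDER BY ct_rev_id",
        (tag,),
    )
    return [r[0] for r in rows]
```

With this data, `revs_by_like("ab-test-1")` also returns revision 2, because "ab-test-10" contains "ab-test-1" as a substring, while `revs_by_row("ab-test-1")` returns only revision 1.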