User:Jeblad/Temporal statistics

Temporal statistics is an adaptation of the MediaWiki core (mw-core) that allows statistics to be calculated for the last hours, days or weeks.

In the basic configuration it is intended for low-traffic sites that have no Squid servers and do not cache traffic data, that is, sites where $wgHitcounterUpdateFreq is less than or equal to 1. On high-traffic sites, setting $wgHitcounterUpdateFreq larger than 1 enables internal caching, which reduces the load on the database server.
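The difference between the two modes can be illustrated with a small in-memory sketch. It is not MediaWiki code: the class name `BufferedHitCounter`, the dict used as a stand-in for the database, and the deterministic "flush after N hits" rule are all assumptions made for illustration. Only the meaning of the threshold (at most 1 means write through, larger than 1 means cache) is taken from the text above.

```python
from collections import Counter


class BufferedHitCounter:
    """Hypothetical stand-in for hit counting with optional caching."""

    def __init__(self, update_freq, store):
        self.update_freq = update_freq  # plays the role of $wgHitcounterUpdateFreq
        self.store = store              # dict standing in for the database table
        self.pending = Counter()        # hits not yet written to the store
        self.buffered = 0

    def hit(self, page):
        if self.update_freq <= 1:
            # Basic configuration: every hit goes straight to the store.
            self.store[page] = self.store.get(page, 0) + 1
            return
        # Cached configuration: collect hits and write them in bulk.
        self.pending[page] += 1
        self.buffered += 1
        if self.buffered >= self.update_freq:
            self.flush()

    def flush(self):
        for page, count in self.pending.items():
            self.store[page] = self.store.get(page, 0) + count
        self.pending.clear()
        self.buffered = 0
```

With `update_freq=3`, two hits leave the store untouched and the third hit writes all three at once, so the store sees one bulk update instead of three single ones.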

In the basic configuration without Squid servers, no external infrastructure is necessary to manage external log files. If there are Squid servers in front of the web servers, a maintenance script must be used to populate the database tables with statistics data. The data from the Squid servers will then be integrated into the same presentations as ordinary traffic.

The adaptation uses ring buffers to hold temporal statistics. By default there is either no ring buffer, or a ring buffer with two slots: one slot that is currently accumulating statistics and one slot that holds collected statistics. A slot accumulates statistics until the Unix time passes from one hour to the next, from one day to the next or from one week to the next; accumulation then moves on to the next slot. The length of a ring buffer, less one, gives the maximum number of completed periods available at the given resolution. In the default configuration that means the previous whole hour, the previous whole day or the previous whole week is collected in a single slot.
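A minimal sketch of such a ring buffer, assuming the current slot is found by integer division of the Unix time by the period length followed by a modulus on the ring length. The class and method names are hypothetical, and the sketch assumes that time never runs backwards.

```python
HOUR, DAY, WEEK = 3600, 86400, 604800  # period lengths in seconds
# Note: dividing Unix time by WEEK gives weeks that start on Thursday
# 00:00 UTC, because the Unix epoch began on a Thursday.


class RingBuffer:
    """Hypothetical ring buffer: one filling slot plus length-1 completed slots."""

    def __init__(self, period, length=2):
        self.period = period
        self.slots = [0] * length
        self.tick = None  # number of whole periods since the epoch

    def _advance(self, unix_time):
        tick = unix_time // self.period
        if self.tick is not None and tick > self.tick:
            # Clear every slot entered since the last update, so stale
            # counts from a previous lap around the ring are dropped.
            steps = min(tick - self.tick, len(self.slots))
            for i in range(1, steps + 1):
                self.slots[(self.tick + i) % len(self.slots)] = 0
        if self.tick is None or tick > self.tick:
            self.tick = tick

    def hit(self, unix_time, count=1):
        self._advance(unix_time)
        self.slots[self.tick % len(self.slots)] += count

    def completed(self, unix_time):
        """Counts for the completed periods, newest first."""
        self._advance(unix_time)
        n = len(self.slots)
        return [self.slots[(self.tick - age) % n] for age in range(1, n)]
```

With the default length of two, `completed()` returns a single number: the count for the previous whole hour, day or week, depending on the period.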

Only real pages get statistics; special pages will not have any. This should have few practical consequences.

Special pages
It is possible to build several special pages for various uses. They will actually be one and the same underlying system, but with some additional filtering mechanisms. Each of the pages can produce statistics from a single column in each ring buffer, statistics accumulated backwards in time from the last completely filled slot, difference trends, etc.
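Accumulating backwards from the last completely filled slot could look like the following sketch. It assumes that slot indexes are derived from a period counter (`tick`, the number of whole periods since the Unix epoch) by a modulus on the ring length; the function name is hypothetical.

```python
from itertools import accumulate


def accumulated_views(slots, tick):
    """Running totals over completed slots, starting with the newest.

    slots -- the ring buffer contents, indexed by tick % len(slots)
    tick  -- the period currently being filled (assumed layout)
    """
    n = len(slots)
    # Age 0 is the slot still being filled, so start at age 1.
    completed = [slots[(tick - age) % n] for age in range(1, n)]
    return list(accumulate(completed))
```

For a three-slot ring, the first value is the count for the last whole period and the second is the total for the last two whole periods.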

Single page statistics
This will be a special page to get statistics for a single page. The call syntax is Special:Pagestatistics/pagename and it will display both the global and the temporal statistics collected for this page. Presumably all forms of statistics will be shown on a single page.

This page is most likely the only one that can't use a common layout.

General page statistics
This will be a special page to get comparative statistics for the most viewed pages. The call syntax is Special:Pagestatistics and it will show the number of page views as a number and as a bar for each page. The user will have to choose one of several pages, depending on the timespan of the statistics, and it will be possible to add several filtering mechanisms like those on Special:Recentchanges.

Watched pages statistics
This will be a special page to get comparative statistics for the pages on a user's watchlist. The call syntax is Special:Pagestatistics,type=watchlist and it will show the number of page views as a number and as a bar for each page. The user will have to choose one of several pages, depending on the timespan of the statistics.

User created pages statistics
This will be a special page to get comparative statistics for the pages that a user has created. The call syntax is Special:Pagestatistics,type=newpages,user=user and it will show the number of page views as a number and as a bar for each page. The user will have to choose one of several pages, depending on the timespan of the statistics.

Extensions
Several extensions could use temporal statistics to allow for adaptive changes, especially Extension:Intersection and Extension:DynamicPageList. By using temporal statistics the lock-in effect of global statistics can be avoided, and any listings will change dynamically over time.

Note that those extensions have to implement the same ring buffers, and can't assume that any specific column in the database is set and available at any given moment.

Update script
To import data from the Squid logs, there will be a script to filter and preprocess them. This script will typically run each hour.
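A rough sketch of the filtering and preprocessing step, under several assumptions that are not part of the design above: the logs use Squid's native access.log layout (timestamp, elapsed time, client, result code/status, size, method, URL, and so on), article URLs live under a `/wiki/` path prefix, and only successful GET requests count as page views. The function name is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, unquote

HOUR = 3600
ARTICLE_PREFIX = "/wiki/"  # assumption: the site's article path


def count_hourly_hits(lines):
    """Count page views per (hour, title) from Squid native log lines."""
    hits = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # malformed or truncated line
        timestamp, _elapsed, _client, code_status, _size, method, url = fields[:7]
        status = code_status.rpartition("/")[2]
        if method != "GET" or status != "200":
            continue
        path = urlsplit(url).path
        if not path.startswith(ARTICLE_PREFIX):
            continue
        title = unquote(path[len(ARTICLE_PREFIX):])
        # Special pages do not get statistics.
        if not title or title.startswith("Special:"):
            continue
        try:
            hour = int(float(timestamp)) // HOUR
        except ValueError:
            continue
        hits[(hour, title)] += 1
    return hits
```

The resulting counts are keyed by the hour number since the Unix epoch, so they can be added to the hourly slot that the same hour maps to in the database.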

Maintenance script
There will be a maintenance script to ease adjustments of the database schema. If possible, this script will try to rearrange data in the tables so as to keep as much of the collected statistics as possible. This is done by altering the table to add temporary columns, moving data into those columns, and finally altering the table again to remove the original columns and rename the new columns to the original names.

In this process the tables will either lose slots, in which case the statistics will be truncated, or gain additional slots, in which case the statistics will have empty trailing slots.

Configuration
Additional configuration in the, with adaptions in

The configuration values are used in modulus operations to calculate the current slot. That means any change to the configuration will trash previously collected statistics. A maintenance script can be made to recalculate the slot indexes and then reorganize the database accordingly.

If the length of the ring buffers is changed, the old table has to be kept temporarily while the new table is created and the statistics are reorganized and moved back. This is the only general way to keep the data in the slots. If the ring buffer is shortened, old data are forgotten but the ring will be filled. If the ring buffer is lengthened, the ring will only be partially filled.
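The reorganization can be sketched in memory as follows. The sketch assumes that the slot for a period is `tick % length`, where `tick` is the number of whole periods since the Unix epoch, so the same period lands on a different index once the length changes. The function name is hypothetical, and a real script would operate on table columns rather than a list.

```python
def resize_ring(slots, tick, new_length):
    """Move slot contents into a ring of a different length.

    slots      -- old ring contents, indexed by tick % len(slots)
    tick       -- the period currently being filled (assumed layout)
    new_length -- number of slots in the new ring
    """
    old_length = len(slots)
    new_slots = [0] * new_length
    # Copy by age: age 0 is the slot being filled, age 1 the last whole
    # period, and so on. Only as many ages as fit in both rings survive.
    for age in range(min(old_length, new_length)):
        new_slots[(tick - age) % new_length] = slots[(tick - age) % old_length]
    return new_slots
```

Shortening drops the oldest periods, and lengthening leaves the additional slots at zero until enough time has passed to fill them.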

Typical configuration
There are a few typical configurations which are especially interesting.

One configuration of the hourly ring buffer uses three slots, so that two slots hold completed statistics at any time. This is the smallest ring that allows differences to be calculated and can therefore produce trends.
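As an illustration, a difference trend over such a ring might be computed as in this sketch. It assumes the slot for an hour is the hour number since the Unix epoch modulo the ring length; the function name is hypothetical.

```python
def hourly_trend(slots, tick):
    """Change in views between the two most recent whole hours.

    slots -- ring contents, indexed by tick % len(slots)
    tick  -- the hour currently being filled (assumed layout)
    """
    n = len(slots)
    if n < 3:
        raise ValueError("a trend needs at least two completed slots")
    latest = slots[(tick - 1) % n]
    previous = slots[(tick - 2) % n]
    return latest - previous
```

A positive result means the page was viewed more in the last whole hour than in the hour before it.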

Another configuration uses 25 hourly slots, allowing one to compare a day from a sliding window with a fixed day.

Altering of page table
The following shows how the database table page is changed to accommodate the new functionality. If longer time series are needed, the number of slots for,   and   is increased accordingly.

Note that some code should be added to verify that the number of slots stays in accordance with the definitions of,   and  .

Note that the previous reflects the database schema used for. If the numbers are changed the database schema must be changed accordingly.

Altering the site_stats table
The following shows how the database table site_stats is changed to accommodate the new functionality. If longer time series are needed, the number of slots for,   and   is increased accordingly.

Note that some code should be added to verify that the number of slots stays in accordance with the definitions of,   and  .

Note that the previous reflects the database schema used for. If the numbers are changed the database schema must be changed accordingly.

Altering the hitcounter table
The following shows how the database table hitcounter is changed to accommodate the new functionality.

Patch for Article.php
The patch is an adaption of  to use additional columns in the database table for the page, typically named mw_page.

Note that the incViewCount function does not include the code for bulk updates.