Hit stats aggregation

From mediawiki.org

Page views

What's available from each hit:

  • page name
  • referrer:
    • local wiki/page
    • foreign URL
  • client:
    • GeoIP lookup (country-level or city-level resolution?)
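
As a rough sketch of the local/foreign referrer split listed above: the `LOCAL_HOSTS` set, the `classify_referrer` name and the `/wiki/Page_title` path layout are all assumptions made for illustration, not something these notes specify.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical set of hostnames treated as "local"; a real deployment would
# derive this from its own site configuration.
LOCAL_HOSTS = {"en.wikipedia.org", "commons.wikimedia.org"}


def classify_referrer(referrer):
    """Return ("none", None), ("local", (host, title)) or ("foreign", url)."""
    if not referrer:
        return ("none", None)
    parts = urlsplit(referrer)
    host = (parts.hostname or "").lower()
    if host in LOCAL_HOSTS:
        # Assumes the common /wiki/Page_title path layout; other local URLs
        # (e.g. index.php?title=...) get a title of None in this sketch.
        prefix = "/wiki/"
        if parts.path.startswith(prefix):
            title = unquote(parts.path[len(prefix):])
        else:
            title = None
        return ("local", (host, title))
    return ("foreign", referrer)
```

Local referrers are reduced to a (wiki, page) pair, while foreign ones keep the full URL.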

Image views

What's available from each hit:

  • image name
  • thumbnail pixel width
  • page number [for pdf, djvu]
  • referrer:
    • local wiki/page
    • foreign URL
  • client:
    • GeoIP lookup (country-level or city-level resolution?)
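
A minimal sketch of pulling the image name, thumbnail width and page number out of a request path. It assumes thumbnail paths shaped like `.../thumb/a/ab/Foo.jpg/120px-Foo.jpg`, with a `pageN-` prefix for multi-page files; that layout, and the names `ThumbHit` and `parse_thumb_path`, are assumptions for illustration. Paths that don't match are simply skipped.

```python
import re
from typing import NamedTuple, Optional


class ThumbHit(NamedTuple):
    image: str            # image name
    width: int            # thumbnail pixel width
    page: Optional[int]   # page number for pdf/djvu, else None


# Assumed thumbnail file name layout: [pageN-]<width>px-<name>
_THUMB_RE = re.compile(r"^(?:page(?P<page>\d+)-)?(?P<width>\d+)px-")


def parse_thumb_path(path):
    """Return a ThumbHit for a thumbnail path, or None if it doesn't look like one."""
    segments = path.rstrip("/").split("/")
    if len(segments) < 2 or "thumb" not in segments:
        return None
    # The parent directory carries the original image name.
    image, thumb = segments[-2], segments[-1]
    match = _THUMB_RE.match(thumb)
    if match is None:
        return None
    page = int(match.group("page")) if match.group("page") else None
    return ThumbHit(image, int(match.group("width")), page)
```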


Open questions:

  • Aggregation time resolution?
  • View style?
  • How much to combine / make available?


Note that Domas has tended to recommend flat files for this kind of info; storing it in the database can eat a lot of space, since there's a lot of per-row overhead.
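
For comparison, here is a rough sketch of the flat-file route: hits are bucketed into fixed periods and written out as one tab-separated line per (period, image). The input format (`<unix timestamp>\t<image name>`), the one-hour default period and both function names are assumptions for illustration only.

```python
from collections import Counter


def aggregate_hits(lines, period=3600):
    """Count hits per (period start, image name) from tab-separated log lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed lines
        ts_text, name = fields
        try:
            ts = int(ts_text)
        except ValueError:
            continue
        # Round the timestamp down to the start of its period.
        bucket_start = ts - ts % period
        counts[(bucket_start, name)] += 1
    return counts


def write_flat_file(counts, out):
    """Write one "<period start>\\t<image>\\t<hits>" line per bucket, sorted."""
    for (bucket_start, name), hits in sorted(counts.items()):
        out.write(f"{bucket_start}\t{name}\t{hits}\n")
```

The `period` argument is where the aggregation time resolution question shows up: a coarser period means fewer output lines but less detail.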

If we're sticking these in MySQL, we might want something like this:

  • img_stats
    • is_id (int) primary key
    • is_img (varchar) -> img_name (or we could add a damn id to the image table!)
  • img_stats_period
    • isp_id (int) primary key
    • isp_img (int) -> is_id
    • isp_timestamp: start time
    • isp_period (int): number of seconds covered by this time period (5 minutes, 1 hour, 7 days, etc.)
    • isp_hits (int) -- total hits
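
A minimal sketch of these two tables, using Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere. The UNIQUE constraints, the Unix-seconds timestamp and the helper names `open_stats_db` and `add_hits` are assumptions added for the example, not part of the proposal above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE img_stats (
    is_id  INTEGER PRIMARY KEY,
    is_img TEXT NOT NULL UNIQUE              -- image name (img_name)
);
CREATE TABLE img_stats_period (
    isp_id        INTEGER PRIMARY KEY,
    isp_img       INTEGER NOT NULL REFERENCES img_stats (is_id),
    isp_timestamp INTEGER NOT NULL,          -- period start, Unix seconds (assumed)
    isp_period    INTEGER NOT NULL,          -- seconds covered by this period
    isp_hits      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (isp_img, isp_timestamp, isp_period)  -- assumption, not in the notes
);
"""


def open_stats_db():
    """Create an in-memory database holding the two sketch tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_hits(conn, img_name, ts, period, hits=1):
    """Add hits to the period containing ts and return that row's isp_id."""
    conn.execute("INSERT OR IGNORE INTO img_stats (is_img) VALUES (?)", (img_name,))
    (is_id,) = conn.execute(
        "SELECT is_id FROM img_stats WHERE is_img = ?", (img_name,)
    ).fetchone()
    start = ts - ts % period  # round down to the period start
    key = (is_id, start, period)
    conn.execute(
        "INSERT OR IGNORE INTO img_stats_period (isp_img, isp_timestamp, isp_period)"
        " VALUES (?, ?, ?)",
        key,
    )
    conn.execute(
        "UPDATE img_stats_period SET isp_hits = isp_hits + ?"
        " WHERE isp_img = ? AND isp_timestamp = ? AND isp_period = ?",
        (hits,) + key,
    )
    (isp_id,) = conn.execute(
        "SELECT isp_id FROM img_stats_period"
        " WHERE isp_img = ? AND isp_timestamp = ? AND isp_period = ?",
        key,
    ).fetchone()
    return isp_id
```

The returned `isp_id` is the value that any per-period breakdown rows would point back to.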

Now for regional breakdowns:

  • img_stats_region
    • isr_id (int) -> isp_id
    • isr_country char(2)
    • isr_hits (int)

Breakdown by thumb size:

  • img_stats_size
    • iss_id (int) -> isp_id
    • iss_size_min (int) <- break down into ranges since we allow open-ended sizes :P
    • iss_size_max (int)
    • iss_hits (int)
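
One way the size ranges could work, as a rough sketch: the boundary values below are made up for the example, and the top range is left open-ended (`iss_size_max` of `None`) because requested widths have no upper limit.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical lower bounds of each size range, in pixels.
BOUNDARIES = [1, 121, 251, 501, 1001]


def size_range(width):
    """Return (iss_size_min, iss_size_max); max is None for the open-ended top range."""
    index = bisect_right(BOUNDARIES, width) - 1
    if index < 0:
        raise ValueError(f"invalid thumbnail width: {width}")
    size_min = BOUNDARIES[index]
    if index + 1 < len(BOUNDARIES):
        size_max = BOUNDARIES[index + 1] - 1
    else:
        size_max = None
    return (size_min, size_max)


def size_breakdown(widths):
    """Count hits per size range for the thumbnail widths seen in one period."""
    return Counter(size_range(width) for width in widths)
```

Each (range, count) pair from `size_breakdown` would map to one img_stats_size row for the period.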

Breakdown by source?

  • img_stats_referer
    • isr_id (int) -> isp_id
    • isr_referer -> sr_id
    • isr_hits (int)
  • stats_referer
    • sr_id (int) primary key
    • sr_url varchar(255)
    • potentially some annotation abilities?
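
The point of the separate stats_referer table is that each URL is stored once and referred to by id. A minimal get-or-create sketch, again with `sqlite3` standing in for MySQL; the UNIQUE constraint, the truncation to 255 characters and the helper names are assumptions for illustration.

```python
import sqlite3


def open_referer_db():
    """Create an in-memory database with a sketch of the stats_referer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats_referer ("
        " sr_id INTEGER PRIMARY KEY,"
        " sr_url TEXT NOT NULL UNIQUE)"  # UNIQUE is an assumption
    )
    return conn


def referer_id(conn, url, max_len=255):
    """Return the sr_id for url, inserting a new row the first time it is seen."""
    url = url[:max_len]  # mimic the varchar(255) limit
    row = conn.execute(
        "SELECT sr_id FROM stats_referer WHERE sr_url = ?", (url,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO stats_referer (sr_url) VALUES (?)", (url,))
    return cursor.lastrowid
```

Note that truncating to 255 characters means two long URLs sharing the same first 255 characters collapse into one row.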

Counter history view

Survey of existing hit history UIs:

  • trendingtopics.org (not available as of April 2014)
  • stats.grok.se (linked at the top of some wikis' history pages)

Further aggregation and use cases

  • Aggregate page/image hits per category
    • (e.g. for letting GLAMs know how much their files are being used)
  • Identify the most active new pages
    • Compare these against lists of failed searches to see how well new activity is serving people on the smaller sites
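
A rough sketch of the per-category rollup: given hit totals per file and each file's categories, sum the hits per category. Both input mappings and the function name are hypothetical; how category membership would actually be fetched is left out.

```python
from collections import Counter


def hits_per_category(hits_by_file, categories_by_file):
    """Sum file hits per category.

    A file in several categories counts toward each of them, so category
    totals can add up to more than the overall hit count.
    """
    totals = Counter()
    for name, hits in hits_by_file.items():
        for category in categories_by_file.get(name, ()):
            totals[category] += hits
    return totals
```

For example, with `{"A.jpg": 10, "B.jpg": 5}` as hits, where A.jpg is in categories X and Y and B.jpg is only in X, the totals come out as X: 15 and Y: 10.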