Talk:Analytics/Hypercube

Phrasing
I suggest rephrasing the first four bullet points, about what the tool should eventually be able to do, along these lines:
 * Top 10 articles by pageviews (optional: by project(s))
 * Top 10 countries by pageviews (optional: by project(s), by monthly timeseries from 2012-01-01 to 2012-12-31)
Right now you may get the impression that you can never get the Top 10 articles by pageviews from e.g. eswiki. Ainali (talk) 21:16, 9 October 2013 (UTC)

What questions are we trying to answer ...
I think the key issue is indeed the questions. What questions do people really want answered about WP (or other projects)? In my observation, while the "whole of wiki" questions, e.g. "what are the top 10 articles on en.WP", might be interesting, they are usually not "mission-critical". I think what we often want to know is whether some strategy is working. Some examples might be:


 * For a GLAM partnership, there will be a set of pages that have some "connection" to the GLAM, e.g. use a photo provided by the GLAM, have edits done by editors associated with the GLAM (might be staff or volunteers), contain external links or citations back to the GLAM's own web site. Metrics around these pages "with connection to GLAM XYZ" are important to show the GLAM that this is an effective strategy for them. Such metrics might be number of page views, number of click-throughs on images/media and on links (external or references) provided by the GLAM. They may be interested in knowing how their "performance" in the world of WMF compares with other GLAMs. Clearly if one GLAM adds 1000 photos and gets more hits as a result than another that adds 1000 external links or whatever, then the GLAM wants to know this.


 * For a chapter or thematic organisation, they may be interested in knowing how content about their country or topic is "doing". By "doing", we might mean number of articles, length of articles, quality of articles, number of links into those articles, etc, and the rate of change of these. For example, does the quality of an article (as assessed, or measured by some citation-to-length ratio) in any way relate to the number of page views it gets, or are page views inherently related to the popularity of the topic irrespective of the merits of the WP article?

My point is: don't build a tool to answer "imagined questions". Do it in true "agile" style with some customers in the loop who have *real* questions they want to ask.

One of the things that I suspect people will generally want is metrics over categories. One area where our tools seem very weak is working with categories recursively. If WMAU wants to know about what's happening in Australia, it wants to know about articles that are directly or indirectly in Category:Australia, not just those few that are directly in Category:Australia. Kerry Raymond (talk) 21:52, 9 October 2013 (UTC)
 * Hear hear. I completely agree with using 'real customers' aka 'real questions' instead of hypothetical use cases. I could definitely give a few use cases for questions (although I think most are already covered by Kerry and by Magnus below), but please do contact me if you need a sparring partner or help with defining a few user stories. Husky (talk) 13:57, 10 October 2013 (UTC)
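To make the recursive-category point concrete, here is a minimal sketch of what "directly or indirectly in Category:Australia" could mean in practice. It is only an illustration, not an existing tool: the function name, the two lookup tables and the toy data are all assumptions. The one real-world detail it relies on is that category graphs can contain cycles, so visited categories have to be tracked.

```python
from collections import deque


def pages_in_category_tree(root, subcats, members):
    """Return every page that is directly or indirectly in `root`.

    `subcats` maps a category to its direct subcategories and `members`
    maps a category to its direct member pages. Both are hypothetical
    in-memory lookups standing in for whatever the real data source is.
    """
    seen = {root}          # categories already queued, guards against cycles
    queue = deque([root])
    pages = set()
    while queue:
        cat = queue.popleft()
        pages.update(members.get(cat, ()))
        for child in subcats.get(cat, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return pages


# Toy data, invented for illustration only.
subcats = {
    "Category:Australia": ["Category:Cities in Australia"],
    # The back-link to Category:Australia creates a deliberate cycle.
    "Category:Cities in Australia": ["Category:Brisbane", "Category:Australia"],
}
members = {
    "Category:Australia": ["Australia"],
    "Category:Cities in Australia": ["List of cities in Australia"],
    "Category:Brisbane": ["Brisbane", "Story Bridge"],
}

pages = pages_in_category_tree("Category:Australia", subcats, members)
```

A breadth-first walk is used here so that a depth limit could be added later if deep category trees drift too far off topic; that limit is not implemented in this sketch.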

GLAM
So here are my GLAM-related needs:
 * Get page views for a large number of pages (say, 100 sets, 1K-10K pages per set, sometimes more) in many projects
 * Need monthly views
 * Daily views as an option would be nice to have
 * Because of the large number of queries, SQL access would be nice to have
 * Page names need to be exactly as the page_title in the page table
 * Page names should not be prefixed by namespace; rather, have the numerical namespace ID as a separate property
 * Nice to have: a Web API with a cross-domain exception for Tools Labs, to allow POST queries from JavaScript (GET with lots of page titles is a problem...)
I'm so glad and excited that this is finally taking shape! --Magnus Manske (talk) 09:02, 10 October 2013 (UTC)
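As a rough illustration of the requirements above, the sketch below keeps the numeric namespace ID separate from the page_title-style name, rolls daily counts up into monthly ones, and serialises a page set as a JSON body suitable for a POST request. Everything in it is an assumption made for the example: the `PageKey` type, the function names, the JSON layout and the view counts are invented, and no actual API is implied.

```python
import json
from collections import defaultdict
from typing import NamedTuple


class PageKey(NamedTuple):
    project: str    # e.g. "eswiki"
    namespace: int  # numeric namespace ID, kept separate from the title
    title: str      # as stored in page.page_title, without namespace prefix


def monthly_views(daily):
    """Roll {(PageKey, 'YYYY-MM-DD'): views} up to {(PageKey, 'YYYY-MM'): views}."""
    totals = defaultdict(int)
    for (page, day), views in daily.items():
        totals[(page, day[:7])] += views
    return dict(totals)


def post_body(pages):
    """Serialise a set of pages as a JSON request body (hypothetical layout)."""
    return json.dumps({"pages": [p._asdict() for p in pages]})


# Invented sample data, for illustration only.
mona = PageKey("enwiki", 0, "Mona_Lisa")
daily = {
    (mona, "2013-09-29"): 100,
    (mona, "2013-09-30"): 150,
    (mona, "2013-10-01"): 120,
}
monthly = monthly_views(daily)
body = post_body([mona])
```

Sending the page list in a request body rather than in the URL is what avoids the length problem with GET mentioned above.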


 * I use the existing grok stats a lot. It would be great to have categories and category trees. Easier long-term stats also. Also the ability to combine two or more articles in a graph (especially given the effect of a name change). Some simple analysis would be useful, given the annual rhythms most articles have. Ideally the origin of views, if that's possible. Johnbod (talk) 16:45, 10 October 2013 (UTC)