Topic on Talk:Wikistats 2.0 Design Project/RequestforFeedback/Round1/Site dashboard

Aubrey (talkcontribs)

Hope this is the right section... My issue is that current metrics are based on Wikipedia, while other projects (e.g. Wikisource) have a different architecture. For example, Wikisource "should" have edits at the "book level", which is unfortunately impossible with the current software architecture. But a feasible hack could be to sum the edits of the different subpages and attribute them to the main page. This is doable both in the main namespace and in the Index/Page namespaces. This alone would be a major improvement for the Wikisource community.

Milimetric (WMF) (talkcontribs)

After following up with @Aubrey, here are some details:

Wikisource has this "book" entity, which is a logical entity but is not baked into the software. So every book often has a set of main namespace pages, one or more Index pages, and several Page pages.  The simple, more common model: 1 Index, n Page pages, 1 ns0 page, m subpages.  Good to have: stats relative to the 1 ns0 page and its m subpages, and possibly stats about the 1 Index page and its n Page pages.  The ns0 pages are for "readers" to read, pure text.  The Index and Page pages are for "editors" to proofread and validate; this is the main difference between WP and WS.  We have "two" places: one for readers, one for editors.  Of course there are many corner cases, but this is the main structure.
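A minimal sketch of the roll-up described above, assuming per-page edit counts are already available as a mapping from page title to number of edits. The function names and sample titles are hypothetical, and the rule that `Page:X/n` belongs to `Index:X` is an assumption about how the proofreading namespaces are laid out. It is not how Wikistats computes anything.

```python
from collections import Counter
from typing import Mapping


def book_root(title: str) -> str:
    """Return the top-level page a title rolls up to.

    'Moby-Dick/Chapter 1'   -> 'Moby-Dick'
    'Page:Moby-Dick.djvu/1' -> 'Index:Moby-Dick.djvu' (assumed convention)
    """
    # Everything before the first slash is treated as the parent page.
    root = title.split("/", 1)[0]
    # Assumption: Page pages are grouped under the Index page of the same file.
    if root.startswith("Page:"):
        root = "Index:" + root[len("Page:"):]
    return root


def edits_per_book(edit_counts: Mapping[str, int]) -> Counter:
    """Sum the edits of every page and subpage into its top-level page."""
    totals: Counter = Counter()
    for title, edits in edit_counts.items():
        totals[book_root(title)] += edits
    return totals


sample = {
    "Moby-Dick": 3,
    "Moby-Dick/Chapter 1": 5,
    "Moby-Dick/Chapter 2": 2,
    "Index:Moby-Dick.djvu": 1,
    "Page:Moby-Dick.djvu/1": 4,
    "Page:Moby-Dick.djvu/2": 6,
}
totals = edits_per_book(sample)
```

This keeps the reader side (ns0 page plus subpages) and the editor side (Index plus Page pages) as two separate totals. Titles that contain a slash without being subpages are one of the corner cases this sketch does not handle.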

Aubrey (talkcontribs)

Also, I may add that a snapshot of how much *all books* are read would be informative. For example, a graph that shows the very likely power law of Wikisource books could inform editors to give priority to certain books/topics/subjects/authors over others. Please bear in mind that Wikisource is hundreds of times smaller than Wikipedia, so many things are possible for us that are not possible (due to complexity and "cost") for Wikipedia. For Wikisource, it would not be impossible to have charts/graphs regarding *all books/all users/all authors*, one by one. That would be overkill for Wikipedia. It's just a reminder that the small scale of the project works in our favor, I think.

Milimetric (WMF) (talkcontribs)

It's really important to us to get metrics that work for all projects, so thank you for the insight. So I think I understand what you mean about sub-page edits belonging to their main page, but does that influence the kinds of metrics we're looking at? I figure numbers like "active editors" would be the same, while numbers like "pages created" might be different based on how we count (just the main pages or all subpages). If you have time, and no pressure, but if you have time, we would love to have a chat and find out more about which specific metrics would be useful for wikisource and which are wrong or could be adjusted. Thanks again either way.

Billinghurst (talkcontribs)

For main ns we structure works to be sequential in subpages, so they basically link; as mentioned, there is a relationship to a book. So, thinking off the top of my head ...

If we are talking about a novel a reader will/should read these in chapter order, presumably from start to end.

  • We don't know how many people start a work and wind their way through to the end, or read a bit and then stop (so are all subpages read, or how far are people progressing)
  • We don't know whether a search for a work throws people to a subpage, and they work their way back to the top. (so what is the landing point for a work)
  • We don't know how people arrive to read literary works, external search, internal search, or links
  • We don't know whether, for literary works, we need a search that presents top level pages only, or whether we should also present chapters (chapters that are named creatively Chap. 1, 2, ...n)
  • Counts matter for the work
  • Long pages versus short pages, does a long chapter scare them away?
  • Collectively does poetry get read, or is it our novels?

For a biographical/encyclopaedic work it is more likely that they will dip in and out, often back to the main page before delving down, looking for something

  • We don't know whether our biographical works are entered from a search engine externally, or internally, or from Wikipedia links; or from a biographical entry, or from the root of a work.
  • For something like the 63 volume DNB with its 000s of biographical works (not following a normal naming pattern), how many of these are read? So, collection data

For official works, we can guess that the US records are most used, but who knows. If we categorised these works differently, i.e. used wikidata cross-referencing, what can it tell us about what our visitors want to read, or what they read? Does WDering assist our analytical skills?

How effective is our namespace structure? Are pages found, from where, and are links followed?

  • author pages (lots of curatorial work)
  • portals (some curatorial work)
  • categories (where we don't do much work)

Tell me the % of pages visited per namespace, or maybe dwell time.

Can you tell us which works are read on mobile, versus which are read on desktop? Or are they all read on mobile, and we need to rethink our presentation componentry? Do visitors just arrive via the main page?

Lots of questions for which there haven't been answers. I could not tell you that I know anything about how a visitor arrives and travels through our sites.

Samwilson (talkcontribs)

Aubrey raises some good points here. A "book's" edit (and viewership) count could be the summation of all edits done on all of its pages (including subpages) in main, Index, and Page namespaces. That's probably quite hard, and certainly different from other wikis (although, similar to Wikibooks in some ways).

At the very least, is it possible to customize things like where the metrics refer to 'articles'? Because Wikisource has 'books' as the most interesting unit of how-many-things-do-we-have, and doesn't really have 'articles' at all; or if it does, then other namespaces should be included in that count.

We could bring this up at the next Wikisource hangout.

Milimetric (WMF) (talkcontribs)

Thanks @Samwilson, we will see what we can do about making a project-dependent notion of "article". I guess it would have to have a customizable definition and name, and in Wikisource's case refer to the top level book. Is this exactly the same in Wikibooks? I guess I should spend some time getting more familiar with all our projects.

Samwilson (talkcontribs)

Yes, I think you could probably reasonably accurately not worry about particular projects, but just work with the concept that subpages can be considered "part of" their parent page and included as such in some sorts of metrics. That's a concept that's tied to MediaWiki design, and not particular communities (if you see what I mean?). :-)

But most of all, thank you for working on this stuff! It's brilliant.

Aubrey (talkcontribs)

Hi Milimetric, I don't have any problem with a chat, the only issue is how and when ;-) Where I live the connection is not excellent, but a hangout/skype is probably doable.

Jayantanth (talkcontribs)

I don't know where the right place is to discuss the stats for the number of articles (NS0) and their subpages. The subpage structure depends on the book structure. Some Wikisources have used one subpage per word for a 60,000-word dictionary. So subpages should not be counted in the total numbers in the stats. They should be counted and presented nested under their main page. Also most needed for Wikisource: proofreading stats (validated, proofread, no text, and problematic) for each user.

Milimetric (WMF) (talkcontribs)
Billinghurst (talkcontribs)

I would say that phetools is an internal reflective piece of data, and maybe a little bit of interwiki tease.

Milimetric (WMF) (talkcontribs)

@Jayantanth, see the discussion just above, Aubrey had the same thought. From the technical point of view, this can be tricky, but I promise to do my best :)

Erik Zachte (talkcontribs)

For comparison: a higher aggregation level also exists on wikibooks (probably different). I built a set of reports to cover that aspect of Wikibooks. But I have to say few ever used it or at least I received hardly any feedback. https://stats.wikimedia.org/wikibooks/EN/WikiBookIndex.htm Maybe I didn't present the things people cared about. Or maybe people didn't know it was there.

Samwilson (talkcontribs)

That's pretty cool! :) Will it still be updated? I guess it's harder to do page-views of each book. (Or did I mis-read the page you linked?).

Erik Zachte (talkcontribs)

Page views per book could be done, but it would require custom code invoking the page view API. And sorry, I have no intent to maintain that code. I only gave the link to serve as an example of what per-book metrics could entail. Right now the large index of chapters colors each chapter title based on the amount of text in the chapter.
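As a rough illustration of what such custom code might look like, the sketch below builds requests against the per-article endpoint of the Wikimedia Pageviews REST API and sums the returned `views` for a book's root page and subpages. The helper names, the User-Agent string, and the sample titles and dates are hypothetical. The list of subpage titles is assumed to be supplied by the caller. This is not an existing Wikistats feature.

```python
import json
from typing import Callable, Iterable
from urllib.parse import quote
from urllib.request import Request, urlopen

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project: str, title: str, start: str, end: str,
                  access: str = "all-access", agent: str = "user",
                  granularity: str = "daily") -> str:
    """Build the per-article URL; start and end are YYYYMMDD strings."""
    # Spaces become underscores; slashes in subpage titles are percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{API}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"


def total_views(response: dict) -> int:
    """Sum the 'views' field over all items of one API response."""
    return sum(item["views"] for item in response.get("items", []))


def fetch_json(url: str) -> dict:
    """Fetch and decode one response (performs a network request)."""
    # Hypothetical User-Agent; replace with real contact details before use.
    request = Request(url, headers={"User-Agent": "book-views-sketch/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def book_views(project: str, titles: Iterable[str], start: str, end: str,
               fetch: Callable[[str], dict] = fetch_json) -> int:
    """Total views for a book, given its root page and subpage titles."""
    return sum(total_views(fetch(pageviews_url(project, t, start, end)))
               for t in titles)
```

A call such as `book_views("en.wikisource.org", ["Moby-Dick", "Moby-Dick/Chapter 1"], "20170101", "20170131")` issues one request per page, so a book with many subpages means many requests. The `fetch` parameter exists so that the summing logic can be exercised without network access.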

Aubrey (talkcontribs)

Another stat that would be cool: number of active users per day. I can't stress enough the fact that for sister projects we can think of stats that are maybe meaningless or too expensive for Wikipedia. Also, we love to have graphs/charts, to better understand the picture.

MusikAnimal (WMF) (talkcontribs)

If you just want the pageviews of the current page and subpages, you could try Massviews with "Subpages" set as the "Source". E.g. http://tools.wmflabs.org/massviews/?platform=all-access&agent=user&source=subpages&target=https%3A%2F%2Fen.wikibooks.org%2Fwiki%2FMuggles%2527_Guide_to_Harry_Potter&range=latest-20&sort=views&direction=1&view=list [Edit]: My comment is merely to help you if you need this data right now. I do not work on Wikistats and do not mean to portray Massviews as some sort of replacement :)
