Talk:Wikistats 2.0 Design Project/RequestforFeedback/Round1/Site architecture

About this board

Wikistats 2.0 Design Project/RequestforFeedback/Round1#Site Architecture describes our design thinking for this part of Wikistats 2.0. This board is meant to collect feedback from current and potential future users of this community tool.

https://stats.wikimedia.org/EN_Africa/Sitemap.htm rotting quietly

Kipala (talkcontribs)

Hi guys, why is nobody updating this? I used to use it over and over but now it has become useless.

Milimetric (WMF) (talkcontribs)

Hi @Kipala, the new Wikistats has been live for a while and has a lot of the old statistics being updated hourly or monthly, depending on what kind they are. Can you say more about what you found valuable on this page? We can then add it to our roadmap to include in the new version of Wikistats. You can browse here to see what's already available: https://stats.wikimedia.org/v2/#/all-wikipedia-projects

Kipala (talkcontribs)

I have done a bit of comparison between African language versions. For wiki purposes these languages have a lot in common (they are mainly used orally, and their speakers are educated in other languages such as en, fr, or pt). So I find it helpful to see them at a glance, and to see how article counts, editor figures, and views develop (views as langviews per country compared to the other official language, plus countryviews per language from countries; many AF countries have large view numbers from outside).

Kipala (talkcontribs)

See my interest here and here. It helps me get a better feeling for who our readers are and how we can reach them better. If we could look up who is reading which articles, and from where, it could help us aim better at our target group's interests. (With our small figures we have so many hits from outside; for some AF languages the absolute majority of views comes from USA/EUR.)

Do Contributing, Reading, and Content work as top level categories?

Milimetric (WMF) (talkcontribs)

If not, what structure do you think would better organize Wikistats metrics and content?

NickK (talkcontribs)

Yes, sounds (almost) good. Contributing and Reading might interact, however: e.g. I may want to compare # of edits and # of pageviews.

Strainu (talkcontribs)

Yes.

Baba Tabita (talkcontribs)

Works for me.

MCruz (WMF) (talkcontribs)

Works for me, it makes a lot of sense.

EGalvez (WMF) (talkcontribs)

@Milimetric (WMF) - from a WMF team's point of view, I need to think about what my goals are and what I would come to the stats page for. I typically come here to get a general sense of population statistics about the projects. One of the things that is often missing, for example, is the number of editors by country or region. I am not sure if/how that fits into your design?

Another thing that I would really like to know, and that is important for our work: how many users are administrators or other special user types? This information is useful for a high-level understanding of the curation/governance side of the projects in terms of community engagement. Each project has a different list of user types as well. If this data is too granular for stats.wikimedia.org, perhaps we might think of a separate dataset to get this data.

Milimetric (WMF) (talkcontribs)

@EGalvez (WMF), the WMF is not one of the primary audiences, to make sure Wikistats keeps its community focus. Editors by country is a sensitive metric because, used in combination with other data we provide, it could allow de-anonymization attacks.

Good note about community makeup in terms of user types. This will be interesting to look into, though probably hard to figure out cross-wiki.

Erik Zachte (talkcontribs)
Neil Shah-Quinn (WMF) (talkcontribs)

I think these top level categories are perfectly sensible.

Jan Dittrich (WMDE) (talkcontribs)

For me, the distinction between "Content" and "Contributing" was not clear initially. It now makes sense, though.

Adding new metrics

Neil Shah-Quinn (WMF) (talkcontribs)

Since I'm the data analyst on the Editing team, I'm generally responsible for movement-level metrics related to editorship. I have a lot of ideas for how to improve or expand these metrics (such as a session-time-based active editors count and a robust breakdown of edits by tool used to make them), which I hope to have time to develop at some vague point in the future :)

Will the Wikistats 2.0 structure be elastic enough to accept these kinds of modifications if, after all due consultation, they are ready to be adopted?

Milimetric (WMF) (talkcontribs)

Definitely yes, but it would be nice to riff a bit on how that would look.

One thought is that new metrics should be carefully placed within the structure we envision, with its top level categories and question-driven navigation. So if your metric answers an existing question better, it can be bubbled up as a metric/view in that screen. But if it answers a new question, we should vet that more and see who cares about the question.

Another is to have a staging area for new metrics and have them organically evolve into the Wikistats structure. We could allow everyone to edit the wikistats navigation structure on-wiki and manage it like we manage anything else. This was kind of the idea behind Dashiki in the first place.

Thoughts?

Neil Shah-Quinn (WMF) (talkcontribs)

I don't have a strong opinion about how exactly the change process should look, other than feeling there should be some stage for user consultation if the change is significant. Now that I know we can change the lineup, I think we can safely leave the question of process until I have (or someone else has) a specific change in mind.

I'm not really wild about editing the navigation on-wiki; in my experience with Dashiki, it adds complexity (as one more place you have to go to configure the board, in addition to the code repo itself) without much benefit (because you can't really understand or use the wiki page without understanding the code repo too). It's a good idea in theory, but in practice I don't think there's enough interest to justify it.

Milimetric (WMF) (talkcontribs)

Well, so far dashiki configs have been cryptic and undocumented. I think we should give it a chance once the Dashiki extension deploys and it's formatted a little nicer plus has a nicer editor. If that plus some good documentation doesn't help, then I'll agree with you and give up this notion. I do think it has merit where people from different communities on different wikis might want different default views for their projects on wikistats. Having control over that seems interesting.

Good on the user consultation, agreed. I mean the main point of this whole project is to be a useful tool with minimal noise. Whatever we have to do to make that happen, we'll do it.

Are the metrics and breakdowns you're interested in included here?

Milimetric (WMF) (talkcontribs)

We prioritized metrics that people spoke up for on Analytics/Wikistats/DumpReports/Future per report. Breakdowns allow us to put different types of data under the same metric, for example very active and active editors are different categories of editors. If breakdowns are hard to understand, let us know.

NickK (talkcontribs)
Strainu (talkcontribs)

Breakdown per country is definitely useful, as described by NickK. For wikis targeting a specific country, it is always useful to see where people in that country go.

Baba Tabita (talkcontribs)

Same about breakdown per country - very interested in that! Otherwise, it's fine.

MCruz (WMF) (talkcontribs)

+1 to breakdown per country.

I would also like to see:

- downloads for media on Commons

- survival / rolling survival new active editors.

Erik Zachte (talkcontribs)

Breakdown per country / language reports are still operational Wikistats reports, and have already been switched over to the Hadoop data stream. My understanding is that they are kept as is for now, and possibly migrated later.

https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryOverview.htm

https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryBreakdown.htm

https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm

These are all page view reports. Whenever a similar Hadoop-based data stream appears for page edits I can revive those as well.

FYI: as for the Trends report: I discontinued https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryTrends.htm because some of the year-over-year fluctuations seemed too large to be plausible. Hopefully with new Hadoop data things will improve (1:1 instead of 1:1000 sampling makes a difference, especially for smaller countries). But some major anomalies could (and maybe can) be explained best by the MaxMind database lagging behind reality (swaps of IP ranges between top level providers, without informing MaxMind).
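
To illustrate why 1:1 versus 1:1000 sampling matters most for smaller countries, here is a rough sketch. It assumes a simple model in which each request is kept independently with the sampling probability (a binomial model); real request sampling may behave differently. The function name and the traffic figures are hypothetical, not Wikistats data.

```python
import math


def relative_standard_error(true_count: int, sampling_rate: float) -> float:
    """Relative standard error of a count estimated by scaling up a sample.

    Assumes each request is sampled independently with probability
    `sampling_rate`, so the sampled count is Binomial(true_count, sampling_rate)
    and the estimate is sampled_count / sampling_rate.
    """
    if true_count <= 0:
        raise ValueError("true_count must be positive")
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    return math.sqrt((1 - sampling_rate) / (true_count * sampling_rate))


# Hypothetical monthly view counts for a small, medium, and large country.
for monthly_views in (10_000, 1_000_000, 100_000_000):
    rse = relative_standard_error(monthly_views, 1 / 1000)
    print(f"{monthly_views:>11,} views: ~{rse:.1%} relative error at 1:1000")
```

Under this model the error shrinks with the square root of the traffic volume, which is why year-over-year noise would show up mainly for low-traffic countries, and it vanishes entirely at 1:1.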

Neil Shah-Quinn (WMF) (talkcontribs)

I think there's a lot of interest in country-level information, so I definitely support including as much as we can within our privacy limitations.

I also think it would be great to see Wikistats include the editor model breakdowns of active editors by tenure; they add a lot of depth to the active editor numbers and are certainly more useful than newly registered users (since we care about new editors much more than new registrants who might have no interest in actually editing). On the other hand, this lineup of metrics is keeping pretty close to the existing Wikistats lineup, so I understand if you want to focus on getting the new design and architecture in place rather than on updating the metrics offered.

As a side note, what is "mean articles"?

Milimetric (WMF) (talkcontribs)

We definitely focused on the editor model and things like "new editors" more as part of the back-end of wikistats, the Data Lake and upcoming edit metrics API. So we'll have the support for those and more metrics. As far as including those perspectives into wikistats, my hope is that a community will build around the new interface and approach and they'll have that conversation. I don't see it as our place to decide, except to create the infrastructure and make it easy to execute.

"mean articles" is a placeholder name for "edits this month / total articles". It's how wikistats refers to it, but we'll think of something better.

Erik Zachte (talkcontribs)

Wikistats knows of mean edits ("total edits over all time / total articles") and mean bytes. Not metrics that are often quoted, but trivial to calculate.
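
As a minimal sketch of the two ratios quoted in this thread, taking the definitions exactly as stated above ("edits this month / total articles" and "total edits over all time / total articles"). The function names, parameter names, and sample numbers are hypothetical and are not taken from the Wikistats code.

```python
def mean_articles(edits_this_month: int, total_articles: int) -> float:
    """Placeholder 'mean articles' metric: edits this month / total articles."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return edits_this_month / total_articles


def mean_edits(total_edits_all_time: int, total_articles: int) -> float:
    """'Mean edits' metric: total edits over all time / total articles."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return total_edits_all_time / total_articles


# Made-up figures for a small wiki, for illustration only.
print(mean_articles(edits_this_month=5_000, total_articles=20_000))
print(mean_edits(total_edits_all_time=400_000, total_articles=20_000))
```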

Tabular data

Neil Shah-Quinn (WMF) (talkcontribs)

I'm not sure which feedback page this fits on (or whether this exactly falls under the heading of "design"), but will Wikistats 2.0 include the ability to download tabular versions of the data shown on the graphs and dashboards?

I think there's a lot of demand for combining this data with other datasets; just to give one example, some editors on the English Wikipedia have been very concerned with low numbers of administrator promotions and have collected data on it. I could see them wanting to correlate this with data on the number of new active editors.
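
To make the use case concrete, here is a rough sketch of what someone might do with tabular downloads, assuming they were available as CSV with one row per month. The column names, the CSV layout, and all of the numbers are made up for illustration; nothing here reflects an actual Wikistats export format.

```python
import csv
import io
import math

# Hypothetical monthly exports; column names and values are invented.
NEW_ACTIVE_EDITORS_CSV = """month,new_active_editors
2016-01,120
2016-02,110
2016-03,100
2016-04,90
"""

ADMIN_PROMOTIONS_CSV = """month,admin_promotions
2016-01,6
2016-02,5
2016-03,5
2016-04,3
"""


def read_series(csv_text: str, value_column: str) -> dict[str, float]:
    """Parse a month-indexed CSV into {month: value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["month"]: float(row[value_column]) for row in reader}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


editors = read_series(NEW_ACTIVE_EDITORS_CSV, "new_active_editors")
promotions = read_series(ADMIN_PROMOTIONS_CSV, "admin_promotions")

# Only compare months present in both datasets.
months = sorted(editors.keys() & promotions.keys())
r = pearson([editors[m] for m in months], [promotions[m] for m in months])
print(f"Pearson r over {len(months)} months: {r:.2f}")
```

Joining on the month column is the key step; it only works if the download keeps a stable, machine-readable time key alongside each value.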

Milimetric (WMF) (talkcontribs)

Do you feel represented by the audience we are designing for?

Milimetric (WMF) (talkcontribs)

The primary audience for the Wikistats 2.0 design project is contributors (editors). Our secondary audience is community/project organizers, and our tertiary audience is media/the press.

NickK (talkcontribs)

Yes, I agree with this.

Strainu (talkcontribs)

Yes.

Baba Tabita (talkcontribs)

Yes, falling into your primary audience.

MCruz (WMF) (talkcontribs)

Yes, I work with community/project organizers.
