Analytics/Visualization, Reporting & Applications/GlobalDevFeedback

Laundry list of notes from my first full graph-creation experience
Note: I was creating this graph: http://global-dev.wmflabs.org/graphs/ar_wp
 * Automatic sorting would be great (e.g., I was making a stacked line chart, and I want the smallest to go on the bottom, but I had to do it all by eye/hand for ~20 rows)
 * Coloring of lines: if you add too many fields into the graph, the color swatch does not reloop, it just starts making the lines/segments black! (note: if I clicked the color swatch, it would then apply a color, but if I did not, it would display as black)
 * Automatic saving: yes - I learned the hard way that this wasn't like Google Docs :)
 * Confirmation of successful "save": it would also be great to get a little confirmation at the top if my saving attempts are successful
 * Significant figures: I can't find a way to change the sig figs being displayed (e.g., to show "300" instead of "300.0," since editor counts are always whole numbers)
 * Similarly, once the numbers get above 1000, they start to round ("1.2K"). Is there a way to make this optional, and/or eliminate it? For the smaller numbers our languages are working with, the exact number matters a lot (e.g., if the number of active editors is 1549 in July vs 1551 in August, which currently would just report as "1.5K" and "1.6K").
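The sig-figs and "1.5K" rounding requests above boil down to one formatting decision: abbreviation should be opt-in, not the default. A minimal sketch of that logic (Python used purely for illustration; the function name `format_count` is an invented example, not Limn's actual API):

```python
def format_count(n, abbreviate=False):
    """Format an editor count: exact whole numbers by default,
    SI-style abbreviation ("1.5K") only when explicitly requested."""
    if abbreviate and n >= 1000:
        return f"{n / 1000:.1f}K"
    # Whole numbers with thousands separators, no trailing ".0".
    return f"{n:,.0f}"
```

With this, the July/August example from the note above stays distinguishable: `format_count(1549)` gives "1,549" rather than collapsing to "1.5K".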

Overall, I'm really glad I have something to log feedback about - thank you for building this :) Jwild (talk) 21:05, 16 August 2012 (UTC)

Some future visualization suggestions

 * 1) MAPS! We need to be able to look at saturation of editors by geography
 * 2) Bar charts (Cluster, stacked)

Global Development Limn Use Cases
In order for Limn to replace current ad-hoc workflows for data analysis and visualization for Global Development projects, it needs to support the most common and important activities that those projects perform with data. In order to make sure Limn does what we need it to do, rather than trying to make it do everything, we will attempt to create strong, representative use cases and actionable recommendations for the Limn team.

Please feel free to contribute the following:
 * additional use cases
 * additional examples (links, scenarios, or thumbnail images) for existing use cases
 * edits to the existing use cases to improve their clarity, or make them more actionable
 * comments on the talk page

Note: If you feel that any of these use cases do not accurately reflect Limn's current capabilities, Global Development's needs or a realistic set of expectations for how Limn might be able to meet those needs, please improve them by editing or commenting.

UC #1: Annotating visualizations
Currently, it is not possible to annotate a chart or graph directly in Limn. The ability to annotate charts is especially important when they are being used to communicate research findings to a broader audience, such as in a project status report or a presentation slide deck.
 * Examples
 * See this presentation on Arabic Wikipedia and this presentation on Portuguese Wikipedia for examples of Limn visualizations that have been exported to PNG and annotated in a presentation software tool. Supporting direct annotation of Limn charts would make it easier for project teams to have presentations and discussions around their data without exporting it from Limn.
 * Recommendations
 * Provide a palette of basic chart annotations such as callout (text) boxes, arrows, brackets (i.e. {, [), highlight boxes, and highlight shading for emphasis. Additional options may be added as the need arises.
 * Comments

UC #2: Grouping related data
Currently, it is not possible to visually 'group' similar data. Since Limn will frequently be used to compare/contrast cohorts of editors under different conditions (different edit counts, different wikis, etc.), allowing users to present different cohorts as related or distinct would be useful.
 * Examples
 * I want to visualize the number of scholarships (broken out into partial and full scholarships) across WMF, WMFUK and WMFR. I would like to be able to easily compare the relative proportion of full scholarships, and the total amount of money disbursed across the three different chapters.
 * Recommendations
 * Limn should support visualization of categorical data with stacked and/or clustered bar charts.
 * Comments

UC #3: Including rich contextual metadata and links to related resources
Limn currently supports creation of one caption per visualization. While this is a very useful feature, it is often desirable to make more textual information available to aid interpretation of a visualization by readers. Two types of information that can be especially useful for interpreting visualizations are contextual metadata (information about individual variables, axes or sample categories) and external links. In the "Active Editors by Global North/South" chart on the Global Dev Dashboard, it would be useful to be able to easily find out which countries are included in the Global South sample. In this case, a short description of the sampling criteria could be made available as a tooltip when the user hovers over the words "Global South" in the key (contextual metadata), and a link to a wiki page containing a full list of Global South countries could be provided in the graph's caption.
 * Examples
 * Recommendations
 * In order to avoid cluttering the interface, provide a mechanism for adding short descriptive metadata in way that is only visible when a user wants to see it. Tooltips are a common technique for providing such contextual information, but other techniques (accordion menus, lightboxes) may also be used.
 * To allow Limn visualizations to be connected with other information resources (such as study documentation pages), allow users to easily add hyperlinks to the chart caption field.
 * Comments

UC #4: Saving a version of a report card visualization
The ability to create Report Cards is one of the most powerful features of Limn. However, these data change over time. If the report card is set to provide a moving 'window' on a continuous dataset (such as edits over time), earlier data might even be pushed off the visualization completely as time marches on. What if project members want to capture the state of a visualization in a report card at a particular point in time in order to refer to it later? Currently, it seems as though exporting that visualization to an image format is the best way to accomplish this. But that renders the visualization static and severs its link to the source data, making it less useful as an analytic tool.
 * Examples
 * Many Global Development projects generate benchmark visualizations: graphs of a particular dataset at a particular time (like the beginning of the project, or for inclusion in an incremental status report). In such cases, it is useful to be able to maintain a stable, interactive visualization of that dataset separate from the regularly-updated version that exists within the report card.
 * Recommendations
 * Limn already appears to have some built-in version control mechanisms, and allows for the creation of one-off visualizations. Building a user-facing feature into the Report Card interface that allowed a visualization to be frozen and made available under a separate, unique URL along with its underlying data file would facilitate the use of Limn visualizations in maintaining a rich, interactive record of project benchmarks that could be re-visited and explored at a later date.
 * Comments

UC #5: Discussing an individual visualization
Currently, it is not possible to comment on a particular visualization in Limn. If a researcher wants to capture feedback or wider input on the data, they can export the visualization and upload it to another source (such as a wiki page). However, discussions around data can be less productive if the data isn't readily at hand. Therefore, it could be useful to provide a mechanism to allow users to comment on a visualization within Limn itself.
 * Examples
 * The comment thread on Wikimedia blog posts (like this one!) allows viewers to leave comments while looking at source data. Allowing similar threaded discussion around specific visualizations (in a report card or a benchmark chart) could lead to more focused discussions, grounded in empirical data rather than opinion.
 * A member of E3 is looking at a visualization of edit sessions by Teahouse visitors, generated by a member of a Global Dev project. They want to know if the Global Dev researchers used the same edit session criteria that they themselves do, to facilitate comparison with their own findings. An embedded commenting system would make it easier to track questions like this, and their answers.
 * Recommendations
 * More discussion about the purpose of Limn visualizations is needed before the value of this use case can be assessed. Some of the obvious trade-offs and dependencies are:
 * Should anyone be allowed to comment?
 * Should comments only be allowed on 'frozen' data samples (to avoid losing the context of the comment when a visualization is updated)?
 * How well will this scale to a large number of comments?
 * Comments

UC #6: Creation and editing of custom cohorts by end users
Evaluating the success of event-based initiatives or offline programs involves tracking contribution/activity/retention by event attendees/program participants, and potentially even those affiliated with participants. In some cases, a participant list is available before the event; in other cases, it is not possible to generate a list of participants (and connect that list with the participants' online presence, such as a user account or a geographic location) until after the event is over or the program is already in progress. In order to demonstrate impact, project leaders or grantees need to be able to create custom cohorts, track the overall contribution/activity/retention metrics for those cohorts over time, and update them.
 * Examples
 * A professor has taught three different classes of college students using MDennis's Article Assignment Module or Coursera's Wikipedia Challenge module. Accurate lists of participating students will not be available until after the courses have concluded: assume some students will have dropped, and others joined, during the semester. A Global Ed project leader needs to create a custom cohort and track total number of edits, number of articles created, total contributions by namespace and months-editing for all three classes combined. This cohort may need to be updated at a later date (for example, adding students from the professor's Wikipedia class for the subsequent semester) to demonstrate the sustained longitudinal impact of this course or the instructional materials.
 * A grantee has put on a conference to boost participation on a small language wiki (like the 2012 Armenian WikiConference). The grantee may not have the local infrastructure necessary to usertag participants at their conference: they may lack access to database software, or the technical expertise to generate a digital list of participants' usernames beforehand. A Global Dev representative agrees to assist the grantee in creating a custom cohort after the event has ended in order to assess the impact of the event on new accounts created, content contributed and # active editors on that wiki. However, because an accurate list of attendees was not created at the time of the conference, it is difficult to determine which edits and user accounts to assign to this cohort.
 * Recommendations
 * Allow end-users to create custom cohorts and upload them to Limn in an easy-to-use spreadsheet, webform or wiki format.
 * Make these custom cohorts persistently available.
 * Allow users to select a standard sub-set of analyses/metrics to track for these custom cohorts over time (pageviews, # active editors, bytes added, etc.)
 * Allow custom cohorts to be created post-hoc.
 * Allow custom cohorts to be created around identifiers other than usernames or pre-determined user tags. Possible candidates might be IP addresses, pageview/timestamp data, edits to particular wiki-pages, or some triangulation of these sources.
 * Allow custom cohort datasets to be edited (adding or removing members, fixing inaccurately entered usernames, usertags or IP addresses) within Limn, or allow end users to edit custom datasets offline and re-upload them (replacing the original) without having to specify their desired metrics all over again, or having to create a whole new report card.
 * Comments
 * "this is the kind of analysis I would have loved to be able to get in a self-service system, where I could define a time range, identify the datasets (target projects), pick pre-programmed metrics from a menu, and get data and graphs. If I had such a tool, I'd be using it almost every day." ~ Asaf Bartov (email to Jtmorgan)
 * Post-hoc cohort identification can be challenging, and triangulation from multiple data sources is tricky and may be out of scope for the interface. However, facilitating some form of usertagging by staffers without specialized technical expertise, and making those data easy to get into and out of Limn seems reasonable and would be very useful for impact assessments.
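To make the "easy-to-use spreadsheet" recommendation above more concrete, a custom cohort could be a flat file that a staffer edits offline and re-uploads wholesale. The column names and loader below are invented for illustration only; they are not an existing Limn format:

```python
import csv
import io

# Hypothetical cohort file: one row per participant, identified by
# username (an IP address could stand in when no account is known).
COHORT_CSV = """cohort,wiki,username
armenian_wikiconf_2012,hywiki,ExampleUserA
armenian_wikiconf_2012,commonswiki,ExampleUserB
"""

def load_cohort(text):
    """Group rows by cohort name so members can be added, removed,
    or corrected offline and the whole file re-uploaded."""
    cohorts = {}
    for row in csv.DictReader(io.StringIO(text)):
        cohorts.setdefault(row["cohort"], []).append(
            (row["wiki"], row["username"]))
    return cohorts
```

Because each row names both a wiki and a user, the same file format would also cover the cross-wiki tracking in UC #7.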

UC #7: Comparing editing activity by a single cohort across Wiki projects
Many events (such as edit-a-thons, translation drives, or WLM) involve getting users to contribute to two or more projects. The success/impact of such initiatives can only be accurately measured if contributions to all of those wikis are taken into account.
 * Examples
 * The 2012 Armenian WikiConference encouraged participants to both edit Armenian Wikipedia and upload relevant media to Wikimedia Commons. This is a grant-funded project, and the grantees need to be able to demonstrate the impact of the conference on participation (new editors, active editors on hy.wiki), and on contribution (hy.wiki articles, commons media file uploads) during August-November 2012. Currently, Global Dev lacks effective mechanisms to visualize contributions by a particular cohort (identified by username, geographic location, IP address, ???) across multiple wikis, making it much more difficult to quantitatively assess the full impact of the event, and therefore very difficult to effectively evaluate the success of the grant.
 * Recommendations
 * Allow contributions by "custom cohorts" to be tracked across wikis.
 * Allow contributions-over-time by "custom cohorts" across different wikis (and even different namespaces on the same wiki) to be easily overlaid in the same graph to facilitate direct comparison.
 * Comments
 * Many Grantmaking/Program/Global Dev initiatives involve contribution across wikis, and would benefit from the ability to track and visualize contributions by the same users across projects. How do we link user contributions across wikis, for users who do not have unified login? How does the usertagging proposal, as it's currently scoped, help address this?
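The overlay recommendation above amounts to aligning per-wiki time series for one cohort on a shared time axis. A small sketch of that alignment step (all numbers and names invented for illustration):

```python
# Hypothetical monthly edit counts for one cohort on two wikis.
hywiki_edits = {"2012-08": 120, "2012-09": 90}
commons_uploads = {"2012-08": 40, "2012-09": 75}

def overlay(series_by_wiki):
    """Align several {month: count} series on a shared month axis so
    they can be drawn as overlaid lines in a single graph.
    Months missing from a series are filled with 0."""
    months = sorted(set().union(*series_by_wiki.values()))
    table = {wiki: [series.get(m, 0) for m in months]
             for wiki, series in series_by_wiki.items()}
    return table, months
```

Given the grantee scenario in the example, this would let hy.wiki edits and Commons uploads by the conference cohort sit in one chart for the August-November reporting window.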