Community health initiative/Metrics kit
The Trust and Safety team, in collaboration with other teams within the Wikimedia Foundation, researched a Community Health Metrics Kit to help volunteers understand the relative health of their communities. We invite you to read more about our plans here, and to give your opinions on the talk page.
The Community Health Metrics Kit is a project being investigated by the Trust and Safety team, in collaboration with the Community health initiative, at the Wikimedia Foundation. The ultimate goal is a public suite of statistics and data documenting the relative health of Wikimedia communities on a per-project basis. This project was researched in the 2018–19 financial year, and will inform further development of community metrics in the future.
Background
As a movement, Wikimedians have always measured aspects of their communities. Data points such as editor activity levels, new users, and editor retention have been regularly collected. While these metrics provide some useful indications about the health of a project, they do not give major insights into challenges and specific areas needing improvement. The Community Health Initiative wants to build on the metrics work already done by individual contributors, affiliate groups, academics and researchers, and the Wikimedia Foundation.
This project has two primary goals:
- to have regularly updated quantitative statistics that provide useful insights into aspects of a community’s health, and
- to provide better qualitative options for finding insights that can’t effectively be measured through quantitative approaches.
This kit and the data it will contain will be targeted towards two major audiences:
- The local and global communities, in order to ascertain the relative health of their projects and to more easily identify the early signs of an unhealthy community; and
- The Wikimedia Foundation, who may make use of the data (now in one centralised place) to better identify trends in community health and direct development efforts towards solving problems.
Where we are now: Consultation
This page is outdated, but if it were updated, it might still be useful. Please help by correcting, augmenting and revising the text into an up-to-date form.
Date | Stage |
---|---|
July–August 2018 | Metrics definitions; brainstorming; internal publicising |
September 2018 | Design involvement; determining location; community consultation on metrics |
October–December 2018 | Further work on design; community discussions to continue |
January–March 2019 | Technical implementation, prototyping |
Before end of June 2019 | Launch of Metrics Kit |
This project is currently at the design and consultation phase. At this point, we have a rough idea of who this project is for (local community members interested in judging the relative health of their communities/projects, and Wikimedia Foundation staff interested in monitoring this health for things like tool or policy development). We also have a rough shortlist of metrics we'd like the kit to include. That is where we need your help.
How to give feedback
While we have set up the framework for this and put down our ideas, this project will suffer without the community's knowledge and expertise. Here, we are asking you to give us feedback on our ideas and to offer your own.
Please use the discussion page of this project for your feedback. To make it easier, we have set up a number of sections there for the aspects of this project about which we are most excited to hear your opinions. One of those sections is titled Other feedback, because we might miss things otherwise. Please feel free to leave your comments and suggestions in one, several or all of the sections! Thank you in advance for your thoughtful insights!
How to engage deeper
In addition to asking for feedback on what we do, we are also looking for volunteers who would like to work more closely with us on the project. There are several ways you can engage, if you are interested:
- You can work with us to refine the way we gather the data in one or more fields for all the Wiki communities
- You can help find bugs and mistakes we may make
- You can get our support to understand your own community better through community health metrics
- You can help spread the knowledge about those new metrics throughout the wikiverse
If you are interested in investing time in this project on one or more of those issues on a mid- to long-term basis, please sign up under the corresponding section of the discussion page!
How we will use the results
Your feedback at this stage will directly affect our thinking with regard to this project. As one of the major audiences for this work, the community's insight is naturally invaluable. As such, it will all be taken on board and used to guide the future of this project as we focus on its design and use cases.
Metrics
We're currently looking at metrics that reflect community health. These will be informed by comments and suggestions made during this consultation, as well as by individual user interviews and other design research methods. Right now, we're looking at metrics reflecting things like:
- Active administrator and user statistics;
- Backlog statistics (compared with the number of people working on them);
- Statistics already used to compare Wikimedia projects with each other (such as Wikipedia article depth);
- Rates of vandalism, blocks, and other administrative tasks;
- Other metrics we haven't yet come up with :)
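As an illustration only, two of the metrics listed above can be sketched as simple ratios. The `ProjectSnapshot` class, its field names, and the sample numbers below are hypothetical and not part of the kit. The depth calculation follows the article depth formula as we understand it from its Meta-Wiki description, and should be checked against that page before being relied on.

```python
from dataclasses import dataclass


@dataclass
class ProjectSnapshot:
    """Hypothetical per-project counts; field names are illustrative only."""
    articles: int
    total_pages: int
    edits: int
    active_admins: int
    open_backlog_items: int


def backlog_per_admin(s: ProjectSnapshot) -> float:
    """Open backlog items per active administrator."""
    if s.active_admins == 0:
        # Nobody is working on the backlog: unbounded if anything is open.
        return float("inf") if s.open_backlog_items else 0.0
    return s.open_backlog_items / s.active_admins


def article_depth(s: ProjectSnapshot) -> float:
    """Assumed formula: (edits/articles) * (non-articles/articles) * (1 - articles/total)."""
    if s.articles == 0 or s.total_pages == 0:
        return 0.0
    non_articles = s.total_pages - s.articles
    stub_ratio = s.articles / s.total_pages
    return (s.edits / s.articles) * (non_articles / s.articles) * (1 - stub_ratio)


# Made-up numbers for a small project
snapshot = ProjectSnapshot(
    articles=1000,
    total_pages=4000,
    edits=20000,
    active_admins=10,
    open_backlog_items=250,
)
print(backlog_per_admin(snapshot))
print(article_depth(snapshot))
```

Expressing a backlog relative to the number of people working on it, rather than as a raw count, is what lets projects of very different sizes be compared.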
Design
We've spent a little time thinking about how this kit might ultimately look and be used by members of the community. Our requirements for this project are that it is:
- Findable: It should be easy for community members to find and make use of in their work.
- Readable/Accessible: It should be simple to read and absorb the information it is attempting to impart.
- Up-to-date: It should be updated live (or at least regularly), automatically and with minimal maintenance.
- Translatable: It should be available to users in as many languages as possible (ideally beginning with the UN standard languages), and be available for users to supply translations where none exist.
Right now, we are thinking of hosting this data on Toolforge, as that option would cover most of the above requirements. We are aware that many of the metrics we'd like this kit to cover are already collected by Wikistats, and we imagine the finished product would likely turn out quite similar to Wikistats in look, feel, and target demographic.
Original plans
Building on existing work
Some top-level community insights are part of the Wikistats portal. This platform provides data on:
- Total unique devices
- Top-viewed articles
- Newly registered users
- New pages
- Edits
- Editors
- Edited pages
- Net bytes difference
- Absolute bytes difference
The Community Engagement Insights survey provides a window into a number of community health aspects, and provides useful demographic information as well.
Individual projects and contributors have used various approaches and API queries to gain specific insights about specific workflows and issues. Examples include the English Wikipedia’s AdminStats and this analysis of administrator numbers and ratios on the different Wikipedias.
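As a rough sketch of what such an API query can look like, the MediaWiki Action API exposes site-wide counts through `action=query&meta=siteinfo&siprop=statistics`. The helper names, the `example.org` endpoint, and the sample numbers below are hypothetical. This is not how AdminStats or the analysis mentioned above were built, only one possible way to derive a ratio from those counts.

```python
from urllib.parse import urlencode


def siteinfo_statistics_url(api_endpoint: str) -> str:
    """Build a siteinfo statistics query URL for a wiki's api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


def active_users_per_admin(response: dict) -> float:
    """Ratio of active users to administrators from a parsed JSON response."""
    stats = response["query"]["statistics"]
    admins = stats["admins"]
    if admins == 0:
        return float("inf")
    return stats["activeusers"] / admins


# Made-up numbers in the shape the API returns (no network call is made here)
sample_response = {
    "query": {"statistics": {"articles": 5000, "activeusers": 120, "admins": 8}}
}
print(siteinfo_statistics_url("https://example.org/w/api.php"))
print(active_users_per_admin(sample_response))
```

Keeping the URL building and the parsing separate from any network call makes the ratio easy to check against saved responses.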
There is a large body of past research projects that deal primarily with topics of community health, or look at related aspects and issues. We have collated some of that here in this table; if we have missed something, please add to this subpage!
Research name | Date begun | Date published | Qual. or Quant.? | Topic | Contact/Author | Relevance |
---|---|---|---|---|---|---|
How much do Wikipedians value editing Wikipedia? | 2017-12 | 2018-05 | Qualitative | Editor satisfaction | Jana Gallus; Avinash Gannamaneni; Aaron Halfaker | High |
Growth and diversity of Technology team audiences | 2017-11 | 2018-05 | Qualitative | Technical contributors satisfaction | Jonathan Morgan | High |
Holder Administration in anderen Sprachversionen AdminCon 2018 | N/A | 2018-03 | Quantitative | Administrators | User:Holder | High |
The Keilana Effect: Visualizing the closing coverage gaps with ORES | N/A | 2017-08 | Quantitative | Gender gap | Aaron Halfaker; Emily Temple-Woods | High |
Wikimedia CH Community Survey 2016 | 2016-12 | 2017-02 | Qualitative | Community Survey | Wikimedia CH | High |
New editor support strategies | 2016-05 | 2016-10 | Qualitative | New Editors | Jonathan Morgan; Joe Matazzoni; Pau Giner | High |
Becoming Wikidatians: evolution of participation in a collaborative structured knowledge base | 2016-05 | 2016-10 | Qualitative | New Editors | Alessandro Piscopo, et al. | High |
Impact of The Wikipedia Adventure on new editor retention | 2014-11 | 2016-10 | Qualitative | New Editors | Sneha Narayan, et al. | Medium |
Online Community Conduct Policies | 2016-03 | 2016-06 | Qualitative | Conduct policies in other online communities | Patrick Earley; Karen Brown | Medium |
Teahouse long term new editor retention | 2015-11 | 2016-05 | Qualitative | Editor satisfaction | Rebecca O’Neill | High |
Survey of user in Ireland | 2015-09 | 2016-05 | Both | Editor retention | Jonathan Morgan; Aaron Halfaker | High |
Wikimedia Nederlands survey (2015) | N/A | 2016-04 | Qualitative | Community Survey | Wikimedia Nederland | High |
Wikimedia CH Community Survey 2015 | 2016-01 | 2016-02 | Qualitative | Community Survey | Verena Linder | High |
Wikimedia Deutschland Editor Survey 2016 | 2015-06 | 2016-02 | Qualitative | Community Survey | Wikimedia CH | High |
Monthly wikimedia editor activity dataset | 2015-04 | 2015-08 | Quantitative | Editor activity | Aaron Halfaker | Medium |
Active editor spike 2015 (July update) | N/A | 2015-07 | Qualitative | General - Community Health | Haitham Shammaa; María Cruz | High |
Community Health learning campaign | 2015-07 | 2015-07 | Quantitative | Editor activity | Aaron Halfaker | Medium |
VisualEditor's effect on newly registered editors (May 2015 study) | 2015-06 | 2015-05 | Quantitative | New editors | Aaron Halfaker | Medium |
Active editor spike 2015 | 2015-03 | 2015-04 | Quantitative | Editor activity | Aaron Halfaker; Dario Taraborelli | Medium |
Global South User Survey 2014 | 2014-09 | 2015-02 | Qualitative | Community Survey | Haitham Shammaa | High |
Talk Sentiment Analysis: Editor Retention and Editing Prediction | 2014-10 | 2014-12 | Qualitative | Editor retention | Sergio Martinez-Ortuno; Lars Roemheld; Deepak Menghani; Leila Zia | Medium |
The Role of Wikipedia Mentorship Programs in the Newcomer Experience | 2014-07 | 2014-12 | Qualitative | New Users | Gabriel Mugar with Chris Schilling and Aaron Halfaker | Medium |
Wikipedia Gender Inequality Index | 2014-06 | 2014-12 | Quantitative | Gender Gap | Piotr Konieczny | Medium |
Communicating on Wikipedia while female | 2014-11 | 2014-11 | Qualitative | Gender gap | Laura Hale | High |
Framing Support for Newcomers | 2014-07 | 2014-10 | Qualitative | New Users | Yla Tausczik; Jonathan Morgan | Low |
Asking anonymous editors to register | 2014-03 | 2014-10 | Qualitative | New Users | Aaron Halfaker | Medium |
Non-finite Processes in Human Social Phenomena | 2012-10 | 2014-10 | Qualitative | Cooperation | Simon DeDeo | High |
Women and Wikipedia | 2014-01 | 2014-09 | Qualitative | Gender Gap | Amanda Menking; Jonathan Morgan | High |
Women and Wikipedia: Contributions in a Collaborative Online Space | 2012-03 | 2014-05 | Quantitative | Gender gap | Melanie Kill | Low |
Wikimedia Community Visualization | N/A | 2013-12 | Quantitative | Editor activity | Haitham Shammaa | High |
Newcomer survival models | 2013-09 | 2013-09 | Qualitative | Editor retention/New editors | Aaron Halfaker | High |
Onboarding new Wikipedians/OB6 | 2013-09 | 2013-09 | Quantitative | New Editors | Aaron Halfaker | Low |
Exploring editing dynamics in different language Wikipedias – Towards a substantive grounded theory | 2013-05 | 2013-08 | Qualitative | Social dynamics | Pasko Bilic | High |
Getting to know the grassroots | N/A | 2013-07 | Qualitative | Community Survey | Wikimedia Nederland | High |
VisualEditor's effect on newly registered editors/June 2013 study | 2013-06 | 2013-07 | Quantitative | New Editors | Aaron Halfaker | Low |
Gender micro-survey | 2013-06 | 2013-07 | Quantitative | Gender | Howie Fung | Low |
WikiProjects as virtual Teams | 2013-01 | 2013-07 | Quantitative | Social dynamics | Jonathan Morgan | High |
Notifications | 2013-05 | 2013-05 | Quantitative | Impact of Technical Feature | Aaron Halfaker | Medium |
Anonymity and conformity over the net | 2012-01 | 2013-03 | Quantitative | Effects of anonymity | Aaron Halfaker | Medium |
Post-registration editor survey | 2012-11 | 2012-12 | Quantitative | Editor motivation | Steven Walling | Medium |
Wikipedia Editors Survey 2012 | 2012-06 | 2012-11 | Both | Community Survey | Tilman Bayer | High |
Wikipedia Editor Satisfaction Survey | 2009-09 | 2012-10 | Both | Editor satisfaction | Denis Barthel; Manuel Merz | High |
MoodBar/Email confirmation | 2012-07 | 2012-08 | Quantitative | Communication | Dario Taraborelli | Low |
Editor Lifecycles | 2012-06 | 2012-07 | Quantitative | Editor retention | Shilad Sen | High |
Master Thesis Cultural differences in motivations to contribute to Wikipedia | 2012-05 | 2012-07 | Quantitative | Editor Motivation | Sjarlot Stal; Nick Geurts | Low |
Editor milestones | 2012-05 | 2012-07 | Quantitative | New Editors | Shilad Sen; Aaron Halfaker | Low |
Shallow Entry: A problem-centric approach to new editor orientation | 2012-04 | 2012-07 | Quantitative | Editor Motivation | Maryana Pinchuk | Medium |
[1] | 2012-04 | 2012-06 | Quantitative | Editor Motivation | Aaron Halfaker | High |
Motivations to Contribute to Wikipedia | 2012-03 | 2012-06 | Unclear | Editor Motivation | Audrey Abeyta | Low |
Necromancy | 2012-04 | 2012-05 | Quantitative | Editor Retention | Steven Walling | Medium |
Wikipedia Editors Survey 2011 November | 2011-04 | 2011-11 | Quantitative | Community Survey | Barry Newstead | High |
Women and Wikimedia Survey 2011 | 2011-05 | 2011-10 | Qualitative | Gender gap | Sarah Stierch | High |
Summer of research 2011 | N/A | 2011-08 | Both | mostly New editors, but also Editor retention | | Medium |
Wikipedia Editors Survey 2011 April | N/A | 2011-04 | Quantitative | Community Survey | Mani Pande | High |
Expert participation survey | 2010-12 | 2011-02 | Quantitative | Expert participation | Dario Taraborelli | High |
Hebrew Wikipedia satisfaction survey | 2007 | 2007-07 | Qualitative | Community Survey | Wikimedia HE | High |
100 Questions for Wikipedians | 2005 | 2005-02 | Qualitative | Community Survey | Aphaia | Medium |
Teahouse group dynamics | 2017-05 | Ongoing | Quantitative | Impact of harassment | Yiqing Hua; Dario Taraborelli; Leila Zia; Lucas Dixon; Nithum Thain; Cristian Danescu-Niculescu-Mizil | Medium |
Detox | 2017-01 | Ongoing | Quantitative | New Editors | Ellery Wulczyn | High |
Study of harassment and its impact | 2016-05 | Ongoing | Qualitative | Social dynamics | Jonathan Morgan | Medium |
Good Faith Newcomer Prediction | 2016-05 | Ongoing | Qualitative | Social Dynamics | Anna Filippova, et al. | High |
Understanding the Dynamics of Hackathons for Science | 2016-04 | Ongoing | Qualitative | Time commitment and workflow | Kevin Schiroo | Medium |
Measuring editor time commitment and workflow | 2016-01 | Ongoing | Quantitative | Toxic language | Ellery Wulczyn | Medium |
Beyond the Gender Gap: Understanding Women's Participation in Wikipedia | 2015-10 | Ongoing | Qualitative | Gender gap | Danielle McDonald Corple | Low |
Online harassment resource guide | 2015-07 | Ongoing | Qualitative | harassment | J. Nathan Matias, et al. | Medium |
Editor Behaviour Analysis & Graphs | 2015-06 | Ongoing | Quantitative | Editor statistics | Jeph Paul | High |
Wikipedia Equality Sweden | 2014-10 | Ongoing | Quantitative | New Users | Aaron Halfaker | Low |
HHVM newcomer engagement experiment | 2014-08 | Ongoing | Qualitative | Gendergap | Björn Helgeson; Jonathan Morgan | Medium |
The sudden decline of Italian Wikipedia | 2014-05 | Ongoing | Quantitative | Editor Activity Decline | Nemo with Aaron Halfaker | Medium |
Dynamics of Online Interactions and Behavior | 2012-01 | Ongoing | Both | Roles and Functions | Ofer Arazy; Dario Taraborelli | High |
Wikipedian and Internet addiction | 2010-09 | Ongoing | Both | Social dynamics | Dario Taraborelli | Medium |
Measuring editor labor hours | 2016-04 | N/A | Qualitative | Addiction? | OppidumNissenae; Geoide; Alexmar983 | Low |
Roles and Functions within Online Production Communities | 2015-08 | N/A | Quantitative | Editor activity | Aaron Halfaker | Low |
Focus areas
We began this project by looking at what is already collected and what isn’t, and identifying which data points and areas should be prioritised for this project. This will involve both collecting new metrics and using existing metrics differently (e.g. providing new ratios and comparisons of already-collected information). Some of the broad areas we considered for inclusion in the Metrics Kit:
- Demographics
- Administrator statistics
- User statistics
- New accounts
- Activity levels
- Retention rates
- Prevalence of vandalism, trolling, harassment
- Abuse filter, vandalism reports
- Cultural, ethnic, gender barriers
- Content bias
- Language barriers
- Access to tools
- Internet access
- Libraries and knowledge resources access
Future plans: More and easier qualitative surveying
This work builds on the Community Health initiative's analysis of the English Wikipedia's Administrators' Noticeboard/Incidents, a project that surveyed users on their experiences with an important noticeboard. We want to offer better survey assistance: helping communities survey themselves through easily adaptable survey forms, good techniques for getting respondents, and help analysing the results.