1 Introduction
Online services currently handle unprecedented amounts of user-related data [129]. Machine learning algorithms extract value from large amounts of data by recognizing hidden patterns, links, behaviors, trends, identities, and practical knowledge, which has given birth to a “big data economy” [9, 152]. This has opened a “Pandora’s Box” of privacy concerns [113, 141, 151]. But the privacy policies that are meant to address these concerns are often lengthy, legally worded documents written to protect the provider [15, 59]. Even the interactive permission system found on modern smartphones fails to provide a sufficient understanding of the privacy risks involved in using an application [24, 39, 78].
To communicate privacy risks to users in a clear and concise manner, researchers, regulators, and industry have called for a more visual representation of how online services handle personal data [14, 15, 55, 74, 122, 164]. Since 2001, the United States Federal Trade Commission (FTC) has been encouraging standardized, tabular privacy policies similar to nutrition labels [13]. The more recent European General Data Protection Regulation (GDPR) also suggests using “standardized icons” to provide a meaningful overview of the intended data processing [106]. The Digital Advertising Alliance displays a YourAdChoices button on their ads [47], and the Entertainment Software Rating Board has introduced icons indicating whether or not games share personal information with third parties [68]. At the same time, a variety of privacy icons, labels, and notices designed to convey how personal data are handled have been proposed by researchers [60, 62, 66, 77, 123, 145, 147] and industry [63, 115]. However, these visualizations differ with regard to the privacy attributes they cover, as well as their level of detail. Furthermore, the comprehensibility and effectiveness of the visualizations remain questionable, as most of them have never been tested with users [122].
Whereas visual representations of privacy attributes are intended for users, Privacy by Design (PbD) guidelines are intended for developers. They determine to a significant extent how user privacy is handled. Because developers are not privacy experts, they need clear and unambiguous instructions with regard to how personal data should be handled [36], and they need to know which privacy attributes are considered important by users. While guidelines for what was once referred to as “fair information practice” go as far back as the 1970s [64], technological developments have prompted a renewed interest in developing privacy-aware information systems [127]. However, there is currently no generally accepted PbD standard or best practice. Rather, multiple regulators and industry stakeholders have each elaborated their own PbD principles that, similarly to privacy visualizations, differ significantly in terms of the privacy attributes they consider.
As a result, developers are confronted with diverging and sometimes contradictory guidelines and lack a universal privacy communication language that is understandable to end-users. We address this problem by systematizing knowledge surrounding privacy from relevant approaches in academia, industry, and government and by considering the opinions of both privacy experts and users to compile, validate, and rank a complete list of generally applicable privacy attributes. As a first step, a list of privacy attributes was derived by means of a systematic review of existing privacy visualizations and PbD principles. Second, this list was refined and extended in collaboration with information security professionals via interviews. Third, we distributed an online questionnaire among predominantly European privacy experts and users of online services, resulting in a ranking according to perceived importance from both perspectives. Finally, based on the results, we explain notable differences and patterns and identify trends. Together, our results form a foundation for understanding, communicating, and discussing privacy, and inform the development of user-oriented privacy-aware online services. We present practical recommendations for the development of future privacy visualizations and PbD guidelines, as well as outline research challenges toward facilitating the analysis and comparison of privacy policies and investigating the context-dependency of privacy attributes.
2 Background
The debate around privacy started in the late 19th century with the launch of the telephone and intensified throughout the “cybernetic revolution” of the 1970s [94]. In his landmark 1967 book, Westin defined privacy as “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” [153, p. 7]. Fundamentally, modern privacy is about information [31]. However, the concept kept expanding in both scope and significance with the emergence of the Internet, mass surveillance, terrorism threats [151, 154], and, more recently, with the development of big data and Web 2.0 [9, 113, 141, 152]. Currently, privacy, and in particular online privacy, remains hard to define [107] or, in the words of Solove, “a concept in disarray” [136]. Smith, Dinev, and Xu [133] note that historically, privacy was seen as a right, a commodity, a control, or a state. Martin [88, p. 557] sees privacy as a “social contract around what, to whom, and for what purpose” information is gathered or disclosed within a given community and context. Nissenbaum posits that privacy is shaped by social boundaries and norms [99] and, because individuals cannot provide truly informed consent, she suggests articulating context-specific norms that govern the collection and sharing of data online [100]. According to her theory of “contextual integrity,” whether or not an action constitutes a violation of information privacy depends on variables related to the context, the nature of the information, the actors involved and their relationships to the data subject, as well as the terms for collecting and sharing information. Acquisti, Taylor, and Wagman [5] discuss the economic value of privacy and find that in some situations data sharing can be beneficial for the user, while in others it can be damaging. Nevertheless, in his landmark articles, Solove [135, 136] points out that while it is not feasible to arrive at an overarching definition of privacy, the concept can be understood by isolating common “essential” or “core” characteristics. According to Morales-Trujillo et al., addressing privacy during software development and responding to users’ privacy concerns requires a conceptual framework that goes beyond data minimization and access control [95].
Solove [137] approached this from a legal perspective by developing a taxonomy of privacy violations pertaining to information collection, information processing, information dissemination, or invasion. From a technical perspective, privacy metrics are often used to compute the efficacy of privacy-enhancing technologies [150], but these are of little use to people without a background in statistics. Martín, del Alamo, and Yelmo [90] highlighted a lack of technical privacy requirements and criticized disagreement between high-level privacy principles. Anwar, Gill, and Beydoun [16] found commonalities between privacy laws and standards but noted differences in nature and scope that require further investigation. Wilson et al. [155] identified 10 categories of data practices by annotating 115 privacy policies. However, Morel and Pardo [96] found that natural language privacy policies required by legislators often cover only a fraction of those categories. They also found significant differences in terms of coverage compared to privacy policies expressed graphically (usually proposed by privacy advocates) or in machine-readable form (usually proposed by academics).
Acquisti et al. [4] saw potential in efforts toward assisting users with online privacy decisions by helping them reflect on their actions before the fact or by “nudging” them toward certain behaviors. But Rossi and Palmirani [122] concluded that existing privacy visualizations vary in terms of the privacy attributes they cover and criticized the fact that the majority have not been user tested. They suggest a visual layer summarizing the privacy policy with a special focus on the privacy principles of transparency and informed consent, but to date no new system has been developed. Hansen [69] compared privacy pictograms and found most to be of limited practical relevance, noting a lack of international consensus on syntax and semantics. Motti and Caine [97] reviewed icons related to privacy and classified them as relating to either data collection, data transmission, data storage, data sharing, or access control.
Overall, there appears to be a lack of agreement in terms of decomposing privacy into its core attributes. To help understand online privacy, we identified a list of unified privacy attributes and ranked this list based on importance. We did so by systematically comparing proposals for conceptualizing privacy aimed at users (privacy visualizations) and at developers (PbD guidelines), considering all sources (academia, industry, and government), and accounting for the perspectives of users as well as privacy experts.
3 Method
Our goal is to distill, validate, and rank a complete list of privacy attributes. The first step toward achieving this was to perform a systematic review to identify privacy visualizations (Section 4.1) and PbD principles (Section 4.2) relevant for online services. We then extracted a list of privacy attributes by coding the results until reaching satisfactory inter-coder reliability and then refining it with practitioners (Section 4.3). Finally, we used online surveys to understand and compare the perceived importance of these privacy attributes to experts and users (Section 4.4). The research methodology behind each of these three steps is described in more detail below.
3.1 Systematic Review
The goal of the systematic review was to identify proposals from academia, industry, and government that can be used as sources of privacy attributes relevant for online services. We limited the scope of the review to documents that include either (a) an original visual representation of aspects related to privacy or data handling by online services or (b) a concrete list of high-level principles related to privacy or data handling for developing online services. While privacy is context dependent, the goal of this article is to extract a general list of privacy attributes that are applicable to any kind of online service. Therefore, we are not interested in privacy attributes that are only relevant for a specific technology (e.g., mobile applications or IoT devices), domain (e.g., healthcare or social networks), or specific target group (e.g., children).
We started by searching Scopus using the following queries, selecting papers published between 2001 and 2019.
• TITLE-ABS-KEY(privacy AND (label OR icons OR symbols)) resulting in a total of 2,063 papers;
• TITLE-ABS-KEY(“privacy by design” AND (principles OR guidelines)) resulting in a total of 185 papers.
We then followed a systematic review process [132] using a “snowballing” approach [157] described below to iteratively extend the search query and the sample by examining references (Figure 1).
3.1.1 Privacy Visualizations.
We read the abstracts and titles of the 2,063 papers retrieved from Scopus and identified 23 that might include an original visual representation of aspects related to privacy or data handling by online services. When scanning these papers, we learned of other terms used to describe privacy visualizations so we assembled these into an extended Scopus and Web of Science query, this time using phrases to reduce the amount of irrelevant results:
• TITLE-ABS-KEY(“privacy symbol” OR “privacy label” OR “privacy icon” OR “privacy graphic” OR “privacy visual” OR “privacy pictogram” OR “privacy indicator” OR “privacy indication” OR “privacy badge” OR “privacy emblem” OR “privacy image” OR “privacy motif” OR “privacy mark” OR “privacy token” OR “privacy stamp”) resulting in a total of 82 papers.
We read the titles and abstracts of these results and identified 10 more potentially relevant papers. We then read the full text of the 23+10 papers selected thus far and found 17 other relevant papers among their references, which we also read. In total, we were able to find 41 papers containing privacy visualizations. We also learned about a 2016 Workshop on Privacy Indicators but found none of the papers published there satisfied the inclusion criteria described in Section 3.1. To make sure we did not miss anything, we performed several Google searches using all of the keywords we identified and found five more proposals for privacy visualizations coming from Non-Governmental Organisations (NGOs) and industry.
Finally, we analyzed and discussed all of the 41+5 results to select suitable candidates for extracting privacy attributes applicable to online services in general. We therefore excluded papers that were technology specific [57, 65, 76, 131], domain specific [75], or target-group specific [49, 116, 134]. These are marked with an asterisk (*) in our reference list. We also excluded papers that only include an overall rating [18, 46, 71, 138, 163], as well as other papers where no individual attributes could be distinguished [56, 83, 118]. Finally, we excluded papers that evaluate and classify existing visualizations [69, 96, 144, 164], because the visualizations they cover were already in our sample.
After removing duplicates, we ended up with a final sample of 13 privacy visualizations that we discuss in detail in Section 4.1 and that served as input for our coding process. Of these 13, 7 come from academia, 5 from industry, and 1 from government.
3.1.2 Privacy by Design Principles.
We read the abstracts and titles of the 185 papers retrieved from Scopus and selected 39 that appeared to include a concrete list of high-level principles related to privacy or data handling for developing online services. When scanning these papers, we learned of other related terms so we assembled these into an extended Scopus and Web of Science query:
• TITLE-ABS-KEY(“privacy by design” AND (principles OR guidelines OR conventions OR fundaments OR rules OR strategies OR methods OR procedures OR protocols OR guide)) resulting in a total of 330 papers.
We read the titles and abstracts of these results and identified 18 more potentially relevant papers. We then read the full text of the 39+18 papers selected thus far and found 25 other relevant papers among their references, which were also read. In total, 69 papers containing high-level PbD principles were found. We also ran a Google search based on the original Scopus query and found two other proposals for PbD principles coming from industry.
We analyzed and discussed all of the 69+2 results to select which are suitable for extracting generally applicable privacy attributes. We therefore excluded papers that discuss technology-specific principles [2, 61, 109, 110, 111, 112, 114, 130, 148] as well as papers that translate generic PbD principles to specific domains [21, 27, 28, 35, 38, 48, 82, 117, 124, 146, 149]. These are marked with a dagger (\(\dagger\)) in the reference list. We excluded papers that discuss and compare existing PbD principles [22, 70, 85, 120, 121, 128] as well as those that refine or operationalize PbD principles [6, 10, 19, 26, 37, 41, 42, 43, 50, 52, 91, 92, 139, 140, 143], but added the PbD principles they reference to our sample.
After removing duplicates, we ended up with a final sample of 14 PbD guidelines that we discuss in detail in Section 4.2 and that served as input for our coding process. Of these 14, 2 come from academia, 5 from industry, and 7 from government.
3.2 Coding
To analyze the results of the systematic review, we followed an iterative coding process.
First, the second author of this article analyzed the privacy visualizations and PbD guidelines selected during the systematic review. The content was divided into passages, and each passage was coded with one or more terms related to the handling of personal data. This resulted in an initial list of 13 privacy attributes.
Second, we discussed the initial list of privacy attributes with two information security professionals from a large software solutions provider in a 1-hour unstructured interview. Both security professionals deal with information privacy on a daily basis. As a result of the interview, two attributes were split up and the definitions of the attributes were clarified.
Third, to validate the refined list of 15 privacy attributes, three other coders coded 60% of the sample. After three rounds of discussions, refining the definitions of our codes, and re-coding the documents, Cohen’s kappa reached .93, which indicates an almost perfect agreement between the coders and therefore validates our final list of attributes.
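For readers unfamiliar with the statistic, the minimal sketch below illustrates how inter-coder agreement of this kind can be computed; the two coder label lists are hypothetical examples, not the coding data from our study.

```python
# Minimal illustration of computing Cohen's kappa for two coders.
# The labels below are hypothetical examples, not the study's actual coding data.
from sklearn.metrics import cohen_kappa_score

# Each element is the privacy attribute a coder assigned to one passage.
coder_a = ["collection", "purpose", "sharing", "retention", "collection", "security"]
coder_b = ["collection", "purpose", "sharing", "retention", "purpose",    "security"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values above 0.81 are conventionally read as almost perfect agreement
```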
Fourth, the final list and corresponding description of attributes was used as a coding scheme for analyzing the full sample of 13 privacy visualizations and 14 PbD guidelines.
3.3 Online Survey
To understand which attributes are most important, we designed an online survey to take the opinions of privacy experts and users into account. A convenience sample of users was recruited via universities, online social networks, and two commercial subject pools. We recruited privacy experts via LinkedIn by first asking approximately 500 members with “privacy officer” in their profile description to connect. Those who accepted the invitation were asked whether they perceived themselves as suitable privacy experts for this study and, if so, were directed to the questionnaire.
The survey, approved by the ethical committee of the University of Twente, collected demographic data about gender, education, occupation, nationality, and the type and frequency of online service usage. We asked the subjects how important, on a scale from 0 (not at all important) to 10 (extremely important), they considered each of the 15 privacy attributes. The full phrasing is provided as supplementary material. Since we wanted to obtain an overall ranking, we did not select a specific scenario. To assess the sensitivity of our findings, we asked participants whether or not they would rate these attributes differently for different types of services. Finally, in open questions, we asked if any of the descriptions were ambiguous and if they felt any attributes were missing.
By December 5, 2019, 646 adult participants (148 privacy experts and 498 users) had responded to the questionnaire. To clean the data, we removed all 86 incomplete responses. A further 75 responses were removed after being considered invalid due to (1) questionable completion times (less than 2 minutes or more than 20 minutes), (2) pattern answering, (3) uncertainty (by their own admission) as to what the question was asking, or (4) no usage of online services. The number of valid responses was \(N=485\), of which 20.6% were privacy experts and 79.4% users. Of these, 49.7% were women and 48.9% men. EU nationals made up 91.8% of the sample. All adult age groups were represented: 18–24 (35.1%), 25–34 (10.7%), 35–44 (13.2%), 45–55 (22.7%), and 55+ (15.9%). Many of the respondents were well educated, holding either undergraduate or postgraduate degrees (24.3% and 29.3%, respectively). All respondents used online services at least once a day, and 66.2% did so several times a day. We also calculated overall attribute importance as the average score of the 15 privacy attributes; this scale was found to be reliable (Cronbach’s \(\alpha = 0.90\)).
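As an aside, Cronbach’s alpha for a multi-item scale can be computed directly from the item variances and the variance of the summed scale; the sketch below uses a synthetic response matrix purely to demonstrate the formula, not our survey data.

```python
# Illustrative computation of Cronbach's alpha over k survey items (here: 15 attribute ratings).
# The response matrix is synthetic; it only demonstrates the formula, not our data.
import numpy as np

rng = np.random.default_rng(0)
responses = rng.integers(0, 11, size=(100, 15))  # 100 respondents x 15 attributes, ratings 0-10

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```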
5 Discussion
Our literature review (Table 1) revealed notable differences between privacy visualizations and PbD guidelines in terms of the privacy attributes they cover. And, on average, PbD guidelines cover more attributes than visualizations (8.6 vs. 5.6 attributes per proposal). This result is not surprising if we consider that privacy visualizations are mostly designed to provide simple, user-friendly information about the handling of personal data [93, 105].
Additionally, experts and users rated some attributes differently in the survey of Section 4.4. And, overall, privacy experts assigned a higher importance to most attributes. This is to be expected, because as privacy officers, they are not only concerned with privacy as users, but also professionally.
Sale and sharing were rated as the most important attributes by most users and privacy experts in our sample. However, while icons related to the sharing of data were included in all but one of the privacy visualizations, only half of the PbD principles dealt with this issue. Studies find that willingness to exchange personal data is strongly reduced by secondary use [89, 137], so it makes sense that almost all of the privacy visualizations we reviewed describe data sharing, since they are aimed at users. Sale, the attribute consistently ranked as most important in our survey, is covered by just two privacy visualizations and no PbD guidelines. This is evidence of a growing discrepancy: while the sale of personal data remains an intrinsic part of the business model for online service providers [87, 125], it is one of the major concerns of users [81].
Collection and purpose are arguably the most fundamental privacy attributes, because they describe which data are to be collected and why. The privacy experts we surveyed consider both collection and purpose to be of very high importance (closely following sale and sharing). Other studies confirm this observation [3]. Users in our sample, however, rate purpose as less important. We speculate this is because users consider certain types of data as sensitive regardless of purpose [20]. Nevertheless, collection and purpose were the most frequently occurring attributes in both privacy visualizations and PbD guidelines. Therefore, they appear to be the most important attributes to consider when discussing online privacy.
Transparency was mentioned in all PbD guidelines but only half of the privacy visualizations, even though users value insight into data handling practices [144]. We speculate this is because privacy visualizations are themselves a tool for transparency. Nevertheless, complete transparency can only be achieved by having access to the source code or raw data streams [84].
Security of personal information is mentioned by all of the 14 PbD guidelines we reviewed, but by less than half of the visualizations, mostly those published after 2012. In our survey, privacy experts ranked security as the fifth most important attribute (users ranked it sixth). This suggests that the security of personal information is considered critical for developing privacy-aware online services, but is also of increasing concern to users.
Accountability is also mentioned more often in proposals for PbD guidelines than for visualizations (almost 80% vs. 23%). This is not surprising, since accountability increases the magnitude of potential losses for the service provider in case of data breaches and PbD guidelines are aimed at developers. Nevertheless, accountability was ranked as the seventh most important attribute by users in our sample.
Retention is ranked significantly higher by privacy experts than by users and is also covered by most PbD guidelines and privacy visualizations.
The right to be forgotten, however, was perceived as more important by users and is rarely mentioned in the privacy visualizations or PbD guidelines we reviewed. The right to be forgotten and retention both relate to the ability of an organization to delete privacy-sensitive data. Retention has always been a technical consideration, but the right to be forgotten is a relatively new, user-driven initiative. This is supported by the fact that in our literature review, we found only one mention of it before 2011. However, managing legacy data sources in a GDPR-compliant manner is a major challenge [112]. Knowing how hard it is to completely remove data from all sources might cause privacy experts to rate the importance of the right to be forgotten lower than retention. Users, by contrast, are likely more interested in the benefits such a right would provide than in the technical constraints.
Anonymization was ranked as the fifth most important attribute by users and the eighth most important attribute by privacy experts, but received surprisingly little attention in the literature reviewed here. Anonymization is technically challenging [95], and privacy experts know this. Because true anonymization is seldom achievable [108, 158], various degrees of pseudonymity are implemented instead. Although the information security practitioners we interviewed felt that pseudonymization should be differentiated from anonymization, several privacy experts in our survey indicated that the two attributes are difficult to distinguish. We speculate users are also not familiar with this distinction and ranked pseudonymization as less important because it implies less protection. Nevertheless, taking steps to remove personal identifiers from user data is of interest to users, which also implies this should be given more careful consideration by developers. However, from a practical perspective, pseudonymization can be viewed as partial or imperfect anonymization.
Control and correctness were ranked relatively low by both users and privacy experts but were often encountered in PbD guidelines. Furthermore, correctness was represented in 30% of visualizations. Online services increasingly gather and aggregate user data to glean insights into habits, trends, or behaviors not directly related to the actual exchange of the product or service [9, 87, 152], but privacy controls are widely perceived as overly complex by users [25, 119]. The resulting difficulty in managing personal data leads to privacy fatigue: a sense of not being in control of the collection and sharing of data online [40, 73]. This weakens the perceived utility, and therefore the importance, of privacy settings and controls. Nevertheless, such mechanisms enhance privacy both proactively (preventing unauthorized collection or collection of incorrect data) and reactively (consent withdrawal and correction of previously collected data). Therefore, providing control over data collection and maintaining correctness of user data are an inherent part of online privacy [87].
5.1 Trends
Although we reviewed PbD guidelines published or updated after 2001 (see Figure 15), our initial search returned many older PbD guidelines. The FTC Fair Information Practice was the first set of PbD principles, forming the foundation for many of the newer principles and legislation. In 1990, the United Nations (UN) published similar guidelines, and in 1995, the European Union (EU) introduced its first Data Protection Directive. Throughout the first decades of the 21st century, the publication rate of PbD guidelines slowly increased, and after 2009 we saw an increase in domain- or technology-specific PbD guidelines. Since most of the PbD guidelines we found are either regulation or industry standards, we conclude that PbD has made its way into practice.
However, all of the privacy visualizations we found were published after 2007, with the majority being published by academics after 2012. This coincides with an increase in privacy awareness. Although the need for communicating online privacy is not a new discussion [93], research into empowering users to make informed disclosure decisions has recently started to gather steam [55, 74, 105, 122]. We are starting to see industry initiatives as well. However, despite the fact that both the European GDPR and the U.S. FTC recommend standardized privacy labels, no official standard has yet been defined.
Disclosure, correctness, accountability, and the right to be forgotten are increasingly common in recent privacy visualizations. This trend likely reflects increasing concerns regarding safe harbor [44] and data breaches [54]. Even though correctness and accountability are covered by many PbD guidelines, disclosure is not covered by recent initiatives such as PIPEDA, the Privacy Company, and privacylabel.org.
Sale, the right to be forgotten, anonymization, and accountability were rated as very important by our sample of users. However, accountability and anonymization are missing from most privacy visualizations, while sale and the right to be forgotten are missing from PbD guidelines as well. But the sale of personal data is of increasing concern to users, EU law mandates the right to be forgotten, anonymization is becoming an industry standard, and service providers have been receiving record fines for privacy infringements. These developments lead us to believe that, while current approaches to communicating and implementing privacy do not yet take the needs and preferences of users into account, this situation will (hopefully) change in the future.
5.2 Limitations
Because the entry point of the literature search was Scopus, it is possible that not all relevant proposals from industry were considered. We mitigated this by performing auxiliary Google and Web of Science searches. Furthermore, even though we ran several searches using nine synonyms for principles and 14 synonyms for visualizations, important keywords may have been missed. We do believe, however, that our literature sample of 27 proposals is sufficient to reach saturation in terms of privacy attributes. This is supported by the fact that each privacy attribute was encountered in at least two documents and that over 93% of privacy experts and users we surveyed indicated the unified list was complete and unambiguous.
Some of the documents selected for our systematic review were ambiguous, and many differed in terms of granularity and scope. Therefore, multiple attributes were sometimes attached to the same principle or visualization, and multiple principles or visualizations sometimes corresponded to a single privacy attribute. This indicates an inherent overlap between the attributes, which is to be expected because they are interdependent and refer to the same overarching concept. Nevertheless, after three rounds of coding we reached almost perfect agreement between coders. While the attributes on our list could be grouped, broken up, or renamed for practical applications, the list itself is complete and understandable.
In our online survey, the expert sample was smaller than the user sample. This is because privacy experts are a specialized group and a larger sample was hard to obtain. A disproportionate number of the respondents were young and had attended higher education. However, age and gender were not found to be confounding variables. Finally, while the list of attributes is international, almost all respondents were European, which means our ranking reflects a European perspective.
The results might be influenced by response bias. However, the topic of our questionnaire is not socially sensitive and therefore the risk of giving socially desirable answers is small. Furthermore, by screening the raw data rigorously and removing superficial and incomplete responses, we are confident that we have managed to keep any potential response bias to a minimum.
Last, differences between the perceived importance of most attributes were small and many respondents indicated that their rating depends on the type of application and data. We mitigated this by also considering the occurrence rate of each attribute in the literature we reviewed.
5.3 Practical Recommendations
5.3.1 Privacy Visualizations Should Be Legally Mandated.
Except for CLEVER\(^\circ\)FRANKE’s proposal, DAPIS, and privacylabel.org, which are currently under development, all of the other privacy visualization projects have been abandoned. We speculate that adopting such labels—and more importantly, getting a good score—provides a non-functional benefit to the user but comes at great cost for the provider, as is often the case with safety and security. Indeed, third-party privacy seals are not correlated with trustworthiness [53], and crowd-sourcing efforts such as Terms of Service; Didn’t Read (ToS;DR) have so far been unsuccessful. Providers should therefore supply an understandable summary of their privacy policies themselves [151]. However, since similar endeavors such as the EU energy label, movie ratings, and even seatbelts had to become mandatory before they were adopted, privacy visualizations will only become widespread if they are legally mandated.
5.3.2 Privacy Visualizations Should Go Beyond Data Collection and Processing.
We find that most privacy labels align with Nissenbaum [100] and Martin and Shilton [89] in that they primarily communicate what information is collected, how this information is shared, and for what purpose. However, our ranking suggests that the sale of data must also be made explicit. Furthermore, although most current visualizations do not include an indication of the level of security and accountability, this is important to both privacy experts and users and is actually mandated by the GDPR [106]. Trustworthy online data exchange relies on obtaining truly informed consent [86], but this requires providing the end-user with relevant information in an understandable form. This could be achieved by grouping the information across multiple layers [55, 145]. Our ranked list of privacy attributes serves as a basis for a user-centric privacy visualization that covers all important aspects of privacy.
5.3.3 PbD Guidelines Should Be More User-centric.
One of the most striking findings was the fact that the two attributes rated as most important by both privacy experts and users (sale and sharing) were rarely covered by PbD principles. To avoid anxiety, uncertainty, or even fear [100], the gap between privacy concerns and guidelines aimed at addressing them must be reduced. PbD is aimed at taking the privacy concerns of the end-user into consideration during development, and so issues related to data sharing (in particular sale of user data) must be part of PbD guidelines. Ideally, since the lowest average importance rating was six on a 0-to-10 scale, PbD guidelines should cover all of the attributes on our list, with the possible exception of functionality. This is because functionality was ranked as one of the least important attributes and was sometimes marked as confusing by both privacy experts and users.
5.3.4 The Right to Be Forgotten Should Not Be Forgotten.
The right to be forgotten was rarely mentioned in the PbD guidelines we reviewed. In 2014, however, the European Court of Justice ruled that European users can request the removal of personal data from online service providers, and the GDPR mandates this as well (despite the fact that the right to be forgotten is not one of the GDPR’s PbD principles). Newman questions whether the right to be forgotten is financially and legally feasible [98]. Still, according to Ausloos [17], the ability to demand the erasure of personal data can and must be available in data processing situations where consent was required, and more widely assuming normative, economic, technical, and legislative changes. Even though most PbD guidelines already recommend obtaining consent (i.e., control) and recommend removal of data when it is no longer necessary (i.e., retention), the right to be forgotten goes a step further by giving users the ability to withdraw consent. Therefore, the right to be forgotten (or its diluted form, the “right to erasure” [11]) should be an integral part of future PbD guidelines.
5.4 Research Challenges
5.4.1 Structuring Privacy Policies.
Privacy policies often focus on collection, sale, and sharing of user data, but our survey revealed that the right to be forgotten and security are of increasing concern. Furthermore, regulation increasingly mandates that privacy policies provide information about potential disclosure to (foreign) government entities, accountability in case of breaches, and the ability to correct one’s data. The Unified List of Privacy Attributes of Section 4.3 is based on extensive review and comparison of privacy attributes covered by privacy visualizations and PbD guidelines aimed at online services in general. Therefore, it represents a complete and technology-/domain-independent checklist of aspects related to online privacy. A valuable research direction is to investigate whether such a checklist can be used to verify the completeness of privacy policies [8], to structure (or even automatically restructure [161]) privacy policies, to make automatically generated privacy policies more readable [162], or to automatically analyze privacy policies [12].
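As a rough, purely illustrative sketch of the checklist idea (not part of our study), a policy text could be screened for mentions of each attribute; the attribute keywords below are hypothetical placeholders, and a realistic analysis would require annotated corpora and NLP models rather than string matching.

```python
# Naive sketch: flag which privacy attributes a policy text appears to cover.
# The keyword lists are hypothetical placeholders; real policy analysis would
# require annotated corpora and NLP models rather than keyword matching.
ATTRIBUTE_KEYWORDS = {
    "collection": ["collect", "gather"],
    "purpose": ["purpose", "in order to"],
    "sharing": ["share", "third party", "third parties"],
    "sale": ["sell", "sale"],
    "retention": ["retain", "retention period"],
    "right to be forgotten": ["erase", "erasure", "delete your data"],
}

def coverage(policy_text: str) -> dict[str, bool]:
    text = policy_text.lower()
    return {attr: any(kw in text for kw in kws) for attr, kws in ATTRIBUTE_KEYWORDS.items()}

sample = "We collect your email address in order to provide the service and may share it with third parties."
for attribute, covered in coverage(sample).items():
    print(f"{attribute:>22}: {'covered' if covered else 'not mentioned'}")
```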
5.4.2 Developing a Privacy Rating System.
Similarly to PrivOnto [104], the privacy attributes on our list can be operationalized so that they can be used to measure and compare the privacy level of online services on multiple metrics. The privacy attributes could also be used to (semi-)automatically annotate privacy policies [156]. In the long term, the privacy attributes could be used to produce or generate standardized, understandable, machine-readable summaries of privacy policies that enable both providers and users to assess, communicate, and compare the privacy of online services. To explore this direction, we started developing a free online service that implements some of these ideas: www.privacyrating.info. However, usability testing is critical, and the user testing performed on the DCI approach [147] and by Fox et al. [62] serves as a starting point for evaluating current and future proposals.
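To make the idea of a machine-readable summary concrete, one possible shape is a per-attribute record that a provider could publish alongside its policy; the field names and values below are illustrative assumptions only and do not reflect the format used by privacyrating.info.

```python
# Hypothetical machine-readable privacy summary keyed by the unified attributes.
# Field names and values are illustrative only; they do not reflect privacyrating.info's actual format.
import json

privacy_summary = {
    "service": "example-service.com",
    "attributes": {
        "collection": {"data_types": ["email", "usage statistics"], "score": 7},
        "purpose":    {"description": "service provision and analytics", "score": 6},
        "sharing":    {"third_parties": True, "score": 4},
        "sale":       {"data_sold": False, "score": 10},
        "retention":  {"period_days": 365, "score": 8},
    },
}

print(json.dumps(privacy_summary, indent=2))  # could be served next to the human-readable policy
```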
5.4.3 Investigating Context-dependency of Privacy Attributes.
The Unified List of Privacy Attributes in Section 4.3 is a first step toward a standardized list of privacy attributes that can function as the foundation of a privacy visualization. However, the work of Nissenbaum [100] and Martin [88] showed that information privacy is discriminate, embedded in the context, and based on a social contract between the various stakeholders involved in the information exchange. It seems that privacy perception is not universal, but depends on contextual factors such as the type of data and the disclosure scenario [99, 113, 159]. For instance, the importance of some attributes might differ for an eHealth service compared to an online shopping website. Nevertheless, Solove [135] suggests a certain congruity between situations of personal data disclosure online.
We excluded mobile- and IoT-specific papers from our survey, because these domains are subject to different privacy concerns and are constrained in terms of privacy communication. Mobile apps are among the most privacy-intrusive means of interacting with an online service [12], and their privacy policies are almost always incorrect, incomplete, imprecise, inconsistent, and/or privacy-unfriendly [160]. Even health apps are often not GDPR compliant [58].
The unified list of general attributes in Section 4.3 can be used as a reference and starting point for domain-, technology-, or target-group-specific guidelines or visualizations. However, the extent to which privacy is context dependent remains an open problem. Are specialized privacy labels needed, or is a universal privacy visualization effective? How specific should PbD guidelines be?
6 Conclusions
We performed a systematic review of current approaches to communicating privacy issues to users (privacy visualizations) and to developers (PbD guidelines). It revealed significant gaps in terms of the aspects of data processing these approaches cover. To understand these differences, we distilled a Unified List of Privacy Attributes and ranked it based on perceived importance by European privacy experts and users.
Our study revealed that some attributes are considered important by both privacy experts and users: what type of personal data is collected, with whom it is shared, and whether or not it is sold. The PbD guidelines we reviewed also emphasize collection, but mention purpose more often than sharing or sale. Furthermore, PbD guidelines often focus on ensuring information security and transparency while providing users with privacy controls. Privacy visualizations take a user-centric perspective, focusing on collection, purpose, and sharing. Overall, we see an increase in publications pertaining to PbD and privacy visualizations. The right to be forgotten and accountability of service providers are increasingly mentioned in both regulations and guidelines. Both were found to be important in our survey. Disclosure to law enforcement, retention periods, and correctness of data are also mentioned increasingly often in publications covering online privacy, although these were ranked as relatively unimportant by our sample of privacy experts and users. Pseudonymization, anonymization, and the tradeoff between functionality and privacy are mentioned in a minority of the literature we reviewed and were perceived to be relatively unimportant by the users and privacy experts we surveyed.
The results serve as (1) a ranked list of privacy best practices for developers and providers of online services, (2) a foundation to visually communicate the most relevant aspects of a privacy policy to users, and (3) a taxonomy for structuring, comparing, and, in the future, rating privacy policies of online services.