1 Introduction
Online services currently handle unprecedented amounts of user-related data [129]. Machine learning algorithms extract value from large amounts of data by recognizing hidden patterns, links, behaviors, trends, identities, and practical knowledge, which has given birth to a “big data economy” [9, 152]. This has opened a “Pandora’s Box” of privacy concerns [113, 141, 151]. But the privacy policies that are meant to address these concerns are often lengthy, legally worded documents written to protect the provider [15, 59]. Even the interactive permission system found on modern smartphones fails to provide a sufficient understanding of the privacy risks involved in using an application [24, 39, 78].
To communicate privacy risks to users in a clear and concise manner, researchers, regulators, and industry have called for a more visual representation of how online services handle personal data [14, 15, 55, 74, 122, 164]. Since 2001, the United States Federal Trade Commission (FTC) has been encouraging standardized, tabular privacy policies similar to nutrition labels [13]. The more recent European General Data Protection Regulation (GDPR) also suggests using “standardized icons” to provide a meaningful overview of the intended data processing [106]. The Digital Advertising Alliance displays a YourAdChoices button on their ads [47], and the Entertainment Software Rating Board has introduced icons indicating whether or not games share personal information with third parties [68]. At the same time, a variety of privacy icons, labels, and notices designed to convey how personal data are handled have been proposed by researchers [60, 62, 66, 77, 123, 145, 147] and industry [63, 115]. However, these visualizations differ with regard to the privacy attributes they cover, as well as their level of detail. Furthermore, the comprehensibility and effectiveness of the visualizations remain questionable, as most of them have never been tested with users [122].
Whereas visual representations of privacy attributes are intended for users, Privacy by Design (PbD) guidelines are intended for developers. They determine to a significant extent how user privacy is handled. Because developers are not privacy experts, they need clear and unambiguous instructions with regard to how personal data should be handled [36], and they need to know which privacy attributes are considered important by users. While guidelines for what was once referred to as “fair information practice” go as far back as the 1970s [64], technological developments have prompted a renewed interest in developing privacy-aware information systems [127]. However, there is currently no generally accepted PbD standard or best practice. Rather, multiple regulators and industry stakeholders have each elaborated their own PbD principles that, similarly to privacy visualizations, differ significantly in terms of the privacy attributes they consider.
As a result, developers are confronted with diverging and sometimes contradictory guidelines and lack a universal privacy communication language that is understandable to end-users. We address this problem by systematizing knowledge surrounding privacy from relevant approaches in academia, industry, and government and by considering the opinions of both privacy experts and users to compile, validate, and rank a complete list of generally applicable privacy attributes. As a first step, a list of privacy attributes was derived by means of a systematic review of existing privacy visualizations and PbD principles. Second, this list was refined and extended in collaboration with information security professionals via interviews. Third, we distributed an online questionnaire among predominantly European privacy experts and users of online services, resulting in a ranking according to perceived importance from both perspectives. Finally, based on the results, we explain notable differences and patterns and identify trends. Together, our results form a foundation for understanding, communicating, and discussing privacy, and inform the development of user-oriented privacy-aware online services. We present practical recommendations for the development of future privacy visualizations and PbD guidelines, as well as outline research challenges toward facilitating the analysis and comparison of privacy policies and investigating the context-dependency of privacy attributes.
2 Background
The debate around privacy started in the late 19th century with the launch of the telephone and intensified throughout the “cybernetic revolution” of the 1970s [94]. In his landmark 1967 book, Westin defined privacy as “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” [153, p. 7]. Fundamentally, modern privacy is about information [31]. However, the concept kept expanding in both scope and significance with the emergence of the Internet, mass surveillance, terrorism threats [151, 154], and, more recently, with the development of big data and Web 2.0 [9, 113, 141, 152]. Currently, privacy, and in particular online privacy, remains hard to define [107] or, in the words of Solove, “a concept in disarray” [136]. Smith, Dinev, and Xu [133] note that historically, privacy was seen as a right, a commodity, a control, or a state. Martin [88, p. 557] sees privacy as a “social contract around what, to whom, and for what purpose” information is gathered or disclosed within a given community and context. Nissenbaum posits that privacy is shaped by social boundaries and norms [99] and, because individuals cannot provide truly informed consent, she suggests articulating context-specific norms that govern the collection and sharing of data online [100]. According to her theory of “contextual integrity,” whether or not an action constitutes a violation of information privacy depends on variables related to the context, the nature of the information, the actors involved and their relationships to the data subject, as well as the terms for collecting and sharing information. Acquisti, Taylor, and Wagman [5] discuss the economic value of privacy and find that in some situations data sharing can be beneficial for the user, while in others it can be damaging. Nevertheless, in his landmark articles, Solove [135, 136] points out that while it is not feasible to arrive at an overarching definition of privacy, the concept can be understood by isolating common “essential” or “core” characteristics. According to Morales-Trujillo et al., addressing privacy during software development and responding to users’ privacy concerns requires a conceptual framework that goes beyond data minimization and access control [95].
Solove [137] approached this from a legal perspective by developing a taxonomy of privacy violations pertaining to information collection, information processing, information dissemination, or invasion. From a technical perspective, privacy metrics are often used to compute the efficacy of privacy-enhancing technologies [150], but these are of little use to people without a background in statistics. Martín, del Alamo, and Yelmo [90] highlighted a lack of technical privacy requirements and criticized disagreement between high-level privacy principles. Anwar, Gill, and Beydoun [16] found commonalities between privacy laws and standards but noted differences in nature and scope that require further investigation. Wilson et al. [155] identified 10 categories of data practices by annotating 115 privacy policies. However, Morel and Pardo [96] found that natural language privacy policies required by legislators often cover only a fraction of those categories. They also found significant differences in terms of coverage compared to privacy policies expressed graphically (usually proposed by privacy advocates) or in machine-readable form (usually proposed by academics).
Acquisti et al. [4] saw potential in efforts toward assisting users with online privacy decisions by helping them reflect on their actions before the fact or by “nudging” them toward certain behaviors. But Rossi and Palmirani [122] concluded that existing privacy visualizations vary in terms of the privacy attributes they cover and criticized the fact that the majority have not been user tested. They suggest a visual layer summarizing the privacy policy with a special focus on the privacy principles of transparency and informed consent, but to date no new system has been developed. Hansen [69] compared privacy pictograms and found most to be of limited practical relevance, noting a lack of international consensus on syntax and semantics. Motti and Caine [97] reviewed icons related to privacy and classified them as relating to either data collection, data transmission, data storage, data sharing, or access control.
Overall, there appears to be a lack of agreement in terms of decomposing privacy into its core attributes. To help understand online privacy, we identified a list of unified privacy attributes and ranked this list based on importance. We did so by systematically comparing proposals for conceptualizing privacy aimed at users (privacy visualizations) and at developers (PbD guidelines), considering all sources (academia, industry, and government), and accounting for the perspectives of users as well as privacy experts.
3 Method
Our goal is to distill, validate, and rank a complete list of privacy attributes. The first step toward achieving this was to perform a systematic review to identify privacy visualizations (Section 4.1) and PbD principles (Section 4.2) relevant for online services. We then extracted a list of privacy attributes by coding the results until reaching satisfactory inter-coder reliability and then refining it with practitioners (Section 4.3). Finally, we used online surveys to understand and compare the perceived importance of these privacy attributes to experts and users (Section 4.4). The research methodology behind each of these three steps is described in more detail below.
3.1 Systematic Review
The goal of the systematic review was to identify proposals from academia, industry, and government that can be used as sources of privacy attributes relevant for online services. We limited the scope of the review to documents that include either (a) an original visual representation of aspects related to privacy or data handling by online services or (b) a concrete list of high-level principles related to privacy or data handling for developing online services. While privacy is context dependent, the goal of this article is to extract a general list of privacy attributes that are applicable to any kind of online service. Therefore, we are not interested in privacy attributes that are only relevant for a specific technology (e.g., mobile applications or IoT devices), domain (e.g., healthcare or social networks), or specific target group (e.g., children).
We started by searching Scopus using the following queries, selecting papers published between 2001 and 2019.
• TITLE-ABS-KEY(privacy AND (label OR icons OR symbols)) resulting in a total of 2,063 papers;
• TITLE-ABS-KEY(“privacy by design” AND (principles OR guidelines)) resulting in a total of 185 papers.
We then followed a systematic review process [132] using a “snowballing” approach [157] described below to iteratively extend the search query and the sample by examining references (Figure 1).
3.1.1 Privacy Visualizations.
We read the abstracts and titles of the 2,063 papers retrieved from Scopus and identified 23 that might include an original visual representation of aspects related to privacy or data handling by online services. When scanning these papers, we learned of other terms used to describe privacy visualizations so we assembled these into an extended Scopus and Web of Science query, this time using phrases to reduce the amount of irrelevant results:
• TITLE-ABS-KEY(“privacy symbol” OR “privacy label” OR “privacy icon” OR “privacy graphic” OR “privacy visual” OR “privacy pictogram” OR “privacy indicator” OR “privacy indication” OR “privacy badge” OR “privacy emblem” OR “privacy image” OR “privacy motif” OR “privacy mark” OR “privacy token” OR “privacy stamp”) resulting in a total of 82 papers.
We read the titles and abstracts of these results and identified 10 more potentially relevant papers. We then read the full text of the 23+10 papers selected thus far and found 17 other relevant papers among their references, which we also read. In total, we were able to find 41 papers containing privacy visualizations. We also learned about a 2016 Workshop on Privacy Indicators but found none of the papers published there satisfied the inclusion criteria described in Section 3.1. To make sure we did not miss anything, we performed several Google searches using all of the keywords we identified and found five more proposals for privacy visualizations coming from Non-Governmental Organisations (NGOs) and industry.
Finally, we analyzed and discussed all of the 41+5 results to select suitable candidates for extracting privacy attributes applicable to online services in general. We therefore excluded papers that were technology specific [57, 65, 76, 131], domain specific [75], or target-group specific [49, 116, 134]. These are marked with an asterisk (*) in our reference list. We also excluded papers that only include an overall rating [18, 46, 71, 138, 163], as well as other papers where no individual attributes could be distinguished [56, 83, 118]. Finally, we excluded papers that evaluate and classify existing visualizations [69, 96, 144, 164], because the visualizations they cover were already in our sample.
After removing duplicates, we ended up with a final sample of 13 privacy visualizations that we discuss in detail in Section 4.1 and that served as input for our coding process. Of these 13, 7 come from academia, 5 from industry, and 1 from government.
3.1.2 Privacy by Design Principles.
We read the abstracts and titles of the 185 papers retrieved from Scopus and selected 39 that appeared to include a concrete list of high-level principles related to privacy or data handling for developing online services. When scanning these papers, we learned of other related terms so we assembled these into an extended Scopus and Web of Science query:
• TITLE-ABS-KEY(“privacy by design” AND (principles OR guidelines OR conventions OR fundaments OR rules OR strategies OR methods OR procedures OR protocols OR guide)) resulting in a total of 330 papers.
We read the titles and abstracts of these results and identified 18 more potentially relevant papers. We then read the full text of the 39+18 papers selected thus far and found 25 other relevant papers among their references, which were also read. In total, 69 papers containing high-level PbD principles were found. We also ran a Google search based on the original Scopus query and found two other proposals for PbD principles coming from industry.
We analyzed and discussed all of the 69+2 results to select which are suitable for extracting generally applicable privacy attributes. We therefore excluded papers that discuss technology-specific principles [2, 61, 109, 110, 111, 112, 114, 130, 148] as well as papers that translate generic PbD principles to specific domains [21, 27, 28, 35, 38, 48, 82, 117, 124, 146, 149]. These are marked with a dagger (\(\dagger\)) in the reference list. We excluded papers that discuss and compare existing PbD principles [22, 70, 85, 120, 121, 128] as well as those that refine or operationalize PbD principles [6, 10, 19, 26, 37, 41, 42, 43, 50, 52, 91, 92, 139, 140, 143], but added the PbD principles they reference to our sample.
After removing duplicates, we ended up with a final sample of 14 PbD guidelines that we discuss in detail in Section 4.2 and that served as input for our coding process. Of these 14, 2 come from academia, 5 from industry, and 7 from government.
3.2 Coding
To analyze the results of the systematic review, we followed an iterative coding process.
First, the second author of this article analyzed the privacy visualizations and PbD guidelines selected during the systematic review. The content was divided into passages, and each passage was coded with one or more terms related to the handling of personal data. This resulted in an initial list of 13 privacy attributes.
Second, we discussed the initial list of privacy attributes with two information security professionals from a large software solutions provider in a 1-hour unstructured interview. Both security professionals deal with information privacy on a daily basis. As a result of the interview, two attributes were split up and the definitions of the attributes were clarified.
Third, to validate the refined list of 15 privacy attributes, three other coders coded 60% of the sample. After three rounds of discussions, refining the definitions of our codes, and re-coding the documents, Cohen’s kappa reached .93, which indicates an almost perfect agreement between the coders and therefore validates our final list of attributes.
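For readers unfamiliar with the statistic, the minimal sketch below illustrates how inter-coder agreement of this kind can be computed; the two coder label lists are hypothetical examples, not the coding data from our study.

```python
# Minimal illustration of computing Cohen's kappa for two coders.
# The labels below are hypothetical examples, not the study's actual coding data.
from sklearn.metrics import cohen_kappa_score

# Each element is the privacy attribute a coder assigned to one passage.
coder_a = ["collection", "purpose", "sharing", "retention", "collection", "security"]
coder_b = ["collection", "purpose", "sharing", "retention", "purpose",    "security"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values above 0.81 are conventionally read as almost perfect agreement
```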
Fourth, the final list and corresponding description of attributes was used as a coding scheme for analyzing the full sample of 13 privacy visualizations and 14 PbD guidelines.
3.3 Online Survey
To understand which attributes are most important, we designed an online survey to take the opinions of privacy experts and users into account. A convenience sample of users was recruited via universities, online social networks, and two commercial subject pools. We recruited privacy experts via LinkedIn by first asking approximately 500 members with “privacy officer” in their profile description to connect. Those who accepted the invitation were asked whether they perceived themselves as suitable privacy experts for this study and, if so, were directed to the questionnaire.
The survey, approved by the ethical committee of the University of Twente, collected demographic data about gender, education, occupation, nationality, and the type and frequency of online service usage. We asked the subjects how important, on a scale from 0 (not at all important) to 10 (extremely important), they considered each of the 15 privacy attributes. The full phrasing is provided as supplementary material. Since we wanted to obtain an overall ranking, we did not select a specific scenario. To assess the sensitivity of our findings, we asked participants whether or not they would rate these attributes differently for different types of services. Finally, in open questions, we asked if any of the descriptions were ambiguous and if they felt any attributes were missing.
By December 5, 2019, 646 adult participants (148 privacy experts and 498 users) had responded to the questionnaire. To clean the data, we removed all 86 incomplete responses. A further 75 responses were removed after being considered invalid due to (1) questionable completion times (less than 2 minutes or more than 20 minutes), (2) pattern answering, (3) uncertainty (by their own admission) as to what the question was asking, or (4) no usage of online services. The number of valid responses was \(N=485\), of which 20.6% were privacy experts and 79.4% users. Of these, 49.7% were women and 48.9% men. EU nationals made up 91.8% of the sample. All adult age groups were represented: 18–24 (35.1%), 25–34 (10.7%), 35–44 (13.2%), 45–55 (22.7%), and 55+ (15.9%). Many of the respondents were well educated, holding either undergraduate or postgraduate degrees (24.3% and 29.3%, respectively). All respondents used online services at least once a day, and 66.2% did so several times a day. We also calculated overall attribute importance as the average score of the 15 privacy attributes; this scale was found to be reliable (Cronbach’s \(\alpha = 0.90\)).
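As an aside, Cronbach’s alpha for a multi-item scale can be computed directly from the item variances and the variance of the summed scale; the sketch below uses a synthetic response matrix purely to demonstrate the formula, not our survey data.

```python
# Illustrative computation of Cronbach's alpha over k survey items (here: 15 attribute ratings).
# The response matrix is synthetic; it only demonstrates the formula, not our data.
import numpy as np

rng = np.random.default_rng(0)
responses = rng.integers(0, 11, size=(100, 15))  # 100 respondents x 15 attributes, ratings 0-10

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```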
5 Discussion
Our literature review (Table 1) revealed notable differences between privacy visualizations and PbD guidelines in terms of the privacy attributes they cover. And, on average, PbD guidelines cover more attributes than visualizations (8.6 vs. 5.6 attributes per proposal). This result is not surprising if we consider that privacy visualizations are mostly designed to provide simple, user-friendly information about the handling of personal data [93, 105].
Additionally, experts and users rated some attributes differently in the survey of Section 4.4. And, overall, privacy experts assigned a higher importance to most attributes. This is to be expected, because as privacy officers, they are not only concerned with privacy as users, but also professionally.
Sale and sharing were rated as the most important attributes by most users and privacy experts in our sample. However, while icons related to the sharing of data were included in all but one of the privacy visualizations, only half of the PbD principles dealt with this issue. Studies find that willingness to exchange personal data is strongly reduced by secondary use [89, 137], so it makes sense that almost all of the privacy visualizations we reviewed describe data sharing, since they are aimed at users. Sale, the attribute consistently ranked as most important in our survey, is covered by just two privacy visualizations and no PbD guidelines. This is evidence of a growing discrepancy: while the sale of personal data remains an intrinsic part of the business model for online service providers [87, 125], it is one of the major concerns of users [81].
Collection and purpose are arguably the most fundamental privacy attributes, because they describe which data are to be collected and why. The privacy experts we surveyed consider both collection and purpose to be of very high importance (closely following sale and sharing). Other studies confirm this observation [3]. Users in our sample, however, rate purpose as less important. We speculate this is because users consider certain types of data as sensitive regardless of purpose [20]. Nevertheless, collection and purpose were the most frequently occurring attributes in both privacy visualizations and PbD guidelines. Therefore, they appear to be the most important attributes to consider when discussing online privacy.
Transparency was mentioned in all PbD guidelines but only half of the privacy visualizations, even though users value insight into data handling practices [144]. We speculate this is because privacy visualizations are themselves a tool for transparency. Nevertheless, complete transparency can only be achieved by having access to the source code or raw data streams [84].
Security of personal information is mentioned by all of the 14 PbD guidelines we reviewed, but by less than half of the visualizations, mostly those published after 2012. In our survey, privacy experts ranked security as the fifth most important attribute (users ranked it sixth). This suggests that the security of personal information is considered critical for developing privacy-aware online services, but is also of increasing concern to users.
Accountability is also mentioned more often in proposals for PbD guidelines than for visualizations (almost 80% vs. 23%). This is not surprising, since accountability increases the magnitude of potential losses for the service provider in case of data breaches and PbD guidelines are aimed at developers. Nevertheless, accountability was ranked as the seventh most important attribute by users in our sample.
Retention is ranked significantly higher by privacy experts than by users and is also covered by most PbD guidelines and privacy visualizations.
The right to be forgotten, however, was perceived as more important by users and is rarely mentioned in the privacy visualizations or PbD guidelines we reviewed. The right to be forgotten and retention both relate to the ability of an organization to delete privacy-sensitive data. Retention has always been a technical consideration, but the right to be forgotten is a relatively new, user-driven initiative. This is supported by the fact that in our literature review, we found only one mention of it before 2011. However, managing legacy data sources in a GDPR-compliant manner is a major challenge [112]. Knowing how hard it is to completely remove data from all sources might cause privacy experts to rate the importance of the right to be forgotten lower than retention. Users, by contrast, are likely more interested in the benefits such a right would provide than in the technical constraints.
Anonymization was ranked as the fifth most important attribute by users and the eighth most important attribute by privacy experts, but received surprisingly little attention in the literature reviewed here. Anonymization is technically challenging [95], and privacy experts know this. Because true anonymization is seldom achievable [108, 158], various degrees of pseudonymity are implemented instead. Although the information security practitioners we interviewed felt that pseudonymization should be differentiated from anonymization, several privacy experts in our survey indicated that the two attributes are difficult to distinguish. We speculate users are also not familiar with this distinction and ranked pseudonymization as less important because it implies less protection. Nevertheless, taking steps to remove personal identifiers from user data is of interest to users, which also implies this should be given more careful consideration by developers. However, from a practical perspective, pseudonymization can be viewed as partial or imperfect anonymization.
Control and correctness were ranked relatively low by both users and privacy experts but were often encountered in PbD guidelines. Furthermore, correctness was represented in 30% of visualizations. Online services increasingly gather and aggregate user data to glean insights into habits, trends, or behaviors not directly related to the actual exchange of the product or service [9, 87, 152], but privacy controls are widely perceived as overly complex by users [25, 119]. The resulting difficulty in managing personal data leads to privacy fatigue: a sense of not being in control of the collection and sharing of data online [40, 73]. This weakens the perceived utility, and therefore the importance, of privacy settings and controls. Nevertheless, such mechanisms enhance privacy both proactively (preventing unauthorized collection or collection of incorrect data) and reactively (consent withdrawal and correction of previously collected data). Therefore, providing control over data collection and maintaining correctness of user data are an inherent part of online privacy [87].
5.1 Trends
Although we reviewed PbD guidelines published or updated after 2001 (see Figure 15), our initial search returned many older PbD guidelines. The FTC Fair Information Practice was the first set of PbD principles, forming the foundation for many of the newer principles and legislation. In 1990, the United Nations (UN) published similar guidelines, and in 1995, the European Union (EU) introduced its first Data Protection Directive. Throughout the first decades of the 21st century, the publication rate of PbD guidelines slowly increased, and after 2009 we saw an increase in domain- or technology-specific PbD guidelines. Since most of the PbD guidelines we found are either regulation or industry standards, we conclude that PbD has made its way into practice.
However, all of the privacy visualizations we found were published after 2007, with the majority being published by academics after 2012. This coincides with an increase in privacy awareness. Although the need for communicating online privacy is not a new discussion [93], research into empowering users to make informed disclosure decisions has recently started to gather steam [55, 74, 105, 122]. We are starting to see industry initiatives as well. However, despite the fact that both the European GDPR and the U.S. FTC recommend standardized privacy labels, no official standard has yet been defined.
Disclosure, correctness, accountability, and the right to be forgotten are increasingly common in recent privacy visualizations. This trend likely reflects increasing concerns regarding safe harbor [44] and data breaches [54]. Even though correctness and accountability are covered by many PbD guidelines, disclosure is not covered by recent initiatives such as PIPEDA, the Privacy Company, and privacylabel.org.
Sale, the right to be forgotten, anonymization, and accountability were rated as very important by our sample of users. However, accountability and anonymization are missing from most privacy visualizations, while sale and the right to be forgotten are missing from PbD guidelines as well. But the sale of personal data is of increasing concern to users, EU law mandates the right to be forgotten, anonymization is becoming an industry standard, and service providers have been receiving record fines for privacy infringements. These developments lead us to believe that, while current approaches to communicating and implementing privacy do not yet take the needs and preferences of users into account, this situation will (hopefully) change in the future.
5.2 Limitations
Because the entry point of the literature search was Scopus, it is possible that not all relevant proposals from industry were considered. We mitigated this by performing auxiliary Google and Web of Science searches. Furthermore, even though we ran several searches using nine synonyms for principles and 14 synonyms for visualizations, important keywords may have been missed. We do believe, however, that our literature sample of 27 proposals is sufficient to reach saturation in terms of privacy attributes. This is supported by the fact that each privacy attribute was encountered in at least two documents and that over 93% of privacy experts and users we surveyed indicated the unified list was complete and unambiguous.
Some of the documents selected for our systematic review were ambiguous, and many differed in terms of granularity and scope. Therefore, multiple attributes were sometimes attached to the same principle or visualization, and multiple principles or visualizations sometimes corresponded to a single privacy attribute. This indicates an inherent overlap between the attributes, which is to be expected because they are interdependent and refer to the same overarching concept. Nevertheless, after three rounds of coding we reached almost perfect agreement between coders. While the attributes on our list could be grouped, broken up, or renamed for practical applications, the list itself is complete and understandable.
In our online survey, the expert sample was smaller than the user sample. This is because privacy experts are a specialized group and a larger sample was hard to obtain. A disproportionate number of the respondents were young and had attended higher education. However, age and gender were not found to be confounding variables. Finally, while the list of attributes is international, almost all respondents were European, which means our ranking reflects a European perspective.
The results might be influenced by response bias. However, the topic of our questionnaire is not socially sensitive and therefore the risk of giving socially desirable answers is small. Furthermore, by screening the raw data rigorously and removing superficial and incomplete responses, we are confident that we have managed to keep any potential response bias to a minimum.
Last, differences between the perceived importance of most attributes were small and many respondents indicated that their rating depends on the type of application and data. We mitigated this by also considering the occurrence rate of each attribute in the literature we reviewed.
5.3 Practical Recommendations
5.3.1 Privacy Visualizations Should Be Legally Mandated.
Except for CLEVER\(^\circ\)FRANKE’s proposal, DAPIS, and privacylabel.org, which are currently under development, all of the other privacy visualization projects have been abandoned. We speculate that adopting such labels—and more importantly, getting a good score—provides a non-functional benefit to the user but comes at great cost for the provider, as is often the case with safety and security. Indeed, third-party privacy seals are not correlated with trustworthiness [53], and crowd-sourcing efforts such as Terms of Service; Didn’t Read (ToS;DR) have so far been unsuccessful. Providers should therefore supply an understandable summary of their privacy policies themselves [151]. However, since similar endeavors such as the EU energy label, movie ratings, and even seatbelts had to become mandatory before they were adopted, privacy visualizations will only become widespread if they are legally mandated.
5.3.2 Privacy Visualizations Should Go Beyond Data Collection and Processing.
We find that most privacy labels align with Nissenbaum [100] and Martin and Shilton [89] in that they primarily communicate what information is collected, how this information is shared, and for what purpose. However, our ranking suggests that the sale of data must also be made explicit. Furthermore, although most current visualizations do not include an indication of the level of security and accountability, this is important to both privacy experts and users and is actually mandated by the GDPR [106]. Trustworthy online data exchange relies on obtaining truly informed consent [86], but this requires providing the end-user with relevant information in an understandable form. This could be achieved by grouping the information across multiple layers [55, 145]. Our ranked list of privacy attributes serves as a basis for a user-centric privacy visualization that covers all important aspects of privacy.
5.3.3 PbD Guidelines Should Be More User-centric.
One of the most striking findings was the fact that the two attributes rated as most important by both privacy experts and users (sale and sharing) were rarely covered by PbD principles. To avoid anxiety, uncertainty, or even fear [100], the gap between privacy concerns and guidelines aimed at addressing them must be reduced. PbD is aimed at taking the privacy concerns of the end-user into consideration during development, and so issues related to data sharing (in particular sale of user data) must be part of PbD guidelines. Ideally, since the lowest average importance rating was six on a 0-to-10 scale, PbD guidelines should cover all of the attributes on our list, with the possible exception of functionality. This is because functionality was ranked as one of the least important attributes and was sometimes marked as confusing by both privacy experts and users.
5.3.4 The Right to Be Forgotten Should Not Be Forgotten.
The right to be forgotten was rarely mentioned in the PbD guidelines we reviewed. In 2014, however, the European Court of Justice ruled that European users can request the removal of personal data from online service providers, and the GDPR mandates this as well (despite the fact that the right to be forgotten is not one of the GDPR’s PbD principles). Newman questions whether the right to be forgotten is financially and legally feasible [98]. Still, according to Ausloos [17], the ability to demand the erasure of personal data can and must be available in data processing situations where consent was required, and more widely assuming normative, economic, technical, and legislative changes. Even though most PbD guidelines already recommend obtaining consent (i.e., control) and recommend removal of data when it is no longer necessary (i.e., retention), the right to be forgotten goes a step further by giving users the ability to withdraw consent. Therefore, the right to be forgotten (or its diluted form, the “right to erasure” [11]) should be an integral part of future PbD guidelines.
5.4 Research Challenges
5.4.1 Structuring Privacy Policies.
Privacy policies often focus on collection, sale, and sharing of user data, but our survey revealed that the right to be forgotten and security are of increasing concern. Furthermore, regulation increasingly mandates that privacy policies provide information about potential disclosure to (foreign) government entities, accountability in case of breaches, and the ability to correct one’s data. The Unified List of Privacy Attributes of Section 4.3 is based on extensive review and comparison of privacy attributes covered by privacy visualizations and PbD guidelines aimed at online services in general. Therefore, it represents a complete and technology-/domain-independent checklist of aspects related to online privacy. A valuable research direction is to investigate whether such a checklist can be used to verify the completeness of privacy policies [8], to structure (or even automatically restructure [161]) privacy policies, to make automatically generated privacy policies more readable [162], or to automatically analyze privacy policies [12].
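As a rough, purely illustrative sketch of the checklist idea (not part of our study), a policy text could be screened for mentions of each attribute; the attribute keywords below are hypothetical placeholders, and a realistic analysis would require annotated corpora and NLP models rather than string matching.

```python
# Naive sketch: flag which privacy attributes a policy text appears to cover.
# The keyword lists are hypothetical placeholders; real policy analysis would
# require annotated corpora and NLP models rather than keyword matching.
ATTRIBUTE_KEYWORDS = {
    "collection": ["collect", "gather"],
    "purpose": ["purpose", "in order to"],
    "sharing": ["share", "third party", "third parties"],
    "sale": ["sell", "sale"],
    "retention": ["retain", "retention period"],
    "right to be forgotten": ["erase", "erasure", "delete your data"],
}

def coverage(policy_text: str) -> dict[str, bool]:
    text = policy_text.lower()
    return {attr: any(kw in text for kw in kws) for attr, kws in ATTRIBUTE_KEYWORDS.items()}

sample = "We collect your email address in order to provide the service and may share it with third parties."
for attribute, covered in coverage(sample).items():
    print(f"{attribute:>22}: {'covered' if covered else 'not mentioned'}")
```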
5.4.2 Developing a Privacy Rating System.
Similarly to PrivOnto [104], the privacy attributes on our list can be operationalized so that they can be used to measure and compare the privacy level of online services on multiple metrics. The privacy attributes could also be used to (semi-)automatically annotate privacy policies [156]. In the long term, the privacy attributes could be used to produce or generate standardized, understandable, machine-readable summaries of privacy policies that enable both providers and users to assess, communicate, and compare the privacy of online services. To explore this direction, we started developing a free online service that implements some of these ideas: www.privacyrating.info. However, usability testing is critical, and the user testing performed on the DCI approach [147] and by Fox et al. [62] serves as a starting point for evaluating current and future proposals.
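To make the idea of a machine-readable summary concrete, one possible shape is a per-attribute record that a provider could publish alongside its policy; the field names and values below are illustrative assumptions only and do not reflect the format used by privacyrating.info.

```python
# Hypothetical machine-readable privacy summary keyed by the unified attributes.
# Field names and values are illustrative only; they do not reflect privacyrating.info's actual format.
import json

privacy_summary = {
    "service": "example-service.com",
    "attributes": {
        "collection": {"data_types": ["email", "usage statistics"], "score": 7},
        "purpose":    {"description": "service provision and analytics", "score": 6},
        "sharing":    {"third_parties": True, "score": 4},
        "sale":       {"data_sold": False, "score": 10},
        "retention":  {"period_days": 365, "score": 8},
    },
}

print(json.dumps(privacy_summary, indent=2))  # could be served next to the human-readable policy
```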
5.4.3 Investigating Context-dependency of Privacy Attributes.
The Unified List of Privacy Attributes in Section 4.3 is a first step toward a standardized list of privacy attributes that can function as the foundation of a privacy visualization. However, the work of Nissenbaum [100] and Martin [88] showed that information privacy is discriminate, embedded in the context, and based on a social contract between the various stakeholders involved in the information exchange. It seems that privacy perception is not universal, but depends on contextual factors such as the type of data and the disclosure scenario [99, 113, 159]. For instance, the importance of some attributes might differ for an eHealth service compared to an online shopping website. Nevertheless, Solove [135] suggests a certain congruity between situations of personal data disclosure online.
We excluded mobile- and IoT-specific papers from our survey, because these domains are subject to different privacy concerns and are constrained in terms of privacy communication. Mobile apps are among the most privacy-intrusive means of interacting with an online service [12], and their privacy policies are almost always incorrect, incomplete, imprecise, inconsistent, and/or privacy-unfriendly [160]. Even health apps are often not GDPR compliant [58].
The unified list of general attributes in Section 4.3 can be used as a reference and starting point for domain-, technology-, or target-group-specific guidelines or visualizations. However, the extent to which privacy is context dependent remains an open problem. Are specialized privacy labels needed, or is a universal privacy visualization effective? How specific should PbD guidelines be?
6 Conclusions
We performed a systematic review of current approaches to communicating privacy issues to users (privacy visualizations) and to developers (PbD guidelines). It revealed significant gaps in terms of the aspects of data processing these approaches cover. To understand these differences, we distilled a Unified List of Privacy Attributes and ranked it based on perceived importance by European privacy experts and users.
Our study revealed that some attributes are considered important by both privacy experts and users: what type of personal data is collected, with whom it is shared, and whether or not it is sold. The PbD guidelines we reviewed also emphasize collection, but mention purpose more often than sharing or sale. Furthermore, PbD guidelines often focus on ensuring information security and transparency while providing users with privacy controls. Privacy visualizations take a user-centric perspective, focusing on collection, purpose, and sharing. Overall, we see an increase in publications pertaining to PbD and privacy visualizations. The right to be forgotten and accountability of service providers are increasingly mentioned in both regulations and guidelines. Both were found to be important in our survey. Disclosure to law enforcement, retention periods, and correctness of data are also mentioned increasingly often in publications covering online privacy, although these were ranked as relatively unimportant by our sample of privacy experts and users. Pseudonymization, anonymization, and the tradeoff between functionality and privacy are mentioned in a minority of the literature we reviewed and were perceived to be relatively unimportant by the users and privacy experts we surveyed.
The results serve as (1) a ranked list of privacy best practices for developers and providers of online services, (2) a foundation to visually communicate the most relevant aspects of a privacy policy to users, and (3) a taxonomy for structuring, comparing, and, in the future, rating privacy policies of online services.