In the following, unless otherwise stated, we set the denoising threshold \(f=2\) and consider the Google-suggested values for the Topics API parameters (\(z=5,\ E=3,\ p=0.05,\ \Delta T=1\) week). We consider a population of \(|U|=1,000\) personas. We repeat each experiment 10 times and report the average performance. As introduced in Section
3.1, we consider two websites
\(w_1\) and
\(w_2\) aiming to re-identify a user based on the topics that each website has observed. As a reference metric, we consider the ratio of users that each attack correctly matches between the two websites and define it as
Prob(re-identification). Similarly, we define
Prob(incorrect re-identification) as the ratio of incorrect matches.
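To make these metrics concrete, the following sketch shows how they can be computed over the population; the data structures are hypothetical (an attack is modelled as any function that, for a user observed on \(w_1\), returns the matched user on \(w_2\), or None when it declines to match).

```python
from typing import Callable, Optional, Tuple

# Hypothetical modelling: users are integer IDs, and the same ID denotes the
# same persona on both websites (ground truth). An attack maps a user seen on
# w_1 to the user it believes is the same person on w_2, or None for no match.
Attack = Callable[[int], Optional[int]]

def evaluate_attack(attack: Attack, users: list) -> Tuple[float, float]:
    """Return (Prob(re-identification), Prob(incorrect re-identification)).

    Both are ratios over the whole population |U|; they need not sum to 1,
    since users for which the attack declines to match count in neither.
    """
    correct = incorrect = 0
    for u in users:
        guess = attack(u)
        if guess is None:
            continue            # no-match: neither correct nor incorrect
        elif guess == u:        # correct re-identification
            correct += 1
        else:                   # incorrect re-identification
            incorrect += 1
    n = len(users)
    return correct / n, incorrect / n
```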
6.1 Comparison of Attack Models
We first compare the performance of the three attacks presented in Section
3, and show the results in Figure
3, where the
\(x\)-axis represents different epochs and the
\(y\)-axis the re-identification probability Prob(re-identification). As expected, as the number of epochs increases, all attacks become more effective and the
Prob(re-identification) increases. Overall, the
Loose Attack (blue line) achieves up to \(4\times\) better performance than the
Strict Attack (red line), reaching around 25% in
Prob(re-identification) after
\(N=30\) epochs and almost 28% after
\(N=40\) epochs, for I.I.D. personas (Figure
3(a)) and almost 38% for Crossover personas (Figure
3(b)). With Crossover personas, Prob(re-identification) improves moderately, as personas are, by construction, more heterogeneous. As mentioned in Section
3, the
Loose Attack offers greater flexibility than the Strict Attack, allowing the attacker to account for behaviours that degrade the performance of the Strict Attack: for example, a topic exceeding the denoising threshold on website \(w_1\) but failing to do so on \(w_2\). The Strict Attack performs worse, remaining below 10% (20% with Crossover personas). The likelihood-based AWHA proposed by [
4] outperforms both the Strict Attack and the Loose Attack, exceeding 40% (50%) with I.I.D. (Crossover) personas. However, as we show next, the higher Prob(re-identification) comes at the cost of a large probability of error: the fraction of users incorrectly re-identified can substantially reduce the attack’s effectiveness, as we discuss in the following.
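As a rough illustration of the difference between the two attacks (their precise definitions are in Section 3), the sketch below assumes that the Strict Attack matches a user only when exactly one profile on \(w_2\) is identical to the denoised profile observed on \(w_1\), while the Loose Attack also accepts containment in either direction; these matching conditions are simplifying assumptions made here for illustration, not the paper's exact formulation.

```python
# Illustrative sketch only. 'profiles_w1' and 'profiles_w2' map user IDs to
# Denoised Reconstructed Profiles (sets of topic IDs) observed on each website.
# The matching rules below are assumptions for illustration; see Section 3 for
# the actual definitions of the Strict and Loose Attacks.

def strict_match(u, profiles_w1, profiles_w2):
    """Match u only if exactly one user on w_2 exposes an identical profile."""
    candidates = [v for v, p in profiles_w2.items() if p == profiles_w1[u]]
    return candidates[0] if len(candidates) == 1 else None  # else: no match

def loose_match(u, profiles_w1, profiles_w2):
    """Also accept containment, tolerating, e.g., a topic that exceeded the
    denoising threshold f on w_1 but not on w_2."""
    target = profiles_w1[u]
    candidates = [v for v, p in profiles_w2.items()
                  if p <= target or target <= p]  # subset in either direction
    return candidates[0] if len(candidates) == 1 else None  # else: no match
```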
An incorrect re-identification occurs when the attack matches the target user with the wrong profile. We show the probability of this event (i.e., the
Prob(incorrect re-identification)) for the
Strict Attack and the
Loose Attack in Figure
4. Since the AWHA attack always matches a user’s profile with the most likely profile on the other website, the fraction of users incorrectly matched is, by design, complementary to the fraction correctly matched. This does not happen with the
Strict Attack and
Loose Attack, which match no profile when their conditions are not met. As such, the fraction of incorrectly matched users for AWHA largely exceeds that of the other attacks. To reduce incorrect re-identifications, one could set a threshold
\(D_{max}\) to reject the identification if the distance is higher than
\(D_{max}\), introducing a
no-match option in the AWHA.
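A minimal sketch of this thresholded variant follows; the distance function is a placeholder, since AWHA [4] defines its own likelihood-based score, and \(D_{max}\) simply turns the closest-match rule into a reject option.

```python
import math

def awha_with_threshold(target_profile, candidates, distance, d_max=None):
    """Return the user on the other website with the closest profile, or
    None when even the closest one is farther than d_max (no-match option).

    'candidates' maps user IDs on w_2 to their profiles; 'distance' stands in
    for the likelihood-based score of AWHA [4] (placeholder assumption).
    """
    best_user, best_dist = None, math.inf
    for user, profile in candidates.items():
        d = distance(target_profile, profile)
        if d < best_dist:
            best_user, best_dist = user, d
    if d_max is not None and best_dist > d_max:
        return None            # reject: closest profile still too far away
    return best_user
```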
In Figure
5, we show how the
Prob(re-identification) and
Prob(incorrect re-identification) vary for the AWHA attack when introducing the threshold
\(D_{max}\). We consider
\(N=30\) epochs. The algorithm finds no match if
\(D_{max}\lt 63\). For higher values, both
Prob(re-identification) and
Prob(incorrect re-identification) start growing. Yet,
Prob(incorrect re-identification) grows faster than Prob(re-identification). This is because many more users can generate a sequence that is closer to the target’s, yet incorrect, than the correct but looser sequence the victim generates. In fact, by construction, the AWHA algorithm returns the user at the closest distance, which is more likely to be incorrect (i.e.,
Prob(incorrect re-identification) \(\gt\) Prob(re-identification), as shown in Figures
3 and
4).
Thus, the benefits introduced by the threshold mechanism are limited: an attacker using a threshold to reduce the number of false positives ends up reducing the true positives to a larger extent.
The Strict Attack and Loose Attack are more suitable solutions for an attacker targeting the Topics API: using them, the attacker minimizes the probability of an incorrect match. Conversely, AWHA does not offer such a benefit. An incorrect re-identification could, for instance, push an attacker to define a personalized marketing strategy under the false assumption that a target user has visited two colluding websites. If the rate of incorrect re-identifications exceeds 50%, more than half of the attacker’s decisions will be wrong, possibly with severe financial costs. To minimize such costs, the attacker may thus prefer an attack that limits incorrect re-identifications, even at the price of fewer correct ones.
Both the Strict Attack and the Loose Attack show an increase in the Prob(incorrect re-identification) in the first epochs, peaking between \(N=5\) and \(N=15\). In this phase, users’ profiles are still very similar to one another, causing more users to be incorrectly matched. As the number of epochs increases, the attacker builds a richer (and thus more unique) profile and improves the re-identification chances: after \(N=30\) epochs, with I.I.D. personas, the error rate is around 4%, while the Prob(re-identification) increases above 20%. For the Strict Attack, the Prob(incorrect re-identification) never exceeds 2%, converging toward 0% with Crossover personas and 1% with I.I.D. personas. This confirms that the Strict Attack is more conservative than the Loose Attack in providing a match, but its matches are more accurate. In summary, with enough time, the Strict Attack and especially the Loose Attack are effective enough to be an interesting option for an attacker. Conversely, recall that the AWHA outputs too many false matches for the attack to be valuable.
Finally, observe that it is not possible to use the classical metrics for classification tasks, such as F1-Score, precision, or recall (nor True/False Positive Rates), because the Strict Attack and Loose Attack are not binary classifiers, that is, \(\textit{Prob(re-identification)} + \textit{Prob(incorrect re-identification)} \lt 1\). This happens because a no-match outcome exists, in which a user is not matched with any other user.
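Explicitly accounting for the no-match outcome, the three cases partition the population of users:
\[
\textit{Prob(re-identification)} + \textit{Prob(incorrect re-identification)} + \textit{Prob(no match)} = 1 .
\]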
In the remainder of this Section and in Section
7, we only consider the
Loose Attack, as it provides the best trade-off between
Prob(re-identification) and
Prob(incorrect re-identification) among the three attacks. We continue to report results for both I.I.D. and Crossover personas.
Takeaway: Under the current threat model, the Topics API still leaves a considerable percentage of users at risk of being re-identified. Google’s AWHA returns the highest probability of correctly re-identifying the user but, being so aggressive, it comes with a large fraction of incorrect re-identifications. From an attacker’s perspective, the Loose Attack is the best option.
6.2 Impact of the Denoising Filter
In this section, we discuss the impact of the attacker’s choice of the denoising threshold \(f\). We expect that imposing no threshold (i.e., \(f=1\)) leads to almost no re-identification, and that, to maximize effectiveness, the attacker should set
\(f\) to 2 or 3. They could even consider combining the results obtained by using both thresholds. Notice that
\(f\) should increase with epochs
\(N\), as the attacker has a higher probability of observing the same random or rare topics multiple times. This was already evident in Figure
3(b): The
Prob(re-identification) for the
Loose Attack flattens when \(N\) exceeds 30. This is largely caused by setting \(f=2\), which becomes less effective as the attacker observes the topics exposed by users over more epochs.
To better understand the impact of
\(f\), we show in Figure
6 how
Prob(re-identification) evolves with
\(f=2\) and \(f=3\). Later, we also propose two compound strategies. For the sake of readability, we omit the case
\(f=1\): in fact, the
Prob(re-identification) never exceeds 3% for either population model, demonstrating that a filtering strategy is necessary for the attack to be effective. Let us first focus on the curves representing the
Prob(re-identification) with
\(f=2\) and
\(f=3\). Using
\(f=2\) (red line), the attacker re-identifies users earlier, because, in a few epochs, new topics populate
\(\mathcal {R}\). However, as the number of epochs increases, the attack becomes less effective, since a number of random and rare topics pollute
\(\mathcal {R}\). Indeed, those topics make the reconstructed profile of a given user different on the two websites, thus impeding re-identification. At that point, the attacker should increase the threshold to
\(f=3\), which can better cope with the larger magnitude of noise introduced by rare and random topics. When
\(f=3\) (blue curve), the attack is less effective in the first epochs, since too few topics exceed the threshold, resulting in an (almost) empty profile
\(\mathcal {R}\). Conversely, it performs better when the number of epochs becomes sufficiently large. Setting
\(f=3\) outperforms
\(f=2\) when
\(N\gt 24\) and
\(N\gt 36\) for I.I.D. and Crossover personas, respectively.
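To make the role of \(f\) concrete, the following sketch builds the Denoised Reconstructed Profile \(\mathcal {R}\) for one user on one website; representing the observations as one set of topics per epoch is an assumption made for illustration.

```python
from collections import Counter

def denoised_profile(epoch_topics, f):
    """Build the Denoised Reconstructed Profile R for one user on one website.

    'epoch_topics' is an assumed representation: a list of sets, one per epoch,
    with the topic IDs observed in that epoch. A topic enters R only if it is
    observed in at least f distinct epochs, filtering out random (returned with
    probability p) and rare topics that would otherwise pollute the profile.
    """
    counts = Counter(topic for epoch in epoch_topics for topic in epoch)
    return {topic for topic, c in counts.items() if c >= f}

# Example: with f=2, the topic observed only once (42) is filtered out.
# denoised_profile([{10, 42}, {10, 7}, {7, 10}], f=2) -> {10, 7}
```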
6.2.1 Combining Strategies.
Now, we consider two additional strategies that combine the sets of users re-identified with thresholds \(f=2\) and \(f=3\) to make a final decision. In the first strategy, the attacker considers a user to be re-identified only if the user appears in both sets; this represents a conservative approach. In the second strategy, the attacker considers a user re-identified if they appear in at least one of the sets; this represents a daring approach.
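A minimal sketch of the two combinations, representing each attack run as the set of users it re-identifies:

```python
def combine_and(reid_f2, reid_f3):
    """Conservative approach: a user counts as re-identified only if they
    appear in both the f=2 and the f=3 sets."""
    return reid_f2 & reid_f3

def combine_or(reid_f2, reid_f3):
    """Daring approach: a user counts as re-identified if they appear in at
    least one of the two sets."""
    return reid_f2 | reid_f3
```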
It is important to clear up a possible misunderstanding at once: one might assume that the set of users re-identified with both \(f=2\) and \(f=3\) is the same as the set of users re-identified with \(f=3\), thinking that if a user is re-identified with \(f=2\), then they will also be re-identified with the stricter threshold \(f=3\). However, due to the filtering threshold, a user’s profile can be unique with \(f=2\) but not with \(f=3\), causing the user to be re-identified in one case but not the other.
These two filtering strategies work as lower and upper bounds when tuning the trade-off between the fraction of re-identified users and the error rate. With the first approach (green curve, labelled as “
\(f=2\) AND
\(f=3\)” in Figure
6),
Prob(re-identification) is always below the
\(f=2\) and
\(f=3\) cases. Conversely, with the second approach (purple curve, labelled as “
\(f=2\) OR
\(f=3\)”), Prob(re-identification) is always higher. The picture is different for the error rate, i.e., the Prob(incorrect re-identification), depicted in Figure 7. A cautious attacker using the AND approach obtains a negligible Prob(incorrect re-identification), thus maximizing the ratio of correct to incorrect matches. An attacker willing to maximize the
Prob(re-identification) would instead opt for the OR approach, which, however, leads to a sizeable
Prob(incorrect re-identification). Also in terms of
Prob(incorrect re-identification), the two single-threshold strategies with \(f=2\) and \(f=3\) stand in between, as expected.
In our analysis, we limit the study to \(f\le 3\): the benefits of \(f\gt 3\) would appear only for very large observation windows (i.e., \(N\gt 40\)), which are of limited practical interest.
Takeaway: Different values of the threshold \(f\) impact the efficiency of the attack. Moreover, combinations of thresholds can offer lower and upper bounds on the Prob(re-identification) and Prob(incorrect re-identification). Since an attacker cannot know the underlying topic-visiting rate distribution, they cannot know in advance the optimal value of \(f\) to use. Such bounds thus help estimate the expected efficiency range of the attack.
6.3 Impact of the Number of Users
We now fix
\(f=2\),
\(N=30\) and vary the number of users
\(|U|\). Intuitively, the number of users in the set of candidates has an impact on the probability of a user being re-identified. The larger the website’s audience, the harder the re-identification is. We illustrate this effect in Figure
8, where we show how the re-identification probability varies when increasing the number of users in the audience of \(w_1\) and \(w_2\) (notice the log scale on the \(x\)-axis). In a larger pool of users, there is a higher probability of finding another user exposing a similar combination of topics. This makes the target user indistinguishable from more than one individual in the eyes of the attacker, thus preventing re-identification. Recall that
Strict Attack and
Loose Attack make no guess if a user does not have a unique Denoised Reconstructed Profile. Notice, however, that the decrease of the Prob(re-identification) slows down with a larger number of users \(|U|\), both with I.I.D. and Crossover personas, following a logarithmic trend: even with a pool of \(10^5\) users, the Prob(re-identification) is not negligible. Moreover, consider that an attacker could use other techniques (such as browser fingerprinting) to reduce the set of possible re-identification candidates. This could enhance the attack even on websites with a large audience, where it would otherwise be easier for the user to hide in the crowd. Fingerprinting techniques could help the attacker partition a large number of users into smaller sub-populations, each of which could be targeted by an attack independent of the others; the reduction of the population size would improve the attack’s chances, as the effective number of candidate users would decrease.
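As a rough sketch of this idea, an attacker could first bucket the audience with a coarse fingerprint and then run the re-identification attack independently inside each bucket; the fingerprinting function here is a hypothetical placeholder.

```python
from collections import defaultdict

def partition_and_attack(users, fingerprint, run_attack):
    """Partition the audience into sub-populations using a coarse fingerprint
    (hypothetical placeholder, e.g., browser or OS family) and attack each
    smaller bucket independently.

    'run_attack' is assumed to take a list of candidate users and return the
    subset it re-identifies; smaller buckets mean fewer candidates per victim.
    """
    buckets = defaultdict(list)
    for u in users:
        buckets[fingerprint(u)].append(u)
    reidentified = set()
    for bucket_users in buckets.values():
        reidentified |= run_attack(bucket_users)
    return reidentified
```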
Takeaway: A large website audience allows the user to “hide in the crowd”. However, techniques exist to partition and reduce the size of the victim audience, increasing the attacker’s re-identification probability.