Private information retrieval for topics #142

menonsamir · 2023-02-18T00:05:15Z

Hello! Newbie to this group, excited to contribute. Let me know if I am doing anything wrong.

One central tension in deciding on the number of topics is user privacy vs. relevance. The more topics, the more content can be personalized / targeted, but simultaneously, the more a topic could itself be sensitive or deanonymizing.

What if we fetched content privately, using private information retrieval? We could have the browser itself perform private retrievals for content, and be sure that the individual topics never leave the browser unencrypted. This could increase privacy and relevance simultaneously.

The main concern is computation/communication overhead, though recent work makes it seem very practical. I think we could make this 'progressive': in addition to the existing browsingTopics() call, we could also have something like sendDetailedBrowsingTopicsPrivately(<url>) (it's a bad name, I know). This would POST an encrypted PIR query for some more fine-grained topics to the supplied URL. Personalization code could choose to use one or the other, depending on network conditions etc.

No information about this more fine-grained topic would be readable by the server, since the query stays encrypted. This would make it safer to use finer-grained topics, and increase the relevance of personalized content.
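
To make "the query stays encrypted" concrete, here is a minimal toy sketch of single-server PIR built on textbook Paillier encryption. Everything in it is an assumption for illustration: the function names are hypothetical, the key is built from two small, publicly known primes (so it is not secure), and the query is one ciphertext per database row. Practical schemes such as Spiral or SimplePIR use lattice-based encryption and much more compact queries; this only shows the principle that the server computes on ciphertexts and never sees the index.

```python
import math
import secrets
from typing import List, Tuple

# Toy parameters: two well-known Mersenne primes. Far too small to be secure.
P, Q = 2**31 - 1, 2**61 - 1


def keygen() -> Tuple[int, Tuple[int, int]]:
    """Return (public modulus n, secret key (lam, mu)) for textbook Paillier."""
    n = P * Q
    lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m under n with generator g = n + 1 and fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n: int, secret: Tuple[int, int], c: int) -> int:
    lam, mu = secret
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def make_query(n: int, index: int, size: int) -> List[int]:
    """Client side: an encrypted one-hot vector selecting row `index`."""
    return [encrypt(n, 1 if i == index else 0) for i in range(size)]


def answer(n: int, database: List[int], query: List[int]) -> int:
    """Server side: homomorphically compute Enc(sum(bit_i * row_i)).

    The server only ever handles ciphertexts, so it does not learn the index.
    """
    n2 = n * n
    acc = 1
    for c, row in zip(query, database):
        acc = acc * pow(c, row, n2) % n2
    return acc


n, secret = keygen()
ads = [101, 202, 303, 404]  # hypothetical per-topic content IDs
reply = answer(n, ads, make_query(n, 2, len(ads)))
print(decrypt(n, secret, reply))  # 303
```

The server does work proportional to the whole database for every query, which is where the computation overhead mentioned above comes from.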

Is this an exciting idea to folks? I am happy to write up some more detailed thoughts, or even make a demo doing PIR with the existing experimental Topics API.


michaelkleber · 2023-03-01T16:00:39Z

Hi @menonsamir: Techniques like PIR could help with one part of the problem, but as soon as an ad selected using Topics was rendered in the browser, it could immediately reveal what topic was used for the selection. That means that PIR, like any other privacy-protective ad selection mechanism, would need to go along with other mechanisms like Fenced Frame rendering and aggregate-only reporting. In other words, this would bring in all the complexity of the FLEDGE ad serving proposal.

Topics was the Privacy Sandbox attempt to offer an ad targeting signal that's much more lightweight. Adding Topics-like signals to FLEDGE is certainly possible, but might not even need special browser support in that ad techs can accomplish many of the same goals on their own.

menonsamir · 2023-03-02T01:55:12Z

Hmm, that's true. Rendering certainly poses an issue. I also agree that it would be good to not 'feature creep' the proposal and keep things lightweight.

I do think PIR is relatively practical for images and even video, so one idea would be to perform PIR directly over the rendered content of the ad. This has the added benefit of not injecting arbitrary JavaScript into pages (that then needs to be sandboxed, etc). Should I make a demo of something like this working? If it would be convincing to folks, I'm happy to.

As you point out, the other issue is reporting/attribution. In addition to some already proposed private attribution systems (https://eprint.iacr.org/2022/1174.pdf), at a basic level, we could restrict support to only 'pay-per-click' models of attribution for these "more private" Topics-based ads. Here, the idea would be that users who are just shown an ad have intuitively not really consented to being tracked, or even (maybe) having their interests broadcast to advertisers, but once you actually click an ad, it's reasonable for the ad to learn this. Does that make sense?

tfatfa11 · 2023-03-02T02:18:37Z

منن

michaelkleber · 2023-03-02T02:31:23Z

Our investigations of PIR for ad delivery have generally run into the problem that the universe of possible advertisements is (1) extremely large and (2) changes very frequently. The usual PIR response to (1) is lots of pre-computation, which is largely thwarted by (2).

Specifically in the case of Topics, it might be possible to do better since the size of the taxonomy limits the PIR universe. But that approach, at least implemented naively, would involve the party doing ad selection running hundreds of ad auctions, to pick the best ad for each possible user topic, which is also computationally prohibitive. Demos of solutions to part of this problem are of course welcome, but there is an awful lot of context that ruins simplifying assumptions.

And unfortunately, while some kinds of advertisers really do pay per click, publishers on the web almost always receive money per impression — ad networks use ML-predicted click-through rates to balance out CPC buyers with CPM sellers. So I don't think migrating the web to an attribution system that is blind to non-click outcomes is viable.

Welcome to my world of difficult trade-offs!

menonsamir · 2023-03-02T04:21:42Z

Thanks for the context here; it's definitely interesting to see what real-world challenges PIR runs into. Also, nice to hear you guys have already done some investigation of PIR for this.

Re: the database changing, yes, this is definitely an issue if you use schemes that assume the database does not change frequently (SimplePIR/DoublePIR, FrodoPIR). There are other, somewhat more costly schemes (Spiral) with a different tradeoff, which shift bandwidth costs from 'per database change' to 'per user'. Not a perfect solution, but quite workable for this application I think.

Re: the need to run many auctions, yeah, I can see how that doesn't fit well with how things work currently. It's cool that Topics slots quite nicely into this existing context - you can just run a bidding process for the topics that the browser gives you. For PIR, yeah, we would need ad networks to deliver larger 'sets' of ads during bidding, for all possible topics. This isn't the craziest change though, since in theory, the (sole?) input to an ad network's decision on what to bid should be the topic, right?
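
As a rough sketch of what "sets of ads for all possible topics" could mean, assume each ad network submits a table of (ad ID, bid) per topic up front, and the party doing ad selection runs one simple highest-bid auction per topic to fill the rows of the database that clients would later query with PIR. The `BidTable` shape, the function name, and the highest-bid rule are all hypothetical simplifications; real auctions take many more inputs.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: topic ID -> (ad ID, bid) submitted by one ad network.
BidTable = Dict[int, Tuple[int, float]]


def build_pir_database(num_topics: int, bid_tables: List[BidTable]) -> List[int]:
    """Run one highest-bid auction per topic.

    Row t of the result holds the winning ad ID for topic t, or 0 if no
    network bid on that topic.
    """
    database = []
    for topic in range(num_topics):
        candidates = [table[topic] for table in bid_tables if topic in table]
        winner = max(candidates, key=lambda c: c[1], default=(0, 0.0))
        database.append(winner[0])
    return database


network_a = {0: (11, 1.0), 1: (12, 0.5)}
network_b = {1: (21, 0.9)}
print(build_pir_database(3, [network_a, network_b]))  # [11, 21, 0]
```

The loop makes the cost visible: the work scales with the number of topics times the number of bidders, and it has to be redone whenever bids change.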

Re: attribution, in my mind, this seems like the major sticking point. The publisher needs to find out (in aggregate) which ads are getting shown, so that it can get paid on a CPM basis, but the client does not want to directly reveal this. I'm gonna mull this over a bit...

I know on mobile, the Apple solution is SKAdNetwork, which basically uses Apple servers as a trusted source of decorrelation and delay, hiding the IPs and timing of the attribution events. We are not so lucky in the browser context.

I do not envy trying to balance all of these tradeoffs 🙃

jkarlin · 2023-06-22T16:47:17Z

A PIR approach to Topics would look and behave quite differently (e.g., leveraging fenced frames), and would be a separate API. As such, I'm closing this issue.

jkarlin closed this as completed Jun 22, 2023
