
Private information retrieval for topics #142

Closed
menonsamir opened this issue Feb 18, 2023 · 6 comments

Comments

@menonsamir

Hello! Newbie to this group, excited to contribute. Let me know if I am doing anything wrong.

One central tension in deciding on the number of topics is user privacy vs. relevance. The more topics, the more content can be personalized / targeted, but simultaneously, the more a topic could itself be sensitive or deanonymizing.

What if we fetched content privately, using private information retrieval? We could have the browser itself perform private retrievals for content, and be sure that the individual topics never leave the browser unencrypted. This could increase privacy and relevance simultaneously.
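To make the core idea concrete, here is a toy sketch of the classic two-server XOR PIR scheme. Note that it swaps in a different trust model from the single-server, lattice-based schemes mentioned later in this thread: it assumes two servers that hold identical copies of the database and do not collude. Each server sees only a uniformly random subset of indices, so neither one learns which topic's record the browser asked for. The function names and the example database are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def server_answer(db: list[bytes], subset: set[int]) -> bytes:
    """Run by each server: XOR together the records whose indices are in `subset`."""
    record_len = len(db[0])  # all records must have the same length
    return reduce(xor_bytes, (db[j] for j in subset), bytes(record_len))


def make_queries(n: int, i: int) -> tuple[set[int], set[int]]:
    """Run by the browser: server A gets a uniformly random subset, server B gets
    the same subset with index i toggled. Each subset on its own is uniform."""
    s_a = {j for j in range(n) if secrets.randbits(1)}
    s_b = s_a ^ {i}  # symmetric difference flips membership of i
    return s_a, s_b


def retrieve(db: list[bytes], i: int) -> bytes:
    s_a, s_b = make_queries(len(db), i)
    ans_a = server_answer(db, s_a)  # computed by server A
    ans_b = server_answer(db, s_b)  # computed by server B
    return xor_bytes(ans_a, ans_b)  # everything except record i cancels out


# Hypothetical database: one fixed-size record per fine-grained topic.
db = [f"ad-for-topic-{t:03d}".encode() for t in range(20)]
```

XORing the two answers cancels every record except the requested one, because the two subsets differ only at index `i`.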

The main concern is computation/communication overhead, though recent work makes it seem very practical. I think we could make this 'progressive': in addition to the existing browsingTopics() call, we could also have something like sendDetailedBrowsingTopicsPrivately(<url>) (it's a bad name, I know). This would POST an encrypted PIR query for some finer-grained topics to the supplied URL. Personalization code could then choose to use one or the other, depending on network conditions, etc.

No information about the finer-grained topic would be readable by the server, since the query stays encrypted. This would make it safer to use finer-grained topics, and increase the relevance of personalized content.

Is this an exciting idea to folks? I am happy to write up some more detailed thoughts, or even make a demo doing PIR with the existing experimental Topics API.

@michaelkleber
Collaborator

Hi @menonsamir: Techniques like PIR could help with one part of the problem, but as soon as an ad selected using Topics was rendered in the browser, it could immediately reveal what topic was used for the selection. That means that PIR, like any other privacy-protective ad selection mechanism, would need to go along with other mechanisms like Fenced Frame rendering and aggregate-only reporting. In other words, this would bring in all the complexity of the FLEDGE ad serving proposal.

Topics was the Privacy Sandbox attempt to offer an ad targeting signal that's much more lightweight. Adding Topics-like signals to FLEDGE is certainly possible, but might not even need special browser support, since ad techs can accomplish many of the same goals on their own.

@menonsamir
Author

Hmm, that's true. Rendering certainly poses an issue. I also agree that it would be good to not 'feature creep' the proposal and keep things lightweight.

I do think PIR is relatively practical for images and even video, so one idea would be to perform PIR directly over the rendered content of the ad. This has the added benefit of not injecting arbitrary JavaScript into pages (which then needs to be sandboxed, etc.). Should I make a demo of something like this working? If it would be convincing to folks, I'm happy to.
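PIR schemes retrieve fixed-size records, so doing PIR directly over rendered creatives would mean padding every ad to a common size; otherwise the number of records fetched could itself reveal which ad (and hence which topic) was chosen. A minimal sketch of that layout, assuming a length-prefixed, zero-padded encoding. The record size and function names are illustrative, and plain list indexing stands in for the per-record PIR fetch.

```python
RECORD_SIZE = 4  # bytes per PIR record (tiny, for illustration only)
LEN_PREFIX = 4   # bytes used to store each creative's true length


def build_database(creatives: list[bytes], record_size: int = RECORD_SIZE):
    """Pad every creative to the same length and split it into equal-size records.

    Returns the flat record list and the number of records per creative, so the
    client always fetches the same number of records no matter which ad it wants.
    """
    max_len = max(len(c) for c in creatives) + LEN_PREFIX
    chunks_per_ad = -(-max_len // record_size)  # ceiling division
    padded_len = chunks_per_ad * record_size
    records = []
    for c in creatives:
        blob = len(c).to_bytes(LEN_PREFIX, "big") + c
        blob = blob.ljust(padded_len, b"\x00")  # zero-pad to the common size
        records.extend(blob[k:k + record_size] for k in range(0, padded_len, record_size))
    return records, chunks_per_ad


def reassemble(records: list[bytes], ad_index: int, chunks_per_ad: int) -> bytes:
    """Client side: join the fetched records and strip the padding."""
    start = ad_index * chunks_per_ad
    # In a real system each of these records would be fetched with a PIR query.
    blob = b"".join(records[start:start + chunks_per_ad])
    length = int.from_bytes(blob[:LEN_PREFIX], "big")
    return blob[LEN_PREFIX:LEN_PREFIX + length]
```

The cost of this padding is that every fetch is as large as the largest creative, which is one reason PIR over rich media gets expensive.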

As you point out, the other issue is reporting/attribution. In addition to some already proposed private attribution systems (https://eprint.iacr.org/2022/1174.pdf), at a basic level, we could restrict support to only 'pay-per-click' models of attribution for these "more private" Topics-based ads. Here, the idea would be that users who are just shown an ad have intuitively not really consented to being tracked, or even (maybe) having their interests broadcast to advertisers, but once you actually click an ad, it's reasonable for the ad to learn this. Does that make sense?

@tfatfa11
Copy link

tfatfa11 commented Mar 2, 2023

منن

@michaelkleber
Collaborator

Our investigations of PIR for ad delivery have generally run into the problem that the universe of possible advertisements is (1) extremely large and (2) changes very frequently. The usual PIR response to (1) is lots of pre-computation, which is largely thwarted by (2).

Specifically in the case of Topics, it might be possible to do better since the size of the taxonomy limits the PIR universe. But that approach, at least implemented naively, would involve the party doing ad selection running hundreds of ad auctions, to pick the best ad for each possible user topic, which is also computationally prohibitive. Demos of solutions to part of this problem are of course welcome, but there is an awful lot of context that ruins simplifying assumptions.
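The naive approach described above can be sketched as: run one auction per taxonomy topic, store each winner in a table indexed by topic, and let the browser fetch its own topic's row via PIR. The `Bid` type, the highest-price-wins rule, and the tiny taxonomy are simplifying assumptions, not how real ad auctions work; the point is only that building the table costs `len(taxonomy)` auctions per ad slot instead of one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bid:
    ad_id: str
    topic: int    # hypothetical taxonomy topic ID this bid targets
    price: float  # bid price; assumes the bid depends only on the topic


def build_per_topic_db(taxonomy: list[int], bids: list[Bid]) -> dict[int, Optional[str]]:
    """Run one (greatly simplified) auction per topic and record the winning ad.

    The resulting table is what a browser could query privately by topic index.
    """
    db: dict[int, Optional[str]] = {}
    for topic in taxonomy:  # one auction per topic, not one per request
        candidates = [b for b in bids if b.topic == topic]
        winner = max(candidates, key=lambda b: b.price, default=None)
        db[topic] = winner.ad_id if winner else None  # None: no ad for this topic
    return db
```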

And unfortunately, while some kinds of advertisers really do pay per click, publishers on the web almost always receive money per impression — ad networks use ML-predicted click-through rates to balance out CPC buyers with CPM sellers. So I don't think migrating the web to an attribution system that is blind to non-click outcomes is viable.
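A common way to put CPC buyers and CPM sellers on equal footing is to convert a click price into an expected cost per thousand impressions (eCPM) using the predicted click-through rate: eCPM = CPC × pCTR × 1000. A minimal sketch; the function names and bid structures are illustrative. Even when the winning buyer pays per click, the publisher's revenue is accounted per impression, which is why impressions still have to be measurable.

```python
def ecpm_from_cpc(cpc_bid: float, predicted_ctr: float) -> float:
    """Expected revenue per 1,000 impressions for a cost-per-click bid.

    A CPC buyer pays only on a click, so its expected value per impression is
    cpc_bid * predicted_ctr; scaling by 1000 puts it on the same CPM footing.
    """
    return cpc_bid * predicted_ctr * 1000


def pick_winner(cpm_bids: dict[str, float],
                cpc_bids: dict[str, tuple[float, float]]) -> str:
    """Compare CPM bids directly against CPC bids converted to eCPM.

    cpc_bids maps ad ID -> (cpc_bid, predicted_ctr).
    """
    scores = dict(cpm_bids)
    for ad, (cpc, pctr) in cpc_bids.items():
        scores[ad] = ecpm_from_cpc(cpc, pctr)
    return max(scores, key=scores.get)
```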

Welcome to my world of difficult trade-offs!

@menonsamir
Author

Thanks for the context here; it's definitely interesting to see what real-world challenges stuff like PIR runs into. Also, nice to hear you guys have already done some investigation of PIR for this.

Re: the database changing, yes, this is definitely an issue if you use schemes that assume the database does not change frequently (SimplePIR/DoublePIR, FrodoPIR). There are other, somewhat more costly schemes (Spiral) with a different tradeoff, which shift bandwidth costs from 'per database change' to 'per user'. Not a perfect solution, but quite workable for this application I think.

Re: the need to run many auctions, yeah, I can see how that doesn't fit well with how things work currently. It's cool that Topics slots quite nicely into this existing context - you can just run a bidding process for the topics that the browser gives you. For PIR, yeah, we would need ad networks to deliver larger 'sets' of ads during bidding, for all possible topics. This isn't the craziest change though, since in theory, the (sole?) input to an ad network's decision on what to bid should be the topic, right?

Re: attribution, in my mind, this seems like the major sticking point. The publisher needs to find out (in aggregate) which ads are getting shown, so that it can get paid on a CPM basis, but the client does not want to directly reveal this. I'm gonna mull this over a bit...
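One well-known way for a publisher to learn aggregate impression counts without any single party seeing which ad a given client showed is additive secret sharing across two aggregators. This swaps in a technique not proposed in the thread, and it assumes two non-colluding aggregators. It also omits the validity checks real systems need (for example, proving each submission is a genuine one-hot vector, as systems like Prio do). Function names are hypothetical.

```python
import secrets

MOD = 2**32  # all arithmetic is done modulo this value


def share_impressions(num_ads: int, shown_ad: int) -> tuple[list[int], list[int]]:
    """Client: split a one-hot 'which ad I showed' vector into two additive shares.

    Each share on its own is uniformly random; only their sum reveals the vector.
    """
    one_hot = [1 if a == shown_ad else 0 for a in range(num_ads)]
    share_a = [secrets.randbelow(MOD) for _ in range(num_ads)]
    share_b = [(x - r) % MOD for x, r in zip(one_hot, share_a)]
    return share_a, share_b


def aggregate(shares: list[list[int]]) -> list[int]:
    """Aggregator: sum the shares it received, coordinate by coordinate."""
    return [sum(col) % MOD for col in zip(*shares)]


def combine(total_a: list[int], total_b: list[int]) -> list[int]:
    """Publisher: add the two aggregate totals to get per-ad impression counts."""
    return [(a + b) % MOD for a, b in zip(total_a, total_b)]
```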

I know on mobile, the Apple solution is SKAdNetwork, which basically uses Apple servers as a trusted source of decorrelation and delay, hiding the IPs and timing of the attribution events. We are not so lucky in the browser context.
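The "decorrelation and delay" role of a trusted relay can be illustrated with a generic batch-and-shuffle relay. This is not how SKAdNetwork is implemented; it is a hypothetical sketch assuming a trusted relay operator, where every class and field name is made up for illustration. The relay drops identifying metadata, holds events until a batch fills, and releases them in random order.

```python
import random
from dataclasses import dataclass


@dataclass
class Postback:
    campaign_id: int
    source_ip: str      # identifying metadata the relay must not forward
    received_at: float  # timing metadata the relay must not forward


class BatchingRelay:
    """Generic mixing relay: strip metadata, wait for a full batch, shuffle."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.pending: list[Postback] = []
        self._rng = random.SystemRandom()  # OS-backed randomness for the shuffle

    def submit(self, pb: Postback) -> list[int]:
        self.pending.append(pb)
        if len(self.pending) < self.batch_size:
            return []  # hold events back: this is the "delay"
        batch = [p.campaign_id for p in self.pending]  # keep only the payload
        self.pending = []
        self._rng.shuffle(batch)  # release order no longer matches arrival order
        return batch
```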

I do not envy trying to balance all of these tradeoffs 🙃

@jkarlin
Collaborator

jkarlin commented Jun 22, 2023

A PIR approach to Topics would look and behave quite differently (e.g., leveraging fenced frames), and would be a separate API. As such, I'm closing this issue.

@jkarlin jkarlin closed this as completed Jun 22, 2023

4 participants