Hostnames may be private #185
Comments
Similar to 3 above, it is also common for SaaS contracts to prohibit disclosing that company X is a customer of App Y except as authorized. As a result, disclosing "companyX.SaaSApplicationY.com" would violate fairly standard confidentiality clauses.
Thanks for reaching out! Responses inline:
We have a few layers of protection here. First, we don't classify hostnames that resolve to private IP address space (IANA reserved address ranges), and many intranets exist in such reserved ranges. Second, I don't expect these intranet sites to serve ads and call the browsingTopics API, so they won't be included in the user's top topic calculation. Third, the taxonomy is rather coarse-grained. Fourth, we introduce noised topics, so one doesn't know which sites the user actually visited. And finally, the user may have visited any one (or several) of a number of sites about a given topic. Which site a user visited is at best a probabilistic inference.
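To make the first layer concrete, here is a minimal sketch of a reserved-address check. It is not the browser's actual implementation: the function name `is_excluded_address` is hypothetical, and the exact set of ranges the browser excludes is an assumption approximated with Python's standard `ipaddress` module.

```python
import ipaddress


def is_excluded_address(addr: str) -> bool:
    """Return True if addr falls in private or otherwise reserved space.

    Assumption: "reserved" is approximated here by the private, loopback,
    link-local, and IETF-reserved ranges known to the ipaddress module.
    The browser's real list may differ.
    """
    ip = ipaddress.ip_address(addr)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


if __name__ == "__main__":
    # Illustrative addresses only.
    for candidate in ("10.1.2.3", "192.168.0.1", "127.0.0.1", "fd00::1", "8.8.8.8"):
        print(candidate, is_excluded_address(candidate))
```

A check like this only looks at the address a hostname resolves to, so a site served from a publicly routable address passes through it regardless of who is meant to use the site.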
Similar to above. The topics provided by the taxonomy are very coarse-grained. The Topics API currently classifies
Those apps/sites can disable topics (e.g., via the permissions policy API) on instances where consent is not given or disclosure is prohibited. When calculating the next set of topics for the user, the API only considers hostnames from pages where all of the following hold: the API is called, the permissions policy grants the call, the IP address is not reserved, the user is not in incognito mode, the user hasn't disabled the permission, etc.
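The conditions above combine as a simple conjunction, which can be sketched as follows. This is an illustrative model, not the browser's code: `PageVisit`, its field names, `is_eligible`, and `eligible_hostnames` are all hypothetical, and the "etc." in the list means the real check includes conditions not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PageVisit:
    """Hypothetical record of one page visit; field names are assumptions."""
    hostname: str
    api_called: bool            # the page (or a frame on it) called the API
    policy_allows_topics: bool  # the permissions policy grants the call
    ip_is_reserved: bool        # hostname resolved to reserved address space
    incognito: bool             # visit happened in incognito mode
    user_disabled_topics: bool  # user turned the feature off


def is_eligible(visit: PageVisit) -> bool:
    """A visit counts only if every listed condition holds.

    The source list ends in "etc.", so this is a subset of the real checks.
    """
    return (
        visit.api_called
        and visit.policy_allows_topics
        and not visit.ip_is_reserved
        and not visit.incognito
        and not visit.user_disabled_topics
    )


def eligible_hostnames(visits: Iterable[PageVisit]) -> List[str]:
    """Hostnames that would feed the next topics calculation in this model."""
    return [v.hostname for v in visits if is_eligible(v)]
```

On the site side, opting out is expected to be a response header along the lines of `Permissions-Policy: browsing-topics=()`; check the current explainer for the exact feature name before relying on it.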
Thank you for your thoughtful response. I had misunderstood a very basic aspect -- that the classification occurs within the browser, not by a service running elsewhere. (I think I saw the "public" and "by a partner" and jumped to conclusions.) I will add, though, that there seems to be an argument that the default permissions policy should be to deny extracting and sharing that data during the experimentation phase. This leads more broadly to a comment that I'll make elsewhere on the spec (when I'm more confident I haven't missed basic points, like the above ;) )... the safety of this API relies very heavily on the coarseness of the topic taxonomy and somewhat on the coarseness of the input data to the topics calculation. But the spec makes no promise that the taxonomy will remain coarse. Indeed, it highlights that accepting the spec entails accepting changes to the taxonomy. And the spec explicitly states that the input data could include all text in the document.
Oh, I would also add that lots of internal apps and intranets are hosted on public cloud infrastructure. We may not be in a zero-trust world, but we are definitely in a world where lots of what-used-to-be-internal-only apps are accessible from anywhere and not IP-blocked.
How does the Topics API envision managing these aspects?