[css-fonts] incorporate mitigations for font based fingerprinting #4055

pes10k · 2019-06-24T20:29:45Z

Font-based fingerprinting is a common, privacy-violating pattern in which websites build semi-identifiers from the uncommon fonts a user has installed. This semi-identifier is then combined with other semi-unique identifiers (hardware configuration, user configuration, viewport size, etc.) to build highly identifying values that are used to track users.
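A minimal sketch of the combination step described above, assuming a script has already collected a few signals as strings. The signal names, the values, and the `fingerprint` helper are all hypothetical; the point is only that one extra installed font is enough to change the resulting identifier.

```python
import hashlib


def fingerprint(signals: dict[str, str]) -> str:
    """Combine several semi-identifying signals into one identifier."""
    # Sort the keys so the same set of signals always yields the same string,
    # regardless of the order in which a script collected them.
    canonical = "|".join(f"{key}={signals[key]}" for key in sorted(signals))
    # Hash the canonical string; 16 hex characters are plenty for illustration.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


user_a = {
    "fonts": "Arial,Helvetica,Fira Code",  # includes one uncommon, user-installed font
    "viewport": "1440x900",
    "hardware_concurrency": "8",
}
user_b = dict(user_a, fonts="Arial,Helvetica")  # identical except for the font list

print(fingerprint(user_a) != fingerprint(user_b))  # True
```

Each individual signal is shared by many users; it is the combination that narrows things down, which is why removing any one high-entropy input such as the font list matters.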

Examples

Panopticlick includes a well-known demonstration of how this can be done: https://panopticlick.eff.org
Fingerprint2.js is a popular library that uses font-based fingerprinting (among other signals) to identify users
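Libraries like these are commonly described as probing fonts by measuring a test string rendered in a stack like `'Candidate', monospace` and comparing the result with the generic family alone: if the box changes, the candidate is installed. The sketch below simulates that comparison; `fake_measure` is a hypothetical stand-in for real layout (a browser script would measure a DOM element), and the font names and box sizes are made up.

```python
from typing import Callable

Box = tuple[int, int]  # (width, height) of a measured test string
GENERIC_BASES = ("monospace", "sans-serif", "serif")


def detect_installed(candidates: list[str], measure: Callable[[str], Box]) -> list[str]:
    """Report candidates whose rendering differs from a generic-only baseline."""
    baseline = {base: measure(base) for base in GENERIC_BASES}
    found = []
    for font in candidates:
        # If `font` is installed, the text renders in it instead of `base`,
        # so the measured box usually changes; otherwise it matches the baseline.
        if any(measure(f"'{font}', {base}") != baseline[base] for base in GENERIC_BASES):
            found.append(font)
    return found


def fake_measure(installed: set[str], boxes: dict[str, Box]) -> Callable[[str], Box]:
    """Hypothetical stand-in for a layout engine: uses the first available family."""
    def measure(stack: str) -> Box:
        for family in (part.strip().strip("'") for part in stack.split(",")):
            if family in GENERIC_BASES or family in installed:
                return boxes[family]
        raise ValueError(f"no usable family in {stack!r}")
    return measure


measure = fake_measure(
    installed={"Fira Code"},
    boxes={"monospace": (120, 18), "sans-serif": (104, 18), "serif": (98, 19), "Fira Code": (126, 18)},
)
print(detect_installed(["Fira Code", "Comic Neue"], measure))  # ['Fira Code']
```

Every candidate costs a few layout measurements, so scanning a long list is slow, which is relevant to the performance discussion later in the thread.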

Some browsers provide defenses against this privacy violation. Safari, for example, only exposes the default system fonts to web content, and will not use other, uncommon fonts, even if they're installed on the OS. Firefox provides a similar option.

The standard should be modified to protect against (i.e. not allow) font-based fingerprinting by default, instead of relying on non-standardized, vendor-specific mitigations.

Suggested Mitigation
I suggest having the standard follow Safari's approach, requiring browsers to only treat the default fonts on the platform as system fonts. A simple (though maybe not the best or most elegant) way of doing this would be to change section 5.2 of "CSS Fonts Module Level 3" so that the system font fallback procedure only returns the default platform fonts. Those might be specified per platform, or just as this list:
http://www.ampsoft.net/webdesign-l/WindowsMacFonts.html

pes10k · 2019-06-24T20:30:41Z

If the above approach is appealing, I would be happy to submit a PR to the existing level 3 standard, as well as the level 4 proposal.

svgeesus · 2019-06-24T21:02:56Z

If the above approach is appealing, I would be happy to submit a PR to the existing level 3 standard, as well as the level 4 proposal.

Appreciated, but please restrict that change to just CSS Fonts 4, which is the focus of current implementation. Errata can be gathered for Fonts 3, but there is no intention to backport all of Fonts 4 to Fonts 3. Instead, Fonts 4 is gradually replacing Fonts 3.

pes10k · 2019-06-24T21:06:00Z

I see. Is there an expected timeline for Fonts 4? If it's still far off, it might be valuable to push out a 3.1 for security and privacy purposes (i.e. not all of Fonts 4, just the things where the current spec is being leveraged to harm users)?

svgeesus · 2019-06-24T21:07:56Z

All browsers are implementing both the Variable fonts and the Color Fonts parts of Fonts 4, plus smaller changes (like font-weight being a number in the range 1 to 999 rather than being a set of number-like tokens 100, 200, etc.).

So this is being used now.

pes10k · 2019-06-24T21:14:24Z

Okie dokie, sounds good. Would a strict, enumerated set of font faces that can act as system fonts be preferred, or would a broader phrase like "fonts provided by the platform by default, and not installed by the platform's user" suffice?

AmeliaBR · 2019-06-24T22:42:30Z

I don't think we want to list the specific fonts in the spec. The general rule should be that the list of fonts shouldn't provide any more information than can be obtained by other means: e.g., by the combination of browser & OS & preferred language. (I don't know about Mac, but Windows has language-specific fonts that are included in the OS but not installed by default unless the user actually uses that language in the OS.)

I would also hope that there would be some exclusion/option for supporting a wider set of fonts for trusted sites.

For the PR: Fonts Level 4 already has a section on Preinstalled Fonts vs User-Installed Fonts, which currently says:

User Agents may choose to ignore User-Installed Fonts for the purpose of the Font Matching Algorithm.

So, the request here is to upgrade that "may" into a "should".
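A deliberately simplified sketch of what "ignore User-Installed Fonts for the purpose of the Font Matching Algorithm" could mean for a `font-family` list. Real matching also works per character and considers style, weight, and stretch; the function, its parameters, and the `serif` default are assumptions for illustration only.

```python
def match_family(
    family_list: list[str],
    preinstalled: set[str],
    user_installed: set[str],
    ignore_user_installed: bool = True,
    default_family: str = "serif",  # hypothetical UA default
) -> str:
    """Pick the first family in a font-family list that the UA treats as available."""
    available = set(preinstalled)
    if not ignore_user_installed:
        available |= user_installed
    for family in family_list:
        if family in available:
            return family
    # Nothing in the list matched: fall back to the UA default.
    return default_family


print(match_family(["Inter", "Arial"], {"Arial"}, {"Inter"}))  # Arial
print(match_family(["Inter", "Arial"], {"Arial"}, {"Inter"}, ignore_user_installed=False))  # Inter
```

The rendered result only differs when the page names a user-installed font ahead of an available one, which is exactly the difference a probing script looks for.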

This should probably affect local() references in @font-face, as well as font-family matching. Otherwise, the fingerprinting techniques could be changed to compare a font-face defined as src: local(test-name), url(reference.woff);, where the reference file has a characteristic size that will differ from the true font of that name. (Unfortunately, this means that periodically downloading & installing the most popular Google Fonts will no longer save me data!)
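A simplified model of walking the `src: local(test-name), url(reference.woff)` list above, assuming the download always succeeds. Whether the `local()` entry is used shows up both in the font's metrics and in whether a network fetch happens, so the set passed as `visible_local` (a hypothetical parameter) would need to be the same restricted set used for `font-family` matching.

```python
from typing import Optional


def resolve_src(src: list[tuple[str, str]], visible_local: set[str]) -> tuple[Optional[str], bool]:
    """Walk an @font-face src list; return (chosen source, whether it hits the network)."""
    for kind, value in src:
        if kind == "local":
            if value in visible_local:
                return f"local({value})", False
            # Not visible: skip to the next entry, exactly as if it were not installed.
        elif kind == "url":
            # Simplification: assume the download succeeds.
            return f"url({value})", True
    return None, False


src = [("local", "test-name"), ("url", "reference.woff")]
print(resolve_src(src, {"test-name"}))  # ('local(test-name)', False)
print(resolve_src(src, set()))          # ('url(reference.woff)', True)
```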

Next step: work on an API for full access to all installed fonts as a list! (With an explicit permission prompt, of course, which would also allow those fonts to be used for rendering.) This is essential for document-editing web apps to replace their native versions. Some apps still use Flash just to get this data.

pes10k · 2019-06-25T00:25:27Z

Hi @AmeliaBR

Thanks for the comments. A couple of responses:

So, the request here is to upgrade that "may" into a "should".

I think "must" (i.e. "User Agents ~~may choose to~~ must ignore…") would be the right word. Correctly implementing the standard should make the kinds of privacy violations the current version enables impossible. More generally, standards should strictly protect user privacy, at least until there is some signal (permission, etc.) that the user has granted the site greater privileges.

Re: local(): those are all great points! I wouldn't have thought of that. Thanks for catching my goof :)

Re: permissions: I don't have a strong sense about this (other than that permissions discussions often round down to "users don't like permissions, so just grant access by default"; as long as things don't wind up there!). But for the use case you mentioned, maybe a better norm to push for would be a service worker + site-hosted fonts?

dbaron · 2019-06-25T00:43:10Z

I don't think a must is viable here without a better solution for addressing language support.

Many languages aren't supported in the default fonts installed on a given operating system. In many cases users can then install fonts that support more languages by choosing to install support for those languages. Presumably the requirement being proposed here would allow web use of all of the default fonts for all languages -- which in turn still exposes a good bit of fingerprinting data (which languages the user has installed fonts for) -- but I think there are still significant languages that those defaults don't cover (with significant variation between operating systems). (It also wouldn't surprise me if the fonts installed on Android devices vary based on carrier/market and aren't consistent within a language, though I'd be happy to be wrong.)

So there's a tradeoff here between one of many active fingerprinting vectors and support for significant numbers of the world's languages. Without clear data that fixing just a part of this active fingerprinting vector (still allowing fingerprinting of which languages are supported by fonts on the system) would make a real dent in ability to do active fingerprinting on the web (which is much easier than passive fingerprinting) -- data that would probably require a project to gather a list of fingerprinting vectors available on the web (with entropy for each item) -- I don't think there's a very clear case for degrading the support for many minority languages on the Web.

pes10k · 2019-06-25T00:56:46Z

1. Fingerprinting doesn't get solved until you start solving it :) Saying "this isn't the worst vector, so let's not fix it" seems like a surefire way to make sure fingerprinting never gets better.
2. Font-based fingerprinting actually is one of the worst FP methods, though! See the Panopticlick paper / project linked above, the "Beauty and the Beast: Diverting modern web browsers to build unique browser fingerprints" paper, and many others (happy to provide links if you like). They all find the same thing: fonts are hugely identifying if you have anything but the default configuration (put differently: if you allow non-system fonts to be used, it will be hugely identifying in the cases where it's useful, and not useful in the cases where it's not identifying).
3. You might consider the statement from PING regarding meta-standards for standards (i.e. ways to fix privacy in web standards). As the third section of the recent PING blog post, Privacy Anti-Patterns In Standards, puts it, the existence of bigger problems elsewhere doesn't obviate the need for standards to address the privacy harms they introduce. (Note: I wrote it, but it states the position of the IG.)
4. I think @AmeliaBR has exactly the right idea: fonts should give away no more information than "browser & OS & preferred language". So I have no argument against making the system fonts "the non-user-installed fonts for the current system language". That's not "all fonts for all languages", but something narrower. Would that address the concern?

dbaron · 2019-06-25T03:06:04Z

1. Many people see passive fingerprinting as not solvable given the web's API surface. Refuting that requires gathering the data from the various sources to see what the state of things is (see below), not just hoping. Agreeing to solve it needs to be a wide consensus, not a bunch of ad hoc and inconsistent decisions made in different working groups to different standards.
2. Many of the papers used Flash-based font data, which is much more identifying since it's an ordered list of fonts, not just a set. That's why I'm suggesting that the convincing thing is to maintain a common repository of the state of fingerprinting rather than point to a bunch of papers, all or most of which are seriously out of date in various ways, and all of which are incomplete.
3. (sorry, need a (3) here for consistent numbering)
4. Many users use an OS and browser whose UI doesn't match their preferred language.

dbaron · 2019-06-25T03:08:28Z

Oh, I guess I should respond to 3, actually: The Privacy IG isn't the right forum for making tradeoffs between Privacy and other issues; it's going to have an obvious bias. You'd probably get a very different result on a privacy vs. internationalization tradeoff in the Internationalization WG.

litherum · 2019-06-25T04:41:39Z

Safari, too, has different fonts for different internationalizations. My strategy when implementing this fingerprinting mitigation in Safari wasn't to treat every user the same; that would have made many of our users' lives worse. Instead, my goal was to limit the number of equivalence classes a user could fall into. Before the mitigation, a user could be in a class of one, thereby being uniquely identified. After the mitigation, there are still multiple equivalence classes, but there are only a handful. Each equivalence class has many, many users, thereby significantly reducing the number of bits of entropy.
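The equivalence-class idea can be made concrete with Shannon entropy over class sizes. The comment above gives no actual class sizes, so the population of 1,048,576 users and the four equal classes below are invented purely to show the arithmetic.

```python
import math


def class_entropy_bits(class_sizes: list[int]) -> float:
    """Shannon entropy (bits) of which equivalence class a random user falls into."""
    total = sum(class_sizes)
    return -sum((n / total) * math.log2(n / total) for n in class_sizes if n > 0)


def surprisal_bits(class_size: int, total_users: int) -> float:
    """Bits revealed about a user who lands in a class of the given size."""
    return math.log2(total_users / class_size)


# Hypothetical population of 1,048,576 (2**20) users.
before = [1] * 1_048_576   # every user in a class of one: fully identifying
after = [262_144] * 4      # four equally sized classes
print(round(class_entropy_bits(before), 6))  # 20.0
print(class_entropy_bits(after))             # 2.0
print(surprisal_bits(1, 1_048_576))          # 20.0
```

Unequal classes matter too: a user in a small class still leaks more bits (higher surprisal) than the average, which is why the goal is a handful of classes that each contain many users.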

litherum · 2019-06-25T04:48:44Z

Next step: work on an API for full access to all installed fonts as a list!

I would formally object to such an API. It explicitly undoes all the font-based privacy mitigations we've done. Users don't want to see more dialog boxes, and trying to explain the privacy implications of using fonts to a user is difficult. If a website wants to use fancy fonts, it can serve them as web fonts.

FremyCompany · 2019-06-25T10:14:54Z

Based on @litherum's comments, my two cents here is that we should instead do the following:

User Agents must limit the exposure of system fonts to protect user privacy. The exact mechanism through which this is done is left at the discretion of User Agents.

To achieve this, User Agents should collect telemetry about fonts supported by their users. One way to prevent installed fonts from leaking information about the user would be to cross-reference this telemetry data with their installed languages and operating system version, and not expose to the web the fonts which are not commonly supported in any of the [ OS-Version x Installed Language ] buckets that the user is part of.
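A minimal sketch of that bucket rule, assuming telemetry is summarized per bucket as a user count plus per-font counts. The bucket labels, the counts, and the 0.5 threshold are hypothetical; choosing a real threshold would depend on the collected data.

```python
from collections import Counter

Bucket = tuple[str, str]  # (OS version, installed language), a hypothetical key shape


def exposable_fonts(
    user_fonts: set[str],
    user_buckets: list[Bucket],
    telemetry: dict[Bucket, tuple[int, Counter]],
    threshold: float,
) -> set[str]:
    """Expose a font only if it is common in at least one bucket the user belongs to."""
    exposed = set()
    for bucket in user_buckets:
        population, font_counts = telemetry.get(bucket, (0, Counter()))
        if population == 0:
            continue  # no data for this bucket, so it cannot vouch for any font
        for font in user_fonts:
            if font_counts[font] / population >= threshold:
                exposed.add(font)
    return exposed


telemetry = {
    ("os-12", "ja"): (1000, Counter({"Hiragino Sans": 990, "RareFont": 2})),
    ("os-12", "en"): (5000, Counter({"Helvetica": 4990})),
}
fonts = {"Hiragino Sans", "RareFont", "Helvetica"}
print(sorted(exposable_fonts(fonts, [("os-12", "ja")], telemetry, threshold=0.5)))
# ['Hiragino Sans']
```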

tildelowengrimm · 2019-06-25T16:31:05Z

Agreeing to solve [passive fingerprinting] needs to be a wide consensus, not a bunch of ad hoc and inconsistent decisions made in different working groups to different standards.

David, I think I have the opposite expectations about what approach we should take. I believe that the only practical way to address passive fingerprinting is standard-by-standard and implementation-by-implementation doing the in-the-weeds work to ensure that passive fingerprinting surface isn't exposed.

But I definitely agree with you about the need for broad consensus. Good news, though: that one's already taken care of! People almost-universally agree that they don't want to be silently tracked across the web. It's not just consensus, it's basically unanimous. Now it's our job to implement that for everyone.

svgeesus · 2019-06-25T18:11:41Z

@dbaron wrote:

You'd probably get a very different result on a privacy vs. internationalization tradeoff in the Internationalization WG.

Similarly, people coming from a performance optimization perspective (which has substantial, real-world implications, especially for those with slow network connections or with pricey, metered bandwidth) would take a different perspective if told "there is no need to download this font since you have it already, but we are going to forbid the browser to say so, and thus force you to download it every time, to enhance your privacy".

dbaron · 2019-06-25T18:22:38Z

There is not consensus that active fingerprinting is solvable to the point that there won't still be large numbers of unique users; I've seen a number of Chrome implementors and tech leads take the position that it is not in discussions on fingerprinting, including in this working group and elsewhere. I'm not convinced either way as to whether it's solvable because I haven't seen anybody put together the data (an up-to-date list of fingerprinting vectors, with data on them and proposed mitigations) that would let me make that judgment.

(edit: fixed typo where I wrote passive when I meant active)

pes10k · 2019-06-25T18:45:41Z

@dbaron not sure what the suggestion is here. Freeze progress on CSS Font v4 until an "up-to-date list of fingerprinting vectors, with data on them and proposed mitigations" is built? It definitely does not seem user-serving to say "we know there is a problem, we know it's significant, but haven't had others propose mitigations for them, so we're going to ship the problem anyway".

Seems way better to fix a problem that we know exists now, and is harming users today. This isn't hypothetical; the current CSS Font v3 spec enables users to be tracked w/o their consent.

As stated before, there are many, many research papers showing this is a problem, as well as many deployed examples in the wild. It is not the case that these papers find no problem in the absence of Flash; the findings are either "not having Flash degrades identifiability some, but it's still identifying" or "we measured w/o Flash, and find it's highly identifying." It's also apparently serious enough that FF and Safari have deployed mitigations.

pes10k · 2019-06-25T18:51:23Z

@litherum can you say more about Safari's algorithm? How different is it from anonymity sets of "browser & OS & preferred language"? I'm not married to the specific mitigation in the issue text, as long as the standard includes a fix for the problem. Maybe Safari's approach is the way to go!

jasonanovak · 2019-07-10T05:19:37Z

In terms of the efficacy of font fingerprinting and the entropy it exposes, this paper from INRIA is fairly interesting/helpful. They conducted a real-world study of fingerprinting, including JavaScript-based font probing, and found that fonts were one of the top contributors to fingerprintability.

jasonanovak · 2019-07-10T16:34:33Z

One additional thought: a concern has been raised about the performance impact of downloading fonts, but JS-based font fingerprinting has an interesting performance impact of its own: it takes time and resources for a fingerprinting script to iterate through fonts to determine what a user has installed (for example, fingerprintjs2 makes searching an extended list of fonts a default-off option because of the performance impact of doing so).

AmeliaBR · 2019-07-10T17:10:37Z

@jasonanovak I don't think you can fairly compare performance impacts from malicious pages with performance impacts on normal usage. Blocking one fingerprinting script might just provoke the spyware to use another fingerprinting method with even worse performance impacts.

If the primary concern was the performance impact of the current methods for figuring out which fonts a user has, the solution would be to create a proper API for doing so.

tildelowengrimm · 2019-07-10T17:41:09Z

Agreed — I'm not a fan of a calculus which concludes that fixing a common fingerprinting method has a performance cost on the basis that sites might decide to use a less-performant fingerprinting method instead.

AmeliaBR · 2019-07-10T18:00:42Z

The performance cost of the fix is that people would end up downloading web fonts that they don't actually need (because they already have the font installed on their system).

E.g., I have most common Google Fonts installed, and one of the reasons I did that was to cut down on web font downloads. If we prevent browsers from using those custom installed fonts, there will be a performance cost to me (more data usage and slower page loading) when visiting sites that use these fonts.

How many people this will affect, and to what degree, I can't say. Some browsers give users the option to turn off web font downloads altogether, which would negate the performance impact but increase the impact on user experience. E.g., turning off web fonts might not be a good solution for people whose pre-installed system fonts don't offer a lot of choice for the languages/scripts they use.

The performance impact of malicious scripts is a separate issue altogether. I was using the example of switching fingerprint methods to emphasize that we can't expect that fixing the fingerprinting vector will have a net performance benefit on malicious sites. Malicious sites generally don't care about user data plans.

jumde · 2019-08-26T23:45:20Z

If there is a plan to introduce a local-font permission with font-table-access (https://github.com/inexorabletash/font-table-access/#privacy-and-security-considerations), then there is no need to allow non-standard system fonts by default.

svgeesus · 2019-09-16T02:15:17Z

The Privacy Interest Group is tracking this.

tabatkins · 2019-09-27T19:11:34Z

I haven't seen a standard for it, any specifics on thresholds, or empirical observations of it being a useful privacy protection strategy.

I assume you read the explainer I linked? Note that this is also something we're actively working on and developing; it's far from complete.

Further, since a unique font generally puts someone in an extremely small equiv class by itself (w/o needing to be combined with other inputs), it's unclear how a privacy budget approach would be useful here.

A single unique font probably does, yeah. How do you expect a website to find that single unique font that the user has? If it's highly identifying, that means only a small number of people have it. So either the website is only targeting those handful of people and is thus testing only for that font (interesting case...) or they're testing lots of "unique" fonts to see which small bucket the user falls in. The latter is exactly what the Privacy Budget approach is intended to detect - spamming hundreds or thousands of local font requests looking for the one that highly identifies the user.
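The detection idea can be illustrated with a toy per-origin cap on distinct local font lookups. This is not Chrome's Privacy Budget design (the thread says that is still in development); the class, the cap of 3, and the "answer as not installed" behavior are assumptions chosen only to show how scanning differs from ordinary use.

```python
class LocalFontBudget:
    """Hypothetical per-origin cap on how many distinct local fonts a page may probe."""

    def __init__(self, max_distinct: int) -> None:
        self.max_distinct = max_distinct
        self.seen: dict[str, set[str]] = {}

    def allow_lookup(self, origin: str, family: str) -> bool:
        families = self.seen.setdefault(origin, set())
        if family in families:
            return True  # repeating a lookup reveals nothing new
        if len(families) >= self.max_distinct:
            return False  # budget spent: answer as if the font were not installed
        families.add(family)
        return True


budget = LocalFontBudget(max_distinct=3)
print([budget.allow_lookup("https://example.com", f) for f in ["A", "B", "C", "D", "A"]])
# [True, True, True, False, True]
```

A page that renders text with a handful of local fonts stays well within such a cap, while a script probing hundreds of names runs out quickly.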

Put differently, users are being harmed today by this flaw in the font standard. It seems inappropriate to hinge the solution to that problem on something that isn't anywhere close to standardization (i.e. privacy budget).

And as others have argued in this thread, users will be harmed by the suggestion to restrict local font access to solely system fonts. (And aren't currently harmed by Safari's actions due to the differences in user demographics between browsers.) We need to think about the balance of benefits, harms, and costs of mitigating those harms.

As I argued at TPAC, and Chris Wilson and others at Google argued in their response to PING's charter discussion, the web is chock full of data that can be used for fingerprinting. Any attempt to reduce that, particularly any attempt with significant user-harmful side effects, needs to show that it'll actually reduce the fingerprinting surface to a usefully low level; going from 400 bits to 40 bits of identifying information achieves precisely nothing, since you only need 33 bits to uniquely identify every person on Earth. (And you really want to allow less than 20 bits, to ensure that people are "bucketed" together with at least several thousand others.)
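The 33-bit and 20-bit figures follow from base-2 logarithms. The sketch assumes a world population of roughly 7.7 billion (an approximate 2019 figure) and, as a best case, that the revealed bits split people into equally sized buckets.

```python
import math

WORLD_POPULATION = 7_700_000_000  # approximate 2019 figure (assumption)


def bits_to_single_out(population: int) -> int:
    """Smallest number of bits that can give every member a distinct value."""
    return math.ceil(math.log2(population))


def people_per_bucket(population: int, revealed_bits: int) -> float:
    """Average bucket size if `revealed_bits` split the population evenly."""
    return population / 2 ** revealed_bits


print(bits_to_single_out(WORLD_POPULATION))            # 33
print(round(people_per_bucket(WORLD_POPULATION, 20)))  # 7343
```

Since 2**32 is about 4.3 billion and 2**33 about 8.6 billion, 33 bits is the smallest count that can single out everyone; at 20 bits each bucket still averages several thousand people.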

If the PING can show that the sum of their suggested mitigations will reduce fingerprinting surface to 20 bits or less, or at least that there's a believable path to getting under that limit, and that performing all of those mitigations will not harm the web to such an extent that the attack surface just moves elsewhere (such as sites moving to native apps...), then great! That would be an ideal solution, because reducing information wholesale is typically far easier than trying to be clever!

So far, the PING hasn't attempted to show that it's possible to do that. And so far, Chrome's security engineers don't believe it's possible to reasonably do an absolute fingerprinting reduction, either. Thus Privacy Budget, our attempt to dynamically enforce a pay-as-you-go budget that, hopefully, will let us prevent attacks (like scanning the user's local fonts) without harming legitimate uses (like using a handful of local fonts to actually render text).

I think you should do more than dismiss Privacy Budget out-of-hand; it's a serious effort to actually solve fingerprinting across the entire web platform, not an attempt to deflect attention. The math is clear here: this isn't a problem that can be solved with band-aids, and even knowing if your efforts will achieve anything at all requires a serious analysis of the whole attack surface; standard defense-in-depth security intuitions don't apply, at least not with the current state of things.

So, as Chris Wilson said, without a formal model showing that this change is part of a combined effort that will achieve a useful result, Chrome will continue to be against it, and will instead pursue methods like I described to achieve useful fingerprinting reduction. Harming users and webdevs for what is currently just a fig-leaf is not something we're interested in.

hsivonen · 2019-09-28T09:22:48Z

If users are given choice, we can't protect users who use the ability to make choices from being fingerprinted on those choices.

I believe that's the problem the working groups (CSS WG + privacy WG + other related WGs like i18n, etc.) need to work together to figure out.

For clarity: I'm not suggesting taking away choice from users. That exercising choice makes a user fingerprintable is just a fact and there's nothing for WGs to work out about it. (I think most of this issue is probably not a standard-setting one but a browser product decision one.)

What I see as a problem is that what I believe to be substantial populations of users who don't need to exercise such choice, and could be protected, are not. It's not particularly nice to know that there are users who cannot be protected, but that's not a good reason not to protect the substantial user population who could be protected.

Consider the following types of users:

1. Users who never install fonts.
2. Users who install fonts unknowingly by installing apps that add fonts that are available system-wide.
3. Users who knowingly install fonts for non-browser use but don't change the default font prefs in their browser.
4. Users whose language is well-served by system fonts but who as a matter of individual taste still change their browser font prefs.
5. Users whose language is merely covered by system fonts and who as a matter of community norm install additional fonts for their language.
6. Users whose language isn't covered by system fonts and who have to install fonts for stuff to work at all.
7. Web developers who want to prototype with local non-system fonts before deploying with @font-face.

(The taxonomy is simplified: The most notable complication is the one seen upthread on Windows 10 with Chinese and Japanese: That there are fonts that are bundled with the system and that are conditionally enabled. For example, for someone in Japan, having the conditionally-enabled Japanese fonts enumerable probably isn't a substantial fingerprinting vector. For someone in Europe who has a Japanese IME in the text input menu, they are. For the purpose of the below paragraphs, I'm hand-waving conditionally-enabled system fonts into group 1.)

Users in group 1 need no protection mechanisms compared to status quo. Evidently users in group 4 can change browser prefs and could uncheck whatever "don't expose user-installed fonts to the Web" checkbox to opt out of protection.
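One possible policy along these lines, sketched under stated assumptions: expose preinstalled fonts, expose conditionally-enabled system fonts only for locales the browser already reveals (e.g. via its language settings), and let users who deliberately changed their font prefs flip a hypothetical `expose_user_installed` opt-out. None of these names come from a spec or browser.

```python
def visible_fonts(
    preinstalled: set[str],
    conditional_by_locale: dict[str, set[str]],
    ua_locales: list[str],
    user_installed: set[str],
    expose_user_installed: bool = False,
) -> set[str]:
    """Fonts a UA might expose to the web under one possible protection policy."""
    visible = set(preinstalled)
    # Conditionally-enabled system fonts are exposed only for locales the UA
    # already reveals, so they add little beyond browser & OS & language.
    for locale in ua_locales:
        visible |= conditional_by_locale.get(locale, set())
    # Opt-out pref for users who deliberately want their own fonts used.
    if expose_user_installed:
        visible |= user_installed
    return visible


print(sorted(visible_fonts({"Arial"}, {"ja": {"Yu Gothic"}}, ["ja"], {"Fira Code"})))
# ['Arial', 'Yu Gothic']
```

Under this sketch, users in groups 2 and 3 would look like group 1, while a Japanese-locale user still gets the conditionally-enabled Japanese fonts.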

Browsers cannot protect users in group 6 without developing all-encompassing font download mechanisms as part of the browser.

Groups 2 and 3 could be protected but aren't. As a user in group 3 (previously in group 4), I'm unhappy that I'm not made indistinguishable from group 1. It should be within technical feasibility to do so without breaking use cases for groups 5, 6, and 7, but the details do need careful thought. In particular, it would be good to know what language communities are in group 5 and with what details (e.g. in the context of particular operating systems only or out of habit despite operating system font repertoire having improved). Group 4, as noted, will manage.

pes10k · 2019-09-30T21:26:34Z

@tabatkins

Setting aside our different opinions on privacy budget, can we at least agree that since it hasn't even been fully defined (let alone proposed or standardized), it's not a useful solution to the problem discussed here?
Sure, understood that solving this issue won't solve fingerprinting, but the only way to get the # of identifying bits down to an acceptable number is to start removing bits anywhere and everywhere we can (where "can" means "w/o breaking the web").

No single change will get us to "win", but there are a lot of bits in fonts (see the papers linked above), so fixing font fingerprinting will be a necessary but not sufficient step to unbreaking things (where "breaking" means "allowing people to be tracked w/o their consent").

Several of us in this thread are working on finding ways of fixing the standard. It might be the case that things can't be fully un-broken, and that we can't undo the harm that's been done for everyone, but we can at least stop harming as many people as possible. For the reasons many have mentioned, it's difficult (but not impossible!). We would really appreciate your help :)

frivoal · 2019-10-01T01:11:33Z

but the only way to get the # of identifying bits down to an acceptable number is to start removing bits anywhere and everywhere we can (where "can" means "w/o breaking the web").

That's the trick though:

Can we eventually remove enough bits without breaking the web to reach a usefully low number? If yes, then we should start where we can. If no, it doesn't matter where we start if we already know we won't get to a useful level ever. As I understand it, @tabatkins's claim is that he (and Google) are skeptical that we can ever reach a usefully low number, and that they would therefore like to see some attempt at proving that it is possible (or at least plausible) that we can get there at all before we start removing things.
What do we count as "breaking the web"? How severely do we need to inconvenience people before it counts as breaking the web? How many people need to be impacted? What if it is a small absolute number that represents a high percentage of a particular demographic?

I am in particular concerned about making the web substantially worse or unusable for people who fall into @hsivonen's categories 5 and 6.

I'll add two more categories:
8. People who spend a substantial percentage of their income on data costs, and wish to reduce the weight of web pages by installing commonly used fonts to avoid them being downloaded over and over by @font-face. May overlap with 5.
9. People on Linux systems like Debian where everything including the base system is installed via a package manager, and there's no distinction between system fonts and user-installed fonts.

tabatkins · 2019-10-01T01:32:20Z

Setting aside our different opinions on privacy budget, can we at least agree that since it hasn't even been fully defined (let alone proposed or standardized), it's not a useful solution to the problem discussed here?

No, we can't, because Privacy Budget is intended by Chrome's security folks to be the way to solve the problem discussed here.

Put another way, I'll object to anything in this vein that attempts to standardize a Safari-like "spec mandates you must never expose more than this subset of local fonts" until my security folks tell me they've given up and that's the best way forward.

Sure, understood that solving this issue won't solve fingerprinting, but the only way to get the # of identifying bits down to an acceptable number is to start removing bits anywhere and everywhere we can (where "can" means "w/o breaking the web").

I addressed this in my preceding comments, and so has Florian. No one is saying "this one change won't fix fingerprinting, so let's not do it", so talking about that as an objection is a moot point.

@frivoal said:

As I understand it, @tabatkins's claim is that he (and Google) are skeptical that we can ever reach a usefully low number, and that they would therefore like to see some attempt at proving that it is possible (or at least plausible) that we can get there at all before we start removing things.

Yup, exactly. Again, this is not a situation where incremental progress is worthwhile; getting halfway to the goal has zero benefit. It needs to be shown that we can actually plausibly reach the goal before we start breaking things to move toward it.

What do we count as "breaking the web"? How severely do we need to inconvenience people before it counts as breaking the web? How many people need to be impacted? What if it is a small absolute number that represents a high percentage of a particular demographic?

Yup, and this is an important part of the analysis as well. Killing fingerprinting will bring benefits, but limiting a bunch of APIs will bring harms. Need to make sure the balance is worthwhile, but at the moment we have no idea what the set of harms we'll be looking at even are.

hsivonen · 2019-10-02T08:52:54Z

@frivoal

What do we count as "breaking the web"?

That's a very relevant question. For there to be any change at all (leaving aside for the moment which changes count as breakage), the Web author has to specify a font that's not a system font and that the user has installed (or the user-installed font has to be the only fallback).

These days the notion that for European languages a Web author would bet their design on the user having a particular non-system font installed as opposed to either being satisfied with known system fonts or providing a specific font via @font-face is ridiculous on its face, since the expected success rate would be very low.

For what language communities can a Web author generally expect users to have a nameable but non-system-bundled font installed (group 5 in my previous comment) such that it's not pretty much the only font available for the script (in which case there'd be no need to name it)?

As for the case where everyone in a language community has to install a font because their script isn't covered by system fonts at all (group 6 in my previous comment), the issue might be more solvable than it might seem. After my previous comment, I learned that macOS ships the long tail of Noto fonts but hides them from font pickers. It appears that Android ships the long tail of Noto, too. I gather Chrome OS does, too, but I didn't test. It doesn't seem unreasonable to give dead-script scholars a slightly worse out-of-the-box experience in order to protect the privacy of everyone else (i.e. require dead-script scholars to go to group 4).

The unhinted fonts in the long tail (the tail not already covered by e.g. Windows 10, Ubuntu, and Fedora out-of-the-box) of Noto are remarkably small in terms of file size. The hidden Noto folder on macOS Mojave takes a mere 6.3 megabytes.

Are there living scripts without any font out of the box on macOS, Android, and Chrome OS?

People who pay a substantial percentage of their income on data costs, and wish to reduce the weight of web pages by installing commonly used fonts to avoid them being downloaded over and over by @font-face. May overlap with 5.

Performing that optimization requires enough understanding of how browsers work that it doesn't seem unreasonable to expect users who perform that optimization to also have the knowhow to flip a pref to opt out of the privacy protection. (Whereas I'm worried that for group 5 it might be less practical to require flipping a pref.)

People on Linux systems like Debian, where everything, including the base system, is installed via a package manager, and there's no distinction between system fonts and user-installed fonts.

It seems unreasonable to give up on protecting users of more mainstream operating systems, including Linux distros, just because one can show that there exist Linux distros that leave so much configuration to the user that the notion of "default" isn't very meaningful.

Debian is an outlier on these matters even as far as Linux distros go. Consider a Japanese IME. Fedora installs one out of the box even if you don't install Fedora in Japanese. Ubuntu installs one when you enable Japanese text input, which I assume happens out of the box if you install Ubuntu in Japanese, though I haven't tested. OpenSuSE gives you a Japanese IME if you install OpenSuSE in Japanese (but I failed to figure out how to enable a Japanese IME from the GUI on an English OpenSuSE installation). Debian doesn't give you a Japanese IME even if you install Debian in Japanese and choose a Japanese keyboard layout during installation!

@tabatkins

No, we can't, because Privacy Budget is intended by Chrome's security folks to be the way to solve the problem discussed here.

Even if not intended for privacy reasons, do you consider this issue already incidentally solved on Chrome OS and Android by the combination of shipping Noto and it being hard for users to install fonts?

hsivonen · 2019-10-02T11:15:07Z

Are there living scripts without any font out of the box on macOS, Android, and Chrome OS?

Looks like despite all the recent activity to encode dead scripts in the Supplementary Multilingual Plane, there are still open proposals for living scripts to be encoded there, so there's still work to be done with living scripts.

frivoal · 2019-10-02T13:08:02Z

For what language communities can a Web author generally expect users to have a nameable but non-system-bundled font installed (group 5 in my previous comment) such that it's not pretty much the only font available for the script (in which case there'd be no need to name it)?

Japanese and Chinese (and maybe Korean? I'm not sure) come to mind, if we count fonts installed by Office as non-system fonts. Also if we count the Windows 10 not-installed-by-default fonts as non-system fonts. Pages will be readable without them, but they won't be pretty.

As for the case where everyone in a language community has to install a font because their script isn't covered by system fonts at all (group 6 in my previous comment)[...]

Are there living scripts without any font out of the box on macOS, Android, and Chrome OS?

Mongolian fails to display properly out of the box of macOS. Demo here: https://florian.rivoal.net/csswg/rare-lang/

I'm sure there's more. @r12a probably knows of some.

People who pay a substantial percentage of their income on data costs, and wish to reduce the weight of web pages by installing commonly used fonts to avoid them being downloaded over and over by @font-face. May overlap with 5.

Performing that optimization requires enough understanding of how browsers work that it doesn't seem unreasonable to expect users who perform that optimization to also have the knowhow to flip a pref to opt out of the privacy protection.

On an individual basis, I agree. But in a community where you set up your system the way your cousin or friend told you to, because that saves you tons of money even if you don't know why, there could be a substantial number of people doing it without understanding. Sure, if the steps to follow change, the cousin's or friend's message can change, but we'd still be leaving out in the cold the people for whom it used to work.

Granted, this is a hypothetical, and I don't have first hand knowledge of this situation existing. But I wouldn't be surprised if it did.

hsivonen · 2019-10-02T18:48:45Z

Japanese and Chinese (and maybe Korean? I'm not sure) come to mind, if we count fonts installed by Office as non-system fonts.

Yes, that came up upthread. It's an interesting case considering that in general it seems like an anti-feature that Web sites can tell what apps you have installed.

Are there others?

Also if we count the Windows 10 not-installed-by-default fonts as non-system fonts. Pages will be readable without them, but they won't be pretty.

How do the Windows 10 Chinese and Japanese font packs affect the role of Office fonts? (It seems that making a browser ignore the Windows 10 font packs that get installed as a side effect of enabling a language would be unlikely to be well received by users.)

Mongolian fails to display properly out of the box of macOS. Demo here: https://florian.rivoal.net/csswg/rare-lang/

macOS ships a working (Noto) font for the Mongolian script. It also ships broken fonts, and a broken one takes precedence.

I wonder to what extent issues with other scripts arise from bad default font selection order rather than lack of system-bundled fonts.

pes10k · 2019-10-03T22:50:19Z

Does anyone have a sense of how big the set of useful-to-have-local fonts is? Even if just rough estimate, are we talking 1, 2 or 3 digits of fonts?

hsivonen · 2019-10-04T13:15:52Z

Does anyone have a sense of how big the set of useful-to-have-local fonts is? Even if just rough estimate, are we talking 1, 2 or 3 digits of fonts?

Useful is relative. Before GUIs, Latin-script computing got by with one monospace font...

Currently, the full unhinted set of Noto fonts has 1605 fonts that take 1.5 GB of disk space uncompressed. It's useful to have even more: E.g. outside of Noto, Chrome OS ships fonts that are part of or compatible with the 1990s Microsoft core Web font set as well as a couple of common-on-Linux Indic fonts from the Lohit set (I have no idea why the particular ones and not all Lohit fonts), and an extra Hangul font.

For sure it would be useful to have even more. (To begin with, even two weights for the scripts that now only have one.)

But can some definition of useful be fewer?

Is it useful to have more than two weights? Yes, but how much does the Web rely on more than two weights being available locally? Probably not much and more weights are probably mostly a @font-face thing.

Is it useful to have condensed versions? Yes, but again that's probably more of a @font-face thing.

If you take all Noto fonts, keep up to two weights, remove the UI variants, remove Display variants, and remove condensed variants, you are left with 195 fonts that unhinted take 550 MB of disk space uncompressed.
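A minimal sketch of that filtering step, assuming Noto file names follow a `<Family>-<Style>.ttf` pattern such as `NotoSansArabicUI-Bold.ttf` or `NotoSerifDisplay-Regular.ttf`. The naming pattern, the style names, and the `keepFont` helper are assumptions for illustration. This is not the procedure that produced the 195 figure, and real Noto packaging may differ. It treats Regular and Bold as the two kept weights and also keeps their italic counterparts.

```typescript
// Hypothetical filter approximating "keep up to two weights, drop UI, Display and
// condensed variants". Assumes file names look like "<Family>-<Style>.ttf" or ".otf".
const KEPT_STYLES = new Set(["Regular", "Bold", "Italic", "BoldItalic"]);

function isExcludedFamily(family: string): boolean {
  // Assumed conventions: UI variants end in "UI" (e.g. NotoSansArabicUI);
  // Display and Condensed appear as parts of the family name.
  return /UI$/.test(family) || family.includes("Display") || family.includes("Condensed");
}

function keepFont(fileName: string): boolean {
  const base = fileName.replace(/\.(ttf|otf)$/i, "");
  const dash = base.lastIndexOf("-");
  if (dash <= 0) return false; // not in the assumed <Family>-<Style> form
  const family = base.slice(0, dash);
  const style = base.slice(dash + 1);
  // Styles such as "Light" or "CondensedBold" are not in KEPT_STYLES, so they are dropped.
  return KEPT_STYLES.has(style) && !isExcludedFamily(family);
}

const sample = [
  "NotoSansMongolian-Regular.ttf",
  "NotoSansArabicUI-Bold.ttf",
  "NotoSerifDisplay-Regular.ttf",
  "NotoSans-CondensedBold.ttf",
  "NotoSans-Light.ttf",
  "NotoSans-Bold.ttf",
];
console.log(sample.filter(keepFont).join(", "));
// NotoSansMongolian-Regular.ttf, NotoSans-Bold.ttf
```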

(The numbers 1605 and 195 don't really mean anything, since some fonts cater to multiple scripts and others only to one, and either way they vary a lot in size per font.)

But again, the long-tail delta that macOS needed to catch up with Noto coverage is just 6.3 MB.

AmeliaBR · 2019-10-04T20:27:34Z

how big the set of useful-to-have-local fonts is?

I think the conclusion we can make from this thread is that which fonts are useful really depends on user language & that if you created a union of all fonts that are essential to someone, you'd end up in the 100s — but most people wouldn't have most of those fonts installed. "Which fonts are useful to you" is itself a fingerprinting vector.

That said, I think it may be productive to discuss how many fonts are reasonable for a given website to use, and whether we can limit it from that side (as Tab suggested with the privacy budget). Then you might be able to cap it to a reasonable fingerprinting surface (<20, maybe?) while still addressing most use cases. I'd want to see some actual data, of course, on how many fonts per website are currently in use, trying to separate out the fingerprinting cases from the valid ones.
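A minimal sketch of such a per-site cap, assuming the browser can key lookups by site and knows which families are platform defaults. `LocalFontGate`, the default cap of 20 (borrowed from the "<20, maybe?" above), and the choice to count every distinct non-default family a site asks about, installed or not, are illustrative assumptions rather than a proposed spec. Counting misses as well as hits means a prober only gets `cap` questions per site; the sketch does nothing about the objections to budget-style approaches raised in the thread.

```typescript
// Hypothetical per-site cap on distinct non-default local font lookups.
class LocalFontGate {
  private readonly defaults: Set<string>;
  private readonly installed: Set<string>;
  private readonly cap: number;
  private readonly queriedBySite = new Map<string, Set<string>>();

  constructor(defaults: Iterable<string>, installed: Iterable<string>, cap = 20) {
    // CSS matches font family names case-insensitively, so normalize to lower case.
    this.defaults = new Set(Array.from(defaults, (f) => f.toLowerCase()));
    this.installed = new Set(Array.from(installed, (f) => f.toLowerCase()));
    this.cap = cap;
  }

  // Should `family` be treated as available for pages on `site`?
  isAvailable(site: string, family: string): boolean {
    const name = family.trim().toLowerCase();
    if (this.defaults.has(name)) return true; // platform defaults never consume the cap
    let queried = this.queriedBySite.get(site);
    if (queried === undefined) {
      queried = new Set<string>();
      this.queriedBySite.set(site, queried);
    }
    if (!queried.has(name)) {
      if (queried.size >= this.cap) return false; // over the cap: answer "not installed"
      queried.add(name); // misses count too, limiting probing to `cap` questions
    }
    return this.installed.has(name);
  }
}

const gate = new LocalFontGate(["Arial"], ["Arial", "Fira Code", "Jigmo"], 2);
console.log(gate.isAvailable("a.example", "Fira Code")); // true (1 of 2 used)
console.log(gate.isAvailable("a.example", "Comic Neue")); // false, not installed (2 of 2 used)
console.log(gate.isAvailable("a.example", "Jigmo")); // false, cap reached even though installed
```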

One valid use case that wants access to dozens of system fonts is a document editor app, but that use case would be much better served with a dedicated <input type="font"/> element (that only exposes any font to the website after it has been explicitly chosen by the user, similar to a file picker input) — trying to address that use case with current techniques is horribly hacky, anyway.

pes10k · 2019-10-04T22:34:34Z

@hsivonen understood, useful is relative :) but I'm just trying to get a sense of whether we can solve this problem by just creating a small (maybe per accept-lang) fixed set of fonts that can be served as local fonts. Maybe that's a way of making progress here (it would be similar to Apple's current approach, but a bit more flexible).

@AmeliaBR I'm extremely skeptical about privacy budget / dynamic approaches (see the list below, so that at least I've said it), but I'm not asking which fonts are used on the web; I'm asking which are really needed to prevent sites from breaking for people. Also, <20 local fonts will still be uniquely identifying for many people.

straw man proposal
Crawl a lot of the web, find all the non-system, non-web-font font faces referenced in static CSS, and allow the (font face, lang) tuples that account for {95, 99, …}% of the distribution.
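A minimal sketch of the selection step in that straw man, assuming the crawl has already produced a count of how often each (font family, lang) tuple appears in static CSS after system fonts and web fonts are excluded. The `FontUse` shape and `selectAllowlist` are hypothetical; how tuples are counted (per page, per site, weighted by traffic) is left open, as it is in the proposal. The sketch greedily keeps the most common tuples until the target share is covered.

```typescript
// One observed (font family, lang) tuple and how often the crawl saw it.
interface FontUse {
  family: string;
  lang: string;
  count: number;
}

// Keep the most frequent tuples until they cover `coverage` (e.g. 0.95) of all observations.
function selectAllowlist(uses: readonly FontUse[], coverage: number): FontUse[] {
  const total = uses.reduce((sum, u) => sum + u.count, 0);
  const sorted = [...uses].sort((a, b) => b.count - a.count);
  const allowed: FontUse[] = [];
  let covered = 0;
  for (const u of sorted) {
    if (total === 0 || covered / total >= coverage) break;
    allowed.push(u);
    covered += u.count;
  }
  return allowed;
}

// Made-up counts, for illustration only.
const observed: FontUse[] = [
  { family: "FontA", lang: "en", count: 50 },
  { family: "FontB", lang: "ja", count: 30 },
  { family: "FontC", lang: "en", count: 15 },
  { family: "FontD", lang: "mn", count: 5 },
];
console.log(selectAllowlist(observed, 0.9).map((u) => u.family).join(", ")); // FontA, FontB, FontC
```

Note that a greedy frequency cutoff is exactly what drops the long tail (here the `mn` tuple), which is the group the thread worries about most.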

Why dynamic approaches will not solve this problem, partial list (at least without answers to the following):

What is the lifetime of the "budget"? If it's short, you've done little to address FP. If it's long, you'll break a bunch of sites when the site ships new code / new fonts / etc.
What's the scope of the budget? Presumably third parties need to be covered by the first party's budget (or else the budget can be trivially circumvented), but then an embedded resource / ad / whatever can drain the parent's budget (!?), which seems clearly bad / likely to break stuff.
Similarly, eTLD+1 seems like a natural scope, but it means that, for example, content hosted on, say, GitHub can exhaust the budget for unrelated accounts? That seems like a non-starter.
You enable all sorts of privacy harms, since (depending on the answers to the above questions) I can likely learn a lot about your prior browsing behavior based on your remaining budget.
What happens when I do something budget-related after the budget has been exhausted? Do I need to surround every privacy-relevant Web API call with try / catch? etc.?
Developer hell of never being able to reason about how any code will work at a given time (e.g. the set of possible states for every piece of code that ships has now exploded exponentially).
It doesn't remove the need to reduce FP surface, since the budget has to stay shallow to be useful, and ordinary functionality still has to fit within it.
It means websites can't use, say, both canvas and Web Audio… (all sorts of implications of this sort).
Entropy measurements that inform dynamic approaches are the wrong way to measure the problem. E.g. the UA string has a lot of entropy in it, but if I'm in a big anonymity set, it doesn't matter.
Presumably when you reset the budget you also need to reset all storage, or else budget protections can be trivially circumvented too.
etc
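The entropy point can be made concrete with the standard information-theoretic definitions (these are not taken from the thread). For an attribute value $v$ shared by a fraction $p(v)$ of $N$ users, the information that value reveals, the attribute's average entropy, and the size of the anonymity set are:

```latex
\begin{aligned}
I(v) &= -\log_2 p(v) \\
H    &= -\sum_{v} p(v)\,\log_2 p(v) \\
k(v) &= N \, p(v)
\end{aligned}
```

With $N = 10^6$, a value shared by half of all users gives $I = 1$ bit and $k = 500{,}000$, while a value seen with probability $2^{-20}$ gives $I = 20$ bits and $k = 10^6 / 2^{20} \approx 0.95$, i.e. roughly one user. An attribute's average $H$ can be high while a particular user still sits in a large anonymity set, so what matters for that user is $k(v)$ for their own value, not the aggregate entropy.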

I'm happy to discuss these dynamic approaches, but if folks want to advocate for them as a general solution to fingerprinting protections, the CSS fonts issue queue probably isn't the best place. Similarly, if folks are suggesting privacy-budget-like approaches instead of static solutions to the currently-harming-millions-of-people problem of font fingerprinting, it would be useful to have answers to the questions above…

yoavweiss · 2019-10-05T09:38:49Z

(arriving late at this thread)

Seems to me that beyond the specific discussion around fonts, there's an underlying meta question: should specific mitigations be baked into specifications?

If we examine font fingerprinting as an example, it seems like this is a place where browsers' anti-fingerprinting efforts can develop extremely smart solutions to reduce the entropy exposed (e.g. block only "rare" fonts for some definition of rare, track scripts that enumerate them either offline or online - with or without a Privacy Budget, etc).

Baking in specific mitigations as MUST requirements seems like something that can stifle innovation that could benefit users on the privacy, performance and usability fronts.

The fact that even proponents of baking-in mitigations are not sure which mitigations we want to bake-in emphasizes that point.

The same is true for other specifications where the PING has tried to propose mitigations without any evidence as to why they'd actually help users effectively.

That makes me believe that we need a wider discussion between the PING, TAG, and WG chairs & members to get a better understanding of the trade-offs between well-defined, frozen-in-time, formal mitigations and UA-defined ones.

hsivonen · 2019-10-05T17:47:04Z

@AmeliaBR

if you created a union of all fonts that are essential to someone, you'd end up in the 100s — but most people wouldn't have most of those fonts installed

Does the part after the dash mean specific fonts by specific name or fonts that cover the use cases generally? That is, do you consider macOS, Android, and Chrome OS users (whose systems come with a use-case-wise remarkably wide set of fonts installed but not necessarily the fonts with Windows name recognition) to have "those fonts" installed?

@yoavweiss

e.g. block only "rare" fonts for some definition of rare

The rarest fonts are the most problematic to block: Fonts for languages whose needs aren't addressed by system-bundled fonts.

FWIW, as a user, I want relatively non-rare fonts to be blocked: Fonts that are common enough to be available via Google Fonts and Adobe Fonts or fonts that LibreOffice makes available system-wide on Windows. The combination of these may be distinctive and these are common enough that it makes sense for a tracker to try to probe for them.

pes10k · 2019-10-15T19:38:36Z

Here is another strawman proposal to try and move things forward:

What if the standard didn't put any limitations on what the page could access as local fonts, but required local fonts to be specifically, intentionally loaded into the browser, instead of defaulting to any and all fonts it could find.

That would seem to both solve the problem of some users wanting additional fonts and address the significant, in-the-wild problem of people being fingerprinted because of things they did on their system that were completely unrelated to their browser.

WDYT?
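A minimal sketch of that idea, assuming the browser keeps its own list of fonts the user has explicitly added (for example through a settings page) alongside the platform's default fonts. `BrowserFontRegistry` and its methods are hypothetical names. The only point it illustrates is that a font merely installed on the OS is never matched for a page unless it is a platform default or the user has opted it in through browser UI.

```typescript
// Hypothetical registry: pages can only see platform defaults and fonts the user
// explicitly added in the browser; other OS-installed fonts are ignored.
class BrowserFontRegistry {
  private readonly platformDefaults: Set<string>;
  private readonly osInstalled: Set<string>;
  private readonly userAdded = new Set<string>();

  constructor(platformDefaults: Iterable<string>, osInstalled: Iterable<string>) {
    this.platformDefaults = new Set(Array.from(platformDefaults, (f) => f.toLowerCase()));
    this.osInstalled = new Set(Array.from(osInstalled, (f) => f.toLowerCase()));
  }

  // Called from browser settings, never from page script.
  addUserFont(family: string): void {
    this.userAdded.add(family.toLowerCase());
  }

  removeUserFont(family: string): void {
    this.userAdded.delete(family.toLowerCase());
  }

  private isVisible(family: string): boolean {
    const name = family.toLowerCase();
    return this.osInstalled.has(name) && (this.platformDefaults.has(name) || this.userAdded.has(name));
  }

  // First family in a page's font-family list that the page is allowed to use, if any.
  resolveFamilyList(families: readonly string[]): string | undefined {
    return families.find((f) => this.isVisible(f));
  }
}

const registry = new BrowserFontRegistry(
  ["Arial", "Times New Roman"],
  ["Arial", "Times New Roman", "Jigmo", "Fira Code"],
);
console.log(registry.resolveFamilyList(["Jigmo", "Arial"])); // Arial (Jigmo is installed but not opted in)
registry.addUserFont("Jigmo");
console.log(registry.resolveFamilyList(["Jigmo", "Arial"])); // Jigmo
```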

r12a · 2019-10-29T17:40:20Z

macOS ships a working (Noto) font for the Mongolian script. It also ships broken fonts, and a broken one takes precedence.

As an aside, here are just a couple of quick thoughts off the top of my head about using Noto fonts that may be worth considering. This is not an exhaustive description of issues, but I wanted to make it clear that, especially for minority scripts, Noto fonts are not a panacea.

The Noto Mongolian font has some issues. It's much better for readability to use MS Mongolian Baiti, or one of the more recent fonts provided by Mongolian font vendors. (If you want the gory details, see https://r12a.github.io/mongolian-variants/)

Same goes for other scripts. For example, the initial Noto Javanese font was roundly rejected by the community as unreadable as well as aesthetically unattractive.

The Noto Sans Tai Tham font cannot be used to represent both Northern Thai and Tai Khün, since the glyph shapes needed for each language can be significantly different (although the code points are the same).

Noto fonts often don't offer the alternative variant glyphs that can be useful for certain languages; those tend to be provided by, for example, SIL fonts. In my own documents, for instance, I use smart features in a SIL font to maintain a particular glyph shape for 'a' in certain italicised text and in phonetic text.

Although Noto (unusually) provides both naskh and nasta'liq fonts for the Arabic script, it doesn't provide ruq'a or kano font styles for Arabic text. Nor does it provide the mool or slanted font styles for Khmer, which users might expect to be used in certain typographic situations.

Note that often minority (and some non-minority) languages use different fonts/font styles to distinguish certain types of text from others, whereas in the west we'd use italicisation, bolding or font size.

Noto fonts don't provide features aimed at assisting readability for beginner readers or the visually impaired, such as those offered by SIL's Andika.

And then, of course, there are some living languages cropping up in recent Unicode releases for which there are no Noto fonts available.

So Noto fonts may help to some extent, but there'll also be problems in relying on them, not least because international font usage is generally more complicated than we're used to for Latin-based text.

svgeesus · 2019-11-07T18:07:36Z

I have been carefully re-reading the INRIA paper.

It is notable that the study is restricted to 15 French websites (one weather page and one news page on each), so the comments by @hsivonen about European usage apply. In a French context, most users would be from France, Belgium, and then places with large francophone populations such as Morocco, Algeria, etc. So I would expect some font variability based on some users having more Arabic fonts, for example. The cultural and geographic homogeneity is confirmed in the paper: "97.7% of fingerprints present French as their first language" and "98% of users present the same value for timezone, which corresponds to Central European Time Zone UTC+01:00". On the plus side, the test subjects were normal web users, not, for example, subject specialists such as privacy or internationalization researchers.

In the study, font fingerprinting is the third-largest source of entropy on desktop/laptop machines, but the eighth-largest on mobile (phones, tablets, pads).

I note that they only probed for 66 fonts (each in serif, sans-serif and monospace, which seems odd), because (unlike the Flash situation, which instantly returns a complete font list) each font has to be probed for one by one, by rendering a text string and measuring its metrics. I would assume, then, that a site which tested many thousands of fonts would be awfully slow, unless it was very interesting and the content above the fold displayed quickly, allowing the tracking script to run while the user actually reads the content.
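A minimal sketch of the probing technique described above. The width measurement is injected as a `measure` callback so the logic can run outside a browser; in a page it could be backed by `CanvasRenderingContext2D.measureText(text).width` after setting `ctx.font` to the family list. Comparing each candidate against three generic fallbacks is one common way to avoid missing a font whose metrics happen to match a single fallback, which may explain why each font was probed three times. `detectInstalledFonts`, `MeasureWidth`, and the fake widths are illustrative assumptions, not the study's code or data.

```typescript
// Width of a fixed test string rendered with a CSS font-family list, e.g. '"Calibri", serif'.
type MeasureWidth = (fontFamilyList: string) => number;

const GENERIC_FALLBACKS = ["serif", "sans-serif", "monospace"];

function detectInstalledFonts(candidates: readonly string[], measure: MeasureWidth): string[] {
  // Baseline widths when only the generic fallback is used.
  const baseline = new Map<string, number>();
  for (const generic of GENERIC_FALLBACKS) baseline.set(generic, measure(generic));

  // An installed candidate replaces the fallback and usually changes the width;
  // a missing one falls back, so the width matches the baseline.
  return candidates.filter((font) =>
    GENERIC_FALLBACKS.some((generic) => measure(`"${font}", ${generic}`) !== baseline.get(generic)),
  );
}

// Fake measurement for illustration: pretends Calibri and Consolas are installed.
// Consolas matches the monospace baseline, so only the serif/sans-serif probes catch it.
const fakeWidths = new Map<string, number>([
  ["serif", 100], ["sans-serif", 110], ["monospace", 120], ["Calibri", 95], ["Consolas", 120],
]);
const fakeMeasure: MeasureWidth = (list) => {
  const families = list.split(",").map((f) => f.trim().replace(/"/g, ""));
  const used = families.find((f) => fakeWidths.has(f)) ?? families[families.length - 1];
  return fakeWidths.get(used) ?? 0;
};

console.log(detectInstalledFonts(["Calibri", "Consolas", "Wingdings"], fakeMeasure).join(", "));
// Calibri, Consolas
```

Each candidate costs up to three renders, which is consistent with the point that probing thousands of fonts would be slow.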

The set of fonts probed for (to their credit, they give the complete list) seems drawn from Windows for the most part, with some from macOS, without specific probing for Android or iOS. The fonts are primarily Latin fonts; no Arabic or Chinese fonts were probed for, for example, and the only Japanese ones are MS Gothic and MS PGothic.

Andale Mono, AppleGothic, Arial, Arial Black, Arial Hebrew, Arial MT, Arial Narrow, Arial Rounded MT Bold, Arial Unicode MS, Bitstream Vera Sans Mono, Book Antiqua, Bookman Old Style, Calibri, Cambria, Cambria Math, Century, Century Gothic, Century Schoolbook, Comic Sans, Comic Sans MS, Consolas, Courier, Courier New, Garamond, Geneva, Georgia, Helvetica, HelveticaNeue, Impact, Lucida Bright, Lucida Calligraphy, Lucida Console, Lucida Fax, LUCIDA GRANDE, Lucida Handwriting, Lucida Sans, Lucida Sans Typewriter, Lucida Sans Unicode, Microsoft Sans Serif, Monaco, Monotype Corsiva, MS Gothic, MS Outlook, MS PGothic, MS Reference Sans Serif, MS Sans Serif, MS Serif, MYRIAD, MYRIAD PRO, Palatino, Palatino Linotype, Segoe Print, Segoe Script, Segoe UI, Segoe UI Light, Segoe UI Semibold, Segoe UI Symbol, Tahoma, Times, Times New Roman, Times New Roman PS, Trebuchet MS, Verdana, Wingdings, Wingdings 2, Wingdings 3

xfq · 2024-11-04T05:51:34Z

There is one more point not mentioned above: some users want to use some Chinese/Japanese fonts (like Jigmo) with Han characters encoded in recent years, but the system fonts don't support them and the webfonts are too large, so they want to be able to load them locally. Fingerprint protections are important, but giving users the option to load local fonts is also important.

This issue is also somewhat related: w3c/typography#86 (comment)

pes10k mentioned this issue Aug 21, 2019

Font enumeration fingerprinting (w3c/ccswg-drafts#4055) w3cping/tracking-issues#3

Open

ewilligers added css-fonts-3 css-fonts-4 Current Work and removed css-fonts-3 labels Aug 27, 2019

svgeesus self-assigned this Oct 29, 2019

r12a added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Oct 29, 2019

r12a mentioned this issue Oct 29, 2019

Incorporate mitigations for font based fingerprinting w3c/i18n-activity#797

Open

pes10k mentioned this issue Nov 7, 2019

[css-fonts] limit local fonts to those selected by users in browser settings (or other browser chrome) #4497

Closed

plehegar added the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label Feb 10, 2020

astearns mentioned this issue Mar 17, 2020

Add the install-your-fonts proposal as an option w3cping/font-anti-fingerprinting#6

Merged

swickr mentioned this issue Mar 20, 2020

CR Request for ttml-imsc1.2 w3c/transitions#234

Closed

frivoal mentioned this issue Mar 31, 2020

[meta] [css-fonts] Criteria for generic font families #4910

Open

quasicomputational mentioned this issue Oct 4, 2020

[css-fonts] JS-free probing of Unicode support of fonts #5578

Closed

pes10k mentioned this issue Oct 12, 2020

Fingerprinting v3: Font Fingerprinting brave/brave-browser#816

Closed

yarusome mentioned this issue May 21, 2023

Warn about UAs possibly ignoring fonts for local() in @font-face mdn/content#26879

Closed

