[css-fonts] incorporate mitigations for font based fingerprinting #4055
If the above approach is appealing, I would be happy to submit a PR to the existing level 3 standard, as well as the level 4 proposal. |
Appreciated, but please restrict that change to just CSS Fonts 4 which is the focus of current implementation. Errata can be gathered for Fonts 3, but there is no intention to back port all of Fonts 4 to Fonts 3. Instead, Fonts 4 is gradually replacing Fonts 3. |
I see. Is there an expected timeline for Fonts 4? If it's a long way off, it might be valuable to push out a 3.1 for security and privacy purposes (e.g. not all of Fonts 4, but the things where the current spec is being leveraged to harm users)? |
All browsers are implementing both the Variable fonts and the Color Fonts parts of Fonts 4, plus smaller changes (like font-weight being a number in the range 1 to 999 rather than being a set of number-like tokens 100, 200 etc). So this is being used now. |
Okie dokie, sounds good. Would a strict, enumerated set of font faces that can act as system fonts be preferred, or would a broader phrase like "fonts provided by the platform by default, and not installed by the platform's user" suffice? |
I don't think we want to list the specific fonts in the spec. The general rule should be that the list of fonts shouldn't provide any more information than can be obtained by other means: e.g., by the combination of browser & OS & preferred language. (I don't know about Mac, but Windows has language specific fonts that are included in the OS but not installed by default unless the user actually uses that language in the OS.) I would also hope that there would be some exclusion/option for supporting a wider set of fonts for trusted sites. For the PR: Fonts Level 4 already has a section on Preinstalled Fonts vs User-Installed Fonts, which currently says:
So, the request here is to upgrade that "may" into a "should". This should probably affect Next step: work on an API for full access to all installed fonts as a list! (With an explicit permission prompt, of course, which would also allow those fonts to be used for rendering.) This is essential for document-editing web apps to replace their native versions. Some apps still use Flash just to get this data. |
Hi @AmeliaBR Thanks for the comments. A couple of comments:
I think must (i.e. "User Agents Re: Re: permissions: I don't have a strong sense about this (other than that permissions discussions often round down to "users don't like permissions, so just grant access by default". As long as things don't wind up there!). But for the use case you mentioned, maybe a better norm to push for would be a service worker + site-hosted fonts? |
I don't think a must is viable here without a better solution for addressing language support. Many languages aren't supported in the default fonts installed on a given operating system. In many cases users can then install fonts that support more languages by choosing to install support for those languages. Presumably the requirement being proposed here would allow web use of all of the default fonts for all languages -- which in turn still exposes a good bit of fingerprinting data (which languages the user has installed fonts for) -- but I think there are still significant languages that those defaults don't cover (with significant variation between operating systems). (It also wouldn't surprise me if the fonts installed on Android devices vary based on carrier/market and aren't consistent within a language, though I'd be happy to be wrong.) So there's a tradeoff here between one of many active fingerprinting vectors and support for significant numbers of the world's languages. Without clear data that fixing just a part of this active fingerprinting vector (still allowing fingerprinting of which languages are supported by fonts on the system) would make a real dent in ability to do active fingerprinting on the web (which is much easier than passive fingerprinting) -- data that would probably require a project to gather a list of fingerprinting vectors available on the web (with entropy for each item) -- I don't think there's a very clear case for degrading the support for many minority languages on the Web. |
Oh, I guess I should respond to 3, actually: The Privacy IG isn't the right forum for making tradeoffs between Privacy and other issues; it's going to have an obvious bias. You'd probably get a very different result on a privacy vs. internationalization tradeoff in the Internationalization WG. |
Safari, too, has different fonts for different internationalizations. My strategy when implementing this fingerprinting mitigation in Safari wasn't to treat every user the same; that would have made many of our users' lives worse. Instead, my goal was to limit the number of equivalence classes a user could fall into. Before the mitigation, a user could be in a class of one, thereby being uniquely identified. After the mitigation, there are still multiple equivalence classes, but there are only a handful. Each equivalence class has many, many users, thereby significantly reducing the number of bits of entropy. |
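The equivalence-class argument can be made concrete with a small calculation. The sketch below is illustrative only: the population size and the class sizes are made-up numbers, not Safari data, and `shannon_entropy_bits` is a hypothetical helper name.

```python
import math


def shannon_entropy_bits(class_sizes):
    """Shannon entropy, in bits, of the distribution of users over classes."""
    total = sum(class_sizes)
    return -sum((n / total) * math.log2(n / total) for n in class_sizes if n > 0)


# Hypothetical population of 1,000,000 users (an assumption for illustration).
# Before mitigation: every user is in an equivalence class of one.
every_user_unique = [1] * 1_000_000
# After mitigation: a handful of large equivalence classes.
handful_of_classes = [400_000, 300_000, 200_000, 100_000]

bits_before = shannon_entropy_bits(every_user_unique)
bits_after = shannon_entropy_bits(handful_of_classes)

print(f"before: {bits_before:.2f} bits, after: {bits_after:.2f} bits")
```

With these invented numbers the font list drops from roughly 20 bits of identifying information to under 2 bits, even though users are not all treated identically.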
I would formally object to such an API. It explicitly undoes all the font-based privacy mitigations we've done. Users don't want to see more dialog boxes, and trying to explain the privacy implications of using fonts to a user is difficult. If a website wants to use fancy fonts, it can serve them as web fonts. |
Based on @litherum comments, my two cents here is that we should instead do the following:
|
David, I think I have the opposite expectations about what approach we should take. I believe that the only practical way to address passive fingerprinting is standard-by-standard and implementation-by-implementation doing the in-the-weeds work to ensure that passive fingerprinting surface isn't exposed. But I definitely agree with you about the need for broad consensus. Good news, though: that one's already taken care of! People almost-universally agree that they don't want to be silently tracked across the web. It's not just consensus, it's basically unanimous. Now it's our job to implement that for everyone. |
@dbaron wrote:
Similarly, people coming from a performance optimization perspective (which has substantial, real-world implications especially for those with slow network connections or with pricey, metered bandwidth) would take a different perspective if told "there is no need to download this font since you have it already, but we are going to forbid the browser to say so, and thus force you to download it every time, to enhance your privacy". |
There is not consensus that active fingerprinting is solvable to the point that there won't still be large numbers of unique users; I've seen a number of Chrome implementors and tech leads take the position that it is not in discussions on fingerprinting, including in this working group and elsewhere. I'm not convinced either way as to whether it's solvable because I haven't seen anybody put together the data (an up-to-date list of fingerprinting vectors, with data on them and proposed mitigations) that would let me make that judgment. (edit: fixed typo where I wrote passive when I meant active) |
@dbaron not sure what the suggestion is here. Freeze progress on CSS Font v4 until an "up-to-date list of fingerprinting vectors, with data on them and proposed mitigations" is built? It definitely does not seem user-serving to say "we know there is a problem, we know it's significant, but haven't had others propose mitigations for them, so we're going to ship the problem anyway". Seems way better to fix a problem that we know exists now, and is harming users today. This isn't hypothetical; the current CSS Font v3 spec enables users to be tracked w/o their consent. As stated before, there are many, many research papers showing this is a problem, as well as many deployed examples in the wild. It is not the case that these papers find no problem in the absence of Flash; the findings are either "not having Flash degrades identifiability some, but it's still identifying" or "we measured w/o Flash, and find it's highly identifying." It's also apparently serious enough that FF and Safari have deployed mitigations. |
@litherum can you say more about Safari's algorithm? How different is it from anonymity sets of "browser & OS & preferred language"? I'm not married to the specific mitigation in the issue text, as long as the standard includes a fix for the problem. Maybe Safari's approach is the way to go! |
In terms of the efficacy of font fingerprinting / entropy it exposes, this paper from INRIA is fairly interesting/helpful. They conducted a real-world study of fingerprinting, including JavaScript-based font probing, and found that fonts were one of the top contributors to fingerprintability. |
One additional thought: a concern has been raised about the performance impact of downloading fonts; there's an interesting performance impact of JS-based font fingerprinting -- it takes time/resources for a fingerprinting script to iterate through fonts to determine what a user has installed (for example, fingerprintjs2 has searching for an extended list of fonts as a default-off option because of the performance impact of doing so). |
@jasonanovak I don't think you can fairly compare performance impacts from malicious pages with performance impacts on normal usage. Blocking one fingerprinting script might just provoke the spyware to use another fingerprinting method with even worse performance impacts. If the primary concern was the performance impact of the current methods for figuring out which fonts a user has, the solution would be to create a proper API for doing so. |
Agreed — I'm not a fan of a calculus which concludes that fixing a common fingerprinting method has a performance cost on the basis that sites might decide to use a less-performant fingerprinting method instead. |
The performance cost of the fix is that people would end up downloading web fonts that they don't actually need (because they already have the font installed on their system). E.g., I have most common Google Fonts installed, and one of the reasons I did that was to cut down on web font downloads. If we prevent browsers from using those custom installed fonts, there will be a performance cost to me (more data usage and slower page loading) when visiting sites that use these fonts. How many people this will affect, and to what degree, I can't say. Some browsers give users the option to turn off web font downloads altogether, which would negate the performance impact but increase the impact on user experience. E.g., turning off web fonts might not be a good solution for people whose pre-installed system fonts don't offer a lot of choice for the languages/scripts they use. The performance impact of malicious scripts is a separate issue altogether. I was using the example of switching fingerprint methods to emphasize that we can't expect that fixing the fingerprinting vector will have a net performance benefit on malicious sites. Malicious sites generally don't care about user data plans. |
if there is a plan to introduce a |
I assume you read the explainer I linked? Note that this is also something we're actively working on and developing; it's far from complete so far.
A single unique font probably does, yeah. How do you expect a website to find that single unique font that the user has? If it's highly identifying, that means only a small number of people have it. So either the website is only targeting those handful of people and is thus testing only for that font (interesting case...) or they're testing lots of "unique" fonts to see which small bucket the user falls in. The latter is exactly what the Privacy Budget approach is intended to detect - spamming hundreds or thousands of local font requests looking for the one that highly identifies the user.
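As a rough illustration of the pay-as-you-go idea described here, the sketch below caps how many distinct local fonts a single origin may look up. It is not Chrome's Privacy Budget design; the class name `LocalFontBudget`, the per-origin accounting, and the default cap of 20 are all assumptions made for illustration.

```python
class LocalFontBudget:
    """Hypothetical per-origin cap on distinct local font lookups."""

    def __init__(self, max_distinct_fonts=20):  # cap value is an assumption
        self.max_distinct_fonts = max_distinct_fonts
        self._requested = {}  # origin -> set of font family names already looked up

    def allow_lookup(self, origin, family):
        """Return True if the origin may learn whether `family` is installed."""
        seen = self._requested.setdefault(origin, set())
        key = family.casefold()
        if key in seen:
            return True  # repeating a lookup reveals nothing new
        if len(seen) >= self.max_distinct_fonts:
            return False  # budget spent: behave as if the font is not installed
        seen.add(key)
        return True


budget = LocalFontBudget(max_distinct_fonts=3)
origin = "https://example.test"  # placeholder origin
results = [budget.allow_lookup(origin, f"Font {i}") for i in range(5)]
print(results)
```

A page that renders text with a handful of local fonts stays under the cap, while a script that probes hundreds of names is cut off after the first few.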
And as others have argued in this thread, users will be harmed by the suggestion to restrict local font access to solely system fonts. (And aren't currently harmed by Safari's actions due to the differences in user demographics between browsers.) We need to think about the balance of benefits, harms, and costs of mitigating those harms. As I argued at TPAC, and Chris Wilson and others at Google argued in their response to PING's charter discussion, the web is chock full of data that can be used for fingerprinting. Any attempt to reduce that, particularly any attempt with significant user-harmful side effects, needs to show that it'll actually reduce the fingerprinting surface to a usefully low level; going from 400 bits to 40 bits of identifying information achieves precisely nothing, since you only need 33 bits to uniquely identify every person on Earth. (And you really want to allow less than 20 bits, to ensure that people are "bucketed" together with at least several thousand others.) If the PING can show that the sum of their suggested mitigations will reduce fingerprinting surface to 20 bits or less, or at least that there's a believable path to getting under that limit, and that performing all of those mitigations will not harm the web to such an extent that the attack surface just moves elsewhere (such as sites moving to native apps...), then great! That would be an ideal solution, because reducing information wholesale is typically far easier than trying to be clever! So far, the PING hasn't attempted to show that it's possible to do that. And so far, Chrome's security engineers don't believe it's possible to reasonably do an absolute fingerprinting reduction, either. Thus Privacy Budget, our attempt to dynamically enforce a pay-as-you-go budget that, hopefully, will let us prevent attacks (like scanning the user's local fonts) without harming legitimate uses (like using a handful of local fonts to actually render text).
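The bit counts in this argument follow from simple arithmetic, sketched below. The world population figure is an assumption (a rough value for the time of the thread), and `expected_bucket_size` is a hypothetical helper that assumes identifying bits split the population evenly.

```python
import math

WORLD_POPULATION = 7_700_000_000  # assumption: rough figure, not from the thread

# Smallest whole number of bits that can give every person a distinct value.
bits_for_everyone = math.ceil(math.log2(WORLD_POPULATION))


def expected_bucket_size(population, bits):
    """Average number of people sharing one fingerprint, assuming an even split."""
    return population / 2 ** bits


print(bits_for_everyone)                                   # bits needed for uniqueness
print(round(expected_bucket_size(WORLD_POPULATION, 20)))   # people per bucket at 20 bits
print(expected_bucket_size(WORLD_POPULATION, 40))          # far below one person at 40 bits
```

Under these assumptions 33 bits suffice to single out anyone, 20 bits leave buckets of several thousand people, and 40 bits leave far less than one person per bucket, which is why a reduction from 400 to 40 bits changes nothing.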
I think you should do more than dismiss Privacy Budget out-of-hand; it's a serious effort to actually solve fingerprinting across the entire web platform, not an attempt to deflect attention. The math is clear here: this isn't a problem that can be solved with band-aids, and even knowing if your efforts will achieve anything at all requires a serious analysis of the whole attack surface; standard defense-in-depth security intuitions don't apply, at least not with the current state of things. So, as Chris Wilson said, without a formal model showing that this change is part of a combined effort that will achieve a useful result, Chrome will continue to be against it, and will instead pursue methods like I described to achieve useful fingerprinting reduction. Harming users and webdevs for what is currently just a fig-leaf is not something we're interested in. |
For clarity: I'm not suggesting taking away choice from users. That exercising choice makes a user fingerprintable is just a fact and there's nothing for WGs to work out about it. (I think most of this issue is probably not a standard-setting one but a browser product decision one.) What I see as a problem is that what I believe to be substantial populations of users who don't need to exercise such choice and could be protected are not. It's not particularly nice to know that there are users who cannot be protected, but that's not a good reason not to protect the substantial user population who could be protected. Consider the following types of users:
(The taxonomy is simplified: The most notable complication is the one seen upthread on Windows 10 with Chinese and Japanese: That there are fonts that are bundled with the system and that are conditionally enabled. For example, for someone in Japan, having the conditionally-enabled Japanese fonts enumerable probably isn't a substantial fingerprinting vector. For someone in Europe who has a Japanese IME in the text input menu, they are. For the purpose of the below paragraphs, I'm hand-waving conditionally-enabled system fonts into group 1.) Users in group 1 need no protection mechanisms compared to status quo. Evidently users in group 4 can change browser prefs and could uncheck whatever "don't expose user-installed fonts to the Web" checkbox to opt out of protection. Browsers cannot protect users in group 6 without developing all-encompassing font download mechanisms as part of the browser. Groups 2 and 3 could be protected but aren't. As a user in group 3 (previously in group 4), I'm unhappy that I'm not made indistinguishable from group 1. It should be within technical feasibility to do so without breaking use cases for groups 5, 6, and 7, but the details do need careful thought. In particular, it would be good to know what language communities are in group 5 and with what details (e.g. in the context of particular operating systems only or out of habit despite operating system font repertoire having improved). Group 4, as noted, will manage. |
No single change will get us to "win", but there are a lot of bits in fonts (see above linked papers), and so fixing font fingerprinting will be a necessary _but not sufficient_ step to unbreaking things (where breaking means "allowing people to be tracked w/o their consent"). Several of us in this thread are working on finding ways of fixing the standard. It might be the case that things can't be fully un-broken, and we can't undo the harm that's been done for everyone, but we can at least stop harming as many people as possible. For the reasons many have mentioned, it's difficult (but not impossible!). We would really appreciate your help :) |
That's the trick though:
|
No, we can't, because Privacy Budget is intended by Chrome's security folks to be the way to solve the problem discussed here. Put another way, I'll object to anything in this vein that attempts to standardize a Safari-like "spec mandates you must never expose more than this subset of local fonts" until my security folks tell me they've given up and that's the best way forward.
I addressed this in my preceding comments, and so has Florian. No one is saying "this one change won't fix fingerprinting, so let's not do it", so talking about that as an objection is a moot point. @frivoal said:
Yup, exactly. Again, this is not a situation where incremental progress is worthwhile; getting halfway to the goal has zero benefit. It needs to be shown that we can actually plausibly reach the goal before we start breaking things to move toward it.
Yup, and this is an important part of the analysis as well. Killing fingerprinting will bring benefits, but limiting a bunch of APIs will bring harms. Need to make sure the balance is worthwhile, but at the moment we have no idea what the set of harms we'll be looking at even are. |
That's a very relevant question. For there to be even a change (leaving aside for the moment which changes are breakage), the Web author has to specify a font that's not a system font and that the user has installed (or the user-installed font has to be the only fallback). These days the notion that for European languages a Web author would bet their design on the user having a particular non-system font installed as opposed to either being satisfied with known system fonts or providing a specific font via For what language communities can a Web author generally expect users to have a nameable but non-system-bundled font installed (group 5 in my previous comment) such that it's not pretty much the only font available for the script (in which case there'd be no need to name it)? As for the case where everyone in a language community has to install a font because their script isn't covered by system fonts at all (group 6 in my previous comment), the issue might be more solvable than it might seem. After my previous comment, I learned that macOS ships the long tail of Noto fonts but hides them from font pickers. It appears that Android ships the long tail of Noto, too. I gather Chrome OS does, too, but I didn't test. It doesn't seem unreasonable to give dead-script scholars a slightly worse out-of-the-box experience in order to protect the privacy of everyone else (i.e. require dead-script scholars to go to group 4). The unhinted fonts in the long tail (the tail not already covered by e.g. Windows 10, Ubuntu, and Fedora out-of-the-box) of Noto are remarkably small in terms of file size. The hidden Noto folder on macOS Mojave takes a mere 6.3 megabytes. Are there living scripts without any font out of the box on macOS, Android, and Chrome OS?
Performing that optimization requires enough understanding of how browsers work that it doesn't seem unreasonable to expect users who perform that optimization to also have the knowhow to flip a pref to opt out of the privacy protection. (Whereas I'm worried that for group 5 it might be less practical to require flipping a pref.)
Seems unreasonable to give up in the case of users of more mainstream operating systems, including Linux distros, just because one can show that there exist Linux distros that leave so much configuration to the user that the notion of "default" isn't very meaningful. Debian is an outlier on these matters even as far as Linux distros go. Consider a Japanese IME. Fedora installs one out of the box even if you don't install Fedora in Japanese. Ubuntu installs one when you enable Japanese text input, which I assume happens out of the box if you install Ubuntu in Japanese, though I haven't tested. OpenSuSE gives you a Japanese IME if you install OpenSuSE in Japanese (but I failed to figure out how to enable a Japanese IME from the GUI on an English OpenSuSE installation). Debian doesn't give you a Japanese IME even if you install Debian in Japanese and choose a Japanese keyboard layout during installation!
Even if not intended for privacy reasons, do you consider this issue already incidentally solved on Chrome OS and Android by the combination of shipping Noto and it being hard for users to install fonts? |
Looks like despite all the recent activity to encode dead scripts in the Supplementary Multilingual Plane, there are still open proposals for living scripts to be encoded there, so there's still work to be done with living scripts. |
Japanese and Chinese (and maybe Korean? I'm not sure) come to mind, if we count fonts installed by Office as non-system fonts. Also if we count the Windows 10 not-installed-by-default fonts as non-system fonts. They'll be readable without that, but they won't be pretty.
Mongolian fails to display properly out of the box on macOS. Demo here: https://florian.rivoal.net/csswg/rare-lang/ I'm sure there's more. @r12a probably knows of some.
On an individual basis, I agree. In a community where you set up your system like your cousin/friend told you, because then you save tons of money, even if you don't know why, there could be a substantial number of people doing it without understanding. Sure, if the steps to follow change, the cousin/friend's message can change, but we're still leaving out in the cold people for whom it used to work. Granted, this is a hypothetical, and I don't have first-hand knowledge of this situation existing. But I wouldn't be surprised if it did. |
Yes, that came up upthread. It's an interesting case considering that in general it seems like an anti-feature that Web sites can tell what apps you have installed. Are there others?
How do the Windows 10 Chinese and Japanese font packs affect the role of Office fonts? (It seems that making a browser ignore the Windows 10 font packs that install as a side effect of enabling a language would be unlikely to be well received by users.)
macOS ships a working (Noto) font for the Mongolian script. It also ships broken fonts, and a broken one takes precedence. I wonder to what extent issues with other scripts arise from bad default font selection order rather than lack of system-bundled fonts. |
Does anyone have a sense of how big the set of useful-to-have-local fonts is? Even as just a rough estimate, are we talking 1, 2 or 3 digits of fonts? |
Useful is relative. Before GUIs, Latin-script computing got by with one monospace font... Currently, the full unhinted set of Noto fonts has 1605 fonts that take 1.5 GB of disk space uncompressed. It's useful to have even more: E.g. outside of Noto, Chrome OS ships fonts that are part of or compatible with the 1990s Microsoft core Web font set as well as a couple of common-on-Linux Indic fonts from the Lohit set (I have no idea why the particular ones and not all Lohit fonts), and an extra Hangul font. For sure it would be useful to have even more. (To begin with, even two weights for the scripts that now only have one.) But can some definition of useful be fewer? Is it useful to have more than two weights? Yes, but how much does the Web rely on more than two weights being available locally? Probably not much and more weights are probably mostly a Is it useful to have condensed versions? Yes, but again that's probably more of a If you take all Noto fonts, keep up to two weights, remove the UI variants, remove Display variants, and remove condensed variants, you are left with 195 fonts that unhinted take 550 MB of disk space uncompressed. (The numbers 1605 and 195 don't really mean anything, since some fonts cater to multiple scripts and others only to one, and either way they vary a lot in size per font.) But again, the long-tail delta that macOS needed to catch up with Noto coverage is just 6.3 MB. |
I think the conclusion we can make from this thread is that which fonts are useful really depends on user language & that if you created a union of all fonts that are essential to someone, you'd end up in the 100s, but most people wouldn't have most of those fonts installed. "Which fonts are useful to you" is itself a fingerprinting vector. That said, I think it may be productive to discuss how many fonts are reasonable for a given website to use, and whether we can limit it from that side (as Tab suggested with the privacy budget). Then, you might be able to cap it to a reasonable fingerprinting surface (<20, maybe?) while still addressing most use cases. I'd want to see some actual data, of course, on how many fonts per website are currently in use, trying to separate out the fingerprinting cases from the valid ones. One valid use case that wants access to dozens of system fonts is a document editor app, but that use case would be much better served with a dedicated |
@hsivonen understood, useful is relative :) but I'm just trying to get a sense of whether we can solve this problem by just creating a small (maybe, by accept-lang) fixed set of fonts that can be served as local fonts. Maybe that's a way of making progress here (it would be similar to Apple's current approach, but a bit more flexible). @AmeliaBR I'm extremely skeptical about privacy budget / dynamic approaches (see list below, so that at least I've said it), but I'm not asking which fonts are used on the web, I'm asking which are really needed to prevent sites from breaking for people. Also, <20 local fonts will still be uniquely identifying for many people. straw man proposal Why dynamic approaches will not solve this problem, partial list (at least w/o answers to the following)
I'm happy to discuss these dynamic approaches, but if folks want to advocate for it as a general solution to fingerprinting protections, the CSS font issue queue probably isn't the best place. Similarly, if folks are suggesting privacy-budget-like approaches instead of static solutions to the currently-harming-millions-of-people problem of font fingerprinting, it would be useful to have questions to the above… |
(arriving late at this thread) Seems to me that beyond the specific discussion around fonts, there's an underlying meta question: should specific mitigations be baked into specifications? If we examine font fingerprinting as an example, it seems like this is a place where browsers' anti-fingerprinting efforts can develop extremely smart solutions to reduce the entropy exposed (e.g. block only "rare" fonts for some definition of rare, track scripts that enumerate them either offline or online - with or without a Privacy Budget, etc). Baking in specific mitigations as MUST requirements seems like something that can stifle innovation that can benefit users on the privacy, performance and usability fronts. The fact that even proponents of baking in mitigations are not sure which mitigations we want to bake in emphasizes that point. The same is true for other specifications where the PING has tried to propose mitigations without any evidence as to why they'd actually help users effectively. That makes me believe that we need a wider discussion between the PING, TAG, and WG chairs & members to get a better understanding of the trade-offs between well-defined, frozen-in-time, formal mitigations and UA-defined ones. |
Does the part after the dash mean specific fonts by specific name or fonts that cover the use cases generally? That is, do you consider macOS, Android, and Chrome OS users (whose systems come with a use-case-wise remarkably wide set of fonts installed but not necessarily the fonts with Windows name recognition) to have "those fonts" installed?
The rarest fonts are the most problematic to block: Fonts for languages whose needs aren't addressed by system-bundled fonts. FWIW, as a user, I want relatively non-rare fonts to be blocked: Fonts that are common enough to be available via Google Fonts and Adobe Fonts or fonts that LibreOffice makes available system-wide on Windows. The combination of these may be distinctive and these are common enough that it makes sense for a tracker to try to probe for them. |
Here is another strawman proposal to try and move things forward: What if the standard didn't put any limitations on what the page could access as local fonts, but required local fonts to be specifically, intentionally loaded into the browser, instead of defaulting to any and all fonts it could find? That would seem to both solve the problem of some users wanting additional fonts, and address the significant, in-the-wild problem of people being fingerprinted because of things they did on their system completely unrelated to their browser. WDYT? |
Parenthetically, just a couple of quick thoughts off the top of my head about using Noto Fonts that may be worth considering. This is not an exhaustive description of issues, but I wanted to make it clear that, especially for minority scripts, Noto fonts are not a panacea. The Noto Mongolian font has some issues. It's much better for readability to use MS Mongolian Baiti, or one of the more recent fonts provided by Mongolian font vendors. (If you want the gory details, see https://r12a.github.io/mongolian-variants/) Same goes for other scripts. For example, the initial Noto Javanese font was roundly rejected by the community as unreadable as well as aesthetically unattractive. Noto Sans Tai Tham font cannot be used for representing both Northern Thai and Tai Khün languages, since the glyph shapes needed for each language can be significantly different (although the code points are the same). The Noto fonts don't often offer the alternative variant glyphs that can be useful for certain languages, and tend to be provided, for example, by SIL fonts. For example, in my documents I use smart features in a SIL font to maintain a particular glyph shape for 'a' in certain italicised text and for phonetic text. Although Noto (unusually) provides both naskh and nasta'liq fonts for Arabic script, they don't provide ruq'a or kano font styles for Arabic text. They also don't provide the mool or slanted font styles for Khmer which users might expect to be used for certain typographic situations. Note that often minority (and some non-minority) languages use different fonts/font styles to distinguish certain types of text from others, whereas in the west we'd use italicisation, bolding or font size. Noto fonts don't provide features aimed at assisting readability for beginner readers or the visually impaired, such as SIL's Andika.
And then, of course, there are some living languages cropping up in recent Unicode releases for which there are no Noto fonts available. So Noto fonts may help to some extent, but there'll also be problems in relying on them, not least because international font usage is generally more complicated than we're used to for Latin-based text. |
I have been carefully re-reading the INRIA paper. It is notable that the study is restricted to 15 French websites (one weather page and one news page on each), so the comments by @hsivonen about European usage apply. In a French context, most users would be from France, Belgium, and then places with large francophone populations such as Morocco, Algeria, etc. So I would expect some font variability based on some users having more Arabic fonts, for example.
The cultural and geographic homogeneity is confirmed in the paper: "97.7% of fingerprints present French as their first language" and "98% of users present the same value for timezone, which corresponds to Central European Time Zone UTC+01:00". On the plus side, the test subjects were normal web users and not, for example, subject specialists such as privacy or internationalization researchers.
In the study, font fingerprinting is the third-largest source of entropy on desktop/laptop machines, but the eighth-largest on mobile (phones, tablets, pads). I note that they only probed for 66 fonts (each in serif, sans-serif and monospace, which seems odd), because (unlike the Flash situation, which instantly returns a complete font list) each font has to be probed for one by one, rendering a text string and measuring metrics. I would assume, then, that a site which tested many thousands of fonts would be awfully slow? Unless it was very interesting, and the content above the fold displayed quickly, allowing the tracking script to run while the user actually reads the content.
The set of fonts probed for (to their credit, they give the complete list) seems drawn from Windows for the most part, with some from MacOS, without specific probings for Android or iOS. The fonts are primarily Latin fonts; no specifically Arabic, Chinese, or Japanese fonts were probed for, for example.
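To make the one-by-one probing cost concrete, here is a minimal sketch of the logic as I understand it, not the paper's actual script. The browser step (render a test string, read its width) is replaced by a hypothetical `measure` callable, and `make_fake_measure` with its widths is an invented stand-in so the sketch runs outside a browser.

```python
from typing import Callable, Iterable

GENERIC_FALLBACKS = ("serif", "sans-serif", "monospace")


def probe_fonts(
    candidates: Iterable[str],
    measure: Callable[[str], float],
) -> tuple[list[str], int]:
    """Return (fonts that appear installed, number of measurements taken)."""
    calls = 0
    # Baseline: width of the test string in each generic family alone.
    baseline = {}
    for generic in GENERIC_FALLBACKS:
        baseline[generic] = measure(generic)
        calls += 1

    detected = []
    for font in candidates:
        found = False
        for generic in GENERIC_FALLBACKS:
            # If the candidate is missing, the stack falls back to the
            # generic family and the width equals the baseline.
            width = measure(f'"{font}", {generic}')
            calls += 1
            if width != baseline[generic]:
                found = True
        if found:
            detected.append(font)
    return detected, calls


def make_fake_measure(installed: set[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for browser text measurement (made-up widths)."""
    generic_widths = {"serif": 200.0, "sans-serif": 210.0, "monospace": 240.0}

    def measure(font_stack: str) -> float:
        families = [part.strip().strip('"') for part in font_stack.split(",")]
        if families[0] in installed:
            return 123.0  # any width that differs from the generic ones
        return generic_widths[families[-1]]

    return measure
```

Under these assumptions, probing 66 candidates against three generic families costs 3 + 66 × 3 = 201 measurements, and the cost grows linearly with the size of the candidate list.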
|
There is one more point not mentioned above: some users want to use some Chinese/Japanese fonts (like Jigmo) with Han characters encoded in recent years, but the system fonts don't support them and the webfonts are too large, so they want to be able to load them locally. Fingerprint protections are important, but giving users the option to load local fonts is also important. This issue is also somewhat related: w3c/typography#86 (comment) |
Font-based fingerprinting is a common, privacy-violating pattern, where websites build semi-identifiers based on uncommon fonts a user has installed. This semi-identifier is then combined with other semi-unique identifiers (hardware configuration, user configuration, viewport size, etc.) to build highly identifying values, used for tracking users.
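As a rough illustration of why combining semi-identifiers matters, the sketch below computes the Shannon entropy of each signal and of the combined value. The four "users" are invented toy data, not measurements, and `shannon_entropy` is just a helper name chosen for this example.

```python
import math
from collections import Counter


def shannon_entropy(values) -> float:
    """Shannon entropy, in bits, of the observed distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy population (made up for illustration only).
users = [
    {"fonts": ("Arial",), "viewport": "1920x1080"},
    {"fonts": ("Arial",), "viewport": "1366x768"},
    {"fonts": ("Arial", "Jigmo"), "viewport": "1920x1080"},
    {"fonts": ("Arial", "Jigmo"), "viewport": "1366x768"},
]

fonts_bits = shannon_entropy(u["fonts"] for u in users)
viewport_bits = shannon_entropy(u["viewport"] for u in users)
combined_bits = shannon_entropy((u["fonts"], u["viewport"]) for u in users)
```

In this toy population each signal alone splits the users into two groups (1 bit each), while the pair of signals singles out every user (2 bits). Real signals are usually correlated, so real combined entropy is typically less than the sum.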
Examples
Panopticlick includes a well-known demonstration of how this can be done: https://panopticlick.eff.org
Fingerprint2.js is a popular library that uses font-based fingerprinting (among other signals) to identify users.
Some browsers provide some defenses against this privacy violation. Safari, for example, only exposes the default system fonts to web pages, and will not use other, uncommon fonts, even if they're installed on the OS. Firefox provides a similar option.
The standard should be modified to protect against / not allow font-based fingerprinting by default, instead of relying on non-standardized, vendor-specific mitigations.
Suggested Mitigation
I suggest having the standard follow Safari's approach, and requiring browsers to treat only the default fonts on the platform as system fonts. A simple (though maybe not the best / most elegant) way of doing this would be to change the system font fallback procedure in section 5.2 of "CSS Fonts Module Level 3" so that it only returns the default platform fonts. Those might be specified per platform, or just as this list:
http://www.ampsoft.net/webdesign-l/WindowsMacFonts.html
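The suggested mitigation amounts to a filter between the fonts installed on the OS and the fonts the browser lets pages match. A minimal sketch follows; the function name, the font names in the usage note, and the optional `user_opt_in` set are all assumptions for illustration (the opt-in set is not part of the suggestion above, it only shows how explicitly loaded fonts could be layered on).

```python
from typing import Iterable


def visible_local_fonts(
    installed: Iterable[str],
    platform_defaults: Iterable[str],
    user_opt_in: Iterable[str] = (),
) -> list[str]:
    """Return the installed fonts a page would be allowed to match.

    Only fonts that ship with the platform by default, plus any fonts the
    user has explicitly enabled, are exposed. Names are compared
    case-insensitively.
    """
    allowed = {name.casefold() for name in platform_defaults}
    allowed |= {name.casefold() for name in user_opt_in}
    return sorted(name for name in installed if name.casefold() in allowed)
```

For example, with `installed = ["Arial", "Jigmo", "My Rare Font"]` and `platform_defaults = {"Arial"}`, a page would only ever see `Arial`, so the two user-installed fonts contribute nothing to a fingerprint unless the user opts in.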