
Classifier timeline and white paper #64

Closed
fmingyan opened this issue Apr 22, 2022 · 5 comments

Comments

@fmingyan

Could you provide some guidance on when the classifier weights will become public? Will there be a white paper on the classifier similar to what was published for FLoC?

@jkarlin
Collaborator

jkarlin commented Apr 25, 2022

The model is currently public in the sense that it's shipped to canary/dev/beta browsers that have enabled the API (e.g., via enabling Privacy Sandbox Ads APIs in chrome://flags). It's in your config directory under OptimizationGuidePredictionModels/ as a tflite model. The directory names are cryptic, but you're in the right directory if there is a override_list.pb.gz file in there as well. That override list makes it possible to override the model's output for domains. We currently have the top 10,000 domains manually labeled and placed in that override list.
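Since the directory names are cryptic, a small script can locate candidate model directories by looking for the `override_list.pb.gz` marker file described above. This is a minimal sketch, not part of Chrome's tooling; the profile path you pass in depends on your OS and Chrome channel.

```python
import os

def find_topics_model_dirs(profile_dir):
    """Search a Chrome user-data directory for candidate Topics model
    directories, identified by the presence of override_list.pb.gz.

    `profile_dir` is an assumption: pass the path to your Chrome user
    data directory (e.g. ~/.config/google-chrome on Linux).
    """
    matches = []
    for root, _dirs, files in os.walk(profile_dir):
        # Per the comment above, the cryptic model directory also
        # contains the override list alongside the .tflite model.
        if "override_list.pb.gz" in files:
            matches.append(root)
    return matches
```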

I do expect that we'll have an updated website with more details.

Note that we also intend to make a chrome://topics-internals page for helping to debug topics data. One section of the internals page will include a tool to let developers manually query the model.

@fmingyan
Author

Thank you. Could you also advise where to find the top 10k domains currently used, and how they are mapped to model inputs?

@leeronisrael
Contributor

You can find the path to the model file in the "Classifier" tab of the chrome://topics-internals/ page (see docs here). The top 10k domains currently used are in the override_list.pb.gz file found in the same directory as the model above. The domain-to-topics associations in the list are used by the API in lieu of the output of the model itself.

To run the model directly, refer to documentation here: https://www.tensorflow.org/lite/guide/inference#running_a_model (Also see: https://www.tensorflow.org/learn)

To inspect the override_list.pb.gz file:

1. Unpack it: `gunzip -c override_list.pb.gz > override_list.pb`
2. Use `protoc` to inspect it: `protoc --decode_raw < override_list.pb > output.txt`
Also see: Taxonomy of topics with IDs: https://github.com/patcg-individual-drafts/topics/blob/main/taxonomy_v1.md
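The unpack step above can also be done without `gunzip`, using Python's standard-library `gzip` module; a minimal sketch (file names are the defaults from this thread, so adjust the paths to wherever your browser stores the model):

```python
import gzip
import shutil

def unpack_override_list(src="override_list.pb.gz", dst="override_list.pb"):
    """Decompress the gzipped override list, equivalent to
    `gunzip -c override_list.pb.gz > override_list.pb`."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

The resulting `override_list.pb` can then be fed to `protoc --decode_raw` as in step 2.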

@nTastevin

Thank you for the details above.

According to the model's get_input_details() method, the model seems to require these input tensors:

  • 'input_ids_1': array([ 1, 128], dtype=int32)
  • 'input_mask_1': array([ 1, 128], dtype=int32)
  • 'token_type_ids_1': array([ 1, 128], dtype=int32)

In order to run the model locally, could you provide some guidance on how you preprocess domain strings into input tensor values compatible with the model?

@jkarlin
Collaborator

jkarlin commented Jul 13, 2022

I'm not an expert on this, but looking at https://source.chromium.org/chromium/chromium/src/+/main:third_party/tflite_support/src/tensorflow_lite_support/cc/task/processor/bert_preprocessor.cc;l=117;drc=a6fe0210768868959ac7e8d0e04eaf771e83e524;bpv=1;bpt=1 it appears that the input mask is always set to 1 and the type id is the same as the input id.
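Taking the description above at face value, the tensor layout could be sketched as follows. This is only an illustration of the shapes reported by get_input_details(): `token_ids` is assumed to come from the model's own tokenizer (not shown here), and the mask/type-id handling follows the reading of the linked preprocessor source, not verified behavior.

```python
def make_bert_inputs(token_ids, seq_len=128):
    """Build the three [1, seq_len] int32-style tensors listed above
    from a sequence of token ids (hypothetical tokenizer output)."""
    ids = list(token_ids)[:seq_len]
    pad = seq_len - len(ids)
    input_ids = ids + [0] * pad          # zero-padded to seq_len
    input_mask = [1] * seq_len           # "input mask is always set to 1"
    token_type_ids = list(input_ids)     # "type id is the same as the input id"
    # Each tensor has shape [1, seq_len], matching get_input_details().
    return {
        "input_ids_1": [input_ids],
        "input_mask_1": [input_mask],
        "token_type_ids_1": [token_type_ids],
    }
```

These lists would then be converted to int32 arrays and passed to the TFLite interpreter's set_tensor calls as in the inference guide linked earlier.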

You may also find #79 (comment) helpful.

@jkarlin jkarlin closed this as completed Jun 22, 2023