Event-level API: deduplication and priority #700

Open
alois-bissuel opened this issue Feb 10, 2023 · 5 comments
Labels
possible-future-enhancement Feature request with no current decision on adoption

Comments

@alois-bissuel
Contributor

Hello,

I have a question regarding the interaction between deduplication and priority. It seems that deduplication is applied before the priority system (in other words, a trigger is dropped at registration time if a trigger with the same deduplication key has already been matched to the source; see step 7 of the attribution algorithm).

This creates an issue when tracking a hierarchy of triggers, where one wants all conversions (the top of the hierarchy) but only one event of any other type, preferably the highest in the hierarchy, to attribute visits. In our case, we handle this by using a common deduplication key for all events except conversions (for which the dedup key is also used to genuinely deduplicate transactions that may be sent to us twice), and we rely on priorities to keep the highest event in the hierarchy for visits.
This does not work because deduplication runs first, so any subsequent non-conversion event is deduplicated away.
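
For illustration, here is a minimal sketch in Python of the scheme described above, using made-up trigger values and simplified field names (not the actual registration JSON), showing how the current dedup-before-priority ordering drops the higher-priority trigger:

```python
# Hypothetical trigger registrations: conversions get unique dedup keys,
# all other event types share one dedup key and are ranked by priority.
triggers = [
    {"trigger_data": 1, "priority": 10,  "deduplication_key": "visit"},    # product page view
    {"trigger_data": 2, "priority": 20,  "deduplication_key": "visit"},    # add-to-cart, higher in the hierarchy
    {"trigger_data": 7, "priority": 100, "deduplication_key": "txn-42"},   # conversion, deduped per transaction
]

matched_dedup_keys = set()
reports = []

for t in triggers:
    # Current behaviour (roughly step 7 of the attribution algorithm):
    # a trigger is dropped as soon as its dedup key has already been matched,
    # before its priority is ever compared with the existing reports.
    if t["deduplication_key"] in matched_dedup_keys:
        continue  # the add-to-cart trigger is dropped here despite its higher priority
    matched_dedup_keys.add(t["deduplication_key"])
    reports.append(t)

print(reports)  # only the first "visit" trigger and the conversion survive
```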

Would it be possible to run deduplication after the priority system, or at least not drop deduplicated triggers when they have a higher priority than the one currently matched to the source?
Thanks a lot!

@csharrison
Collaborator

This is a good question. It feels a bit conceptually wrong to me to have a deduplicated trigger kick out another trigger with a lower priority, although I can see the value it has, especially with your usage of the dedup feature. The original purpose of the dedup was to avoid double counting, and in that case I suppose a true "duplicate" trigger would have the same priority.

It feels like the primitive you are asking for is something like a sub-grouping of triggers, each sub-group independently capped, almost like a partitioned version of the number-of-event-level-reports field. That field is decremented and processed after the priority system, and the priority system is currently only run if the field hits its max (10.9.15).

The way to handle this with dedup keys would be to alter 10.9.15 to something like: if we've hit maxAttributionsPerSource OR the dedup key matches any report in the cache, run the prioritization algorithm. For dedup prioritization, only consider matching reports with the same dedup key.
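
A rough sketch of that altered check, in Python rather than spec text (the cache structure, field names, and `max_attributions_per_source` parameter here are simplifications, not the spec's actual data model):

```python
def should_run_prioritization(new_trigger, cached_reports, max_attributions_per_source):
    """Altered 10.9.15 (sketch): prioritize either on hitting the report cap
    or on a dedup-key collision, instead of dropping the trigger outright."""
    dedup_collision = any(
        r["deduplication_key"] == new_trigger["deduplication_key"]
        for r in cached_reports
    )
    return len(cached_reports) >= max_attributions_per_source or dedup_collision


def candidates_for_prioritization(new_trigger, cached_reports):
    """For a dedup collision, only reports sharing the dedup key compete;
    otherwise (ordinary cap reached), all cached reports compete."""
    same_key = [
        r for r in cached_reports
        if r["deduplication_key"] == new_trigger["deduplication_key"]
    ]
    return same_key if same_key else cached_reports
```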

cc @apasel422 @johnivdel for thoughts

@csharrison
Collaborator

also @linnan-github

@alois-bissuel
Contributor Author

I think the idea of capping triggers per sub-group was hinted at in point 1 of your comment on #278 (which was the original inspiration for our use of the dedup key in this way!).

I think such a limit per sub-group could solve our present issue.

@csharrison
Collaborator

I think such a limit per sub-group could solve our present issue.

I agree. The main consideration will be how to design the API so that it interacts nicely with (or completely subsumes) the dedup concept without introducing much more complexity. We'll need to think about it.

@johnivdel
Collaborator

I think the flexibility of allowing prioritization within a deduplication key is useful. In the past, we have talked about using the dedup key as the primitive for measuring "one conversion from a set of conversion types" only once, which is why we favored using a new key rather than trying to do deduplication based on the trigger data itself.

See some discussion here: https://github.com/WICG/attribution-reporting-api/blob/2867a961a2a7020ec2ff83812d5ba0501733b33a/meetings/2022-01-10-minutes.md#alo%C3%AFs-bissuel-priorities-and-reporting-window-in-the-event-level-api-for-clicks-issue-278

From a spec perspective, I think the easiest way to solve this would be to introduce a new step that specifically applies priority to an existing deduped report, since we should be guaranteed there is only one in the cache at a given time.

One idea would be adding new substeps to 10.9.7, along the lines of this pseudo-code:

1. Let |report| be the report in the [event-level report cache] whose dedup key is |x| and whose source identifier is |y|, or null if no such report exists.
2. If |report| is not null and |report|'s trigger priority is less than the new trigger's priority, replace it with the new report.
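
As a rough illustration of those substeps, here is a small Python sketch (the function name, cache structure, and field names are made up, not the spec's):

```python
def apply_dedup_with_priority(new_report, cache):
    """Sketch of the proposed 10.9.7 substeps: at most one report per
    (source, dedup key) pair is kept, and a higher-priority trigger
    replaces the cached one instead of being silently dropped."""
    key = (new_report["source_id"], new_report["deduplication_key"])
    existing = cache.get(key)          # step 1: the matching cached report, or None
    if existing is None:
        cache[key] = new_report
    elif existing["priority"] < new_report["priority"]:
        cache[key] = new_report        # step 2: replace the lower-priority report
    # otherwise the new report is dropped, as with plain deduplication
    return cache
```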

Something like what Charlie mentioned above probably scales better with any potential changes to the priority system (but needs to resolve some of the early exit logic).
