How Eyeo Worked With Students To Innovate On AI

Eyeo is putting the “learning” in machine learning.

Last year, it partnered with a university student initiative in Munich to have students find new approaches to using AI in its online ad filtering.

As the parent company of AdBlock and Adblock Plus, two of the most downloaded ad-blocking services on the market, eyeo has invested a lot of time and money into developing its own deep-learning methods for detecting and filtering ads.

But eyeo had yet to explore how generative modeling could be used to curate and maintain ad filtering lists, which is typically a tedious, error-prone, largely manual process that is difficult to scale.

Eyeo’s extensions rely heavily on filter lists, such as EasyList, that are maintained by a community of mostly unpaid volunteers. These lists contain tens of thousands of rules for blocking certain network requests and for hiding the HTML elements responsible for rendering ads.
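To make "rules" concrete: EasyList is written in the Adblock Plus filter syntax. In that syntax, a line like `||ads.example.com^` blocks matching network requests, `example.com##.ad-banner` hides page elements that match a CSS selector, a leading `@@` marks a blocking exception, `#@#` marks a hiding exception, and `!` starts a comment. The sketch below sorts lines into those buckets. It is deliberately simplified: it ignores extended syntax such as `#?#` and `#$#` and does not parse `$` options. The function name and the example domains are hypothetical.

```python
def classify_filter(line: str) -> str:
    """Roughly categorize one line of an Adblock Plus-style filter list (simplified)."""
    rule = line.strip()
    # Blank lines, "!" comments and "[Adblock Plus 2.0]"-style headers carry no rule.
    if not rule or rule.startswith("!") or rule.startswith("["):
        return "comment"
    # domain#@#selector re-allows an element that another rule would hide.
    if "#@#" in rule:
        return "hiding-exception"
    # domain##selector hides HTML elements matching the CSS selector.
    if "##" in rule:
        return "element-hiding"
    # @@pattern whitelists requests that a blocking rule would otherwise stop.
    if rule.startswith("@@"):
        return "blocking-exception"
    # Everything else is treated as a network-request blocking pattern.
    return "request-blocking"


if __name__ == "__main__":
    sample = [
        "! Hypothetical sample list",
        "||ads.example.com^",
        "example.com##.ad-banner",
        "@@||example.com/player/ads.js",
    ]
    for rule in sample:
        print(f"{classify_filter(rule):20} {rule}")
```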

The ability to automatically update these lists with accuracy would be a major benefit, said Dr. Humera Noor Minhas, director of engineering at eyeo.

Automation station

Last year, TUM.ai, a student initiative within the Technical University of Munich, put out a call for proposals for its Moonshot competition, a hackathon-style project for students interested in pursuing a career in AI.

In response, eyeo submitted a proposal to have TUM.ai students tackle the challenge of developing an AI model that automatically classifies URLs and generates filter rules based on website content.

Eyeo’s proposal stood out due to “its real-world impact,” according to TUM.ai student advisor Thomas Wölkhart.


It also gave students the opportunity to acquire hands-on technical expertise and get direct feedback from eyeo engineers, he said.

See it to believe it

The six-week challenge ran from March to May 2023 and demonstrated how using generative AI could make ad filtering less onerous.

For instance, one student used OpenAI’s APIs to fine-tune the GPT-3 Ada model, producing a model that effectively generated filter rules from webpage content.

The student’s successful model opened eyeo’s eyes to generative AI’s real-world applications.

What students lack in experience, they often make up for in curiosity and fresh perspective.

But the challenge’s biggest boon for eyeo wasn’t actually the model itself, according to Minhas. It was the data set eyeo created for students to work with during the challenge, which contained more than 1 million ads.

To generate the data set for the challenge, eyeo first looked at publicly available website rankings, such as those from Alexa and Similarweb, to find the most-visited domains in different regions, according to Minhas. Then it gathered information from the HTML of those sites, including the headings, the organic content and, crucially for eyeo’s purposes, the ads on the page.

Challenge participants worked with two data sets to develop their AI models: a training set and a test set. The training set included 100,000 pairs of webpages and corresponding filter rules, while the test set held 36,000 webpages.

By dividing the data into two sets, students could check how well their models generalized to data they hadn’t seen before, Wölkhart said. The test set also allowed eyeo to judge how closely the filter rules produced by the student models matched the expected filters.
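The article does not say how submissions were scored. As a minimal sketch, assume each held-out page has one reference rule and a model "passes" only on an exact string match. That is a strict simplification, since real scoring could be more lenient. Under that assumption, a held-out evaluation might look like the code below. `Pair`, `split_pairs` and `exact_match_rate` are hypothetical names, and `split_pairs` is a hand-rolled stand-in rather than scikit-learn's `train_test_split`.

```python
import random
from typing import Callable

# Hypothetical record format: (webpage HTML, reference filter rule).
Pair = tuple[str, str]


def split_pairs(pairs: list[Pair], test_fraction: float, seed: int = 0) -> tuple[list[Pair], list[Pair]]:
    """Shuffle deterministically, then hold out a fraction of pairs for testing."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def exact_match_rate(model: Callable[[str], str], test: list[Pair]) -> float:
    """Share of held-out pages where the model's rule equals the reference rule."""
    if not test:
        return 0.0
    hits = sum(1 for page, rule in test if model(page).strip() == rule.strip())
    return hits / len(test)


if __name__ == "__main__":
    data = [(f"<html>page {i}</html>", f"example{i}.com##.ad") for i in range(10)]
    train, test = split_pairs(data, test_fraction=0.3)
    # A "model" that only memorized the training pairs fails on unseen pages,
    # which is exactly what a held-out test set is meant to expose.
    memorized = dict(train)
    print(exact_match_rate(lambda page: memorized.get(page, ""), test))
```

Because the training and test pages never overlap, a model can only score well here by generalizing, not by memorizing rules it has already seen.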

The process also simulated how companies test model performance before rolling out a model to users – an important step, Wölkhart said, since “one wrong filter rule could break a whole website for thousands of users.”

Eyes ahead

Following the Moonshot project, eyeo went on to use the data set it generated for the challenge to train another model, this one based on URL parameters.

Eyeo tested using URL parameters to detect whether a given portion of a page contains ads, Minhas said.

URL parameters are query strings that append additional information to a basic web address to pass to a server, track ad campaigns and customize user experiences on a website.
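As an illustration of what a URL parameter looks like to a program, Python's standard `urllib.parse` can split the query string off a web address and decode it into name/value pairs. The feature function below is purely hypothetical. Its "ad hint" list is made up for illustration and says nothing about which signals eyeo's classifier actually uses.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical hint substrings, for illustration only; not eyeo's actual features.
AD_HINTS = ("utm_", "gclid", "adid", "campaign")


def url_parameters(url: str) -> dict[str, list[str]]:
    """Return the query-string parameters of a URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


def url_features(url: str) -> dict[str, object]:
    """Turn a URL into a couple of toy features a classifier could consume."""
    names = [name.lower() for name in url_parameters(url)]
    return {
        "num_params": len(names),
        "has_ad_hint": any(hint in name for name in names for hint in AD_HINTS),
    }


if __name__ == "__main__":
    url = "https://shop.example.com/shoes?utm_source=newsletter&utm_campaign=spring&size=42"
    print(url_parameters(url))
    print(url_features(url))
```

Everything after the `?` is the query string. Each `name=value` pair separated by `&` becomes one parameter, and a repeated name collects multiple values in its list.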

This new URL-based classifier achieved precision – a machine learning metric that measures what share of a model’s positive predictions are actually correct – comparable to the solutions eyeo already has in production.
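Precision is often confused with accuracy, but the two answer different questions. Precision is TP / (TP + FP): of everything the model flagged as an ad, how much really was one. Accuracy is the share of all predictions, flagged or not, that were right. For ad filtering, precision tracks false positives, meaning non-ad content wrongly treated as an ad, which is the kind of mistake that can break a page. The labels below are made up for illustration (1 = "contains an ad").

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP): share of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fp) if tp + fp else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical labels for ten page sections (1 = contains an ad).
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(precision(y_true, y_pred))  # 0.5: one of two flagged sections was an ad
    print(accuracy(y_true, y_pred))   # 0.8: eight of ten predictions were right
```

The same predictions score 0.8 on accuracy but only 0.5 on precision. Because most sections contain no ads, accuracy can look good even when half of the model's "ad" calls are wrong.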

The company is working on a proof of concept and has identified use cases for the new model, such as automatically detecting buggy ad filters. “It has generalization ability that allows us to detect ads in unknown domains and provide a better user experience in ad filtering,” Minhas said.

Next up, eyeo is researching how to detect ads served in AI chatbot experiences and filter them out if a user doesn’t want to see them.

“The data set was a huge step for us to take our research forward,” Minhas said.
