Climbing the Kaggle Leaderboard by Exploiting the Log-Loss Oracle

Whitehill, Jacob

arXiv:1707.01825 (cs)

[Submitted on 6 Jul 2017]

Abstract: In the context of data-mining competitions (e.g., Kaggle, KDDCup, ILSVRC Challenge), we show how access to an oracle that reports a contestant's log-loss score on the test set can be exploited to deduce the ground-truth of some of the test examples. By applying this technique iteratively to batches of $m$ examples (for small $m$), all of the test labels can eventually be inferred. In this paper, (1) we demonstrate this attack on the first stage of a recent Kaggle competition (Intel & MobileODT Cancer Screening) and use it to achieve a log-loss of $0.00000$ (and thus attain a rank of #4 out of 848 contestants), without ever training a classifier to solve the actual task. (2) We prove an upper bound on the batch size $m$ as a function of the floating-point resolution of the probability estimates that the contestant submits for the labels. (3) We derive, and demonstrate in simulation, a more flexible attack that can be used even when the oracle reports the accuracy on an unknown (but fixed) subset of the test set's labels. These results underline the importance of evaluating contestants based only on test data that the oracle does not examine.
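The batch-wise idea in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure: it assumes a binary task, an oracle that rounds the mean log-loss to five decimals, and a brute-force search over the $2^m$ label patterns of each batch. All names (`make_oracle`, `infer_batch`, `infer_all`, `step`) are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def log_loss(labels, probs):
    """Mean binary log-loss, the score the oracle is assumed to report."""
    total = 0.0
    for y, p in zip(labels, probs):
        total -= math.log(p if y == 1 else 1.0 - p)
    return total / len(labels)


def make_oracle(true_labels, decimals=5):
    """Hypothetical leaderboard: reveals only a rounded log-loss."""
    def oracle(probs):
        return round(log_loss(true_labels, probs), decimals)
    return oracle


def infer_batch(oracle, n, batch, step=0.1):
    """Recover the labels of the indices in `batch` with a single query."""
    # Every example outside the batch gets 0.5, which costs log(2)
    # whatever its true label is.
    probs = [0.5] * n
    # Inside the batch, the i-th logit is step * 2**i, so each label
    # pattern yields a different total loss.
    batch_probs = [1.0 / (1.0 + math.exp(-step * 2 ** i))
                   for i in range(len(batch))]
    for idx, p in zip(batch, batch_probs):
        probs[idx] = p
    reported = oracle(probs)

    # Try every label pattern and keep the one whose loss matches best.
    baseline = (n - len(batch)) * math.log(2.0)
    best, best_gap = None, float("inf")
    for candidate in itertools.product((0, 1), repeat=len(batch)):
        loss = baseline
        for y, p in zip(candidate, batch_probs):
            loss -= math.log(p if y == 1 else 1.0 - p)
        gap = abs(loss / n - reported)
        if gap < best_gap:
            best, best_gap = candidate, gap
    return dict(zip(batch, best))


def infer_all(oracle, n, m=4):
    """Apply the single-batch attack to consecutive batches of size m."""
    recovered = {}
    for start in range(0, n, m):
        batch = list(range(start, min(start + m, n)))
        recovered.update(infer_batch(oracle, n, batch))
    return [recovered[i] for i in range(n)]


random.seed(0)
true_labels = [random.randint(0, 1) for _ in range(60)]  # simulated test set
oracle = make_oracle(true_labels)
guessed = infer_all(oracle, len(true_labels))

# Submit the inferred labels as near-certain probabilities.
final_probs = [1.0 - 1e-15 if y == 1 else 1e-15 for y in guessed]
final_score = oracle(final_probs)
```

In this toy setting, neighbouring label patterns differ in mean loss by at least `step / n`, which must stay well above the oracle's rounding error. That requirement is what limits how large a batch can be decoded from one query, in the spirit of the bound on $m$ mentioned in point (2).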

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1707.01825 [cs.LG]
	(or arXiv:1707.01825v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1707.01825

Submission history

From: Jacob Whitehill
[v1] Thu, 6 Jul 2017 14:54:54 UTC (1,059 KB)
