Pasin Manurangsi
Authored Publications
Sort By
Preview abstract
In this work, we study the task of estimating the numbers of distinct and k-occurring items in a time window under the constraint of differential privacy (DP). We consider several variants depending on whether the queries are on general time windows (between times t1 and t2), or are restricted to being cumulative (between times 1 and t2), and depending on whether the DP neighboring relation is event-level or the more stringent item-level. We obtain nearly tight upper and lower bounds on the errors of DP algorithms for these problems. En route, we obtain an event-level DP algorithm for estimating, at each time step, the number of distinct items seen over the last W updates with error polylogarithmic in W; this answers an open question of Bolot et al. (ICDT 2013).
View details
Differentially Private All-Pairs Shortest Path Distances: Improved Algorithms and Lower Bounds
Jelani Osei Nelson
Justin Y. Chen
Shyam Narayanan
Yinzhan Xu
SODA 2023 (to appear)
Preview abstract
We study the problem of releasing the weights of all-pairs shortest paths in a weighted undirected graph with differential privacy (DP). In this setting, the underlying graph is fixed and two graphs are neighbors if their edge weights differ by at most 1 in the ℓ1-distance. We give an algorithm with additive error ̃O(n^2/3/ε) in the ε-DP case and an algorithm with additive error ̃O(√n/ε) in the (ε, δ)-DP case, where n denotes the number of vertices. This positively answers a question of Sealfon [Sea16, Sea20], who asked whether a o(n) error algorithm exists. We also show that an additive error of Ω(n1/6) is necessary for any sufficiently small ε, δ > 0.
Furthermore, we show that if the graph is promised to have reasonably bounded weights, one can improve the error further to roughly n^{(√17−3)/2+o(1)}/ε in the ε-DP case and roughly n^{√2−1+o(1)}/ε in the (ε, δ)-DP case. Previously, it was only known how to obtain ̃O(n2/3/ε1/3) additive error in the ε-DP case and ̃O(√n/ε) additive error in the (ε, δ)-DP case for bounded-weight graphs [Sea16].
Finally, we consider a relaxation where a multiplicative approximation is allowed. We show that, with a multiplicative approximation factor k, the additive error can be reduced to ̃O(n^{1/2+O(1/k)}/ε) in the ε-DP case and ̃O(n^{1/3+O(1/k)}/ε) in the (ε, δ)-DP case.
View details
Preview abstract
Fairness and privacy are two important concerns in social decision-making processes such as resource allocation. We study privacy in the fair allocation of indivisible resources using the well-established framework of differential privacy. We present algorithms for approximate envy-freeness and proportionality when two instances are considered to be adjacent if they differ only on the utility of a single agent for a single item. On the other hand, we provide strong negative results for both fairness criteria when the adjacency notion allows the entire utility function of a single agent to change.
View details
Preview abstract
We consider the task of producing heatmaps from users' aggregated data while protecting their privacy. We give a differentially private algorithm for this task and demonstrate its advantages over previous algorithms on several real-world datasets.
Our core algorithmic primitive is a differentially private procedure that takes in a set of distributions and produces an output that is close in Earth Mover's Distance (EMD) to the average of the inputs. We prove theoretical bounds on the error of our algorithm under certain sparsity assumption and that these are essentially optimal.
View details
Preview abstract
We study the fine-grained complexity of the famous $k$-center problem in the metric induced by a graph with $n$ vertices and $m$ edges. The problem is NP-hard to approximate within a factor strictly better than $2$, and several $2$-approximation algorithms are known.
Two of the most well-known approaches for the $2$-approximation are (1) finding a maximal distance $r$-independent set (where the minimum pairwise distance is greater than $r$) and (2) Gonzalez's algorithm that iteratively adds the center farthest from the currently chosen centers.
For the approach based on distance-$r$ independent sets, Thorup [SIAM J. Comput. '05] already gave a nearly linear time algorithm. While Thorup's algorithm is not complicated, it still requires tools such as an approximate oracle for neighborhood size by Cohen [J. Comput. Syst. Sci. '97]. Our main result is a nearly straightforward algorithm that improves the running time by an $O(\log n$) factor. It results in an $(2+\eps)$-approximation for $k$-center in $O((m + n \log n)\log n \log(n/\eps))$ time.
For Gonzalez's algorithm [Theor. Comput. Sci. 85], we show that the simple $\widetilde{O}(mk)$-time implementation is nearly optimal if we insist the {\em exact} implementation. On the other hand, we show that an $(1+\eps)$-approximate version of the algorithm is efficiently implementable, leading to an $(2+\eps)$-approximation algorithm in running time $O((m + n \log n)\log^2 n / \eps)$. We also show that, unlike in the distance $r$-independent set-based algorithm, the dependency of $1/\eps$ in the running time is essentially optimal for $(1 + \eps)$-approximate Gonzalez's.
View details
Preview abstract
Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed.
In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this framework, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller that the best possible privacy parameter for the entire record of a person (i.e., all the attributes).
View details
Leveraging Bias-Variance Trade-offs for Regression with Label Differential Privacy
Ashwinkumar Badanidiyuru Varadaraja
Avinash Varadarajan
Chiyuan Zhang
Ethan Leeman
Pritish Kamath
NeurIPS 2023 (2023)
Preview abstract
We propose a new family of label randomization mechanisms for the task of training regression models under the constraint of label differential privacy (DP). In particular, we leverage the trade-offs between bias and variance to construct better noising mechanisms depending on a privately estimated prior distribution over the labels. We demonstrate that these mechanisms achieve state-of-the-art privacy-accuracy trade-offs on several datasets, highlighting the importance of bias-reducing constraints when training neural networks with label DP. We also provide theoretical results shedding light on the structural properties of the optimal bias-reduced mechanisms.
View details
Improved Inapproximability of VC Dimension and Littlestone’s Dimension via (Unbalanced) Biclique
ITCS 2023 (to appear)
Preview abstract
We study the complexity of computing (and approximating) VC Dimension and Littlestone's Dimension when we are given the concept class explicitly. We give a simple reduction from Maximum (Unbalanced) Biclique problem to approximating VC Dimension and Littlestone's Dimension. With this connection, we derive a range of hardness of approximation results and running time lower bounds. For example, under the (randomized) Gap-Exponential Time Hypothesis or the Strongish Planted Clique Hypothesis, we show a tight inapproximability result: both dimensions are hard to approximate to within a factor of o(log n) in polynomial-time. These improve upon constant-factor inapproximability results from [Manurangsi and Rubinstein, COLT 2017].
View details
Preview abstract
Knockout tournaments constitute a popular format for organizing sports competitions. While prior results have shown that it is often possible to manipulate a knockout tournament by fixing the bracket, these results ignore the prevalent aspect of player seeds, which can significantly constrain the chosen bracket. We show that certain structural conditions that guarantee that a player can win a knockout tournament without seeds are no longer sufficient in light of seed constraints. On the other hand, we prove that when the pairwise match outcomes are generated randomly, all players are still likely to be knockout winners under the same probability threshold with seeds as without seeds. In addition, we investigate the complexity of deciding whether a manipulation is possible when seeds are present.
View details
Preview abstract
We study the complexity of PAC learning halfspaces in the presence of Massart noise. In this problem, we are given i.i.d. labeled examples (x,y)∈ℝN×{±1}, where the distribution of x is arbitrary and the label y is a Massart corruption of f(x), for an unknown halfspace f:ℝN→{±1}, with flipping probability η(x)≤η<1/2. The goal of the learner is to compute a hypothesis with small 0-1 error. Our main result is the first computational hardness result for this learning problem. Specifically, assuming the (widely believed) subexponential-time hardness of the Learning with Errors (LWE) problem, we show that no polynomial-time Massart halfspace learner can achieve error better than Ω(η), even if the optimal 0-1 error is small, namely OPT=2^{−log^c(N)} for any universal constant c∈(0,1). Prior work had provided qualitatively similar evidence of hardness in the Statistical Query model. Our computational hardness result essentially resolves the polynomial PAC learnability of Massart halfspaces, by showing that known efficient learning algorithms for the problem are nearly best possible.
View details