CAREER:Exact Optimal and Data-Adaptive Algorithms and Tools for Differential Privacy
1. Provably Confidential Language Modelling
- Keywords: Computational linguistics; Long short-term memory; Modeling languages; Filters; Differential privacy; Language generation; Language models; Privacy information; Social security numbers; Training corpus; Training data; Training process
- Zhao, Xuandong; Li, Lei; Wang, Yu-Xiang
- 《2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022》
- 2022
- July 10-15, 2022
- Seattle, WA, United States
- Conference paper
Large language models are shown to memorize privacy information such as social security numbers in training data. Given the sheer scale of the training corpus, it is challenging to screen and filter all privacy data, either manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able to provably prevent unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality. © 2022 Association for Computational Linguistics.
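The core idea of screening and redacting confidential segments before training can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the regex-based screening policy, and the `miss_rate` knob (simulating an approximately correct screener, as the abstract discusses) are all illustrative assumptions:

```python
import random
import re

def redact_for_training(text, screen, mask="<REDACTED>", miss_rate=0.0):
    """Replace segments flagged by an (approximate) screening policy.

    `screen` is a regex flagging confidential spans (e.g. SSN-like
    patterns). `miss_rate` lets a fraction of flagged spans through,
    simulating an imperfect screening policy.
    """
    def repl(match):
        # With probability miss_rate the screener "misses" this span.
        return match.group(0) if random.random() < miss_rate else mask
    return re.sub(screen, repl, text)

random.seed(0)
corpus = "Call me at 555-01-2345. My SSN is 123-45-6789."
ssn_like = r"\d{3}-\d{2}-\d{4}"
print(redact_for_training(corpus, ssn_like))
```

With `miss_rate=0` every matched span is masked before the text ever reaches the training loop; the paper's CRT method additionally randomizes parts of training itself to cover what the screener misses.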
2. Optimal Accounting of Differential Privacy via Characteristic Function
- Zhu, Yuqing; Dong, Jinshuo; Wang, Yu-Xiang
- 《International Conference on Artificial Intelligence and Statistics》
- 2022
- March 28-30, 2022
- Virtual (online)
- Conference paper
Characterizing the privacy degradation over compositions, i.e., privacy accounting, is a fundamental topic in differential privacy (DP) with many applications to differentially private machine learning and federated learning. We propose a unification of recent advances (Renyi DP, privacy profiles, f-DP and the PLD formalism) via the characteristic function (phi-function) of a certain dominating privacy loss random variable. We show that our approach allows natural adaptive composition like Renyi DP, provides exactly tight privacy accounting like PLD, and can be (often losslessly) converted to privacy profile and f-DP, thus providing (epsilon, delta)-DP guarantees and interpretable tradeoff functions. Algorithmically, we propose an analytical Fourier accountant that represents the complex logarithm of phi-functions symbolically and uses Gaussian quadrature for numerical computation. On several popular DP mechanisms and their subsampled counterparts, we demonstrate the flexibility and tightness of our approach in theory and experiments.
3. Adaptive Private-K-Selection with Adaptive K and Application to Multi-label PATE
- Zhu, Yuqing; Wang, Yu-Xiang
- 《International Conference on Artificial Intelligence and Statistics》
- 2022
- March 28-30, 2022
- Virtual (online)
- Conference paper
We provide an end-to-end Renyi DP based-framework for differentially private top-k selection. Unlike previous approaches, which require a data-independent choice on k, we propose to privately release a data-dependent choice of k such that the gap between k-th and the (k + 1)st "quality" is large. This is achieved by a novel application of the Report-Noisy-Max. Not only does this eliminate one hyperparameter, the adaptive choice of k also certifies the stability of the top-k indices in the unordered set so we can release them using a variant of propose-test-release (PTR) without adding noise. We show that our construction improves the privacy-utility tradeoffs compared to the previous top-k selection algorithms theoretically and empirically. Additionally, we apply our algorithm to "Private Aggregation of Teacher Ensembles (PATE)" in multi-label classification tasks with a large number of labels and show that it leads to significant performance gains.
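The data-dependent choice of k can be sketched as Report-Noisy-Max over the gaps between consecutive sorted quality scores, here with Gumbel noise (equivalent to the exponential mechanism over candidate values of k). The function name and noise calibration are illustrative assumptions, not the paper's exact algorithm or privacy analysis:

```python
import numpy as np

def choose_k_noisy_max(qualities, eps, k_max=None, rng=None):
    """Privately pick the k where the gap between the k-th and
    (k+1)-st largest quality is biggest, via Report-Noisy-Max."""
    rng = rng or np.random.default_rng()
    q = np.sort(qualities)[::-1]                 # descending
    k_max = k_max or len(q) - 1
    gaps = q[:k_max] - q[1:k_max + 1]            # gap after each rank
    # Gumbel noise scaled to the budget (illustrative calibration).
    noisy = gaps + rng.gumbel(scale=2.0 / eps, size=k_max)
    return int(np.argmax(noisy)) + 1             # k is 1-indexed

scores = np.array([9.0, 8.8, 8.7, 3.0, 2.9, 2.8])  # big gap after rank 3
print(choose_k_noisy_max(scores, eps=50.0, rng=np.random.default_rng(0)))
```

A large gap after rank k is exactly what makes the top-k index set stable, which is why the paper can then release those indices via propose-test-release without additional noise.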
4. Revisiting model-agnostic private learning: Faster rates and active learning
- Keywords: Artificial intelligence; Learning systems; Active learning; Differential privacy; Faster rates; Model-agnostic private learning; Noise conditions; Private aggregation of teacher ensembles; Tsybakov noise condition
- Liu, Chong; Zhu, Yuqing; Chaudhuri, Kamalika; Wang, Yu-Xiang
- 《Journal of Machine Learning Research》
- 2021
- Volume 22
- Journal article
The Private Aggregation of Teacher Ensembles (PATE) framework is one of the most promising recent approaches in differentially private learning. Existing theoretical analysis shows that PATE consistently learns any VC-classes in the realizable setting, but falls short in explaining its success in more general cases where the error rate of the optimal classifier is bounded away from zero. We fill in this gap by introducing the Tsybakov Noise Condition (TNC) and establish stronger and more interpretable learning bounds. These bounds provide new insights into when PATE works and improve over existing results even in the narrower realizable setting. We also investigate the compelling idea of using active learning for saving privacy budget, and empirical studies show the effectiveness of this new idea. The novel components in the proofs include a more refined analysis of the majority voting classifier — which could be of independent interest — and an observation that the synthetic "student" learning problem is nearly realizable by construction under the Tsybakov noise condition. ©2021 Chong Liu, Yuqing Zhu, Kamalika Chaudhuri, and Yu-Xiang Wang.
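The majority-voting step at the heart of PATE can be sketched in a few lines: each teacher votes for a label, and the aggregator releases the argmax of the noised vote counts (the GNMax-style aggregation used in the PATE line of work). The function name and noise scale below are illustrative, not tied to this paper's analysis:

```python
import numpy as np

def pate_aggregate(teacher_votes, sigma, rng=None):
    """Noisy-max aggregation: add Gaussian noise to each label's
    vote count and release the winning label."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=int(teacher_votes.max()) + 1)
    return int(np.argmax(counts + rng.normal(0.0, sigma, size=len(counts))))

# 100 teachers with strong consensus on label 2.
votes = np.array([2] * 80 + [0] * 15 + [1] * 5)
print(pate_aggregate(votes, sigma=5.0, rng=np.random.default_rng(1)))
```

Strong teacher consensus is precisely the regime the paper's refined analysis exploits: when the vote margin is large, the noise rarely flips the winner, so the "student" sees nearly clean labels.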
5. Privately Publishable Per-instance Privacy
- Keywords: Differential privacy; Empirical risk; Fine-grained analysis; Orders of magnitude; Privacy frameworks; Sensitive data
- Redberg, Rachel; Wang, Yu-Xiang
- 《35th Conference on Neural Information Processing Systems, NeurIPS 2021》
- 2021
- December 6-14, 2021
- Virtual (online)
- Conference paper
We consider how to privately share the personalized privacy losses incurred by objective perturbation, using per-instance differential privacy (pDP). Standard differential privacy (DP) gives us a worst-case bound that might be orders of magnitude larger than the privacy loss to a particular individual relative to a fixed dataset. The pDP framework provides a more fine-grained analysis of the privacy guarantee to a target individual, but the per-instance privacy loss itself might be a function of sensitive data. In this paper, we analyze the per-instance privacy loss of releasing a private empirical risk minimizer learned via objective perturbation, and propose a group of methods to privately and accurately publish the pDP losses at little to no additional privacy cost. © 2021 Neural information processing systems foundation. All rights reserved.
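Objective perturbation, the mechanism whose per-instance losses the paper analyzes, adds a random linear term b^T w to a regularized empirical-risk objective before minimizing. The sketch below does this for L2-regularized logistic regression with plain gradient descent; the noise scale and function name are illustrative assumptions, not the calibration required for a formal DP guarantee:

```python
import numpy as np

def objective_perturbation(X, y, lam, eps, rng=None):
    """Minimize (1/n) * logistic loss + (lam/2)||w||^2 + b^T w / n,
    where b is Gaussian noise (illustrative scale, not a DP-calibrated one).
    Labels y are in {-1, +1}."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    b = rng.normal(0.0, 2.0 / eps, size=d)   # hypothetical noise scale
    w = np.zeros(d)
    for _ in range(500):                      # plain gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w)))    # sigmoid predictions
        grad = X.T @ (p - (y + 1) / 2) / n + lam * w + b / n
        w -= 0.5 * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = np.sign(X @ true_w + 0.1 * rng.normal(size=200))
w_priv = objective_perturbation(X, y, lam=0.1, eps=5.0, rng=rng)
print(w_priv)
```

Because the released minimizer depends on the realized noise b and the dataset, the privacy loss it incurs for a specific individual can be far below the worst-case bound, which is the per-instance quantity the paper shows how to publish privately.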
