CAREER:Exact Optimal and Data-Adaptive Algorithms and Tools for Differential Privacy
1. Provably Confidential Language Modelling
- Keywords: Computational linguistics; Long short-term memory; Modeling languages; Filters; Differential privacy; Language generation; Language models; Privacy information; Social security numbers; Training corpus; Training data; Training process
- Zhao, Xuandong; Li, Lei; Wang, Yu-Xiang
- 《2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022》
- 2022
- July 10-15, 2022
- Seattle, WA, United States
- Conference paper
Large language models are shown to memorize privacy information such as social security numbers in training data. Given the sheer scale of the training corpus, it is challenging to screen and filter all privacy data, either manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able to provably prevent unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality. © 2022 Association for Computational Linguistics.
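The core idea of screening and redacting confidential segments before training can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the regex-based screening policy, and the `miss_rate` knob (simulating an approximately correct screener, as the abstract discusses) are all illustrative assumptions:

```python
import random
import re

def redact_for_training(text, screen, mask="<REDACTED>", miss_rate=0.0):
    """Replace segments flagged by an (approximate) screening policy.

    `screen` is a regex flagging confidential spans (e.g. SSN-like
    patterns). `miss_rate` lets a fraction of flagged spans through,
    simulating an imperfect screening policy.
    """
    def repl(match):
        # With probability miss_rate the screener "misses" this span.
        return match.group(0) if random.random() < miss_rate else mask
    return re.sub(screen, repl, text)

random.seed(0)
corpus = "Call me at 555-01-2345. My SSN is 123-45-6789."
ssn_like = r"\d{3}-\d{2}-\d{4}"
print(redact_for_training(corpus, ssn_like))
```

With `miss_rate=0` every matched span is masked before the text ever reaches the training loop; the paper's CRT method additionally randomizes parts of training itself to cover what the screener misses.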
2. Optimal Accounting of Differential Privacy via Characteristic Function
- Zhu, Yuqing; Dong, Jinshuo; Wang, Yu-Xiang
- 《International Conference on Artificial Intelligence and Statistics》
- 2022
- March 28-30, 2022
- Virtual (online)
- Conference paper
Characterizing the privacy degradation over compositions, i.e., privacy accounting, is a fundamental topic in differential privacy (DP) with many applications to differentially private machine learning and federated learning. We propose a unification of recent advances (Renyi DP, privacy profiles, f-DP and the PLD formalism) via the characteristic function (phi-function) of a certain dominating privacy loss random variable. We show that our approach allows natural adaptive composition like Renyi DP, provides exactly tight privacy accounting like PLD, and can be (often losslessly) converted to privacy profile and f-DP, thus providing (epsilon, delta)-DP guarantees and interpretable tradeoff functions. Algorithmically, we propose an analytical Fourier accountant that represents the complex logarithm of phi-functions symbolically and uses Gaussian quadrature for numerical computation. On several popular DP mechanisms and their subsampled counterparts, we demonstrate the flexibility and tightness of our approach in theory and experiments.
3. Adaptive Private-K-Selection with Adaptive K and Application to Multi-label PATE
- Zhu, Yuqing; Wang, Yu-Xiang
- 《International Conference on Artificial Intelligence and Statistics》
- 2022
- March 28-30, 2022
- Virtual (online)
- Conference paper
We provide an end-to-end Renyi DP based-framework for differentially private top-k selection. Unlike previous approaches, which require a data-independent choice on k, we propose to privately release a data-dependent choice of k such that the gap between k-th and the (k + 1)st "quality" is large. This is achieved by a novel application of the Report-Noisy-Max. Not only does this eliminate one hyperparameter, the adaptive choice of k also certifies the stability of the top-k indices in the unordered set so we can release them using a variant of propose-test-release (PTR) without adding noise. We show that our construction improves the privacy-utility tradeoffs compared to the previous top-k selection algorithms theoretically and empirically. Additionally, we apply our algorithm to "Private Aggregation of Teacher Ensembles (PATE)" in multi-label classification tasks with a large number of labels and show that it leads to significant performance gains.
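The data-dependent choice of k can be sketched as Report-Noisy-Max over the gaps between consecutive sorted quality scores, here with Gumbel noise (equivalent to the exponential mechanism over candidate values of k). The function name and noise calibration are illustrative assumptions, not the paper's exact algorithm or privacy analysis:

```python
import numpy as np

def choose_k_noisy_max(qualities, eps, k_max=None, rng=None):
    """Privately pick the k where the gap between the k-th and
    (k+1)-st largest quality is biggest, via Report-Noisy-Max."""
    rng = rng or np.random.default_rng()
    q = np.sort(qualities)[::-1]                 # descending
    k_max = k_max or len(q) - 1
    gaps = q[:k_max] - q[1:k_max + 1]            # gap after each rank
    # Gumbel noise scaled to the budget (illustrative calibration).
    noisy = gaps + rng.gumbel(scale=2.0 / eps, size=k_max)
    return int(np.argmax(noisy)) + 1             # k is 1-indexed

scores = np.array([9.0, 8.8, 8.7, 3.0, 2.9, 2.8])  # big gap after rank 3
print(choose_k_noisy_max(scores, eps=50.0, rng=np.random.default_rng(0)))
```

A large gap after rank k is exactly what makes the top-k index set stable, which is why the paper can then release those indices via propose-test-release without additional noise.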
4. Revisiting model-agnostic private learning: Faster rates and active learning
- Keywords: Artificial intelligence; Learning systems; Active learning; Differential privacy; Faster rates; Model-agnostic private learning; Noise conditions; Private aggregation of teacher ensembles; Tsybakov noise condition
- Liu, Chong; Zhu, Yuqing; Chaudhuri, Kamalika; Wang, Yu-Xiang
- 《Journal of Machine Learning Research》
- 2021
- Volume 22
- Journal article
The Private Aggregation of Teacher Ensembles (PATE) framework is one of the most promising recent approaches in differentially private learning. Existing theoretical analysis shows that PATE consistently learns any VC-classes in the realizable setting, but falls short in explaining its success in more general cases where the error rate of the optimal classifier is bounded away from zero. We fill in this gap by introducing the Tsybakov Noise Condition (TNC) and establish stronger and more interpretable learning bounds. These bounds provide new insights into when PATE works and improve over existing results even in the narrower realizable setting. We also investigate the compelling idea of using active learning for saving privacy budget, and empirical studies show the effectiveness of this new idea. The novel components in the proofs include a more refined analysis of the majority voting classifier — which could be of independent interest — and an observation that the synthetic "student" learning problem is nearly realizable by construction under the Tsybakov noise condition. ©2021 Chong Liu, Yuqing Zhu, Kamalika Chaudhuri, and Yu-Xiang Wang.
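The majority-voting step at the heart of PATE can be sketched in a few lines: each teacher votes for a label, and the aggregator releases the argmax of the noised vote counts (the GNMax-style aggregation used in the PATE line of work). The function name and noise scale below are illustrative, not tied to this paper's analysis:

```python
import numpy as np

def pate_aggregate(teacher_votes, sigma, rng=None):
    """Noisy-max aggregation: add Gaussian noise to each label's
    vote count and release the winning label."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=int(teacher_votes.max()) + 1)
    return int(np.argmax(counts + rng.normal(0.0, sigma, size=len(counts))))

# 100 teachers with strong consensus on label 2.
votes = np.array([2] * 80 + [0] * 15 + [1] * 5)
print(pate_aggregate(votes, sigma=5.0, rng=np.random.default_rng(1)))
```

Strong teacher consensus is precisely the regime the paper's refined analysis exploits: when the vote margin is large, the noise rarely flips the winner, so the "student" sees nearly clean labels.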
5. Privately Publishable Per-instance Privacy
- Keywords: Differential privacy; Empirical risk; Fine-grained analysis; Orders of magnitude; Privacy frameworks; Sensitive data
- Redberg, Rachel; Wang, Yu-Xiang
- 《35th Conference on Neural Information Processing Systems, NeurIPS 2021》
- 2021
- December 6-14, 2021
- Virtual (online)
- Conference paper
We consider how to privately share the personalized privacy losses incurred by objective perturbation, using per-instance differential privacy (pDP). Standard differential privacy (DP) gives us a worst-case bound that might be orders of magnitude larger than the privacy loss to a particular individual relative to a fixed dataset. The pDP framework provides a more fine-grained analysis of the privacy guarantee to a target individual, but the per-instance privacy loss itself might be a function of sensitive data. In this paper, we analyze the per-instance privacy loss of releasing a private empirical risk minimizer learned via objective perturbation, and propose a group of methods to privately and accurately publish the pDP losses at little to no additional privacy cost. © 2021 Neural information processing systems foundation. All rights reserved.
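Objective perturbation, the mechanism whose per-instance losses the paper analyzes, adds a random linear term b^T w to a regularized empirical-risk objective before minimizing. The sketch below does this for L2-regularized logistic regression with plain gradient descent; the noise scale and function name are illustrative assumptions, not the calibration required for a formal DP guarantee:

```python
import numpy as np

def objective_perturbation(X, y, lam, eps, rng=None):
    """Minimize (1/n) * logistic loss + (lam/2)||w||^2 + b^T w / n,
    where b is Gaussian noise (illustrative scale, not a DP-calibrated one).
    Labels y are in {-1, +1}."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    b = rng.normal(0.0, 2.0 / eps, size=d)   # hypothetical noise scale
    w = np.zeros(d)
    for _ in range(500):                      # plain gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w)))    # sigmoid predictions
        grad = X.T @ (p - (y + 1) / 2) / n + lam * w + b / n
        w -= 0.5 * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = np.sign(X @ true_w + 0.1 * rng.normal(size=200))
w_priv = objective_perturbation(X, y, lam=0.1, eps=5.0, rng=rng)
print(w_priv)
```

Because the released minimizer depends on the realized noise b and the dataset, the privacy loss it incurs for a specific individual can be far below the worst-case bound, which is the per-instance quantity the paper shows how to publish privately.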
