2024
Proceedings of the Fifth Workshop on Privacy in Natural Language Processing
Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
2023
Privacy-Preserving Natural Language Processing
Ivan Habernal, Fatemehsadat Mireshghallah, Patricia Thaine, Sepideh Ghanavati, Oluwaseyi Feyisetan
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts
This cutting-edge tutorial will help the NLP community get familiar with current research in privacy-preserving methods. We will cover topics as diverse as membership inference, differential privacy, homomorphic encryption, and federated learning, all with typical applications to NLP. The goal is not only to draw the interest of the broader community, but also to present some typical use cases and potential pitfalls in applying privacy-preserving methods to human language technologies.
2022
Proceedings of the Fourth Workshop on Privacy in Natural Language Processing
Oluwaseyi Feyisetan, Sepideh Ghanavati, Patricia Thaine, Ivan Habernal, Fatemehsadat Mireshghallah
2021
Private Release of Text Embedding Vectors
Oluwaseyi Feyisetan, Shiva Kasiviswanathan
Proceedings of the First Workshop on Trustworthy Natural Language Processing
Ensuring strong theoretical privacy guarantees on text data is a challenging problem, and such guarantees are usually attained at the expense of utility. However, to improve the practicality of privacy-preserving text analyses, it is essential to design algorithms that better optimize this trade-off. To address this challenge, we propose a release mechanism that takes any (text) embedding vector as input and releases a corresponding private vector. The mechanism satisfies an extension of differential privacy to metric spaces. Our idea, based on first randomly projecting the vectors to a lower-dimensional space and then adding noise in this projected space, generates private vectors that achieve strong theoretical guarantees on their utility. We support our theoretical proofs with empirical experiments on multiple word embedding models and NLP datasets, achieving in some cases more than 10% gains over existing state-of-the-art privatization techniques.
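The two-step idea in the abstract (random projection, then noise in the projected space) can be sketched roughly as follows. This is an illustrative sketch only: the function name, the Gaussian Johnson-Lindenstrauss-style projection, and the Laplace noise calibration `1/epsilon` are assumptions, not the paper's exact mechanism or scaling.

```python
import numpy as np

def privatize_embedding(x, k=20, epsilon=10.0, seed=None):
    """Illustrative sketch: project a d-dim embedding x down to k dims
    with a random Gaussian matrix, then add noise in the k-dim space.
    Parameter names and noise calibration are assumptions."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # JL-style random projection to a lower-dimensional space
    P = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
    z = P @ x
    # additive noise in the projected space (metric-DP-flavored Laplace)
    noise = rng.laplace(scale=1.0 / epsilon, size=k)
    return z + noise

private_vec = privatize_embedding(np.ones(300), k=20, epsilon=5.0, seed=0)
```

Projecting first means the noise only has to mask a k-dimensional vector rather than the full d-dimensional embedding, which is what makes the utility guarantees attainable.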
Proceedings of the Third Workshop on Privacy in Natural Language Processing
Oluwaseyi Feyisetan, Sepideh Ghanavati, Shervin Malmasi, Patricia Thaine
On a Utilitarian Approach to Privacy Preserving Text Generation
Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier
Proceedings of the Third Workshop on Privacy in Natural Language Processing
Differentially private mechanisms for text generation typically add carefully calibrated noise to input words and use the nearest neighbor to the noised input as the output word. When the noise is small in magnitude, these mechanisms are susceptible to reconstruction of the original sensitive text, because the nearest neighbor to the noised input is likely to be the original input. To mitigate this empirical privacy risk, we propose a novel class of differentially private mechanisms that parameterizes the nearest-neighbor selection criterion in traditional mechanisms. Motivated by the Vickrey auction, in which only the second-highest price is revealed and the highest price is kept private, we balance the choice between the first and second nearest neighbors in the proposed class of mechanisms using a tuning parameter. This parameter is selected by empirically solving a constrained optimization problem that maximizes utility while maintaining the desired privacy guarantees. We argue that this empirical measurement framework can be used to align different mechanisms along a common benchmark for their privacy-utility trade-off, particularly when different distance metrics are used to calibrate the amount of noise added. Our experiments on real text classification datasets show up to 50% improvement in utility over the existing state of the art with the same empirical privacy guarantee.
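The Vickrey-style selection step described above can be sketched as follows: instead of always decoding the noised embedding to its nearest vocabulary word, a biased coin decides between the first and second nearest neighbors. The function name, the parameter name `t`, and the Euclidean distance are assumptions for illustration; the paper tunes this parameter via a constrained optimization, which is not reproduced here.

```python
import numpy as np

def vickrey_select(noised_vec, vocab_embeddings, t=0.5, seed=None):
    """Sketch: return the index of the output word for a noised embedding.
    With probability t, pick the second nearest neighbor instead of the
    first, so the original input is less likely to be reconstructed."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(vocab_embeddings - noised_vec, axis=1)
    first, second = np.argsort(dists)[:2]
    return int(second if rng.random() < t else first)

vocab = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
noised = np.array([0.1, 0.0])
nearest = vickrey_select(noised, vocab, t=0.0, seed=0)   # t=0: plain nearest neighbor
runner_up = vickrey_select(noised, vocab, t=1.0, seed=0)  # t=1: always second nearest
```

Setting `t = 0` recovers the traditional mechanism, so the tuning parameter smoothly interpolates between the two selection rules.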
2020
Proceedings of the Second Workshop on Privacy in NLP
Oluwaseyi Feyisetan, Sepideh Ghanavati, Shervin Malmasi, Patricia Thaine
On Log-Loss Scores and (No) Privacy
Abhinav Aggarwal, Zekun Xu, Oluwaseyi Feyisetan, Nathanael Teissier
Proceedings of the Second Workshop on Privacy in NLP
A common metric for assessing the performance of binary classifiers is the Log-Loss score, a real number indicating the cross-entropy distance between the predicted distribution over the labels and the true distribution (a point distribution defined by the ground-truth labels). In this paper, we show that a malicious modeler, upon obtaining access to the Log-Loss scores on its predictions, can exploit this information to infer all the ground-truth labels of arbitrary test datasets with full accuracy, and we provide an efficient algorithm to perform this inference. A particularly interesting application in which this attack can be exploited is breaching privacy in the setting of membership inference attacks. These attacks exploit the vulnerabilities of exposing models trained on customer data to queries made by an adversary. Privacy auditing tools for measuring leakage from sensitive datasets assess the total privacy leakage based on the adversary's predictions for data-point membership. An instance of the proposed attack can hence cause a complete membership privacy breach, obviating any attack-model training or access to side knowledge by the adversary. Moreover, our algorithm is agnostic to the model under attack and hence enables perfect membership inference even for models that do not memorize or overfit. In particular, our observations provide insight into the extent of information leakage from statistical aggregates and how it can be exploited.
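The core leakage claimed above (a single aggregate Log-Loss score pins down every ground-truth label) can be demonstrated with a brute-force sketch. This is not the paper's efficient algorithm; it only illustrates that, for generic adversary-chosen predictions, exactly one label vector is consistent with the observed score.

```python
import itertools
import numpy as np

def log_loss(y, p):
    """Cross-entropy between point-mass labels y and predictions p."""
    y = np.asarray(y, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def recover_labels(p, observed_loss, tol=1e-12):
    """Brute-force sketch (not the paper's efficient algorithm): enumerate
    all 2^N label vectors and keep those matching the observed score.
    For generic p, the true labels are the only match."""
    n = len(p)
    return [y for y in itertools.product([0, 1], repeat=n)
            if abs(log_loss(y, p) - observed_loss) < tol]

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=8)   # adversary's submitted predictions
y_true = rng.integers(0, 2, size=8)   # hidden ground-truth labels
matches = recover_labels(p, log_loss(y_true, p))
```

Note that the model itself never appears here: only the scoring interface leaks, which is why the attack is model-agnostic.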
A Differentially Private Text Perturbation Method Using Regularized Mahalanobis Metric
Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier
Proceedings of the Second Workshop on Privacy in NLP
Balancing the privacy-utility trade-off is a crucial requirement of many practical machine learning systems that deal with sensitive customer data. A popular approach for privacy-preserving text analysis is noise injection, in which text data is first mapped into a continuous embedding space, perturbed by sampling spherical noise from an appropriate distribution, and then projected back to the discrete vocabulary space. While this allows the perturbation to admit the required metric differential privacy, the utility of downstream tasks modeled on this perturbed data is often low because the spherical noise does not account for the variability in the density around different words in the embedding space. In particular, words in a sparse region are likely to remain unchanged even when the noise scale is large. In this paper, we propose a text perturbation mechanism based on a carefully designed regularized variant of the Mahalanobis metric to overcome this problem. For any given noise scale, this metric adds elliptical noise to account for the covariance structure in the embedding space. This heterogeneity in the noise scale along different directions helps ensure that words in sparse regions have sufficient likelihood of replacement without sacrificing overall utility. We provide a text-perturbation algorithm based on this metric and formally prove its privacy guarantees. Additionally, we empirically show that our mechanism improves the privacy statistics achievable at the same level of utility, compared to the state-of-the-art Laplace mechanism.
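The elliptical-noise idea can be sketched as follows: regularize the embedding covariance toward the identity, then reshape a spherical noise sample by its matrix square root so the noise is larger along high-variance directions. The function signature, the regularization weight `lam`, and the gamma-distributed radius (the standard radius for a multivariate Laplace sample) are assumptions for illustration, not the paper's exact calibration.

```python
import numpy as np

def mahalanobis_perturb(x, emb_matrix, epsilon=10.0, lam=0.5, seed=None):
    """Sketch of a regularized-Mahalanobis perturbation of embedding x.
    emb_matrix holds the vocabulary embeddings (rows = words); lam
    interpolates between their covariance and the identity (spherical)."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    cov = np.cov(emb_matrix, rowvar=False)
    reg = lam * cov + (1 - lam) * np.eye(d)      # regularized covariance
    root = np.linalg.cholesky(reg)               # matrix "square root"
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)       # uniform direction on the sphere
    radius = rng.gamma(shape=d, scale=1.0 / epsilon)  # multivariate-Laplace radius
    return x + radius * (root @ direction)       # elliptical, not spherical, noise

emb = np.random.default_rng(1).normal(size=(50, 4))  # toy vocabulary embeddings
perturbed = mahalanobis_perturb(emb[0], emb, epsilon=5.0, seed=2)
```

With `lam = 0` this reduces to the spherical (Laplace-style) mechanism, so the regularizer directly controls how much the noise adapts to the embedding geometry.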