TY - CONF
T1 - Are Large Pre-Trained Language Models Leaking Your Personal Information?
AU - Huang, Jie
AU - Shao, Hanyin
AU - Chang, Kevin Chen-Chuan
N1 - We thank the reviewers for their constructive feedback. This material is based upon work supported by the National Science Foundation IIS 16-19302 and IIS 16-33755, Zhejiang University ZJU Research 083650, IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) – a research collaboration as part of the IBM Cognitive Horizon Network, grants from eBay and Microsoft Azure, UIUC OVCR CCIL Planning Grant 434S34, UIUC CSBS Small Grant 434C8U, and UIUC New Frontiers Initiative. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the funding agencies.
PY - 2022
Y1 - 2022
AB - In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner's name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.
UR - http://www.scopus.com/inward/record.url?scp=85144996821&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144996821&partnerID=8YFLogxK
DO - 10.18653/v1/2022.findings-emnlp.382
M3 - Paper
AN - SCOPUS:85144996821
SP - 2038
EP - 2047
T2 - Findings of the Association for Computational Linguistics: EMNLP 2022
Y2 - 7 December 2022 through 11 December 2022
ER -