Language models (LMs) today are widely used (1) in personalized contexts and (2) as the backbone of agents that can call external tools. But do they respect privacy when helping with daily tasks such as emailing?

While many studies have investigated LMs memorizing training data, a great deal of private or sensitive information is actually exposed to LMs at inference time (e.g., when users let an LM work with content retrieved from their mailbox). To quantify LMs' awareness of privacy norms and these emerging inference-time privacy risks, our paper, accepted to the NeurIPS 2024 Datasets & Benchmarks track, proposes PrivacyLens, a novel framework that extends privacy-sensitive seeds into vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents' actions.
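To make the seed → vignette → trajectory pipeline concrete, here is a minimal Python sketch of the data involved at each level. The class names, fields, and the `expand` helper are illustrative assumptions, not the actual PrivacyLens code; see the repository linked below for the real implementation.

```python
# A minimal sketch (not the official PrivacyLens API) of how a privacy-sensitive
# seed might be expanded into a vignette and then into an agent trajectory.
# All class and function names here are hypothetical illustrations.
from dataclasses import dataclass, field


@dataclass
class Seed:
    """A privacy norm expressed as contextual-integrity-style attributes."""
    data_type: str      # e.g., "medical diagnosis"
    data_subject: str   # whose information it is
    sender: str         # who currently holds the information
    recipient: str      # who would receive it
    transmission: str   # how it would flow, e.g., "reply to a work email"


@dataclass
class Vignette:
    """A short story grounding the seed in a concrete everyday scenario."""
    seed: Seed
    story: str


@dataclass
class Trajectory:
    """Tool-use history an LM agent sees before producing its final action."""
    vignette: Vignette
    user_instruction: str                            # e.g., "Help me reply to this email."
    tool_calls: list = field(default_factory=list)   # retrieved emails, calendar items, ...


def expand(seed: Seed) -> Trajectory:
    """Toy expansion; in PrivacyLens this step uses template- and LM-based generation."""
    story = (f"{seed.sender} knows {seed.data_subject}'s {seed.data_type} "
             f"and is drafting a {seed.transmission} to {seed.recipient}.")
    vignette = Vignette(seed=seed, story=story)
    return Trajectory(
        vignette=vignette,
        user_instruction="Help me send this update.",
        tool_calls=[{"tool": "search_emails", "result": story}],
    )
```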

Using our framework, we reveal a discrepancy between LMs' performance on probing questions and their actual behavior when executing user instructions. While GPT-4 and Claude-3-Sonnet answer nearly all probing questions correctly, they leak sensitive information in 26% and 38% of cases, respectively, when acting as agents. In the paper, we also explore the impact of prompting: simple prompt engineering does little to mitigate the privacy leakage in LM agents' actions. We further examine the safety-helpfulness trade-off and conduct qualitative analysis to uncover more insights. Unfortunately, current language models have yet to reach the crucial sweet spot that delivers both safety and helpfulness.
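As a rough illustration of the two evaluation levels, the sketch below contrasts accuracy on probing questions with a leakage check over final agent actions. The function names and the substring-based leakage check are simplifying assumptions for illustration only; the paper's leakage judgment is more fine-grained than string matching.

```python
# A minimal sketch (hypothetical, not the released evaluation code) of the two
# evaluation levels: answering probing questions vs. checking the final action.

def probing_accuracy(answers: list) -> float:
    """Fraction of probing questions (e.g., 'Is it OK to share X with Y?') answered correctly."""
    return sum(answers) / len(answers)


def leakage_rate(final_actions: list, sensitive_spans: list) -> float:
    """Fraction of trajectories whose final action (e.g., a drafted email) mentions
    information it should have withheld, here approximated by substring matching."""
    leaked = 0
    for action, spans in zip(final_actions, sensitive_spans):
        if any(span.lower() in action.lower() for span in spans):
            leaked += 1
    return leaked / len(final_actions)


# Example: a model can ace probing questions yet still leak when acting.
print(probing_accuracy([True, True, True, True]))   # 1.0
print(leakage_rate(
    ["Hi Bob, Alice was just diagnosed with diabetes...", "Hi Bob, see you Friday."],
    [["diabetes"], ["salary"]],
))                                                  # 0.5
```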

Our paper, dataset, and code can be found at https://salt-nlp.github.io/PrivacyLens/.