A Hypothesis Testing Approach to Sharing Logs with Confidence

Yunhui Long, Le Xu, Carl A. Gunter

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Logs generated by systems and applications contain a wide variety of heterogeneous information that is important for performance profiling, failure detection, and security analysis. There is a strong need to share logs among different parties, whether to outsource analysis or to support systems and security research. However, sharing logs may inadvertently leak confidential or proprietary information. Besides sensitive information stored directly in logs, such as user identifiers and software versions, indirect evidence such as performance metrics can also leak sensitive information about the physical machines and the system. In this work, we introduce a game-based definition of the risk of exposing sensitive information through released logs. We propose log indistinguishability, a property that is met only when the logs leak little information about the protected sensitive attributes. We design an end-to-end framework that allows a user to identify the risk of information leakage in logs, to mitigate that exposure through log redaction and obfuscation, and to release the logs with a much lower risk of exposing the sensitive attribute. Our framework contains a set of statistical tests to identify violations of the log indistinguishability property and a variety of obfuscation methods to prevent the leakage of sensitive information. The framework treats the log-generating process as a black box and can therefore be applied to different systems and processes. We perform case studies on two types of log datasets: Spark event logs and hardware counters. We show that our framework is effective in preventing leakage of the sensitive attribute, with reasonable testing time and acceptable utility loss in the logs.
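The abstract says the framework uses statistical tests to detect violations of log indistinguishability, but it does not name the tests. As an illustration only, the sketch below assumes a single numeric log field (for example, a per-task runtime) has been collected under two values of a sensitive attribute (for example, two machine types) and applies a generic two-sample permutation test on the difference of means: a small p-value suggests the field could help an observer tell the two values apart. The function names `permutation_test`, `leaks_attribute`, and `coarsen`, the 0.05 significance level, the bucket-rounding obfuscation, and the synthetic data are all hypothetical choices, not the paper's method.

```python
import random
from statistics import fmean
from typing import List, Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_permutations: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test on the absolute difference of means.

    Estimates a p-value for the null hypothesis that samples ``a`` and ``b``
    come from the same distribution, i.e. that this log field does not reveal
    which value the sensitive attribute took.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign attribute labels and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def leaks_attribute(a: Sequence[float], b: Sequence[float],
                    alpha: float = 0.05, **kwargs) -> bool:
    """Hypothetical decision rule: flag the field if the test rejects at alpha."""
    return permutation_test(a, b, **kwargs) < alpha


def coarsen(values: Sequence[float], bucket: float) -> List[float]:
    """Hypothetical obfuscation: round each value to the nearest bucket."""
    return [bucket * round(v / bucket) for v in values]


if __name__ == "__main__":
    # Synthetic runtimes for two machine types (made-up data for illustration).
    rng = random.Random(1)
    machine_a = [100 + rng.gauss(0, 2) for _ in range(30)]
    machine_b = [110 + rng.gauss(0, 2) for _ in range(30)]
    print("raw field leaks:", leaks_attribute(machine_a, machine_b))
    print("coarsened field leaks:",
          leaks_attribute(coarsen(machine_a, 1000), coarsen(machine_b, 1000)))
```

Rounding to 1000-unit buckets hides the difference only because it collapses every value to the same bucket, which also destroys the field's utility. A practical obfuscation would aim for the coarsest transformation that still passes the test, trading off leakage against utility loss as the abstract describes.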

Original language: English (US)
Title of host publication: CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy
Publisher: Association for Computing Machinery
Pages: 307-318
Number of pages: 12
ISBN (Electronic): 9781450371070
State: Published - Mar 16 2020
Event: 10th ACM Conference on Data and Application Security and Privacy, CODASPY 2020 - New Orleans, United States
Duration: Mar 16 2020 to Mar 18 2020

Publication series

Name: CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy

Conference

Conference: 10th ACM Conference on Data and Application Security and Privacy, CODASPY 2020
Country/Territory: United States
City: New Orleans
Period: 3/16/20 to 3/18/20

Keywords

  • hypothesis test
  • indistinguishability
  • log obfuscation
  • privacy

ASJC Scopus subject areas

  • Software
  • Computer Science Applications