Hiding sensitive information when sharing distributed transactional data

Abhijeet Ghoshal, Jing Hao, Syam Menon, Sumit Sarkar

Research output: Contribution to journal › Article › peer-review

Abstract

Retailers have long shared transactional data with supply chain partners, to the benefit of all involved. However, many are still reluctant to share, and there is evidence that they would share more if information sensitive to retailers were concealed before the data are shared. Although there has been considerable research into methods for hiding sensitive information in transactional data, extant research has focused only on sensitive information at the organizational level. In reality, sensitivity is rarely confined to that level: the retail industry has recognized, and adapted its offerings to, region-wide differences in customer tastes for decades, and when stores offer a mix of standardized and customized products, differences in customer characteristics across regions give rise to region-specific sensitive information in addition to sensitive information at the organizational level. To date, this version of the problem has been overlooked, and no effective methods exist to solve it; this paper fills that gap. Although some existing approaches can be adapted to this more realistic context, region-level requirements substantially increase the size of an already difficult (NP-hard) problem, making such adaptations impractical. Traditional decomposition-based approaches, such as Lagrangian relaxation, are not viable either, because they require repeatedly solving NP-hard problems involving millions of variables. In this paper, we present an ensemble approach that draws intuition from Lagrangian relaxation to maximize the accuracy of a shared transactional dataset. Extensive computational experiments show that this approach not only identifies near-optimal solutions but also does so in cases where other approaches fail.
We also show that the precision of recommendations made with datasets modified by the ensemble approach is not statistically different from that of recommendations made with the original datasets; this demonstrates that using the ensemble approach to hide sensitive information before sharing transactional data has a negligible negative impact.
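The abstract does not spell out the formulation, so the following is only a minimal sketch of the constraint structure it describes, under stated assumptions: a sensitive itemset counts as "hidden" when its support (the fraction of in-scope transactions containing it) falls below a single threshold `min_sup`; organization-level itemsets are checked over all transactions and region-level ones only within their own region; and hiding is done by deleting items from transactions. The functions `support` and `hide_greedy` and the toy data are hypothetical. `hide_greedy` is a naive greedy baseline, not the authors' ensemble or Lagrangian-relaxation-based method, and its count of deleted items is only a crude stand-in for the dataset accuracy the paper maximizes.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Itemset = FrozenSet[str]
# A transaction is (region, items); the item set is edited in place when items are hidden.
Transaction = Tuple[str, Set[str]]


def support(itemset: Itemset, transactions: List[Transaction],
            region: Optional[str] = None) -> float:
    """Fraction of in-scope transactions that contain itemset.

    Scope is one region, or the whole organization when region is None.
    """
    scope = [items for r, items in transactions if region is None or r == region]
    if not scope:
        return 0.0
    return sum(itemset <= items for items in scope) / len(scope)


def hide_greedy(transactions: List[Transaction],
                org_sensitive: List[Itemset],
                region_sensitive: Dict[str, List[Itemset]],
                min_sup: float) -> int:
    """Hypothetical greedy baseline: delete items until every sensitive itemset
    has support below min_sup in its own scope. Returns the number of deletions."""
    if not 0 < min_sup <= 1:
        raise ValueError("min_sup must be in (0, 1]")
    # (itemset, scope) pairs; scope None means organization-wide.
    targets = [(s, None) for s in org_sensitive]
    targets += [(s, r) for r, sets in region_sensitive.items() for s in sets]
    deletions = 0
    for itemset, region in targets:
        while support(itemset, transactions, region) >= min_sup:
            # Supporting transactions in scope; at least one exists since min_sup > 0.
            candidates = [items for r, items in transactions
                          if (region is None or r == region) and itemset <= items]
            # Heuristic: sanitize the shortest supporting transaction first.
            victim = min(candidates, key=len)
            # Drop one (alphabetically first) item of the sensitive itemset.
            victim.discard(min(itemset))
            deletions += 1
    # Deletions only lower supports, so itemsets hidden earlier stay hidden.
    return deletions


transactions = [
    ("north", {"a", "b", "c"}),
    ("north", {"a", "b"}),
    ("north", {"a", "d"}),
    ("south", {"a", "b", "d"}),
    ("south", {"b", "d"}),
    ("south", {"c", "d"}),
]
org_sensitive = [frozenset({"a", "b"})]                 # sensitive company-wide
region_sensitive = {"south": [frozenset({"b", "d"})]}   # sensitive only in "south"

deletions = hide_greedy(transactions, org_sensitive, region_sensitive, min_sup=0.4)
print(deletions)  # 2
print(round(support(frozenset({"a", "b"}), transactions), 3))  # 0.333
```

On this toy data, {a, b} starts at organization-wide support 3/6 and {b, d} at support 2/3 within "south"; one deletion each brings both below 0.4. Because removing items can never raise any support, the pass terminates and earlier itemsets cannot reappear, but it ignores side effects on non-sensitive itemsets. Controlling those side effects at scale, with both organization- and region-level constraints, is the hard part the abstract attributes to the problem.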

Original language: English (US)
Pages (from-to): 473-490
Number of pages: 18
Journal: Information Systems Research
Volume: 31
Issue number: 2
State: Published - Jun 2020

Keywords

  • Data quality
  • Ensemble
  • Itemset hiding
  • Lagrangian relaxation
  • Privacy

ASJC Scopus subject areas

  • Management Information Systems
  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Library and Information Sciences
