AID: Active distillation machine to leverage pre-trained black-box models in private data settings

Trong Nghia Hoang, Shenda Hong, Cao Xiao, Bryan Low, Jimeng Sun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents an active distillation method that enables a local institution (e.g., a hospital) to find the best queries, within a given budget, for distilling an on-server black-box model's predictive knowledge into a local surrogate with transparent parameterization. This allows the local institution to better understand the predictive reasoning of the black-box model in its own local context, or to further customize the distilled knowledge with its private dataset, which cannot be centralized and fed into the server model. The proposed method thus addresses several challenges of deploying machine learning (ML) in industrial settings (e.g., healthcare analytics) with strong proprietary constraints: (1) the opaqueness of the server model's architecture, which prevents local users from understanding its predictive reasoning in their local data contexts; (2) the increasing cost and risk of uploading local data to the cloud for analysis; and (3) the need to customize the server model with private onsite data. We evaluated the proposed method on both benchmark and real-world healthcare data, observing significant improvements over existing local distillation methods. A theoretical analysis of the proposed method is also presented.
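The general idea described in the abstract — spending a limited query budget on the inputs most useful for fitting a transparent local surrogate to a black-box model — can be illustrated with a minimal sketch. This is not the paper's AID algorithm: the toy black-box function, the logistic-regression surrogate, and the uncertainty-based query selection below are all illustrative assumptions.

```python
import math
import random

def black_box(x):
    # Stand-in for the opaque server model (hypothetical; in practice
    # this is a remote model the institution can only query).
    return 1.0 if x[0] + x[1] > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def surrogate_predict(w, x):
    # Transparent local surrogate: a logistic model with weights w.
    return sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])

def fit_surrogate(w, data, lr=0.5, epochs=200):
    # Gradient descent on the logistic loss over the queried examples.
    for _ in range(epochs):
        for x, y in data:
            g = surrogate_predict(w, x) - y  # gradient w.r.t. the logit
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            w[2] -= lr * g
    return w

def active_distill(pool, budget):
    # Greedy uncertainty sampling: spend each query on the pool point
    # where the current surrogate is least certain, then refit.
    w, data = [0.0, 0.0, 0.0], []
    remaining = list(pool)
    for _ in range(budget):
        x = min(remaining, key=lambda x: abs(surrogate_predict(w, x) - 0.5))
        remaining.remove(x)
        data.append((x, black_box(x)))  # one paid query to the server model
        w = fit_surrogate(w, data)
    return w, data

random.seed(0)
pool = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
w, data = active_distill(pool, budget=20)
agreement = sum(
    (surrogate_predict(w, x) > 0.5) == (black_box(x) > 0.5) for x in pool
) / len(pool)
```

After 20 queries the surrogate's decision boundary typically agrees with the black box on most of the unlabeled pool; selecting the most uncertain point is one simple heuristic for making each budgeted query informative, whereas AID derives its query-selection criterion formally.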

Original language: English (US)
Title of host publication: The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021
Publisher: Association for Computing Machinery, Inc
Pages: 3569-3581
Number of pages: 13
ISBN (Electronic): 9781450383127
DOIs
State: Published - Apr 19, 2021
Event: 2021 World Wide Web Conference, WWW 2021 - Ljubljana, Slovenia
Duration: Apr 19, 2021 - Apr 23, 2021

Publication series

Name: The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021

Conference

Conference: 2021 World Wide Web Conference, WWW 2021
Country/Territory: Slovenia
City: Ljubljana
Period: 4/19/21 - 4/23/21

Keywords

  • Deep learning
  • Disease risk prediction
  • Model distillation

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
