TY - GEN
T1 - Private DNA Sequencing
T2 - 2020 IEEE Information Theory Workshop, ITW 2020
AU - Mazooji, Kayvon
AU - Dong, Roy
AU - Shomorony, Ilan
N1 - Publisher Copyright: ©2021 IEEE
PY - 2021/4/11
Y1 - 2021/4/11
N2 - When an individual’s DNA is sequenced, sensitive medical information becomes available to the sequencing laboratory. A recently proposed way to hide an individual’s genetic information is to mix in DNA samples of other individuals. We assume these samples are known to the individual but unknown to the sequencing laboratory. Thus, these DNA samples act as “noise” to the sequencing laboratory, but still allow the individual to recover their own DNA samples afterward. Motivated by this idea, we study the problem of hiding a binary random variable X (a genetic marker) with the additive noise provided by mixing DNA samples, using mutual information as a privacy metric. This is equivalent to the problem of finding a worst-case noise distribution for recovering X from the noisy observation among a set of feasible discrete distributions. We characterize upper and lower bounds to the solution of this problem, which are empirically shown to be very close. The lower bound is obtained through a convex relaxation of the original discrete optimization problem, and yields a closed-form expression. The upper bound is computed via a greedy algorithm for selecting the mixing proportions.
KW - Additive discrete noise
KW - DNA sequencing
KW - Genetic privacy
KW - Worst-case noise distribution
UR - http://www.scopus.com/inward/record.url?scp=85113314484&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113314484&partnerID=8YFLogxK
U2 - 10.1109/ITW46852.2021.9457681
DO - 10.1109/ITW46852.2021.9457681
M3 - Conference contribution
AN - SCOPUS:85113314484
T3 - 2020 IEEE Information Theory Workshop, ITW 2020
BT - 2020 IEEE Information Theory Workshop, ITW 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 April 2021 through 15 April 2021
ER -